MetalLB L2 and BGP mode¶
MetalLB L2 mode¶
In L2 mode, MetalLB announces the address of the `LoadBalancer` IP via ARP (for IPv4) or NDP (for IPv6). Before v0.13.2, MetalLB could only be configured through a ConfigMap; since v0.13.2 it is configured through CRD resources, and the ConfigMap method has been deprecated.
In Layer 2 mode, when a Service is created, MetalLB (the speaker component) elects one node in the cluster as the host that exposes this Service to the outside world. When a request is made for the Service's external IP, this node answers the ARP request on behalf of that IP. A request sent to the Service therefore first reaches this node, passes through the kube-proxy rules on it, and is finally directed to one of the Service's endpoints.
There are three main points in the node election logic:

- Nodes that are not ready, and nodes whose endpoints are not ready, are filtered out first.
- If all endpoints of the Service sit on the same node, that node is chosen as the ARP responder for the Service IP.
- If the endpoints of the Service are spread across different nodes, each candidate node is hashed as `sha256(node + "#" + externalIP)` and the first result in lexicographic order wins.
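The last step can be sketched in a few lines of Go (a simplification for illustration; the helper name `pickL2Leader` and the exact hash input format are assumptions, not MetalLB's actual code):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// pickL2Leader hashes each candidate node together with the service IP,
// sorts the hex digests, and returns the lexicographically first node.
func pickL2Leader(nodes []string, externalIP string) string {
	type entry struct{ node, hash string }
	entries := make([]entry, 0, len(nodes))
	for _, n := range nodes {
		sum := sha256.Sum256([]byte(n + "#" + externalIP))
		entries = append(entries, entry{node: n, hash: fmt.Sprintf("%x", sum)})
	}
	if len(entries) == 0 {
		return ""
	}
	sort.Slice(entries, func(i, j int) bool { return entries[i].hash < entries[j].hash })
	return entries[0].node
}

func main() {
	// Every speaker runs the same deterministic computation, so all nodes
	// agree on the winner without extra coordination.
	fmt.Println(pickL2Leader([]string{"master1", "worker1", "worker2"}, "10.254.254.1"))
}
```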
In this way, MetalLB selects one node per Service as its exposed host and steers all of that Service's traffic through it, so this node can become a performance bottleneck: the Service's bandwidth is capped by the bandwidth of a single node. This is the most important limitation of using ARP or NDP.
Also, when this node fails, MetalLB must elect a new node for the Service. It then sends a gratuitous ARP to clients, telling them that their MAC address cache needs to be updated. Traffic is still forwarded to the failed node until the clients update their caches, so the failover time essentially depends on how quickly clients refresh their MAC address caches.
Usage¶
- Create an IP pool

    - `addresses`: the list of IP addresses. Each member can be a CIDR or an address range (such as `192.168.9.1-192.168.9.5`), and members may use different `ipFamily` (IPv4/IPv6). MetalLB allocates `LoadBalancer` Service IPs from this list.
    - `autoAssign`: whether to automatically assign IPs from this pool; the default is `true`. In some cases (scarce or public IP addresses) you may not want the pool's IPs handed out freely and can set it to `false`; a Service can then still request the pool via the annotation `metallb.universe.tf/address-pool: <pool-name>`, or set the IP in the `spec.loadBalancerIP` field (note that this field has been deprecated by Kubernetes).
    - `avoidBuggyIPs`: whether to avoid the `.0` and `.255` addresses in the pool; the default is `false`.
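    For example, a minimal `IPAddressPool` (the pool name `lan` and the address range are placeholders; adjust them to your network):

    ```yaml
    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: lan
      namespace: metallb-system
    spec:
      addresses:
      - 192.168.9.1-192.168.9.5
      autoAssign: true
      avoidBuggyIPs: false
    ```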
- Configure the `LoadBalancerIP` advertisement rule (L2)

    Bind IP pools via `L2Advertisement`, which tells MetalLB that these addresses should be advertised by ARP or NDP (see the example after this list).

    - `ipAddressPools`: optional; select IP pools by name. If neither `ipAddressPools` nor `ipAddressPoolSelectors` is specified, the advertisement applies to all IP pools.
    - `ipAddressPoolSelectors`: optional; select IP pools by label. If neither `ipAddressPools` nor `ipAddressPoolSelectors` is specified, the advertisement applies to all IP pools.
    - `nodeSelectors`: optional; filter which nodes act as the next hop for the `loadBalancerIP`. The default is all nodes.
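    A minimal `L2Advertisement` bound to the pool above (reusing the placeholder pool name `lan`):

    ```yaml
    apiVersion: metallb.io/v1beta1
    kind: L2Advertisement
    metadata:
      name: lan
      namespace: metallb-system
    spec:
      ipAddressPools:
      - lan
    ```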
- Create a `LoadBalancer` Service

    ```yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: metallb1-cluster
      labels:
        name: metallb
      #annotations:
      #  metallb.universe.tf/address-pool: lan
    spec:
      type: LoadBalancer
      allocateLoadBalancerNodePorts: false
      ports:
      - port: 18081
        targetPort: 8080
        protocol: TCP
      selector:
        app: metallb-cluster
    ```

    Just specify `spec.type=LoadBalancer`, and MetalLB will naturally take over the lifecycle of this Service.

    Note

    If you want the Service to allocate its address from a specific IP pool, set the annotation `metallb.universe.tf/address-pool: <pool-name>`, or specify the IP in the `service.spec.loadBalancerIP` field (the IP must exist in one of the pools; this method is not recommended). If there are multiple load balancers, they can be told apart through the `service.spec.loadBalancerClass` field; the class MetalLB responds to is configured with the `--lb-class` flag when deploying MetalLB.
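    A sketch combining both options (the class name `metallb.universe.tf/metallb` is an assumption; it must match whatever `--lb-class` was set to at deployment time):

    ```yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: metallb-from-pool
      annotations:
        metallb.universe.tf/address-pool: lan    # allocate from the "lan" pool
    spec:
      type: LoadBalancer
      loadBalancerClass: metallb.universe.tf/metallb   # assumed; must match --lb-class
      ports:
      - port: 18081
        targetPort: 8080
        protocol: TCP
      selector:
        app: metallb-cluster
    ```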
Load Balancing¶
- When `Service.spec.externalTrafficPolicy=Cluster`

    In this mode load balancing across Pods is good, but traffic may take multiple hops, and SNAT hides the client's source IP:

    ```
    client -> loadBalancerIP:port -> node A (leader) -> kube-proxy (SNAT) -> pod A
                                                     -> kube-proxy (SNAT) -> node B -> kube-proxy -> pod B
    ```
- When `Service.spec.externalTrafficPolicy=Local`

    In this mode the client's source IP is preserved, but load balancing is poor: kube-proxy only forwards to backend Pods on the announcing node, so traffic lands on that node's local Pods (and MetalLB only elects the leader among nodes that have ready local endpoints):

    ```
    client -> loadBalancerIP:port -> node A (leader) -> kube-proxy -> pod A   (backend Pod on node A)
    ```
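Switching between the two behaviors is a single field on the Service; for example, to flip the demo Service above to `Local`:

```
kubectl patch svc metallb1-cluster -p '{"spec":{"externalTrafficPolicy":"Local"}}'
```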
MetalLB BGP mode (L3)¶
Layer 2 mode is limited to a single Layer 2 network, and traffic for a Service is first funneled through one elected node, so it is not real load balancing. BGP mode is not limited to a Layer 2 network: each node in the cluster establishes a BGP session with a BGP router and announces that the next hop for the Service's external IP is the node itself. External traffic then enters the cluster through the BGP router, and every time the router receives a new connection destined for the LoadBalancer IP, it picks one of the advertised next-hop nodes; which node it chooses depends on the vendor-specific algorithm (typically ECMP hashing). From that point of view, BGP mode provides good load balancing.
Usage¶
- Create an IP pool (same as in L2 mode)

- Configure the `LoadBalancerIP` advertisement rule (L3)

    Note

    BGP mode requires a router that runs the BGP protocol. If no hardware support is available, software such as `frr` or `bird` can be used instead; `frr` is recommended.

    `frr` BGP configuration:

    ```
    router bgp 7675                        # BGP AS number
     bgp router-id 172.16.1.1              # router-id, usually the interface IP
     no bgp ebgp-requires-policy           # disable the eBGP route-policy requirement !!!
     neighbor 172.16.1.11 remote-as 7776   # eBGP neighbor 1, 172.16.1.11 is a cluster node
     neighbor 172.16.1.11 description master1
     neighbor 172.16.2.21 remote-as 7776   # node 2
     neighbor 172.16.2.21 description worker1
    ```

    MetalLB configuration:
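    First, the pool referred to below as `bgp-pool` (a sketch; the `10.254.254.0/24` range is an assumption consistent with the verification output later in this section):

    ```yaml
    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: bgp-pool
      namespace: metallb-system
    spec:
      addresses:
      - 10.254.254.0/24
    ```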
- Configure `BGPAdvertisement`

    This CRD specifies which IP pools are announced via BGP. As in L2 mode, pools can be selected by name or by `labelSelector`, and several BGP attributes can be configured:

    ```yaml
    apiVersion: metallb.io/v1beta1
    kind: BGPAdvertisement
    metadata:
      name: local
      namespace: metallb-system
    spec:
      ipAddressPools:
      - bgp-pool
      aggregationLength: 32
    ```

    - `aggregationLength`: route prefix aggregation length; the default is 32, meaning each address is advertised as a /32 route. The value can be lowered to aggregate routes and reduce their number.
    - `aggregationLengthV6`: same as above for IPv6; the default is 128.
    - `ipAddressPools`: []string; the names of the IP pools to advertise via BGP.
    - `ipAddressPoolSelectors`: select IP pools by label.
    - `nodeSelectors`: filter the next-hop nodes for the `loadBalancerIP` by node label; the default is all nodes.
    - `peers`: []string; names of `BGPPeer` objects declaring which BGP sessions this `BGPAdvertisement` applies to.
    - `communities`: BGP communities; they can be configured directly, or referenced by the name of a community CRD.
- Configure BGP Peers

    `BGPPeer` holds the BGP session configuration, including the peer's AS number and IP address:

    ```yaml
    apiVersion: metallb.io/v1beta2
    kind: BGPPeer
    metadata:
      name: test
      namespace: metallb-system
    spec:
      myASN: 7776
      peerASN: 7675
      peerAddress: 172.16.1.1
      routerID: 172.16.1.11
    ```

    - `myASN`: the local ASN; the range is `1-64511` (public AS) or `64512-65535` (private AS).
    - `peerASN`: the peer ASN, same range as above. If the two are equal the session is iBGP; otherwise it is eBGP.
    - `peerAddress`: the peer router's IP address.
    - `sourceAddress`: the source address used to establish the BGP session; by default it is selected automatically from the node's interfaces.
    - `nodeSelectors`: select by node label which nodes establish a session with the BGP router.
- Create a Service of type `LoadBalancer` (same as in L2 mode)
Verify¶
You can see the routes learned through BGP on the BGP Router:
```
$ vtysh

Hello, this is FRRouting (version 8.1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

router# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 0.0.0.0/0 [0/100] via 10.0.2.2, eth0, src 10.0.2.15, 03:52:17
C>* 10.0.2.0/24 [0/100] is directly connected, eth0, 03:52:17
K>* 10.0.2.2/32 [0/100] is directly connected, eth0, 03:52:17
B>* 10.254.254.1/32 [20/0] via 172.16.1.11, eth1, weight 1, 03:32:16
  *                        via 172.16.2.21, eth2, weight 1, 03:32:16
C>* 172.16.1.0/24 is directly connected, eth1, 03:52:17
```
You can see that the next hops for the LoadBalancer IP are cluster node 1 and node 2 respectively. A connectivity test from the BGP router:
```
root@router:~# curl 10.254.254.1:18081
{"pod_name":"metallb-demo","pod_ip":"172.20.166.20","host_name":"worker1","client_ip":"172.20.161.0"}
```
FRR Mode¶
Currently, MetalLB's BGP mode has two backend implementations: native BGP and FRR BGP.

FRR BGP is still in the experimental stage. Compared with native BGP, it has the following advantages:

- `BFD` protocol support (improves failure detection and shortens failover time)
- IPv6 BGP support
- `ECMP` support
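How the FRR backend is enabled depends on how MetalLB was installed; as a sketch (the manifest path and Helm value are assumed from the v0.13.x layout, so verify against the release you actually run):

```
# manifest-based install of the FRR-backed speaker
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.12/config/manifests/metallb-frr.yaml

# or with the Helm chart
helm install metallb metallb/metallb -n metallb-system --set speaker.frr.enabled=true
```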