MLAG

  1. vPC
    1. vPC peer-link
    2. CFSoE
      1. ARP sync
      2. Type 1 consistency
      3. Type 2 consistency
    3. vPC back-to-back
    4. Peer switch
    5. Peer gateway
    6. Auto-recovery
    7. Orphan ports
    8. Fate sharing
    9. vPC routing

vPC

  • components
    • vPC peers:
      • 2 switches in domain
      • 1 vPC domain ID per switch/VDC:
        • unique in L2 domain
        • forms the 10 least significant bits of the vPC system MAC, which is used in the STP BID and LACP system ID
      • same type of Nexus switches
    • vPC peer-link
    • vPC keepalive:
      • L3 path, OOB
      • UDP 3200
    • orphan port
    • vPC port-channel: L2
      • peers send the same LACP ID
  • separate control plane: required for SAN A/B separation ⇒ VSS is not suitable
  • loop avoidance:
    • triggers:
      • packet is received via peer-link
      • packet is received via vPC port-channel (vPC egress on peer is up)
    • such packets are not forwarded out of vPC port-channels, only through orphan ports
  • election
    • primary
      • receive + generate BPDU
      • first to enable links in LACP port-channel
      • after recovering from a failure, becomes operational secondary if the former secondary has taken over as primary
    • secondary
      • if peer-link fails while keepalive is up, secondary suspends its vPC ports and the SVIs of vPC VLANs (vPC VLANs ≡ VLANs allowed on peer-link)
    • election order: vPC primary sticky bit set → priority → MAC
      • for priority and MAC: the lower, the better
      • sticky bit: no preemption even after peer-link or keepalive is down
        • 0: default, reset by change of priority
        • 1: after switching secondary → primary
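
The election order above can be sketched in Python. This is a minimal model of the rule as written in these notes, not of the NX-OS implementation. The `VpcPeer` class, its field names and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VpcPeer:
    name: str
    sticky: bool        # sticky bit: set after a secondary -> primary takeover
    role_priority: int  # lower wins
    system_mac: str     # lower wins, written as "aa:bb:cc:dd:ee:ff"


def election_key(peer: VpcPeer) -> tuple:
    # A set sticky bit must sort first, so it is negated (False < True).
    # Ties fall through to role priority, then to the MAC as an integer.
    mac = int(peer.system_mac.replace(":", ""), 16)
    return (not peer.sticky, peer.role_priority, mac)


def elect_primary(a: VpcPeer, b: VpcPeer) -> VpcPeer:
    """Return the peer that wins the primary role under the order above."""
    return min(a, b, key=election_key)


if __name__ == "__main__":
    a = VpcPeer("A", sticky=False, role_priority=100, system_mac="00:00:00:00:00:0a")
    b = VpcPeer("B", sticky=True, role_priority=200, system_mac="00:00:00:00:00:0b")
    # B wins despite the worse priority, because its sticky bit is set.
    print(elect_primary(a, b).name)
```

Comparing tuples keeps the three criteria in strict order, so priority is only consulted when the sticky bits are equal.
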
(config)# feature vpc

; ID is unique within L2 segment
(config)# vpc domain <ID>
(config-vpc-domain)# peer-keepalive destination <DST_IP> source <SRC_IP> vrf <VRF>
(config-vpc-domain)# role priority <ROLE_PRIORITY>

; LACP, must match on peers
(config-vpc-domain)# system-mac <MAC>
(config-vpc-domain)# system-priority <LACP_PRIORITY>
(config)# interface port-channel <M>

; N must match on peers
(config-if)# vpc <N>
# show vpc [brief]
# show vpc role
# show vpc peer-keepalive
# show vpc orphan-ports

vPC peer-link

  • 10GE at least
  • port-channel only
  • FabricPath encapsulation
  • traffic
    • CFSoE, BPDU, LACP, HSRP
    • user data – after failure
    • BUM
    • orphan traffic
    • no FCoE traffic
  • STP:
    • always STP forwarding
    • STP Bridge Assurance
    • on secondary – always one of root ports
  • timers
    • keepalive: 1s default
    • timeout: 5s default
    • hold timeout: 3s default, delay before considering peer to be down
; N9k, L3 SVI over peer-link only, routing if uplinks fail
(config)# system nve infra-vlans <VLAN>

; N9kv, L3 SVI over peer-link only, routing if uplinks fail
(config)# system vlan nve-overlay id <VLAN>

; N5k, L3 SVI over peer-link only, routing if uplinks fail
(config)# vpc nve peer-link-vlan <N>
(config)# interface port-channel <M>
(config-if)# vpc peer-link

CFSoE

  • data plane sync
    • CAM table
  • control plane sync
    • IGMP snooping status
    • ARP sync
  • vPC port status monitor: if all vPC links are down – notify peer that vPC is orphan
  • vPC config consistency check

ARP sync

  • together with FHRP
  • over CFSoE
  • FHRP Master replies to ARP
  • FHRP Slave may forward traffic because ARP entries match (active/active)
    • vMAC is marked in CAM with G flag ≡ gateway
    • secondary vPC processes MAC with G flag
    • uplink tracking is required to avoid blackholing because of A/A forwarding
(config-vpc-domain)# ip arp synchronize

Type 1 consistency

  • if check fails, secondary disables its own vPC ports
  • does not affect non-vPC ports
  • types:
    • global
    • interface-specific
  • features:
    • MST region inconsistency
    • MTU mismatch
    • STP setting mismatch: BA, LoopGuard, RootGuard
    • port-channel mode, trunk mode, native VLAN
    • VXLAN-VNI mapping
    • LACP system priority
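
A Type 1 check can be pictured as a parameter-by-parameter comparison between the peers. The Python sketch below only illustrates that idea. The parameter names are hypothetical labels for some of the features listed above, and the real check runs inside NX-OS over CFSoE.

```python
# Hypothetical labels for some of the Type 1 features listed above.
TYPE1_PARAMS = (
    "mst_region",
    "mtu",
    "stp_bridge_assurance",
    "port_channel_mode",
    "native_vlan",
    "lacp_system_priority",
)


def type1_mismatches(local: dict, peer: dict) -> list:
    """Return the Type 1 parameters whose values differ between the peers."""
    return [p for p in TYPE1_PARAMS if local.get(p) != peer.get(p)]


def secondary_vpc_ports_enabled(local: dict, peer: dict) -> bool:
    """Per the notes: any Type 1 mismatch makes the secondary disable its vPC ports."""
    return not type1_mismatches(local, peer)


if __name__ == "__main__":
    primary = {"mtu": 9216, "native_vlan": 1, "port_channel_mode": "active"}
    secondary = {"mtu": 1500, "native_vlan": 1, "port_channel_mode": "active"}
    print(type1_mismatches(primary, secondary))             # ['mtu']
    print(secondary_vpc_ports_enabled(primary, secondary))  # False
```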

Type 2 consistency

  • recommendations, do not affect vPC but may affect routing: failure ≡ potential blackhole
  • features:
    • VLANs: except FCoE
    • SVI: mismatching VLANs that are suspended
    • QoS mismatch

vPC back-to-back

  • avoid MAC flapping on upstream switch

Peer switch

  • default in vPC:
    • only primary sends BPDUs
    • secondary just forwards BPDUs over peer-link
  • primary and secondary with peer-switch:
    • both are root
    • both send BPDUs
    • vPC ports:
      • root BID = virtual bridge ID
      • designated BID = virtual bridge ID
    • non-vPC ports:
      • root BID = virtual bridge ID
      • designated BID = physical bridge ID
  • prevents RSTP Sync if role is changed: root BID does not change ⇒ STP priority must match on both peers
  • non-vPC peer forwarding can be tuned using designated BID, not only port priority
  • if STP root is not the vPC pair:
    • L2 uplinks: STP forwarding on the peer that is closest to the root by cost
    • if tie: primary
  • no need to sync roles of vPC primary and STP root
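
The BPDU fields with peer-switch enabled can be summarised as a small lookup. The Python sketch below assumes the vPC pair is the STP root, as in the list above. The function name and the BID strings are hypothetical placeholders.

```python
def peer_switch_bpdu(port_is_vpc: bool, virtual_bid: str, physical_bid: str) -> dict:
    """BIDs a peer advertises with peer-switch enabled (pair is STP root)."""
    return {
        # Both peers advertise the shared virtual bridge ID as root.
        "root_bid": virtual_bid,
        # On vPC ports the sender also appears as the virtual bridge;
        # on non-vPC ports each peer uses its own physical bridge ID.
        "designated_bid": virtual_bid if port_is_vpc else physical_bid,
    }


if __name__ == "__main__":
    print(peer_switch_bpdu(True, "virtual-bid", "physical-bid-A"))
    print(peer_switch_bpdu(False, "virtual-bid", "physical-bid-A"))
```

Because the root BID is the same whichever peer is sending, a vPC role change does not alter what downstream switches see as root.
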
(config-vpc-domain)# peer-switch

Peer gateway

  • needed if server uses BIA MAC instead of HSRP MAC
    • peer has to send frame to BIA over peer-link
    • if the frame is to be routed to another vPC – discard due to loop avoidance
    • servers: Dell EMC, NetApp NAS, F5 Auto-Lasthop
  • exchange BIA MAC, forward BIA MAC locally (G flag in CAM)
  • must be disabled for VLANs used for IGP peering
    • if a packet arrives at the wrong peer, that peer routes it (G flag) instead of bridging it over peer-link, which decrements IP TTL ⇒ TTL = 1 → TTL = 0 ⇒ discard
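
The TTL problem can be reproduced with a toy forwarding decision in Python. This is a simplified model of the reasoning above, not of the actual forwarding pipeline. The function name, the MAC strings and the `gateway_macs` set (MACs carrying the G flag) are hypothetical.

```python
def handle_frame(dst_mac: str, ttl: int, gateway_macs: set) -> tuple:
    """Return (action, ttl) for a frame received on a vPC member port."""
    if dst_mac not in gateway_macs:
        # No G flag: plain L2 forwarding, IP TTL untouched.
        return ("bridge", ttl)
    # G flag: the frame is routed locally, so TTL is decremented.
    ttl -= 1
    if ttl <= 0:
        return ("drop", 0)
    return ("route", ttl)


if __name__ == "__main__":
    own_bia, peer_bia = "bia-A", "bia-B"
    # Without peer-gateway only the switch's own BIA is a gateway MAC.
    print(handle_frame(peer_bia, 1, {own_bia}))            # ('bridge', 1)
    # With peer-gateway the peer's BIA is also a gateway MAC, so an IGP
    # packet sent with TTL = 1 to the other peer is dropped here.
    print(handle_frame(peer_bia, 1, {own_bia, peer_bia}))  # ('drop', 0)
```
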
(config-vpc-domain)# peer-gateway

Auto-recovery

  • by default vPC port-channels stay suspended until peer-link is restored, after a reboot or after a peer-link failure with keepalive up
    • if one peer does not boot, then no vPC is reactivated
    • if primary fails after peer-link, secondary does not re-enable vPC
  • auto-recovery lets secondary assume primary role if both peer-link and keepalive are down ⇒ re-enables vPC
  • reload-delay:
    • if the peer is still unavailable when the timeout expires after a reboot ⇒ assume primary and re-enable vPC
    • 240s default
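
The behaviour of the secondary described above can be written as a small state function. The Python sketch below follows these notes only. It ignores all timers and keepalive miss counters, and the function and argument names are hypothetical.

```python
def secondary_vpc_suspended(
    was_suspended: bool,
    peer_link_up: bool,
    keepalive_up: bool,
    auto_recovery: bool,
) -> bool:
    """Return True if the secondary keeps its vPC port-channels suspended."""
    if peer_link_up:
        return False  # normal operation
    if keepalive_up:
        return True   # peer is alive, secondary suspends vPC ports and SVIs
    if auto_recovery:
        return False  # both links down: assume primary, re-enable vPC
    return was_suspended  # without auto-recovery nothing changes


if __name__ == "__main__":
    # Peer-link fails first, keepalive still up: the secondary suspends vPC.
    state = secondary_vpc_suspended(False, False, True, auto_recovery=False)
    # Then the primary dies (keepalive down): vPC stays suspended...
    print(secondary_vpc_suspended(state, False, False, auto_recovery=False))  # True
    # ...unless auto-recovery is configured.
    print(secondary_vpc_suspended(state, False, False, auto_recovery=True))   # False
```
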
(config-vpc-domain)# auto-recovery [reload-delay <sec>]

Orphan ports

  • if port is in vPC VLAN and peer-link is down, no connectivity with default GW on secondary (SVI down)
  • devices:
    • usually IPMI/ILO ports
      • better: connect through switch, dual-homed to vPC
    • active/standby: ADC
      • non-vPC VLAN or SVI must be up after failure, otherwise secondary shuts down SVI with ADC active
; SVI stays up if peer link fails, but keepalive remains
(config-vpc-domain)# dual-active exclude interface-vlan <N>
; shuts down interface if peer-link fails, A/S failover
(config-if)# vpc orphan-port suspend

Fate sharing

  • if uplink and peer-link are on the same LC on primary, peer-link failure ≡ L3 connectivity with WAN is lost (SVI on secondary – down)
  • vPC self isolation:
    • alternative to object tracking
    • primary notifies secondary via keepalive, that it is isolated
    • isolated: peer-link down && no IGP routes
(config)# track <N> interface <INTF> line-protocol

(config)# track <M> list boolean OR|AND
(config-track)# object <N>
; if fails, primary → secondary
(config-vpc-domain)# track <M>

(config-vpc-domain)# self-isolation

vPC routing