- ASN
- BGP
- MP-BGP
- Path attributes
- BGP best path selection
- Regexp
- Authentication
- QoS policy propagation via BGP (QPPB)
- Aggregation
- Redistribution
- Confederation
- Route reflector
- Prefix independent convergence (PIC)
- Graceful restart
- BGP wedgie
ASN
- reserved: 0, 65535, 4294967295
- IANA-assigned: 1 – 64495
- documentation: 64496 – 64511, 65536 – 65551
- private
- 64512 – 65534
- 4200000000 – 4294967294
- 23456 (AS_TRANS): stands in for a 4-byte ASN towards routers that do not support 4-byte ASNs
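; outbound to eBGP peer only: removes private ASNs from AS_PATH
; all replace-as: replaces private ASNs with own ASN instead of removing them
(config-router)# neighbor <IP> remove-private-as [all [replace-as]]
; display 4-byte ASNs in asdot notation
(config-router)# bgp asnotation dot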
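A minimal Python sketch of the ASN ranges and of the asplain/asdot conversion shown by `bgp asnotation dot`; the function names are hypothetical and the classifier only knows the ranges listed above.

```python
def classify_asn(asn: int) -> str:
    """Classify an ASN using the ranges listed above (not exhaustive)."""
    if not 0 <= asn <= 2**32 - 1:
        raise ValueError("ASN must fit in 32 bits")
    if asn in (0, 65535, 2**32 - 1):
        return "reserved"
    if asn == 23456:
        return "AS_TRANS"
    if 64496 <= asn <= 64511 or 65536 <= asn <= 65551:
        return "documentation"
    if 64512 <= asn <= 65534 or 4_200_000_000 <= asn <= 4_294_967_294:
        return "private"
    return "other"


def to_asdot(asn: int) -> str:
    """asplain -> asdot: ASNs above 65535 become <high 16 bits>.<low 16 bits>."""
    return str(asn) if asn < 65536 else f"{asn >> 16}.{asn & 0xFFFF}"


def from_asdot(text: str) -> int:
    """asdot -> asplain."""
    if "." not in text:
        return int(text)
    high, low = (int(part) for part in text.split("."))
    return (high << 16) | low
```

2-byte ASNs look the same in both notations; only 4-byte ASNs get the dotted form.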
BGP
- neighbours are configured manually
- TCP 179, CS6
- attributes in lieu of metric ⇒ path-vector protocol
- aimed at scalability, policy implementation
- single best route from BGP RIB is announced
- AD
- eBGP = 20
- iBGP = 200
- the lower a route appears in BGP RIB, the older it is
(config-router)# distance bgp <EBGP> <IBGP> <LOCAL_BGP>
; convert configuration to address-family mode
(config-router)# bgp upgrade-cli
RID
- must be unique
- selection
- manual config
- loopback
- IOS: highest IP
- NX-OS: loopback0
- highest IP from interface in up/up
; disabled by default, compares RID instead of selecting oldest eBGP prefix
(config-router)# bgp bestpath compare-routerid
Messages
- TTL
- 1 for eBGP
- 255 for iBGP
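- maximum size – 4096 bytes (up to 65535 with extended messages, RFC 8654) ⇒ limitation for SR policy
- implicit withdraw: a new Update for an already known prefix (e.g. with a different next-hop) replaces the previous path (by default only a single best path is advertised)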
- messages
- OPEN
- establish peering
- exchange capabilities, extensions
- UPDATE
- exchange routes
- single UPDATE for one set of attributes
- several UPDATE in a single TCP segment
- resets hold timer ⇒ alternative to KEEPALIVE
- NOTIFICATION
- error notification
- KEEPALIVE
- keep peering open
- acknowledge parameters received in OPEN
; TTL of eBGP messages (1 by default) for peers that are not directly connected
(config-router)# neighbor <IP> ebgp-multihop <TTL>
Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|_                                                             _|
|_                           Marker                            _|
|_                 (deprecated, all binary 1)                  _|
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |     Type      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Length – bytes, including header
Type:
- 1: Open
- 2: Update
- 3: Notification
- 4: Keepalive (no payload)
- 5: Route refresh
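A short Python sketch of the common header: 16-byte all-ones marker, 2-byte length that counts the header itself, 1-byte type. `build_message` and `parse_header` are hypothetical helpers, not part of any router software.

```python
import struct

MARKER = b"\xff" * 16          # all-ones marker
HEADER_LEN = 19                # 16-byte marker + 2-byte length + 1-byte type
MAX_LEN = 4096                 # limit without extended messages
TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE", 5: "ROUTE-REFRESH"}


def build_message(msg_type: int, payload: bytes = b"") -> bytes:
    """Prepend the common header; Length includes the header itself."""
    length = HEADER_LEN + len(payload)
    if length > MAX_LEN:
        raise ValueError("message exceeds 4096 bytes")
    return MARKER + struct.pack("!HB", length, msg_type) + payload


def parse_header(data: bytes):
    """Return (type name, total length) of the message at the start of data."""
    if len(data) < HEADER_LEN or data[:16] != MARKER:
        raise ValueError("not a BGP header")
    length, msg_type = struct.unpack("!HB", data[16:HEADER_LEN])
    return TYPES.get(msg_type, f"unknown({msg_type})"), length


keepalive = build_message(4)   # KEEPALIVE has no payload: 19 bytes in total
```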
Open
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+
|    Version    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     My autonomous system      |           Hold time           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            BGP RID                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Param length  |     Optional parameters (variable length)     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Optional parameters – TLVs (type, length – 1 byte)
- 0x02 ≡ capability
- 0x01: MP-BGP
- 0x03: ORF
- 0x40: graceful restart
- 0x45: addpath
- 0x46: enhanced route refresh
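A minimal Python sketch of how capabilities nest inside Open optional parameters: parameter type 0x02 carries a capability TLV, and the MP-BGP capability value is AFI (2 bytes), reserved (1 byte), SAFI (1 byte). Helper names are hypothetical.

```python
import struct


def capability(code: int, value: bytes) -> bytes:
    """Capability TLV: 1-byte code, 1-byte length, value."""
    return struct.pack("!BB", code, len(value)) + value


def optional_parameter(ptype: int, value: bytes) -> bytes:
    """Open optional parameter TLV: 1-byte type, 1-byte length, value."""
    return struct.pack("!BB", ptype, len(value)) + value


def mp_bgp_capability(afi: int, safi: int) -> bytes:
    # value = AFI (2 bytes) + reserved (1 byte) + SAFI (1 byte)
    return capability(0x01, struct.pack("!HBB", afi, 0, safi))


# IPv4 unicast (AFI 1, SAFI 1) plus enhanced route refresh (0x46, zero-length value)
params = (optional_parameter(0x02, mp_bgp_capability(1, 1))
          + optional_parameter(0x02, capability(0x46, b"")))
```

The concatenated `params` would go into the Optional parameters field, with Param length = `len(params)`.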
Update
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Withdrawn routes length    | Withdrawn routes (var length) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Path attributes length     | Path attributes (var length)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\                                                               \
/                             NLRI                              /
\                                                               \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Withdrawn routes length = 0 ⇒ Withdrawn routes field is not included
Path attributes length = 0 ⇒ neither path attributes nor NLRI are included
Withdrawn routes/NLRI:
- length in bits (1 byte) + prefix (variable length, only the bytes needed for the prefix length)
- length = 0 ⇒ prefix that matches all addresses (0.0.0.0/0)
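A Python sketch of the length + prefix encoding above, using the standard `ipaddress` module; only ceil(length / 8) address bytes are sent. `encode_prefix` and `decode_prefixes` are hypothetical names.

```python
import ipaddress


def encode_prefix(prefix: str) -> bytes:
    """1-byte prefix length (bits) followed by the significant address bytes."""
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]


def decode_prefixes(data: bytes, version: int = 4) -> list:
    """Decode a run of (length, prefix) entries, e.g. a Withdrawn routes field."""
    addr_len = 4 if version == 4 else 16
    out, i = [], 0
    while i < len(data):
        plen = data[i]
        nbytes = (plen + 7) // 8
        # pad the truncated prefix back to a full address with zero bytes
        raw = data[i + 1 : i + 1 + nbytes] + bytes(addr_len - nbytes)
        addr = ipaddress.IPv4Address(raw) if version == 4 else ipaddress.IPv6Address(raw)
        out.append(ipaddress.ip_network(f"{addr}/{plen}"))
        i += 1 + nbytes
    return out
```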
Notification
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Code      |    Subcode    |       Data (var length)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Code:
- code 1: Message header error
- 1: connection not synchronized
- 2: bad message length
- 3: bad message type
- code 2: Open message error
- 1: unsupported version
- 2: bad peer AS
- 3: bad BGP RID
- 4: unsupported optional parameter
- 5: authentication failure (deprecated)
- 6: unacceptable hold time
- 7: unsupported capability (≠ unknown capability)
- code 3: Update message error
- 1: malformed attribute list
- 2: unrecognized well-known attribute
- 3: missing well-known attribute
- 4: attribute flags error
- 5: attribute length error
- 6: invalid ORIGIN
- 7: AS routing loop (deprecated)
- 8: invalid NEXT_HOP
- 9: optional attribute error
- 10: invalid network field
- 11: malformed AS_PATH
- code 4: Hold timer expired
- code 5: FSM error
- code 6: Cease (alternative to TCP FIN)
Route refresh
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              AFI              |    Subtype    |     SAFI      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Subtype:
- 0: normal
- 1: BoRR (beginning of route refresh, SOR)
- 2: EoRR (end of route refresh, EOR)
- 255: reserved
; use soft reconfiguration if route refresh is not supported by peer
(config-router)# bgp soft-reconfig-backup
; stores raw Updates from peer in memory before inbound policy is applied
(config-router)# neighbor <IP> soft-reconfiguration inbound
; use BGP route refresh to resend Update
# clear ip bgp * in
; use route refresh, if possible
# clear ip bgp * soft
Enhanced route refresh
- capability 0x46 (zero length)
- removes stale routes that are not refreshed by Route refresh ≡ consistency check
- sends BoRR (SOR) before prefixes and EoRR (EOR) after prefixes
; 0 ≡ disabled (default): max time to generate EoRR, can be useful against flaps
(config-router)# bgp refresh max-eor-time <sec>
; 0 ≡ disabled (default), removes stale routes without receiving EOR on expiration
(config-router)# bgp refresh stalepath-time <sec>
Peering
- must differ: RID
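- must match: configured remote-as; peer hold time ≥ locally configured minimum hold time
- successful authentication (if configured)
- directly connected for eBGP (unless multihop is configured)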
- peer is reachable via non-default route (at least for one of the peers)
- connection collision:
- 2 sessions are initially established, only one remains
- the session initiated by the router with the higher RID is kept ⇒ lower RID acts as BGP server (TCP 179)
; enabled by default
(config-router)# bgp transport path-mtu-discovery
(config-router)# neighbor <IP> remote-as <ASN>
; disables the directly connected check for single-hop eBGP (e.g. peering on loopbacks); TTL = 1 is still used
(config-router)# neighbor <IP> disable-connected-check
; TTL is not decremented for packets destined to the router itself
(config-router)# neighbor <IP> update-source <INTF>
; passive waits for TCP, does not send SYN itself – for passing strict ACLs
(config-router)# neighbor <IP> transport connection-mode active|passive
; enabled by default
(config-router)# neighbor <IP> transport path-mtu-discovery [disable]
; uses local ASN instead of BGP process ASN for peering
; no-prepend: do not prepend local ASN to AS_PATH of Updates received from peer
; replace-as: replace real ASN with local ASN for egress eBGP Update
; dual-as: peering with local ASN and real ASN
(config-router)# neighbor <IP> local-as <ASN> [no-prepend] [replace-as] [dual-as]
; replace all occurrences of peer ASN in AS_PATH with own ASN
(config-router)# neighbor <IP> as-override
; sends messages with TTL = 255, eBGP only
; accepts only packets with TTL ≥ 255 − N, others are silently discarded (no ICMP error)
; N = 1 ≡ directly connected peer (TTL ≥ 254 accepted)
(config-router)# neighbor <IP> ttl-security hops <N>
; tear down peering as soon as the route to the peer is lost, or when the BFD session goes down (bfd)
(config-router)# neighbor <IP> fall-over [bfd]
; checks the route used to reach the peer (not the address itself); if MAP denies the route – reset session
(config-router)# neighbor <IP> fall-over route-map <MAP>
; tear down session with directly connected eBGP peer if link is down
(config-if)# ip bgp fast-external-fallover permit|deny
; is peer capable of route refresh?
# show ip bgp neighbor <IP>
; requires soft-reconfiguration inbound: Updates from peer before inbound policy
# show ip bgp neighbor <IP> received-routes
; routes from peer accepted by inbound policy
# show ip bgp neighbor <IP> routes
; Update after filtering
# show ip bgp neighbor <IP> advertised-routes
# show ip bgp neighbor <IP> policy [detail]
; soft out ≡ out
# clear ip bgp <IP> soft out
; hard reset
# clear ip bgp <IP>|*
; processes Updates stored by soft-reconfiguration inbound (deprecated approach)
# clear ip bgp * soft in
BGP update-group
- single Update per group instead of per peer ⇒ saves CPU cycles
- determined automatically
# show ip bgp update-group
BGP peer-group
- forms update group ⇒ does not permit per-peer filters (templates avoid this limitation)
(config-router)# neighbor <PEER_GROUP> peer-group
(config-router)# neighbor <IP> peer-group <PEER_GROUP>
Templates
- replaces peer-group
- types
- session
- inherits from direct parent only
- up to 8 templates in the chain
- policy
- inherits up to 7 templates (directly and indirectly)
- up to 8 templates in the chain
- template with larger sequence number overrides template with lower sequence number if there is a conflict
(config-router)# template peer-session <SESSION_TMPL>
(config-router-stmp)# inherit peer-session <PARENT_SESSION>
(config-router)# template peer-policy <POLICY_TMPL>
(config-router-ptmp)# inherit peer-policy <PARENT_POLICY> <SEQ>
State
- IDLE
- does not wait for TCP
- CONNECT
- waits for TCP, passive neighbour
- TCP SYN+ACK is sent, waiting for ACK
- ACTIVE
- TCP session initiated, active neighbour
- up to 16 retries
- TCP SYN is sent, waiting for SYN+ACK
- TCP RST is received
- OPENSEND
- TCP is established and OPEN is sent
- OPENCONFIRM
- OPEN is received, waiting for KEEPALIVE
- ESTABLISHED
- peering established
Timers
- if timers do not match, lower value is used
- hold timer
- 180s by default
- announced in OPEN
- cannot be less than 3s (except 0)
- on expiration peer is dead
- keepalive
- not announced but calculated
- hold = 0 ⇒ no keepalive
- selection
- negotiated hold = local hold: use local keepalive
- negotiated hold ≠ local hold, local keepalive < ⅓ negotiated hold: use local keepalive
- otherwise: floor(⅓ negotiated hold)
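A Python sketch of the timer selection rules above, assuming the rules compare the local keepalive against one third of the negotiated (lower) hold time; actual IOS behaviour may differ in edge cases, and `negotiate_timers` is a hypothetical name.

```python
def negotiate_timers(local_keepalive: int, local_hold: int, peer_hold: int):
    """Return (keepalive, hold) in seconds following the selection rules above."""
    for hold in (local_hold, peer_hold):
        if hold != 0 and hold < 3:
            raise ValueError("hold time must be 0 or at least 3 s")
    hold = min(local_hold, peer_hold)      # lower hold value wins
    if hold == 0:
        return 0, 0                        # hold 0: no keepalives, no hold timer
    if hold == local_hold:
        return local_keepalive, hold       # own hold won: keep own keepalive
    if local_keepalive < hold / 3:
        return local_keepalive, hold       # own keepalive already fast enough
    return hold // 3, hold                 # floor(1/3 of negotiated hold)
```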
- ConnectRetry timer
- 120s, constant
- interval between attempts to establish session
- doubled with every tick
- Minimum route advertisement interval (MRAI)
- defaults:
- eBGP: 30s
- iBGP: 0s
- eBGP VRF: 0s
- always 0s for NX-OS
; negotiated in Open ⇒ requires hard reset to take effect
(config-router)# timers bgp <KEEP> <HOLD> [<MIN_NEIGHBOUR_HOLD>]
; MRAI
(config-router)# neighbor <IP> advertisement-interval <sec>
(config-router)# neighbor <IP> timers <KEEP> <HOLD> [<MIN_NEIGHBOUR_HOLD>]
Prefix announce
- best path only
- iBGP
- split horizon: routes, received via iBGP, are not sent via iBGP ⇒ iBGP full-mesh within AS
- solutions: route reflector, confederation
- eBGP:
- does not announce prefixes transiting AS{n} (last in AS_PATH) towards AS{n} (NX-OS)
- prefix and peer must be reachable via same interface (disable for peering on loopbacks)
- triggered update only
- iBGP: 5s
- eBGP: 30s
- loop prevention:
- protection from external loop through own AS
- redistributing BGP → IGP is not recommended: IGPs are not designed to carry that many prefixes
; eBGP prefix gets the local BGP AD (200 by default) so that the IGP route is preferred
(config-router)# network <IP> mask <MASK> backdoor
# show ip bgp <PREFIX>
Uplink types
- single onehomed
- dual onehomed
- single multihomed
- dual multihomed
Prevent transit
- AS list filter
- no-export community
- prefix-list
- distribute-list
Filter preference
- ingress:
- filter-list
- route-map
- prefix-list/distribute-list
- egress:
- filter-list
- route-map
- unsuppress-map: sets attributes for unsuppressed prefixes
- advertise-map
- prefix-list/distribute-list
- prefix-list and distribute-list are mutually exclusive
- applying a non-existent ACL/prefix-list ≡ permit all
- extended ACL as BGP filter: source field matches prefix, destination field matches mask
Route-map
(config)# ip route-tag list <TAG_LIST>
; alternative to multiple match in route-map
(config)# ip policy-list <PLIST> permit|deny
(config-policy-list)# match ...
(config-route-map)# match tag list <TAG_LIST>
; BGP only
(config-route-map)# match policy-list <PLIST>
(config-router)# neighbor <IP> route-map <MAP> in
Keyword continue
- BGP filtering only
- if matched by entry with continue, implicit deny is not applied
Route flap dampening
- RFC 2439
- if a prefix flaps or changes frequently, it is excluded from Updates and best path selection
- reset not considered a flap
- not recommended: may dampen prefixes, reachable via different AS
- eBGP routes only
(config-router)# bgp dampening [<HALF_LIFE> <REUSE> <SUPPRESS> <MAX_SUPPRESS>]
; routes with penalty
# show ip bgp flap-statistics
; dampened routes
# show ip bgp dampened-paths
; information about penalty
# show ip bgp <PREFIX>
; removes penalties
# clear ip bgp flap-statistics
; removes penalties from dampened
# clear ip bgp dampening
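A Python sketch of the exponential penalty decay behind `bgp dampening`, assuming the common IOS defaults (half-life 15 min, reuse 750, suppress 2000, max-suppress 60 min) and 1000 per flap; real implementations decay in discrete ticks, this continuous model and its function names are illustrative only.

```python
import math

HALF_LIFE = 15.0      # minutes
REUSE = 750
SUPPRESS = 2000
MAX_SUPPRESS = 60.0   # minutes
FLAP_PENALTY = 1000
# ceiling: a route at the ceiling decays to REUSE in exactly MAX_SUPPRESS minutes
MAX_PENALTY = REUSE * 2 ** (MAX_SUPPRESS / HALF_LIFE)


def penalty_at(flap_times, now):
    """Penalty at time `now` (minutes) after flaps at `flap_times` (minutes)."""
    penalty, last = 0.0, None
    for t in sorted(flap_times):
        if last is not None:
            penalty *= 2 ** (-(t - last) / HALF_LIFE)   # decay since previous flap
        penalty = min(penalty + FLAP_PENALTY, MAX_PENALTY)
        last = t
    if last is not None:
        penalty *= 2 ** (-(now - last) / HALF_LIFE)
    return penalty


def is_suppressed(penalty):
    return penalty > SUPPRESS


def minutes_until_reuse(penalty):
    """Time for the penalty to decay to the reuse limit, capped by max-suppress."""
    if penalty <= REUSE:
        return 0.0
    return min(HALF_LIFE * math.log2(penalty / REUSE), MAX_SUPPRESS)
```

Three quick flaps give 3000, which is above suppress and needs two half-lives (30 min) to drop to 750.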
Outbound route filter (ORF)
- RFC 5291
- sends inbound filters to peer that applies them outbound
- capability is negotiated in OPEN
- filters are sent in Route Refresh: AFI, SAFI, action, when to refresh
- limitation
- IPv4/v6 unicast
- prefix-list only
- eBGP only
(config-router)# neighbor <IP> capability orf prefix-list both|send|receive
; received ORF
# show ip bgp neighbor <IP> received prefix-filter
; refresh ORF on neighbour
# clear ip bgp <IP> in prefix-filter
ORF capability (0x03)
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Capabil Code  | Capab length  |              AFI              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Reserved    |     SAFI      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Route refresh with ORF
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              AFI              |   Reserved    |     SAFI      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Refresh    |   ORF type    |          ORF length           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| A |M|Reserved |            Value (variable length)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| A |M|Reserved |            Value (variable length)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Refresh:
- 0x01 ≡ immediate
- 0x02 ≡ defer (ORF entries are processed, but routes are re-advertised only after a later Route refresh with immediate)
A: action
- 0 ≡ add
- 1 ≡ remove
- 2 ≡ remove-all
M: match
- 0 ≡ permit
- 1 ≡ deny
ORF type:
- 128 ≡ Cisco prefix list
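A Python sketch of the bit packing above: the first byte of each ORF entry holds A (2 bits), M (1 bit) and 5 reserved bits, and the Route refresh payload carries AFI, reserved, SAFI, refresh, ORF type and ORF length before the entries. The type-specific part of a Cisco prefix-list entry is not modelled; all names are hypothetical.

```python
import struct

ACTIONS = {"add": 0, "remove": 1, "remove-all": 2}
MATCHES = {"permit": 0, "deny": 1}
WHEN = {"immediate": 0x01, "defer": 0x02}


def orf_entry_flags(action: str, match: str) -> int:
    """First byte of an ORF entry: A (2 bits) | M (1 bit) | reserved (5 zero bits)."""
    return (ACTIONS[action] << 6) | (MATCHES[match] << 5)


def parse_orf_entry_flags(byte: int):
    action = next(k for k, v in ACTIONS.items() if v == byte >> 6)
    match = next(k for k, v in MATCHES.items() if v == (byte >> 5) & 1)
    return action, match


def route_refresh_orf(afi: int, safi: int, when: str, orf_type: int, entries: bytes) -> bytes:
    """AFI (2), reserved (1), SAFI (1), refresh (1), ORF type (1), ORF length (2), entries."""
    return struct.pack("!HBBBBH", afi, 0, safi, WHEN[when], orf_type, len(entries)) + entries
```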
BGP sync
- do not consider iBGP route to be “best” unless same prefix is found in RIB from IGP or static
- fixes routing loop or blackhole on BGP → IGP redistribution
- deprecated, not available in NX-OS
- symptoms:
- valid route (marked by *) is not used
- show ip bgp <PREFIX> has not synchronized status
- OSPF RID of the IGP route must match BGP peer RID, otherwise not synchronized
(config-router)# synchronization
Conditional route injection
- injects more precise prefixes into BGP RIB on receiving aggregate via BGP (not local)
- traffic engineering
; without copy-attributes: origin = incomplete, null AS_PATH
(config-router)# bgp inject-map <INJECT> exist-map <EXIST> [copy-attributes]
(config)# ip prefix-list <AGG> permit <PREFIX>
; only /32 towards route source, others ignored
(config)# ip prefix-list <SRC> permit <PREFIX>
; if not part of subnet AGG – ignore
(config)# ip prefix-list <INJ> permit <PREFIX>
(config)# route-map <INJECT>
(config-route-map)# set ip address prefix-list <INJ>
(config)# route-map <EXIST>
(config-route-map)# match ip address prefix-list <AGG>
; mandatory
(config-route-map)# match ip route-source prefix-list <SRC>
Topology codes
- *: valid (Update passed validity check, e.g., nexthop is present)
- >: best (placed in RIB, announced)
- r: RIB failure (not placed in RIB)
- i: internal (received from iBGP peer)
; maximum prefix count, if exceeded – tear peering down
; THRESHOLD: 75% default, when to generate warning
; warning-only: do not tear peering down on exceeding, only alert
; restart: try to re-establish peering after <mins> minutes
(config-router)# neighbor <IP> maximum-prefix <NUM> [<THRESHOLD>] [warning-only] [restart <mins>]
; disabled by default: do not advertise prefixes in RIB-failure state
(config-router)# bgp suppress-inactive
# show ip bgp rib-failure
Processes
- I/O
- BGP queue interaction with TCP
- Router
- process Update (including filtering)
- best path selection
- RIB modification
- run triggers:
- 1s regular interval
- peer established/removed/soft-reconfigured
- medium priority
- Scanner
- tracks RIB changes and adds prefixes from RIB to BGP
- route dampening
- conditional route advertisement
- run every 60s by default (timer between runs, does not include processing time)
- low priority
- Nexthop tracker
- lightweight scanner
- tracks nexthop changes instead of Scanner
- event-driven (by RIB change)
- hook on list of nexthops
- IP RIB Updates call the hook
- 5s delay between RIB change and hook: allow IGP to converge first
- Event
- triggered by network or redistribute
- Import scanner
- adds prefixes to VPNv4/v6 RIB
- interval between processings – 15s by default
- Open
- per peer
- (re)establishes BGP session
; Scanner interval (nexthop resolution using RIB)
(config-router)# bgp scan-time <sec>
; delay between RIB change and NHT-triggered BGP processing
(config-router)# bgp nexthop trigger delay <sec>
; enabled by default, NHT
(config-router)# bgp nexthop trigger enable
; permits only prefixes whose nexthops are reachable via routes permitted by MAP
; if the route to the nexthop is denied by MAP, the BGP prefix is inaccessible
(config-router)# bgp nexthop route-map <MAP>
; RIB updates
# debug ip routing
; BGP messages without Update contents
# debug ip bgp
; BGP messages including Update contents
# debug ip bgp updates
; Nexthop tracker debug
# debug ip bgp rib-filter
MP-BGP
- RFC 4760
- several AF in a single Update
- AFI – address-family identifier
- 1 ≡ IPv4
- 2 ≡ IPv6
- 25 ≡ L2VPN (VPLS, EVPN)
- SAFI – subsequent AFI
- 1 ≡ unicast
- 2 ≡ mcast
- 3 ≡ unicast + mcast
- 4 ≡ MPLS label
- 65 ≡ VPLS Kompella mode
- 66 ≡ MDT
- 70 ≡ EVPN
- 128 ≡ MPLS L3VPN
- 129 ≡ mcast VPN
- 132 ≡ rtfilter
- 133 ≡ flowspec
- 134 ≡ L3VPN flowspec
- MP_REACH_NLRI, MP_UNREACH_NLRI: optional non-transitive
- negotiated in Open via capabilities; if the peer does not support it → session is terminated and re-established with Open without MP-BGP
MP-BGP capability (0x01)
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Capabil Code  |Capab length(4)|              AFI              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Reserved    |     SAFI      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
IPv4 over IPv6
- nexthop – first 4 bytes of IPv6 nexthop
IPv6 over IPv4
- nexthop – IPv4-mapped IPv6 address (::ffff:<IPv4>)
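A Python sketch of the IPv4-mapped IPv6 next-hop (80 zero bits, 16 one bits, then the IPv4 address) using the standard `ipaddress` module; the helper names are hypothetical.

```python
import ipaddress


def ipv4_mapped_nexthop(ipv4: str) -> ipaddress.IPv6Address:
    """Build ::ffff:a.b.c.d from an IPv4 peering address."""
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(bytes(10) + b"\xff\xff" + v4.packed)


def ipv4_from_nexthop(nexthop: str):
    """Return the embedded IPv4 address, or None if the next-hop is not IPv4-mapped."""
    return ipaddress.IPv6Address(nexthop).ipv4_mapped
```

Such a next-hop is not a usable IPv6 address on the link, which is why the notes below rewrite it with a real IPv6 next-hop via route-map.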
; disable auto-selection of next-hop (IPv4-mapped IPv6 for IPv4-only interface)
; peering interface must have global IPv6 address to accept IPv6 nexthop as valid
(config-router)# no bgp default ipv6-nexthop
(config-router)# neighbor <IPv4> remote-as <ASN>
(config-router)# address-family ipv6
(config-router-af)# network <NETWORK6>
(config-router-af)# neighbor <IPv4> activate
; rewrite IPv4 next-hop with IPv6 next-hop
(config-router-af)# neighbor <IPv4> route-map <RMAP> out
(config)# route-map <RMAP>
(config-route-map)# set ipv6 next-hop <IPv6>
IPv6 native
(config-router)# neighbor <IPv6> remote-as <ASN>
(config-router)# address-family ipv6
(config-router-af)# network <NETWORK6>
(config-router-af)# neighbor <IPv6> activate
Multicast BGP
- mRIB takes precedence over uRIB for RPF
- if the BGP nexthop is reachable via an interface without PIM, the prefix is inaccessible
(config-router)# address-family ipv4 multicast
(config-router-af)# neighbor <IP> activate
Flowspec
- BGP SAFI
- passes flow information to apply policy (class-map + policy-map) for DDoS mitigation (more granular than RTBH)
- applied after QoS
- no support for mcast, MPLS
- match – NLRI component type
- 1: IPv4/v6 dst prefix
- 2: IPv4/v6 src prefix
- 3: IPv4 protocol / IPv6 next header
- 4: TCP/UDP port (src or dst)
- 5: TCP/UDP dst port
- 6: TCP/UDP src port
- 7: ICMP type
- 8: ICMP code
- 9: TCP flags
- 10: packet length
- 11: DSCP
- 12: fragment
- action – extended community
- 0x0800: IP nexthop redirect
- 0x8006: drop (rate = 0) or police
- 0x8008: VRF redirect using RT
- 0x8009: mark DSCP
- validation
- IPv4/v6 only, VPN is not verified
- rejects redirect IP community, if eBGP ASN ≠ last AS in AS_PATH
- conditions (logical OR)
- flow originator ≡ originator of best route to destination
- AS_PATH empty (no AS_SET/AS_SEQUENCE)
Flowspec server (IOS XR)
(config)# class-map type traffic match-all <CMAP>
(config)# policy-map type pbr <PMAP>
(config-pmap)# class type traffic <CMAP>
(config)# flowspec
(config-flowspec)# address-family ipv4
(config-flowspec-af)# service-policy type pbr <PMAP>
Flowspec client (IOS XE)
(config)# flowspec
(config-flowspec)# address-family ipv4
(config-flowspec-af)# local-install interface-all
(config-if)# ip flowspec disable
Route target constraint (RTC)
- RFC 4684
- AFI = 1, SAFI = 132 (AF rtfilter)
- negotiated via capabilities (MP-BGP support), requires support on both peers
- PE announces RT in use
- RR filters out prefixes, unused by PE, based on RT (instead of being dropped by PE on receiving)
(config-router)# address-family rtfilter unicast
(config-router-af)# neighbor <IP> activate
(config-router-af)# neighbor <IP> send-community extended
Route-target membership MP_(UN)REACH_NLRI prefix
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Origin ASN                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|_                             RT                              _|
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
If prefix length in NLRI is zero ≡ accept all RT (default RT)
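A Python sketch of the RR-side filtering that RTC enables: a VPN route is sent to a PE only if one of its RTs is in the PE's announced RT membership, and a zero-length membership NLRI (modelled here by the hypothetical constant `DEFAULT_RT`) lets everything through. Class and function names are illustrative.

```python
from dataclasses import dataclass

DEFAULT_RT = "default"  # models an RT membership NLRI with prefix length 0


@dataclass(frozen=True)
class VpnRoute:
    rd: str
    prefix: str
    route_targets: frozenset


def routes_for_pe(routes, pe_memberships):
    """Routes the RR would advertise to a PE, given the RTs that PE announced."""
    if DEFAULT_RT in pe_memberships:
        return list(routes)
    return [r for r in routes if r.route_targets & pe_memberships]
```

Without RTC the RR would send every route and the PE would drop the unused ones on receipt.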
Path attributes
- well-known mandatory
- well-known discretionary
- optional transitive: has to be forwarded further
- optional non-transitive: not forwarded further if not recognized
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|O|T|P|E| Rsrvd |   Type code   | Attribute length (var length) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\                                                               \
/                        Attribute data                         /
\                                                               \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
O: optional bit, 0 ≡ well-known
T: transitive bit, 0 ≡ non-transitive
P: partial bit, 0 ≡ optional transitive attribute is complete
E: extended length bit, 0 ≡ attribute length of 1 byte, 1 ≡ attribute length of 2 bytes
Type code:
- 1: ORIGIN
- 2: AS_PATH
- segment types
- 1 ≡ AS_SET
- 2 ≡ AS_SEQUENCE
- 3 ≡ AS_CONFED_SET
- 4 ≡ AS_CONFED_SEQUENCE
- 3: Nexthop
- 4: MED (4 bytes)
- 5: Local preference (4 bytes)
- 6: Atomic aggregate
- 7: Aggregator
- 8: community
- 9: originator ID
- 10: cluster list
- 14: MP_REACH_NLRI
- 15: MP_UNREACH_NLRI
- 16: extended community
- 17: AS4_PATH: 4 byte ASN
- 18: AS4_AGGREGATOR
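A Python sketch that walks a Path attributes field using the flag bits (O = 0x80, T = 0x40, P = 0x20, E = 0x10) and type codes above; E selects a 1- or 2-byte length. `parse_attributes` is a hypothetical helper.

```python
import struct

ATTR_NAMES = {1: "ORIGIN", 2: "AS_PATH", 3: "NEXT_HOP", 4: "MED", 5: "LOCAL_PREF",
              6: "ATOMIC_AGGREGATE", 7: "AGGREGATOR", 8: "COMMUNITY", 9: "ORIGINATOR_ID",
              10: "CLUSTER_LIST", 14: "MP_REACH_NLRI", 15: "MP_UNREACH_NLRI",
              16: "EXTENDED_COMMUNITY", 17: "AS4_PATH", 18: "AS4_AGGREGATOR"}


def parse_attributes(data: bytes) -> list:
    attrs, i = [], 0
    while i < len(data):
        flags, type_code = data[i], data[i + 1]
        if flags & 0x10:                       # E bit: 2-byte attribute length
            (length,) = struct.unpack("!H", data[i + 2 : i + 4])
            start = i + 4
        else:                                  # 1-byte attribute length
            length = data[i + 2]
            start = i + 3
        attrs.append({
            "name": ATTR_NAMES.get(type_code, f"type {type_code}"),
            "optional": bool(flags & 0x80),
            "transitive": bool(flags & 0x40),
            "partial": bool(flags & 0x20),
            "value": data[start : start + length],
        })
        i = start + length
    return attrs
```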
; sets parameters of BGP routes that are installed in RIB
; filter ≡ only permitted prefixes are installed in RIB
(config-router)# table-map <MAP> [filter]
AS_PATH
- well-known mandatory
- segment types
- AS_SEQUENCE – ordered
- AS_SET – set (after prefix aggregation)
- includes AS from AS_PATH of all routes subordinate to aggregate ⇒ aggregate is less stable
- only +1 to AS_PATH length
- can be added to aggregated prefix for loop prevention
- own ASN is prepended when an Update is sent to another AS
- best practice: AS should be contiguous (blackhole otherwise)
- functions
- routing loops discovery:
- route is discarded if AS_PATH contains own ASN
- checked on ingress; can be relaxed with allowas-in
- routing policy enforcement:
- AS_PATH prepend: ingress traffic engineering
- optimal route selection
- lists ASNs along the path to destination
- hops – ASNs, not routers
(config)# ip as-path access-list <N> permit|deny <REGEXP>
; eBGP only
(config-route-map)# set as-path prepend <ASN>
; disable loop prevention, allow own AS in AS_PATH up to NUM times
(config-router)# neighbor <IP> allowas-in <NUM>
; hidden, load-balance over eBGP routes with different AS_PATH of same length
; disabled by default, iBGP load-balancing is done anyway
(config-router)# bgp bestpath as-path multipath-relax
; enabled by default, tear down eBGP session with directly connected peer if link is down
; can be disabled if link flaps
(config-router)# bgp fast-external-fallover
; max AS_PATH length, if exceeded – discard prefix
(config-router)# bgp maxas-limit <N>
; hidden, skips AS_PATH length step in best path selection
(config-router)# bgp bestpath as-path ignore
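A Python sketch of two AS_PATH rules from above: an AS_SET adds only 1 to the path length, and a path is a loop when own ASN appears more often than `allowas-in` permits. Not counting confederation segments follows RFC 5065; the segment representation and function names are assumptions for illustration.

```python
AS_SET, AS_SEQUENCE, AS_CONFED_SET, AS_CONFED_SEQUENCE = 1, 2, 3, 4


def as_path_length(segments):
    """segments: list of (segment_type, [asn, ...]) as carried in AS_PATH."""
    length = 0
    for seg_type, asns in segments:
        if seg_type == AS_SEQUENCE:
            length += len(asns)      # every ASN counts
        elif seg_type == AS_SET:
            length += 1              # whole set counts as one hop
        # confederation segments are not counted
    return length


def has_loop(segments, own_asn, allowas_in=0):
    """Own ASN may appear up to allowas_in times (0 = default loop prevention)."""
    occurrences = sum(asns.count(own_asn) for _, asns in segments)
    return occurrences > allowas_in
```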
Nexthop
- well-known mandatory
- IP within next AS to reach destination (eBGP peers)
- not changed within AS
- recursive routing
- IGP load-balancing is easier
- convergence depends on IGP (quick, distributed)
- next-hop-self
- convergence depends on WITHDRAW (slow)
- auto-enabled for iBGP if eBGP peering is over link-local IPv6 addresses
- 0.0.0.0 ≡ network/prefix is self-originated
- replaced with own address before transmitting over eBGP
- not changed for eBGP sessions (route server):
- if next-hop is within directly connected LAN (≠ reachable through same interface!) AND
- peering on LAN interface
- IPv6: link-local nexthop address is appended after global nexthop
- route-server: does not change AS_PATH, nexthop, MED
IOS XE CLI
; disable verification that the first (leftmost) ASN in AS_PATH = peer ASN, route-server client
(config-router)# no bgp enforce-first-as
; for iBGP neighbour; does not change next-hop of reflected routes on RR (use route-map or keyword all)
(config-router)# neighbor <IP> next-hop-self
; for eBGP neighbour
(config-router)# neighbor <IP> next-hop-unchanged
; on route server
(config-router)# neighbor <IP> route-server-client
(config-route-map)# set ip next-hop self
NX-OS CLI
; disable verification that the first (leftmost) ASN in AS_PATH = peer ASN, route-server client
(config-router-neighbor-af)# disable-peer-as-check
Origin
- well-known mandatory
- how prefix ended up in BGP
- values:
- 0 ≡ IGP (via network/aggregate-address)
- 1 ≡ EGP
- 2 ≡ other, incomplete (via redistribute)
; hidden command
(config-route-map)# set origin egp <ASN>
Local preference
- well-known discretionary
- within AS, set by ASBR
- the higher, the more preferred (100 default)
- not sent to eBGP peers (ignored if received from eBGP)
- egress traffic engineering (how to exit AS)
(config-router)# bgp default local-preference <N>
Community
- optional transitive
- types:
- standard: 4 bytes
- extended: 8 bytes for IPv4, 20 bytes for IPv6
- prefix announce engineering
- values:
- reserved: 0:*
- private: 0x00010000 – 0xFFFEFFFF
- 0xFFFFFF01:
- NO_EXPORT
- do not announce via eBGP (permitted within confederation)
- 0xFFFFFF02:
- NO_ADVERTISE
- do not announce at all
- 0xFFFFFF03
- NO_EXPORT_SUBCONFED
- do not announce via eBGP (including confederation)
- 0x00000000
- Internet
- announce to everyone
- Cisco-defined, not RFC
- match-any in community-list
- ASN:666
- blackhole (RFC 7999 well-known BLACKHOLE: 65535:666)
; show communities in format of <ASN>:<N> instead of raw integer
(config)# ip bgp-community new-format
(config)# ip community-list standard <LIST> ...
(config-route-map)# set community no-export
(config-route-map)# set comm-list <LST> delete
(config-route-map)# set community <ASN>:<NUM> [additive]
(config-route-map)# match community <LIST>
(config-router)# neighbor <IP> send-community
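A Python sketch of what `ip bgp-community new-format` changes: the 32-bit community is shown as <high 16 bits>:<low 16 bits> instead of a raw integer. The well-known names use IOS keywords (local-AS ≡ NO_EXPORT_SUBCONFED); function names are hypothetical.

```python
WELL_KNOWN = {
    0x00000000: "internet",      # Cisco-defined, not RFC
    0xFFFFFF01: "no-export",
    0xFFFFFF02: "no-advertise",
    0xFFFFFF03: "local-AS",      # NO_EXPORT_SUBCONFED
}


def to_new_format(value: int) -> str:
    """Raw 32-bit community -> <ASN>:<N>."""
    return f"{value >> 16}:{value & 0xFFFF}"


def display(value: int) -> str:
    """Name for well-known communities, <ASN>:<N> otherwise."""
    return WELL_KNOWN.get(value, to_new_format(value))


def parse_community(text: str) -> int:
    """<ASN>:<N> -> raw 32-bit value."""
    asn, num = (int(x) for x in text.split(":"))
    if not (0 <= asn <= 0xFFFF and 0 <= num <= 0xFFFF):
        raise ValueError("both halves must fit in 16 bits")
    return (asn << 16) | num
```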
Extended community
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|I|T| Type high |   Type low    |    Value (variable length)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
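Value:
- 6 bytes for AS-specific and IPv4 address specific (8-byte community)
- 18 bytes for IPv6 address specific (20-byte community)
I: IANA authority bit
- 0 ≡ first come first served policy
- 1 ≡ IETF consensus policy
T: transitive bit, 0 ≡ transitive, 1 ≡ non-transitive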
Types:
- 0x00: 2-octet AS-specific
- 0x03: Site of Origin
- 0x05: OSPF domain ID
- 0x40: AS-specific (2 bytes)
- 0x01: IPv4 address specific
- 0x03: Site of Origin
- 0x05: OSPF domain ID
- 0x07: OSPF RID
- 0x41: IPv4 address specific
- 0x02: 4-octet AS-specific
- 0x03: Site of Origin
- 0x05: OSPF domain ID
- 0x42: AS-specific (4 bytes)
- 0x03: opaque
- 0x06: OSPF route type + area + options
- 0x01: cost community
- 0x43: opaque
- 0x80: OSPF route type + area + options
- 0x04: QoS marking
- 0x44: QoS
- 0x05: CoS capability
- 0x45: CoS
- 0x06: EVPN
- 0x00: MAC mobility
- 0x46: EVPN
- 0x07: flowspec
- 0x08: flowspec redirect/mirror
- 0x00: flowspec IP nexthop redirect
- 0x40: first come first served
- 0x04: DMZ link BW
- 0x43
- 0x01: cost community
- 0x80: generic
- 0x01: OSPF RIDs
- 0x05: OSPF domain ID
- 0x06: flowspec drop/police
- 0x08: flowspec VRF redirect
- 0x09: flowspec DSCP
Cost community
- extended, optional (non-)transitive
- format: <POI>:<ID>:<VALUE>
- the lower, the more priority
- POI (point of insertion): the bestpath step at which the cost is evaluated, IGP by default
- prebestpath POI
- cost is compared before all other bestpath steps
- allows to prefer iBGP over locally originated
- protection against suboptimal routing (BGP ≈ IGP)
(config-router)# bgp bestpath cost-community ignore
(config-route-map)# set extcommunity cost pre-bestpath <ID> <VALUE>
Site of Origin (SoO)
- extended community, optional transitive
- protection against routing loop
- enhances convergence compared to max hop count protection
- routes with SoO, that matches local value, are discarded
- set once, not changed in IGP during flooding
- useful if AS_PATH check is not reliable: allowas-in, as-override
(config-route-map)# set extcommunity soo <VALUE>
; for distance-vector IGP, incoming update check
(config-if)# ip vrf sitemap <MAP>
(config-router-af)# neighbor <IP> soo <VALUE>
DMZ link BW
- optional non-transitive
- balancing over eBGP links only, internal links not accounted for
; enables load-balancing on BW between PE and eBGP peer
(config-router)# bgp dmzlink-bw
; adds community with BW value to the ingress route from eBGP peer
(config-router)# neighbor <IP> dmzlink-bw
Multi-exit discriminator (MED)
- optional non-transitive
- inform peer AS, which path towards own AS is better
- sent by ASBR
- not transmitted over eBGP by default: discarded on AS exit if not locally originated
- passed over iBGP by default
- checked only for paths via same peer AS
- not passed beyond peer AS
- the lower, the more priority
- 0 by default (metric in IOS)
- value when the prefix is injected into BGP:
- on redistribute: = IGP metric
- on network: = IGP metric
- passing eBGP prefix to iBGP peer: = 0
- change of MED triggers Update only once per 10 minutes
; for redistributed routes
(config-router)# default-metric <N>
; missing MED is treated as worst instead of 0
(config-router)# bgp bestpath med missing-as-worst
; check MED for paths via different AS
(config-router)# bgp always-compare-med
Deterministic MED
- by default entries in BGP RIB are in the order of being received
- MED is not compared for all entries
- non-deterministic behaviour
- groups paths in BGP RIB by neighbour AS
- best routes per peer AS are selected
- best route out of previous best is selected
- enabled by default in NX-OS
+-------+---------+-----+----------+-------------+
| Entry | AS_PATH | MED | BGP      | RID         |
+-------+---------+-----+----------+-------------+
| 1     | 500     | 150 | external | 172.16.13.1 |
| 2     | 100     | 200 | internal | 1.1.1.1     |
| 3     | 500     | 100 | internal | 172.16.8.4  |
+-------+---------+-----+----------+-------------+
1 vs 2 ⇒ 1 is best (different AS, MED not compared; eBGP > iBGP)
1 vs 3 ⇒ 3 is best (same AS, lower MED) ⇒ best via 172.16.8.4

Same paths, received in a different order:
+-------+---------+-----+----------+-------------+
| Entry | AS_PATH | MED | BGP      | RID         |
+-------+---------+-----+----------+-------------+
| 1     | 500     | 150 | external | 172.16.13.1 |
| 2     | 500     | 100 | internal | 172.16.8.4  |
| 3     | 100     | 200 | internal | 1.1.1.1     |
+-------+---------+-----+----------+-------------+
1 vs 2 ⇒ 2 is best (same AS, lower MED)
2 vs 3 ⇒ 3 is best (different AS, MED not compared; decided by a later step, e.g. lower RID) ⇒ best via 1.1.1.1
(config-router)# bgp deterministic-med
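A Python sketch reproducing the example: sequential comparison depends on arrival order, while grouping by neighbour AS first (deterministic MED) does not. It assumes all steps before MED are equal and that the IGP metric ties so the lower RID decides; `MedPath` and the helpers are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from ipaddress import IPv4Address


@dataclass(frozen=True)
class MedPath:
    neighbor_as: int
    med: int
    ebgp: bool
    rid: IPv4Address


def better(a: MedPath, b: MedPath) -> MedPath:
    """Pairwise comparison; all steps before MED are assumed equal."""
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b   # MED only within the same AS
    if a.ebgp != b.ebgp:
        return a if a.ebgp else b          # eBGP > iBGP
    return a if a.rid < b.rid else b       # equal IGP metric assumed, lower RID wins


def sequential_best(paths):
    """Default: 1 & 2 -> best & 3 -> ..."""
    return reduce(better, paths)


def deterministic_best(paths):
    """Best path per neighbour AS first, then the best of those."""
    groups = {}
    for p in paths:
        groups.setdefault(p.neighbor_as, []).append(p)
    return reduce(better, [reduce(better, g) for g in groups.values()])


p1 = MedPath(500, 150, True, IPv4Address("172.16.13.1"))
p2 = MedPath(100, 200, False, IPv4Address("1.1.1.1"))
p3 = MedPath(500, 100, False, IPv4Address("172.16.8.4"))
```

`sequential_best([p1, p2, p3])` picks 172.16.8.4 and `sequential_best([p1, p3, p2])` picks 1.1.1.1, matching the two tables; `deterministic_best` returns 1.1.1.1 for both orders.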
Weight
- Cisco proprietary
- ≈ local preference, but locally significant (not distributed within AS)
- default values:
- received prefixes: 0
- locally injected: 32768
(config-router)# neighbor <IP> weight <VALUE>
# show ip bgp
Atomic aggregate
- well-known discretionary
- added to aggregated prefix to signal the degradation of prefix precision
- should not be removed when the route is propagated further
Aggregator
- optional transitive
- ASN and RID of aggregating router
Accumulated IGP metric (AIGP)
- optional non-transitive
- compared before AS_PATH length
- IOS XR
MP_REACH_NLRI
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              AFI              |     SAFI      |  NH addr len  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\                                                               \
/                           Next-hop                            /
\                                                               \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Reserved    | Prefix length |    Prefix (variable length)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
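This is a minimal sketch of how the MP_REACH_NLRI value above could be built for IPv6 unicast (AFI 2, SAFI 1). It assumes a single 16-byte global next hop. It leaves out the attribute flags, type code and length that precede the value. The function names are hypothetical. Each prefix is encoded as its length in bits followed by only the octets the mask covers.

```python
import ipaddress
import struct

AFI_IPV6 = 2
SAFI_UNICAST = 1


def encode_prefix(net: ipaddress.IPv6Network) -> bytes:
    """Prefix length in bits, then only the octets covered by the mask."""
    octets = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:octets]


def mp_reach_nlri(next_hop: str, prefixes: list) -> bytes:
    """Value of MP_REACH_NLRI (attribute flags/type/length not included)."""
    nh = ipaddress.IPv6Address(next_hop).packed                     # 16-byte next hop
    header = struct.pack("!HBB", AFI_IPV6, SAFI_UNICAST, len(nh))   # AFI, SAFI, NH addr len
    nlri = b"".join(encode_prefix(ipaddress.IPv6Network(p)) for p in prefixes)
    return header + nh + b"\x00" + nlri                             # 0x00 = Reserved


if __name__ == "__main__":
    value = mp_reach_nlri("2001:db8::1", ["2001:db8:1::/48"])
    print(len(value), value.hex())
```

MP_UNREACH_NLRI would be built the same way, but without the next-hop length, next hop and Reserved octet.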
MP_UNREACH_NLRI
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              AFI              |     SAFI      | Prefix length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\                                                               \
/                       Withdrawn prefix                        /
\                                                               \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
BGP best path selection
- prefixes in BGP RIB are compared sequentially by default: 1 & 2 → best & 3 → best & 4 …
- order of prefixes in BGP RIB – newly received at the top
- selection
- next-hop for the prefix is reachable
- max weight
- max local preference
- locally injected: network/redistribute > summary
- lowest AS_PATH length
- lowest origin (IGP < EGP < incomplete)
- lowest MED
- eBGP > iBGP (eBGP > confed eBGP > iBGP)
- closest next-hop according to IGP
- if static/connected – check not performed
- 10k OSPF is better than 156k EIGRP
- ECMP in RIB if possible (up to maximum-paths)
- oldest eBGP route
- lowest RID
- if route from RR, Originator ID is used instead
- for eBGP paths reached only if the oldest-route step is skipped (bgp bestpath compare-routerid)
- shortest cluster list
- route not from RR (without cluster list) is preferred
- lowest neighbor IP
- for eBGP paths reached only if the oldest-route step is skipped
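This sketch implements the sequential comparison as a pairwise function. It covers only weight, local preference, AS_PATH length, origin, MED, eBGP vs iBGP, IGP metric and RID. It assumes AS_PATH is a flat list of ASNs (no AS_SET) and treats the first ASN as the peer AS for the MED check. It skips next-hop reachability, locally injected routes, multipath, and the oldest-route and cluster-list steps. `Path`, `prefer` and `best_path` are hypothetical names.

```python
from dataclasses import dataclass, field

ORIGIN_RANK = {"i": 0, "e": 1, "?": 2}   # IGP < EGP < incomplete


@dataclass
class Path:
    rid: str
    weight: int = 0
    local_pref: int = 100
    as_path: list = field(default_factory=list)
    origin: str = "i"
    med: int = 0
    ebgp: bool = False
    igp_metric: int = 0


def ip_key(ip: str) -> tuple:
    return tuple(int(octet) for octet in ip.split("."))


def prefer(a: Path, b: Path) -> Path:
    """Return the preferred path (subset of the selection steps)."""
    if a.weight != b.weight:                                  # highest weight
        return a if a.weight > b.weight else b
    if a.local_pref != b.local_pref:                          # highest local preference
        return a if a.local_pref > b.local_pref else b
    if len(a.as_path) != len(b.as_path):                      # shortest AS_PATH
        return a if len(a.as_path) < len(b.as_path) else b
    if a.origin != b.origin:                                  # lowest origin
        return a if ORIGIN_RANK[a.origin] < ORIGIN_RANK[b.origin] else b
    if a.as_path[:1] == b.as_path[:1] and a.med != b.med:     # MED, same peer AS only
        return a if a.med < b.med else b
    if a.ebgp != b.ebgp:                                      # eBGP > iBGP
        return a if a.ebgp else b
    if a.igp_metric != b.igp_metric:                          # closest next hop
        return a if a.igp_metric < b.igp_metric else b
    return a if ip_key(a.rid) <= ip_key(b.rid) else b         # lowest RID


def best_path(paths: list) -> Path:
    best = paths[0]
    for path in paths[1:]:                                    # 1 & 2 -> best & 3 -> ...
        best = prefer(best, path)
    return best


if __name__ == "__main__":
    rib = [
        Path("10.0.0.1", as_path=[65001, 65010]),
        Path("10.0.0.2", local_pref=200, as_path=[65002, 65003, 65010]),
        Path("10.0.0.3", as_path=[65001], med=50),
    ]
    print(best_path(rib).rid)
```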
; 120s default, max delay between peering established and bestpath selection is run
(config-router)# bgp update-delay <sec>
; global VRF only
(config-router)# bgp bestpath igp-metric ignore
eiBGP multipath
- enables load-balancing across eBGP and iBGP in RIB
- does not affect bestpath selection
- must match:
- weight
- local preference
- AS_PATH or AS_PATH length (with multipath relax)
- origin
- MED
- IGP metric to next-hop
(config-router)# address-family ipv4 vrf <VRF>
; across eBGP
(config-router-af)# maximum-paths <N>
(config-router-af)# maximum-paths ibgp <N>
(config-router-af)# maximum-paths eibgp <N>
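This sketch illustrates the matching rule above. Paths whose attributes equal the bestpath's are installed in RIB next to it, up to maximum-paths, and the bestpath itself does not change. It makes simplifying assumptions: AS_PATH is a flat tuple, and there is no check that next hops differ or that session types may be mixed. `multipath_key` and `rib_paths` are hypothetical names, and `relax=True` stands in for multipath relax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    weight: int
    local_pref: int
    as_path: tuple
    origin: str
    med: int
    igp_metric: int


def multipath_key(path: Path, relax: bool) -> tuple:
    """Attributes that must match the bestpath; with relax only AS_PATH length."""
    as_part = len(path.as_path) if relax else path.as_path
    return (path.weight, path.local_pref, as_part, path.origin, path.med, path.igp_metric)


def rib_paths(best: Path, others: list, maximum_paths: int, relax: bool = False) -> list:
    """Bestpath first, then matching paths, capped at maximum-paths."""
    wanted = multipath_key(best, relax)
    extra = [p for p in others if p != best and multipath_key(p, relax) == wanted]
    return ([best] + extra)[:maximum_paths]


if __name__ == "__main__":
    best = Path("10.0.0.1", 0, 100, (65001, 65010), "i", 0, 10)
    alt = Path("10.0.0.2", 0, 100, (65002, 65010), "i", 0, 10)
    print([p.next_hop for p in rib_paths(best, [alt], 4, relax=True)])
```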
Regexp
- special symbols:
- . ≡ any symbol, including whitespace
- * ≡ zero or more occurrences of the preceding expression
- + ≡ one or more occurrences of the preceding expression
- ? ≡ zero or one occurrence of the preceding expression
- ^ ≡ line start
- $ ≡ line end
- _ ≡ any separator
- \ ≡ escape symbol
- | ≡ logical OR
- [] ≡ character set (any one of the enclosed symbols)
- examples:
- _67_ ≡ via AS 67
- ^67$ ≡ from AS 67, directly connected
- _67$ ≡ originated from AS 67
- ^67_ ≡ learned from neighbor AS 67
- ^$ ≡ empty AS_PATH, originated in local AS
- .* ≡ any string
# show ip bgp regexp <REGEXP>
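Cisco AS-path regexps are close to ordinary regexps except for `_`. This minimal Python sketch expands every unescaped `_` into a group matching a comma, braces, parentheses, space, or the start or end of the string. It then tests the examples above against AS_PATH strings such as `100 67 200`. It is an approximation: `_` inside `[]` is not handled. `cisco_to_python` and `as_path_matches` are hypothetical helpers.

```python
import re

# Cisco "_" matches comma, braces, parentheses, space, start or end of string
UNDERSCORE = r"(?:^|$|[ ,{}()])"


def cisco_to_python(regexp: str) -> str:
    """Expand every unescaped "_"; other symbols are passed through."""
    out = []
    escaped = False
    for ch in regexp:
        out.append(UNDERSCORE if ch == "_" and not escaped else ch)
        escaped = ch == "\\" and not escaped
    return "".join(out)


def as_path_matches(regexp: str, as_path: str) -> bool:
    return re.search(cisco_to_python(regexp), as_path) is not None


if __name__ == "__main__":
    for rx, path in [("_67_", "100 67 200"), ("_67_", "100 670 200"),
                     ("^67_", "67 100"), ("_67$", "100 67"), ("^$", "")]:
        print(f"{rx!r:8} {path!r:15} {as_path_matches(rx, path)}")
```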
Authentication
- TCP MD5 signature option (RFC 2385)
- authentication only, no encryption
(config-router)# neighbor <IP> password <PASS>
QoS policy propagation via BGP (QPPB)
- CEF required
- injects IPP or qos-group information into RIB and FIB
- QoS is based on packet source/destination address on ingress
- destination has more priority
- mark packet with IPP or qos-group
(config-route-map)# set ip qos-group <N>
(config-route-map)# set ip precedence <N>
(config-router-af)# table-map <MAP>
(config-if)# bgp-policy source|destination ip-qos-map|ip-prec-map
# show ip route <PREFIX>
# show ip cef <PREFIX> detail
Aggregation
- installs the summary route to Null0
- adds ATOMIC_AGGREGATE attribute when as-set is not used
- aggregate inherits:
- highest local preference
- highest origin
- does not inherit:
- MED
- cluster list
- originator ID
- community
; aggregating IGP prefixes inherits parameters of the most specific route
; adds summary route to Null0
(config-router)# aggregate-address <PREFIX> [summary-only]
; suppresses more specific routes matched by MAP from Update
(config-router)# aggregate-address <PREFIX> suppress-map <MAP>
; which prefixes to use to form summary
(config-router)# aggregate-address <PREFIX> advertise-map <MAP>
; set attributes
(config-router)# aggregate-address <PREFIX> attribute-map <MAP>
; builds AS_SET from ASNs of contributing routes, inherits their communities
(config-router)# aggregate-address <PREFIX> as-set
; off by default
; redistributed: up to classful boundary, no subnets in BGP RIB
; network: inserts classful only if BGP RIB has more specific route,
; network must have classful mask
(config-router)# auto-summary
; EIGRP leak-map alternative, matched prefixes ignore summary-only of aggregate
(config-router)# neighbor <IP> unsuppress-map <MAP>
; announces prefixes matched by ADV only if BGP RIB has prefixes matched by MAP
; prefixes not matched by ADV are announced as usual
(config-router)# neighbor <IP> advertise-map <ADV> exist-map <MAP>
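This sketch shows which prefixes a peer receives with `aggregate-address`, `summary-only` and an unsuppress-map. It assumes the aggregate is generated whenever at least one more specific route is in the BGP RIB, and that the aggregate itself is not in the RIB list. It models the unsuppress-map as a plain set of prefixes and ignores attributes. `advertised` is a hypothetical name.

```python
import ipaddress


def advertised(rib: list, aggregate: str, summary_only: bool = False,
               unsuppressed: frozenset = frozenset()) -> list:
    """Prefixes sent to one peer; 'unsuppressed' stands in for unsuppress-map."""
    agg = ipaddress.ip_network(aggregate)
    nets = [ipaddress.ip_network(p) for p in rib]
    contributors = {n for n in nets
                    if n != agg and n.version == agg.version and n.subnet_of(agg)}
    out = [str(agg)] if contributors else []   # needs a more specific route
    for n in nets:
        if n in contributors and summary_only and str(n) not in unsuppressed:
            continue                           # suppressed by summary-only
        out.append(str(n))
    return out


if __name__ == "__main__":
    rib = ["10.1.1.0/24", "10.1.2.0/24", "192.168.0.0/24"]
    print(advertised(rib, "10.1.0.0/16", summary_only=True))
```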
Redistribution
; include default on redistribution from static, EIGRP, RIP
(config-router)# default-information originate
; advertise default to peer, does not have to be in BGP RIB
(config-router)# neighbor <IP> default-originate
; sets tag as AS_PATH, by default on BGP → OSPF tag = peer ASN
(config-route-map)# set as-path tag
; MED = IGP metric to next hop, not the route's own metric
(config-route-map)# set metric-type internal
Defaults
- OSPF: internal routes only
- iBGP routes are not redistributed to IGP
; permit iBGP → IGP
(config-router)# bgp redistribute-internal
Confederation
- divides AS into sub-AS
- within sub-AS – confederation iBGP
- full-mesh
- between sub-AS – confederation eBGP
- announces sub-AS iBGP routes to other sub-AS
- TTL = 1
- otherwise behaves according to iBGP rules
- exchanges MED, local preference
- does not change nexthop
- loop protection
- split horizon is active within sub-AS
- AS_PATH + AS_CONFED_SEQUENCE (segment type) + AS_CONFED_SET (set of ASN)
- confederation internal ASN in AS_PATH ≡ loop
- AS_PATH inside confederation: (65500 65035) 700 600
- confederation length is not accounted in AS_PATH length (= 0)
- confederation ASN are removed from AS_PATH before sending outside of confederation
- MED
- not compared for internal confederation prefixes by default (no external ASN in AS_PATH) between different sub-AS
- compared if AS_PATH_CONFED is empty
- private ASN inside confederation
- avoid dropping valid routes with same AS in AS_PATH as sub-AS
; global ASN, sub-ASN ≡ BGP process ID
(config-router)# bgp confederation identifier <ASN>
; sub-AS list
(config-router)# bgp confederation peers <SUB_ASN> ...
; compare MED for routes from confederation, by default – only via same external AS
(config-router)# bgp bestpath med confed
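This sketch covers two AS_PATH rules above. Segments are modelled as (type, ASNs) pairs using the RFC 4271 / RFC 5065 segment type codes. Confederation segments add nothing to the AS_PATH length. They are removed before the route leaves the confederation, and the confederation identifier is prepended. ASN 100 as the confederation identifier is a hypothetical value.

```python
AS_SET, AS_SEQUENCE, AS_CONFED_SEQUENCE, AS_CONFED_SET = 1, 2, 3, 4   # segment types


def as_path_length(segments: list) -> int:
    """AS_SEQUENCE counts every ASN, AS_SET counts 1, confederation segments 0."""
    length = 0
    for seg_type, asns in segments:
        if seg_type == AS_SEQUENCE:
            length += len(asns)
        elif seg_type == AS_SET:
            length += 1
    return length


def export_outside_confed(segments: list, confed_id: int) -> list:
    """Drop confederation segments, then prepend the confederation identifier."""
    public = [(t, list(asns)) for t, asns in segments if t in (AS_SET, AS_SEQUENCE)]
    if public and public[0][0] == AS_SEQUENCE:
        public[0][1].insert(0, confed_id)
    else:
        public.insert(0, (AS_SEQUENCE, [confed_id]))
    return public


# (65500 65035) 700 600 as seen inside the confederation
PATH = [(AS_CONFED_SEQUENCE, [65500, 65035]), (AS_SEQUENCE, [700, 600])]

if __name__ == "__main__":
    print(as_path_length(PATH))
    print(export_outside_confed(PATH, 100))
```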
Route reflector
- relaxes iBGP full-mesh requirement
- roles:
- RR server: hub
- client: spoke
- non-client
- preserve information about RR clusters along the Update
- Originator ID
- optional non-transitive
- RID of the peer that announced the prefix
- added by RR before transmitting over iBGP
- if prefix with own Originator ID is received – reject
- Cluster list
- optional non-transitive
- cluster ID is added by RR before transmitting over iBGP
- if prefix has own cluster ID in cluster list – reject ⇒ loop protection between clusters
- link between clusters – high cost
- selection:
- manual
- BGP RR RID
- RR does not change next-hop even with next-hop-self
- can be set by route-map
- can be overridden by next-hop-self all
- backreflection
- RR sends Update, received from client, towards clients and non-clients
- update group of sender client contains at least one other peer
- suboptimal routing
- RR conceals topology information
- RR route selection influences options on other speakers (RR sends only bestpath)
- RR should be placed according to physical topology to avoid suboptimal routing and loops
Source | Announce to client | Announce to non-client |
---|---|---|
client | + | + |
non-client | + | – |
eBGP | + | + |
(config-router)# bgp cluster-id <ID>
; RR config
(config-router)# neighbor <IP> route-reflector-client
; disable passing client routes to clients, pass to iBGP/eBGP peers only
(config-router)# no bgp client-to-client reflection
; disable intra and inter-cluster exchange
(config-router)# no bgp client-to-client reflection all
; restrict within cluster
(config-router)# no bgp client-to-client reflection intra-cluster cluster-id <ID>
# show ip bgp update-group
# show ip bgp cluster-ids
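This sketch models the reflection table and the two loop checks, with attributes held in a plain dict. `ANNOUNCE`, `reflect` and `accept` are hypothetical names. `reflect` models a route being reflected from an iBGP peer: the Originator ID is set only if absent, and the RR prepends its cluster ID to the cluster list.

```python
# Reflection rules from the table above: (source, target) -> announce?
ANNOUNCE = {
    ("client", "client"): True,
    ("client", "non-client"): True,
    ("non-client", "client"): True,
    ("non-client", "non-client"): False,
    ("ebgp", "client"): True,
    ("ebgp", "non-client"): True,
}


def reflect(attrs: dict, from_rid: str, cluster_id: str) -> dict:
    """Attributes added by an RR when reflecting a route learned over iBGP."""
    out = dict(attrs)
    out.setdefault("originator_id", from_rid)        # set once, kept by later RRs
    out["cluster_list"] = [cluster_id] + attrs.get("cluster_list", [])
    return out


def accept(attrs: dict, own_rid: str, own_cluster_id: str) -> bool:
    """Loop check on receipt of a reflected route."""
    if attrs.get("originator_id") == own_rid:
        return False
    return own_cluster_id not in attrs.get("cluster_list", [])


if __name__ == "__main__":
    r1 = reflect({}, "1.1.1.1", "10.10.10.10")
    r2 = reflect(r1, "2.2.2.2", "20.20.20.20")
    print(r2, accept(r2, "1.1.1.1", "30.30.30.30"))
```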
Persistent route oscillation
- RFC 3345
- solutions:
- inter-cluster link with high IGP cost
- do not accept MED
- always compare MED (including different AS)
- use path attributes with higher precedence for path engineering
Prefix independent convergence (PIC)
- installs next best into BGP RIB, RIB, FIB along with best ⇒ on failure switchover does not require BGP RIB processing
- local significance
- CEF recursion
- slows down PIC on RR:
- backup nexthop is already calculated via BGP in lieu of CEF
- no need to search backup nexthop for primary nexthop in RIB among directly connected
- not set for iBGP
- set for IPv4, VPNv4
- no support for mcast, L2VPN
- overrides local convergence feature
- per VRF (VRF AF mode) or for all VRF (VPN AF mode)
- enabled by default in NX-OS
(config-router)# bgp additional-paths install
; disables recursion for /32 and directly connected
(config-router)# no bgp recursion host
; RIB with backup paths
# show ip route repair-paths
Addpath
- capability 69 ⇒ disruptive, requires session reset
- implicit Withdraw per prefix + Path ID instead of per prefix, allows several paths per prefix from same peer
- different paths have different Path ID (4 bytes), prepended to NLRI
- iBGP only
(config-router-af)# bgp additional-paths receive|send [receive]
; defines prefixes enabled globally for addpath:
; N bestpaths, 1 bestpath per nexthop AS, backup has more priority
(config-router-af)# bgp additional-paths select all
(config-router-af)# bgp additional-paths select best <N>
(config-router-af)# bgp additional-paths select group-best
(config-router-af)# bgp additional-paths select best-external [backup]
; negotiate capability
(config-router-af)# neighbor <IP> additional-paths ...
; defines what to announce
(config-router-af)# neighbor <IP> advertise additional-paths all
(config-router-af)# neighbor <IP> advertise additional-paths best <N>
(config-router-af)# neighbor <IP> advertise additional-paths group-best
(config-router-af)# neighbor <IP> advertise additional-paths best-external
; route-map out only (no support for in), because advertise-set is an internal entity
(config-router-af)# neighbor <IP> route-map <MAP> out
(config-route-map)# match additional-paths advertise-set best <N>
(config-route-map)# match additional-paths advertise-set best-range <N> <M>
(config-route-map)# match additional-paths advertise-set group-best [best <N>]
Addpath capability (0x45)
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Cap code    |  Cap length   |              AFI              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     SAFI      | Send/Receive  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Send/Receive:
- 1: receive
- 2: send
- 3: both
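This sketch encodes the Addpath capability above, plus an IPv4 NLRI entry with the 4-byte Path ID prepended. The capability is assumed to travel in the OPEN optional parameters, which are not built here. The function names are hypothetical.

```python
import ipaddress
import struct

CAP_ADDPATH = 69                 # 0x45
RECEIVE, SEND, BOTH = 1, 2, 3    # Send/Receive field values


def addpath_capability(entries: list) -> bytes:
    """entries: (afi, safi, send_receive) tuples, one per address family."""
    body = b"".join(struct.pack("!HBB", afi, safi, sr) for afi, safi, sr in entries)
    return struct.pack("!BB", CAP_ADDPATH, len(body)) + body


def ipv4_nlri_with_path_id(path_id: int, prefix: str) -> bytes:
    """4-byte Path ID prepended to the usual length + prefix encoding."""
    net = ipaddress.IPv4Network(prefix)
    octets = (net.prefixlen + 7) // 8
    return struct.pack("!IB", path_id, net.prefixlen) + net.network_address.packed[:octets]


if __name__ == "__main__":
    print(addpath_capability([(1, 1, BOTH)]).hex())        # IPv4 unicast, send + receive
    print(ipv4_nlri_with_path_id(2, "10.1.0.0/16").hex())
```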
Diverse path
- RR towards clients
- shadow RR announces backup path or some from multipath instead of best
- can be performed with single RR with two sessions
(config-router)# bgp bestpath igp-metric ignore
(config-router)# bgp additional-paths select [backup]
(config-router)# bgp additional-paths install
(config-router)# neighbor <IP> route-reflector-client
; for RR clients, sends backup
(config-router)# neighbor <IP> advertise diverse-path [backup] [mpath]
; for non-clients, sends best eBGP path
(config-router)# neighbor <IP> advertise best-external
Graceful restart
- RFC 4724
- capability 64
- GR-aware: understands GR, continue forwarding
- GR-capable: dual RP
- non-stop routing
- alternative to NSF, seamless switchover
- routing information is copied to both RPs
- more resource consumption
- no need for support on peer
- without NSF on TCP resetup:
- new BGP session
- BGP session collision
- BGP notification, until hold timer expires
- NSF on TCP resetup:
- closes local stale TCP session
- switch to new TCP session without Notification
; enable NSF
(config-router)# bgp graceful-restart
; 120s default, wait for session to be reestablished
(config-router)# bgp graceful-restart restart-time <RST>
; 360s default, lifetime for stale routes after session is reestablished
(config-router)# bgp graceful-restart stalepath-time <ST>
(config-router-stmp)# ha-mode graceful-restart
Graceful restart capability (0x40)
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Cap code    |  Cap length   |R|Rsvd |  Restart time (sec)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              AFI              |     SAFI      |   AF flags    |   AFI information entry, one per AFI/SAFI
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
R: restart state flag, 1 ≡ sender has restarted, peer must not wait for End-of-RIB before sending routes to it
AF flags:
- 0x80: forwarding state preserved
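This sketch encodes the Graceful restart capability value above. The first two bytes hold 4 restart-flag bits (R is the most significant) and a 12-bit restart time. One AFI/SAFI/flags entry per address family follows. Function and constant names are hypothetical.

```python
import struct

CAP_GRACEFUL_RESTART = 64   # 0x40
R_FLAG = 0x8                # most significant of the 4 restart-flag bits
F_FLAG = 0x80               # per-AF flag: forwarding state preserved


def gr_capability(restart_time: int, restarted: bool, afs: list) -> bytes:
    """afs: (afi, safi, forwarding_preserved) tuples."""
    if not 0 <= restart_time <= 0xFFF:
        raise ValueError("Restart time is a 12-bit field (0..4095 s)")
    flags = R_FLAG if restarted else 0
    body = struct.pack("!H", (flags << 12) | restart_time)
    for afi, safi, preserved in afs:
        body += struct.pack("!HBB", afi, safi, F_FLAG if preserved else 0)
    return struct.pack("!BB", CAP_GRACEFUL_RESTART, len(body)) + body


if __name__ == "__main__":
    # 120 s restart time, IPv4 unicast with forwarding state preserved
    print(gr_capability(120, False, [(1, 1, True)]).hex())
```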
BGP wedgie
- RFC 4264
- non-deterministic behaviour, depends on Update ordering
- root causes
- policy enforcement in transit AS
- only best path is announced
- solution for primary/backup scenario – conditional route advertisement
- track static /32 peer address
- static is tied to track, track is tied to ICMP SLA
- peer is down → static /32 is down → announce to other peer