BGP

  1. ASN
  2. BGP
    1. RID
    2. Messages
      1. Header
      2. Open
      3. Update
      4. Notification
      5. Route refresh
        1. Enhanced route refresh
    3. Peering
      1. BGP update-group
      2. BGP peer-group
      3. Templates
    4. State
    5. Timers
    6. Prefix announce
      1. Uplink types
      2. Prevent transit
      3. Filter preference
      4. Route-map
        1. Keyword continue
      5. Route flap dampening
      6. Outbound route filter (ORF)
        1. ORF capability (0x03)
        2. Route refresh with ORF
      7. BGP sync
      8. Conditional route injection
    7. Topology codes
    8. Processes
  3. MP-BGP
    1. MP-BGP capability (0x01)
    2. IPv4 over IPv6
    3. IPv6 over IPv4
    4. IPv6 native
    5. Multicast BGP
    6. Flowspec
      1. Flowspec server (IOS XR)
      2. Flowspec client (IOS XE)
    7. Route target constraint (RTC)
    8. Route-target membership MP_(UN)REACH_NLRI prefix
  4. Path attributes
    1. AS_PATH
    2. Nexthop
      1. IOS XE CLI
      2. NX-OS CLI
    3. Origin
    4. Local preference
    5. Community
      1. Extended community
      2. Cost community
      3. Site of Origin (SoO)
      4. DMZ link BW
    6. Multi-exit discriminator (MED)
      1. Deterministic MED
    7. Weight
    8. Atomic aggregate
    9. Aggregator
    10. Accumulated IGP cost for BGP (AIGP)
    11. MP_REACH_NLRI
    12. MP_UNREACH_NLRI
  5. BGP best path selection
    1. eiBGP multipath
  6. Regexp
  7. Authentication
  8. QoS policy propagation via BGP (QPPB)
  9. Aggregation
  10. Redistribution
    1. Defaults
  11. Confederation
  12. Route reflector
    1. Persistent route oscillation
  13. Prefix independent convergence (PIC)
    1. Addpath
      1. Addpath capability (0x45)
    2. Diverse path
  14. Graceful restart
    1. Graceful restart capability (0x40)
  15. BGP wedgie

ASN

  • reserved: 0, 65535
  • IANA-assigned: 1 – 64495
  • documentation: 64496 – 64511
  • private
    • 64512 – 65534
    • 4200000000 – 2³²-2 (2³²-1 is reserved)
  • 23456 (AS_TRANS): stands in for a 32-bit ASN towards routers that do not support 32-bit ASNs
; inbound: removes private AS
; outbound: replaces private AS with own single AS
(config-router)# neighbor <IP> remove-private-as

(config-router)# bgp asnotation dot
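
The ranges above and the asdot notation can be sketched in Python. This is a minimal illustration, not a vendor implementation: the function names are made up for this note, 2³²-1 is treated as reserved (RFC 7300), and 4-byte values outside the listed ranges are simply reported as "other".

```python
def classify_asn(asn: int) -> str:
    """Map an ASN to the range names used in the list above."""
    if not 0 <= asn <= 2**32 - 1:
        raise ValueError("ASN must fit in 32 bits")
    if asn in (0, 65535, 2**32 - 1):
        return "reserved"
    if asn == 23456:
        return "AS_TRANS"
    if 64496 <= asn <= 64511:
        return "documentation"
    if 64512 <= asn <= 65534 or 4200000000 <= asn <= 2**32 - 2:
        return "private"
    if 1 <= asn <= 64495:
        return "IANA-assigned"
    return "other"  # 4-byte space not covered by the list above


def to_asdot(asn: int) -> str:
    """asdot: 2-byte ASNs stay plain, 4-byte ASNs become <high 16 bits>.<low 16 bits>."""
    if asn < 65536:
        return str(asn)
    return f"{asn >> 16}.{asn & 0xFFFF}"


print(classify_asn(64512), to_asdot(65537))
```

The asdot form is what `bgp asnotation dot` switches the CLI to; the conversion itself is only a split of the 32-bit value into two 16-bit halves.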

BGP

  • neighbours are configured manually
  • TCP 179, CS6
  • attributes in lieu of metric ⇒ path-vector protocol
  • aimed at scalability, policy implementation
  • single best route from BGP RIB is announced
  • AD
    • eBGP = 20
    • iBGP = 200
  • the lower a route is listed in the BGP RIB, the older it is
(config-router)# distance bgp <EBGP> <IBGP> <LOCAL_BGP>

; augment config to AF mode
(config-router)# bgp upgrade-cli

RID

  • must be unique
  • selection
    1. manual config
    2. loopback
      • IOS: highest IP
      • NX-OS: loopback0
    3. highest IP from interface in up/up
; disabled by default, compares RID instead of selecting oldest eBGP prefix
(config-router)# bgp bestpath compare-routerid
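
The selection order above can be modelled as below. This is a sketch only: interface inventories are passed in as plain dicts, and the parameter names (`manual`, `loopbacks`, `up_interfaces`, `nxos`) are invented for the example. Addresses are compared numerically, because a string comparison would rank 10.0.0.9 above 10.0.0.10.

```python
from ipaddress import IPv4Address


def highest(addresses):
    """Return the numerically highest IPv4 address as a string."""
    return str(max(IPv4Address(a) for a in addresses))


def select_rid(manual=None, loopbacks=None, up_interfaces=None, nxos=False):
    """Pick a RID: manual config, then loopback, then highest up/up interface IP."""
    loopbacks = loopbacks or {}
    up_interfaces = up_interfaces or {}
    if manual:
        return manual
    if nxos and "loopback0" in loopbacks:
        return loopbacks["loopback0"]      # NX-OS: loopback0 specifically
    if loopbacks and not nxos:
        return highest(loopbacks.values())  # IOS: highest loopback IP
    if up_interfaces:
        return highest(up_interfaces.values())
    return None


print(select_rid(loopbacks={"loopback0": "10.0.0.9", "loopback1": "10.0.0.10"}))
```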

Messages

  • TTL
    • 1 for eBGP
    • 255 for iBGP
  • maximum size – 4096 bytes ⇒ limitation for SR policy
  • implicit withdraw: Update for prefix with different next-hop (by default only single bestpath)
  • messages
    1. OPEN
      • establish peering
      • exchange capabilities, extensions
    2. UPDATE
      • exchange routes
        • single UPDATE for one set of attributes
        • several UPDATE in a single TCP segment
      • substitutes for KEEPALIVE (also resets hold timer)
    3. NOTIFICATION
      • error notification
    4. KEEPALIVE
      • keep peering open
      • acknowledge parameters, received in OPEN
(config-router)# neighbor <IP> ebgp-multihop <TTL>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|_                                                             _|
|_                            Marker                           _|
|_                  (deprecated, all binary 1)                 _|
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Length            |      Type     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Length – bytes, including header
Type:

  1. Open
  2. Update
  3. Notification
  4. Keepalive (no payload)
  5. Route refresh
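
The 19-byte header above (16-byte marker, 2-byte length, 1-byte type) can be packed and validated with the standard `struct` module. This is a minimal sketch: the helper names are illustrative, and the checks mirror the three "Message header error" subcodes rather than any particular implementation.

```python
import struct

MARKER = b"\xff" * 16      # deprecated field, all binary 1
HEADER_LEN = 19            # marker (16) + length (2) + type (1)
MAX_LEN = 4096             # maximum message size in bytes (RFC 4271)
TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION",
         4: "KEEPALIVE", 5: "ROUTE-REFRESH"}


def build_message(msg_type: int, payload: bytes = b"") -> bytes:
    """Prefix a payload with the BGP header; length includes the header itself."""
    return MARKER + struct.pack("!HB", HEADER_LEN + len(payload), msg_type) + payload


def parse_header(data: bytes):
    """Return (length, type name) or raise ValueError naming the header error."""
    if len(data) < HEADER_LEN:
        raise ValueError("truncated header")
    length, msg_type = struct.unpack("!HB", data[16:19])
    if data[:16] != MARKER:
        raise ValueError("connection not synchronized")
    if not HEADER_LEN <= length <= MAX_LEN:
        raise ValueError("bad message length")
    if msg_type not in TYPES:
        raise ValueError("bad message type")
    return length, TYPES[msg_type]


keepalive = build_message(4)   # KEEPALIVE has no payload: header only
print(parse_header(keepalive))
```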

Open

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                                +-+-+-+-+-+-+-+-+
                                                |    Version    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     My autonomous system      |           Hold time           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            BGP RID                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Param length  |     Optional parameters (variable length)     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Optional parameters – TLVs (type, length – 1 byte)

  • 0x02 ≡ capability
    • 0x01: MP-BGP
    • 0x03: ORF
    • 0x40: graceful restart
    • 0x45: addpath
    • 0x46: enhanced route refresh
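
As an illustration of the layout above, the sketch below packs an OPEN body (without the common header) that carries one capability optional parameter. The helper names are invented here; placing AS_TRANS (23456) in the 2-byte "My autonomous system" field for 4-byte ASNs follows RFC 6793, and the 4-byte ASN capability itself is left out to keep the example short.

```python
import socket
import struct


def mp_capability(afi: int, safi: int) -> bytes:
    """Capability 0x01 (MP-BGP): AFI (2 bytes), reserved (1 byte), SAFI (1 byte)."""
    value = struct.pack("!HBB", afi, 0, safi)
    return struct.pack("!BB", 0x01, len(value)) + value


def capability_param(capability: bytes) -> bytes:
    """Wrap a capability TLV into optional parameter type 0x02."""
    return struct.pack("!BB", 0x02, len(capability)) + capability


def build_open_body(asn: int, hold_time: int, rid: str, params: bytes = b"") -> bytes:
    """Version 4, my AS, hold time, RID, optional parameters length, parameters."""
    my_as = asn if asn < 65536 else 23456   # AS_TRANS stands in for a 4-byte ASN
    fixed = struct.pack("!BHH4sB", 4, my_as, hold_time,
                        socket.inet_aton(rid), len(params))
    return fixed + params


body = build_open_body(65001, 180, "192.0.2.1",
                       capability_param(mp_capability(afi=1, safi=1)))
print(body.hex())
```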

Update

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Withdrawn routes length    | Withdrawn routes (var length) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Path attributes length     | Path attributes (var length)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\                                                               \
/                              NLRI                             /
\                                                               \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Withdrawn routes length = 0 ⇒ Withdrawn routes is not included
Path attributes length = 0 ⇒ attributes and NLRI are not included

Withdrawn routes/NLRI:

  • length (1 byte) + prefix (variable length)
  • length = 0 ⇒ match all routes
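
The length + prefix encoding can be sketched as follows for IPv4. The prefix is stored in the minimum number of whole bytes that hold `length` bits, so a /16 takes 1 + 2 bytes and the default route is the single byte 0. Function names are illustrative only.

```python
import ipaddress


def encode_prefix(prefix: str) -> bytes:
    """Encode an IPv4 prefix as <length in bits><ceil(length / 8) address bytes>."""
    net = ipaddress.IPv4Network(prefix)
    nbytes = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]


def decode_prefixes(data: bytes):
    """Decode a Withdrawn routes or NLRI field into a list of prefix strings."""
    prefixes, i = [], 0
    while i < len(data):
        plen = data[i]
        if plen > 32:
            raise ValueError("invalid network field")
        nbytes = (plen + 7) // 8
        raw = data[i + 1:i + 1 + nbytes]
        if len(raw) < nbytes:
            raise ValueError("truncated prefix")
        # pad the trailing bytes that were omitted on the wire
        address = ipaddress.IPv4Address(raw + bytes(4 - nbytes))
        prefixes.append(f"{address}/{plen}")
        i += 1 + nbytes
    return prefixes


wire = encode_prefix("10.1.0.0/16") + encode_prefix("192.0.2.0/24")
print(decode_prefixes(wire))
```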

Notification

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Code     |    Subcode    |       Data (var length)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Code:

  1. Message header error
    1. connection not synced
    2. bad message length
    3. bad message type
  2. Open message error
    1. unsupported version
    2. bad peer AS
    3. bad BGP RID
    4. unsupported optional parameter
    5. authentication failure (deprecated)
    6. unacceptable hold time
    7. unsupported capability (≠ unknown capability)
  3. Update message error
    1. malformed attribute list
    2. unrecognized well-known attribute
    3. missing well-known attribute
    4. attribute flags error
    5. attribute length error
    6. invalid ORIGIN
    7. AS routing loop (deprecated)
    8. invalid NEXT_HOP
    9. optional attribute error
    10. invalid network field
    11. malformed AS_PATH
  4. Hold timer expired
  5. FSM error
  6. Cease (alternative to TCP FIN)

Route refresh

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             AFI               |    Subtype    |      SAFI     | 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Subtype:

  • 0: normal
  • 1: SOR
  • 2: EOR
  • 255: reserved
; use soft reconfiguration if route refresh is not supported by peer
(config-router)# bgp soft-reconfig-backup

; preserves in memory raw Update from peer before processing
(config-router)# neighbor <IP> soft-reconfiguration inbound
; use BGP route refresh to resend Update
# clear ip bgp in

; use route refresh, if possible
# clear ip bgp soft

Enhanced route refresh

  • capability 0x46 (zero length)
  • removes stale routes that are not refreshed by Route refresh ≡ consistency check
  • sends Start-of-RIB (SOR) before prefixes and End-of-RIB (EOR) after prefixes
; 0 ≡ disabled (default), max time to generate EOR, can be useful against flaps
(config-router)# bgp refresh max-eor-time <sec>

; 0 ≡ disabled (default), removes stale routes without receiving EOR on expiration
(config-router)# bgp refresh stalepath-time <sec>

Peering

  • must differ: RID
  • must match: minimum hold values
  • successful authentication
  • directly connected for eBGP
  • peer is reachable via non-default route (at least for one of the peers)
  • connection collision:
    • 2 sessions are initially established, only one remains
    • highest RID wins: the session it initiated is kept, so the lower-RID peer remains the BGP server (TCP 179)
; enabled by default
(config-router)# bgp transport path-mtu-discovery

(config-router)# neighbor <IP> remote-as <ASN>

; does not remove TTL = 1 verification
(config-router)# neighbor <IP> disable-connected-check

; TTL is not decremented for packets destined to the router itself
(config-router)# neighbor <IP> update-source <INTF>

; passive waits for TCP, does not send SYN itself – for passing strict ACLs
(config-router)# neighbor <IP> transport connection-mode active|passive

; enabled by default
(config-router)# neighbor <IP> transport path-mtu-discovery [disable]

; uses local ASN instead of BGP process number for peering
; no-prepend: do not add local ASN to AS_PATH of ingress Update
; replace-as: replace real ASN with local ASN for egress eBGP Update
; dual-as: peering with local ASN and real ASN
(config-router)# neighbor <IP> local-as <ASN> [no-prepend] [replace-as] [dual-as]

; replace all peer ASN in AS_PATH with own AS
(config-router)# neighbor <IP> as-override

; sends messages with TTL = 255, eBGP only
; eBGP peer must be N hops away, otherwise – discard, no ICMP error
; N = 1 – connected, TTL = 254 after decrementing on peer
(config-router)# neighbor <IP> ttl-security hops <N>

; tear down directly connected peering as soon as link is down or BFD reports failure
(config-router)# neighbor <IP> fall-over [bfd]

; checks route to nexthop (not address), if route matches deny – reset session
(config-router)# neighbor <IP> fall-over route-map <MAP>
; tear down session with directly connected eBGP peer if link is down
(config-if)# ip bgp fast-external-fallover permit|deny
; is peer capable of route refresh?
# show ip bgp neighbor <IP>

; requires soft-reconfiguration inbound, Update from peer before processing
# show ip bgp neighbor <IP> received-routes

# show ip bgp neighbor <IP> routes

; Update after filtering
# show ip bgp neighbor <IP> advertised-routes

# show ip bgp neighbor <IP> policy [detail]
; soft out ≡ out
# clear ip bgp <IP> soft out

; hard reset
# clear ip bgp <IP>|*

; processes Update from cache (deprecated approach)
# clear ip bgp soft in

BGP update-group

  • single Update per group instead of per peer ⇒ saves CPU cycles
  • determined automatically
# show ip bgp update-group

BGP peer-group

  • forms update group ⇒ does not permit filter per peer (can be fixed using templates instead)
(config-router)# neighbor <IP> peer-group <PEER_GROUP>

Templates

  • replaces peer-group
  • types
    1. session
      • inherits from direct parent only
      • up to 8 templates in the chain
    2. policy
      • inherits up to 7 templates (directly and indirectly)
      • up to 8 templates in the chain
      • template with larger number overwrites template with lower number if there is conflict
(config-router)# template peer-session <SESSION_TMPL>
(config-router-stmp)# inherit peer-session <PARENT_SESSION>

(config-router)# template peer-policy <POLICY_TMPL>
(config-router-ptmp)# inherit peer-policy <PARENT_POLICY>

State

  1. IDLE
    • does not wait for TCP
  2. CONNECT
    • waits for TCP, passive neighbour
    • TCP SYN+ACK is sent, waiting for ACK
  3. ACTIVE
    • TCP session initiated, active neighbour
    • up to 16 retries
    • TCP SYN is sent, waiting for SYN+ACK
    • TCP RST is received
  4. OPENSENT
    • TCP is established and OPEN is sent
  5. OPENCONFIRM
    • OPEN is received
  6. ESTABLISHED
    • peering established

Timers

  • if timers do not match, lower value is used
  • hold timer
    • 180s by default
    • announced in OPEN
    • cannot be less than 3s
    • on expiration peer is dead
  • keepalive
    • not announced but calculated
    • hold = 0 ⇒ no keepalive
    • selection
      1. hold = local hold: use local value
      2. hold ≠ local hold, local hold < ⅓ hold: use local value
      3. floor(⅓ hold)
  • ConnectRetry
    • 120s, constant
    • interval between attempts to establish session
    • doubled with every tick
  • Minimum route advertisement interval (MRAI)
    • defaults:
      • eBGP: 30s
      • iBGP: 0s
      • eBGP VRF: 0s
    • always 0s for NX-OS
; negotiated in Open ⇒ requires hard reset to be in effect
(config-router)# timers bgp <KEEP> <HOLD> [<MIN_NEIGHBOUR_HOLD>]

; MRAI
(config-router)# neighbor <IP> advertisement-interval <sec>

(config-router)# neighbor <IP> timers <KEEP> <HOLD> [<MIN_NEIGHBOUR_HOLD>]
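
The hold and keepalive selection rules above can be written out as a small function. This is a sketch under one stated assumption: in rule 2, "local hold < ⅓ hold" is read as "locally configured keepalive < ⅓ of the negotiated hold". The function and parameter names are invented for the example.

```python
def negotiate_timers(local_keepalive: int, local_hold: int, peer_hold: int):
    """Return (keepalive, hold) in seconds after the OPEN exchange."""
    hold = min(local_hold, peer_hold)   # lower announced value is used
    if hold == 0:
        return 0, 0                     # hold = 0 => no keepalives
    if hold < 3:
        raise ValueError("unacceptable hold time")
    if hold == local_hold:
        keepalive = local_keepalive     # rule 1: local value
    elif local_keepalive < hold / 3:
        keepalive = local_keepalive     # rule 2: local value is already short enough
    else:
        keepalive = hold // 3           # rule 3: floor(1/3 hold)
    return keepalive, hold


print(negotiate_timers(local_keepalive=60, local_hold=180, peer_hold=90))
```

With a local 60/180 configuration and a peer announcing hold 90, the negotiated hold drops to 90 and the local keepalive of 60 no longer fits, so the third rule applies.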

Prefix announce

  • best path only
  • iBGP
    • split horizon: routes, received via iBGP, are not sent via iBGP ⇒ iBGP full-mesh within AS
      • solutions: route reflector, confederation
  • eBGP:
    • does not announce prefixes transiting AS{n} (last in AS_PATH) towards AS{n} (NX-OS)
    • prefix and peer must be reachable via same interface (disable for peering on loopbacks)
  • triggered update only
    • iBGP: 5s
    • eBGP: 30s
  • loop prevention:
    • protection from external loop through own AS
    • redistribute BGP → IGP not recommended, because IGPs are not designed for such a number of prefixes
; eBGP prefix uses AD = <local AD> in order to prefer IGP route
(config-router)# network <IP> mask <MASK> backdoor
# show ip bgp <PREFIX>

Uplink types

  • single-homed
  • dual-homed
  • single multihomed
  • dual multihomed

Prevent transit

  • AS list filter
  • no-export community
  • prefix-list
  • distribute-list

Filter preference

  • ingress:
    1. filter-list
    2. route-map
    3. prefix-list/distribute-list
  • egress:
    1. filter-list
    2. route-map
    3. unsuppress-map: sets attributes for unsuppressed prefixes
    4. advertise-map
    5. prefix-list/distribute-list
  • prefix-list and distribute-list are mutually exclusive
  • if non-existent ACL/prefix-list is applied ≡ permit all
  • extended ACL filter: prefix & mask

Route-map

(config)# ip route-tag list <TAG_LIST>

; alternative to multiple match in route-map
(config)# ip policy-list <PLIST> permit|deny
(config-policy-list)# match ...
(config-route-map)# match tag list <TAG_LIST>

; BGP only
(config-route-map)# match policy-list <PLIST>
(config-router)# neighbor <IP> route-map <MAP> in

Keyword continue

  • BGP filtering only
  • if matched by entry with continue, implicit deny is not applied

Route flap dampening

  • RFC 2439
  • if prefix flaps or changes frequently – do not include in Update and best path selection
    • reset not considered a flap
  • not recommended: may dampen prefixes, reachable via different AS
  • eBGP routes only
(config-router)# bgp dampening [<HALF_LIFE> <REUSE> <SUPPRESS> <MAX_SUPPRESS>]
; routes with penalty
# show ip bgp flap-statistics

; dampened routes
# show ip bgp dampened-paths

; information about penalty
# show ip bgp <PREFIX>
; removes penalties
# clear ip bgp flap-statistics 

; removes penalties from dampened
# clear ip bgp dampening 
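
The penalty behind dampening decays exponentially, halving every half-life. The sketch below assumes the commonly documented Cisco defaults (penalty 1000 per flap, half-life 15 min, reuse 750, suppress 2000); these numbers are not stated in the notes above and should be checked against the platform in use. Function names are illustrative.

```python
import math

FLAP_PENALTY = 1000   # assumed default penalty per flap
SUPPRESS = 2000       # assumed default suppress limit
REUSE = 750           # assumed default reuse limit
HALF_LIFE_MIN = 15    # assumed default half-life, minutes


def decayed_penalty(penalty: float, elapsed_min: float,
                    half_life_min: float = HALF_LIFE_MIN) -> float:
    """Penalty after elapsed_min minutes without further flaps."""
    return penalty * 0.5 ** (elapsed_min / half_life_min)


def is_suppressed(penalty: float, suppress: float = SUPPRESS) -> bool:
    """A route becomes dampened once its penalty exceeds the suppress limit."""
    return penalty > suppress


def minutes_until_reuse(penalty: float, reuse: float = REUSE,
                        half_life_min: float = HALF_LIFE_MIN) -> float:
    """Time for the penalty to decay below the reuse limit."""
    if penalty <= reuse:
        return 0.0
    return half_life_min * math.log2(penalty / reuse)


penalty = 3 * FLAP_PENALTY   # three flaps in quick succession
print(is_suppressed(penalty), minutes_until_reuse(penalty))
```

MAX_SUPPRESS in the CLI caps how long a prefix can stay dampened regardless of the computed decay time; the sketch leaves that cap out.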

Outbound route filter (ORF)

  • RFC 5291
  • sends inbound filters to peer that applies them outbound
    • capability is negotiated in OPEN
    • filters are sent in Route Refresh: AFI, SAFI, action, when to refresh
  • limitation
    • IPv4/v6 unicast
    • prefix-list only
    • eBGP only
(config-router)# neighbor <IP> capability orf prefix-list both|send|receive
; received ORF
# show ip bgp neighbor <IP> received prefix-lists

; refresh ORF on neighbour
# clear ip bgp <IP> in prefix-filter

ORF capability (0x03)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Capabil Code  | Capab length  |              AFI              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Reserved    |     SAFI      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Route refresh with ORF

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              AFI              |    Reserved   |      SAFI     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Refresh    |    ORF type   |           ORF length          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| A |M|Reserved |            Value (variable length)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| A |M|Reserved |            Value (variable length)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Refresh:

  • 0x01 ≡ immediate
  • 0x02 ≡ defer (accumulate several Update with ORF)

A: action

  • 0 ≡ add
  • 1 ≡ remove
  • 2 ≡ remove-all

M: match

  • 0 ≡ permit
  • 1 ≡ deny

ORF type:

  • 128 ≡ Cisco prefix list

BGP sync

  • do not consider iBGP route to be “best” unless same prefix is found in RIB from IGP or static
  • fixes routing loop or blackhole on BGP → IGP redistribution
  • deprecated, not available in NX-OS
  • symptoms:
    • valid route (marked by *) is not used
    • show ip bgp <PREFIX> has not synchronized status
  • OSPF RID of the IGP route must match BGP peer RID, otherwise not synchronized
(config-router)# synchronization

Conditional route injection

  • injects more precise prefixes into BGP RIB on receiving aggregate via BGP (not local)
  • traffic engineering
; without copy-attributes: origin = incomplete, null AS_PATH
(config-router)# bgp inject-map <INJECT> exist-map <EXIST> [copy-attributes]
(config)# ip prefix-list <AGG> permit <PREFIX>

; only /32 towards route source, others ignored
(config)# ip prefix-list <SRC> permit <PREFIX>

; if not part of subnet AGG – ignore
(config)# ip prefix-list <INJ> permit <PREFIX>
(config)# route-map <INJECT>
(config-route-map)# set ip address prefix <INJ>

(config)# route-map <EXIST>
(config-route-map)# match ip address prefix <AGG>

; mandatory
(config-route-map)# match ip route-source prefix <SRC>

Topology codes

  • *: valid (Update passed validity check, e.g., nexthop is present)
  • >: best (placed in RIB, announced)
  • r: RIB failure (not placed in RIB)
  • i: internal (received from iBGP peer)
; maximum prefix count, if exceeded – tear peering down
; THRESHOLD: 75% default, when to generate warning
; warning-only: do not tear peering down on exceeding, only alert
; restart: try to reestablish peering every <mins> minutes
(config-router)# neighbor <IP> maximum-prefix <NUM> [<THRESHOLD>] [warning-only] [restart <mins>]

; disabled by default, do not send prefixes that are RIB failure
(config-router)# bgp suppress-inactive
# show ip bgp rib-failure

Processes

  1. I/O
    • BGP queue interaction with TCP
  2. Router
    • process Update (including filtering)
    • best path selection
    • RIB modification
    • run triggers:
      • 1s regular interval
      • peer established/removed/soft-reconfigured
    • medium priority
  3. Scanner
    • tracks RIB changes and adds prefixes from RIB to BGP
    • route dampening
    • conditional route advertisement
    • run every 60s by default (timer between runs, does not include processing time)
    • low priority
  4. Nexthop tracker
    • lightweight scanner
    • tracks nexthop changes instead of Scanner
    • event-driven (by RIB change)
      • hook on list of nexthops
      • IP RIB Updates call the hook
      • 5s delay between RIB change and hook: allow IGP to converge first
  5. Event
    • triggered by network or redistribute
  6. Import scanner
    • adds prefixes to VPNv4/v6 RIB
    • interval between processings – 15s by default
  7. Open
    • per peer
    • (re)establishes BGP session
; resolves nexthop using RIB
(config-router)# bgp scan-time <sec>

; delay NHT after RIB Update before BGP RIB Update
(config-router)# bgp nexthop trigger delay <sec>

; enabled by default, NHT
(config-router)# bgp nexthop trigger enable

; permits only prefixes, whose nexthops are reachable via routes, permitted by MAP
; if route to nexthop is denied by MAP, BGP prefix – inaccessible
(config-router)# bgp nexthop route-map <MAP>
; RIB updates
# debug ip routing

; BGP messages without Update contents
# debug ip bgp

; BGP messages including Update contents
# debug ip bgp updates

; Nexthop tracker debug
# debug ip bgp rib-filter

MP-BGP

  • RFC 4760
  • several AF in a single Update
  • AFI – address-family identifier
    • 1 ≡ IPv4
    • 2 ≡ IPv6
    • 25 ≡ VPLS
  • SAFI – subsequent AFI
    • 1 ≡ unicast
    • 2 ≡ mcast
    • 3 ≡ unicast + mcast
    • 4 ≡ MPLS label
    • 65 ≡ VPLS Kompella mode
    • 66 ≡ MDT
    • 70 ≡ EVPN
    • 128 ≡ MPLS L3VPN
    • 129 ≡ mcast VPN
    • 132 ≡ rtfilter
    • 133 ≡ flowspec
    • 134 ≡ L3VPN flowspec
  • MP_REACH_NLRI, MP_UNREACH_NLRI: optional non-transitive
  • negotiated in Open via capabilities; if not supported → terminate and Open without MP-BGP

MP-BGP capability (0x01)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Capabil Code  |Capab length(4)|              AFI              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Reserved    |      SAFI     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+        

IPv4 over IPv6

  • nexthop – first 4 bytes of IPv6 nexthop

IPv6 over IPv4

  • nexthop – mapped IPv4
; disable auto-selection of next-hop (IPv4-mapped IPv6 for IPv4-only interface)
; peering interface must have global IPv6 address to accept IPv6 nexthop as valid
(config-router)# no bgp default ipv6-nexthop

(config-router)# neighbor <IPv4> remote-as <ASN>
(config-router)# address-family ipv6
(config-router-af)# network <NETWORK6>
(config-router-af)# neighbor <IPv4> activate

; rewrite IPv4 next-hop with IPv6 next-hop
(config-router-af)# neighbor <IPv4> route-map <RMAP> out  
(config)# route-map <RMAP>
(config-route-map)# set ipv6 next-hop <IPv6>

IPv6 native

(config-router)# neighbor <IPv6> remote-as <ASN>
(config-router)# address-family ipv6
(config-router-af)# network <NETWORK6>
(config-router-af)# neighbor <IPv6> activate

Multicast BGP

  • mRIB takes priority over uRIB for RPF
  • if nexthop in BGP RIB is known via interface, not enabled for PIM – inaccessible
(config-router)# address-family ipv4 multicast
(config-router-af)# neighbor <IP> activate

Flowspec

  • BGP SAFI
  • passes flow information to apply policy (class-map + policy-map) for DDoS mitigation (more granular than RTBH)
  • applied after QoS
  • no support for mcast, MPLS
  • match – NLRI type
    1. IPv4/v6 dst
    2. IP src
    3. IPv4 protocol / IPv6 next header
    4. TCP/UDP port (src or dst)
    5. TCP/UDP dst port
    6. TCP/UDP src port
    7. ICMP type
    8. ICMP code
    9. TCP flags
    10. IP length
    11. DSCP
    12. is a fragment
  • action – extended community
    • 0x0800: IP nexthop redirect
    • 0x8006: drop (rate = 0) or police
    • 0x8008: VRF redirect using RT
    • 0x8009: mark DSCP
  • validation
    • IPv4/v6 only, VPN is not verified
    • rejects redirect IP community, if eBGP ASN ≠ last AS in AS_PATH
    • conditions (logical OR)
      1. flow originator ≡ originator of best route to destination
      2. AS_PATH empty (no AS_SET/AS_SEQUENCE)

Flowspec server (IOS XR)

(config)# class-map type traffic match-all <CMAP>
(config)# policy-map type pbr <PMAP>
(config-pmap)# class type traffic <CMAP>

(config)# flowspec
(config-flowspec)# address-family ipv4
(config-flowspec-af)# service-policy type pbr <PMAP>

Flowspec client (IOS XE)

(config)# flowspec
(config-flowspec)# address-family ipv4
(config-flowspec-af)# local-install interface-all
(config-if)# ip flowspec disable

Route target constraint (RTC)

  • RFC 4684
  • AFI = 1, SAFI = 132 (AF rtfilter)
  • negotiated via capabilities (MP-BGP support), requires support on both peers
  • PE announces RT in use
  • RR filters out prefixes, unused by PE, based on RT (instead of being dropped by PE on receiving)
(config-router)# address-family rtfilter unicast
(config-router-af)# neighbor <IP> activate
(config-router-af)# neighbor <IP> send-community extended

Route-target membership MP_(UN)REACH_NLRI prefix

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Origin ASN                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|_                              RT                             _|
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

If prefix length in NLRI is zero ≡ accept all RT (default RT)

Path attributes

  1. well-known mandatory
  2. well-known discretionary
  3. optional transitive: has to be forwarded further
  4. optional non-transitive: not forwarded further if not recognized
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|O|T|P|E| Rsrvd |   Type code   | Attribute length (var length) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\                                                               \
/                         Attribute data                        /
\                                                               \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

O: optional bit, 0 ≡ well-known
T: transitive bit, 0 ≡ non-transitive
P: partial bit, 0 ≡ optional transitive attribute is complete
E: extended length bit, 0 ≡ attribute length of 1 byte, 1 ≡ attribute length of 2 bytes
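
The flag bits and the variable-size length field can be decoded as sketched below. The function name and the returned dict keys are illustrative; the bit positions follow the diagram above (O = 0x80, T = 0x40, P = 0x20, E = 0x10).

```python
import struct


def parse_attribute(data: bytes):
    """Parse one path attribute; return (attribute dict, bytes consumed)."""
    flags, type_code = data[0], data[1]
    extended = bool(flags & 0x10)
    if extended:
        (length,) = struct.unpack("!H", data[2:4])   # 2-byte attribute length
        offset = 4
    else:
        length = data[2]                             # 1-byte attribute length
        offset = 3
    value = data[offset:offset + length]
    if len(value) < length:
        raise ValueError("attribute length error")
    attribute = {
        "optional": bool(flags & 0x80),
        "transitive": bool(flags & 0x40),
        "partial": bool(flags & 0x20),
        "extended_length": extended,
        "type_code": type_code,
        "value": value,
    }
    return attribute, offset + length


# ORIGIN: well-known (O=0), transitive (T=1), type 1, length 1, value 0
origin, used = parse_attribute(b"\x40\x01\x01\x00")
print(origin, used)
```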

Type code:

  • 1: ORIGIN
  • 2: AS_PATH
    • segment types
      • 1 ≡ AS_SET
      • 2 ≡ AS_SEQUENCE
      • 3 ≡ AS_CONFED_SET
      • 4 ≡ AS_CONFED_SEQUENCE
  • 3: Nexthop
  • 4: MED (4 bytes)
  • 5: Local preference (4 bytes)
  • 6: Atomic aggregate
  • 7: Aggregator
  • 8: community
  • 9: originator ID
  • 10: cluster list
  • 14: MP_REACH_NLRI
  • 15: MP_UNREACH_NLRI
  • 16: extended community
  • 17: AS4_PATH: 4 byte ASN
  • 18: AS4_AGGREGATOR
; sets parameters of BGP routes that are installed in RIB
; filter ≡ only permitted prefixes are installed in RIB
(config-router)# table-map <MAP> [filter]

AS_PATH

  • well-known mandatory
  • segment types
    • AS_SEQUENCE – ordered
    • AS_SET – set (after prefix aggregation)
      • includes AS from AS_PATH of all routes subordinate to aggregate ⇒ aggregate is less stable
      • only +1 to AS_PATH length
      • can be added to aggregated prefix for loop prevention
  • own ASN is prepended when an update is sent to another AS
  • best practice: AS should be contiguous (blackhole otherwise)
  • functions
    • routing loops discovery:
      • route is discarded if AS_PATH contains own ASN
      • filtering on ingress, reason – allowas-in
    • routing policy enforcement:
      • AS_PATH prepend: ingress traffic engineering
    • optimal route selection
      • lists ASNs along the path to destination
      • hops – ASNs, not routers
(config)# ip as-path access-list <N> permit|deny <REGEXP>
; eBGP only
(config-route-map)# set as-path prepend <ASN>
; disable loop prevention, allow own AS in AS_PATH up to NUM times
(config-router)# neighbor <IP> allowas-in <NUM>

; hidden, load-balance over eBGP routes with different AS_PATH of same length
; disabled by default, iBGP load-balancing is done anyway
(config-router)# bgp bestpath as-path multipath-relax

; enabled by default, tear down eBGP session with directly connected peer if link is down
; can be disabled if link flaps
(config-router)# bgp fast-external-fallover

; max AS_PATH length, if exceeded – discard prefix
(config-router)# bgp maxas-limit <N>

(config-router)# bgp bestpath as-path ignore
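
The AS_PATH rules above (AS_SET counts as one hop, a path containing the own ASN is discarded unless allowas-in permits it, prepending lengthens the path) can be sketched as below. A path is modelled as a list of `(segment_type, [ASNs])` tuples; the representation and the function names are assumptions made for this example.

```python
AS_SET, AS_SEQUENCE = 1, 2


def as_path_length(segments) -> int:
    """Length used in best path selection: AS_SEQUENCE counts per ASN, AS_SET as 1."""
    total = 0
    for seg_type, asns in segments:
        if seg_type == AS_SEQUENCE:
            total += len(asns)
        elif seg_type == AS_SET:
            total += 1
        # confederation segments (types 3 and 4) do not add to the length
    return total


def accept(segments, local_asn: int, allowas_in: int = 0) -> bool:
    """Loop check: reject if own ASN occurs more than allowas_in times."""
    occurrences = sum(asns.count(local_asn) for _, asns in segments)
    return occurrences <= allowas_in


def prepend(segments, asn: int, times: int = 1):
    """Return a new path with asn prepended to the leading AS_SEQUENCE."""
    if segments and segments[0][0] == AS_SEQUENCE:
        head = (AS_SEQUENCE, [asn] * times + list(segments[0][1]))
        return [head] + list(segments[1:])
    return [(AS_SEQUENCE, [asn] * times)] + list(segments)


path = [(AS_SEQUENCE, [65001, 65002]), (AS_SET, [65010, 65011, 65012])]
print(as_path_length(path), accept(path, 65002), as_path_length(prepend(path, 65100, 2)))
```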

Nexthop

  • well-known mandatory
  • IP within next AS to reach destination (eBGP peers)
  • not changed within AS
    • recursive routing
      • IGP load-balancing is easier
      • convergence depends on IGP (quick, distributed)
    • next-hop-self
      • convergence depends on WITHDRAW (slow)
      • auto-enabled for iBGP if eBGP peering is over link-local IPv6 addresses
  • 0.0.0.0 ≡ network/prefix is self-originated
  • added before transmitting over eBGP
  • not changed for eBGP sessions (route server):
    • if next-hop is within directly connected LAN (≠ reachable through same interface!) AND
    • peering on LAN interface
    • IPv6: link-local nexthop address is appended after global nexthop
  • route-server: does not change AS_PATH, nexthop, MED

IOS XE CLI

; disable verification that first ASN in AS_PATH = peer ASN, route-server client
(config-router)# no bgp enforce-first-as
; for iBGP neighbour, does not work on RR (route-map or keyword all)
(config-router)# neighbor <IP> next-hop-self

; for eBGP neighbour
(config-router)# neighbor <IP> next-hop-unchanged

; on route server
(config-router)# neighbor <IP> route-server-client
(config-route-map)# set ip next-hop self

NX-OS CLI

; disable verification that peer ASN ≠ last ASN, route-server client
(config-router-neighbor-af)# disable-peer-as-check 

Origin

  • well-known mandatory
  • how prefix ended up in BGP
  • values:
    • 0 ≡ IGP (originated via network/aggregate-address)
    • 1 ≡ EGP
    • 2 ≡ other, incomplete (via redistribute)
; hidden command
(config-route-map)# set origin egp <ASN>

Local preference

  • well-known discretionary
  • within AS, set by ASBR
  • the higher, the more priority (100 default)
  • ignored by eBGP
  • egress traffic engineering (how to exit AS)
(config-router)# bgp default local-preference <N>

Community

  • optional transitive
  • types:
    • standard: 4 bytes
    • extended: 8 bytes for IPv4, 20 bytes for IPv6
  • prefix announce engineering
  • values:
    • reserved: 0:*
    • private: 0x00010000 – 0xFFFEFFFF
    • 0xFFFFFF01:
      • NO_EXPORT
      • do not announce via eBGP (permitted within confederation)
    • 0xFFFFFF02:
      • NO_ADVERTISE
      • do not announce at all
    • 0xFFFFFF03
      • NO_EXPORT_SUBCONFED
      • do not announce via eBGP (including confederation)
    • 0x00000000
      • Internet
      • announce to everyone
      • Cisco-defined, not RFC
      • match-any in community-list
    • ASN:666
      • blackhole
; show communities in format of <ASN>:<N> instead of raw integer
(config)# ip bgp-community new-format

(config)# ip community-list standard <LIST> ...
(config-route-map)# set community no-export
(config-route-map)# set comm-list <LIST> delete
(config-route-map)# set community <ASN>:<NUM> [additive]
(config-route-map)# match community <LIST>
(config-router)# neighbor <IP> send-community
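
A standard community is a single 32-bit value; the new format only splits it into two 16-bit halves. The sketch below converts both ways and names the well-known values listed above. Function names are illustrative, not a library API.

```python
WELL_KNOWN = {
    0xFFFFFF01: "NO_EXPORT",
    0xFFFFFF02: "NO_ADVERTISE",
    0xFFFFFF03: "NO_EXPORT_SUBCONFED",
}


def to_new_format(value: int) -> str:
    """Render a 32-bit community as <ASN>:<N>, or by name if well-known."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("community must fit in 32 bits")
    if value in WELL_KNOWN:
        return WELL_KNOWN[value]
    return f"{value >> 16}:{value & 0xFFFF}"


def from_new_format(text: str) -> int:
    """Parse <ASN>:<N> back into the raw 32-bit integer."""
    asn, num = (int(part) for part in text.split(":"))
    if not (0 <= asn <= 0xFFFF and 0 <= num <= 0xFFFF):
        raise ValueError("both halves must fit in 16 bits")
    return (asn << 16) | num


print(to_new_format(from_new_format("65001:666")))
```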

Extended community

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|I|T| Type High |    Type Low   |    Value (variable length)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Value:

  • 6 bytes for IPv4 (8-byte community minus 2 type bytes)
  • 18 bytes for IPv6

I: IANA authority

  • 0 ≡ first come, first served (FCFS) policy
  • 1 ≡ IANA assigned

T: transitive, 0 ≡ transitive, 1 ≡ non-transitive

Types:

  • 0x00: 2-octet AS-specific
    • 0x03: Site of Origin
    • 0x05: OSPF domain ID
    • 0x40: AS-specific (2 bytes)
  • 0x01: IPv4 address specific
    • 0x03: Site of Origin
    • 0x05: OSPF domain ID
    • 0x07: OSPF RID
    • 0x41: IPv4 address specific
  • 0x02: 4-octet AS-specific
    • 0x03: Site of Origin
    • 0x05: OSPF domain ID
    • 0x42: AS-specific (4 bytes)
  • 0x03: opaque
    • 0x00: OSPF route type + area + options
    • 0x01: cost community
    • 0x43: opaque
    • 0x80: OSPF route type + area + options
  • 0x04: QoS marking
    • 0x44: QoS
  • 0x05: CoS capability
    • 0x45: CoS
  • 0x06: EVPN
    • 0x00: OSPF route type + area + options
    • 0x46: EVPN
    • 0x80: OSPF route type + area + options
  • 0x07: flowspec
  • 0x08: flowspec redirect/mirror
    • 0x00: flowspec IP nexthop redirect
  • 0x40: first come first served
    • 0x04: DMZ link BW
  • 0x43
    • 0x01: cost community
  • 0x80: generic
    • 0x01: OSPF RIDs
    • 0x05: OSPF domain ID
    • 0x06: flowspec drop/police
    • 0x08: flowspec VRF redirect
    • 0x09: flowspec DSCP
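
The I and T bits sit in the two most significant bits of the Type High byte, which is why many types in the list come in pairs such as 0x03 and 0x43. A minimal sketch of splitting that byte, assuming the bit layout in the diagram above (decode_type_high is an illustrative name, not a library call):

```python
def decode_type_high(type_high: int) -> dict:
    """Split the Type High byte of an extended community into I, T and type."""
    if not 0 <= type_high <= 0xFF:
        raise ValueError("Type High is a single byte")
    return {
        "iana": (type_high >> 7) & 1,                # I bit, most significant
        "transitive": ((type_high >> 6) & 1) == 0,   # T bit: 0 means transitive
        "type": type_high & 0x3F,                    # remaining 6 bits
    }
```

With this, 0x03 decodes as a transitive type 3 and 0x43 as the non-transitive variant of the same type.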

Cost community

  • extended, optional (non-)transitive
  • format: <POI>:<ID>:<VALUE>
    • the lower, the more priority
    • POI (point of insertion): the step of bestpath selection at which the community is evaluated, IGP by default
    • prebestpath POI
      • consider community before bestpath
      • allows to prefer iBGP over locally originated
      • protection against suboptimal routing (BGP ≈ IGP)
(config-router)# bgp bestpath cost-community ignore
(config-route-map)# set extcommunity cost prebestpath <ID> <VALUE>

Site of Origin (SoO)

  • extended community, optional transitive
  • protection against routing loop
  • enhances convergence compared to max hop count protection
  • routes with SoO, that matches local value, are discarded
  • set once, not changed in IGP during flooding
  • useful if AS_PATH check is not reliable: allowas-in, as-override
(config-route-map)# set extcommunity soo <VALUE>
; for distance-vector IGP, incoming update check
(config-if)# ip vrf sitemap <MAP>
(config-router-af)# neighbor <IP> soo <VALUE>

DMZ link BW

  • optional non-transitive
  • balancing over eBGP links only, internal links not accounted for
; enables load-balancing on BW between PE and eBGP peer
(config-router)# bgp dmzlink-bw

; adds community with BW value to the ingress route from eBGP peer
(config-router)# neighbor <IP> dmzlink-bw

Multi-exit discriminator (MED)

  • optional non-transitive
  • inform peer AS, which path towards own AS is better
    • sent by ASBR
    • not transmitted over eBGP by default: discarded on AS exit if not locally originated
    • passed over iBGP by default
    • checked only for paths via same peer AS
    • not passed beyond peer AS
  • the lower, the more priority
  • 0 by default (metric in IOS)
  • addition to NLRI
    • on redistribute: = IGP metric
    • on network: = IGP metric
    • passing eBGP prefix to iBGP peer: = 0
  • change of MED triggers Update only once per 10 minutes
; for redistributed routes
(config-router)# default-metric <N>

(config-router)# bgp bestpath med missing-as-worst

; check MED for paths via different AS
(config-router)# bgp always-compare-med

Deterministic MED

  • by default entries in BGP RIB are in the order of being received
    • MED is not compared for all entries
    • non-deterministic behaviour
  • groups prefixes in BGP RIB on peer AS
    1. best routes per peer AS are selected
    2. best route out of previous best is selected
  • enabled by default in NX-OS
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| # of entry  | AS_PATH | MED |    BGP    |     RID     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+--------+
|      1      |   500   | 150 | external  | 172.16.13.1 |  | – best |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |        |
|      2      |   100   | 200 | internal  |   1.1.1.1   |  |        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+        |
|      3      |   500   | 100 | internal  | 172.16.8.4  |           | – best via 172.16.8.4
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-----------+

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| # of entry  | AS_PATH | MED |    BGP    |     RID     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+--------+
|      1      |   500   | 150 | external  | 172.16.13.1 |  |        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |        |
|      2      |   500   | 100 | internal  | 172.16.8.4  |  | – best |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+        |
|      3      |   100   | 200 | internal  |   1.1.1.1   |           | – best via 1.1.1.1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-----------+
(config-router)# bgp deterministic-med
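
The two tables can be replayed with a small simulation. This is a sketch under simplifying assumptions: only three tie-breakers are modelled (MED for the same peer AS, eBGP over iBGP, lowest RID), and the names Path, better, sequential_best and deterministic_best are invented for the example.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Path:
    neighbor_as: int
    med: int
    external: bool
    rid: str


def better(a: Path, b: Path) -> Path:
    """Pairwise comparison: MED (same peer AS only), eBGP > iBGP, lowest RID."""
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b
    if a.external != b.external:
        return a if a.external else b
    return a if IPv4Address(a.rid) <= IPv4Address(b.rid) else b


def sequential_best(paths):
    """Default behaviour: 1 & 2 -> best & 3 -> best ..."""
    best = paths[0]
    for candidate in paths[1:]:
        best = better(best, candidate)
    return best


def deterministic_best(paths):
    """Group by peer AS, pick a winner per group, then compare the winners."""
    groups = {}
    for path in paths:
        groups.setdefault(path.neighbor_as, []).append(path)
    winners = [sequential_best(group) for group in groups.values()]
    return sequential_best(winners)


e1 = Path(500, 150, True, "172.16.13.1")
e2 = Path(100, 200, False, "1.1.1.1")
e3 = Path(500, 100, False, "172.16.8.4")
```

In this model sequential_best([e1, e2, e3]) and sequential_best([e2, e3, e1]) pick different paths, while deterministic_best returns the path via 1.1.1.1 for any arrival order.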

Weight

  • Cisco proprietary
  • ≈ local preference, but locally significant (not distributed within AS)
  • default values:
    • received prefixes: 0
    • locally injected: 32768
(config-router)# neighbor <IP> weight <VALUE>
# show ip bgp

Atomic aggregate

  • well-known discretionary
  • added to aggregated prefix to signal the degradation of prefix precision
  • not discarded from attributes

Aggregator

  • optional transitive
  • ASN and RID of aggregating router

Accumulated IGP cost for BGP (AIGP)

  • optional non-transitive
  • compared before AS_PATH length
  • IOS XR

MP_REACH_NLRI

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              AFI              |      SAFI     |  NH addr len  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\                                                               \
/                           Next-hop                            /
\                                                               \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Reserved    | Prefix length |    Prefix (variable length)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
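
The layout above can be packed with the standard library. A minimal sketch, assuming the attribute value only (no attribute header) and a single global next hop; encode_prefix and encode_mp_reach are illustrative names:

```python
import struct
from ipaddress import ip_address, ip_network


def encode_prefix(prefix: str) -> bytes:
    """Prefix length in bits, followed by only as many bytes as the length needs."""
    net = ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]


def encode_mp_reach(afi: int, safi: int, next_hop: str, prefixes) -> bytes:
    """AFI, SAFI, next-hop length, next hop, reserved byte, then the NLRI."""
    nh = ip_address(next_hop).packed
    header = struct.pack("!HBB", afi, safi, len(nh))
    return header + nh + b"\x00" + b"".join(encode_prefix(p) for p in prefixes)
```

For IPv6 unicast (AFI 2, SAFI 1) a /48 prefix takes 1 length byte plus 6 prefix bytes, so the trailing zero bytes of the network address are never sent.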

MP_UNREACH_NLRI

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              AFI              |      SAFI     | Prefix length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\                                                               \
/                       Withdrawn prefix                        /
\                                                               \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

BGP best path selection

  • prefixes in BGP RIB are compared sequentially by default: 1 & 2 → best & 3 → best & 4 …
  • order of prefixes in BGP RIB – newly received at the top
  • selection
    1. next-hop for the prefix is reachable
    2. max weight
    3. max local preference
    4. locally injected: network/redistribute > summary
    5. lowest AS_PATH length
    6. min origin
    7. min MED
    8. eBGP > iBGP (eBGP > confed eBGP > iBGP)
    9. closest next-hop according to IGP
      • if static/connected – check not performed
      • 10k OSPF is better than 156k EIGRP
    10. ECMP for RIB if possible (up to maximum path)
    11. oldest eBGP route
    12. lowest RID
      • if route from RR, Originator ID is used instead
      • enabled separately if step 11 has to be overridden
    13. route not from RR (without Originator ID)
    14. reflected route with shorter cluster list
    15. lowest IP
      • enabled separately if step 11 has to be overridden
; 120s default, max delay between the first peering being established and the first bestpath run
(config-router)# bgp update-delay <sec>

; global VRF only
(config-router)# bgp bestpath igp-metric ignore
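
The ordering of the selection steps can be expressed as a sort key. This is a rough sketch only: it assumes all next hops are reachable, compares MED across all paths (as with always-compare-med), and skips steps 10, 11 and 13 to 15. Route and its fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Route:
    weight: int = 0
    local_pref: int = 100
    local: bool = False      # locally injected
    as_path_len: int = 0
    origin: int = 0          # 0 IGP, 1 EGP, 2 incomplete
    med: int = 0
    ebgp: bool = False
    igp_metric: int = 0
    rid: int = 0


def preference_key(r: Route):
    """Smaller tuple wins; 'max' attributes are negated, booleans inverted."""
    return (
        -r.weight,        # step 2: max weight
        -r.local_pref,    # step 3: max local preference
        not r.local,      # step 4: locally injected first
        r.as_path_len,    # step 5: shortest AS_PATH
        r.origin,         # step 6: min origin
        r.med,            # step 7: min MED
        not r.ebgp,       # step 8: eBGP over iBGP
        r.igp_metric,     # step 9: closest next-hop
        r.rid,            # step 12: lowest RID
    )


def best_path(routes):
    return min(routes, key=preference_key)
```

Because tuples compare element by element, a later step is consulted only when all earlier steps tie, which mirrors the numbered list.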

eiBGP multipath

  • enables load-balancing across eBGP and iBGP in RIB
  • does not affect bestpath selection
  • must match:
    • weight
    • local preference
    • AS_PATH or AS_PATH length (with multipath relax)
    • origin
    • MED
    • IGP metric to next-hop
(config-router)# address-family ipv4 vrf <VRF>

; across eBGP
(config-router-af)# maximum-paths <N>
(config-router-af)# maximum-paths ibgp <N>
(config-router-af)# maximum-paths eibgp <N>

Regexp

  • special symbols:
    • . ≡ any symbol, including whitespace
    • * ≡ zero or more matches with expression
    • + ≡ one or more matches with expression
    • ? ≡ zero or single match with expression
    • ^ ≡ line start
    • $ ≡ line end
    • _ ≡ any separator
    • \ ≡ escape symbol
    • | ≡ logical OR
    • [] ≡ symbol set
  • examples:
    • _67_ ≡ via AS 67
    • ^67$ ≡ from AS 67, directly connected
    • _67$ ≡ originated from AS 67
    • ^67_ ≡ behind AS 67
    • ^$ ≡ local AS
    • .* ≡ any string
# show ip bgp regexp <REGEXP>
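
The examples above can be tried offline by translating the pattern to a Python regular expression. A minimal sketch, assuming _ stands for start of line, end of line, space, comma, braces or parentheses, and ignoring escaped underscores; as_path_matches is an illustrative helper, not a router feature:

```python
import re

# Assumed expansion of the "_" separator for this sketch.
_SEPARATOR = r"(?:^|$|[ ,{}()])"


def as_path_matches(pattern: str, as_path: str) -> bool:
    """Check an AS_PATH string (space-separated ASNs) against a router-style regexp."""
    return re.search(pattern.replace("_", _SEPARATOR), as_path) is not None
```

For example, _67_ matches "100 67 200" but not "167 200", and ^$ matches only the empty AS_PATH of a locally originated prefix.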

Authentication

  • authentication via TCP MD5 signature option
  • no encryption
(config-router)# neighbor <IP> password <PASS>

QoS policy propagation via BGP (QPPB)

  • CEF required
  • injects IPP or qos-group information into RIB and FIB
  • QoS is based on packet source/destination address on ingress
    • destination has more priority
    • mark packet with IPP or qos-group
(config-route-map)# set ip qos-group <N>
(config-route-map)# set ip precedence <N>
(config-router-af)# table-map <MAP>
(config-if)# bgp-policy source|destination ip-qos-map|ip-prec-map
# show ip route <PREFIX>
# show ip cef <PREFIX> detail

Aggregation

  • adds route to summary via Null0
  • adds ATOMIC_AGGREGATE attribute without AS_SET
  • aggregate inherits:
    • highest local preference
    • highest origin
  • does not inherit:
    • MED
    • cluster list
    • originator ID
    • community
; aggregating IGP prefixes inherits parameters of the most specific route
; adds summary route to Null0
(config-router)# aggregate-address <PREFIX> [summary-only]

; removes some more specific routes from Update
(config-router)# aggregate-address <PREFIX> suppress-map <MAP>

; which prefixes to use to form summary
(config-router)# aggregate-address <PREFIX> advertise-map <MAP>

; set attributes
(config-router)# aggregate-address <PREFIX> attribute-map <MAP>

; inherits communities and ASNs
(config-router)# aggregate-address <PREFIX> as-set

; off by default
; redistributed: up to classful boundary, no subnets in BGP RIB
; network: inserts classful only if BGP RIB has more specific route, 
; network must have classful mask
(config-router)# auto-summary

; EIGRP leak-map alternative, matched prefixes ignore summary-only of aggregate
(config-router)# neighbor <IP> unsuppress-map <MAP>

; announces prefixes from ADV in case BGP RIB has prefixes matching MAP
; prefixes, not matching ADV, are announced as usual
(config-router)# neighbor <IP> advertise-map <ADV> exist-map <MAP>

Redistribution

; include default on redistribution from static, EIGRP, RIP
(config-router)# default-information originate

; advertise default to peer, does not have to be in BGP RIB
(config-router)# neighbor <IP> default-originate
; sets tag as AS_PATH, by default on BGP → OSPF tag = peer ASN
(config-route-map)# set as-path tag

; MED = IGP metric to nexthop, not the route cost
(config-route-map)# set metric-type internal

Defaults

  • OSPF: internal routes only
  • iBGP routes are not redistributed to IGP
; permit iBGP → IGP
(config-router)# bgp redistribute-internal

Confederation

  • divides AS into sub-AS
    • within sub-AS – confederation iBGP
      • full-mesh
    • between sub-AS – confederation eBGP
      • announces sub-AS iBGP routes to other sub-AS
      • TTL = 1
      • the rest of the behaviour follows iBGP rules
        • exchanges MED, local preference
        • does not change nexthop
  • loop protection
    • split horizon is active within sub-AS
    • AS_PATH + AS_CONFED_SEQUENCE (segment type) + AS_CONFED_SET (set of ASN)
    • confederation internal ASN in AS_PATH ≡ loop
  • iBGP AS_PATH: {65500, 65035}, 700, 600
    • confederation length is not accounted in AS_PATH length (= 0)
    • confederation ASN are removed from AS_PATH before sending outside of confederation
  • MED
    • not compared for internal confederation prefixes by default (no external ASN in AS_PATH) between different sub-AS
    • compared if AS_PATH_CONFED is empty
  • private ASN inside confederation
    • avoid dropping valid routes with same AS in AS_PATH as sub-AS
; global ASN, sub-ASN ≡ BGP process ID
(config-router)# bgp confederation identifier <ASN>

; sub-AS list
(config-router)# bgp confederation peers <SUB_ASN> ...

; compare MED for routes from confederation, by default – only via same external AS
(config-router)# bgp bestpath med confed
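
How confederation segments affect AS_PATH length, export and loop detection can be sketched as below. Segment type codes follow the usual numbering (AS_SET 1, AS_SEQUENCE 2, AS_CONFED_SEQUENCE 3, AS_CONFED_SET 4); the list-of-tuples representation and the function names are assumptions made for the example.

```python
AS_SET, AS_SEQUENCE, AS_CONFED_SEQUENCE, AS_CONFED_SET = 1, 2, 3, 4
CONFED_TYPES = (AS_CONFED_SEQUENCE, AS_CONFED_SET)


def as_path_length(segments) -> int:
    """Length used in bestpath: confederation segments count as 0."""
    length = 0
    for seg_type, asns in segments:
        if seg_type == AS_SEQUENCE:
            length += len(asns)
        elif seg_type == AS_SET:
            length += 1  # a set counts once regardless of its size
    return length


def strip_confed(segments):
    """What is left of AS_PATH when the route leaves the confederation."""
    return [(t, asns) for t, asns in segments if t not in CONFED_TYPES]


def confed_loop(segments, my_sub_as: int) -> bool:
    """Own sub-AS already present in a confederation segment means a loop."""
    return any(my_sub_as in asns for t, asns in segments if t in CONFED_TYPES)


# The path from the notes: {65500, 65035}, 700, 600
path = [(AS_CONFED_SEQUENCE, [65500, 65035]), (AS_SEQUENCE, [700, 600])]
```

For this path the length used in bestpath is 2, and only 700, 600 remain after the route leaves the confederation.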

Route reflector

  • relaxes iBGP full-mesh requirement
  • roles:
    • RR server: hub
    • client: spoke
    • non-client
  • preserve information about RR clusters along the Update
  • Originator ID
    • optional non-transitive
    • RID of the peer that announced the prefix
    • added by RR before transmitting over iBGP
    • if prefix with own Originator ID is received – reject
  • Cluster list
    • optional non-transitive
    • cluster ID is added by RR before transmitting over iBGP
    • if prefix has own cluster ID in cluster list – reject ⇒ loop protection between clusters
    • link between clusters – high cost
    • selection:
      • manual
      • BGP RR RID
  • RR does not change next-hop even with next-hop-self
    • can be set by route-map
    • can be overridden by next-hop-self all
  • backreflection
    • RR sends Update, received from client, towards clients and non-clients
    • update group of sender client contains at least one other peer
  • suboptimal routing
    • RR conceals topology information
    • RR route selection influences options on other speakers (RR sends only bestpath)
    • RR should be placed according to physical topology to avoid suboptimal routing and loops
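+------------+--------------------+------------------------+
| Source     | Announce to client | Announce to non-client |
+------------+--------------------+------------------------+
| client     |         +          |           +            |
| non-client |         +          |           -            |
| eBGP       |         +          |           +            |
+------------+--------------------+------------------------+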
(config-router)# bgp cluster-id <ID>

; RR config
(config-router)# neighbor <IP> route-reflector-client

; disable passing client routes to clients, pass to iBGP/eBGP peers only
(config-router)# no bgp client-to-client reflection

; disable intra and inter-cluster exchange
(config-router)# no bgp client-to-client reflection all

; restrict within cluster
(config-router)# no bgp client-to-client reflection intra-cluster cluster-id <ID>
# show ip bgp update-group
# show ip bgp cluster-ids
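
The reflection rules and the two loop checks (Originator ID, cluster list) can be sketched as below. This is an illustrative model only; the function names and the string labels for peer roles are assumptions, and real implementations handle many more cases.

```python
# Who a route reflector announces to, keyed by where the route was learned.
REFLECT_RULES = {
    "client": {"client", "non-client"},
    "non-client": {"client"},
    "ebgp": {"client", "non-client"},
}


def reflect_to(source: str) -> set:
    """Roles of iBGP peers that receive a route learned from 'source'."""
    return REFLECT_RULES[source]


def accept(originator_id, cluster_list, my_rid, my_cluster_id) -> bool:
    """Reject routes carrying own Originator ID or own cluster ID."""
    if originator_id == my_rid:
        return False
    return my_cluster_id not in cluster_list


def stamp(originator_id, cluster_list, sender_rid, my_cluster_id):
    """Attributes as the RR would set them before reflecting over iBGP."""
    if originator_id is None:
        originator_id = sender_rid          # set once, by the first RR
    return originator_id, [my_cluster_id] + list(cluster_list)
```

A route learned from a non-client is reflected to clients only, which is why non-clients still need a full mesh among themselves.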

Persistent route oscillation

  • RFC 3345
  • solutions:
    • inter-cluster link with high IGP cost
    • do not accept MED
    • always compare MED (including different AS)
    • use path attributes with more priority for path engineering

Prefix independent convergence (PIC)

  • installs next best into BGP RIB, RIB, FIB along with best ⇒ on failure switchover does not require BGP RIB processing
  • local significance
  • CEF recursion
    • slows down PIC on RR:
      • backup nexthop is already calculated via BGP in lieu of CEF
      • no need to search backup nexthop for primary nexthop in RIB among directly connected
    • not set for iBGP
    • set for IPv4, VPNv4
  • no support for mcast, L2VPN
  • overrides local convergence feature
  • per VRF (VRF AF mode) or for all VRF (VPN AF mode)
  • enabled by default in NX-OS
(config-router)# bgp additional-paths install

; disables recursion for /32 and directly connected
(config-router)# no bgp recursion host 
; RIB with backup paths
# show ip route repair-paths

Addpath

  • capability 69 ⇒ disruptive, requires session reset
  • replaces implicit Withdraw per prefix, allows several paths per prefix from same peer
  • different paths have different Path ID (4 bytes), prepended to NLRI
  • iBGP only
(config-router-af)# bgp additional-paths receive|send [receive]

; defines prefixes enabled globally for addpath:
; N bestpaths, 1 bestpath per nexthop AS, backup has more priority
(config-router-af)# bgp additional-paths select all
(config-router-af)# bgp additional-paths select best <N>
(config-router-af)# bgp additional-paths select group-best
(config-router-af)# bgp additional-paths select best-external [backup]

; negotiate capability
(config-router-af)# neighbor <IP> additional-paths ...

; defines what to announce
(config-router-af)# neighbor <IP> advertise additional-paths all
(config-router-af)# neighbor <IP> advertise additional-paths best <N>
(config-router-af)# neighbor <IP> advertise additional-paths group-best
(config-router-af)# neighbor <IP> advertise additional-paths best-external

; no support for in, because advertise-set is an internal entity
(config-router-af)# neighbor <IP> route-map <MAP> out
(config-route-map)# match additional-paths advertise-set best <N>
(config-route-map)# match additional-paths advertise-set best-range <N> <M>
(config-route-map)# match additional-paths advertise-set group-best [best <N>] 

Addpath capability (0x45)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Capabil Code  | Capab length  |              AFI              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      SAFI     |  Send/Receive |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Send/Receive:

  • 1: receive
  • 2: send
  • 3: both
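
The capability layout above can be packed and parsed with struct. A minimal sketch, assuming one 4-byte entry (AFI, SAFI, Send/Receive) per address family; encode_addpath and decode_addpath are illustrative names:

```python
import struct

ADDPATH_CODE = 69            # 0x45
RECEIVE, SEND, BOTH = 1, 2, 3


def encode_addpath(entries) -> bytes:
    """entries: iterable of (afi, safi, mode) tuples."""
    body = b"".join(struct.pack("!HBB", afi, safi, mode) for afi, safi, mode in entries)
    return struct.pack("!BB", ADDPATH_CODE, len(body)) + body


def decode_addpath(data: bytes):
    """Return the list of (afi, safi, mode) tuples carried in the capability."""
    code, length = struct.unpack_from("!BB", data)
    if code != ADDPATH_CODE or length % 4 or len(data) != 2 + length:
        raise ValueError("not a well-formed addpath capability")
    return [struct.unpack_from("!HBB", data, 2 + off) for off in range(0, length, 4)]
```

For IPv4 unicast (AFI 1, SAFI 1) in both directions the encoded capability is the six bytes 45 04 00 01 01 03.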

Diverse path

  • RR towards clients
  • shadow RR announces backup path or some from multipath instead of best
  • can be performed with single RR with two sessions
(config-router)# bgp bestpath igp-metric ignore
(config-router)# bgp additional-paths select [backup]
(config-router)# bgp additional-paths install

(config-router)# neighbor <IP> route-reflector-client

; for RR clients, sends backup
(config-router)# neighbor <IP> advertise diverse-path [backup] [mpath]

; for non-clients, sends best eBGP path
(config-router)# neighbor <IP> advertise best-external

Graceful restart

  • RFC 4724
  • capability 64
  • GR-aware: understands GR, continue forwarding
  • GR-capable: dual RP
  • non-stop routing
    • alternative to NSF, seamless switchover
    • routing information copies on both RP
    • more resource consumption
    • no need for support on peer
  • without NSF on TCP resetup:
    1. new BGP session
    2. BGP session collision
    3. BGP notification, until hold timer expires
  • NSF on TCP resetup:
    • closes local stale TCP session
    • switch to new TCP session without Notification
; enable NSF
(config-router)# bgp graceful-restart

; 120s default, wait for session to be reestablished
(config-router)# bgp graceful-restart restart-time <RST>

; 360s default, lifetime for stale routes after session is reestablished
(config-router)# bgp graceful-restart stalepath-time <ST>
(config-router-stmp)# ha-mode graceful-restart

Graceful restart capability (0x40)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Capabil Code  | Capab length  |R| Rsvd  | Restart time (sec)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+
|              AFI              |      SAFI     |    AF flags   |  | AFI information entry
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+           

R: restart flag, 1 ≡ peer restarted, can send routes
AF flags:

  • 0x80: forwarding state preserved
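
The fields of the capability can be pulled apart as follows. A minimal sketch, assuming the layout in the diagram (4 flag bits with R first, a 12-bit restart time, then 4-byte AFI entries); decode_graceful_restart is an illustrative name:

```python
import struct

GR_CODE = 64  # 0x40


def decode_graceful_restart(data: bytes) -> dict:
    """Parse a graceful restart capability including its code and length bytes."""
    if len(data) < 4:
        raise ValueError("capability too short")
    code, length, flags_time = struct.unpack_from("!BBH", data)
    if code != GR_CODE or len(data) != 2 + length or (length - 2) % 4:
        raise ValueError("not a well-formed graceful restart capability")
    families = []
    for off in range(4, 2 + length, 4):
        afi, safi, af_flags = struct.unpack_from("!HBB", data, off)
        # 0x80 in AF flags: forwarding state preserved
        families.append((afi, safi, bool(af_flags & 0x80)))
    return {
        "restarted": bool(flags_time & 0x8000),   # R bit
        "restart_time": flags_time & 0x0FFF,      # lower 12 bits, seconds
        "families": families,
    }
```

The bytes 40 06 80 78 00 01 01 80 decode to R set, a restart time of 120 seconds, and IPv4 unicast with forwarding state preserved.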

BGP wedgie

  • RFC 4264
  • non-deterministic behaviour, depends on Update ordering
  • root causes
    • policy enforcement in transit AS
    • only best path is announced
  • solution for primary/backup scenario – conditional route advertisement
    • track static /32 peer address
    • static is tied to track, track is tied to ICMP SLA
    • peer is down → static /32 is down → announce to other peer