QoS

  1. Scheduling
    1. WFQ
    2. WDRR
  2. Congestion avoidance
    1. WRED
  3. Congestion management
    1. Policing
      1. Single rate 3-color marking (srTCM)
      2. Two-rate 3-color marking (trTCM)
    2. Shaping
  4. Design
    1. Loss
    2. Delay
    3. Jitter
    4. Asymmetric routing
    5. Strategy
      1. 4-class
      2. 8-class
      3. 12-class
  5. DSCP
  6. Pre-classify
  7. Network-based application recognition (NBAR)
  8. DC QoS
    1. Link-level flow control (PAUSE)
    2. Priority-based flow control (PFC)
    3. Enhanced transmission selection
    4. Exchange protocol
    5. Congestion notification
  9. AutoQoS
  10. MQC
  11. QoS group
  12. Legacy
    1. Committed access rate (CAR)
    2. MLS QoS
    3. SRR
    4. Selective packet discard (SPD)

Scheduling

  • strict priority
  • round-robin: packet count
  • weighted round-robin: packet count
  • weighted fair queuing (WFQ): byte count
    • ideal bandwidth distribution
    • computationally complex: O(log(N)), N – number of queues
  • weighted deficit round robin (WDRR): byte count
    • easier than WFQ, similar result
    • tokens are allocated to every queue on each iteration
    • a packet is sent if its size does not exceed the queue's accumulated tokens
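
The WDRR bullets above can be sketched as a small simulation. This is a minimal illustration, not a vendor implementation: the function name `wdrr`, the quantum values, and the packet sizes are all hypothetical, and the per-round token allocation (quantum) plays the role of the weight.

```python
from collections import deque

def wdrr(queues, quanta, rounds):
    """Serve packets (sizes in bytes) from several queues with deficit round robin.

    queues: list of deques holding packet sizes (hypothetical input)
    quanta: tokens (bytes) credited to each queue per iteration, i.e. the weight
    Returns (queue_index, packet_size) tuples in transmit order.
    """
    deficits = [0] * len(queues)
    sent = []
    for _ in range(rounds):
        for i, q in enumerate(queues):
            if not q:
                deficits[i] = 0  # an empty queue does not accumulate credit
                continue
            deficits[i] += quanta[i]
            # Send head-of-line packets while they fit into the accumulated tokens.
            while q and q[0] <= deficits[i]:
                size = q.popleft()
                deficits[i] -= size
                sent.append((i, size))
    return sent

queues = [deque([1500, 1500, 1500]), deque([500, 500, 500, 500])]
order = wdrr(queues, quanta=[1000, 500], rounds=3)
bytes_per_queue = [sum(s for i, s in order if i == k) for k in range(2)]
```

With quanta of 1000 and 500 bytes, the first queue sends 3000 bytes and the second 1500 bytes over three rounds, so the byte ratio follows the 2:1 weights even though the packet sizes differ.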

WFQ

WDRR

Congestion avoidance

  • methods:
    • tail drop: susceptible to TCP sync + starvation
    • head drop: susceptible to TCP sync + starvation
    • weighted random early detection (WRED) (non-LLQ)
; 64 default, limit packets in queue
(config-pmap-c)# queue-limit <N>

; weighted tail drop: for DSCP = N drop, when queue is M% full
(config-pmap-c)# queue-limit dscp <N> percent <M>

; off default, if protocol not specified – compress both
(config-pmap-c)# compression header ip [tcp|rtp]

WRED

  • mark probability denominator = n: when the average queue depth reaches the upper threshold, 1 of every n packets is dropped
  • not suitable for drop-sensitive traffic
; const (weight) for calculating average queue size
(config-pmap-c)# random-detect exponential-weighting-constant <N>

; LOW: lower threshold, WRED is in effect if crossed
; HIGH: upper threshold, tail drop if crossed
; MARK: mark denominator probability, 10 default
(config-pmap-c)# random-detect dscp <DSCP> <LOW> <HIGH> <MARK>

; IPP-based by default
(config-pmap-c)# random-detect [dscp-based]
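
A rough sketch of how the WRED parameters above interact, assuming the commonly documented behaviour: the average depth is an exponentially weighted moving average with weight 2^-N, and the drop probability rises linearly from 0 at LOW to 1/MARK at HIGH. The function names are hypothetical and the numbers are illustrative only.

```python
def average_queue_depth(old_average, current_depth, n):
    """EWMA of the queue depth; n is the exponential weighting constant."""
    weight = 2.0 ** -n
    return old_average * (1 - weight) + current_depth * weight

def wred_drop_probability(average, low, high, mark_denominator=10):
    """Drop probability for one WRED profile (thresholds in packets)."""
    if average < low:
        return 0.0   # below LOW: WRED is not in effect
    if average > high:
        return 1.0   # above HIGH: tail drop
    # Linear ramp that reaches 1/mark_denominator at the upper threshold.
    return (average - low) / (high - low) / mark_denominator

p_mid = wred_drop_probability(30, low=20, high=40)  # halfway between thresholds
p_max = wred_drop_probability(40, low=20, high=40)  # exactly at HIGH
```

A larger weighting constant makes the average react more slowly to bursts, so short spikes in the instantaneous queue depth do not trigger drops.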

Congestion management

  • methods:
    • FIFO, round-robin (class-default – FIFO)
    • priority queuing (PQ): susceptible to starvation
    • weighted fair queuing (WFQ), weighted round robin (WRR)
      • per flow (L3 + L4)
      • IP precedence based
      • IOS: just marketing, no IPP-based balancing
    • class-based WFQ (CB-WFQ)
      • CEF required
      • 256 queues
    • CB-WFQ + LLQ
      • LLQ ≡ PQ + policing
      • FIFO within LLQ
      • if registered BW in priority exceeds available physical BW – suspend policy
    • DWRR, DWRR + PQ
  • unallocated bandwidth for a class is distributed evenly across active queues
  • congestion ≡ HW queue is full
; flow-based FQ, no IPP mapping
(config-pmap-c)# fair-queue

; BW for class in CB-WFQ, ≈ weight
(config-pmap-c)# bandwidth <kbps>|percent <%>|remain percent <%>

; enable LLQ for class (one queue for all), implicitly policed on congestion
(config-pmap-c)# priority <kbps>|percent <%>

; off by default, estimates BW so that no more than 1 in N packets is dropped
; and no more than 1 in M packets is delayed longer than DELAY
(config-pmap-c)# estimate bandwidth drop-one-in <N> [delay-one-in <M> milliseconds <DELAY>]
; statistics on packets and bytes over load-interval
# show policy-map interface <INTF>

Policing

  • ingress, egress
  • causes TCP retransmits
  • supports marking, remarking
  • less buffer utilization compared to shaping
  • does not support port-channel, tunnel (GRE – exception, class-default only)
  • adds tokens on packet arrival, token amount is proportional to idle time
  • defaults
    • Tc = 250 ms
    • Tc = 200 ms for PQ
  • if packet size exceeds Bc – ignore policing
; mark transport header
(config-pmap-c-police)# *-action set-dscp-tunnel-transmit <VALUE>

Single rate 3-color marking (srTCM)

  • allows bursts: temporarily exceed CIR if previously idle
  • committed burst size (CBS) and excess burst size (EBS)
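
A minimal color-blind srTCM sketch in the spirit of RFC 2697, assuming tokens are added on packet arrival in proportion to idle time. The class name, the byte units, and the example rates are hypothetical; a real policer then maps green/yellow/red to conform/exceed/violate actions.

```python
class SingleRateMarker:
    """Two token buckets (CBS, EBS), both refilled at CIR."""

    def __init__(self, cir_bps, cbs_bytes, ebs_bytes):
        self.bytes_per_second = cir_bps / 8
        self.cbs = cbs_bytes
        self.ebs = ebs_bytes
        self.committed = float(cbs_bytes)  # both buckets start full
        self.excess = float(ebs_bytes)
        self.last_arrival = 0.0

    def _refill(self, now):
        tokens = (now - self.last_arrival) * self.bytes_per_second
        self.last_arrival = now
        # Fill the committed bucket first; overflow goes to the excess bucket.
        to_committed = min(tokens, self.cbs - self.committed)
        self.committed += to_committed
        self.excess = min(self.ebs, self.excess + tokens - to_committed)

    def mark(self, now, size_bytes):
        self._refill(now)
        if self.committed >= size_bytes:
            self.committed -= size_bytes
            return "green"
        if self.excess >= size_bytes:
            self.excess -= size_bytes
            return "yellow"
        return "red"

marker = SingleRateMarker(cir_bps=8000, cbs_bytes=1500, ebs_bytes=1500)
colors = [marker.mark(0.0, 1500), marker.mark(0.0, 1500),
          marker.mark(0.0, 1500), marker.mark(1.5, 1500)]
```

The back-to-back burst at t = 0 drains first the committed and then the excess bucket; after 1.5 s of idle time at 1000 bytes/s the committed bucket is full again, which is the "temporarily exceed CIR if previously idle" behaviour.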

Two-rate 3-color marking (trTCM)

  • allows exceeding CIR on a sustained basis, up to the peak information rate (PIR)
    • overflow is marked correspondingly
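
For comparison, a color-blind trTCM sketch along the lines of RFC 2698: one bucket is refilled at CIR and a second at PIR, and a packet is checked against the peak bucket first. The class name and the example rates are hypothetical.

```python
class TwoRateMarker:
    """Committed bucket (CIR/CBS) plus peak bucket (PIR/PBS)."""

    def __init__(self, cir_bps, cbs_bytes, pir_bps, pbs_bytes):
        self.cir = cir_bps / 8  # bytes per second
        self.pir = pir_bps / 8
        self.cbs = cbs_bytes
        self.pbs = pbs_bytes
        self.committed = float(cbs_bytes)  # both buckets start full
        self.peak = float(pbs_bytes)
        self.last_arrival = 0.0

    def mark(self, now, size_bytes):
        elapsed = now - self.last_arrival
        self.last_arrival = now
        self.committed = min(self.cbs, self.committed + elapsed * self.cir)
        self.peak = min(self.pbs, self.peak + elapsed * self.pir)
        if self.peak < size_bytes:
            return "red"       # above PIR
        self.peak -= size_bytes
        if self.committed < size_bytes:
            return "yellow"    # between CIR and PIR
        self.committed -= size_bytes
        return "green"         # within CIR

two_rate = TwoRateMarker(cir_bps=8000, cbs_bytes=1000, pir_bps=16000, pbs_bytes=2000)
two_rate_colors = [two_rate.mark(0.0, 1000), two_rate.mark(0.0, 1000),
                   two_rate.mark(0.0, 1000), two_rate.mark(0.5, 1000)]
```

Unlike srTCM, yellow traffic here can persist indefinitely as long as the arrival rate stays between CIR and PIR.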

Shaping

  • egress
  • reduces number of TCP retransmits
  • increases delay, jitter
  • does not support remarking, marking, etherchannel
  • defaults
    • ≤ 320 kbps: Bc = 8000 bits = Be
    • > 320 kbps: Bc = Be, Tc = 25ms
  • tokens are added on timer
  • if packet size exceeds Bc – ignore shaping
  • Bc on PE for policer should not drop below 50% ⇒ CIR match, shaper Bc = ½ policer Bc
; single-rate
(config-pmap-c)# shape average <kbps> <Bc> <Be>
(config-pmap-c)# shape average percent <N> <Bc> ms <Be> ms

; single-rate, adds Bc+Be tokens after Tc, rate = (Bc+Be)/(Bc/normal_shape_rate)
(config-pmap-c)# shape peak <kbps> <Bc> <Be>
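
The relationships between CIR, Bc, Be, and Tc used above can be checked with a few lines of arithmetic. This is a sketch with hypothetical helper names; rates are in bits per second and bursts in bits.

```python
def shaping_interval_ms(cir_bps, bc_bits):
    """Tc = Bc / CIR."""
    return bc_bits / cir_bps * 1000

def bc_for_interval(cir_bps, tc_ms):
    """Bc = CIR * Tc, e.g. Tc = 10 ms gives Bc = CIR / 100."""
    return cir_bps * tc_ms / 1000

def peak_rate_bps(cir_bps, bc_bits, be_bits):
    """shape peak: Bc + Be tokens per Tc, i.e. CIR * (1 + Be / Bc)."""
    tc_seconds = bc_bits / cir_bps
    return (bc_bits + be_bits) / tc_seconds

tc = shaping_interval_ms(cir_bps=64_000, bc_bits=8_000)          # 125 ms
bc = bc_for_interval(cir_bps=1_000_000, tc_ms=10)                # 10,000 bits
peak = peak_rate_bps(cir_bps=1_000_000, bc_bits=25_000, be_bits=25_000)
```

With Be equal to Bc, `shape peak` sends at roughly twice the configured rate.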

Design

  • priority queue (EF)
    • not more than 33% of BW
    • no WRED
    • admission control
  • best effort (BE)
    • 25% BW
  • BW allocation – 75% of physical speed, because L2 overhead is not accounted for (except LLQ)
  • WRED lower threshold:
    • AFx3: 60%
    • AFx2: 70%
    • AFx1: 80%
  • WRED upper threshold: 100%
  • remark excess traffic to Scavenger
    • protect against worm spread ≡ congestion
    • large Bc to reduce effect on legitimate traffic
  • buffer size is proportional to BW
    • exception – PQ, because it does not need deep queue
  • divide TCP and UDP into different classes

Loss

  • interactive: ≤ 0.1%
  • voice, video: ≤ 1%

Delay

  • Windows TCP delay tolerance – 9s
  • real-time, voice, video: < 150 ms
  • reduce delay: policing instead of shaping, Tc = 10 ms (Bc = CIR/100)
  • components
    • serialization delay:
      • fixed
      • L2 → L1
    • propagation delay:
      • fixed
      • increased by repeater and amplifier
    • queuing:
      • targeted by QoS

Jitter

  • real-time, voice, video: < 30 ms

Asymmetric routing

  • return traffic does not pass active firewall
    • exceptions: Active-Active failover, ASR group
  • more delay and more jitter: one of paths is longer
    • critical for VoIP, video
  • out-of-order packets → drop (e.g., by RTP) for real-time traffic

Strategy

4-class

Class         | DSCP | BW allocation | Data              | Features
Real time     | EF   | 33%           | voice             |
Signalling/CU | CS3  | 7%            |                   |
Transactions  | AF31 | 35%           | business-critical |
Best effort   | DF   | 25%           |                   | WRED

8-class

Class              | DSCP | BW allocation | Features
Voice              | EF   | 10%           | PQ (implicit policing), no WRED
Interactive video  | AF41 | 23%           | PQ (implicit policing), no WRED
Streaming video    | AF31 | 10%           | WRED
Network control    | CS6  | 5%            |
Signalling         | CS3  | 2%            |
Transactional data | AF2x | 24%           | WRED
Best effort        | DF   | 25%           | WRED
Scavenger          | CS1  | 1%            | WRED not needed

12-class

Class                   | DSCP | BW allocation | Features                        | Data
Voice                   | EF   | 10%           | PQ (implicit policing), no WRED |
Broadcast video         | CS5  | 10%           | PQ (implicit policing), no WRED |
Realtime interactive    | CS4  | 13%           | PQ (implicit policing), no WRED |
Multimedia conferencing | AF4x | 10%           | PQ (implicit policing), no WRED |
Multimedia streaming    | AF3x | 10%           | WRED                            |
Network control         | CS6  | 2%            |                                 | IGP, IKE, HSRP
Signalling              | CS3  | 2%            |                                 | SIP, H.323, SCCP
OAM                     | CS2  | 3%            |                                 | SSH, SNMP, Syslog
Transactional data      | AF2x | 10%           | WRED, low latency               | ERP, CRM, DB
Bulk data               | AF1x | 4%            | WRED, high BW                   | NAS, e-mail, FTP, backup
Best effort             | DF   | 25%           | WRED                            |
Scavenger               | CS1  | 1%            | WRED not needed                 | Bittorrent, YouTube, XBox

DSCP

  • format: IPP(3) + delay(1) + throughput(1) + reliability (1)
    • delay + throughput ≡ drop probability
    • IP precedence (IPP)
      • 0: routine
      • 1: priority
      • 2: immediate
      • 3: flash
      • 4: flash override
      • 5: critical
      • 6: internetwork control, reserved
      • 7: network control, reserved
  • explicit congestion notification (ECN)
    • values
      • 00: not ECN-capable
      • 01, 10: ECN-capable
      • 11: congestion experienced
    • IOS: WRED uses ECN
      • mark 11 instead of drop, if 10/01 set
  • classes
    • class selector (CS)
      • DSCP = XXX000, IPP compatibility
    • best effort (BE)
      • DF, 000XXX
    • expedited forwarding (EF)
      • DSCP = 101XX0 (usually 46), XX – drop probability (11 – 0%, 00 – 100%)
      • low-delay service, guaranteed BW
      • mapped to LLQ, if BW is exceeded – police
    • assured forwarding (AF)
      • DSCP = ZZZXX0
        • ZZZ = 001, 010, 011, 100
        • XX – drop probability (01 ≡ low, 10 ≡ medium, 11 ≡ high)
      • AFxy: x = IPP, y = drop probability
    • lower effort (LE)
      • BW constrained: YouTube, Bittorrent
  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
|Precedence |      unused       |
+---+---+---+---+---+---+---+---+

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
|          DSCP         |  ECN  |
+---+---+---+---+---+---+---+---+
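
The bit layouts above translate directly into shifts and masks. A short sketch with hypothetical helper names, showing how AFxy and CSn map to decimal DSCP values and how DSCP and ECN share the ToS byte:

```python
def af(x, y):
    """AFxy: x = class (IPP bits), y = drop probability -> 8x + 2y."""
    return (x << 3) | (y << 1)

def cs(n):
    """Class selector n: DSCP = XXX000."""
    return n << 3

EF = 0b101110  # 46

def tos_byte(dscp, ecn=0):
    """DSCP occupies the upper 6 bits, ECN the lower 2."""
    return (dscp << 2) | ecn

def parse_tos(tos):
    return {"ipp": tos >> 5, "dscp": tos >> 2, "ecn": tos & 0b11}

af31 = af(3, 1)               # 26
ef_tos = tos_byte(EF)         # 184 (0xB8)
fields = parse_tos(0xB8 | 0b11)  # EF with "congestion experienced"
```

Because the IPP bits are the top three bits of the DSCP, a device that only understands IP precedence still sees EF as precedence 5 and AF31 or CS3 as precedence 3.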

Pre-classify

  • tunnels copy inner ToS to outer ToS by default
  • classify is broken by NBAR
  • pre-classify applies policy using inner packet before encapsulation (header copy in RAM)
; mark transport header
(config-pmap-c)# set dscp tunnel <DSCP>
; GRE, IPIP, L2TP (virtual-template)
(config-if)# qos pre-classify
; IPsec
(config-crypto-map)# qos pre-classify

Network-based application recognition (NBAR)

  • DPI implementation, statistics collection and analysis
  • CPU-intensive
  • mandatory: CEF
  • no support: SSO, mcast, MPLS, fragmented packets, self-originated traffic, tunnels, pipelined HTTP
  • not required for match in route-map
  • IPv4 and IPv6 only
  • symmetric flows only
  • thresholds
    • not recommended to exceed 70% of max flows
    • if 100% of flows is reached: Syslog, all extra flows → class Unknown
; custom NBAR template
(config)# ip nbar custom <NAME> <OFFSET> ascii|hex|dec <VALUE> [source|dest] tcp|udp <PORT>
(config)# flow record <RECORD>

; key field for flow record
(config-flow-record)# match ipv4|ipv6 destination|source address
(config-flow-record)# match transport <SRC_PORT>|tcp|udp
(config-cmap)# match flow record <RECORD>
(config-cmap)# match not ...
(config-cmap)# match protocol <PROTOCOL>

; category-based NBAR, macro
(config-cmap)# match protocol attribute <CATEGORY> <APPLICATION>
(config-cmap)# match protocol unknown
(config-if)# ip nbar protocol-discovery
# show ip nbar protocol-discovery [top-n <N>]

; how NBAR distinguishes the protocol
# show ip nbar portmap

; which attributes are matched against in class-map and route-map
# show ip nbar protocol-attribute <PROTOCOL>

; max allowed flows
# show ip nbar resources flow

DC QoS

Link-level flow control (PAUSE)

  • IEEE 802.3x
  • sends PAUSE frame to peer ≡ stop transmitting all frames (does not account for queues)
    • PAUSE includes time to delay, 0 ≡ resume transmission
    • dst MAC: 0180.c200.0001
    • Ethertype = 0x8808 (MAC control, control code = 0x0001)
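
To make the frame layout concrete, here is a sketch that assembles a PAUSE frame from the fields listed above. The function name and the source MAC are hypothetical; the pause time is expressed in quanta of 512 bit times, and the frame is padded to the 60-byte minimum (FCS not included).

```python
import struct

PAUSE_DST_MAC = bytes.fromhex("0180c2000001")
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001

def pause_frame(src_mac, quanta):
    """Build an 802.3x PAUSE frame; quanta = 0 asks the peer to resume."""
    header = PAUSE_DST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, quanta)
    return (header + payload).ljust(60, b"\x00")  # pad to minimum frame size

src = bytes.fromhex("020000000001")  # hypothetical locally administered MAC
stop = pause_frame(src, 0xFFFF)      # pause for the maximum time
resume = pause_frame(src, 0)         # resume transmission
```

The frame carries a single timer for the whole link, which is why PAUSE cannot distinguish between queues.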

Priority-based flow control (PFC)

  • IEEE 802.1Qbb
  • enhanced PAUSE, lossless Ethernet
  • 8 priorities, time to pause Tx
  • dst MAC: 0180.c200.0001
  • EtherType = 0x8808 (MAC control, control code = 0x0101)
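
A PFC frame differs from PAUSE mainly in its payload: a class-enable vector followed by eight per-priority timers. The sketch below assumes that layout (16-bit vector, eight 16-bit timers in quanta of 512 bit times); the function name and source MAC are hypothetical.

```python
import struct

def pfc_frame(src_mac, pause_quanta):
    """Build an 802.1Qbb PFC frame.

    pause_quanta: dict mapping priority (0-7) to pause time in quanta.
    """
    enable_vector = 0
    timers = [0] * 8
    for priority, quanta in pause_quanta.items():
        enable_vector |= 1 << priority  # mark this priority's timer as valid
        timers[priority] = quanta
    header = bytes.fromhex("0180c2000001") + src_mac + struct.pack("!H", 0x8808)
    payload = struct.pack("!HH8H", 0x0101, enable_vector, *timers)
    return (header + payload).ljust(60, b"\x00")  # pad to minimum frame size

src = bytes.fromhex("020000000001")        # hypothetical locally administered MAC
frame = pfc_frame(src, {3: 0xFFFF})        # pause only priority 3
opcode, vector, *per_priority = struct.unpack("!HH8H", frame[14:34])
```

Only the priorities set in the enable vector are paused, so lossless traffic on one priority can be held back while the other classes keep flowing.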

Enhanced transmission selection

  • IEEE 802.1Qaz
  • scheduler within priority
  • drop-free

Exchange protocol

  • IEEE 802.1Qaz
  • LLDP extension
  • feature negotiation: PFC is used only if supported on both ends
  • switch configures CNA

Congestion notification

  • IEEE 802.1Qau
  • backward congestion notification (BCN) is sent back on congestion

AutoQoS

  • AutoQoS for VoIP
    • discovers VoIP phones on access ports via CDP, trusts their QoS
    • trusts QoS marking on trunk
    • EF queue: RTP, RTCP, IGP, ingress BPDU, egress voice
    • best-effort (BE) treatment unless specified otherwise
    • creates CoS-DSCP mapping
    • routers:
      • ≤ 768 kbps: enables compression and fragmentation, LFI for PPP
      • shaping
  • AutoQoS for Enterprise
    • routers
    • add-on to AutoQoS for VoIP, designed for WAN
    • requires CEF, uses NBAR
AutoQoS Class     | Marking | Data
Routing           | CS6     | EIGRP, OSPF
VoIP              | EF      | RTP voice
Interactive video | AF41    | RTP video
Streaming video   | CS4     |
Control           | CS3     | RTCP, SIP, H.323
Transactional     | AF21    | SSH, Telnet, SAP, Citrix
Bulk              | AF11    | FTP, SMTP, POP3, Exchange
Scavenger         | CS1     |
Management        | CS2     | SNMP, Syslog, DHCP, DNS
Other             | DF      |
; access switchport
(config-if)# auto qos voip cisco-phone|cisco-softphone

; switchport trunk uplink, router port
(config-if)# auto qos voip trust

; classify traffic using NBAR, AutoQoS for Enterprise
(config-if)# auto discovery qos [trust]

; generate and apply template, AutoQoS for Enterprise
(config-if)# auto qos
# show auto qos
# show mls qos
# show policy-map interface <INTF>

; NBAR statistics
# show auto discovery qos

MQC

  • 256 classes per policy-map
  • tunnels do not use policy on physical interfaces
  • class-default defaults:
    • FIFO, tail-drop
    • reserved 1% BW
  • nested class allows combining match-any and match-all
  • if BW is not specified, remaining BW is distributed evenly between such classes
  • subint requires parent policy with shape/police, because subint BW = intf BW ⇒ explicit limitation required, otherwise all policies align with interface BW
    • subint has no native congestion management mechanisms
  • does not rewrite CoS or DSCP on untrusted ports by default
  • by default CoS = 5 is mapped to DSCP 40 (not 46 for EF) ⇒ table-map for ports with CoS trust
; DSCP-CoS, EXP-DSCP, EXP-QoS group-DSCP
(config)# table-map <MAP> map from <N> to <M>
; DSCP → CoS
(config-pmap-c)# set cos dscp table <MAP>
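
The CoS = 5 → DSCP 40 default follows from copying the three CoS bits into the top three DSCP bits. A short sketch of that default and of the kind of override a table-map expresses; the dictionary names are hypothetical.

```python
# Default behaviour: DSCP = CoS * 8, so CoS 5 lands on CS5 (40), not EF (46).
DEFAULT_COS_TO_DSCP = {cos: cos << 3 for cos in range(8)}

# Override for voice: map CoS 5 to EF instead.
VOICE_COS_TO_DSCP = {**DEFAULT_COS_TO_DSCP, 5: 46}

default_row = [DEFAULT_COS_TO_DSCP[c] for c in range(8)]
voice_row = [VOICE_COS_TO_DSCP[c] for c in range(8)]
```

Here `default_row` is 0 8 16 24 32 40 48 56, and `voice_row` differs only in the sixth entry (46).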

QoS group

  • internal marking of traffic
  • useful on encapsulation demarcation point

Legacy

Committed access rate (CAR)

  • 2-color policer
  • IPP, MAC, IP ACL
  • default action: pass
  • searches matching policy sequentially
  • up to 100 policies per interface
  • does not police Be
  • CEF required
  • no support for port-channel and tunnels
; 0-99: IPP ACL, 100-199: MAC ACL, 200-299: EXP ACL
(config)# access-list rate-limit <N>
; continue ≡ goto next policy, Be(real) = Be - Bc
(config-if)# rate-limit input|output <bps> <Bc> <Be> conform <ACTION> exceed <ACTION>
(config-if)# rate-limit input|output dscp <N> <bps> ...
(config-if)# rate-limit input|output qos-group <N> <bps> ...
(config-if)# rate-limit input|output access-group <ACL_N> <bps> ...
(config-if)# rate-limit input|output access-group rate-limit <RATE_ACL> <bps> ...

MLS QoS

  • disabled by default
  • untrusted port remarks DSCP to 0
  • rewrites packet DSCP with internal DSCP on egress interface
    • DSCP transparency: disable this behaviour
  • policy on logical port-channel: ingress (4500, 6500)
  • policy on member port-channel: egress (2960, 3560, 3750, 4500, 6500), ingress (2960, 3560, 3750)
  • C3750 cannot configure class-default, egress buffers – pooled
  • policy-map: set dscp instead of trust utilizes less HW resources
  • 1P1Q4T ≡ 1 priority queue, 1 non-priority queue with 4 WTD thresholds
(config)# mls qos

; DSCP transparency
(config)# no mls qos rewrite ip dscp

; CoS = n → DSCP mapping
(config)# mls qos map cos-dscp 0 8 16 24 32 46 48 56
(config)# mls qos map dscp-mutation <NAME> <INPUT_DSCP> <OUTPUT_DSCP>
(config-if)# mls qos trust cos|dscp

; apply trust cos or trust dscp only if suitable device is discovered via CDP
(config-if)# mls qos trust device <TYPE>

; service-policy on SVI is applied to whole VLAN
(config-if)# mls qos vlan-based

; has effect only on ingress
(config-if)# mls qos dscp-mutation <NAME>
# show mls qos
# show mls qos maps cos-dscp
# show mls qos interface [<INTF>] [policers] [queueing]

SRR

  • shared round robin, shaped round robin
  • weighted tail drop (WTD) uses internal DSCP to determine Tn and queue
    • T3 = 100% always
  • 2 ingress queues (WTD), 4 egress queues (queue 1 ≡ LLQ)
  • BW ≡ weight, not an absolute value
  • if other queues are empty
    • shared uses all BW
    • shaped uses only configured BW (even if no congestion)
    • priority > shaped > shared
  • egress interface belongs to a queue-set (2 in total)
(config)# mls qos srr-queue input priority-queue 1|2 bandwidth <kbps>

; CoS = 5 into queue 2 by default
(config)# mls qos srr-queue input cos-map|dscp-map queue 1|2 <VALUES>

; 90% and 10% default, queue 2 – priority
(config)# mls qos srr-queue input buffers <QUEUE1> <QUEUE2>

; 4/4 default, weights
(config)# mls qos srr-queue input bandwidth <WEIGHT1> <WEIGHT2>

; 100 default, WTD for T1 and T2
(config)# mls qos srr-queue input threshold 1|2 <TH1> <TH2>

; on reaching threshold of queue depth – drop
(config)# mls qos srr-queue input cos-map|dscp-map threshold 1|2 <VALUE>
(config)# mls qos queue-set output 1|2 buffers <Q1> <Q2> <Q3> <Q4>

; WTD for T1 and T2, MAX_TH takes buffer from pool
(config)# mls qos queue-set output 1|2 threshold <QUEUE_ID> <TH1> <TH2> <RESV_TH> <MAX_TH> 
; BW = 25 by default
(config-if)# srr-queue bandwidth shape|share <BW1> <BW2> <BW3> <BW4>
# show mls qos maps cos-input-q|dscp-input-q

; remarking table for exceed action (X ≡ I, Y ≡ II digits in DSCP)
# show mls qos maps policed-dscp

Selective packet discard (SPD)

  • interface → process switching queue
  • all commands are hidden
  • queue processing priority:
    1. Extended Headroom: L2 keepalive, IGP, FHRP, ARP, LDP
    2. Headroom: BGP Update
    3. Interface input queue: SNMP, SSH
    • if there is no room – write into next queue
    • FIFO with strict priority + WRED
  • threshold is derived from the smallest input queue depth across all interfaces; if it is exceeded on one interface, the action applies to all interfaces
  • modes:
    1. regular: default
    2. aggressive:
      • default for hardware-based platforms
      • drop packets that generate ICMP error, if number exceeds min-threshold
; enabled by default
(config)# spd enable

(config)# spd headroom <pkts>
(config)# spd extended-headroom <pkts>

(config)# ip|ipv6 spd mode aggressive

; disables autocalculation; SPD action – per interface
(config)# ip|ipv6 spd queue min-threshold <pkts>
(config)# ip|ipv6 spd queue max-threshold <pkts>
; 75 by default, has to be changed on all interfaces
(config-if)# hold-queue <pkt> in|out
# show ip|ipv6 spd