computationally complex: O(log(N)), N – number of queues
weighted deficit round robin (WDRR): byte count
simpler than WFQ, similar result
tokens (a quantum proportional to the queue weight) are added to every queue on each round
packet is sent if its size does not exceed the accumulated tokens (deficit counter)
(figures: WFQ, WDRR)
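The WDRR notes above can be sketched as classic deficit round robin. This is a minimal illustration, assuming each queue's weight is expressed as a per-round quantum in bytes. The function name and the quantum values are hypothetical, not an IOS internal.

```python
from collections import deque


def wdrr(queues, quanta, rounds):
    """Deficit round robin: return the send order as (queue index, packet size).

    queues: list of deques holding packet sizes in bytes (hypothetical input).
    quanta: tokens (bytes) added to each queue per round, i.e. its weight.
    """
    deficits = [0] * len(queues)
    sent = []
    for _ in range(rounds):
        for i, q in enumerate(queues):
            if not q:
                deficits[i] = 0  # an idle queue does not bank credit
                continue
            deficits[i] += quanta[i]
            # Send head-of-line packets while they fit within the deficit counter.
            while q and q[0] <= deficits[i]:
                size = q.popleft()
                deficits[i] -= size
                sent.append((i, size))
            if not q:
                deficits[i] = 0
    return sent


q0 = deque([1500, 1500, 1500])
q1 = deque([300, 300, 300, 300, 300, 300])
order = wdrr([q0, q1], quanta=[1000, 500], rounds=3)
```

Over three rounds queue 0 sends 3000 bytes and queue 1 sends 1500 bytes, matching the 2:1 quanta even though their packet sizes differ.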
Congestion avoidance
methods:
tail drop: susceptible to TCP sync + starvation
head drop: susceptible to TCP sync + starvation
weighted random early detection (WRED), not for LLQ
; 64 default, limit packets in queue
(config-pmap-c)# queue-limit <N>
; weighted tail drop: drop packets with DSCP N when the queue is M% full
(config-pmap-c)# queue-limit dscp <N> percent <M>
; off default; if protocol not specified, compress both TCP and RTP headers
(config-pmap-c)# compression header ip [tcp|rtp]
WRED
mark denominator probability = n: drop probability rises linearly from 0 at the lower threshold to 1/n (1 in n packets dropped) at the upper threshold
not suitable for drop-sensitive traffic
; exponential weighting constant N for the average queue size (weight = 1/2^N)
(config-pmap-c)# random-detect exponential-weighting-constant <N>
; LOW: lower threshold, WRED is in effect if crossed
; HIGH: upper threshold, tail drop if crossed
; MARK: mark denominator probability, 10 default
(config-pmap-c)# random-detect dscp <DSCP> <LOW> <HIGH> <MARK>
; IPP-based by default
(config-pmap-c)# random-detect [dscp-based]
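The WRED parameters above (exponential weighting constant, LOW, HIGH, MARK) can be illustrated with a small sketch. It assumes the documented Cisco average, avg = old × (1 − 2^−N) + current × 2^−N, and the classic linear RED ramp; function names are hypothetical and queue depth is in packets.

```python
def wred_average(old_avg, current_depth, exp_weight=9):
    """EWMA of the queue depth: new = old * (1 - 2^-N) + current * 2^-N."""
    w = 2 ** -exp_weight
    return old_avg * (1 - w) + current_depth * w


def wred_drop_probability(avg, low, high, mark_denominator=10):
    """Drop probability for a given average queue depth (sketch of a RED profile)."""
    if avg < low:
        return 0.0  # below LOW: no early drops
    if avg > high:
        return 1.0  # above HIGH: tail drop
    # Linear ramp from 0 at LOW up to 1/MARK at HIGH.
    return (avg - low) / (high - low) / mark_denominator
```

A large N makes the average react slowly to bursts, so short spikes do not trigger early drops; a small N tracks the instantaneous depth more closely.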
Congestion management
methods:
FIFO, round-robin (class-default – FIFO)
priority queuing (PQ): susceptible to starvation
weighted fair queuing (WFQ), weighted round robin (WRR)
per flow (L3 + L4)
IP precedence based
IOS: "weighted" is just marketing, no IPP-based weighting
class-based WFQ (CB-WFQ)
CEF required
256 queues
CB-WFQ + LLQ
LLQ ≡ PQ + policing
FIFO within LLQ
if the BW configured with priority exceeds the available physical BW, the policy is suspended
DWRR, DWRR + PQ
unallocated bandwidth for a class is distributed evenly across active queues
congestion ≡ HW queue is full
; flow-based FQ, no IPP mapping
(config-pmap-c)# fair-queue
; BW for class in CB-WFQ, ≈ weight
(config-pmap-c)# bandwidth <kbps>|percent <%>|remaining percent <%>
; enable LLQ for class (one queue for all), implicitly policed on congestion
(config-pmap-c)# priority <kbps>|percent <%>
; off default, estimates the BW needed so that no more than 1 in N packets is dropped
; and no more than 1 in M packets is delayed longer than DELAY
(config-pmap-c)# estimate bandwidth drop-one-in <N> [delay-one-in <M> milliseconds <DELAY>]
; statistics on packets and bytes over load-interval
# show policy-map interface <INTF>
Policing
ingress, egress
causes TCP retransmits
supports marking, remarking
less buffer utilization compared to shaping
does not support port-channel, tunnel (GRE – exception, class-default only)
adds tokens on packet arrival, token amount is proportional to idle time
defaults
Tc = 250 ms
Tc = 200 ms for PQ
if packet size exceeds Bc – ignore policing
; mark tunnel (outer) header
(config-pmap-c-police)# *-action set-dscp-tunnel-transmit <VALUE>
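The policer behavior above (tokens added on packet arrival, proportional to idle time, no buffering) can be sketched as a single-rate, two-color token bucket. This is an illustration under assumptions: the class and method names are hypothetical, Bc is derived from the default Tc = 250 ms, and the oversized-packet case (size > Bc) noted above is not modeled.

```python
class SingleRatePolicer:
    """Token bucket refilled on packet arrival rather than on a timer.

    cir_bps: committed rate in bits per second.
    bc_bytes: bucket depth (committed burst) in bytes.
    """

    def __init__(self, cir_bps, bc_bytes):
        self.rate = cir_bps / 8  # bytes per second
        self.bc = bc_bytes
        self.tokens = bc_bytes  # bucket starts full
        self.last = 0.0

    def police(self, now, size):
        # Tokens earned are proportional to the time since the previous packet.
        self.tokens = min(self.bc, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return "conform"
        return "exceed"  # exceed-action (drop or remark); nothing is queued


# Bc for Tc = 250 ms at 64 kbps: 64000 * 0.25 / 8 = 2000 bytes.
p = SingleRatePolicer(cir_bps=64_000, bc_bytes=64_000 * 0.25 / 8)
results = [p.police(t, 1500) for t in (0.0, 0.01, 0.2)]
```

The second packet arrives 10 ms after the first and finds only about 580 bytes of tokens, so it exceeds; by 200 ms the bucket has refilled to its 2000-byte cap.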
Single rate 3-color marking (srTCM)
allows bursts: temporarily exceed CIR if previously idle
commit bucket size (CBS) and exceed bucket size (EBS)
Two-rate 3-color marking (trTCM)
allows exceeding CIR on a sustained basis, up to the peak information rate (PIR)
overflow is marked accordingly: above CIR as exceed (yellow), above PIR as violate (red)
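The two markers above can be compared side by side. This is a minimal sketch of the color-blind srTCM (RFC 2697) and trTCM (RFC 2698); it assumes continuous refill on packet arrival instead of the RFCs' per-time-unit increments, and the class names are hypothetical.

```python
class SrTCM:
    """Single-rate three-color marker (RFC 2697), color-blind mode."""

    def __init__(self, cir_bps, cbs, ebs):
        self.rate = cir_bps / 8  # bytes per second
        self.cbs, self.ebs = cbs, ebs
        self.tc, self.te = cbs, ebs  # both buckets start full
        self.last = 0.0

    def mark(self, now, size):
        new = (now - self.last) * self.rate
        self.last = now
        # Tokens fill the committed bucket first; overflow spills into the excess bucket.
        room_c = self.cbs - self.tc
        self.tc += min(new, room_c)
        self.te = min(self.ebs, self.te + max(0.0, new - room_c))
        if size <= self.tc:
            self.tc -= size
            return "green"
        if size <= self.te:
            self.te -= size
            return "yellow"  # burst allowed because the marker was idle before
        return "red"


class TrTCM:
    """Two-rate three-color marker (RFC 2698), color-blind mode."""

    def __init__(self, cir_bps, cbs, pir_bps, pbs):
        self.cir, self.pir = cir_bps / 8, pir_bps / 8
        self.cbs, self.pbs = cbs, pbs
        self.tc, self.tp = cbs, pbs
        self.last = 0.0

    def mark(self, now, size):
        dt = now - self.last
        self.last = now
        self.tc = min(self.cbs, self.tc + dt * self.cir)
        self.tp = min(self.pbs, self.tp + dt * self.pir)
        if size > self.tp:
            return "red"  # above PIR
        if size > self.tc:
            self.tp -= size
            return "yellow"  # between CIR and PIR
        self.tp -= size
        self.tc -= size
        return "green"


sr = SrTCM(cir_bps=8_000, cbs=1500, ebs=1500)
sr_colors = [sr.mark(0.0, 1500) for _ in range(3)]
tr = TrTCM(cir_bps=8_000, cbs=1500, pir_bps=16_000, pbs=3000)
tr_colors = [tr.mark(0.0, 1500) for _ in range(3)]
```

The difference shows up over time: the srTCM excess bucket only refills from tokens left over while the committed bucket is full, whereas the trTCM peak bucket refills at PIR independently, so traffic can stay above CIR continuously.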
Shaping
egress
reduces number of TCP retransmits
increases delay, jitter
does not support remarking, marking, etherchannel
defaults
≤ 320 kbps: Bc = Be = 8000 bits, Tc = Bc/CIR
> 320 kbps: Tc = 25 ms, Bc = Be = CIR × Tc
tokens are added on timer
if packet size exceeds Bc – ignore shaping
CE shaper towards a PE policer: match the CIR and set shaper Bc = ½ policer Bc, so the policer bucket never drops below 50%
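The shaping defaults and the CE/PE Bc rule above reduce to simple arithmetic. A sketch, assuming the policer Bc is given in bytes and the shaper Bc in bits (as in IOS `police` and `shape` syntax); the function names are hypothetical.

```python
def shaper_defaults(cir_bps):
    """Return (Bc bits, Be bits, Tc seconds) per the default rules listed above."""
    if cir_bps <= 320_000:
        bc = 8000  # fixed Bc; Tc follows from the rate
        tc = bc / cir_bps
    else:
        tc = 0.025  # fixed Tc of 25 ms; Bc follows from the rate
        bc = cir_bps * 25 / 1000
    return bc, bc, tc  # Be defaults to Bc


def shaper_bc_for_policer(policer_bc_bytes):
    """Shaper Bc (bits) = half of the downstream policer Bc (bytes converted to bits)."""
    return policer_bc_bytes * 8 / 2
```

For example, a 1 Mbps PE policer with the default 250 ms Bc (31250 bytes) suggests a CE shaper Bc of 125000 bits.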
; single-rate
(config-pmap-c)# shape average <bps> <Bc> <Be>
(config-pmap-c)# shape average percent <N> <Bc> ms <Be> ms
; single-rate, adds Bc+Be tokens every Tc, peak rate = (Bc+Be)/Tc = CIR × (1 + Be/Bc)
(config-pmap-c)# shape peak <bps> <Bc> <Be>
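The `shape peak` rate follows from Tc = Bc/CIR, where CIR is the configured shape rate:

```latex
T_c = \frac{B_c}{\mathrm{CIR}}, \qquad
\mathrm{rate}_{\mathrm{peak}} = \frac{B_c + B_e}{T_c}
  = \frac{(B_c + B_e)\,\mathrm{CIR}}{B_c}
  = \mathrm{CIR}\left(1 + \frac{B_e}{B_c}\right)
```

With the default Be = Bc, `shape peak` sends at up to twice the configured rate.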
Design
priority queue (EF)
not more than 33% of BW
no WRED
admission control
best effort (BE)
25% BW
BW allocation – 75% of physical speed, because L2 overhead is not accounted for (except LLQ)
WRED lower threshold:
AFx3: 60%
AFx2: 70%
AFx1: 80%
WRED upper threshold: 100%
remark excess traffic to Scavenger
protect against worm spread ≡ congestion
large Bc to reduce effect on legitimate traffic
buffer size is proportional to BW
exception – PQ, because it does not need a deep queue
divide TCP and UDP into different classes
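The numeric design guidelines above can be turned into a quick sanity check. A sketch under assumptions: the function names and inputs are hypothetical, percentages are taken relative to the link speed, and WRED lower thresholds are rounded down to whole packets.

```python
def check_design(link_kbps, priority_kbps, best_effort_kbps):
    """Return guideline violations for a policy (integer math avoids float edge cases)."""
    issues = []
    if priority_kbps * 100 > link_kbps * 33:
        issues.append("EF queue above 33% of link BW")
    if best_effort_kbps * 100 < link_kbps * 25:
        issues.append("best effort below 25% of link BW")
    return issues


def wred_thresholds(queue_limit):
    """Lower WRED thresholds per AF drop precedence; the upper threshold is 100%."""
    return {
        "AFx1": queue_limit * 80 // 100,
        "AFx2": queue_limit * 70 // 100,
        "AFx3": queue_limit * 60 // 100,
        "upper": queue_limit,
    }
```

With the default queue limit of 64 packets, AFx3 starts dropping at 38 packets, well before AFx1 at 51.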
Loss
interactive: ≤ 0.1%
voice, video: ≤ 1%
Delay
Windows TCP delay tolerance – 9s
real-time, voice, video: < 150 ms
reduce delay: policing instead of shaping, Tc = 10 ms (Bc = CIR/100)
components
serialization delay:
fixed
L2 → L1
propagation delay:
fixed
increased by repeater and amplifier
queuing:
targeted by QoS
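The fixed delay components above can be estimated directly. A sketch, assuming roughly 200 km of fiber per millisecond (about 2/3 of the speed of light) for propagation; the function names are hypothetical.

```python
def serialization_delay_ms(frame_bytes, link_bps):
    """Time to clock a frame onto the wire (L2 -> L1)."""
    return frame_bytes * 8 * 1000 / link_bps


def propagation_delay_ms(km):
    """Assumes ~200 km of fiber per millisecond."""
    return km / 200


one_way_ms = serialization_delay_ms(1500, 64_000) + propagation_delay_ms(1000)
```

A single 1500-byte frame on a 64 kbps link takes 187.5 ms to serialize, which alone exceeds the 150 ms voice budget. On a 1 Gbps link the same frame takes 0.012 ms, so only queuing remains for QoS to manage.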
Jitter
real-time, voice, video: < 30 ms
Asymmetric routing
return traffic does not pass active firewall
exceptions: Active-Active failover, ASR group
more delay and more jitter: one of the paths is longer
critical for VoIP, video
out-of-order packets → drop (e.g., by RTP) for real-time traffic
(config)# flow record <RECORD>
; key field for flow record
(config-flow-record)# match ipv4|ipv6 destination|source address
(config-flow-record)# match transport <SRC_PORT>|tcp|udp
(config-cmap)# match flow record <RECORD>
(config-cmap)# match not ...
(config-cmap)# match protocol <PROTOCOL>
; category-based NBAR, macro
(config-cmap)# match protocol attribute <CATEGORY> <APPLICATION>
(config-cmap)# match protocol unknown
(config-if)# ip nbar protocol-discovery
# show ip nbar protocol-discovery [top-n <N>]
; how NBAR distinguishes the protocol (port mapping)
# show ip nbar port-map
; which attributes are matched against in class-map and route-map
# show ip nbar protocol-attribute <PROTOCOL>
; max allowed flows
# show ip nbar resources flow
DC QoS
Link-level flow control (PAUSE)
IEEE 802.3x
sends PAUSE frame to peer ≡ stop transmitting all frames (does not account for queues)
PAUSE carries a pause time in quanta of 512 bit times, 0 ≡ resume transmission
dst MAC: 0180.c200.0001
EtherType = 0x8808 (MAC control, control code = 0x0001)
Priority-based flow control (PFC)
IEEE 802.1Qbb
enhanced PAUSE, lossless Ethernet
8 priorities, time to pause Tx
dst MAC: 0180.c200.0001
EtherType = 0x8808 (MAC control, control code = 0x0101)
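The two frame layouts above differ only in the opcode and the parameters that follow it. A sketch that builds both, assuming a 16-bit priority-enable vector followed by eight 16-bit per-priority timers for PFC, padding to the 60-byte minimum without FCS; the function names are hypothetical.

```python
import struct

PAUSE_DST = bytes.fromhex("0180c2000001")
MAC_CONTROL = 0x8808


def pause_frame(src_mac, quanta):
    """802.3x PAUSE: stop all traffic for `quanta` x 512 bit times (0 = resume)."""
    body = struct.pack("!6s6sHHH", PAUSE_DST, src_mac, MAC_CONTROL, 0x0001, quanta)
    return body.ljust(60, b"\x00")  # pad to the 60-byte minimum (FCS not included)


def pfc_frame(src_mac, pause_quanta_by_priority):
    """802.1Qbb PFC: independent pause times for priorities 0..7."""
    enable = 0
    times = [0] * 8
    for prio, quanta in pause_quanta_by_priority.items():
        enable |= 1 << prio  # one enable bit per paused priority
        times[prio] = quanta
    body = struct.pack("!6s6sHHH8H", PAUSE_DST, src_mac, MAC_CONTROL, 0x0101, enable, *times)
    return body.ljust(60, b"\x00")


src = bytes.fromhex("001122334455")
pause = pause_frame(src, 0xFFFF)
pfc = pfc_frame(src, {3: 100})  # pause only priority 3
```

The PAUSE frame stops every queue on the link, while the PFC frame above pauses priority 3 only and leaves the other seven priorities transmitting.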
Enhanced transmission selection (ETS)
IEEE 802.1Qaz
scheduler across priorities: bandwidth share per priority group
drop-free
Data Center Bridging Exchange (DCBX) protocol
IEEE 802.1Qaz
LLDP extension
feature negotiation: PFC is used only if supported on both ends
switch configures CNA
Congestion notification
IEEE 802.1Qau
backward congestion notification (BCN) is sent back to the source on congestion
AutoQoS
AutoQoS for VoIP
discovers VoIP phones on access ports via CDP, trusts their QoS
trusts QoS marking on trunk
EF queue: RTP, RTCP, IGP, ingress BPDU, egress voice
best-effort treatment unless specified otherwise
creates CoS-DSCP mapping
routers:
≤ 768 kbps: enables compression and fragmentation, LFI for PPP
shaping
AutoQoS for Enterprise
routers
extends AutoQoS for VoIP, designed for WAN
requires CEF, uses NBAR
AutoQoS Class       Marking   Data
Routing             CS6       EIGRP, OSPF
VoIP                EF        RTP voice
Interactive video   AF41      RTP video
Streaming video     CS4
Control             CS3       RTCP, SIP, H.323
Transactional       AF21      SSH, Telnet, SAP, Citrix
Bulk                AF11      FTP, SMTP, POP3, Exchange
Scavenger           CS1
Management          CS2       SNMP, Syslog, DHCP, DNS
Other               DF
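The markings in the AutoQoS class table translate to decimal DSCP values with the standard PHB arithmetic: CSx = 8x, AFxy = 8x + 2y, EF = 46, DF = 0. A small sketch; the function and dictionary names are hypothetical.

```python
def dscp(phb):
    """Decimal DSCP value for a PHB name such as 'CS6', 'AF41', 'EF' or 'DF'."""
    if phb == "EF":
        return 46
    if phb == "DF":
        return 0
    if phb.startswith("CS"):
        return 8 * int(phb[2])  # class selector: precedence shifted left by 3
    if phb.startswith("AF"):
        cls, drop = int(phb[2]), int(phb[3])
        return 8 * cls + 2 * drop  # AFxy: class x, drop precedence y
    raise ValueError(f"unknown PHB {phb}")


AUTOQOS_CLASSES = {
    "Routing": "CS6", "VoIP": "EF", "Interactive video": "AF41",
    "Streaming video": "CS4", "Control": "CS3", "Transactional": "AF21",
    "Bulk": "AF11", "Scavenger": "CS1", "Management": "CS2", "Other": "DF",
}
values = {name: dscp(phb) for name, phb in AUTOQOS_CLASSES.items()}
```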
; access switchport
(config-if)# auto qos voip cisco-phone|cisco-softphone
; switchport trunk uplink, router port
(config-if)# auto qos voip trust
; classify traffic using NBAR, AutoQoS for Enterprise
(config-if)# auto discovery qos [trust]
; generate and apply template, AutoQoS for Enterprise
(config-if)# auto qos
# show auto qos
# show mls qos
# show policy-map interface <INTF>
; NBAR statistics
# show auto discovery qos
MQC
256 classes per policy-map
tunnels do not use policy on physical interfaces
class-default defaults:
FIFO, tail-drop
reserved 1% BW
nested class allows combining match-any and match-all
if BW is not specified, remaining BW is distributed evenly between such classes
subint requires a parent policy with shape/police: subint BW = intf BW, so an explicit limit is required; otherwise all policies are sized to the interface BW
subint has no native congestion management mechanisms
does not rewrite CoS or DSCP on untrusted ports by default
by default CoS = 5 is mapped to DSCP 40 (not 46 for EF) ⇒ table-map for ports with CoS trust
; DSCP-CoS, EXP-DSCP, EXP-QoS group-DSCP
(config)# table-map <MAP> map from <N> to <M>
; DSCP → CoS
(config-pmap-c)# set cos dscp table <MAP>
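Why CoS 5 lands in DSCP 40 rather than EF: the default CoS-to-DSCP map multiplies CoS by 8. A sketch of the default map and a single override in the spirit of `map from 5 to 46`; it models only the resulting mapping, not the full table-map semantics, and the names are hypothetical.

```python
# Default CoS -> DSCP map assumed here: DSCP = CoS x 8 (so CoS 5 -> 40, i.e. CS5).
DEFAULT_COS_TO_DSCP = {cos: cos * 8 for cos in range(8)}


def apply_overrides(base, overrides):
    """Return a copy of `base` with selected entries remapped."""
    table = dict(base)
    table.update(overrides)
    return table


cos_to_dscp = apply_overrides(DEFAULT_COS_TO_DSCP, {5: 46})  # voice CoS 5 -> EF
```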
(config-if)# mls qos trust cos|dscp
; apply trust cos or trust dscp only if suitable device is discovered via CDP
(config-if)# mls qos trust device <TYPE>
; service-policy on SVI is applied to whole VLAN
(config-if)# mls qos vlan-based
; has effect only on ingress
(config-if)# mls qos dscp-mutation <NAME>
# show mls qos
# show mls qos map cos-dscp
# show mls qos map interface [<INTF>] [policers] [queuing]
SRR
shared round robin, shaped round robin
weighted tail drop (WTD) uses internal DSCP to determine Tn and queue
Selective packet discard (SPD)
threshold is auto-calculated from the minimum input queue depth across all interfaces; if exceeded on one interface, the action applies to all interfaces
modes:
regular: default
aggressive:
default for hardware-based platforms
drop packets that generate ICMP error, if number exceeds min-threshold
; enabled by default
(config)# spd enable
(config)# spd headroom <pkts>
(config)# spd extended-headroom <pkts>
(config)# ip|ipv6 spd mode aggressive
; disables autocalculation; SPD action – per interface
(config)# ip|ipv6 spd queue min-threshold <pkts>
(config)# ip|ipv6 spd queue max-threshold <pkts>
; 75 by default, has to be changed on all interfaces
(config-if)# hold-queue <pkt> in|out