Voice

  1. Physics
    1. Reflection
    2. Background noise
    3. Room design
  2. Hardware
    1. Speaker
    2. Cable
    3. Connector
    4. Microphone
    5. Acoustic echo canceller (AEC)
    6. Sampling
    7. Phones
      1. Cisco 7800
      2. Cisco 8800
  3. Public switched telephone network (PSTN)
    1. Plain old telephone service (POTS)
    2. Private branch exchange (PBX)
    3. Direct current signalling
    4. Single frequency
    5. Dual-tone multifrequency signalling
    6. CLASS
    7. PSTN VPN
  4. VoIP
    1. Real-time transport protocol (RTP)
    2. RTP control protocol (RTCP)
    3. Voice VLAN
  5. Codecs
    1. G.114
    2. G.711
    3. G.722
    4. G.728
    5. G.729
    6. Advanced audio coding low delay (AAC-LD)
    7. Internet low bitrate codec (iLBC)
    8. Internet speech audio codec (iSAC)
    9. Codec features

Physics

Reflection

  • types
    1. direct sound
    2. early reflection
      • 30-40ms delay after direct
      • not distinguished by brain
    3. echo
      • coherent late reflection
      • always present in analog signal: Rx/Tx crosstalk, port impedance
    4. reverberation
      • incoherent late reflection
      • bad for voice
      • useful for music
  • reverberation time (RT60): time for the sound level to decay by 60 dB
    • affecting factors
      • shape of a room
      • size of a room
      • objects in a room
      • humidity
    • increased by: wooden doors
    • decreased by: carpets, curtains
  • noise reduction coefficient (NRC)
    • 0 ≡ complete reflection
    • 1 ≡ complete absorption
    • average of sound absorption coefficients at 250, 500, 1000 and 2000 Hz
    • 0.75 for acoustic panels
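
The NRC averaging described above can be sketched as follows. This is a minimal illustration, assuming the four standard octave bands (250, 500, 1000, 2000 Hz) and rounding to the nearest 0.05; the function name `nrc` and the panel coefficients are hypothetical, not measured data.

```python
def nrc(coefficients):
    """Average four absorption coefficients and round to the nearest 0.05."""
    if len(coefficients) != 4:
        raise ValueError("expected coefficients for 250, 500, 1000 and 2000 Hz")
    mean = sum(coefficients) / len(coefficients)
    # Snap to the 0.05 grid, then tidy floating-point residue.
    return round(round(mean / 0.05) * 0.05, 2)


# Hypothetical panel that absorbs more at higher frequencies.
panel = [0.60, 0.75, 0.80, 0.85]
print(nrc(panel))
```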

Background noise

  • should not exceed 45 dB
  • sources: HVAC, lighting bulbs, noise through walls, street noise, airplanes, trains, subway

Room design

  • thick curtains on windows
  • acoustic panels on walls
  • acoustic dropped ceiling
  • rubber mats on tables as pedestals
  • soft floor cover
  • plants

Hardware

Speaker

  • types
    • active
      • built-in amplifier, line-level
      • has to be powered
    • passive
      • requires external amplifier
      • unshielded cable is enough
      • more flexible setup

Cable

  • types
    • unbalanced
      • data and ground
      • carries noise as well
      • ground – screen around data core
      • 4-6 m
    • balanced
      • +data, -data and ground
      • 2×data = (+data + noise) – (-data + noise)
      • ground – screen around ±data core
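
The noise cancellation on a balanced cable can be illustrated with a few samples. This is only a sketch: the sample values are hypothetical, and it assumes the interference is induced identically on both conductors, which real cables only approximate.

```python
def balanced_receive(hot, cold):
    """Differential receiver: subtract the inverted leg from the normal leg."""
    return [h - c for h, c in zip(hot, cold)]


signal = [0.5, -0.25, 0.125]   # hypothetical audio samples
noise = [0.25, 0.25, -0.5]     # same interference on both conductors

hot = [s + n for s, n in zip(signal, noise)]    # +data + noise
cold = [-s + n for s, n in zip(signal, noise)]  # -data + noise

recovered = balanced_receive(hot, cold)         # 2 x data, noise cancels
print(recovered)
```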

Connector

  • RCA
    • unbalanced
    • analog TV (yellow, white, red), white+red ≡ stereo sound
  • tip-sleeve (TS)
    • unbalanced
    • 1 ring ≡ Jack
  • tip-ring-sleeve (TRS)
    • balanced
    • 2 rings
  • XLR
    • balanced

Microphone

  • needs headroom below the amplifier's maximum level to avoid clipping (the tops of the waveform are cut off)
    • hard clipping: digital sound, ≡ distortion
    • soft clipping: analog sound, ≡ overdrive, bass-guitars
  • automatic gain control (AGC): amplification tuning (e.g., conference microphones)
  • specifications
    • dynamic range: range of amplitudes (quietest to loudest) that can be sensed
    • frequency response: frequency range
    • polar pattern
  • types
    • passive
      • ≡ dynamic
      • does not require power
      • membrane with permanent magnet
    • active
      • ≡ condenser
      • electric field is changed due to vibration of condenser’s plates
  • phantom power: const DC level
    • can power active mic
    • has no effect on passive mic
  • form-factors
    • handheld: usually dynamic
    • lapel: ≡ tie-clip, usually active and directional
    • podium: ≡ desktop, usually active
    • ceiling-mounted: ≡ choir, active, high sensitivity
    • boundary: ≡ PZM (pressure zone mic), active, tabletop, low-profile
  • direction
    • directional: higher sensitivity
    • omnidirectional: susceptible to reverberation
  • distance factor: (distance to record SOUND with mic)/(distance to record SOUND with omnidirectional mic)
  • critical distance: (distance to target mic)/(distance to closest mic)
    • 1:3 ≡ 3m to target mic, 9m to any other mic
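
The difference between hard and soft clipping mentioned at the start of this section can be sketched numerically. The `tanh` curve is an assumption chosen for illustration (one of several common soft-clipping shapes), and the 1.5× overdrive is a hypothetical gain.

```python
import math


def hard_clip(x, limit=1.0):
    """Cut the waveform flat at +/- limit."""
    return max(-limit, min(limit, x))


def soft_clip(x, limit=1.0):
    """Round the peaks off gradually; tanh is one possible curve."""
    return limit * math.tanh(x / limit)


# A sine wave driven at 1.5x the maximum level.
wave = [1.5 * math.sin(2 * math.pi * n / 16) for n in range(16)]
hard = [hard_clip(s) for s in wave]
soft = [soft_clip(s) for s in wave]

print(max(hard), round(max(soft), 3))
```

Samples below the limit pass through `hard_clip` unchanged, while `soft_clip` compresses them slightly even before the limit is reached.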

Acoustic echo canceller (AEC)

  • subtracts the output signal from the input signal; the output signal is first run through non-linear processing (NLP) or a digital signal processor (DSP)
  • DSP models room to calculate echo delay
  • limitation:
    • cannot handle acoustic anomalies
    • depends on room acoustics
    • speaker must be close to mic (≈ 0.5m)
    • network delay < 200ms
  • if several AEC-capable devices are chained (e.g., mic and amplifier), AEC must be enabled on a single device only; otherwise the second NLP receives the first NLP's output rather than the pure input signal
  • echo tail: time that AEC waits to receive echo on receiver
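
The core subtraction step can be sketched as below. This is a deliberately simplified model, assuming the echo is a single delayed and attenuated copy of the far-end signal with a known delay and gain; a real AEC estimates these adaptively, and the NLP stage is not modelled. All names and values are hypothetical.

```python
def cancel_echo(mic, far_end, delay, gain):
    """Subtract a delayed, attenuated copy of the far-end signal from the mic."""
    cleaned = []
    for n, sample in enumerate(mic):
        echo = gain * far_end[n - delay] if n >= delay else 0.0
        cleaned.append(sample - echo)
    return cleaned


far_end = [1.0, 0.5, -0.5, 0.0, 0.0, 0.0]     # what the loudspeaker plays
near_end = [0.0, 0.0, 0.25, 0.25, 0.0, 0.0]   # local talker
delay, gain = 2, 0.5                          # hypothetical room model

# The microphone picks up the local talker plus the echo of the far end.
mic = [
    near + (gain * far_end[n - delay] if n >= delay else 0.0)
    for n, near in enumerate(near_end)
]

print(cancel_echo(mic, far_end, delay, gain))
```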

Sampling

  • each extra bit of depth ≡ ≈ 6 dB SNR increase, in a model where noise affects only the least significant bits
  • 22 bit depth – max human sensitivity
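
The 6 dB per bit rule follows from 20·log10(2) ≈ 6.02 dB. The sketch below assumes the textbook model (full-scale sine wave, uniformly distributed quantization noise), which adds a constant of about 1.76 dB; the function name is hypothetical.

```python
import math


def quantization_snr_db(bits):
    """Ideal SNR of a full-scale sine quantized with `bits` of depth."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


for bits in (8, 16, 22, 24):
    print(bits, round(quantization_snr_db(bits), 1))
```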

Phones

Cisco 7800

  • voice-only
  • PoE-capable
  • hold, call forward, call transfer
  • CUCM, Webex, Expressway MRA + CUCM (VPN-less registration)
  • monochrome display
  • power
    • power save: disable screen and button highlighting after inactivity
    • power save plus: scheduled screen shutdown

Cisco 8800

  • HD video, voice
  • coloured display
  • Bluetooth, Wi-Fi
  • mobile voice (MV): move voice from mobile to 8800

Public switched telephone network (PSTN)

Plain old telephone service (POTS)

  • call routing – Class 5 switch

Private branch exchange (PBX)

  • call routing in circuit-switched networks (ISDN, POTS) from Enterprise side
  • functions
    • call hold, call transfer, call waiting, call return
    • conferencing
    • voice mail
    • auto-attendant

Direct current signalling

  • permanently holds trunk even without active call
  • types
    • subscriber loop: off-hook → DC current flows (-48 V)
    • E&M: recEive & transMit, uses two wires (E and M) for signalling

Single frequency

  • 2600 Hz tone ≡ on-hook/off-hook event
  • inter-office trunks
    • bypass user facilities with blue box and call for free (billing only on 1st switch)
  • multi-frequency: several tones are available, system-dependent

Dual-tone multifrequency signalling

  • tone signalling for push-button phones (rotary-dial phones use loop-disconnect pulse dialling)
  • relay methods
    1. named telephony event (NTE)
      • RFC 2833 (obsoleted by RFC 4733)
      • in-band in the RTP stream (distinguished from audio by payload type)
    2. key press markup language (KPML)
      • RFC 4730
      • SIP SUBSCRIBE to register for DTMF
      • SIP NOTIFY to receive numbers as XML payload
    3. unsolicited notify (UN)
      • transport numbers in SIP NOTIFY without registration
      • 10-symbol blocks
    4. H.245 signal
      • H.323
      • explicit signalling
    5. H.245 alphanumeric
      • H.323
      • explicit signalling
      • transmits less info compared to H.245 signal
    6. Cisco proprietary RTP
      • ≈ NTE
    7. SCCP
      • explicit signalling
  • 4×4 keypad: button pressed ≡ send 2 tones
           1209 Hz   1336 Hz   1477 Hz   1633 Hz
  697 Hz      1         2         3         A
  770 Hz      4         5         6         B
  852 Hz      7         8         9         C
  941 Hz      *         0         #         D
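
The keypad grid maps directly to a lookup of one row tone and one column tone. A minimal sketch follows; the function names, the 50 ms duration, and the 8 kHz sample rate are illustrative assumptions.

```python
import math

ROWS = (697, 770, 852, 941)          # low-group frequencies, Hz
COLS = (1209, 1336, 1477, 1633)      # high-group frequencies, Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_freqs(key):
    """Return the (low, high) tone pair for a single keypad symbol."""
    if len(key) == 1:
        for row, symbols in enumerate(KEYPAD):
            col = symbols.find(key)
            if col != -1:
                return ROWS[row], COLS[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration_s=0.05, rate_hz=8000):
    """Sum the two sine tones; duration and rate are illustrative defaults."""
    low, high = dtmf_freqs(key)
    count = round(duration_s * rate_hz)
    return [
        0.5 * math.sin(2 * math.pi * low * n / rate_hz)
        + 0.5 * math.sin(2 * math.pi * high * n / rate_hz)
        for n in range(count)
    ]


print(dtmf_freqs("5"))   # (770, 1336)
```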

CLASS

  • PSTN, SS7
  • features
    • customer-originated trace: dial code after harassing call ≡ call police
    • camp on: automatic callback, if called is busy
    • automatic recall: auto-call
    • display features: caller name, caller number
    • call screening: ACL on numbers

PSTN VPN

VoIP

  • VVID – voice VLAN ID
  • PVID – port VLAN ID (common traffic), native VLAN for multi-VLAN access port
  • phone is not physically protected (unlike equipment in a server room or locked cabinet) ≡ QoS trust boundary is moved outside of controlled space
    • protection: 802.1x, traffic policing, multi-VLAN access port
  • ≈ 106 kbps per call
  • packet rate is fixed = 50 pps
  • EF PHB

Real-time transport protocol (RTP)

  • RFC 3550
  • UDP, even ports
  • end-to-end media streaming
  • timestamps
    • jitter compensation
    • packet loss detection
    • out-of-order packets detection
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X| CC  |M| Payload Type  |        Sequence number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Synchronization source identifier (SSRC)           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              Contributing Source Identifier (CSRC)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

P: padding

  • 1: last byte = byte count in padding

X: extension, 1 ≡ extension header present
CC: CSRC count
Payload type

  • 0: PCMU
  • 3: GSM
  • 4: G.723
  • 9: G.722
  • 15: G.728
  • 18: G.729
  • 26: JPEG
  • 31: H.261
  • 32: MPV
  • 34: H.263

SSRC:

  • unique within session
  • source ID: microphone, camera, etc.

CSRC:

  • extra source IDs (e.g., SSRC = mixer, CSRC = active mic)
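
The header layout above can be decoded with a few bit masks. The sketch below uses Python's `struct` module; the function name, the dictionary keys, and the sample packet values are hypothetical, and the extension header and payload are not parsed.

```python
import struct


def parse_rtp_header(packet):
    """Decode the fixed 12-byte RTP header plus any CSRC entries."""
    if len(packet) < 12:
        raise ValueError("RTP header needs at least 12 bytes")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = byte0 & 0x0F
    if len(packet) < 12 + 4 * cc:
        raise ValueError("packet too short for its CSRC count")
    csrc = struct.unpack(f"!{cc}I", packet[12:12 + 4 * cc])
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": cc,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": list(csrc),
    }


# Hypothetical packet: V=2, no padding or extension, PT=0 (PCMU), no CSRC.
sample = struct.pack("!BBHII", 0x80, 0x00, 1234, 160, 0xDEADBEEF)
header = parse_rtp_header(sample)
print(header["version"], header["payload_type"], header["sequence"])
```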

RTP control protocol (RTCP)

  • RFC 3550
  • UDP, RTP port + 1
  • flow monitoring, flow synchronization
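
One of the statistics carried in RTCP reports is interarrival jitter, which RFC 3550 defines as a running estimate over RTP timestamps. A minimal sketch of that estimator follows; it assumes send and receive times are already expressed in the same timestamp units, and the function name and sample values are hypothetical.

```python
def interarrival_jitter(send_ts, recv_ts):
    """Running jitter estimate per RFC 3550: J += (|D| - J) / 16."""
    jitter = 0.0
    for i in range(1, len(send_ts)):
        # D: change in transit time between consecutive packets.
        d = (recv_ts[i] - recv_ts[i - 1]) - (send_ts[i] - send_ts[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


# Hypothetical stream: 160-unit spacing (20 ms at 8 kHz), third packet 10 units late.
sent = [0, 160, 320]
received = [0, 160, 330]
print(interarrival_jitter(sent, received))   # 0.625
```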

Voice VLAN

  • access ports only (trunk can be configured, not supported)
    • modes
      1. multi-VLAN access port, if negotiation over CDPv2 is successful
      2. trunk port, if negotiation is successful over LLDP (no required info for multi-VLAN in MED)
        • native VLAN ≡ configured access VLAN
        • can be used along with port-security
    • disabled by default
    • PortFast is auto-enabled; when voice VLAN is disabled, PortFast is NOT disabled
  • Cisco CoS (RFC is reverse) defaults:
    • CoS = 5: voice
    • CoS = 3: voice control
  • required to accept tagged frames on access ports: CoS, platform-dependent
  • modes
    1. vlan-id
      • trunk
      • voice in VVID
      • data in PVID ≡ access VLAN
    2. dot1p
      • trunk
      • voice in VLAN 0
      • data in PVID
    3. untagged
      • trunk
      • voice in PVID
      • data in PVID
    4. none
      • not trunk
      • all data in access VLAN
      • default
(config-if)# switchport voice vlan <MODE>

Codecs

  • mean opinion score (MOS): subjective user perception of audio quality
  • voice activity detection (VAD)
    • stop sending packets after hangover pause (≈ 200 ms)
    • does not work if background noise is present
    • speech start is clipped
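
The hangover behaviour of VAD can be sketched with a simple energy threshold. This is an assumption-laden illustration: real detectors are more elaborate, and the threshold, the 20 ms frame length, and the function name are hypothetical. With 20 ms frames, a 200 ms hangover corresponds to 10 frames.

```python
def vad_decisions(energies, threshold, hangover_frames):
    """Return one send/suppress flag per frame, with a hangover after speech."""
    decisions = []
    silent_run = hangover_frames  # start idle: nothing to hang over from
    for energy in energies:
        if energy >= threshold:
            silent_run = 0
            decisions.append(True)
        else:
            silent_run += 1
            # Keep sending until the hangover period has elapsed.
            decisions.append(silent_run <= hangover_frames)
    return decisions


# Hypothetical frame energies; hangover shortened to 2 frames for readability.
frames = [0, 5, 0, 0, 0, 5]
print(vad_decisions(frames, threshold=1, hangover_frames=2))
# [False, True, True, True, False, True]
```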

G.114

  • 150ms one-way delay

G.711

  • packet loss concealment (PLC)
    • if sample is lost – repeat last sample with less amplitude
    • up to 20 ms loss
  • BW per call
    • min: 87.2 kbps
    • recommended: 128 kbps
  • pulse-code modulation (PCM)
  • 300-3400 Hz
  • uncompressed
  • mandatory for H.320 (circuit-switched) and H.323 (packet-switched)
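
The 87.2 kbps minimum can be reproduced from the packet size and packet rate. The sketch below assumes 40 bytes of IP/UDP/RTP headers (20 + 8 + 12) and 18 bytes of Ethernet overhead (14-byte header plus 4-byte FCS); these overhead figures and the function name are assumptions for illustration, and other link types would change the result.

```python
def call_bandwidth_bps(payload_bytes, packets_per_second,
                       l3_l4_overhead=40, l2_overhead=18):
    """Per-direction bandwidth of one RTP stream, headers included."""
    frame_bytes = payload_bytes + l3_l4_overhead + l2_overhead
    return frame_bytes * 8 * packets_per_second


# G.711 at 50 pps carries 160 payload bytes per packet (20 ms of 64 kbps audio).
print(call_bandwidth_bps(160, 50))   # 87200 bits per second
```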

G.722

  • 64 kbps
  • lossy compressed (sub-band ADPCM)
  • 50-7000 Hz

G.728

  • 16 kbps

G.729

  • BW per call: 8 kbps
  • loss tolerance: 5%
  • lossy, compressed
  • up to 4 kHz
  • VAD

Advanced audio coding low delay (AAC-LD)

  • low overhead MPEG-4 audio transport multiplex (LATM)
  • 48-128 kbps

Internet low bitrate codec (iLBC)

Internet speech audio codec (iSAC)

  • 10-32 kbps bitrate on peak activity
  • compressed
  • mode
    • channel-adaptive
      • lossy
      • adapts to channel dynamic state
      • in-band signalling
    • channel-independent
      • lossless
      • bitrate is fixed in advance

Codec features

Codec              Sample     Interval   Payload     MOS   Sampling frequency
G.711              80 bytes   10 ms      160 bytes   4.1   8 kHz
G.722              80 bytes   10 ms      160 bytes   4.1   16 kHz
G.723 (6.3 kbps)   24 bytes   30 ms      24 bytes    3.9   -
G.723 (5.3 kbps)   20 bytes   30 ms      20 bytes    3.8   -
G.726 (32 kbps)    20 bytes   5 ms       80 bytes    3.8   -
G.726 (24 kbps)    15 bytes   5 ms       60 bytes    3.8   -
G.728              10 bytes   5 ms       60 bytes    3.6   -
G.729              10 bytes   10 ms      20 bytes    3.9   -
AAC-LD             -          -          -           -     20 kHz
iLBC (15.2 kbps)   38 bytes   20 ms      38 bytes    4.1   -
iLBC (13.3 kbps)   50 bytes   30 ms      50 bytes    4.1   -
iSAC               -          30/60 ms   -           -     16 kHz
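
The Sample and Interval columns together imply each codec's bitrate: one coded sample of the given size per interval. A short sketch that checks a few rows from the table above (the function name is hypothetical):

```python
def codec_bitrate_bps(sample_bytes, interval_ms):
    """Codec bitrate implied by one coded sample per interval."""
    return sample_bytes * 8 * 1000 / interval_ms


# (sample bytes, interval ms) as listed in the table above.
rows = {
    "G.711": (80, 10),
    "G.726 (32 kbps)": (20, 5),
    "G.728": (10, 5),
    "G.729": (10, 10),
    "iLBC (15.2 kbps)": (38, 20),
}
for codec, (size, interval) in rows.items():
    print(codec, codec_bitrate_bps(size, interval) / 1000, "kbps")
```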