Physics
Reflection
- types
- direct sound
- early reflection
- arrives ≈ 30-40 ms after the direct sound
- not perceived by the brain as separate from the direct sound
- echo
- coherent late reflection
- always present in analog signals: Rx/Tx crosstalk, port impedance mismatch
- reverberation
- incoherent late reflection
- bad for voice
- useful for music
- reverberation time (RT60): time for the sound level to decay by 60 dB
- affecting factors
- shape of a room
- size of a room
- objects in a room
- humidity
- increased by: wooden doors
- decreased by: carpets, curtains
- noise reduction coefficient (NRC)
- 0 ≡ complete reflection
- 1 ≡ complete absorption (nothing is reflected)
- average of sound absorption coefficients at 250, 500, 1000 and 2000 Hz, rounded to the nearest 0.05
- ≈ 0.75 for acoustic panels
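The link between room size, absorbing objects and reverberation time can be sketched with Sabine's formula, RT60 ≈ 0.161 · V / A (V in m³, A = Σ surface area × absorption coefficient). The sketch below also computes NRC as the four-band average used by ASTM C423. The room dimensions and absorption coefficients are hypothetical illustration values, and Sabine's formula is only a rough estimate for fairly absorptive, diffuse rooms.

```python
# NRC octave bands (ASTM C423 convention).
NRC_BANDS_HZ = (250, 500, 1000, 2000)


def nrc(alpha_by_band: dict[int, float]) -> float:
    """Average absorption coefficient over the NRC bands, rounded to the nearest 0.05."""
    mean = sum(alpha_by_band[f] for f in NRC_BANDS_HZ) / len(NRC_BANDS_HZ)
    return round(round(mean / 0.05) * 0.05, 2)


def sabine_rt60(volume_m3: float, surfaces: list[tuple[float, float]]) -> float:
    """Sabine estimate RT60 = 0.161 * V / A, with A = sum(area_m2 * alpha)."""
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / total_absorption


# Hypothetical 10 m x 8 m x 3 m meeting room: (area in m2, absorption coefficient).
room = [
    (80.0, 0.30),   # carpeted floor
    (80.0, 0.70),   # acoustic dropped ceiling
    (108.0, 0.05),  # painted walls
]
print(f"RT60 ≈ {sabine_rt60(240.0, room):.2f} s")
print("NRC:", nrc({250: 0.5, 500: 0.7, 1000: 0.9, 2000: 0.9}))
```

Swapping the carpet for a hard floor (α ≈ 0.02) lowers A and therefore raises RT60, which matches "decreased by carpets, curtains".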
Background noise
- should not exceed 45 dB
- sources: HVAC, light fixtures, noise through walls, street noise, airplanes, trains, subway
Room design
- thick curtains on windows
- acoustic panels on walls
- acoustic dropped ceiling
- rubber mats on tables as pedestals
- soft floor cover
- plants
Hardware
Speaker
- types
- active
- built-in amplifier, accepts line-level input
- has to be powered
- passive
- requires external amplifier
- unshielded cable is enough
- more flexible setup
Cable
- types
- unbalanced
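- signal and ground
- picks up noise along with the signal
- ground: shield around the signal core
- max length ≈ 4-6 m
- balanced
- +signal, -signal (inverted copy) and ground
- receiver: (+signal + noise) – (-signal + noise) = 2×signal, so common-mode noise cancels
- ground: shield around the ±signal cores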
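The balanced-cable subtraction (+signal + noise) – (-signal + noise) = 2×signal can be checked numerically. This is a minimal sketch that assumes ideal conditions: the interference lands identically on both conductors (pure common-mode noise), which real twisted, shielded pairs only approximate.

```python
import random


def transmit_balanced(signal, noise):
    """Drive +s on the hot conductor and -s on the cold one; interference adds equally to both."""
    hot = [s + n for s, n in zip(signal, noise)]
    cold = [-s + n for s, n in zip(signal, noise)]
    return hot, cold


def receive_balanced(hot, cold):
    """Differential input: (s + n) - (-s + n) = 2s, so common-mode noise cancels."""
    return [(h - c) / 2 for h, c in zip(hot, cold)]  # halve to restore the original level


random.seed(1)
signal = [0.1 * (i % 10) for i in range(50)]
noise = [random.uniform(-1, 1) for _ in signal]  # hypothetical hum/interference
recovered = receive_balanced(*transmit_balanced(signal, noise))
print(max(abs(r - s) for r, s in zip(recovered, signal)))  # ≈ 0, up to float rounding
```

An unbalanced cable has no inverted copy to subtract, so the receiver gets `s + n` and the noise stays in the signal.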
Connector
- RCA
- unbalanced
- analog TV connectors: yellow = composite video, white + red = left/right stereo audio
- tip-sleeve (TS)
- unbalanced
- 1 insulating ring between tip and sleeve, commonly called a jack
- tip-ring-sleeve (TRS)
- balanced
- 2 rings
- XLR
- balanced
Microphone
- level must leave headroom below the amplifier's max level to avoid clipping (flattening of waveform peaks)
- hard clipping: digital sound, ≡ distortion
- soft clipping: analog sound, ≡ overdrive, bass-guitars
- automatic gain control (AGC): automatically adjusts amplification to keep a steady level (e.g., conference microphones)
- specifications
- dynamic range: range of amplitudes (quietest to loudest) that can be sensed
- frequency response: frequency range
- polar pattern
- types
- passive
- ≡ dynamic
- does not require power
- diaphragm with a coil moving in a permanent magnet's field
- active
- ≡ condenser
- sound vibrates the capacitor's plate (diaphragm), changing the electric field and thus the output voltage
- phantom power: constant DC voltage (typically +48 V) supplied over the mic cable
- can power active mic
- has no effect on passive mic
- form-factors
- handheld: usually dynamic
- lapel: ≡ tie-clip, usually active and directional
- podium: ≡ desktop, usually active
- ceiling-mounted: ≡ choir, active, high sensitivity
- boundary: ≡ PZM (pressure zone mic), active, tabletop, low-profile
- direction
- directional: higher sensitivity
- omnidirectional: susceptible to reverberation
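- distance factor: (distance at which the mic picks up a sound with a given quality)/(distance at which an omnidirectional mic picks up the same sound with the same quality)
- critical distance (3:1 rule): (distance from source to target mic)/(distance from source to closest other mic)
- 1:3 ≡ 3 m to target mic, ≥ 9 m to any other mic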
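The distance factor can be derived from a polar pattern. This sketch assumes the standard first-order pattern model, sensitivity(θ) = A + B·cos θ with A + B = 1; the directivity factor is then Q = 1 / (A² + B²/3), and because direct sound energy falls as 1/r², a mic with directivity Q keeps the same direct-to-reverberant ratio at √Q times the omnidirectional distance. The pattern coefficients below are textbook approximations, not values from any specific product.

```python
import math

# First-order pattern model (assumption): sensitivity(theta) = A + B*cos(theta), A + B = 1.
PATTERNS = {
    "omnidirectional": (1.0, 0.0),
    "cardioid": (0.5, 0.5),
    "supercardioid": (0.366, 0.634),
    "hypercardioid": (0.25, 0.75),
    "figure-8": (0.0, 1.0),
}


def directivity_factor(a: float, b: float) -> float:
    """Q = 4*pi / integral of (A + B cos theta)^2 over the sphere = 1 / (A^2 + B^2/3)."""
    return 1.0 / (a * a + b * b / 3.0)


def distance_factor(a: float, b: float) -> float:
    """Same direct-to-reverberant ratio as an omni mic at sqrt(Q) times the distance."""
    return math.sqrt(directivity_factor(a, b))


for name, (a, b) in PATTERNS.items():
    print(f"{name:16} DF ≈ {distance_factor(a, b):.2f}")
```

Under this model a cardioid can sit about 1.7× farther from the talker than an omnidirectional mic for similar pickup quality, and a hypercardioid about 2×.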
Acoustic echo canceller (AEC)
- subtracts the output (speaker) signal, as it returns through the room, from the input (mic) signal; the result is run through non-linear processing (NLP) on a digital signal processor (DSP)
- the DSP models the room to estimate the echo delay
- limitation:
- cannot handle acoustic anomalies
- depends on room acoustics
- speaker must be close to mic (≈ 0.5m)
- network delay < 200ms
- if several AEC-capable devices are chained (e.g., mic and amplifier), enable AEC on only one device: otherwise, the second NLP receives the first NLP's output instead of the pure input signal
- echo tail: how long the AEC waits for the echo of the far-end signal to return, i.e. the longest echo delay it can cancel
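The "model the room, then subtract" step can be sketched with a normalized LMS (NLMS) adaptive filter, a common textbook choice; the notes do not say which algorithm a given product uses, so treat NLMS as an assumed stand-in. The filter length fixes the echo tail (tail = taps / sample rate; 32 taps at 8 kHz is only 4 ms, real devices use far longer tails). The NLP stage and double-talk detection are deliberately left out, and the echo path is a hypothetical two-reflection room.

```python
import random


def nlms_aec(far_end, mic, taps=32, mu=0.5, eps=1e-6):
    """Subtract an adaptively estimated echo of the far-end signal from the mic signal (NLMS)."""
    weights = [0.0] * taps   # estimated speaker-to-mic echo path (impulse response)
    history = [0.0] * taps   # latest far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate                      # mic minus predicted echo
        step = mu * error / (eps + sum(h * h for h in history))
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def energy(samples):
    return sum(v * v for v in samples)


# Hypothetical echo path: gain 0.6 after 5 samples, -0.3 after 12 samples; no near-end talker.
random.seed(0)
far = [random.uniform(-1, 1) for _ in range(4000)]
echo = [0.6 * (far[n - 5] if n >= 5 else 0.0) - 0.3 * (far[n - 12] if n >= 12 else 0.0)
        for n in range(len(far))]
out = nlms_aec(far, echo)
print(f"echo energy {energy(echo[-500:]):.3f} -> residual {energy(out[-500:]):.2e}")
```

If the real echo arrives later than the tail (here beyond 32 samples), the filter has no coefficient for it and that part of the echo passes through uncancelled.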
Sampling
- each extra bit of depth adds ≈ 6 dB SNR (in the model where quantization noise affects only the LSB)
- 22 bit depth – max human sensitivity
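The ≈ 6 dB per bit rule follows from the standard quantization-noise model: uniform quantization noise has power Δ²/12, and a full-scale sine over an N-bit range has power (2^(N-1)·Δ)²/2, so SNR = 1.5 · 4^N, i.e. 20·log10(2)·N + 10·log10(1.5) ≈ 6.02·N + 1.76 dB. This sketch assumes an ideal quantizer and a full-scale sine input.

```python
import math


def quantization_snr_db(bits: int) -> float:
    """Ideal SNR of a full-scale sine through an N-bit quantizer: 20*log10(2)*N + 10*log10(1.5)."""
    return 20 * math.log10(2) * bits + 10 * math.log10(1.5)


for bits in (8, 16, 22, 24):
    print(bits, "bits ->", round(quantization_snr_db(bits), 1), "dB")
```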
Phones
Cisco 7800
- voice-only
- PoE-capable
- hold, call forward, call transfer
- call control: CUCM, Webex, or CUCM via Expressway MRA (VPN-less registration)
- monochrome display
- power
- power save: disable screen and button highlighting after inactivity
- power save plus: scheduled screen shutdown
Cisco 8800
- HD video, voice
- coloured display
- Bluetooth, Wi-Fi
- mobile voice (MV): move an active call from a mobile phone to the 8800
Public switched telephone network (PSTN)
Plain old telephony system (POTS)
- call routing – Class 5 switch
Private branch exchange (PBX)
- call routing in circuit-switched networks (ISDN, POTS) on the enterprise side
- functions
- call hold, call transfer, call waiting, call return
- conferencing
- voice mail
- auto-attendant
Direct current signalling
- permanently holds trunk even without active call
- types
- subscriber loop: going off-hook closes the loop, so DC current flows (-48 V)
- E&M: recEive & transMit, uses two wires (E and M) for signalling
Single frequency
- 2600 Hz tone signals on-hook/off-hook state
- used on inter-office trunks
- vulnerability: a blue box injected 2600 Hz in-band from the user side to seize a trunk and call for free (billing happened only on the 1st switch)
- multi-frequency: several tones are available, system-dependent
Dual-tone multifrequency signalling
- tone signalling for push-button phones (rotary-dial phones use loop-disconnect pulses)
- relay methods
- named telephony event (NTE)
- RFC 2833
- in-band in the RTP stream (distinguished from voice packets by payload type)
- key press markup language (KPML)
- RFC 4730
- SIP SUBSCRIBE to register for DTMF
- SIP NOTIFY to receive numbers as XML payload
- unsolicited notify (UN)
- transports digits in SIP NOTIFY without a prior SUBSCRIBE
- 10-symbol blocks
- H.245 signal
- H.323
- explicit signalling
- H.245 alphanumeric
- H.323
- explicit signalling
- transmits less info compared to H.245 signal
- Cisco proprietary RTP
- ≈ NTE
- SCCP
- explicit signalling
- 4×4 keypad: each button press sends 2 simultaneous tones (row frequency + column frequency)
Row / column | 1209 Hz | 1336 Hz | 1477 Hz | 1633 Hz |
---|---|---|---|---|
697 Hz | 1 | 2 | 3 | A |
770 Hz | 4 | 5 | 6 | B |
852 Hz | 7 | 8 | 9 | C |
941 Hz | * | 0 | # | D |
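The keypad table can be exercised end to end: generate the two tones for a key, then detect which row and column frequency is strongest. This sketch uses the Goertzel algorithm, a common choice for DTMF detection but an assumption here, since the notes do not name a detector. It assumes an 8 kHz sample rate and 50 ms clean tones; real detectors also check twist, signal-to-noise and minimum duration.

```python
import math

LOW = (697, 770, 852, 941)         # row frequencies, Hz
HIGH = (1209, 1336, 1477, 1633)    # column frequencies, Hz
KEYS = ("123A", "456B", "789C", "*0#D")
RATE = 8000                        # narrowband telephony sample rate


def dtmf_tone(key, duration_s=0.05, rate=RATE):
    """Sum of the row and column sine waves for one key."""
    row = next(i for i, r in enumerate(KEYS) if key in r)
    col = KEYS[row].index(key)
    n = int(duration_s * rate)
    return [0.5 * math.sin(2 * math.pi * LOW[row] * t / rate)
            + 0.5 * math.sin(2 * math.pi * HIGH[col] * t / rate) for t in range(n)]


def goertzel_power(samples, freq, rate=RATE):
    """Signal power near one frequency, computed with a second-order recursive filter."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def detect_key(samples, rate=RATE):
    """Pick the strongest row tone and the strongest column tone."""
    row = max(range(4), key=lambda i: goertzel_power(samples, LOW[i], rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, HIGH[i], rate))
    return KEYS[row][col]


print(detect_key(dtmf_tone("5")))  # expected: 5
```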
Custom local area signalling services (CLASS)
- PSTN, SS7
- features
- customer-originated trace: dial a code after a harassing call; the trace is passed to the police
- camp on: automatic callback once a busy called party becomes free
- automatic recall: call back the last incoming caller
- display features: caller name, caller number
- call screening: ACL on numbers
PSTN VPN
VoIP
- VVID – voice VLAN ID
- PVID – port VLAN ID (common traffic), native VLAN for multi-VLAN access port
- phone is not physically protected (unlike equipment in a server room or locked cabinet), so trusting its QoS marking moves the QoS trust boundary outside the controlled space
- protection: 802.1x, traffic policing, multi-VLAN access port
- ≈ 106 kbps per call
- packet rate is fixed at 50 pps (20 ms of audio per packet)
- marked with EF PHB (DSCP 46)
Real-time transport protocol (RTP)
- RFC 3550
- UDP, even ports
- end-to-end media streaming
- timestamps
- jitter compensation
- packet loss detection
- out-of-order packets detection
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M| Payload Type|        Sequence number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Synchronization source identifier (SSRC)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Contributing Source Identifier (CSRC)             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
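V: version, always 2
P: padding
- 1: packet ends with padding; its last byte = number of padding bytes
X: extension, 1 ≡ extension header present
CC: CSRC count (0-15)
M: marker, meaning defined by the profile (e.g., start of a talkspurt)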
Payload type
- 0: PCMU
- 3: GSM
- 4: G.723
- 9: G.722
- 15: G.728
- 18: G.729
- 26: JPEG
- 31: H.261
- 32: MPV
- 34: H.263
SSRC:
- unique within session
- source ID: microphone, camera, etc.
CSRC:
- extra source IDs (e.g., SSRC = mixer, CSRC = active mic)
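The 12-byte fixed header above maps directly onto Python's `struct` module (network byte order). This is a minimal sketch for illustration: it always sets P = 0 and X = 0, does not handle header extensions or padding, and the SSRC value is hypothetical (real senders pick it randomly).

```python
import struct


def build_rtp_header(payload_type, seq, timestamp, ssrc, marker=False, csrcs=()):
    """Pack V=2, P=0, X=0, CC, M, PT, sequence, timestamp, SSRC and optional CSRC list."""
    if len(csrcs) > 15:
        raise ValueError("CC is 4 bits: at most 15 CSRCs")
    first = (2 << 6) | len(csrcs)                      # V=2 in the top 2 bits, CC in the low 4
    second = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", first, second, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


def parse_rtp_header(packet):
    """Unpack the fixed header and any CSRC identifiers that follow it."""
    first, second, seq, timestamp, ssrc = struct.unpack_from("!BBHII", packet)
    cc = first & 0x0F
    csrcs = struct.unpack_from(f"!{cc}I", packet, 12)
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "cc": cc,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": list(csrcs),
    }


# PCMU (PT 0) at 8 kHz with 20 ms packets: timestamp advances by 160 samples per packet.
ssrc = 0x1234ABCD  # hypothetical
packets = [build_rtp_header(0, seq, seq * 160, ssrc) for seq in range(3)]
print([parse_rtp_header(p)["timestamp"] for p in packets])  # [0, 160, 320]
```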
RTP control protocol (RTCP)
- RFC 3550
- UDP, RTP port + 1
- flow monitoring, flow synchronization
Voice VLAN
- access ports only (can be configured on a trunk port, but this is not supported)
- modes
- multi-VLAN access port, if negotiation over CDPv2 is successful
- trunk port, if negotiation over LLDP is successful (LLDP-MED lacks the info required for multi-VLAN)
- native VLAN ≡ configured access VLAN
- can be used along with port-security
- disabled by default
- PortFast is auto-enabled; when voice VLAN is disabled, PortFast is NOT disabled
- Cisco CoS (RFC is reverse) defaults:
- CoS = 5: voice
- CoS = 3: voice control
- required to accept tagged frames on access ports: CoS, platform-dependent
- modes
- vlan-id
- trunk
- voice in VVID
- data in PVID ≡ access VLAN
- dot1p
- trunk
- voice in VLAN 0
- data in PVID
- untagged
- trunk
- voice in PVID
- data in PVID
- none
- not trunk
- all data in access VLAN
- default
(config-if)# switchport voice vlan <MODE>
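What the phone puts on the wire in each mode comes down to the 802.1Q tag: TPID 0x8100 followed by a 16-bit TCI of PCP (CoS, 3 bits), DEI (1 bit) and VLAN ID (12 bits). This sketch builds that tag; the voice VLAN ID 110 is hypothetical, and the mode-to-tag mapping mirrors the list above rather than any switch output.

```python
import struct

TPID_8021Q = 0x8100


def dot1q_tag(vlan_id: int, cos: int, dei: int = 0) -> bytes:
    """4-byte 802.1Q tag: TPID, then TCI = PCP (3 bits) | DEI (1 bit) | VID (12 bits)."""
    if not 0 <= vlan_id <= 4094 or not 0 <= cos <= 7 or dei not in (0, 1):
        raise ValueError("VLAN ID must be 0-4094, CoS 0-7, DEI 0/1")
    tci = (cos << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


voice_tags = {
    "vlan-id": dot1q_tag(110, cos=5),  # voice tagged with the (hypothetical) VVID 110
    "dot1p": dot1q_tag(0, cos=5),      # priority tag only: VLAN 0, CoS 5
    "untagged": b"",                   # voice sent without a tag, in the PVID
}
for mode, tag in voice_tags.items():
    print(f"{mode:9} {tag.hex() or '(no tag)'}")
```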
Codecs
- mean opinion score (MOS): subjective user perception of audio quality
- voice activity detection (VAD)
- stop sending packets after hangover pause (≈ 200 ms)
- performs poorly when background noise is present
- start of speech may be clipped (front-end clipping)
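The hangover behaviour can be sketched as a per-frame decision over frame energies. This is a deliberately simple energy-threshold VAD, an assumption for illustration; the -40 dB threshold and 20 ms frames are hypothetical values, and the 200 ms hangover comes from the note above.

```python
def vad_decisions(frame_energies_db, threshold_db=-40.0, frame_ms=20, hangover_ms=200):
    """Per-frame send/suppress flags: energy above threshold counts as speech."""
    hangover_frames = hangover_ms // frame_ms
    remaining = 0
    decisions = []
    for energy in frame_energies_db:
        if energy > threshold_db:
            remaining = hangover_frames   # speech: re-arm the hangover timer
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1                # silence, but still inside the hangover pause
            decisions.append(True)
        else:
            decisions.append(False)       # silence after the hangover: stop sending packets
    return decisions


# 3 speech frames, then 15 silent frames: 10 frames of hangover, then suppression.
print(vad_decisions([-20.0] * 3 + [-60.0] * 15))
```

The same logic shows both limitations: background noise above the threshold keeps every frame "speech", and a soft speech onset below the threshold is suppressed, which clips the start of speech.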
G.114
- ≤ 150 ms one-way delay recommended
G.711
- packet loss concealment (PLC)
- if sample is lost – repeat last sample with less amplitude
- up to 20 ms loss
- BW per call
- min: 87.2 kbps
- recommended: 128 kbps
- pulse-code modulation (PCM)
- 300-3400 Hz
- uncompressed
- mandatory for H.320 (circuit-switched) and H.323 (packet-switched)
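G.711 reaches 64 kbps by sampling at 8 kHz and sending 8 bits per sample, after companding with µ-law (or A-law). This sketch uses the continuous µ-law curve F(x) = sgn(x)·ln(1 + µ|x|)/ln(1 + µ) with µ = 255, followed by uniform 8-bit quantization. Note that real G.711 encoders use a segmented piecewise-linear approximation of this curve, so these codes are not bit-exact G.711 values.

```python
import math

MU = 255  # µ-law parameter (North America/Japan); Europe uses A-law


def mu_law_compress(x: float) -> float:
    """Map x in [-1, 1] to [-1, 1], giving quiet samples more resolution."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def encode_8bit(x: float) -> int:
    y = mu_law_compress(max(-1.0, min(1.0, x)))
    return round((y + 1) / 2 * 255)      # uniform 8-bit quantization of the companded value


def decode_8bit(code: int) -> float:
    return mu_law_expand(code / 255 * 2 - 1)


for x in (0.01, 0.1, 0.5):
    print(x, "->", encode_8bit(x), "->", round(decode_8bit(encode_8bit(x)), 4))
```

Because the step size grows with amplitude, the relative error stays roughly constant (a few percent here) from quiet to loud samples, which is the point of companding.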
G.722
- 64 kbps
- lossy, compressed (sub-band ADPCM)
- 50-7000 Hz
G.728
- 16 kbps
G.729
- codec bitrate: 8 kbps (excluding IP/UDP/RTP overhead)
- loss tolerance: 5%
- lossy, compressed
- up to 4 kHz
- VAD
Advanced audio coding low delay (AAC-LD)
- low overhead MPEG-4 audio transport multiplex (LATM)
- 48-128 kbps
Internet low bitrate codec (iLBC)
Internet speech audio codec (iSAC)
- 10-32 kbps bitrate on peak activity
- compressed
- mode
- channel-adaptive
- lossy
- adapts to channel dynamic state
- in-band signalling
- channel-independent
- lossless
- bitrate is fixed in advance
Codec features
Codec | Sample size | Sample interval | Default payload | MOS | Sampling frequency |
---|---|---|---|---|---|
G.711 | 80 bytes | 10 ms | 160 bytes | 4.1 | 8 kHz |
G.722 | 80 bytes | 10 ms | 160 bytes | 4.1 | 16 kHz |
G.723 (6.3 kbps) | 24 bytes | 30 ms | 24 bytes | 3.9 | |
G.723 (5.3 kbps) | 20 bytes | 30 ms | 20 bytes | 3.8 | |
G.726 (32 kbps) | 20 bytes | 5 ms | 80 bytes | 3.8 | |
G.726 (24 kbps) | 15 bytes | 5 ms | 60 bytes | 3.8 | |
G.728 | 10 bytes | 5 ms | 60 bytes | 3.6 | |
G.729 | 10 bytes | 10 ms | 20 bytes | 3.9 | |
AAC-LD | | | | | 20 kHz audio bandwidth |
iLBC (15.2 kbps) | 38 bytes | 20 ms | 38 bytes | 4.1 | |
iLBC (13.3 kbps) | 50 bytes | 30 ms | 50 bytes | 4.1 | |
iSAC | | 30/60 ms | | | 16 kHz |
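Per-call bandwidth on the wire follows from the payload sizes in the table plus per-packet headers. This sketch assumes IPv4 (20 bytes) + UDP (8) + RTP (12) = 40 bytes, plus 18 bytes of Ethernet II header and FCS; it ignores cRTP, 802.1Q tags (+4 bytes) and Layer 1 preamble/inter-frame gap. Under these assumptions G.711 with 160-byte payloads every 20 ms gives the 87.2 kbps quoted for G.711 above.

```python
IP_UDP_RTP = 40   # bytes: IPv4 20 + UDP 8 + RTP 12
ETHERNET = 18     # bytes: Ethernet II header 14 + FCS 4


def voip_bandwidth_kbps(payload_bytes: int, interval_ms: float, l2_overhead: int = ETHERNET) -> float:
    """One-direction bandwidth of a single call: (payload + headers) * 8 * packets per second."""
    pps = 1000 / interval_ms
    frame_bytes = payload_bytes + IP_UDP_RTP + l2_overhead
    return frame_bytes * 8 * pps / 1000


print("G.711, 20 ms:", voip_bandwidth_kbps(160, 20), "kbps")
print("G.729, 20 ms:", voip_bandwidth_kbps(20, 20), "kbps")
print("G.711 + 802.1Q tag:", voip_bandwidth_kbps(160, 20, ETHERNET + 4), "kbps")
```

The fixed 58 bytes of headers dominate for low-bitrate codecs: G.729 carries an 8 kbps payload but needs about 31 kbps on Ethernet.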