Physics

Reflection

types
1. direct sound
2. early reflection
  - 30-40ms delay after direct
  - not distinguished by brain
3. echo
  - coherent late reflection
  - always present in analog signal: Rx/Tx crosstalk, port impedance
4. reverberation
  - incoherent late reflection
  - bad for voice
  - useful for music
reverberation time: time to decrease down to -60 dB
- affecting factors
  - shape of a room
  - size of a room
  - objects in a room
  - humidity
- increased by: wooden doors
- decreased by: carpets, curtains
noise reduction coefficient (NRC)
- 0 ≡ complete reflection
- 1 ≡ complete absorption (nothing is reflected)
- average of sound absorption coefficients at 250, 500, 1000 and 2000 Hz
- 0.75 for acoustic panels
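
The NRC averaging can be sketched as below. This is a minimal illustration, assuming the conventional four octave bands (250, 500, 1000, 2000 Hz) and rounding to the nearest 0.05; the `panel` coefficients are hypothetical values, not measurements.

```python
NRC_BANDS_HZ = (250, 500, 1000, 2000)

def nrc(absorption_by_hz):
    """Average the absorption coefficients over the NRC bands, rounded to the nearest 0.05."""
    mean = sum(absorption_by_hz[band] for band in NRC_BANDS_HZ) / len(NRC_BANDS_HZ)
    return round(round(mean / 0.05) * 0.05, 2)

# Hypothetical absorption coefficients for an acoustic panel.
panel = {250: 0.40, 500: 0.80, 1000: 0.95, 2000: 0.90}
print(nrc(panel))
```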

Background noise

should not exceed 45 dB
sources: HVAC, lighting bulbs, noise through walls, street noise, airplanes, trains, subway

Room design

thick curtains on windows
acoustic panels on walls
acoustic dropped ceiling
rubber mats on tables as pedestals
soft floor cover
plants

Hardware

Speaker

types
- active
  - built-in amplifier, line-level
  - has to be powered
- passive
  - requires external amplifier
  - unshielded cable is enough
  - more flexible setup

Cable

types
- unbalanced
  - data and ground
  - carries noise as well
  - ground – screen around data core
  - 4-6 m
- balanced
  - +data, -data and ground
  - 2×data = (+data + noise) – (-data + noise)
  - ground – screen around ±data core
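
The noise cancellation on a balanced line can be sketched numerically. This is a minimal sketch, assuming the same noise is induced on both conductors; the function name and the sample values are illustrative only.

```python
def receive_balanced(signal, noise):
    """Model a balanced line: the receiver subtracts the inverted leg from the normal leg."""
    hot = [s + n for s, n in zip(signal, noise)]      # +data + noise
    cold = [-s + n for s, n in zip(signal, noise)]    # -data + noise
    # (+data + noise) - (-data + noise) = 2 x data, so halve the difference.
    return [(h - c) / 2 for h, c in zip(hot, cold)]

signal = [0.5, -0.25, 1.0]
noise = [0.125, 0.25, -0.5]   # identical on both conductors (common-mode)
print(receive_balanced(signal, noise))
```

An unbalanced cable has no inverted leg to subtract, so the same noise would stay in the received signal.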

Connector

RCA
- unbalanced
- analog TV (yellow, white, red), white+red ≡ stereo sound
tip-sleeve (TS)
- unbalanced
- 1 ring ≡ Jack
tip-ring-sleeve (TRS)
- balanced
- 2 rings
XLR
- balanced

Microphone

needs headroom below the amplifier's maximum level to avoid clipping (the tops of the waveform being cut off)
- hard clipping: digital sound, ≡ distortion
- soft clipping: analog sound, ≡ overdrive, bass-guitars
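
The two clipping behaviours can be sketched as transfer functions. This is an illustration only: the `tanh` curve is one common way to model soft clipping, chosen here as an assumption, and `headroom_db` is a hypothetical helper for the gap between the signal peak and the maximum level.

```python
import math

def hard_clip(x, limit=1.0):
    """Cut the waveform off flat at +/-limit."""
    return max(-limit, min(limit, x))

def soft_clip(x, limit=1.0):
    """Compress the waveform smoothly towards +/-limit (tanh curve, an assumed model)."""
    return limit * math.tanh(x / limit)

def headroom_db(peak, max_level=1.0):
    """Gap between the signal peak and the maximum level, in dB."""
    return 20 * math.log10(max_level / peak)

for sample in (0.5, 1.0, 1.5):
    print(sample, hard_clip(sample), round(soft_clip(sample), 3))
```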
automatic gain control (AGC): amplification tuning (e.g., conference microphones)
specifications
- dynamic range: range of amplitudes (quietest to loudest) that can be sensed
- frequency response: frequency range
- polar pattern
types
- passive
  - ≡ dynamic
  - does not require power
  - membrane with permanent magnet
- active
  - ≡ condenser
  - electric field is changed due to vibration of condenser’s plates
phantom power: constant DC voltage supplied over the microphone cable
- can power active mic
- has no effect on passive mic
form-factors
- handheld: usually dynamic
- lapel: ≡ tie-clip, usually active and directional
- podium: ≡ desktop, usually active
- ceiling-mounted: ≡ choir, active, high sensitivity
- boundary: ≡ PZM (pressure zone mic), active, tabletop, low-profile
direction
- directional: higher sensitivity
- omnidirectional: susceptible to reverberation
distance factor: (distance to record SOUND with mic)/(distance to record SOUND with omnidirectional mic)
critical distance: (distance to target mic)/(distance to closest mic)
- 1:3 ≡ 3m to target mic, 9m to any other mic

Acoustic echo canceller (AEC)

subtracts the output signal from the input signal; the output signal is first run through non-linear processing (NLP) or a digital signal processor (DSP)
DSP models room to calculate echo delay
limitation:
- cannot handle acoustic anomalies
- depends on room acoustics
- speaker must be close to mic (≈ 0.5m)
- network delay < 200ms
if several AEC-capable devices are chained (e.g., mic and amplifier), AEC must be enabled only on single device: otherwise, second NLP receives first NLP output, not pure input signal
echo tail: time that AEC waits to receive echo on receiver
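
The idea of modelling the room and subtracting the estimated echo can be sketched with an adaptive filter. The notes do not name an algorithm; normalized LMS (NLMS) is used here purely as an assumed stand-in, and the simulated room (echo delayed by 3 samples at 60% level) is hypothetical. The number of filter taps plays the role of the echo tail: an echo arriving later than `taps` samples cannot be modelled.

```python
import random

def nlms_cancel(far_end, mic, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic; return the residual."""
    weights = [0.0] * taps
    history = [0.0] * taps          # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate   # what is left after removing the estimated echo
        power = sum(h * h for h in history) + eps
        weights = [w + step * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return residual

rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(500)]
# Hypothetical room: the echo is the far-end signal delayed by 3 samples at 60% level.
mic = [0.0] * 3 + [0.6 * s for s in far_end[:-3]]
residual = nlms_cancel(far_end, mic)
```

In this noise-free simulation the residual shrinks as the filter converges; a real room adds near-end speech and non-linearities, which this sketch does not handle.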

Sampling

1 bit of depth increase ≡ 6 dB SNR increase, in a model where noise affects only the least significant bits
22 bit depth – max human sensitivity
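
The 6 dB per bit rule follows from each extra bit doubling the number of quantization levels, and 20·log10(2) ≈ 6.02 dB. A minimal sketch (the function name is illustrative):

```python
import math

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 22, 24):
    print(bits, round(dynamic_range_db(bits), 1))
```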

Phones

Cisco 7800

voice-only
PoE-capable
hold, call forward, call transfer
CUCM, Webex, Expressway MRA + CUCM (VPN-less registration)
monochrome display
power
- power save: disable screen and button highlighting after inactivity
- power save plus: scheduled screen shutdown

Cisco 8800

HD video, voice
coloured display
Bluetooth, Wi-Fi
mobile voice (MV): move voice from mobile to 8800

Public switched telephone network (PSTN)

Plain old telephone service (POTS)

call routing – Class 5 switch

Private branch exchange (PBX)

call routing in circuit-switched networks (ISDN, POTS) from Enterprise side
functions
- call hold, call transfer, call waiting, call return
- conferencing
- voice mail
- auto-attendant

Direct current signalling

permanently holds trunk even without active call
types
- subscriber loop: off-hook → DC current flows (-48 V)
- E&M: recEive & transMit, uses two wires (E and M) for signalling

Single frequency

2600 Hz tone ≡ on-hook/off-hook event
inter-office trunks
- bypass user facilities with blue box and call for free (billing only on 1st switch)
multi-frequency: several tones are available, system-dependent

Dual-tone multifrequency signalling

tone signalling for push-button phones (rotary-dial phones use loop disconnect)
relay methods
1. named telephony event (NTE)
  - RFC 2833 (obsoleted by RFC 4733)
  - inband in RTP (distinguished from RTP by payload type)
2. key press markup language (KPML)
  - RFC 4730
  - SIP SUBSCRIBE to register for DTMF
  - SIP NOTIFY to receive numbers as XML payload
3. unsolicited notify (UN)
  - transport numbers in SIP NOTIFY without registration
  - 10-symbol blocks
4. H.245 signal
  - H.323
  - explicit signalling
5. H.245 alphanumeric
  - H.323
  - explicit signalling
  - transmits less info compared to H.245 signal
6. Cisco proprietary RTP
  - ≈ NTE
7. SCCP
  - explicit signalling
4×4 keypad: button pressed ≡ send 2 tones

	1209 Hz	1336 Hz	1477 Hz	1633 Hz
697 Hz	1	2	3	A
770 Hz	4	5	6	B
852 Hz	7	8	9	C
941 Hz	*	0	#	D
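
The keypad table maps directly to a lookup of one row tone and one column tone per key. A minimal sketch; the function names, the 8 kHz sample rate and the 50 ms default duration are assumptions for illustration.

```python
import math

ROWS_HZ = (697, 770, 852, 941)
COLS_HZ = (1209, 1336, 1477, 1633)
KEYS = ("123A", "456B", "789C", "*0#D")

def dtmf_frequencies(key):
    """Return the (row, column) tone pair in Hz for a single keypad symbol."""
    if len(key) == 1:
        for row_index, row in enumerate(KEYS):
            col_index = row.find(key)
            if col_index != -1:
                return ROWS_HZ[row_index], COLS_HZ[col_index]
    raise ValueError("not a DTMF key: %r" % key)

def dtmf_samples(key, duration_ms=50, sample_rate=8000):
    """Sum the two sine tones for the key; amplitudes stay within [-1, 1]."""
    low, high = dtmf_frequencies(key)
    count = sample_rate * duration_ms // 1000
    return [0.5 * math.sin(2 * math.pi * low * n / sample_rate)
            + 0.5 * math.sin(2 * math.pi * high * n / sample_rate)
            for n in range(count)]

print(dtmf_frequencies("5"))
```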

CLASS

PSTN, SS7
features
- customer-originated trace: dial code after harassing call ≡ call police
- camp on: automatic callback, if called is busy
- automatic recall: auto-call
- display features: caller name, caller number
- call screening: ACL on numbers

PSTN VPN

VoIP

VVID – voice VLAN ID
PVID – port VLAN ID (common traffic), native VLAN for multi-VLAN access port
phone is not physically protected (unlike equipment in a server room or locked cabinet) ≡ QoS trust boundary is moved outside of controlled space
- protection: 802.1x, traffic policing, multi-VLAN access port
≈ 106 kbps per call
packet rate is fixed = 50 pps (20 ms of audio per packet)
EF PHB

Real-time transport protocol (RTP)

RFC 3550
UDP, even ports
end-to-end media streaming
timestamps
- jitter compensation
- packet loss detection
- out-of-order packets detection

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X| CC  |M| Payload Type  |        Sequence number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Synchronization source identifier (SSRC)           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              Contributing Source Identifier (CSRC)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

P: padding

1: last byte = byte count in padding

X: extension, 1 ≡ extension header present
CC: CSRC count
Payload type

0: PCMU
3: GSM
4: G.723
9: G.722
15: G.728
18: G.729
26: JPEG
31: H.261
32: MPV
34: H.263

SSRC:

unique within session
source ID: microphone, camera, etc.

CSRC:

extra source IDs (e.g., SSRC = mixer, CSRC = active mic)
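
The header layout above can be decoded with a few bit masks. This is a minimal sketch that reads only the fixed 12-byte header and the CSRC list; it ignores extension headers and padding, and the sample packet values are made up for illustration.

```python
import struct

def parse_rtp_header(packet):
    """Parse the fixed 12-byte RTP header plus the CSRC list."""
    if len(packet) < 12:
        raise ValueError("RTP header needs at least 12 bytes")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F
    csrc_end = 12 + 4 * csrc_count
    if len(packet) < csrc_end:
        raise ValueError("packet is shorter than its CSRC count implies")
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": list(struct.unpack("!%dI" % csrc_count, packet[12:csrc_end])),
    }

# Made-up packet: V=2, CC=1, payload type 18 (G.729), one CSRC entry.
sample = struct.pack("!BBHIII", 0x81, 18, 1000, 160000, 0xDEADBEEF, 0x11223344)
header = parse_rtp_header(sample)
print(header["payload_type"], header["sequence"], hex(header["ssrc"]))
```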

RTP control protocol (RTCP)

RFC 3550
UDP, RTP port + 1
flow monitoring, flow synchronization

Voice VLAN

access ports only (trunk can be configured, not supported)
- modes
  1. multi-VLAN access port, if negotiation over CDPv2 is successful
  2. trunk port, if negotiation is successful over LLDP (no required info for multi-VLAN in MED)
    - native VLAN ≡ configured access VLAN
    - can be used along with port-security
- disabled by default
- PortFast is auto-enabled; when voice VLAN is disabled, PortFast is NOT disabled
Cisco CoS (RFC is reverse) defaults:
- CoS = 5: voice
- CoS = 3: voice control
required to accept tagged frames on access ports: CoS, platform-dependent
modes
1. vlan-id
  - trunk
  - voice in VVID
  - data in PVID ≡ access VLAN
2. dot1p
  - trunk
  - voice in VLAN 0
  - data in PVID
3. untagged
  - trunk
  - voice in PVID
  - data in PVID
4. none
  - not trunk
  - all data in access VLAN
  - default

(config-if)# switchport voice vlan <MODE>

Codecs

mean opinion score (MOS): subjective user perception of audio quality
voice activity detection (VAD)
- stop sending packets after hangover pause (≈ 200 ms)
- does not work if background noise is present
- speech start is clipped
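
The hangover behaviour can be sketched with a simple energy threshold. Real VAD implementations are codec-specific; the fixed threshold, the per-frame energies and the function name here are assumptions for illustration. With 20 ms frames, a 200 ms hangover corresponds to 10 frames.

```python
def vad_decisions(frame_energies, threshold, hangover_frames=10):
    """Return True for each frame that should be sent, False for suppressed frames."""
    decisions = []
    remaining = 0                      # hangover frames still to send after speech
    for energy in frame_energies:
        if energy >= threshold:
            remaining = hangover_frames
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1             # keep sending during the hangover pause
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions

print(vad_decisions([0, 5, 0, 0, 0, 5, 0], threshold=1, hangover_frames=2))
```

A fixed threshold also shows both limitations listed above: background noise above the threshold keeps every frame active, and quiet speech onsets below it are dropped.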

G.114

150 ms recommended maximum one-way delay

G.711

packet loss concealment (PLC)
- if sample is lost – repeat last sample with less amplitude
- up to 20 ms loss
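
The repeat-with-less-amplitude idea can be sketched as follows. This is a deliberate simplification, not the standardized concealment algorithm: lost frames are marked with `None`, and the attenuation factor of 0.5 is an arbitrary assumption.

```python
def conceal(frames, frame_size, attenuation=0.5):
    """Replace lost frames (None) with an attenuated copy of the previous output frame."""
    output = []
    last = [0.0] * frame_size          # silence if the very first frame is lost
    for frame in frames:
        if frame is None:
            frame = [s * attenuation for s in last]
        output.append(frame)
        last = frame
    return output

received = [[0.5, -0.5], None, None, [0.25, 0.25]]
print(conceal(received, frame_size=2))
```

Consecutive losses fade out progressively, which is why concealment is only useful for short gaps.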
BW per call
- min: 87.2 kbps
- recommended: 128 kbps
pulse-code modulation (PCM)
300-3400 Hz
uncompressed
mandatory for H.320 (circuit-switched) and H.323 (packet-switched)
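
The 87.2 kbps minimum can be reproduced from the packet size. This sketch assumes 20 ms packets (160-byte payload, 50 pps), IPv4 + UDP + RTP headers, and 18 bytes of Ethernet overhead without an 802.1Q tag; other link layers would give different figures.

```python
IP_UDP_RTP_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP headers
ETHERNET_BYTES = 18              # assumed: 14-byte header + 4-byte FCS, no 802.1Q tag

def per_call_kbps(payload_bytes, packets_per_second=50, l2_bytes=ETHERNET_BYTES):
    """One-way bandwidth of a single call, including header overhead."""
    frame_bytes = payload_bytes + IP_UDP_RTP_BYTES + l2_bytes
    return frame_bytes * 8 * packets_per_second / 1000

print(per_call_kbps(160))   # G.711, 160-byte payload
print(per_call_kbps(20))    # G.729, 20-byte payload
```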

G.722

64 kbps
lossy compressed (sub-band ADPCM)
50-7000 Hz

G.728

16 kbps

G.729

codec bitrate: 8 kbps (without packet header overhead)
loss tolerance: 5%
lossy, compressed
up to 4 kHz
VAD

Advanced audio codec low delay (AAC-LD)

low overhead MPEG-4 audio transport multiplex (LATM)
48-128 kbps

Internet low bitrate codec (iLBC)

Internet speech audio codec (iSAC)

10-32 kbps bitrate on peak activity
compressed
mode
- channel-adaptive
  - lossy
  - adapts to channel dynamic state
  - in-band signalling
- channel-independent
  - lossless
  - bitrate is fixed in advance

Codec features

Codec	Sample size	Sample interval	Payload size	MOS	Sampling frequency
G.711	80 bytes	10 ms	160 bytes	4.1	8 kHz
G.722	80 bytes	10 ms	160 bytes	4.1	16 kHz
G.723 (6.3 kbps)	24 bytes	30 ms	24 bytes	3.9
G.723 (5.3 kbps)	20 bytes	30 ms	20 bytes	3.8
G.726 (32 kbps)	20 bytes	5 ms	80 bytes	3.8
G.726 (24 kbps)	15 bytes	5 ms	60 bytes	3.8
G.728	10 bytes	5 ms	60 bytes	3.6
G.729	10 bytes	10 ms	20 bytes	3.9
AAC-LD					20 kHz
iLBC (15.2 kbps)	38 bytes	20 ms	38 bytes	4.1
iLBC (13.3 kbps)	50 bytes	30 ms	50 bytes	4.1
iSAC		30/60 ms			16 kHz
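
The table columns are related by simple arithmetic: sample size over sample interval gives the codec bitrate, and payload size over sample size gives the audio duration per packet. A minimal sketch with illustrative function names:

```python
def codec_bitrate_kbps(sample_bytes, interval_ms):
    """Bits per millisecond equals kbps."""
    return sample_bytes * 8 / interval_ms

def packets_per_second(sample_bytes, interval_ms, payload_bytes):
    """Packet rate implied by how many samples fit into one payload."""
    packet_ms = payload_bytes / sample_bytes * interval_ms
    return 1000 / packet_ms

print(codec_bitrate_kbps(80, 10), packets_per_second(80, 10, 160))   # G.711
print(codec_bitrate_kbps(10, 10), packets_per_second(10, 10, 20))    # G.729
```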
