Architecture

planes
- control:
  - central authority nodes
  - web-based UI
  - PAC servers: DNS round-robin LB
  - content distribution servers
  - DLP processing: ICAP from SE
  - API nodes
  - behavioral analysis
  - SCIM servers
  - root DNS
  - datasets storage cluster
  - CASB
  - SMTP
- data
  - IXL peering
  - SSL acceleration cards
  - Service Edge: cluster
    - enforcement node (SE)
    - intra-cluster load-balancers
    - inter-cluster load-balancing
    - ZT 2.0 terminators (SVPN)
    - IPsec terminators (ZVPN)
  - customized network stack (smnet)
    - no context switching
    - zero copying
    - unified queues
    - remove unused constructs (e.g. UNIX sockets)
    - in memory, no swap
    - DMA
- logging
  - cloud log router (CLR): GeoIP
  - statistical manager (SM): store logs
a server is dedicated to a cloud and role
- serves several tenants

Central Authority (CA)

authC
- hosted DB: credentials store
- LDAP: initiate user bind
- SAML: validate SAML Response
- Kerberos: KDC
monitoring: pull and push models
5 CAs:
- active/active cluster: write to leader, read from any
- Raft
caching CAs: faster lookup
- do not participate in authC
- do not become leaders
- not counted towards quorum
- locations
  1. same as CAs
  2. global: Amsterdam, Frankfurt, Zurich, Tokyo, Sydney
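
The quorum rule above can be sketched as follows. This is a minimal illustration assuming standard Raft majority voting; the notes only state that caching CAs are not counted. The node names and the `Node` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str      # hypothetical node label
    caching: bool  # True for a caching CA (non-voting)


def quorum_size(nodes: list[Node]) -> int:
    """Majority of voting members; caching CAs are excluded."""
    voters = [n for n in nodes if not n.caching]
    return len(voters) // 2 + 1


def can_commit(nodes: list[Node], healthy: set[str]) -> bool:
    """A write commits only if a majority of voting CAs is reachable."""
    alive_voters = sum(1 for n in nodes if not n.caching and n.name in healthy)
    return alive_voters >= quorum_size(nodes)


# Five voting CAs plus two caching CAs (names are made up).
cluster = [Node(f"ca{i}", caching=False) for i in range(1, 6)]
cluster += [Node("cache-ams", caching=True), Node("cache-tyo", caching=True)]
```

With 5 voting CAs the majority is 3, so two healthy CAs plus any number of caching CAs still cannot commit a write.
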
RAM only storage: indexed
policy retrieval
- tokenized info: IDs only (mapping only at CA)
- closest healthy CA
- policy – feature bitmap (192 bits)
- on-demand policy provisioning
- pull model
  - CA notifies Service Edge about policy changes
  - compression, differential data
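
A 192-bit feature bitmap fits in 24 bytes. The sketch below shows one way such a bitmap could be packed and tested; the feature names and bit positions are hypothetical, and only the 192-bit width comes from the notes.

```python
POLICY_BITS = 192  # bitmap width from the notes

# Hypothetical feature-to-bit assignment, for illustration only.
FEATURES = {"ssl_inspection": 0, "dlp": 1, "sandbox": 2, "firewall": 191}


def encode(enabled: set[str]) -> bytes:
    """Pack enabled features into a 192-bit (24-byte) big-endian bitmap."""
    value = 0
    for name in enabled:
        value |= 1 << FEATURES[name]
    return value.to_bytes(POLICY_BITS // 8, "big")


def is_enabled(bitmap: bytes, name: str) -> bool:
    """Test a single feature bit."""
    return bool((int.from_bytes(bitmap, "big") >> FEATURES[name]) & 1)
```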

Service Edge

based on FreeBSD
direct server return (DSR): spoofs vIP used by client
- GRE, PAC
- impossible for IPsec, TLS due to statefulness
load-balancing over SE:
- known location: src+dst IP ≡ per session
- road warrior: src IP ≡ per user
- GRE: inner IPs are used
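
The two load-balancing keys above can be sketched as a hash over the flow key. The hash function (SHA-256) and the modulo mapping are assumptions for illustration; the notes only specify which fields form the key.

```python
import hashlib
from typing import Optional


def pick_se(ses: list[str], src_ip: str, dst_ip: Optional[str] = None) -> str:
    """Map a flow key to one SE.

    Known location: key is src+dst IP (per session).
    Road warrior:   key is src IP only (per user), so pass dst_ip=None.
    For GRE, the caller would pass the inner IPs.
    """
    key = src_ip if dst_ip is None else f"{src_ip}|{dst_ip}"
    digest = hashlib.sha256(key.encode()).digest()  # hash choice is an assumption
    return ses[int.from_bytes(digest[:8], "big") % len(ses)]
```

Because the road-warrior key ignores the destination, all sessions of one user land on the same SE, whereas a known location spreads its sessions by destination.
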
TCP SYN cookies
user mode processes
in memory processing
- memory allocation: on demand instead of greedy
- no persistent storage for data

Private Service Edge (PSE)

HW cluster: in pairs
- SE3
  - 3 SE + 1 LB
  - 1 Gbps ISP, 1 Gbps NIC
  - 1.3 Gbps throughput
- SE5
  - 5 SE + 1 LB
  - 10 Gbps NIC
  - 2.5 Gbps throughput
- one-armed deployment
pre-configured
UI does not know about PSE – requires location mapping
- dummy IP config
- bypasses authC, otherwise everyone is RW
active/standby LB
- same L2 segment with SE instances
- CARP vIP
- DSR from SEs
SSL offload card: throughput + number of sessions

Virtual Service Edge (VSE)

OVA in UI
does not need location config
one-armed & two-armed deployments
interfaces:
- em0: mgmt
- em1: proxy IP
- em2: LB IP (inactive in standalone mode)
  - same subnet with proxy IP
  - AWS ELB not supported
  - Azure and GCP native LB are supported

SVPN

load-balancing over SVPN: src IP+port
- dst IP LB is done via the GATEWAY_FX variable
- NAT pool size: available SEs / _FX pool size
offloads (D)TLS processing from SE if available per site
Internet-facing service, not available to tunnels
if FQDN/IP is specified in PAC (not GATEWAY), SVPN is not included in the flow
custom GRE to internal SE vIP (load-balancing)
SE cannot do DSR, has to go back through SVPN (TLS is stateful, not synced)

ZVPN

custom GRE to internal SE vIP (load-balancing)
SE cannot do DSR, has to go back through ZVPN (IPsec is stateful, SAs not synced)

ZCC

strict enforcement
- ZIA only with ZTunnel
- denies TCP 80/443 without ZCC running
SE healthcheck:
- initial: GET URL on SE itself
  - URL from gateway.<cloudname>.net/generate_204
- keepalives
  - every 30s
  - 3 retries: exponential backoff (×2)
SVPN healthcheck:
- if GRE to SE is down, close (D)TLS connections and reject new ones
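
The SE keepalive schedule (every 30s, 3 retries, backoff ×2) can be sketched as below. The first retry delay is not given in the notes, so `first_delay_s` is a hypothetical parameter; `probe` stands in for the actual health request.

```python
import time
from typing import Callable

KEEPALIVE_INTERVAL_S = 30  # from the notes
MAX_RETRIES = 3            # from the notes
BACKOFF_FACTOR = 2         # from the notes


def retry_delays(first_delay_s: float) -> list[float]:
    """Delays before each retry, doubling every time."""
    return [first_delay_s * BACKOFF_FACTOR ** i for i in range(MAX_RETRIES)]


def se_is_down(probe: Callable[[], bool],
               first_delay_s: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True only if the keepalive and all retries fail."""
    if probe():
        return False
    for delay in retry_delays(first_delay_s):
        sleep(delay)
        if probe():
            return False
    return True
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.
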

ZTunnel 1.0

terminates on SE
on-demand HTTP(S) CONNECT tunnel: proxy, adding extra headers
separate tunnels to ZTE and for policy updates
policy updates refresh rate – 60 min
recommended on trusted networks (connected with IPsec/GRE)
vNIC is created for tunnel
- tunnel is assigned IP address from 100.64.0.0/16 (IPv4) or fc00::6440:0/112 (IPv6)
- if there is a conflict with existing application, another /16 is allocated from 100.64.0.0-100.84.0.0
- 100.64.0.6 is used to check if AV/FW blocks ZCC
- configures 3 DNS servers: 100.64.0.3, 100.64.0.4, 100.64.0.5 by default
- address is allocated per destination IP or FQDN
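
The conflict fallback for the tunnel range can be sketched with the standard `ipaddress` module. The sketch assumes the candidate /16 blocks run from 100.64.0.0 to 100.84.0.0 inclusive and that the first non-overlapping block is chosen; the selection order is not stated in the notes.

```python
import ipaddress
from typing import Iterator, Optional


def candidate_ranges() -> Iterator[ipaddress.IPv4Network]:
    """Yield 100.64.0.0/16 .. 100.84.0.0/16 (inclusive upper bound assumed)."""
    first = int(ipaddress.IPv4Address("100.64.0.0"))
    last = int(ipaddress.IPv4Address("100.84.0.0"))
    for base in range(first, last + 1, 1 << 16):
        yield ipaddress.IPv4Network((base, 16))


def pick_tunnel_range(in_use: list[str]) -> Optional[ipaddress.IPv4Network]:
    """Return the first candidate /16 that overlaps no network already in use."""
    used = [ipaddress.IPv4Network(n) for n in in_use]
    for cand in candidate_ranges():
        if not any(cand.overlaps(u) for u in used):
            return cand
    return None
```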
requests are mangled by packet splitter
- inserts proxy as destination IP:port
  - IP – either vNIC (route-based) or physical NIC (packet-filter)
  - port – 9000 by default
- puts real destination IP as source IP in mangled packets
- real destination port is preserved in mapping table
- packet-filter:
  - packets from ZCC proxy are intercepted and forwarded to packet splitter (source port match, 9000 by default)
  - exception: traffic to Service Edge
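
The mangling steps above can be sketched as a small model. The mapping-table key (real destination IP plus client source port) is an assumption, and the proxy IP in the usage note is made up; the notes only say that the real destination port is kept in a mapping table.

```python
from dataclasses import dataclass, replace

PROXY_PORT = 9000  # default from the notes


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class PacketSplitter:
    def __init__(self, proxy_ip: str, proxy_port: int = PROXY_PORT):
        self.proxy_ip = proxy_ip
        self.proxy_port = proxy_port
        # (real destination IP, client source port) -> real destination port
        self.port_map: dict[tuple[str, int], int] = {}

    def mangle(self, pkt: Packet) -> Packet:
        """Redirect the packet to the local proxy, keeping the real destination."""
        self.port_map[(pkt.dst_ip, pkt.src_port)] = pkt.dst_port
        return replace(pkt, src_ip=pkt.dst_ip,
                       dst_ip=self.proxy_ip, dst_port=self.proxy_port)

    def real_destination(self, mangled: Packet) -> tuple[str, int]:
        """Recover the original destination IP and port for a mangled packet."""
        return mangled.src_ip, self.port_map[(mangled.src_ip, mangled.src_port)]
```

For example, a packet to 93.184.216.34:443 mangled by `PacketSplitter("100.64.0.1")` is addressed to 100.64.0.1:9000 with 93.184.216.34 as its source IP, and `real_destination` returns the original pair.
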
authentication – proxy digest
- includes device fingerprint, requested URL, device ID, PIN
- key is stored in registry (Win) or keychain (Mac)

ZTunnel 2.0

DTLS tunneling
- channels
  - control:
    - always TLS
    - user authC (PIN), device fingerprint
    - provides session ID to data channel – data authC
  - data:
    - null encryption available
    - can fall back to TLS
      - before 2.3: create TLS data channel
      - after 2.3: tunnel on control connection
TCP, UDP, ICMP traffic
single persistent tunnel to ZTE and for policy updates
- second tunnel cannot be built to another location
policy updates – through ZTE, real time
intercepts DNS traffic towards non-RFC1918 DNS server IP
not supported on mobile devices
fallback to ZT 1.0:
- DNS not blocked
packet splitter: sends all traffic through DTLS tunnel without mangling
PAC
- DIRECT in Forwarding Profile – send via ZT2
- PROXY ${ZAPP_LOCAL_PROXY} – “downgrade” to ZT1
- DIRECT in App Profile – send directly (ZT2 already bypassed)
authentication – TLV messages within TLS between ZCC and SVPN

PAC

static IP geolocation has no impact on PAC gateway selection
variables
- GATEWAY: closest SVPN/SE vIP based on incoming IP
  - _Fn: manual selection from the pool
  - _FX: round-robin selection
    - load-balance users behind same IP over different SEs
    - ZCC only
- GATEWAY_HOST: closest SE hostname based on incoming IP
- SRCIP: incoming IP address
- COUNTRY
  - COUNTRY_GATEWAY
  - COUNTRY_GATEWAY_HOST
- ZS_CUSTOM_PORT: 9401 for China, 443 otherwise
domain detection
- Road Warrior: from user domain
- Location: one IdP per location
ports
- 80/443:
  - SSL inspection disabled for unknown location, because policy cannot be determined
  - authC required
- 9443:
  - SSL inspection enforced for unknown location, because policy cannot be determined
  - authC required
- 9400
  - extra port that accepts incoming connections
  - can be used in China to avoid registering 80/443 ports
  - can be used for forwarding decision (e.g., PBR)
- 9480
  - authC bypass if location is known
- dedicated port
  - provisioned per tenant
  - SSL inspection according to the policy
variables instead of gateway.<cloud>.net: resolution based on client IP instead of resolution by DNS server (proximity may differ)
- FQDN could be used if PAC is not supported
authentication – cookie-based
- expire after configured period
- gateway cookie: login info
  - _sm_au_c
  - cloud name
  - number of logins
- domain cookie:
  - inserted for every visited domain
  - avoids login prompt per domain
    1. if missing, user is redirected to gateway
      - HTTP 307 to /auD
      - dummy cookie _sm_au_d=1 is set for domain
      - original URL is preserved as parameter
    2. redirect to gateway
      - HTTP 307 /auT
      - dummy cookie _sm_au_d=1 is set for gateway
      - if domain cookie for gateway is present, skip next step
    3. login page
    4. redirect to gateway
      - HTTP 302 /auL
      - cookie sm_au: user ID for gateway
      - dummy cookie for gateway is expired
      - gateway cookie is inserted as well
    5. redirect to original URL with cookie as parameter
      - HTTP 307
      - set domain cookie
    6. redirect to original URL
  - dummy cookies are used to check if browser supports cookies
- AUP cookie
subclouds
- subset of DCs
- private ZENs along with public ZENs
- regional surcharge nodes

Tunnels

not recommended along with ZT 2.0
- double encapsulation: tunnel + (D)TLS
  - MTU: PMTUD available
- SVPNs not available, (D)TLS processing by SE
- all ZT 2.0 traffic for user lands on single SE (single tunnel to SE)
ZT 1.0 is recommended through tunnels
- ZT 1.0 tunnels are opened per request ≡ can be load-balanced
- tunnelling is recommended for large locations to avoid NAT
traffic to remote SEs is still processed by terminating SE
tunnels should be created in pairs
MTU and MSS should be adjusted

GRE

static IP geolocation selects the nearest GRE headend
static IP required
terminated on SE
preferred due to stateless nature
IP SLA (HTTP GET) or GRE keepalives for status monitoring
- gateway.<cloudname>.net/vpntest
- not public websites: Zscaler IPs might get blacklisted (e.g., CAPTCHA is enforced for all users) → such connections are blocked
1 Gbps per tunnel (single IP; 250 Mbps if NAT enabled on client side)

IPsec

main and aggressive modes ⟹ dynamic IP support
- Null encryption can be used as GRE alternative: VPN credentials in lieu of static IP
terminated on ZVPN
200 Mbps per tunnel (single IP)
max 8 SA for Phase 2 per tunnel: SA per direction per ACE

Dedicated port

authC is always enforced
FW control can be enabled for RW

Authentication

methods
- SAML
  - v2.0+
  - HTTP POST binding
  - IdP mapping based on user domain
- LDAP
  - LDAP sync adds users, groups, departments: daily, weekly, monthly, on demand
  - no LDAP user → user is deactivated
  - Zscaler AuthC Bridge (ZAB) to avoid opening FW to AD
- hosted DB
unauthenticated access – location-based policies
- use cases:
  - HTTP CONNECT request
  - no support for cookies
  - SSL inspection bypass
  - unknown user agent
  - guest users
- surrogate IP: map device IP to user
- policy evaluation has to be enabled explicitly
noauth-bypassurl user: authentication exemption lists
- user cannot be identified based on cookie
- ZCC is in PAC enforce mode

Device token

ZCC portal as IdP
up to 8 tokens
token can be used on several devices
username – from user device
ZIA only

Captive portal detection

query /generate_204 for HTTP 204 response code
HTTP 200 ≡ captive portal is present
hosts – captive portal URLs in config.zscaler.com
traffic is sent directly during fail open timer
- when expired – fail close
- captive portal is probed every 30s during fail open timer
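
The detection logic can be sketched as two pure functions. The fail open timer length is a configuration value not given in the notes, so it is a parameter here; treating any status other than 200 or 204 as "unknown" and blocking it is an assumption.

```python
PROBE_INTERVAL_S = 30  # probe interval during the fail open timer, from the notes


def probe_result(status_code: int) -> str:
    """Classify the response to a /generate_204 probe."""
    if status_code == 204:
        return "online"    # expected answer, no captive portal
    if status_code == 200:
        return "captive"   # portal intercepted the request
    return "unknown"       # not covered by the notes


def forwarding_mode(result: str, elapsed_s: float, fail_open_timer_s: float) -> str:
    """Decide how traffic is handled for a given probe result."""
    if result == "online":
        return "tunnel"
    if result == "captive" and elapsed_s < fail_open_timer_s:
        return "direct"    # fail open: let the user reach the portal
    return "blocked"       # fail close (also used for "unknown": assumption)
```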

Connectivity to ZTE

mobile admin portal
- defines user domain, SAML IdP
- registers device on successful SAML Response to ZIA/ZPA
types
- client connector (ZCC)
- app connector
- browser-based
forwarding mode
- ZTunnel: client connector
  1. packet-filter-based
    - Windows only
    - default for ZPA
  2. route-based:
    - via virtual adapter: next-hop is itself ≡ looping back through TCP/IP stack
    - client traffic is sourced from virtual adapter
    - ZCC sends traffic to Service Edge through physical adapter
    - FQDN bypass: resolve to IP, amend routing table with /32
    - race condition with VPN software
  3. local HTTP(S) proxy
    - proxy-aware traffic only
    - ZT 1.0 only
    - bypass traffic before ZCC receives it – Forwarding Profile
    - “downgrade” to ZT1 in AppProfile – choose DC per URL
- PAC
  - forwarding profile
    - system/browser PAC: what to bypass or route to ZCC
    - may not include ZCC in forwarding path
  - app profile:
    - ZCC PAC
    - routes traffic after client connector
    - selects closest Service Edge by default
    - HTTP headers are used to distinguish ZT version
      - ztunnelversion: 10 – ZT 1.0
      - ztunnelversion: 20 – ZT 2.0
- none
IPv6
- outer encap – IPv4 (SE is reachable only via IPv4)
  - GRE, IPsec: IPv6-over-IPv4
  - ZT 1.0: native proxy
  - ZT 2.0: NAT64 required
- IPv4 is preferred by SE
- locations have to be explicitly enabled
- NAT64 prefix discovery: DNS AAAA for ipv4only.arpa
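
NAT64 prefix discovery via ipv4only.arpa follows RFC 7050: the name has only the well-known A records 192.0.0.170 and 192.0.0.171, so any AAAA answer must have been synthesized by DNS64 and reveals the prefix. The sketch below handles only /96 prefixes (an assumption; other prefix lengths embed the IPv4 address differently) and takes the AAAA answer as a string instead of performing a DNS query.

```python
import ipaddress
from typing import Optional

# Well-known IPv4 addresses of ipv4only.arpa (RFC 7050).
WELL_KNOWN_V4 = {
    ipaddress.IPv4Address("192.0.0.170"),
    ipaddress.IPv4Address("192.0.0.171"),
}


def nat64_prefix_from_aaaa(aaaa: str) -> Optional[ipaddress.IPv6Network]:
    """Derive a /96 NAT64 prefix from a synthesized AAAA answer."""
    addr = ipaddress.IPv6Address(aaaa)
    low32 = ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)
    if low32 not in WELL_KNOWN_V4:
        return None  # not a /96 synthesis of the well-known address
    return ipaddress.IPv6Network((int(addr) & ~0xFFFFFFFF, 96))


def synthesize(prefix: ipaddress.IPv6Network, ipv4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    return ipaddress.IPv6Address(
        int(prefix.network_address) | int(ipaddress.IPv4Address(ipv4))
    )
```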

Order of operation

Service Edge preliminarily classifies whether traffic is web
non-web: client → internet
1. DPI
  - if web traffic is on non-standard port → TLS proxy
2. FW, IPS
3. NAT
4. BW control
HTTP(S): client → internet
1. TLS proxy: validity, OCSP, CRL
2. CloudApp control
3. URL policy
4. browser control
5. country-based blocking
6. IPS
7. file control: AV, file type
8. DLP
9. FW
10. BW control
11. NAT
HTTP(S): internet → client
1. BW control
2. NAT
3. FW
4. content classification: payload, request, response
5. URL policy
6. browser control
7. country-based blocking
8. IPS
9. file control: AV, file type
10. TLS proxy

Policies

policy selection:
1. user authenticated – user policy
2. user unknown
  - location known && authC disabled for location – location policy
  - location unknown – refuse connection
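
The selection order above can be written as a small decision function. The case of an unknown user at a known location with authC enabled is not covered by the notes; returning "authentication required" there is an assumption.

```python
from typing import Optional


def select_policy(user: Optional[str],
                  location: Optional[str],
                  location_auth_enabled: bool = True) -> str:
    """Return which policy applies to a connection."""
    if user is not None:
        return "user policy"
    if location is None:
        return "refuse connection"
    if not location_auth_enabled:
        return "location policy"
    return "authentication required"  # assumption: not stated in the notes
```
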
admin rank
- admin hierarchy: cannot override higher-ranking admin
- 0 ≡ super admin
- lower admin rank cannot have order earlier than higher rank
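
One reading of the rank rules is sketched below: with 0 as the highest rank, rules sorted by order must have non-decreasing rank numbers, and an admin can change only rules of equal or lower rank. Allowing edits at equal rank is an assumption.

```python
def order_is_valid(rules: list[tuple[int, int]]) -> bool:
    """rules: (rule order, admin rank) pairs; rank 0 is the super admin."""
    ranks = [rank for _, rank in sorted(rules)]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


def can_override(editor_rank: int, rule_rank: int) -> bool:
    """An admin cannot override a rule owned by a higher-ranking admin."""
    return editor_rank <= rule_rank
```
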
device OS type – ZCC-provided info

SSL inspection

mutual TLS authC not supported
ZTE – forward proxy + MITM certificate
- certificate is generated on the fly
- CA hierarchy:
  - root CA
  - intermediate CA: uploaded to Zscaler, used to sign short-lived CA
  - short-lived CA: 14 days expiry, rotated every 7 days, signs MITM certificates
- private key stored in memory
actions
- allow
- caution: alert user + allow
- block
- block with override: authC to bypass block
- isolate: redirect to container
inserts X-Forwarded-For header with originating IP address
MS O365 one-click: do not inspect Required FQDNs/IPs from MS perspective
- best practice: still inspect OneDrive and Sharepoint
SAML IdP should be exempted from SSL inspection

URL filtering

quota: BW or time
newly registered domains ≤ 30 days
unknown browsers are blocked
based on SNI (without inspection), URL (web) or MIME (inspection + non-web)

Cloud App control

takes priority over URL filtering
tenant profile: restricts access to specific tenants (e.g., block personal)
- MS
  - login.microsoftonline.com, login.microsoft.com, login.windows.net – removed from O365 one-click rule
  - adds Restrict-Access-To-Tenants and Restrict-Access-Context HTTP headers – block on MS side
  - provision policies: SSL inspection, FW, DNS control, Cloud App control
  - firewall fingerprints application and sends data to O365 dashboard
  - rewrite dst IP to closest CDN
  - O365 one-click rule has to be adjusted accordingly

Cloud Browser Isolation (CBI)

pixel streaming ≈ sandbox for web traffic
can be used without ZCC
container – per user per browser
- Chromium

Firewall

standard: port-based
- logs only blocked transactions
- match in Network Services
advanced: DPI
- match in Network Applications
- dst FQDN match
  - must see prior DNS request for wildcard FQDN
  - cache for 2×TTL
criteria
- logical OR within section
- logical AND between sections
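
The criteria semantics (OR within a section, AND between sections) can be sketched as a single expression; the same logic applies to the NAT and IPS criteria. The section names in the usage note are hypothetical.

```python
def rule_matches(criteria: dict[str, set], packet: dict) -> bool:
    """criteria maps a section name to its allowed values.

    Within a section any listed value matches (OR);
    every section must match (AND).
    """
    return all(packet.get(section) in allowed
               for section, allowed in criteria.items())
```

For example, a rule with `{"src_ip": {"10.0.0.1", "10.0.0.2"}, "service": {"http", "https"}}` matches a packet from 10.0.0.2 using https, but not one from 10.0.0.3. A rule with no sections matches everything.
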
actions
- allow
- block/drop: potential retransmissions
- block/ICMP: port unreachable to client
- block/RST: RST to client
- logging
outbound traffic only

NAT

destination NAT
criteria
- logical OR within section
- logical AND between sections

IPS

Snort syntax
criteria
- logical OR within section
- logical AND between sections

AV

MS and Adobe feeds

Advanced Threat Protection

risk index
- page risk
  - injected script
  - vulnerable ActiveX
  - 0-pixel iframe
  - phishing
  - XSS
- domain risk
  - hosting country
  - age
  - past history
  - links to high-risk TLD
botnet protection
IRC tunneling, anonymizers
P2P: file sharing, anonymizers, VoIP

Sandbox

default rule ≡ last checked rule
stages
- Cloud Effect: known hashes (including 3rd party threat intelligence)
- pre-filtering: YARA rules, AV
- behaviour analysis
- post-processing: policy enforcement
ML models for detecting both malicious and benign files (high confidence)

File type control

magic byte → MIME → file extension
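
Reading the arrow as an order of precedence (an assumption), classification can be sketched as below: magic bytes win, then the declared MIME type, then the file extension. The signature and MIME tables are a tiny illustrative subset.

```python
from typing import Optional

# Well-known file signatures (subset).
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]
MIME = {"application/pdf": "pdf", "application/zip": "zip", "image/png": "png"}
KNOWN_EXTENSIONS = {"pdf", "zip", "png"}


def classify(content: bytes, mime: Optional[str], filename: str) -> Optional[str]:
    """Return the file type using magic bytes, then MIME, then extension."""
    for signature, kind in MAGIC:
        if content.startswith(signature):
            return kind
    if mime in MIME:
        return MIME[mime]
    _, dot, ext = filename.rpartition(".")
    ext = ext.lower()
    return ext if dot and ext in KNOWN_EXTENSIONS else None
```
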
400MB+ files within archive are not scanned
takes precedence over DLP
caution action is not possible with some files
- archive: file signature is required for classification – cannot warn in the beginning, only block in the end
- upload: injected caution element is usually rejected
filename: Content-Disposition header
- some apps put junk into it

DLP

criteria: logical OR
index tool: precomputes templates with hashes for EDM and IDM
case-insensitive
in-memory processing
email notification
ignored
- punctuation
- delimiters
- noise words within key phrase
index tool VM
- CSV is stored on non-persistent partition: encrypted, password not stored in VM
- CSV is deleted after indexing
dictionary
- patterns: POSIX ERE subset
  - only unique matches are counted by default
  - has to start with base token (alphabetic or numeric)
    - length and type of the token can later be used as lookup table – can the data potentially match the pattern or not
    - heuristic: reduced processing time
  - not supported
    - nested repetition: [A-Z]{2}? or ([A-Z]*)?
      - usually can be rewritten or split into several patterns
    - optional expression at the start: (1-)?[0-9]
      - use several patterns
    - start not with base token: #include <stdio>
    - grouping at the start: ([A-Z]9{4}|[A-Z]{2}9{3})
      - move one of the characters out of grouping
      - split into several patterns
- phrases
  - identical phrases are also counted by default
  - at least 3 characters
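
The rewrites suggested for unsupported patterns can be checked for equivalence. The sketch uses Python's `re` module, which is not the POSIX ERE engine of the dictionary itself, but the constructs used here behave the same in both; the sample strings are made up.

```python
import re

# Unsupported: optional expression at the start -> use several patterns.
UNSUPPORTED_OPTIONAL = r"(1-)?[0-9]"
SPLIT_OPTIONAL = [r"1-[0-9]", r"[0-9]"]

# Unsupported: grouping at the start -> move one character out of the group.
UNSUPPORTED_GROUP = r"([A-Z]9{4}|[A-Z]{2}9{3})"
REWRITTEN_GROUP = r"[A-Z](9{4}|[A-Z]9{3})"


def matches_any(patterns: list[str], text: str) -> bool:
    """True if the whole text matches at least one pattern."""
    return any(re.fullmatch(p, text) for p in patterns)


def unique_match_count(pattern: str, text: str) -> int:
    """Count distinct matched strings (only unique matches are counted by default)."""
    return len({m.group(0) for m in re.finditer(pattern, text)})
```

Both rewrites start with a base token (a digit or a letter), which is what the lookup-table heuristic relies on.
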
confidence
- low: words/numbers and checksum match
- medium: + proper formatting
- high: + proximity words are present in the entire document

Exact data match (EDM)

exact match on hash from index tool
allows matching valid combinations instead of standalone items
- PII: name or surname alone is not informative; name and surname together identify a person
cell constraints
- at least 3 characters
- at most 2 whitespaces (after 3rd whitespace data is ignored)
- invalid format → whole row is ignored
  - if 10% rows are ignored, throw error and stop processing
- blank cell in template ≡ match any
- ASCII only
- special characters are ignored
primary key: well-known format or fixed length
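
The cell constraints can be sketched as a normalization step followed by hashing. Several details are assumptions: SHA-256 over the whole row, lowercasing (taken from the case-insensitive note in the DLP section), and checking the 10% limit after the loop instead of stopping mid-run. It is not the index tool's actual algorithm.

```python
import hashlib
from typing import Optional

MAX_IGNORED_RATIO = 0.10  # 10% of rows, from the notes


def normalize_cell(cell: str) -> Optional[str]:
    """Return normalized text, "" for a blank (match-any) cell, None if invalid."""
    if not cell.isascii():
        return None                                  # ASCII only
    if not cell.strip():
        return ""                                    # blank cell: match any
    cleaned = "".join(ch for ch in cell if ch.isalnum() or ch.isspace())
    text = " ".join(cleaned.split()[:3]).lower()     # ignore data after 3rd whitespace
    return text if len(text) >= 3 else None          # at least 3 characters


def index_rows(rows: list[list[str]]) -> list[str]:
    """Hash valid rows; raise if too many rows had to be ignored."""
    hashes, ignored = [], 0
    for row in rows:
        cells = [normalize_cell(c) for c in row]
        if any(c is None for c in cells):
            ignored += 1                             # invalid format: skip whole row
            continue
        hashes.append(hashlib.sha256("\x1f".join(cells).encode()).hexdigest())
    if rows and ignored / len(rows) >= MAX_IGNORED_RATIO:
        raise ValueError(f"{ignored} of {len(rows)} rows ignored")
    return hashes
```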

Indexed document match (IDM)

text-based (includes OCR)
match based on resemblance to template
accuracy
- low
  - uploaded document matches ≥40% of indexed document
  - indexed document matches ≥70% of uploaded document
- medium
  - uploaded document matches ≥70% of indexed document
  - indexed document matches ≥90% of uploaded document
- high:
  - uploaded document matches ≥40% of indexed document

Endpoint DLP

resources
- network share
- cloud storage
- removable media
- printing
compressed files are not inspected
exemption password = disable EDLP password

Identity proxy

SSL inspection required
force initial access through Zscaler
- Zscaler ≡ IdP for cloud app
- proxies SAML between cloud app and real IdP
domains
- MS: login.microsoftonline.com, login.windows.net

DNS control

resolution based on Service Edge location, not customer’s
prevents DNS tunneling
modes:
- resolver
- transit
request (DNS) and response (IP) category filtering

Bandwidth control

shaping

CASB

inline (in motion) and OOB (at rest)
- OOB is triggered by hooks and schedule
engines
- SSL inspection
- DLP
- file controls
monitors sanctioned applications only

SaaS Security Posture Management (SSPM)

recommended policies

Cloud Security Posture Management (CSPM)

cloud configuration assessment

MS Cloud App Security

CASB
integration with ZIA
- pull unsanctioned apps to MS Defender Unsanctioned Apps URL category
- send user activity logs to MCAS to enable it to compute risk score for app
requires NSS server to collect logs from ZIA and send them to MCAS

Nanolog Streaming Service (NSS)

ZIA only
VMs: VMware
receives compressed events from Nanolog and forwards to SIEM
streams: requires separate NSS
- web events: web, SaaS, tunnel, alerts
- FW events: Cloud FW, DNS, alerts

Cloud NSS

HTTPS POST
push events to cloud SIEM (no need for NSS VM)
AWS, Azure

Source IP anchoring (SIPA)

customer-supplied source IP address
forwarders
- PSE/VSE
- ZPA app connectors
  - destination – ZPA application segment
  - forwarding method in ZIA – ZPA
  - ZPA itself has to be bypassed in ZPA forwarding policy