vSAN
- groups DAS into disk groups (from 1 to 5): 1 cache SSD + 1-7 capacity HDD/SSD
- cache SSD
- 10% of total capacity
- for own disk group only
- accelerates I/O with HDD
- consolidates write to reduce flash wear
- part of ESXi kernel
- 2-64 nodes
- object-based file system
- component up to 255 GB (larger objects are split into several components)
- if object is split into components, it can be striped over different disks
- no data locality
- I/O between caches goes through the network
- 10 Gbps minimum
- network delay (µs) < NAND delay (µs)
- data replication after failure: 1-to-1 ≡ 1 node copies data to 1 new node
Hybrid
- cache:
- 30% write-back buffer
- 70% read cache
- 4 KB block
- when a block is read, the next 1 MB of blocks is pre-loaded
- does not support deduplication, compression, EC
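
A minimal sketch of the sizing arithmetic above, assuming the 10% cache rule and the 30/70 hybrid split from these notes. The function name `hybrid_cache_layout` and its parameters are hypothetical, not part of any vSAN tooling.

```python
def hybrid_cache_layout(capacity_gb: float, cache_percent: int = 10) -> dict:
    """Size the cache device of a hybrid disk group and split it 30/70."""
    cache_gb = capacity_gb * cache_percent / 100
    return {
        "cache_gb": cache_gb,
        "write_buffer_gb": cache_gb * 30 / 100,  # write-back buffer
        "read_cache_gb": cache_gb * 70 / 100,    # read cache
    }


# Example: a disk group with 4000 GB of capacity drives.
layout = hybrid_cache_layout(4000)
print(layout)
```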
All-Flash
- block-level deduplication + compression, per disk group
- if deduplication or compression is enabled, disk failure ≡ disk group failure
- can be disabled – useful for DB
- deduplication:
- 4 KB block
- inline
- SHA-1 to identify uniqueness
- compression
- LZ4
- inline
- if a 4 KB block compresses to more than 2 KB (e.g., encrypted data), it is stored uncompressed
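
The deduplication and compression rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not vSAN's implementation: `zlib` stands in for LZ4 (which is not in the Python standard library), the in-memory `store` dictionary is hypothetical, and `ingest`/`restore` are made-up names.

```python
import hashlib
import os
import zlib

BLOCK_SIZE = 4096      # 4 KB deduplication block
COMPRESS_LIMIT = 2048  # keep the compressed form only if it fits in 2 KB


def ingest(data: bytes, store: dict) -> list:
    """Split data into 4 KB blocks, deduplicate by SHA-1, compress if worthwhile.

    Returns the list of fingerprints needed to rebuild the data.
    """
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        fingerprint = hashlib.sha1(block).hexdigest()
        if fingerprint not in store:          # unique block: store it once
            packed = zlib.compress(block)     # assumption: zlib instead of LZ4
            if len(packed) <= COMPRESS_LIMIT:
                store[fingerprint] = ("compressed", packed)
            else:                             # poorly compressible: keep raw
                store[fingerprint] = ("raw", block)
        recipe.append(fingerprint)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data from its fingerprints."""
    out = bytearray()
    for fingerprint in recipe:
        kind, payload = store[fingerprint]
        out += zlib.decompress(payload) if kind == "compressed" else payload
    return bytes(out)


# Three identical zero blocks plus one random (incompressible) block.
data = bytes(BLOCK_SIZE) * 3 + os.urandom(BLOCK_SIZE)
store = {}
recipe = ingest(data, store)
```

In this example the three zero blocks collapse into one compressed entry, while the random block is kept raw, which mirrors the "encrypted data" case in the notes.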
Witness
- protection from split-brain in case sites lose connectivity but stay up
- a VM requires strict majority of votes from nodes to start on node ≡ quorum (Raft)
- if number of nodes is even, witness has the deciding vote
- vSAN host that cannot run VMs
- 200 ms RTT, 100 Mbps
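
A small sketch of the strict-majority rule described above, assuming one vote per node and one vote for the witness. The helper `has_quorum` is a hypothetical name used only for illustration.

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """Strict majority: more than half of all votes must be reachable."""
    return 2 * reachable_votes > total_votes


# Two sites with 2 nodes each, split by a network partition.
# Without a witness there are 4 votes and each side holds 2: nobody wins.
no_witness = (has_quorum(2, 4), has_quorum(2, 4))

# With a witness there are 5 votes; the side that still reaches the witness
# holds 3 and may run the VM, the other side holds 2 and may not.
with_witness = (has_quorum(3, 5), has_quorum(2, 5))
```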
Stretched cluster
- 2 sites
- synchronous replication
- 5 ms RTT, 10 Gbps
- max 15 nodes per site
- affinity:
- none, primary, secondary
- limits the locations where VM can be brought up when no failures are present (PFTT = 0)
- primary level of failures to tolerate (PFTT)
- number of site failures
- secondary level of failures to tolerate (SFTT)
- number of local failures after PFTT is exceeded: 0-3, 3 default
- WAN optimization
- read from local copy
- only one copy is sent over WAN during replication, remote site then performs copy distribution locally
Site Recovery Manager (SRM)
- asynchronous replication
- per object
- RPO: 5 minutes – 24 hours
- full sync first, then only deltas are sent
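
The "full sync first, then only deltas" behaviour can be illustrated with a dirty-block tracker. This is a simplified sketch, not the actual replication engine; the class `ReplicatedDisk` and its methods are hypothetical.

```python
class ReplicatedDisk:
    """Source disk that remembers which blocks changed since the last sync."""

    def __init__(self, num_blocks: int):
        self.blocks = [b""] * num_blocks
        # Before the first sync every block counts as changed (full sync).
        self.dirty = set(range(num_blocks))

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.dirty.add(index)

    def sync(self, replica: list) -> int:
        """Copy only dirty blocks to the replica; return how many were sent."""
        sent = len(self.dirty)
        for index in self.dirty:
            replica[index] = self.blocks[index]
        self.dirty.clear()
        return sent


disk = ReplicatedDisk(8)
replica = [None] * 8
first = disk.sync(replica)    # full sync: all 8 blocks
disk.write(3, b"changed")
second = disk.sync(replica)   # delta sync: only block 3
```

The RPO then corresponds to how often `sync` is called: anything written after the last sync is not yet on the replica.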
Policy
- storage policy-based management (SPBM) per VM ≡ per object
- types
- stripes per object
- min number of capacity drives used for striping
- 1 – default, 12 – max
- PFTT
- number of copies: PFTT + 1
- 1 – default, 0 – min, 3 – max
- better to have more hosts than copies ⇒ after failure system can be rebalanced and restore protection level
- fault domain
- ≡ HX LAZ
- at least 2×PFTT + 1: space for repair and witness
- flash read cache reservation
- reserves cache fraction for critical VM
- 0-100%, 0% default
- hybrid only
- tied to VM, double reservation during vMotion
- object space reservation: thick provisioning
- failure tolerance method: performance (mirror) or capacity (EC)
- IOPS limit: normalized to 32 KB block size
- disable object checksum: CRC32
- force provisioning: whether provisioning is permitted when policy cannot be fulfilled
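
The PFTT arithmetic from the policy list can be written out as a short sketch. It assumes the mirror (performance) failure tolerance method and ignores witness components and other overhead; `mirror_requirements` is a hypothetical helper.

```python
def mirror_requirements(pftt: int, vmdk_gb: float) -> dict:
    """Copies, fault domains, and raw capacity for a mirrored object."""
    if not 0 <= pftt <= 3:
        raise ValueError("PFTT must be between 0 and 3")
    copies = pftt + 1
    return {
        "copies": copies,
        "min_fault_domains": 2 * pftt + 1,
        "raw_capacity_gb": vmdk_gb * copies,  # assumption: mirroring only
    }


# Default policy (PFTT = 1) for a 100 GB VMDK.
default_policy = mirror_requirements(1, 100)
# Maximum protection (PFTT = 3) for the same VMDK.
max_policy = mirror_requirements(3, 100)
```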
vSAN objects
- types
- VM home:
- VMFS, root directory of datastore
- .vmx, logs, delta + VMDK descriptors
- VMDK
- VM swap: created after VM is powered on
- snapshot delta VMDK
- memory: after VM suspend
- RAID tree per object: RAID 0 ≡ stripe, RAID 1 ≡ mirror
- mirror: different nodes
- stripe: within disk group, between disk groups, between nodes
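
One way to picture the RAID tree is as a RAID 1 node whose children are RAID 0 stripes of components. The sketch below uses hypothetical class and host names and only checks the "mirror: different nodes" rule from the notes.

```python
from dataclasses import dataclass


@dataclass
class Component:
    node: str  # host that stores the component
    disk: str  # capacity drive on that host


@dataclass
class Raid0:   # stripe
    children: list


@dataclass
class Raid1:   # mirror
    children: list


def nodes_of(item) -> set:
    """Collect every host used below this point in the tree."""
    if isinstance(item, Component):
        return {item.node}
    return set().union(*(nodes_of(child) for child in item.children))


def mirrors_on_different_nodes(tree: Raid1) -> bool:
    """True if no two mirror replicas share a host."""
    seen = set()
    for replica in tree.children:
        nodes = nodes_of(replica)
        if seen & nodes:
            return False
        seen |= nodes
    return True


# PFTT = 1, stripe width 2: two replicas, each striped over two disks.
good = Raid1([
    Raid0([Component("esx1", "disk1"), Component("esx1", "disk2")]),
    Raid0([Component("esx2", "disk1"), Component("esx2", "disk2")]),
])
bad = Raid1([
    Raid0([Component("esx1", "disk1"), Component("esx2", "disk1")]),
    Raid0([Component("esx2", "disk2"), Component("esx3", "disk1")]),
])
```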
Encryption
- between VM and storage on hypervisor
- for VM home and VMDK
- keys are sent via KMIP 1.1; vCenter is the KMIP client for storage encryption
- vMotion encryption: random 256-bit keys + 64-bit nonce
- XTS AES-256
vSphere Data Protection (VDP)
- vSAN backup
- implementation
- VDP VM on each site: traffic goes between VDP VMs on the sites, not between ESXi hosts
- agents for Exchange, MS SQL, Sharepoint
- compression
- deduplication: variable length
- encryption
- backup to cloud: AWS
- file-level hashing allows skipping backup if data has not changed
- capable of backing up NAS, SAN
Integrations
- vSAN iSCSI target
- vVols
- NFS/SMB: add-on VSA controllers
Linked clones
- clones use parent VMDK
- changes per clone – in deltas
- parent modification does not affect clones
- recompose: apply parent changes to clones + delete deltas
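
The linked-clone behaviour above can be modelled as a shared read-only base plus a per-clone delta. This is a conceptual sketch with hypothetical class names (`Parent`, `LinkedClone`) and dictionaries standing in for VMDK blocks.

```python
class Parent:
    """Parent VM disk; clones are created from a frozen snapshot of it."""

    def __init__(self, blocks: dict):
        self.blocks = dict(blocks)

    def snapshot(self) -> dict:
        # A frozen copy that all clones share and never modify.
        return dict(self.blocks)


class LinkedClone:
    def __init__(self, base: dict):
        self.base = base   # shared parent snapshot (read-only by convention)
        self.delta = {}    # this clone's own changes

    def read(self, index: int):
        # Prefer the clone's delta, fall back to the shared base.
        return self.delta.get(index, self.base[index])

    def write(self, index: int, data) -> None:
        self.delta[index] = data

    def recompose(self, new_base: dict) -> None:
        # Pick up the parent's changes and drop the clone's delta.
        self.base = new_base
        self.delta.clear()


parent = Parent({0: "os-v1", 1: "app-v1"})
base = parent.snapshot()
clone_a, clone_b = LinkedClone(base), LinkedClone(base)

clone_a.write(1, "custom")      # change is visible only to clone_a
parent.blocks[0] = "os-v2"      # parent change does not reach the clones
before = (clone_a.read(0), clone_a.read(1), clone_b.read(1))

clone_a.recompose(parent.snapshot())
after = (clone_a.read(0), clone_a.read(1))
```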