VMware vSAN

  1. vSAN
    1. Hybrid
    2. All-Flash
  2. Witness
    1. Stretched cluster
    2. Site recovery manager (SRM)
  3. Policy
    1. vSAN objects
  4. Encryption
  5. vSphere data protection (VDP)
  6. Integrations
  7. Linked clones

vSAN

  • groups DAS into disk groups (1 to 5 per host): each has 1 cache SSD + 1-7 capacity HDD/SSD
    • cache SSD
      • 10% of total capacity
      • for own disk group only
      • accelerates HDD I/O
      • consolidates writes to reduce flash wear
  • part of ESXi kernel
  • 2-64 nodes
  • object-based file system
    • component up to 255 GB; larger objects are split into several components
    • components of one object can be striped over different disks
  • no data locality
  • I/O between caches goes through the network
    • 10 Gbps minimum
    • network delay (µs) < NAND delay (µs)
  • data replication after failure: 1-to-1 ≡ 1 node copies data to 1 new node
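
The disk group limits above can be checked with a short script. This is a minimal sketch, not a VMware tool: `DiskGroup` and `check_host` are hypothetical names, and the 10% rule is interpreted here as "cache is at least 10% of the raw capacity of its own disk group", which is an assumption about what "total capacity" means in these notes.

```python
from dataclasses import dataclass
from typing import List

MAX_DISK_GROUPS_PER_HOST = 5       # from the notes: 1 to 5 disk groups
MAX_CAPACITY_DRIVES_PER_GROUP = 7  # from the notes: 1-7 capacity drives
CACHE_PERCENT = 10                 # from the notes: cache is 10% of capacity


@dataclass
class DiskGroup:
    cache_gb: int            # size of the single cache SSD
    capacity_gb: List[int]   # sizes of the capacity drives


def check_host(groups: List[DiskGroup]) -> List[str]:
    """Return a list of rule violations; an empty list means the layout fits."""
    problems = []
    if not 1 <= len(groups) <= MAX_DISK_GROUPS_PER_HOST:
        problems.append(f"host has {len(groups)} disk groups, expected 1-5")
    for i, group in enumerate(groups):
        drives = len(group.capacity_gb)
        if not 1 <= drives <= MAX_CAPACITY_DRIVES_PER_GROUP:
            problems.append(f"group {i}: {drives} capacity drives, expected 1-7")
        # Integer arithmetic avoids floating-point rounding at the boundary.
        if group.cache_gb * 100 < CACHE_PERCENT * sum(group.capacity_gb):
            problems.append(f"group {i}: cache below {CACHE_PERCENT}% of capacity")
    return problems


if __name__ == "__main__":
    print(check_host([DiskGroup(cache_gb=400, capacity_gb=[1000] * 4)]))
    print(check_host([DiskGroup(cache_gb=100, capacity_gb=[1000] * 8)]))
```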

Hybrid

  • cache:
    • 30% write-back buffer
    • 70% read cache
  • 4 KB block
  • when 1 block is read, the next 1 MB of blocks is pre-loaded
  • does not support deduplication, compression, EC
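
The cache split and read-ahead figures above translate into simple arithmetic. The sketch below is illustrative only; the function names are hypothetical, and "next 1 MB" is read as the 256 blocks of 4 KB that follow the requested block.

```python
BLOCK_KB = 4          # from the notes: 4 KB block
READ_AHEAD_KB = 1024  # from the notes: next 1 MB is pre-loaded


def hybrid_cache_split(cache_gb):
    """Split a cache device into (write-back buffer, read cache) in GB."""
    return cache_gb * 30 / 100, cache_gb * 70 / 100


def read_ahead_blocks(block_index):
    """Block indices pre-loaded after reading block_index (assumed meaning)."""
    count = READ_AHEAD_KB // BLOCK_KB  # 256 blocks of 4 KB
    return range(block_index + 1, block_index + 1 + count)


if __name__ == "__main__":
    write_gb, read_gb = hybrid_cache_split(400)
    print(write_gb, read_gb)
    print(len(read_ahead_blocks(10)))
```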

All-Flash

  • block-level deduplication + compression, per disk group
    • if deduplication or compression is enabled, disk failure ≡ disk group failure
    • can be disabled – useful for DB
  • deduplication:
    • 4 KB block
    • inline
    • SHA-1 to identify uniqueness
  • compression
    • LZ4
    • inline
    • if a 4 KB block compresses to more than 2 KB (e.g., encrypted data), it is stored uncompressed
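
The inline pipeline above (SHA-1 fingerprint per 4 KB block, then compression with a 2 KB cut-off) can be sketched as follows. This is not vSAN's implementation: `store_blocks` and `load_block` are hypothetical names, the store is a plain dictionary, and `zlib` stands in for LZ4 because LZ4 is not in the Python standard library.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096          # from the notes: 4 KB block
COMPRESS_THRESHOLD = 2048  # from the notes: keep raw if result exceeds 2 KB


def store_blocks(data, store):
    """Deduplicate and compress data into store; return block references."""
    refs = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha1(block).hexdigest()  # uniqueness fingerprint
        if digest not in store:                   # new block: store it once
            packed = zlib.compress(block)         # zlib as a stand-in for LZ4
            if len(packed) <= COMPRESS_THRESHOLD:
                store[digest] = ("compressed", packed)
            else:
                store[digest] = ("raw", block)    # poorly compressible data
        refs.append(digest)
    return refs


def load_block(store, ref):
    """Return the original bytes for one block reference."""
    kind, payload = store[ref]
    return zlib.decompress(payload) if kind == "compressed" else payload


if __name__ == "__main__":
    store = {}
    refs = store_blocks(b"\0" * BLOCK_SIZE * 3, store)
    print(len(refs), len(store))  # three references, one stored block
```

Keying the store by fingerprint means identical blocks cost one entry regardless of how many references point at them.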

Witness

  • protection from split-brain in case sites lose connectivity but stay up
  • a VM requires a strict majority of votes from nodes to start on a node ≡ quorum (Raft)
  • if number of nodes is even, witness has the deciding vote
  • vSAN host that cannot run VMs
  • link to witness: up to 200 ms RTT, at least 100 Mbps
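
The strict-majority rule and the witness's deciding vote can be illustrated with a few lines. This is a simplified sketch that assumes one vote per participant; `has_quorum` is a hypothetical helper, not a vSAN API.

```python
def has_quorum(reachable_votes, total_votes):
    """Strict majority: more than half of all votes must be reachable."""
    return 2 * reachable_votes > total_votes


if __name__ == "__main__":
    # Two sites with one vote each, no witness: after a partition
    # neither side holds a strict majority.
    print(has_quorum(1, 2))
    # Same two sites plus a witness vote: the side that still reaches
    # the witness has 2 of 3 votes, the isolated side has 1 of 3.
    print(has_quorum(2, 3), has_quorum(1, 3))
```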

Stretched cluster

  • 2 sites
    • synchronous replication
    • inter-site link: up to 5 ms RTT, at least 10 Gbps
    • max 15 nodes per site
  • affinity:
    • none, primary, secondary
    • limits the locations where a VM can be brought up with no failures present (PFTT = 0)
  • primary level of failures to tolerate (PFTT)
    • number of site failures
  • secondary level of failures to tolerate (SFTT)
    • number of local failures after PFTT is exceeded: 0-3, 3 default
  • WAN optimization
    • read from local copy
    • only one copy is sent over WAN during replication, remote site then performs copy distribution locally

Site recovery manager (SRM)

  • asynchronous replication
  • per object
  • RPO: 5 minutes – 24 hours
  • full sync first, then only deltas are sent
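
The "full sync first, then only deltas" pattern can be sketched generically. The code below is not SRM's replication engine: objects are modeled as dictionaries of block index to bytes, and `replicate` and `check_rpo` are hypothetical helpers.

```python
MIN_RPO_MINUTES = 5        # from the notes: RPO lower bound
MAX_RPO_MINUTES = 24 * 60  # from the notes: RPO upper bound


def check_rpo(minutes):
    """True if the requested RPO lies in the supported range."""
    return MIN_RPO_MINUTES <= minutes <= MAX_RPO_MINUTES


def replicate(source, replica, dirty_blocks=None):
    """Copy blocks to the replica; return how many blocks were sent.

    dirty_blocks=None means the initial full sync; afterwards only the
    blocks changed since the previous cycle are transferred.
    """
    if dirty_blocks is None:
        replica.clear()
        replica.update(source)
        return len(source)
    for index in dirty_blocks:
        replica[index] = source[index]
    return len(dirty_blocks)


if __name__ == "__main__":
    source = {0: b"a", 1: b"b", 2: b"c"}
    replica = {}
    print(replicate(source, replica))       # full sync
    source[1] = b"B"
    print(replicate(source, replica, {1}))  # delta only
    print(replica == source)
```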

Policy

  • storage policy-based management (SPBM) per VM ≡ per object
  • types
    • stripes per object
      • min number of capacity drives used for striping
      • 1 – default, 12 – max
    • PFTT
      • number of copies: PFTT + 1
      • 1 – default, 0 – min, 3 – max
      • better to have more hosts than copies ⇒ after a failure the system can rebuild and restore the protection level
    • fault domain
      • ≡ HX LAZ
      • at least 2×PFTT + 1: space for repair and witness
    • flash read cache reservation
      • reserves cache fraction for critical VM
      • 0-100%, 0% default
      • hybrid only
      • tied to VM, double reservation during vMotion
    • object space reservation: thick provisioning
    • failure tolerance method: performance (mirror) or capacity (EC)
    • IOPS limit: I/O is normalized to a 32 KB block size
    • disable object checksum: CRC32
    • force provisioning: whether provisioning is permitted when policy cannot be fulfilled
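
The PFTT arithmetic above (copies = PFTT + 1, fault domains ≥ 2×PFTT + 1) is easy to tabulate. This is a minimal sketch for the mirror method only; `mirror_requirements` is a hypothetical helper, and the raw-capacity figure simply multiplies object size by the number of copies.

```python
def mirror_requirements(pftt, object_gb):
    """Resources a mirrored object needs for a given PFTT."""
    if not 0 <= pftt <= 3:  # from the notes: 0 min, 3 max
        raise ValueError("PFTT must be between 0 and 3")
    copies = pftt + 1
    return {
        "copies": copies,
        "min_fault_domains": 2 * pftt + 1,
        "raw_capacity_gb": object_gb * copies,
    }


if __name__ == "__main__":
    for pftt in range(4):
        print(pftt, mirror_requirements(pftt, object_gb=100))
```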

vSAN objects

  • types
    • VM home:
      • VMFS, root directory of datastore
      • .vmx, logs, delta + VMDK descriptors
    • VMDK
    • VM swap: created when the VM is powered on
    • snapshot delta VMDK
    • memory: created when the VM is suspended
  • RAID tree per object: RAID 0 ≡ stripe, RAID 1 ≡ mirror
    • mirror: different nodes
    • stripe: within disk group, between disk groups, between nodes
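
The RAID tree of a mirrored, striped object can be modeled as a nested structure. The sketch below is an assumed simplification: a RAID 1 root with PFTT + 1 replicas, each a RAID 0 set of stripe components. Witness components and the placement of mirrors on different nodes are not modeled, and the names are hypothetical.

```python
def build_raid_tree(pftt, stripes):
    """RAID 1 over (PFTT + 1) RAID 0 sets, each with `stripes` components."""
    return {
        "type": "RAID 1",
        "children": [
            {
                "type": "RAID 0",
                "children": [f"component-{m}-{s}" for s in range(stripes)],
            }
            for m in range(pftt + 1)
        ],
    }


def count_components(node):
    """Count leaf components in the tree."""
    if isinstance(node, str):
        return 1
    return sum(count_components(child) for child in node["children"])


if __name__ == "__main__":
    tree = build_raid_tree(pftt=1, stripes=2)
    print(tree)
    print(count_components(tree))
```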

Encryption

  • between VM and storage on hypervisor
  • for VM home and VMDK
  • keys are delivered via KMIP 1.1; vCenter acts as the KMIP client for storage encryption
  • vMotion encryption: random 256-bit keys + 64-bit nonce
  • XTS AES-256

vSphere data protection (VDP)

  • vSAN backup
  • implementation
    • VDP appliance VM on each site: replication runs between the VDP appliances, not between ESXi hosts
    • agents for Exchange, MS SQL, SharePoint
  • compression
  • deduplication: variable length
  • encryption
  • backup to cloud: AWS
  • file-level hashing allows skipping the backup if data has not changed
  • can back up NAS, SAN
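
The file-level hashing idea can be sketched as a comparison of content hashes between backup runs. This is a generic illustration, not VDP's implementation: the notes do not name the hash algorithm, so SHA-256 is an assumption, and `plan_backup` is a hypothetical helper.

```python
import hashlib


def plan_backup(files, previous_hashes):
    """Return (names to back up, current hashes) for a dict of name -> bytes."""
    to_back_up = []
    current_hashes = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()  # assumed algorithm
        current_hashes[name] = digest
        if previous_hashes.get(name) != digest:       # new or changed file
            to_back_up.append(name)
    return to_back_up, current_hashes


if __name__ == "__main__":
    files = {"a.txt": b"one", "b.txt": b"two"}
    first, hashes = plan_backup(files, {})
    print(sorted(first))           # both files on the first run
    files["b.txt"] = b"changed"
    second, hashes = plan_backup(files, hashes)
    print(second)                  # only the changed file
```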

Integrations

  1. vSAN iSCSI target
  2. vVols
  3. NFS/SMB: add-on VSA controllers

Linked clones

  • clones use parent VMDK
  • changes per clone – in deltas
  • parent modification does not affect clones
    • recompose: apply parent changes to clones + delete deltas
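
The linked-clone behavior above can be modeled as a per-clone delta layered over a shared, read-only parent image. The sketch is illustrative only: disks are dictionaries of block index to bytes, the parent image is represented by a frozen copy taken at clone time, and `LinkedClone` is a hypothetical class.

```python
class LinkedClone:
    def __init__(self, parent_image):
        self.base = parent_image  # shared parent image, never written to
        self.delta = {}           # blocks changed by this clone only

    def read(self, block):
        """Prefer the clone's delta, fall back to the parent image."""
        return self.delta.get(block, self.base.get(block))

    def write(self, block, data):
        """Writes land in the delta; the parent image stays untouched."""
        self.delta[block] = data

    def recompose(self, new_parent_image):
        """Switch to the updated parent image and delete the delta."""
        self.base = new_parent_image
        self.delta.clear()


if __name__ == "__main__":
    parent = {0: b"os-v1"}
    image = dict(parent)          # frozen image shared by all clones
    a, b = LinkedClone(image), LinkedClone(image)
    a.write(0, b"a-change")
    parent[0] = b"os-v2"          # parent modified after cloning
    print(a.read(0), b.read(0))   # clones are unaffected
    a.recompose(dict(parent))
    print(a.read(0))              # clone now sees the updated parent
```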