HyperFlex

  1. HyperFlex
    1. Controller VM (CVM)
    2. IOVisor
    3. StorFS
      1. Deduplication
      2. Compression
    4. Cluster
  2. Stretched cluster
  3. HX Edge
  4. Backup in HX

HyperFlex

  • abstracts DAS across several hosts into a single distributed datastore
  • ESXi, Hyper-V, K8s
  • NFS interface for ESXi, SMB for Hyper-V; not accessible from outside
  • external storage support: FCoE, iSCSI, NFS
  • I/O is always distributed: VM data is not copied to a local datastore when a VM moves
    • synchronous write to SSD
    • synchronous replication, many-to-many
  • cache: SSD in front of HDD (hybrid nodes)
  • mirroring instead of parity: faster rebuild; reads and rebuilds run in parallel across nodes
  • up to 64 nodes in cluster: 32 HX + 32 compute-only
    • compute:converged ratio – 1:1 or 2:1 (depends on license)
  • logical availability zone (LAZ)
    • availability group, ≈ SRLG
    • single failure within zone – whole zone is considered to have failed
    • no more than one copy of the data per zone
    • requires at least 8 nodes in cluster
  • write I/O (see the sketch after this list):
    1. write to cache
    2. compress
    3. ack I/O to VM
    4. deduplicate
  • authentication
    • Kerberos + AD
    • NTP required
  • hardware management: UCSM or Intersight
    • requires jumbo frames on the upstream network
  • min cluster size:
    • LAZ: 8 nodes
    • RF3: 5 nodes (to sustain double failure)
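
A minimal sketch of the write ordering above, with invented names and structures (not HX code; zlib stands in for the Snappy compression HX actually uses):

    import zlib

    class WriteCache:
        def __init__(self, mirrors):
            self.active = []        # active cache segment: receives new I/O
            self.mirrors = mirrors  # cache copies held on secondary CVMs

        def write(self, block: bytes) -> str:
            self.active.append(block)               # 1. write to the SSD cache segment
            self.active[-1] = zlib.compress(block)  # 2. compress inline (zlib stand-in for Snappy)
            for m in self.mirrors:                  # synchronous mirroring to secondary CVMs
                m.append(self.active[-1])
            # 4. deduplication is deferred until destage (see Deduplication)
            return "ack"                            # 3. acknowledge the I/O to the VM

    cache = WriteCache(mirrors=[[], []])     # two mirror copies ≈ RF3
    print(cache.write(b"guest data" * 100))  # -> ack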

Controller VM (CVM)

  • Ubuntu-based
  • provides access to physical storage
  • hosts HX Connect

IOVisor

  • controls hypervisor access to storage, distributes I/O over CVMs in cluster
  • presents the NFS mount point to the hypervisor
  • VIB ≡ driver
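
A rough sketch of how an I/O distribution layer like IOVisor could spread guest I/O across CVMs, assuming simple hash-based chunk placement; the function and CVM names are hypothetical, and the real IOVisor/StorFS placement logic is not reproduced here:

    import hashlib

    CVMS = ["cvm-1", "cvm-2", "cvm-3"]  # hypothetical controller VMs
    CHUNK = 32 * 1024 * 1024            # 32 MB chunks, as described under StorFS

    def owner_cvm(vmdk: str, offset: int) -> str:
        """Pick which CVM handles the I/O for a given chunk of a VMDK."""
        chunk_id = offset // CHUNK
        digest = hashlib.sha1(f"{vmdk}:{chunk_id}".encode()).digest()
        return CVMS[int.from_bytes(digest[:4], "big") % len(CVMS)]

    print(owner_cvm("vm01.vmdk", 0))           # every host computes the same owner,
    print(owner_cvm("vm01.vmdk", 40 * 2**20))  # so I/O spreads over the whole cluster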

StorFS

  • HXDP file system
    • 70% usable: the rest is required for HX operation, otherwise performance drops
      • 8% for StorFS operation
      • above 70% utilization, cluster upgrade and expansion are not allowed
    • circular buffer: data is always written to head
      • no seek penalty on HDD (writes are sequential)
      • even wear across SSD cells
    • cleanup process marks unneeded blocks as free
    • 32 MB chunks
    • stripe ≡ 8 chunks – atomic unit for HX operation
  • read latency is less predictable on hybrid nodes: reads should be served from cache, misses hit the HDD backend
  • cache:
    • read: hybrid nodes only (All-Flash does not need it – SSDs are read directly at full speed)
    • write
      • hybrid and All-Flash
      • active segment – I/O, passive segment – destaging
      • 3 levels: resiliency in case the primary fails during destaging
        • level 1 – master, on the primary CVM
        • levels 2 and 3 – slaves, mirror the data on secondary CVMs
        • destaging:
          • primary CVM performs deduplication and then mirrors level 1 passive cache to secondary CVMs
          • fresh data is copied to read cache
    • when the cache is full (see the sketch after this list):
      • destage data to backend
      • swap active and passive roles for cache segments
  • faster rebuild than RAID: data is simply re-copied, no parity recalculation
  • more resilient than RAID: with RAID, the server/controller is a single point of failure
  • native snapshots change only metadata
    • more efficient than the hypervisor's native snapshot chain (RoW) – lower read latency
  • self-healing delay
    • disk failure: 1 minute
    • node failure: 2 hours
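
A simplified sketch of the active/passive write-cache segments and the destage-and-swap behaviour described above; names and structures are invented, and the capacity tier is modelled as a plain append-only log:

    class CacheSegments:
        def __init__(self, capacity: int, backend: list):
            self.capacity = capacity
            self.backend = backend  # capacity tier, log-structured: append only
            self.active = []        # segment receiving new I/O
            self.passive = []       # segment being destaged in the background

        def write(self, block: bytes) -> None:
            if len(self.active) >= self.capacity:
                self._destage_and_swap()
            self.active.append(block)

        def _destage_and_swap(self) -> None:
            # destage: flush the passive segment to the capacity tier,
            # always appending at the head of the log (circular buffer)
            self.backend.extend(self.passive)
            self.passive.clear()
            # swap roles: the full active segment drains next
            self.active, self.passive = self.passive, self.active

    backend: list = []
    cache = CacheSegments(capacity=4, backend=backend)
    for i in range(10):
        cache.write(f"block-{i}".encode())
    print(len(cache.active), len(cache.passive), len(backend))  # 2 4 4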

Deduplication

  • per VMDK
  • block-level, uses fingerprint for comparison
  • during destaging: after the write to the SSD cache, before the write to HDD
  • modifies pointers in the inode
  • inline
  • cannot be disabled
  • useful for ReadyClones ≡ VDI; driven by access frequency: the more often data is accessed, the more likely it is to be deduplicated
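
A minimal sketch of fingerprint-based block deduplication at destage time; the hash choice and data structures are assumptions for illustration, not HX internals:

    import hashlib

    capacity_tier = {}  # fingerprint -> physical block (stored once)
    inode = []          # per-VMDK list of fingerprints (pointers)

    def destage(block: bytes) -> None:
        fp = hashlib.sha256(block).hexdigest()  # fingerprint of the block
        if fp not in capacity_tier:             # new data: store the block once
            capacity_tier[fp] = block
        inode.append(fp)                        # duplicates only add a pointer

    for b in (b"A" * 4096, b"B" * 4096, b"A" * 4096):
        destage(b)
    print(len(inode), "logical blocks,", len(capacity_tier), "physical blocks")  # 3 logical, 2 physical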

Compression

  • inline
  • cannot be disabled
  • on cache level, algorithm – Google Snappy
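
Snappy itself is easy to try from Python via the python-snappy binding; this only demonstrates the algorithm's behaviour (fast, moderate ratio), not how HX invokes it:

    import snappy  # pip install python-snappy

    data = b"hyperflex " * 1000
    packed = snappy.compress(data)
    assert snappy.decompress(packed) == data
    print(f"{len(data)} -> {len(packed)} bytes")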

Cluster

  • must match across all nodes
    • type
    • count and type of HDDs
    • server type
  • single UCS domain
  • no support: single FI, FEX, breakout cables
  • direct mode connection
  • each HX server must connect to the same port number on both FIs
  • split-brain “protection”: use single pair of FIs for the whole cluster
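
A hypothetical pre-deployment check for the homogeneity rules above (field names and values are invented, this is not a Cisco tool):

    def homogeneous(nodes: list[dict]) -> bool:
        """All nodes must agree on type, disk count/type and server model."""
        keys = ("node_type", "disk_count", "disk_type", "server_model")
        reference = {k: nodes[0][k] for k in keys}
        return all({k: n[k] for k in keys} == reference for n in nodes)

    nodes = [
        {"node_type": "hybrid", "disk_count": 12, "disk_type": "HDD", "server_model": "HX240c M5"},
        {"node_type": "hybrid", "disk_count": 12, "disk_type": "HDD", "server_model": "HX240c M5"},
    ]
    print(homogeneous(nodes))  # True: safe to form or expand the cluster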

Stretched cluster

  • RF4: RF2 + RF2
  • uses witness VM to determine which site is online
    • Intersight can act as witness
    • witness link requirements: 100 Mbps, ≤ 200 ms RTT
    • if the witness is unreachable when a site fails – cluster shutdown
  • 2 sites only
  • inter-site requirement: 100 Mbps, 5 ms RTT
  • active/active (unlike native replication, which is active/standby)
  • storage is erased if the stretched cluster is created from a standalone cluster
  • VMware only
  • read: local site
  • write: local and remote site
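
A toy illustration of why the witness matters: the surviving site plus the witness forms a majority, while a site that has also lost the witness cannot, so the cluster shuts down rather than risk split-brain (invented logic, not HX code):

    def site_stays_online(site_up: bool, other_site_up: bool, witness_up: bool) -> bool:
        votes = int(site_up) + int(other_site_up) + int(witness_up)
        return site_up and votes >= 2  # majority of {site A, site B, witness}

    print(site_stays_online(True, False, True))   # True: survivor + witness have quorum
    print(site_stays_online(True, False, False))  # False: witness lost too -> shutdown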

HX Edge

  • 2-4 nodes
    • 2-node: Intersight for quorum, RF2 only
    • 4-node: in case of 2+2 partition becomes read-only (cannot reach quorum)
  • VMware only
  • cannot be converted to a full-featured cluster later
  • cannot be expanded
  • hardware management – Intersight
  • single-CPU (one socket) servers are allowed
  • mLOM only, no VIC/NIC
  • does not support SED, NVMe
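
A toy quorum check for the 2-node and 4-node cases above (invented helper, not HX code):

    def has_quorum(reachable_votes: int, total_votes: int) -> bool:
        return reachable_votes > total_votes // 2  # strict majority required

    # 2-node cluster: Intersight contributes a third vote as tie-breaker
    print(has_quorum(reachable_votes=2, total_votes=3))  # surviving node + Intersight -> True
    # 4-node cluster split 2+2: neither half reaches a majority -> read-only
    print(has_quorum(reachable_votes=2, total_votes=4))  # False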

Backup in HX

  • the VM is briefly suspended during VMDK operations (VM stun) to ensure consistency
  • Veeam
    • uses HX datastore directly via CVM
    • NFSv3 is not used because of a potential ESXi performance hit: NFS locking when Veeam and the VM are on different hosts