HyperFlex
- abstracts DAS over several hosts as a datastore
- ESXi, Hyper-V, K8s
- NFS interface for ESXi (SMB for Hyper-V), not accessible from outside the cluster
- external storage support: FCoE, iSCSI, NFS
- always distributed I/O: does not copy VM data to local datastore on VM move
- synchronous write to SSD
- synchronous replication, many-to-many
- cache: SSD + HDD
- mirroring instead of parity: faster rebuild; reads and rebuilds run in parallel
- up to 64 nodes in cluster: 32 HX + 32 compute-only
- compute:converged ratio – 1:1 or 2:1 (depends on license)
- logical availability zone (LAZ), placement sketch after this list
- availability group, ≈ SRLG
- single failure within zone – whole zone is considered to have failed
- no more than one data copy per zone
- requires at least 8 nodes in cluster
- write I/O (see the sketch after this list):
- write to cache
- compress
- ack I/O to VM
- deduplicate
- authentication
- Kerberos + AD
- NTP required
- hardware management: UCSM or Intersight
- requires jumbo frames on upstream switches
- min cluster size (arithmetic check after this list):
- LAZ: 8 nodes
- RF3: 5 nodes (to sustain double failure)
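A toy illustration of the LAZ placement constraint above (at most one data copy per zone); `place_copies` and the zone layout are hypothetical, not the actual HXDP placement logic.

```python
def place_copies(rf: int, zones: list[list[str]]) -> list[str]:
    # One node from each of RF distinct zones, so losing a whole zone
    # costs at most one copy of any given piece of data.
    if rf > len(zones):
        raise ValueError("zone-aware placement needs at least RF zones")
    return [zone[0] for zone in zones[:rf]]

# e.g. RF3 across 4 zones of 2 nodes each (8-node LAZ minimum)
zones = [["n1", "n2"], ["n3", "n4"], ["n5", "n6"], ["n7", "n8"]]
print(place_copies(3, zones))  # ['n1', 'n3', 'n5']
```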
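A minimal sketch of the write-path ordering listed above (cache write, inline compression, early ack, deferred dedup); the function and names are hypothetical, with `python-snappy` standing in for the internal compressor.

```python
import snappy  # pip install python-snappy

def handle_write(data: bytes, replica_logs: list, ack) -> None:
    packed = snappy.compress(data)    # inline compression before the ack
    for log in replica_logs:          # synchronous write to the SSD write
        log.append(packed)            #   log of every replica (RF copies)
    ack()                             # I/O acknowledged to the VM here
    # deduplication is deferred until destaging (see StorFS below)
```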
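A back-of-envelope check of the RF3 minimum; `copies + tolerated failures` is my reading of why 5 nodes are needed, not an official formula.

```python
def min_nodes(replication_factor: int, tolerated_failures: int) -> int:
    # After losing the tolerated nodes, enough distinct nodes must remain
    # to hold one copy each.
    return replication_factor + tolerated_failures

assert min_nodes(3, 2) == 5  # RF3 surviving a double failure
```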
Controller VM (CVM)
- Ubuntu-based
- provides access to physical storage
- hosts HX Connect
IOVisor
- controls hypervisor access to storage, distributes I/O over CVMs in cluster
- NFS mount point
- delivered as a VIB (vSphere Installation Bundle) ≡ driver
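A rough sketch of how an IOVisor-like shim could fan I/O out over the cluster's CVMs; the hashing and stripe math are assumptions for illustration (sizes from the StorFS notes below).

```python
CHUNK = 32 * 2**20   # 32 MB chunk (see StorFS below)
STRIPE = 8 * CHUNK   # stripe = 8 chunks

def owner_cvm(volume_id: str, offset: int, cvms: list) -> object:
    # Map each stripe of a volume to one CVM, so I/O is spread over the
    # whole cluster instead of hitting only the local node.
    stripe_no = offset // STRIPE
    return cvms[hash((volume_id, stripe_no)) % len(cvms)]
```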
StorFS
- HXDP file system
- 70% usable: rest is required for HX operation, otherwise performance drops
- 8% for StorFS operation
- if utilization exceeds 70%, cluster upgrade or expansion is not allowed
- circular buffer (sketch after this section): data is always written to head
- no seek time for HDD
- uniform write over SSD cells
- cleanup process marks unneeded blocks as free
- 32 MB chunks
- stripe ≡ 8 chunks – atomic unit for HX operation
- unpredictable read latency from the HDD backend: reads are expected to be served from cache
- cache (destaging sketch after this section):
- read: hybrid only (not needed for SSD, can read directly at max speed)
- write
- hybrid and All-Flash
- active segment – I/O, passive segment – destaging
- 3 levels: resiliency in case primary fails during destaging
- level 1: master on primary CVM
- levels 2 and 3: slaves mirroring data on secondary CVMs
- destaging:
- primary CVM performs deduplication and then mirrors level 1 passive cache to secondary CVMs
- fresh data is copied to read cache
- cache is full:
- destage data to backend
- swap active and passive roles for cache segments
- faster rebuild compared to RAID: copy only, no parity recalculation
- more stable than RAID: the server/controller is a single point of failure for RAID
- native snapshots change only metadata
- redirect-on-write (RoW): more efficient than a redo-log snapshot chain – lower read latency
- self-healing delay
- disk failure: 1 minute
- node failure: 2 hours
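A minimal sketch of the circular-buffer (log-structured) layout above, with the cleanup process reduced to marking dead slots; names and behavior are illustrative, not StorFS internals.

```python
class WriteLog:
    def __init__(self, capacity_chunks: int):
        self.slots = [None] * capacity_chunks
        self.head = 0
        self.live = set()  # slots still referenced by metadata

    def append(self, chunk: bytes) -> int:
        # New data always lands at the head: sequential writes mean no
        # HDD seeks and uniform wear across SSD cells.
        slot = self.head
        self.slots[slot] = chunk
        self.live.add(slot)
        self.head = (self.head + 1) % len(self.slots)
        return slot

    def overwrite(self, old_slot: int, chunk: bytes) -> int:
        # Never in place: the old slot is marked dead and reclaimed as
        # free space by the cleanup process later.
        self.live.discard(old_slot)
        return self.append(chunk)
```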
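And a toy model of the active/passive write-cache segments and the role swap on a full cache; again purely illustrative.

```python
class WriteCache:
    def __init__(self):
        self.active: list[bytes] = []   # receives incoming I/O
        self.passive: list[bytes] = []  # being destaged to the backend

    def write(self, block: bytes) -> None:
        self.active.append(block)

    def on_active_full(self, backend: list) -> None:
        # Destage the passive segment (dedup happens at this point per
        # the notes), then swap roles: the full active segment becomes
        # the next passive one.
        backend.extend(self.passive)
        self.passive.clear()
        self.active, self.passive = self.passive, self.active
```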
Deduplication
- per VMDK
- block-level, uses fingerprint for comparison
- during destaging: after write to SSD before writing to HDD
- modifies pointers in inode
- inline
- cannot be disabled
- useful for ReadyClones ≡ VDI; driven by access frequency: the more a block is accessed, the more likely it is deduplicated
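A bare-bones illustration of fingerprint-based block dedup; SHA-256 and the in-memory structures are assumptions, not HXDP specifics.

```python
import hashlib

store: dict[str, bytes] = {}  # fingerprint -> unique physical block
inode: list[str] = []         # logical layout: pointers (fingerprints)

def write_block(block: bytes) -> None:
    fp = hashlib.sha256(block).hexdigest()  # block fingerprint
    if fp not in store:                     # new unique data: store it
        store[fp] = block
    inode.append(fp)                        # a duplicate costs only a pointer
```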
Compression
- inline
- cannot be disabled
- at the cache level; algorithm – Google Snappy
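Snappy itself is easy to try; this only demonstrates the trade-off (fast, moderate ratio) via the `python-snappy` binding, not HX's internal use of it.

```python
import snappy  # pip install python-snappy

raw = b"virtual disk block contents " * 256
packed = snappy.compress(raw)            # fast enough to run inline
assert snappy.decompress(packed) == raw  # lossless round trip
print(f"{len(raw)} -> {len(packed)} bytes")
```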
Cluster
- nodes must match:
- type
- count and type of HDD
- server type
- single UCS domain
- no support: single FI, FEX, breakout cables
- direct mode connection
- each HX server must connect to the same port number on both FIs
- split-brain “protection”: use single pair of FIs for the whole cluster
Stretched cluster
- RF4: RF2 + RF2
- uses witness VM to determine which site is online
- Intersight can act as witness
- witness link requirements: 100 Mbps, ≤200 ms RTT
- if the witness is unreachable during a failure – cluster shutdown
- 2 sites only
- inter-site link requirement: 100 Mbps, ≤5 ms RTT
- active/active (native replication ≡ active/standby)
- storage is erased if the stretched cluster is built from an existing standalone cluster
- VMware only
- read: local site
- write: local and remote site
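A toy tie-break matching the witness behavior above; illustrative only, not the actual arbitration protocol.

```python
def site_may_serve(peer_reachable: bool, witness_reachable: bool) -> bool:
    if peer_reachable:
        return True            # both sites up: active/active
    return witness_reachable   # split: only the witness-side site survives

assert site_may_serve(False, False) is False  # no witness on failure -> shutdown
```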
HX Edge
- 2-4 nodes
- 2-node: Intersight for quorum, RF2 only
- 4-node: a 2+2 partition makes the cluster read-only (cannot reach majority quorum)
- VMware only
- impossible to convert to full-feature cluster later
- impossible to expand
- hardware management – Intersight
- allows 1 CPU per server
- mLOM only, no VIC/NIC
- does not support SED, NVMe
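The 2+2 read-only behavior falls out of a plain majority-quorum rule, sketched here (my illustration, not HX Edge code).

```python
def can_write(nodes_visible: int, cluster_size: int) -> bool:
    return nodes_visible > cluster_size // 2  # strict majority required

assert can_write(3, 4) is True
assert can_write(2, 4) is False  # 2+2 split: both halves go read-only
```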
Backup in HX
- the VM is briefly suspended during VMDK operations (VM stun) to ensure consistency
- Veeam
- uses HX datastore directly via CVM
- NFSv3 is not used because of a potential ESXi performance hit: lock contention when the Veeam proxy and the VM run on different hosts