This document provides an overview of vSAN components and objects, the life cycle of vSAN components, fault domains, and all-flash I/O flow. It discusses how virtual disks are broken down into smaller vSAN components and distributed across disk groups in a host. It also explains how vSAN can tolerate certain component failures based on the number of fault domains it is configured with.

Uploaded by Rashid Mahamood
Copyright © All Rights Reserved

STO1479BU

vSAN Beyond the Basics


VMworld 2017 Content: Not for publication or distribution
Sumit Lahiri – Product Line Manager
Eric Knauft – Staff Engineer

#VMworld #STO1479BU
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.

#STO1479BU CONFIDENTIAL

Agenda

1 The World of Objects
2 Life of vSAN Component
3 The 4 Rs of vSAN
4 Multi-Level Fault Domains
5 All Flash I/O Flow

The World of Objects

Disk layout in a host

vSAN Datastore: disk groups contribute to a single vSAN datastore in a vSphere cluster
▪ Max 64 nodes
▪ Min 2 nodes (ROBO)
▪ Max 5 disk groups per host
▪ 2 tiers per disk group: cache and capacity

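The layout limits above can be written as a small validation check. This is a hypothetical Python sketch: `DiskGroup` and `validate_cluster` are illustrative names, and the rule of one cache device plus one to seven capacity devices per disk group comes from general vSAN (original storage architecture) documentation rather than from this slide.

```python
from dataclasses import dataclass
from typing import List

# Limits from the slide
MAX_NODES = 64
MIN_NODES = 2  # ROBO
MAX_DISK_GROUPS_PER_HOST = 5


@dataclass
class DiskGroup:
    cache_devices: int     # cache tier
    capacity_devices: int  # capacity tier


def validate_cluster(hosts: List[List[DiskGroup]]) -> List[str]:
    """Return layout problems (hypothetical helper); an empty list means the layout fits."""
    problems = []
    if not MIN_NODES <= len(hosts) <= MAX_NODES:
        problems.append(f"cluster has {len(hosts)} nodes, expected {MIN_NODES}-{MAX_NODES}")
    for h, groups in enumerate(hosts):
        if len(groups) > MAX_DISK_GROUPS_PER_HOST:
            problems.append(f"host {h}: {len(groups)} disk groups, max {MAX_DISK_GROUPS_PER_HOST}")
        for g, dg in enumerate(groups):
            # Both tiers must be present (assumed rule: 1 cache device, 1-7 capacity devices)
            if dg.cache_devices != 1 or not 1 <= dg.capacity_devices <= 7:
                problems.append(f"host {h}, disk group {g}: needs 1 cache and 1-7 capacity devices")
    return problems


three_hosts = [[DiskGroup(1, 3)] for _ in range(3)]
print(validate_cluster(three_hosts))  # []
```

Returning a list of messages instead of raising on the first problem lets one call report every violation in a proposed layout.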
Creating a VM creates several objects in the background

▪ Virtual disk (VMDK)
▪ VM home namespace: VMX, log files
▪ Virtual memory swap object

From VM to components

Object (e.g. VMDK) → components (max size: 255 GB) → blocks (in low MBs)

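A rough way to see how an object breaks into components is to divide its size by the 255 GB component cap. This is a simplified sketch that assumes the cap is the only reason to split; stripe width (which can add components) and witness components are ignored, and it is not vSAN's actual placement logic.

```python
import math

# Maximum component size from the slide
MAX_COMPONENT_GB = 255


def components_per_replica(vmdk_gb: float, max_component_gb: int = MAX_COMPONENT_GB) -> int:
    """Smallest number of components that can hold one copy of the object."""
    if vmdk_gb <= 0:
        raise ValueError("object size must be positive")
    return math.ceil(vmdk_gb / max_component_gb)


def raid1_data_components(vmdk_gb: float, ftt: int) -> int:
    """RAID-1 keeps FTT+1 full replicas, each split into components (witnesses not counted)."""
    return (ftt + 1) * components_per_replica(vmdk_gb)


print(components_per_replica(250))    # 1
print(components_per_replica(600))    # 3
print(raid1_data_components(600, 1))  # 6
```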
Fault Domains

Fault domain levels in a vSphere vSAN cluster: hosts, racks, sites

Failures to Tolerate (FTT)

Always in the context of fault domains: host failures to tolerate, rack failures to tolerate, site failures to tolerate

Failures to Tolerate (FTT)

FTT implies host failures to tolerate if the fault domain is not mentioned.
(Figure: vSphere vSAN clusters illustrating FTT=1, FTT=2 and FTT=3.)

Failures to Tolerate (FTT) can be Nested

Survive one site failure and one host failure on the other site.
Fault domain levels in vSphere vSAN: hosts, racks, sites

Fault Tolerance Methods

Fault Tolerance Method (FTM)

Raw capacity consumed per byte of data:

Method    FTT=1               FTT=2              FTT=3
RAID-1    ✓ 2 bytes/byte      ✓ 3 bytes/byte     ✓ 4 bytes/byte
RAID-5    ✓ 1.33 bytes/byte   X                  X
RAID-6    X                   ✓ 1.5 bytes/byte   X

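The table can be turned into a capacity estimate. This sketch assumes the RAID-5 figure is 3 data + 1 parity (4/3, shown as 1.33) and the RAID-6 figure is 4 data + 2 parity; `OVERHEAD` and `raw_capacity_gb` are hypothetical names, and real sizing also has to account for slack space and other overheads that the table does not cover.

```python
from fractions import Fraction

# Raw bytes consumed per byte of VM data, keyed by (FTM, FTT), from the table.
# Combinations marked X in the table are simply absent.
OVERHEAD = {
    ("RAID-1", 1): Fraction(2),
    ("RAID-1", 2): Fraction(3),
    ("RAID-1", 3): Fraction(4),
    ("RAID-5", 1): Fraction(4, 3),  # assumed 3 data + 1 parity
    ("RAID-6", 2): Fraction(3, 2),  # assumed 4 data + 2 parity
}


def raw_capacity_gb(usable_gb: int, ftm: str, ftt: int) -> Fraction:
    """Raw datastore capacity needed to store usable_gb under the given policy."""
    try:
        factor = OVERHEAD[(ftm, ftt)]
    except KeyError:
        raise ValueError(f"{ftm} does not support FTT={ftt}") from None
    return usable_gb * factor


print(raw_capacity_gb(300, "RAID-1", 1))  # 600
print(raw_capacity_gb(300, "RAID-5", 1))  # 400
```

Using `Fraction` keeps the RAID-5 factor exact instead of rounding it to 1.33.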
FTT = Failures to Tolerate
FTM = Fault Tolerance Method

Notation

Object is associated with an underlying policy

(VMDK) Policy:
1. Failures to Tolerate
2. Fault Tolerance Method

Policy dictates how objects are managed
FTT = 1, FTM = RAID-1, Stripe Width > 2

(VMDK) Policy:
1. Failures to Tolerate (FTT)
2. Fault Tolerance Method (FTM)

The VMDK is stored as two replicas; each replica is split into stripes made of components C1, C2, ….

RAID Abstraction Model
FTT = 1, FTM = RAID-1, Stripe Width > 2
No witness

(VMDK)
└── R1 (RAID-1)
    ├── R0 (RAID-0): replica (stripes) → components C1 C2 …
    └── R0 (RAID-0): replica (stripes) → components C1 C2 …

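The RAID tree maps naturally onto a recursive data structure: a RAID-0 node needs all of its stripes, a RAID-1 node needs any one mirror. The sketch below checks availability only (quorum is ignored); the class names, host names and the stripe width of 2 are illustrative assumptions, not vSAN internals.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass
class Component:
    name: str
    host: str


@dataclass
class Raid0:
    children: List["Node"]  # stripes: every child is needed


@dataclass
class Raid1:
    children: List["Node"]  # mirrors: any one child is enough


Node = Union[Component, Raid0, Raid1]


def available(node: Node, failed_hosts: Set[str]) -> bool:
    """Can the data still be read if the given hosts are down? (Ignores quorum.)"""
    if isinstance(node, Component):
        return node.host not in failed_hosts
    if isinstance(node, Raid0):
        return all(available(c, failed_hosts) for c in node.children)
    return any(available(c, failed_hosts) for c in node.children)


# Hypothetical FTT=1, RAID-1 object with stripe width 2, one component per host
vmdk = Raid1([
    Raid0([Component("C1", "esxi-01"), Component("C2", "esxi-02")]),
    Raid0([Component("C1", "esxi-03"), Component("C2", "esxi-04")]),
])
print(available(vmdk, {"esxi-01"}))             # True
print(available(vmdk, {"esxi-01", "esxi-03"}))  # False
```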
FTT=1, FTM=RAID-1: comparison with and without stripes

Without striping (250 GB VMDK), no witness:
(VMDK) → R1 (RAID-1) → two components C of 250 GB each

With striping (1 TB VMDK), no witness:
(VMDK) → R1 (RAID-1) → two R0 (RAID-0) replicas of 1 TB each → components C1 C2 … of 250 GB each

vSAN is managed as a collection of components

(Figure: the vSAN datastore holds many components, C.)

Each replica on a different fault domain (e.g. host)

FTT = 2, FTM = RAID-1, Stripe Width = 2
(VMDK) → R1 (RAID-1) → three R0 (RAID-0) replicas, each with components C1 C2

Each component is commonly placed on a different host

FTT = 2, FTM = RAID-1, Stripe Width = 2
(VMDK) → R1 (RAID-1) → three R0 (RAID-0) replicas, each with components C1 C2

Can we survive 2 host failures with 3 hosts?

FTT = 2, FTM = RAID-1, Stripe Width = 2
(VMDK) → R1 (RAID-1) → three R0 (RAID-0) replicas, each with components C1 C2

Liveness = Availability && Quorum

Quorum: In the event of a cluster partition, which partition shall proceed?

partition-01: N hosts        partition-02: M hosts

Quorum: The partition with the higher votes proceeds

partition-01: N hosts, N votes        partition-02: M hosts, M votes

Cluster members participate in voting.

If M > N, partition-02 proceeds

partition-01: N hosts, N votes        partition-02: M hosts, M votes (proceeds)

Cluster members participate in voting.

Voting
FTT=1 and FTM = RAID-1

Quorum is calculated on a per-object basis

No witness: (VMDK) → R1 (RAID-1) → C (component, 1 vote), C (component, 1 vote)

• Each component participates in voting
• With two components, this sums to an even number of votes

Add witness for tie-breaker vote

(VMDK) → R1 (RAID-1) → C (component, 1 vote), C (component, 1 vote), W (witness, 1 vote)

• The witness is added as a tie-breaker vote
• It acts as an observer of which component has the latest data

For VMDK-A, partition-02 has higher votes

(VMDK-A) → R1 (RAID-1) → C (1 vote), C (1 vote), W (witness, 1 vote)

partition-01 (N hosts): C (1 vote)
partition-02 (M hosts): C (1 vote), W (1 vote), so partition-02 proceeds

General case: different objects proceed on different partitions

(VMDK-B) → R1 (RAID-1) → C (1 vote), C (1 vote), W (witness, 1 vote)
(VMDK-A) → R1 (RAID-1) → C (1 vote), C (1 vote), W (witness, 1 vote)

partition-01 (N hosts): VMDK-B's C and W (2 votes), VMDK-A's C (1 vote)
partition-02 (M hosts): VMDK-B's C (1 vote), VMDK-A's C and W (2 votes)

partition-01 proceeds for VMDK-B; partition-02 proceeds for VMDK-A.

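Because quorum is evaluated per object, the same partition split can favour different partitions for different objects. This sketch assumes a strict-majority rule over each object's own votes; the component names (`A.C1`, `B.W`, …) and the `winning_partition` helper are hypothetical.

```python
from typing import Dict, List, Optional, Set


def winning_partition(votes: Dict[str, int], partitions: List[Set[str]]) -> Optional[int]:
    """Index of the partition holding a strict majority of one object's votes, else None."""
    total = sum(votes.values())
    for i, members in enumerate(partitions):
        held = sum(votes[c] for c in members if c in votes)  # only this object's components
        if 2 * held > total:
            return i
    return None


# Each object: two data components and one witness, 1 vote each (illustrative names)
vmdk_a = {"A.C1": 1, "A.C2": 1, "A.W": 1}
vmdk_b = {"B.C1": 1, "B.C2": 1, "B.W": 1}
partition_01 = {"B.C1", "B.W", "A.C1"}
partition_02 = {"B.C2", "A.C2", "A.W"}

print(winning_partition(vmdk_a, [partition_01, partition_02]))  # 1 (partition-02)
print(winning_partition(vmdk_b, [partition_01, partition_02]))  # 0 (partition-01)
```

Returning `None` covers the even-vote case from the no-witness slide, where neither side has a majority.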
Components can be classified as data components and witness components

(VMDK) → R1 (RAID-1, no striping) → D (data component, 1 vote), D (data component, 1 vote), W (witness component, 1 vote)

Min count of hosts required to survive N host failures?

Minimum 2N+1 hosts required to survive N host failures

partition-01: N hosts = N shares of votes
partition-02: (N+1) hosts = (N+1) shares of votes, the winning partition

• If each host represents the same share of votes
• The winning partition would require a minimum of N+1 hosts
• Minimum size of cluster = 2N+1 hosts to survive N host failures

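The same argument as an inequality, assuming H hosts that each carry one equal share of votes and a strict-majority quorum rule:

```latex
% H hosts, one equal share of votes each; N hosts fail or are partitioned away.
% The surviving H - N hosts must hold a strict majority of the H votes:
H - N > \frac{H}{2}
\;\Longleftrightarrow\; H > 2N
\;\Longleftrightarrow\; H \ge 2N + 1 \qquad (H, N \text{ integers})
% Examples: N = 1 \Rightarrow H \ge 3, \quad N = 2 \Rightarrow H \ge 5
```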
Min cluster size is determined by meeting the Liveness requirement

• Liveness = (Quorum) && (Availability)
• Min hosts in cluster = Max(Min hosts for Quorum, Min hosts for Availability)

Examples

• FTT = 1, FTM = RAID-1
  • Min hosts for availability = 2
  • Min hosts for quorum = 2N+1 = 3
  • Min cluster size = 3
• FTT = 2, FTM = RAID-1
  • Min hosts for availability = 3
  • Min hosts for quorum = 2N+1 = 5
  • Min cluster size = 5

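The examples follow one formula for RAID-1. A minimal sketch, assuming one replica per host for availability and the 2N+1 quorum rule; `min_hosts_raid1` is a hypothetical helper name.

```python
def min_hosts_raid1(ftt: int) -> int:
    """Minimum cluster size for FTM = RAID-1 under the Liveness rule."""
    if ftt < 0:
        raise ValueError("FTT must be non-negative")
    availability = ftt + 1   # FTT+1 full replicas, each on its own host
    quorum = 2 * ftt + 1     # strict majority must survive FTT host failures
    return max(availability, quorum)


print([min_hosts_raid1(ftt) for ftt in (1, 2, 3)])  # [3, 5, 7]
```

For RAID-1 the quorum term always dominates once FTT is at least 1, which is why the examples end up at 2N+1.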
Examples of Liveness (Quorum + Availability)

Quorum (FTT: 2, FTM: RAID-1) = 5 hosts, no stripe

FTT = 2, FTM = RAID-1, Stripe Width = 1
(VMDK) → R1 (RAID-1) → D, D, D (data components, 1 vote each) and W, W (witness components, 1 vote each)

3 data components = 3 votes; 2 witness components = 2 votes

Votes re-assigned / re-balanced as stripe width is changed

FTT = 2, FTM = RAID-1, Stripe Width = 2
(VMDK) → R1 (RAID-1) → W (2 votes), W (3 votes), and three R0 (RAID-0) replicas of 2 votes each, made of components C1 and C2 with 1 vote each

Higher votes are assigned to break ties.

Quorum with stripe width = 2

(Figure: components C1, C2 and witnesses W with their votes, split across two partitions.)

Partition-1: Availability but no Quorum
Partition-2 proceeds: (Availability) && (Quorum)

Quorum = True, Availability = False

It is possible to have Quorum but no Availability

(VMDK) → R1 → three R0 replicas, each with C1 and C2 (1 vote each), plus witnesses W (3 votes) and W (2 votes)

Partition-1: C1, C1, C1 (1 vote each) + W (3 votes) = 6 of 11 votes: ✓ Quorum, but no complete replica, so no Availability
Partition-2: C2, C2, C2 (1 vote each) + W (2 votes) = 5 of 11 votes: no Quorum

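The two liveness conditions can be checked separately for the example above. In this sketch the component names (`R0a.C1`, …) are hypothetical labels, the votes are the ones shown on the slide, and availability is modeled as "at least one complete RAID-0 replica is reachable".

```python
# Object from the slide: FTT=2, RAID-1, stripe width 2, so three RAID-0 replicas of (C1, C2).
# Component names are illustrative; votes follow the slide.
REPLICAS = [("R0a.C1", "R0a.C2"), ("R0b.C1", "R0b.C2"), ("R0c.C1", "R0c.C2")]
VOTES = {c: 1 for replica in REPLICAS for c in replica}
VOTES.update({"W3": 3, "W2": 2})  # witnesses with 3 and 2 votes; 11 votes in total


def has_quorum(present: set) -> bool:
    """Strict majority of the object's votes."""
    return 2 * sum(VOTES[c] for c in present) > sum(VOTES.values())


def has_availability(present: set) -> bool:
    """At least one complete RAID-0 replica is reachable."""
    return any(all(c in present for c in replica) for replica in REPLICAS)


def is_live(present: set) -> bool:
    return has_quorum(present) and has_availability(present)


partition_1 = {"R0a.C1", "R0b.C1", "R0c.C1", "W3"}  # 6 of 11 votes
partition_2 = {"R0a.C2", "R0b.C2", "R0c.C2", "W2"}  # 5 of 11 votes
print(has_quorum(partition_1), has_availability(partition_1))  # True False
print(is_live(partition_1), is_live(partition_2))              # False False
```

In this sketch neither partition is live, so the object cannot proceed on either side until the partition heals.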
RAID-5

RAID-5 protection against 1 host failure

(VMDK) → R5 (RAID-5) → C0 (1 vote), C1 (1 vote), C2 (2 votes), C3 (1 vote)
Each component is on a separate host: esxi-01, esxi-02, esxi-03, esxi-04.
A higher vote is assigned to break the tie.

RAID-5 protection against 1 host failure

(VMDK) → R5 (RAID-5) → C0 (1 vote), C1 (1 vote), C2 (2 votes), C3 (1 vote) on esxi-01 to esxi-04
Each component is divided into data and parity blocks: D1 (esxi-01), D2 (esxi-02), P1 (esxi-03), D3 (esxi-04).

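The slide does not say how parity is computed. This sketch assumes classic RAID-5 XOR parity, which is what lets a 3 data + 1 parity stripe survive the loss of any one host; the block values and `xor_blocks` helper are made up for illustration.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe as in the slide: D1, D2, D3 data blocks and P1 parity (values are made up)
d1, d2, d3 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
p1 = xor_blocks([d1, d2, d3])

# esxi-02 fails and D2 is lost: rebuild it from the three survivors
rebuilt_d2 = xor_blocks([d1, d3, p1])
print(p1.hex(), rebuilt_d2 == d2)  # ee22 True
```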
The Life of vSAN Component

Object States: can be "not compliant" but accessible

• Compliance status: Are all replicas good?
• Operational status: Is it accessible?
• Accessible implies Liveness

(VMDK) → R1 → R0 (C1, C2 on esxi-01, 2 votes), R0 (C1, C2 on esxi-02, 2 votes), W (esxi-03, 3 votes)

Object States: can be "not compliant" but accessible

• Compliance status: Are all replicas good?
• Operational status: Is it accessible?
• Accessible implies Liveness

(VMDK) → R1 → R0 (C1, C2 on esxi-01, 2 votes), R0 (C1, C2 on esxi-02, 2 votes), W (esxi-03, 3 votes)

• Active = known good
• Degraded = known bad, rebuild now
• Absent = known bad, cause not known, repair after 60 mins
• Stale = Active, however needs update

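The four component states map to different repair actions. A minimal sketch: the 60-minute wait for absent components comes from the slide, while `State`, `repair_action` and the returned action strings are hypothetical names for illustration, not a vSAN API.

```python
from datetime import timedelta
from enum import Enum


class State(Enum):
    ACTIVE = "active"      # known good
    DEGRADED = "degraded"  # known bad: rebuild now
    ABSENT = "absent"      # known bad, cause not known
    STALE = "stale"        # active, but missed updates


REPAIR_DELAY = timedelta(minutes=60)  # wait time for absent components, from the slide


def repair_action(state: State, time_in_state: timedelta) -> str:
    if state is State.ACTIVE:
        return "none"
    if state is State.DEGRADED:
        return "rebuild now"
    if state is State.STALE:
        return "resync"
    # ABSENT: the component may come back, so wait before building a new copy
    return "rebuild" if time_in_state >= REPAIR_DELAY else "wait"


print(repair_action(State.ABSENT, timedelta(minutes=20)))  # wait
print(repair_action(State.ABSENT, timedelta(minutes=61)))  # rebuild
```

Waiting on Absent components avoids rebuilding a whole copy for a host that is only briefly unreachable.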
4 Rs: Resync, Rebuild, Repair and Reconfiguration

Objects and components: (VMDK) → R1 → components C1 … C4 → blocks
• VMDK is divided into components
• Components comprise data blocks
• Each component is on a different host
• Each data block is of fixed size

Partial resync: a component on Host-1 in state active-stale (resync blocks)
• Copy data to stale components
• When a component comes back from being absent

Repair / Reconfigure: a component on Host-4 in state degraded (build out the component)
• Build a fresh component
• Full resync

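The difference between a partial and a full resync is how many blocks must be copied. This hypothetical sketch assumes the surviving side remembers which fixed-size blocks were written while a component was absent; the slide only says that stale components receive resync blocks, so the dirty-set tracking and helper names are assumptions.

```python
from typing import List, Set, Tuple


def write_block(active: List[int], dirty: Set[int], block: int, value: int) -> None:
    """Write to the active copy and remember that the absent copy missed this block."""
    active[block] = value
    dirty.add(block)


def partial_resync(source: List[int], stale: List[int], dirty: Set[int]) -> int:
    """Copy only the blocks written while the component was absent; return blocks copied."""
    for block in sorted(dirty):
        stale[block] = source[block]
    copied = len(dirty)
    dirty.clear()
    return copied


def full_resync(source: List[int]) -> Tuple[List[int], int]:
    """Build a fresh component by copying every block; return it and the blocks copied."""
    return list(source), len(source)


active = [0] * 8        # 8 fixed-size blocks (illustrative)
stale = list(active)    # copy that went absent
dirty: Set[int] = set()
write_block(active, dirty, 2, 7)
write_block(active, dirty, 5, 9)

print(partial_resync(active, stale, dirty), stale == active)  # 2 True
fresh, copied = full_resync(active)
print(copied, fresh == active)                                # 8 True
```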
Rebuild Example

Begin: All components / elements are in active state

C1, C2 (Active, 2 votes) | W (Active, 3 votes) | C1, C2 (Active, 2 votes)

Tolerate 1 host failure with RAID-1

Cluster partitions with unknown cause, components go "Absent"

Partition-1: C1, C2 (Absent, 2 votes)
Partition-2: W (Active, 3 votes), C1, C2 (Active, 2 votes)

Absent: known bad, but cause not known.
Cluster partition, cause unknown: do not repair immediately.
Object is not compliant but accessible.

Partition with both Availability and Quorum proceeds

Partition-1: C1, C2 (Absent, 2 votes): Availability, no Quorum
Partition-2 (proceeds): W (Active, 3 votes), C1, C2 (Active, 2 votes): Quorum && Availability

VM HA to partition-2: partition-2 has both quorum and availability.

Partition is resolved, component is resynced

C1, C2 (Active-Stale, 2 votes) | W (Active, 3 votes) | C1, C2 (Active, 2 votes)

The returning component is marked Active-Stale; the object is not compliant.
The Active-Stale component is resynced.

All components / elements are in active state

C1, C2 (Active, 2 votes) | W (Active, 3 votes) | C1, C2 (Active, 2 votes)

All components are Active.
Object is compliant and accessible.

Repair Scenarios

Absent Components Repair After 60 Min

Partition-1: C1, C2 (Absent, 2 votes)
Partition-2 (most recent data): W (Active, 3 votes), C1, C2 (Active, 2 votes)

Resync after 60 min.

Degraded Components Repair Immediately

A hardware failure causes Degraded: C1, C2 (Degraded, 2 votes) | W (Active, 3 votes) | C1, C2 (Active, 2 votes)

Known bad: resync now.

Fresh Components Resynced From Existing Components

C1, C2 (Degraded) | W (Active, 3 votes) | C1, C2 (Active, 2 votes) | new C1, C2 (Reconfiguring, on another host)

Find another host to resync; resync begins.
Object state is not compliant but accessible.

Object is Compliant Again

C1, C2 (Degraded, to be removed) | W (Active) | C1, C2 (Active) | new C1, C2 (Active)

The degraded component is marked for deletion.

Rebuild RAID schematics – Resync begins

(VMDK) → R1 → W, R0 (C1, C2), R0 (C1, C2, degraded), R0 (C1, C2, new)
Resync begins.

Rebuild RAID schematics – Resync ends

(VMDK) → R1 → W, R0 (C1, C2), R0 (C1, C2, degraded: marked for removal), R0 (C1, C2, new)
Resync ends.

Reconfiguration
Changing Storage Policies

Reconfiguration – Increase FTT=2 to FTT=3

Before: R1 → R0, R0, R0
After: R1 → R0, R0, R0, R0

Reconfiguration – Increase Stripe Width

(Figure: an R1 → R0, R0, R0 tree and the reconfigured R1 → R0, R0, R0 tree built with the new stripe width.)

Multi-Level Fault Domains

Failures to Tolerate (FTT) can be Nested

Survive one site failure and one host failure on the other site.
Fault domain levels in vSphere vSAN: hosts, racks, sites

Stretched Cluster deployment with local fault protection

• In prior examples, the host is the fault domain
• 2 levels of fault domain: site and host
• Failures to tolerate at each level

RAID-1 across two sites (one cluster per site, vSphere vSAN), RAID-5 within each site; a 3rd site hosts the witness.
Inter-site link: 5ms RTT, 10GbE

RAID tree for stretched cluster with local fault protection

R1
├── R5 (Site-1): D1, D2, D3, P1
└── R5 (Site-2): D1, D2, D3, P1

Survive 1 site failure

R1
├── R5 (Site-1): D1, D2, D3, P1
└── R5 (Site-2): D1, D2, D3, P1

Survive 1 site failure and 1 host failure

R1
├── R5 (Site-1): D1, D2, D3, P1
└── R5 (Site-2): D1, D2, D3, P1

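Nesting a RAID-1 across sites over a RAID-5 inside each site is what gives "1 site + 1 host". This sketch checks availability only (votes are ignored) under a hypothetical placement of each site's four RAID-5 members on four hosts; the host names and helpers are illustrative.

```python
from typing import Dict, List, Set

# Hypothetical placement: each site keeps a RAID-5 (3 data + 1 parity) copy on 4 hosts
SITES: Dict[str, List[str]] = {
    "site-1": ["s1-h1", "s1-h2", "s1-h3", "s1-h4"],
    "site-2": ["s2-h1", "s2-h2", "s2-h3", "s2-h4"],
}


def raid5_available(hosts: List[str], failed: Set[str]) -> bool:
    """RAID-5 survives the loss of any one member."""
    return sum(h in failed for h in hosts) <= 1


def object_available(failed: Set[str]) -> bool:
    """RAID-1 across sites: one readable site copy is enough."""
    return any(raid5_available(hosts, failed) for hosts in SITES.values())


site_1_down = set(SITES["site-1"])
print(object_available(site_1_down))                       # True
print(object_available(site_1_down | {"s2-h1"}))           # True
print(object_available(site_1_down | {"s2-h1", "s2-h2"}))  # False
```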
Anatomy of write: from site-1 to site-2

1. Issue write
2a. Update local data and parity (Site-1 R5)
2b. Send only data across sites to the remote helper RAID tree (Dn, the proxy owner, above the Site-2 R5)
3. Remote side calculates parity

R1 → R5 (Site-1): D1, D2, D3, P1 and R5 (Site-2): D1, D2, D3, P1

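Sending only data across the inter-site link works because the remote side can recompute its own parity. The slide does not describe the parity update itself; this sketch assumes a standard RAID-5 read-modify-write (new parity = old parity XOR old data XOR new data) on a single stripe, and `SiteRaid5` is a hypothetical class.

```python
from functools import reduce
from typing import List


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class SiteRaid5:
    """One site's RAID-5 copy: three data blocks and one parity block (single stripe)."""

    def __init__(self, data: List[bytes]):
        self.data = list(data)
        self.parity = reduce(xor, self.data)

    def write(self, index: int, new: bytes) -> None:
        # Read-modify-write: new parity = old parity ^ old data ^ new data
        self.parity = xor(xor(self.parity, self.data[index]), new)
        self.data[index] = new


initial = [b"\x00\x01", b"\x02\x03", b"\x04\x05"]
site_1, site_2 = SiteRaid5(initial), SiteRaid5(initial)

new_block = b"\xaa\xbb"
site_1.write(1, new_block)   # 2a: update local data and parity
shipped = (1, new_block)     # 2b: only the data (plus its position) crosses sites
site_2.write(*shipped)       # 3: remote proxy recomputes parity locally

print(site_1.parity == site_2.parity == reduce(xor, site_2.data))  # True
```

Parity never crosses the link in this model, so inter-site traffic per write is the data block alone.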
Votes in Stretched Cluster

5 Votes per site

• 3 voting entities for the first level: Site-1, Site-2 and the witness
• The witness has an equal share of votes as the other 2 entities (e.g. sites)
• 4 components for the second level: D1, D2, D3, P1 in each site's R5
• Total of 5 votes per site (odd number of votes)

Witness is assigned the same voting rights as the sites

• 3 voting entities for the first level: Site-1 (5 votes), Site-2 (5 votes) and the witness W (5 votes)
• 4 components for the second level in each site: D1, D2, D3, P1
• Total of 5 votes per site (odd number of votes)

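With 5 votes per site and 5 for the witness, the object can lose a whole site plus one host in the other site and still hold a majority. This sketch assumes votes simply add up across both levels (15 in total) and that one of the four components in each site carries 2 votes; the slide gives only the per-site total, so which component holds the extra vote is a guess.

```python
from typing import List

# Per-component votes inside one site's RAID-5 (D1, D2, D3, P1). The 5-vote site total is
# from the slide; which component carries the extra vote is an assumption.
SITE_VOTES = [1, 1, 1, 2]
WITNESS_VOTES = sum(SITE_VOTES)                     # witness weighted like a whole site: 5
TOTAL_VOTES = 2 * sum(SITE_VOTES) + WITNESS_VOTES   # 15


def has_quorum(site_1_up: List[bool], site_2_up: List[bool], witness_up: bool) -> bool:
    held = sum(v for v, up in zip(SITE_VOTES, site_1_up) if up)
    held += sum(v for v, up in zip(SITE_VOTES, site_2_up) if up)
    held += WITNESS_VOTES if witness_up else 0
    return 2 * held > TOTAL_VOTES


site_down = [False] * 4
print(has_quorum(site_down, [True] * 4, True))                 # True  (10 of 15)
print(has_quorum(site_down, [True, True, True, False], True))  # True  (8 of 15)
print(has_quorum(site_down, [True] * 4, False))                # False (5 of 15)
```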
I/O Flows

Anatomy of an All-Flash Write

Pretty much the same as hybrid:
▪ VM running on host H1
▪ H1 is the owner of the virtual disk object; Number of Failures To Tolerate = 1
▪ Object has 2 replicas, on H1 and H2

1. Guest OS issues a write op to the virtual disk
2. Owner clones the write op
3. In parallel: sends a "prepare" op to H1 (locally) and H2
4. H1, H2 persist the op to flash (log)
5. H1, H2 ACK the prepare op to the owner
6. Owner waits for ACKs from both "prepares" and completes the I/O
7. Later, the owner commits a batch of writes

(Figure: vSphere Virtual SAN cluster with hosts H1, H2, H3.)

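The numbered steps can be modeled as an owner that fans out a "prepare" to every replica, completes the I/O only after all ACKs arrive, and commits later in batches. This is a hypothetical in-memory model: Python lists stand in for the flash log, a thread pool stands in for the network, and `Owner` / `Replica` are illustrative names rather than vSAN's protocol.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


class Replica:
    def __init__(self, host: str):
        self.host = host
        self.log: List[str] = []        # stands in for the flash log
        self.committed: List[str] = []

    def prepare(self, op: str) -> Tuple[str, str]:
        self.log.append(op)             # step 4: persist op to the log
        return (self.host, "ACK")       # step 5: ACK the prepare

    def commit(self, ops: List[str]) -> None:
        self.committed.extend(ops)


class Owner:
    def __init__(self, replicas: List[Replica]):
        self.replicas = replicas
        self.pending: List[str] = []
        self.pool = ThreadPoolExecutor(max_workers=len(replicas))

    def write(self, op: str) -> List[Tuple[str, str]]:
        # Steps 2-3: clone the op and send "prepare" to every replica in parallel
        futures = [self.pool.submit(r.prepare, op) for r in self.replicas]
        acks = [f.result() for f in futures]  # step 6: wait for all ACKs
        self.pending.append(op)               # I/O is complete for the guest here
        return acks

    def commit_batch(self) -> None:
        # Step 7: later, commit the accumulated writes as a batch
        for r in self.replicas:
            r.commit(self.pending)
        self.pending = []

    def close(self) -> None:
        self.pool.shutdown()


h1, h2 = Replica("H1"), Replica("H2")
owner = Owner([h1, h2])           # owner runs on H1, so H1's replica is "local"
print(owner.write("w1"))          # [('H1', 'ACK'), ('H2', 'ACK')]
owner.commit_batch()
print(h2.committed)               # ['w1']
owner.close()
```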
All-flash: Destaging Cache to Capacity

▪ Data from committed writes accumulates on the flash cache (write buffer), from different VMs / virtual disks
▪ In all-flash, blocks that are written most often (hot) stay in cache
▪ In all-flash, blocks that are infrequently accessed (cold) are destaged to the flash capacity layer

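A toy model of hot/cold destaging: the cache keeps frequently written blocks and, when full, moves the least-written block to the capacity tier. The slide does not give the actual destage policy; this write-count heuristic and the `WriteBuffer` class are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, Hashable


class WriteBuffer:
    """Toy cache tier: keeps hot blocks, destages the least-written block when full."""

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.cache: Dict[Hashable, bytes] = {}
        self.write_counts: Counter = Counter()
        self.capacity_tier: Dict[Hashable, bytes] = {}

    def write(self, block: Hashable, data: bytes) -> None:
        self.cache[block] = data
        self.write_counts[block] += 1
        if len(self.cache) > self.capacity:
            self.destage_coldest()

    def destage_coldest(self) -> None:
        # min() returns the first block with the lowest count in insertion order
        coldest = min(self.cache, key=lambda b: self.write_counts[b])
        self.capacity_tier[coldest] = self.cache.pop(coldest)


buf = WriteBuffer(capacity_blocks=2)
for _ in range(3):
    buf.write("a", b"hot")
buf.write("b", b"cold")
buf.write("c", b"new")
print(sorted(buf.cache), sorted(buf.capacity_tier))  # ['a', 'c'] ['b']
```

Counting writes alone ignores recency, so a real cache would also need to age old counts; this sketch only shows the hot-stays, cold-moves idea.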
Nerd Out With These Key vSAN Activities at VMworld

Practice with Hands-on-Labs: learn from self-paced and expert-led hands-on labs
• vSAN Getting Started Workshop (Expert led)
• VxRail Getting Started (Self paced)
• Self-paced lab available online 24x7

Visit SDDC Assessment Lounge: discover how to assess if your IT is a good fit for HCI
• Four Seasons Willow Room/2nd floor
• Open from 11am – 5pm Sun, Mon, and Tue
• Learn more at Assessing & Sizing in STO1500BU

Become a vSAN Specialist: earn VMware digital badges to showcase your skills
• New 2017 vSAN Specialist Badge
• Education & Certification Lounge: VM Village
• Certification Exam Center: Jasmine EFG, Level 3

#HitRefresh on your current data center and discover the possibilities!


3 Easy Ways to Learn More about vSAN

Storage Hub Technical Library
• StorageHub.vmware.com
• Reference architectures, off-line demos and more
• Easy search function
• And More!

New vSAN Tools
• vSAN Sizer
• vSAN Assessment

Hands-On Lab: Test drive vSAN for free today!
• Live at VMworld
• Practical learning of vSAN, VxRail and more
• 24x7 availability online – for free!