Implementation of A Stretched Cluster With SVC
Special Programs:
– IBM Systems ‘Guaranteed to Run’ Classes -- Make your education plans for classes with confidence!
Comprehensive education, training and service offerings
– Expert instructors and consultants, world-class content and skills
– Multiple delivery options for training and services
– Instructor-led online (ILO) training: the classroom comes to you
– Conferences explore emerging trends and product strategies
– Customized, private training
SVC Overview
SVC Split Cluster Overview
– Volume Mirroring
– Voting Set
– Quorum Disks
Split Cluster Scenarios
Performance Optimization
– Data Paths
– Fast Write Technologies
SVC 2145-CG8 Storage Engine
Based on IBM System x3550 M3 server (1U)
– Intel® Xeon® 5600 (Westmere) 2.53 GHz quad-core processor
24GB of cache
– Up to 192GB of cache per SVC cluster
Four 8Gbps FC ports (support Short-Wave & Long-Wave SFPs)
– Up to 32 FC ports per SVC cluster
● For external storage
● And/or for server attachment
● And/or Remote Copy/Mirroring
Two 1 Gbps iSCSI ports
– Up to 16 GbE ports per SVC cluster
Optional 1 to 4 Solid State Drives, or dual 10 Gbps iSCSI ports
– Up to 32 SSDs per SVC cluster (supported only with SVC code 5.1 & 6.2, not with 6.1)
– Up to 16 10GbE ports per SVC cluster
New engines may be intermixed in pairs with other engines in SVC clusters
– Mixing engine types in a cluster results in Volume throughput characteristics of the
engine type in that I/O group
Cluster non-disruptive upgrade capability may be used to replace older engines
with new CG8 engines
Replaces the SVC 2145-CF8 engine
– Available with Entry Edition License, i.e. based on the number of physical drives
SAN Volume Controller Cluster Architecture
Node
A controller (1 rack unit)
– 2 quad-core Intel Xeon processors
– 24GB of cache
– 4 Fibre Channel ports (8Gbps)
– Up to 4 SSDs
Cache protected by dedicated Uninterruptible Power Supply (1 rack unit)
System
1 to 4 node-pairs or I/O Groups
– Up to 192 GB of cache
– Up to 32 FC ports (8Gbps)
– Up to 16 GbE ports
– Up to 16 10GbE ports
– Up to 32 SSDs
Managed using the embedded Graphical User Interface (optional external management via Master Console)
All nodes interconnected in the cluster via the SAN
(Diagram: Node 1 through Node 8 paired into IO Group 0 through IO Group 3)
Note: rack and management console (optional)
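As a rough illustration of this layout, the node and I/O group membership of a running cluster can be listed with the standard SVC CLI (a minimal sketch assuming the 6.x CLI; output columns vary by code level):
● svcinfo lsnode
● svcinfo lsiogrp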
Virtualisation Concepts & Limits
Hosts
– Single Multipath driver (SDD, MPIO)
– Up to 1024 Hosts per cluster
– Up to 2048 FC ports per cluster & Host
– Up to 256 iSCSI ports per cluster & Host
Storage Pools
– Up to 128 MDisks per Pool
– Up to 128 storage pools per cluster
(Diagram: volumes Vol1–Vol7 mapped through storage pools to MDisks)
Virtualise data without data-loss
– External storage virtualization (optional)
Thin-provisioned & Compressed Volumes
– Reclaim Zero-write space
– Thick to thin, thin to thick & thin to thin migration
Expand or shrink Volumes on-line
On-line Volume Migration between Storage Pools & IO groups
Volume Mirroring (Volume copy 0 and copy 1)
FlashCopy
– Consistency Groups
– Reverse FlashCopy
(Diagram: FlashCopy targets of Vol1 and Vol3)
Remote Copy (optional)
– Synchronous & asynchronous remote replication with Volume Consistency groups
● Optional Cycling Mode using snapshots
(Diagram: MM or GM relationships between Storwize V7000 systems, with a consolidated DR site)
Authentication service for Single Sign-On & LDAP
Microsoft Virtual Disk Service & Volume Shadow Copy Services hardware provider
Hot-spot optimized performance and throughput
VAAI support & vCenter Server management plug-in
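As a minimal sketch of these building blocks (assuming the SVC 6.x CLI; Pool1, host1 and vol1 are placeholder names), a volume is created in a storage pool, assigned to an I/O group, and mapped to a host:
● svctask mkvdisk -mdiskgrp Pool1 -iogrp 0 -size 100 -unit gb -name vol1
● svctask mkvdiskhostmap -host host1 vol1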
SVC Split Cluster
(Diagram: an SVC cluster with its nodes split across two sites (= Split Cluster), plus a third site, Site C)
Volume Mirroring
Redundancy:
– Failure occurs before data is freed from cache: write data is still in the cache of the other node; the volume remains online.
– Failure occurs after data is released from cache: both volume copies are in sync; the volume stays online using the online storage device.
Latency:
– Failure occurs before data is freed from cache: write data is still in the cache of the other node; the volume remains online.
– Failure occurs after data is released from cache: volume copies may not be in sync; the volume may go offline.
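In CLI terms, a mirrored volume is typically built by adding a second copy in a storage pool at the other site and waiting for the copies to synchronize (a minimal sketch assuming the SVC 6.x CLI; Pool_SiteB and vol1 are placeholder names):
● svctask addvdiskcopy -mdiskgrp Pool_SiteB vol1
● svcinfo lsvdisksyncprogress vol1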
Voting Set / Quorum
Quorum Disks
The synchronization status for mirrored volumes is recorded on the quorum disk.
Volumes can be taken offline if no quorum disk is available
SVC creates three Quorum disk candidates on the first three managed MDisks
One Quorum disk is active; the other two disks remain as candidates
SVC 6.2.0 and later:
– SVC verifies that the Quorum disk candidates are placed on different storage systems.
– SVC handles Quorum disk management in a very flexible way, but a Split I/O group configuration requires a well-defined quorum setup.
– Disable the dynamic quorum feature using the “override” flag
● svctask chquorum -mdisk <mdisk_id or name> -override yes
● This flag is currently not configurable in the GUI
Quorum Disk has a ½ vote in the voting set:
– This means that if a set of nodes contains half the SVC nodes and those nodes can see the quorum disk, they remain up and running
– Conversely, if a set of nodes can see half the nodes but not the quorum disk, those nodes go into service state
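A sketch of how the three quorum candidates might be pinned to specific MDisks at the three sites, assuming the CLI shown above (the MDisk names are placeholders; the trailing quorum index and exact parameter order should be verified against the installed code level). lsquorum shows which candidate is currently active:
● svcinfo lsquorum
● svctask chquorum -override yes -mdisk <siteA_mdisk> 0
● svctask chquorum -override yes -mdisk <siteB_mdisk> 1
● svctask chquorum -override yes -mdisk <siteC_mdisk> 2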
Failure Scenarios
Scenario 1: Site A operational, but its link to Site B has failed; Site B operational, but its link to Site A has failed; Site C operational.
– Result: the cluster remains operational; Site A (with the config node) continues operation and takes over the load from Site B; Site B is stopped.
Scenario 2: Site A operational; Site B and Site C fail at the same time.
– Result: cluster stopped.
Scenario 3: Site B operational; Site A and Site C fail at the same time.
– Result: cluster stopped.
SVC Failure Domain
Split Cluster without ISLs
Split Clusters without ISLs
SVC 6.3:
– Similar to the support statement in SVC 6.2
– Additional: support for active WDM devices
– Quorum disk requirements similar to Remote Copy (MM/GM) requirements:
● Max. 80 ms Round Trip delay time, 40 ms each direction
● FCIP connectivity supported
● No support for iSCSI storage system
Quorum disk must be listed as “Extended Quorum” in the SVC Supported
Hardware List
Two ports on each SVC node need to be connected to the “remote” switch
Link speed over long distance must be reduced according to the distance
Long Distance Configuration
SVC Buffer to Buffer credits
– 2145–CF8 / CG8 have 41 B2B credits
● Enough for 10km at 8Gb/sec with 2 KB payload
– All earlier models:
● Use 1/2/4 Gbps fibre channel adapters
● Have 8 B2B credits, which is enough for 4km at 4Gb/sec
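A rough sanity check of these figures (assuming ~10 µs of round-trip fibre latency per km and ~2.5 µs to transmit a full 2 KB frame at 8 Gbps): 10 km gives ~100 µs round trip, and 100 µs / 2.5 µs ≈ 40 frames in flight, so about 41 credits keep the link full. At 4 Gbps a frame takes ~5 µs, so 8 credits cover roughly 8 × 5 µs / 10 µs-per-km ≈ 4 km, matching the earlier models.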
Recommendation 1:
– Use CF8 / CG8 nodes for more than 4km distance for best performance
SAN switches don’t auto-negotiate Buffer to Buffer credits; 8 B2B credits is the default for most SAN switches
Recommendation 2:
– Change the Buffer to Buffer credits in your switch to 41 as well
(Diagram: SAN A and SAN B fabrics stretched directly between the two sites, with Site C as the third site)
Using WDM Devices
(Diagram: SAN A and SAN B fabrics at Site A and Site B connected through CWDM devices, with Site C as the third site)
With ISLs between sites
With ISLs with separate switches
(Diagram: private fabrics PRIV A/PRIV B and public fabrics PUB A/PUB B at Site A and Site B, with ISLs between the sites and Site C as the third site)
PERFORMANCE OPTIMIZATION
Latency Considerations
Metro Mirror
Application Latency = 1 long distance round trip
(Diagram: Metro Mirror write flow from the host through SVC cluster 1 to SVC cluster 2 and the back-end storage; labelled steps below)
4. Metro Mirror data transfer to remote site
5. Acknowledgment (1 round trip)
7. Write request from SVC
8. Xfer ready to SVC
9. Data transfer from SVC
10. Write completed to SVC
SVC stretch cluster
(Diagram: write flow within a single stretched SVC cluster)
4. Cache mirror data transfer to remote site
5. Acknowledgment (1 round trip)
2 round trips in total, but the SVC write cache hides this latency from the host
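To put rough numbers on this (assuming ~1 ms of round-trip latency per 100 km of fibre): over a 100 km stretch, the cache-mirror round trip adds about 1 ms to each host write, while the second round trip for destaging to the remote storage is absorbed by the SVC write cache and is not seen by the application.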
Stretch Cluster is also often used to move workload between servers at different
sites
– VMotion or equivalent can be used to move Applications between servers
– Applications no longer necessarily issue I/O requests to the local SVC nodes
SCSI Write commands from hosts to remote SVC nodes result in an additional 2 round trips' worth of latency that is visible to the Application
Some switches and distance extenders use extra buffers and proprietary
protocols to eliminate one of the round trips worth of latency for SCSI Write
commands
– These devices are already supported for use with SVC
– No benefit or impact to the inter-node communication
– Does benefit Host to remote SVC I/Os
– Does benefit SVC to remote Storage Controller I/Os
Split Cluster writes
Remote I/O: Application Latency = 3 round trips
When writes are performed to a volume with 2 copies, the data is destaged to both copies
– Back-end latencies apply regardless of which SVC node performs the write to disk
When reads are performed to a volume, one of two things can happen
1. Read data is in cache and returned from cache
2. SVC node needs to collect the data
● This is done from the primary volume copy as marked in the SVC
Try to ensure that the primary copy of the data is on the same site as the application
– When moving an application, switch the primary copy on the SVC (see the example after this list)
– Not always possible
● Same volume may be used by applications on both sites
● Features such as Storage vMotion may be helpful
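A minimal sketch of the copy switch mentioned above, assuming the SVC 6.x CLI (vol1 is a placeholder; the copy ID of the site-local copy is taken from the lsvdiskcopy output):
● svcinfo lsvdiskcopy vol1
● svctask chvdisk -primary 1 vol1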
Split Cluster reads
Local I/O: Application Latency = 0 round trips
(Diagram: local read flow, steps 1-4)
Steps 2 and 3 are skipped if data is in SVC node cache
Split Cluster read
Remote I/O: Application Latency = 2 round trips
(Diagram: remote read flow; steps 1 and 2 each incur a long-distance round trip)
Steps 2 and 3 are skipped if data is in SVC node cache
Split I/O Group – Disaster Recovery
Summary