Data Domain Overview
September 4, 2014 Paras EMC Data Domain
EMC Data Domain Systems:
EMC Data Domain storage systems are traditionally used for disk
backup, archiving, and disaster recovery.
An EMC Data Domain system can also be used for online storage with
additional features and benefits.
A Data Domain system can connect to your network via Ethernet or
Fibre Channel connections.
Data Domain systems use low-cost Serial Advanced Technology
Attachment (SATA) disk drives and implement redundant array of
independent disks (RAID) 6 in software. RAID 6 is block-level
striping with double distributed parity.
Note: Data Domain uses only RAID 6; no other RAID levels are available.
Most Data Domain systems have a controller and multiple storage
units.
Hardware Overview:
Data Domain Models available:
Data Domain hardware consists of a controller and a disk array
enclosure.
This post walks through the hardware of the Data Domain 990 (DD990)
model.
Hardware overview:
Two components: a. Controller b. Disk Shelf
Data Domain components in chassis:
1. Quad-socket, 10-core Xeon processors (Westmere-EX)
2. Two memory configurations available
3. Base: 128 GB supports up to 360 TB raw, 285 TB usable
4. Expanded: 256 GB supports up to 720 TB raw, 570 TB usable
5. External expansion using ES30 and ES20 shelves
6. Three quad-port 6 Gb/s SAS HBAs for external connectivity
7. Connectivity up to 24 shelves, or up to max capacity
8. Four I/O slots for data access connectivity
9. Up to four dual-port 1 GbE NICs, optical
10. Up to four quad-port 1 GbE NICs, copper
11. Up to three dual-port 10 GbE NICs, copper with SFP+ interface
12. Up to three dual-port 10 GbE NICs, optical with LC interface
13. Up to three dual-port 8 Gb Fibre Channel VTL HBAs
14. Two 2 GB remote-battery NVRAM with Battery Backup Unit
The DD990 is available in two configurations: one with 128 GB of RAM
and one with 256 GB.
DD990 chassis enclosure View:
Controller Module Front and Back panel View.
Controller Front panel View
Controller Back Panel View
Disk Shelf Front View:
Disk Shelf Back View:
Software overview:
Overview:
Support for leading backup, file archiving, and email archiving
applications
Simultaneous use of VTL, CIFS, NFS, NDMP, and EMC Data
Domain Boost
Inline write/read verification, continuous fault detection, and
healing
Conformance with IT governance and regulatory compliance
standards for archived data
Software components: Data Domain Operating system
Data Domain Inline Deduplication:
Data Domain uses inline deduplication. The following steps occur
during inline deduplication:
1. Inbound segments are analyzed in RAM.
2. If a segment is redundant, a reference to the stored segment is
created.
3. If a segment is unique, it is compressed and stored.
Inline deduplication requires less disk space than post-process
deduplication. There is less administration for an inline deduplication
process, as the administrator does not need to define and monitor the
staging space. Inline deduplication analyzes the data in RAM, which
reduces the disk seeks needed to determine whether new data must be
stored.
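The inline flow described above can be sketched in a few lines of Python. This is only an illustration, not how DD OS is implemented: the class name InlineDedupStore, the fixed 8-byte segment size, the SHA-1 fingerprint, and the use of zlib are all assumptions made to keep the example small.

```python
import hashlib
import zlib


class InlineDedupStore:
    """Toy store: segments are deduplicated in memory before being 'written'."""

    def __init__(self):
        # fingerprint -> compressed segment (a dict stands in for the disk)
        self.segments = {}

    def write(self, data, segment_size=8):
        """Split data into fixed-size segments and return a list of references."""
        refs = []
        for i in range(0, len(data), segment_size):
            segment = data[i:i + segment_size]
            fp = hashlib.sha1(segment).hexdigest()
            if fp not in self.segments:
                # Unique segment: compress it and store it.
                self.segments[fp] = zlib.compress(segment)
            # Unique or redundant, the file itself only keeps a reference.
            refs.append(fp)
        return refs

    def read(self, refs):
        """Rebuild the original data from its references."""
        return b"".join(zlib.decompress(self.segments[fp]) for fp in refs)


store = InlineDedupStore()
payload = b"ABCDEFGH" * 3 + b"12345678"
refs = store.write(payload)
print(len(refs), "references,", len(store.segments), "stored segments")
```

Here the payload holds four segments but only two distinct ones, so four references are kept while two segments are stored. No staging area is involved, because the decision is made before anything is stored.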
EMC Global and Local Compression:
Global Compression:
EMC Data Domain Global Compression is the EMC Data Domain
trademarked name for global compression, local compression, and
deduplication.
Global compression equals deduplication. It identifies previously
stored segments and cannot be turned off.
Local Compression:
Local compression compresses segments before writing them to disk.
It uses common, industry-standard algorithms (for example, lz, gz,
and gzfast). The default compression algorithm used by Data Domain
systems is lz.
Local compression is similar to zipping a file to reduce the file size. Zip
is a file format used for data compression and archiving. A zip file
contains one or more files that have been compressed, to reduce file
size, or stored as is. The zip file format permits a number of
compression algorithms. Local compression can be turned off.
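As a rough illustration of the speed versus size trade-off between compression settings, the sketch below compresses one segment with Python's zlib at two levels. zlib (DEFLATE) is used here only as a stand-in, and the level numbers are assumptions for the example; they are not how DD OS maps lz, gz, or gzfast.

```python
import zlib

# A repetitive segment, similar in spirit to backup data.
segment = b"backup data block " * 200

fast = zlib.compress(segment, 1)    # level 1 favours speed
small = zlib.compress(segment, 9)   # level 9 favours size

# Compression is lossless: the segment comes back unchanged.
restored = zlib.decompress(fast)
print(len(segment), "bytes before,", len(fast), "and", len(small), "bytes after")
```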
EMC Data Domain SISL Scaling Architecture:
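The Stream-Informed Segment Layout (SISL) architecture speeds up Data
Domain systems by doing most of the deduplication work in CPU and RAM
rather than on disk.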
SISL does the following:
1. Segment: The data is broken into variable-length segments.
2. Fingerprint: Each segment is given a fingerprint, or hash, for
identification.
3. Filter: The summary vector and segment locality techniques
identify 99% of the duplicate segments in RAM, inline, before storing
to disk. If a segment is a duplicate, it is referenced and discarded. If a
segment is new, the data moves on to step 4.
4. Compress: New segments are grouped and compressed using
common algorithms: lz, gz, gzfast (lz by default).
5. Write: Data (segments, fingerprints, metadata, and logs) is written to
containers, and containers are written to disk.
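The five steps can be sketched as a small Python pipeline. Everything here is a simplification under stated assumptions: the summary vector is modelled as a Bloom filter, segmentation uses a toy content-defined cut rule, segments are compressed one by one instead of in groups, and segment locality is not modelled at all. The names segment, SummaryVector, and ingest are hypothetical.

```python
import hashlib
import zlib


def segment(data, mask=0x0F, min_len=4, max_len=64):
    """Step 1: cut variable-length segments where a rolling value hits a pattern."""
    out, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_len and (rolling & mask) == 0) or length >= max_len:
            out.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        out.append(data[start:])
    return out


class SummaryVector:
    """Step 3 helper (assumed Bloom filter): 'definitely new' or 'possibly seen'."""

    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes, self.vector = bits, hashes, 0

    def _positions(self, fp):
        # Derive several bit positions from different slices of the fingerprint.
        for k in range(self.hashes):
            yield int.from_bytes(fp[4 * k:4 * k + 4], "big") % self.bits

    def add(self, fp):
        for p in self._positions(fp):
            self.vector |= 1 << p

    def might_contain(self, fp):
        return all((self.vector >> p) & 1 for p in self._positions(fp))


def ingest(data, summary, index, container):
    """Run one stream through segment, fingerprint, filter, compress, write."""
    recipe = []
    for seg in segment(data):
        fp = hashlib.sha1(seg).digest()                  # step 2: fingerprint
        # Step 3: the index is consulted only if the summary vector says "maybe".
        is_duplicate = summary.might_contain(fp) and fp in index
        if not is_duplicate:
            summary.add(fp)
            index[fp] = len(container)
            container.append(zlib.compress(seg))         # steps 4 and 5
        recipe.append(fp)                                # duplicates: reference only
    return recipe


summary, index, container = SummaryVector(), {}, []
payload = bytes(range(256)) * 4
first = ingest(payload, summary, index, container)
stored_after_first = len(container)
second = ingest(payload, summary, index, container)      # same data again
restored = b"".join(zlib.decompress(container[index[fp]]) for fp in second)
print(len(first), "segments referenced,", stored_after_first, "stored")
```

Sending the same payload a second time adds nothing to the container, because every fingerprint is already known. The point of the filter is that a "definitely new" answer avoids an index lookup, which on a real system would be the expensive disk access.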
EMC Data Domain Data Invulnerability Architecture (DIA):
The EMC Data Domain operating system (DD OS) is built for data
protection. Its elements comprise an architectural design whose goal is
data invulnerability. Four technologies within the DIA fight data loss:
1. End-to-end verification
2. Fault avoidance and containment
3. Continuous fault detection and healing
4. File system recoverability
Now let's look at each of these technologies.
1. End-to-end verification:
Steps involved in end-to-end verification:
1. A write request comes from the backup software.
2. Analyze the data for redundancy.
3. Store new data segments only.
4. Store fingerprints and verify.
5. Verify after backup that DD OS can read the data from disk
through the Data Domain file system.
6. Verify that the checksums are correct.
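The read-back check in the last two steps can be illustrated with a short Python sketch. It is a minimal example, assuming an ordinary file on a local disk, SHA-256 as the checksum, and zlib for compression; the function name write_and_verify is hypothetical and this is not the DD OS code path.

```python
import hashlib
import os
import tempfile
import zlib


def write_and_verify(path, segment):
    """Write a segment, read it back from disk, and confirm the checksum matches."""
    # The checksum is computed on the data as it was received.
    expected = hashlib.sha256(segment).hexdigest()
    with open(path, "wb") as f:
        f.write(zlib.compress(segment))
        f.flush()
        os.fsync(f.fileno())          # push the data to the device
    # Read the data back through the file system, as a restore would.
    with open(path, "rb") as f:
        read_back = zlib.decompress(f.read())
    if hashlib.sha256(read_back).hexdigest() != expected:
        raise IOError("verification failed for " + path)
    return expected


with tempfile.TemporaryDirectory() as tmp:
    digest = write_and_verify(os.path.join(tmp, "segment.bin"), b"nightly backup")
    print("verified", digest[:12])
```

The write is acknowledged as good only after the data has made the full round trip, so a problem is caught at backup time rather than at restore time.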
2. Fault avoidance and containment
Data Domain systems are equipped with a specialized log-structured
file system that has the following features:
1. New data never overwrites existing data.
2. Fewer complex data structures.
3. The system includes non-volatile RAM (NVRAM) for fast, safe
restart.
3. Continuous fault detection and healing
Continuous fault detection and healing provide an extra level of
protection within the Data Domain operating system. The DD OS
detects faults and recovers from them continuously. Continuous fault
detection and healing ensures successful data restore operations.
Continuous fault detection and healing process:
1. The Data Domain system periodically rechecks the integrity of the
RAID stripes and container logs.
2. The Data Domain system uses RAID system redundancy to heal
faults. RAID 6 is the foundation for Data Domain systems' continuous
fault detection and healing. Its dual-parity architecture offers
advantages over conventional architectures, including RAID 1
(mirroring) and the RAID 3, RAID 4, or RAID 5 single-parity approaches.
RAID 6:
Protects against two disk failures.
Protects against disk read errors during reconstruction.
Protects against the operator pulling the wrong disk.
Guarantees RAID stripe consistency even during power failure without
reliance on NVRAM or an uninterruptible power supply (UPS).
3. During every read, data integrity is re-verified.
4. Any errors are healed as they are encountered.
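To show why dual parity survives two disk failures, here is a minimal Python sketch of the textbook RAID 6 scheme: P is a plain XOR of the data blocks, and Q is a weighted sum in the finite field GF(2^8). The polynomial 0x11d and generator 2 are common choices assumed for this example; the source does not describe the Data Domain RAID implementation, and the sketch only covers the loss of two data blocks, not the loss of P or Q.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result


def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    # The nonzero elements form a group of order 255, so a^254 is a's inverse.
    return gf_pow(a, 254)


def parity(blocks):
    """Return (P, Q) for a list of equal-length data blocks."""
    p, q = bytearray(len(blocks[0])), bytearray(len(blocks[0]))
    for i, block in enumerate(blocks):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(blocks, x, y, p, q):
    """Rebuild lost data blocks x and y (passed as None) from survivors, P, and Q."""
    # Missing blocks are replaced by zeros, which contribute nothing to parity.
    survivors = [b if b is not None else bytes(len(p)) for b in blocks]
    p_part, q_part = parity(survivors)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        a = p[j] ^ p_part[j]      # equals Dx + Dy
        b = q[j] ^ q_part[j]      # equals g^x * Dx + g^y * Dy
        dx[j] = gf_mul(b ^ gf_mul(gy, a), denom_inv)
        dy[j] = a ^ dx[j]
    return bytes(dx), bytes(dy)


data = [b"disk", b"back", b"upda", b"ta!!"]
p, q = parity(data)
damaged = [data[0], None, data[2], None]      # two disks lost
dx, dy = recover_two(damaged, 1, 3, p, q)
print(dx, dy)
```

With single parity (RAID 5) only the first equation exists, so two unknown blocks cannot be solved for. The second, independent equation from Q is what makes the double failure recoverable.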
4. File system recoverability
File system recovery is a feature that reconstructs lost or corrupted
file system metadata.
In the Data Domain file system, data is written in a self-describing
format, so the file system can be recreated by scanning the logs and
rebuilding it from the metadata stored with the data.
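The idea of a self-describing log can be sketched in Python as follows. The record layout (name length, data length, CRC32, then name and data) and the function names append_record and rebuild_index are invented for this example and are not the Data Domain on-disk format. The sketch shows that an index lost from memory can be rebuilt purely by scanning the records.

```python
import struct
import zlib

# Hypothetical header: name length, data length, CRC32 of the data.
HEADER = struct.Struct(">HII")


def append_record(log, name, data):
    """Append a record that carries its own metadata; nothing is overwritten."""
    encoded = name.encode("utf-8")
    log.extend(HEADER.pack(len(encoded), len(data), zlib.crc32(data)))
    log.extend(encoded)
    log.extend(data)


def rebuild_index(log):
    """Recreate the name -> (offset, length) index by scanning the log."""
    index, offset = {}, 0
    while offset + HEADER.size <= len(log):
        name_len, data_len, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size + name_len
        name = bytes(log[offset + HEADER.size:start]).decode("utf-8")
        data = bytes(log[start:start + data_len])
        if zlib.crc32(data) == crc:
            # Later records win, so the index points at the newest version.
            index[name] = (start, data_len)
        offset = start + data_len
    return index


log = bytearray()
append_record(log, "a.txt", b"first")
append_record(log, "b.txt", b"second")
append_record(log, "a.txt", b"first, revised")   # appended, not overwritten

index = rebuild_index(log)                        # as if the index had been lost
start, length = index["a.txt"]
print(bytes(log[start:start + length]))
```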
Why use a Data Domain system?
Data Domain has the following advantages:
1. Data deduplication
2. Easy integration
3. Network-efficient replication
4. Safe and reliable