Unstructured Data in the Cloud with ECS
Mikhail Vladimirov
Senior Sales Engineer
How to Build a Modern Digital Archive (Web 3.0)
Five reasons to choose ECS
Agenda
Some Definitions
Market Factors Driving Adoption
Trends in Archiving
DellEMC ECS Storage
Summary
Some Definitions
From Latin archīum or archīvum, the romanized form of the Greek ἀρχεῖον (arkheion), "public records, town hall, residence, or office of chief magistrates", itself from ἀρχή (arkhē), "magistracy, office, government" among other meanings, which comes from the verb ἄρχω (arkhō), "to begin, rule, govern".
https://en.wikipedia.org/wiki/Archive
Backup vs Archive
Backup
• Short-term insurance policy
• Kept for weeks or months, overwritten on a regular basis
• Secondary copy: operational recovery
• Altered state: restore required
• Disaster recovery / business continuance
Archive
• Long-term insurance policy
• Kept for years, decades, or forever
• Primary copy: information retrieval, compliance, and eDiscovery
• Native state: immediately available
• WORM used for compliance enforcement
Types Of Archiving
In-place (Passive Archive) → Archive via Backup → Active Archive → Long Term Preservation Archive
Primary storage → Secondary storage → Long-term storage
Hot → Cold, along an axis of access times and dollars per gigabyte
In-Place Archive
• Do nothing
• Wastes IT resources
• Management issues
• Regulatory challenges
Archive via Backup
• Inexpensive
• May not be indexed
• In altered state
• Might be offsite
• Regulatory challenges
Active Archive
• Fast access to data
• Non-intrusive
• Data can be retained for years
• Regulatory compliant
• In-place data analytics
• Often file based
Long Term Archive
• Data resides for decades
• Fast growing data sets
• In-place analytics
• Long tail data monetization
• Regulatory compliant
• Often object/cloud
What Is Driving Archiving Adoption?
Total Capacity Shipped, Worldwide, and % of Unstructured Data
• 2013: 37 EB (67% unstructured)
• 2015: 71 EB (74% unstructured)
• 2017: 133 EB (80% unstructured)
Source: IDC
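To put the IDC capacity figures in perspective, a compound annual growth rate (CAGR) can be computed from the 2013 and 2017 values. This is a small arithmetic sketch using only the numbers quoted on the slide.

```python
# Capacity shipped worldwide, in exabytes, as quoted from IDC on the slide.
capacity_eb = {2013: 37, 2015: 71, 2017: 133}


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values measured `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


growth = cagr(capacity_eb[2013], capacity_eb[2017], 2017 - 2013)
print(f"Capacity shipped, CAGR 2013-2017: {growth:.1%}")
```

The result is roughly 37.7% per year, meaning shipped capacity roughly doubles about every two years, which is consistent with the step from 37 EB to 71 EB to 133 EB.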
Because Most Of This Data Is Static
60-80% of data in operational applications is inactive.
• Active: in progress
• Inactive: final form
Meaning The Probability Of Access And Value Decrease Quickly
[Figure: probability of data retrieval (100% to 0%) versus data age (0-30 days, 31-90 days, 4-12 months, 1 year +), with curves for analytics value, regulatory & litigation value, and long tail data monetization]
• The value of the data and the likelihood of access go down over time
• But a litigation or other event can instantly increase the value of the data
• High availability and fast access can be critical
The Result: Unprecedented Storage Costs
• Primary storage is quickly consumed, driving more purchases
• Backups can no longer complete within the backup window, driving more capex and opex
• Power, cooling and other facility costs increase
• Human resources costs increase
• The value of data is marginalized
• Companies may put themselves at regulatory risk
The Solution: Intelligent Archiving
[Figure: The Vicious Storage Cycle: rapid data growth, exhaust available capacity, buy more disk, more backup complexity & cost, more power, cooling, space, higher management costs ($$$$$)]
Intelligent Archiving Breaks the Cycle
• Reclaim expensive existing primary storage
• Reduce ongoing primary storage acquisition costs
• Get static data out of the recurring backup process
• Reduce backup storage acquisition costs
• Reduce management and operating costs
• Analyze data in place on cost-effective storage
Three Trends In Archiving
• Data is coming from more sources than ever before
– Archives must be designed for flexibility
• The era of big data analytics has increased the value of archived data
– The archive should be capable of in-place data analytics
• Archives are moving to object clouds to deal with massive data sets
– The archive data should be geo-distributed/geo-accessible
ECS Supports The Trends
Solving Archive Challenges At Geo-scale
• Modern hyper-scale cloud architecture
– Scales from petabytes to exabytes
• Archive data from all sources
– S3, Swift, Atmos and Centera CAS object APIs
– HDFS compatible with Cloudera, Hortonworks, Pivotal, etc.
• Break down barriers with geo-scale data access
• Innovation to enable scalability, efficiency and serviceability
ECS Solutions For Archive
EMC Elastic Cloud Storage (ECS) - Hardware
U-Series
• Available in multiple capacities within a rack
• COTS: x86 servers, JBOD DAS, 10GbE connectivity, SATA/SAS disks
• Max 60 disks per DAE per node
D-Series
• Denser model
• Minimum of eight x86 servers
• Max 98 disks per DAE per node
Hyper-scale
• For customers aggressively seeking the lowest $/GB
• ECS software certified on Dell servers: Dell DSS7000, Dell R730xd 13G
• Minimum 5 servers
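Raw capacity for a configuration is simply nodes × disks per node × drive size. The sketch below uses the D-Series figures from the slide (minimum of eight servers, up to 98 disks per node); the 8 TB drive size is a hypothetical value chosen for illustration, and usable capacity will be lower once data protection overhead is applied.

```python
def raw_capacity_tb(nodes: int, disks_per_node: int, disk_tb: float) -> float:
    """Raw (pre-protection) capacity: nodes x disks per node x drive size."""
    return nodes * disks_per_node * disk_tb


# D-Series: 8 servers (slide minimum), 98 disks per node (slide maximum).
# The 8 TB drive size is HYPOTHETICAL, not a figure from the slide.
d_series_min = raw_capacity_tb(nodes=8, disks_per_node=98, disk_tb=8)
print(f"D-Series, 8 nodes: {d_series_min:,.0f} TB raw")
```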
Object Access
Namespace / Bucket|Container
• A namespace can span multiple instances of physical hardware, and data management functions such as replication and distribution operate at object-level granularity.
• Instead of organizing files in a directory hierarchy, object storage systems store objects in a flat organization of containers/buckets.
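To make the "flat organization" concrete, here is a toy in-memory model (not ECS code; the class and method names are hypothetical) where every object is addressed by (namespace, bucket, key). There are no real directories: "folders" are just shared key prefixes, derived at listing time with a prefix and a delimiter, in the style of S3 listings.

```python
from __future__ import annotations


class FlatObjectStore:
    """Toy model: objects live at (namespace, bucket, key); no directory tree exists."""

    def __init__(self) -> None:
        self._objects: dict[tuple[str, str, str], bytes] = {}

    def put(self, namespace: str, bucket: str, key: str, data: bytes) -> None:
        self._objects[(namespace, bucket, key)] = data

    def list_keys(self, namespace: str, bucket: str, prefix: str = "",
                  delimiter: str = "") -> tuple[list[str], list[str]]:
        """Return (keys, common_prefixes), emulating folders via prefix + delimiter."""
        keys: list[str] = []
        prefixes: set[str] = set()
        for ns, b, key in sorted(self._objects):
            if ns != namespace or b != bucket or not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter and delimiter in rest:
                # Collapse everything below the next delimiter into one "folder".
                prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(prefixes)


store = FlatObjectStore()
for key in ("2017/forum/a.jpg", "2017/forum/b.jpg", "2018/c.jpg"):
    store.put("archive", "photos", key, b"...")

top_keys, top_prefixes = store.list_keys("archive", "photos", delimiter="/")
forum_keys, _ = store.list_keys("archive", "photos", prefix="2017/forum/")
```

The slash in "2017/forum/a.jpg" is just a character in the key; the hierarchy only appears when a client asks for it.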
ECS Data: System Meta-data, Customer Meta-data and Data
System Meta-data
• Identifiers and descriptors
• Encryption keys in encrypted format
• Internal flags
• Location information
• Timestamps
• Configuration/tenancy information
Customer Meta-data (examples)
• Client=DellEMC
• Event=DellEMC Forum
• ID=123
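A minimal sketch of the split between the two metadata sets, assuming a simple record type (the `StoredObject` class and its `system` field values are hypothetical). Over the S3 API, customer metadata is conventionally carried as `x-amz-meta-<name>` headers; the helper below shows that mapping for the example pairs on the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One archived object: payload plus two separate metadata sets."""

    data: bytes
    # Maintained by the storage system (identifiers, timestamps, location, ...).
    system: dict[str, str] = field(default_factory=dict)
    # Free-form key/value pairs supplied by the application.
    customer: dict[str, str] = field(default_factory=dict)


def customer_metadata_headers(meta: dict[str, str]) -> dict[str, str]:
    """Map customer metadata to S3-style user-metadata headers (x-amz-meta-<name>)."""
    return {f"x-amz-meta-{name.lower()}": value for name, value in meta.items()}


obj = StoredObject(
    data=b"...",
    system={"location": "Site-1"},  # hypothetical system field, for illustration
    customer={"Client": "DellEMC", "Event": "DellEMC Forum", "ID": "123"},
)
headers = customer_metadata_headers(obj.customer)
```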
Access Protocols
• Representational State Transfer (REST) APIs
– Combination of HTTP methods
– Amazon S3
– OpenStack Swift
– Atmos
– Content Addressed Storage (CAS)
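In a REST object API, one URL names one object and the HTTP method selects the operation. The sketch below builds (but does not send) path-style requests with Python's standard `urllib.request.Request`; the endpoint host and port are placeholders, and request signing/authentication, which a real S3 call requires, is omitted.

```python
from typing import Optional
from urllib.request import Request

# Placeholder endpoint; substitute your own ECS S3 endpoint.
ENDPOINT = "http://ecs.example.com:9020"


def object_request(method: str, bucket: str, key: str,
                   data: Optional[bytes] = None) -> Request:
    """Build, but do not send, a path-style REST request for a single object."""
    return Request(f"{ENDPOINT}/{bucket}/{key}", data=data, method=method)


# Same resource URL, different HTTP methods.
operations = {
    "PUT": "create or overwrite the object",
    "GET": "read the object",
    "HEAD": "read only the object's metadata",
    "DELETE": "remove the object",
}
for method, meaning in operations.items():
    req = object_request(method, "archive", "reports/2017.pdf")
    print(req.get_method(), req.full_url, "->", meaning)
```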
ECS Supports A Wide Variety Of Workloads
Traditional/“Platform 2”: Tiered Archive, Cloud Backup, Sync & Share, Cloud Gateway
Cloud Native/“Platform 3”: Cloud Native Apps (web/mobile), IoT, Analytics
[Figure: ECS deployed across Site 1, Site 2 and Site 3]
Scale Effortlessly - Store Efficiently - Access Globally
Data Archives - Complete Cloud Storage Platform
• Lower cost than public cloud
• Unmatched combination of storage efficiency and
data access
• Anywhere read/write access with strong consistency
simplifies finding and using archived assets
• No single point of failure, increasing availability and performance
• Universal accessibility eliminates storage silos and
inefficient archiving processes
• Comprehensive data types satisfy the broadest range
of application needs
ECS Storage Engine Unique Capabilities
Multi-Protocol Support: access the same data from any access method
• CAS: SDK v3.1.54 or later, support for upgrade
• Swift: byte range update within an object
• Retentions
• Keystone integration: drop-in replacement for OpenStack Swift
• Primary file system with native Ambari 2.2 integration
• XXXXX
• Byte range updates
• Retentions
• Metadata search
• NFS v3 extension
• Global namespace with global locking
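Conceptually, a byte range update rewrites only part of an existing object instead of re-uploading all of it. The function below is a local model of those semantics only (it is not the ECS wire API); it assumes an update starting at or before the current end of the object, and writes past the end extend the object.

```python
def apply_byte_range_update(obj: bytes, offset: int, patch: bytes) -> bytes:
    """Return a copy of `obj` with `patch` written starting at `offset`."""
    if offset < 0 or offset > len(obj):
        raise ValueError("offset must lie within the object or at its end")
    buf = bytearray(obj)
    # Slice assignment overwrites in place; any excess bytes extend the object.
    buf[offset:offset + len(patch)] = patch
    return bytes(buf)


updated = apply_byte_range_update(b"hello world", 6, b"WORLD")
appended = apply_byte_range_update(b"abc", 3, b"def")
```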
CloudBoost to ECS
ENABLING LONG TERM RETENTION FOR BACKUP DATA
[Figure: ROBO and central data from DAS, Isilon and Data Domain (dedupe) sent over S3 via CloudArray, CloudPools and CloudBoost to Elastic Cloud Storage for long term retention]
Compliance ready
Features
Retention policy management
Retention enforcement
Data immutability
Advanced Retention Management
Access Locks
Lock/unlock user or bucket
Key Benefits
Meet storage requirements for
SEC 17a-4(f)
CFTC 1.31(b)-(c)
NF Z 42-013
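Retention enforcement and data immutability can be illustrated with a toy write-once-read-many (WORM) bucket. This is a hypothetical sketch, not the ECS implementation: objects cannot be overwritten, and deletes are refused until the retention period has elapsed. An injectable clock keeps the example deterministic.

```python
import time


class RetentionError(Exception):
    """Raised when an operation would violate immutability or retention."""


class WormBucket:
    """Toy WORM bucket: no overwrites, no deletes before retention expires."""

    def __init__(self, retention_seconds: float, clock=time.time):
        self.retention_seconds = retention_seconds
        self._clock = clock
        self._objects = {}  # key -> (data, retain_until)

    def put(self, key, data):
        if key in self._objects:
            raise RetentionError(f"{key!r} is immutable")
        self._objects[key] = (data, self._clock() + self.retention_seconds)

    def get(self, key):
        return self._objects[key][0]

    def delete(self, key):
        _, retain_until = self._objects[key]
        if self._clock() < retain_until:
            raise RetentionError(f"{key!r} is under retention until {retain_until}")
        del self._objects[key]

    def __contains__(self, key):
        return key in self._objects


now = [0.0]  # fake clock, in seconds
bucket = WormBucket(retention_seconds=100, clock=lambda: now[0])
bucket.put("trade-001", b"order data")
try:
    bucket.delete("trade-001")
    early_delete_blocked = False
except RetentionError:
    early_delete_blocked = True
now[0] = 100.0  # advance to the end of the retention period
bucket.delete("trade-001")
```

Regulations such as SEC 17a-4(f) add requirements beyond this sketch; it only shows the enforcement rule itself.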
Geo Capabilities
High durability with a low overhead
[Figure: PUT obj1 (1 MB) at Site-1 into a 128 MB chunk container; replicate the 128 MB chunk; XOR reduction across sites A, B and C; GET obj1: read the object, cache the object, read from the cache; PUT obj2]
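The "XOR reduction" label refers to folding replicated chunks from different sites into a single XOR chunk instead of keeping full copies of each. A minimal sketch of the idea, under a simplified model that ignores local erasure coding: with N sites, N-1 primary chunks share one XOR chunk, so geo protection costs N/(N-1) stored bytes per primary byte (2.0 for two sites, which is a plain mirror; 1.5 for three).

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    if len(a) != len(b):
        raise ValueError("chunks must be the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def geo_overhead(sites: int) -> float:
    """Stored bytes per primary byte for geo protection (simplified XOR model)."""
    if sites < 2:
        raise ValueError("geo protection needs at least two sites")
    # N-1 primary chunks share one XOR chunk: ((N-1) + 1) / (N-1).
    return sites / (sites - 1)


# Tiny stand-ins for two 128 MB chunks owned by Site A and Site B.
chunk_a = bytes(range(0, 16))
chunk_b = bytes(range(100, 116))

# Site C keeps only the XOR of the two replicas instead of both copies.
xor_chunk = xor_bytes(chunk_a, chunk_b)

# If Site A is lost, its chunk is rebuilt from Site C's XOR chunk and Site B's chunk.
recovered_a = xor_bytes(xor_chunk, chunk_b)
```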
Storage Efficiency
High performance and low overhead for both small and large objects
[Figure: requests pass through an in-memory 2 MB buffered writer; objects are stored in chunks on disks; parity is added and the original chunks are deleted]
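Storing objects in shared chunks is what keeps small objects cheap: many of them are appended into one large chunk, and an index records where each one lives. The sketch below is a hypothetical simplification (tiny chunk size, and each object must fit in one chunk, whereas real large objects span chunks); it shows only the packing and indexing, not parity.

```python
from __future__ import annotations


class ChunkWriter:
    """Append objects into fixed-size chunks; index each as (chunk, offset, length)."""

    def __init__(self, chunk_size: int):
        self.chunk_size = chunk_size
        self.chunks: list[bytearray] = [bytearray()]
        self.index: dict[str, tuple[int, int, int]] = {}

    def write(self, key: str, data: bytes) -> None:
        # Simplification: an object must fit within a single chunk.
        if len(data) > self.chunk_size:
            raise ValueError("object larger than a chunk")
        if len(self.chunks[-1]) + len(data) > self.chunk_size:
            self.chunks.append(bytearray())  # seal the current chunk, open a new one
        chunk_id = len(self.chunks) - 1
        offset = len(self.chunks[-1])
        self.chunks[-1].extend(data)
        self.index[key] = (chunk_id, offset, len(data))

    def read(self, key: str) -> bytes:
        chunk_id, offset, length = self.index[key]
        return bytes(self.chunks[chunk_id][offset:offset + length])


writer = ChunkWriter(chunk_size=8)  # stand-in for 128 MB chunks
writer.write("a", b"12345")
writer.write("b", b"678")   # fills chunk 0 exactly
writer.write("c", b"9abc")  # does not fit, so it opens chunk 1
```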
Metadata search
• Remove the cost and complexity of external databases
• Save objects with metadata
• Search objects using GPS coordinates, image resolution, …
/?query=x-amz-meta-image-gps-latitude>50&…
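The query on the slide is a bucket-level GET with a `query` parameter that compares a user-metadata key against a value. The sketch builds that URL with the standard library (endpoint and bucket names are placeholders; authentication is omitted) and includes a client-side model of the predicate's meaning. The sample metadata values are made up for illustration.

```python
from __future__ import annotations

from urllib.parse import urlencode


def metadata_search_url(endpoint: str, bucket: str, expression: str) -> str:
    """Build a bucket-level search URL in the /?query=... form shown on the slide."""
    return f"{endpoint}/{bucket}/?{urlencode({'query': expression})}"


def search_locally(objects: dict[str, dict[str, str]], name: str,
                   threshold: float) -> list[str]:
    """Client-side model of `name > threshold` over object metadata."""
    return sorted(
        key for key, meta in objects.items()
        if name in meta and float(meta[name]) > threshold
    )


url = metadata_search_url("http://ecs.example.com:9020", "photos",
                          "x-amz-meta-image-gps-latitude>50")

# Illustrative sample objects, not real data.
objects = {
    "oslo.jpg": {"x-amz-meta-image-gps-latitude": "59.9"},
    "madrid.jpg": {"x-amz-meta-image-gps-latitude": "40.4"},
    "untagged.jpg": {},
}
hits = search_locally(objects, "x-amz-meta-image-gps-latitude", 50)
```

Running the filter inside the storage system is the point of the feature: the application no longer needs to maintain a separate database of object attributes.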
Native NFS support
Features
Native NFS v3 capability
Rich ACLs
Global namespace
Global locking
Multi-protocol access: object, NFS and HDFS
Key Benefits
Ingest data in native format
Requires no change at the application level, accelerating
the move to an object platform
Free CIFS-ECS Software
Features
S3 API
Caching
Multipart upload and download
Retention & versioning
ACL translation
Client side load balancing
Key Benefits
Ingest data in native format
Requires no change at the application level, accelerating the move
to an object platform
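Multipart upload and download split one large object into byte ranges that can be transferred in parallel. The sketch plans the parts using the limits AWS documents for S3 multipart upload (5 MiB minimum for every part except the last, at most 10,000 parts); that ECS applies the same limits, and the 8 MiB default part size, are assumptions. It does not describe how the CIFS-ECS tool itself chooses part sizes.

```python
from __future__ import annotations

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum size for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per upload


def plan_parts(total_size: int, part_size: int = 8 * MIB) -> list[tuple[int, int, int]]:
    """Split an object into (part_number, start, end_exclusive) byte ranges."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum part size")
    parts = [
        (number, start, min(start + part_size, total_size))
        for number, start in enumerate(range(0, total_size, part_size), start=1)
    ]
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return parts


parts = plan_parts(20 * MIB)
for number, start, end in parts:
    # The same ranges can drive a parallel download via "Range: bytes=start-(end-1)".
    print(f"part {number}: bytes={start}-{end - 1}")
```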
Five reasons to choose ECS
Simple
Powerful
Efficient
Scalable
Support for various data types across the enterprise
• EMC ECS cloud storage platform combines the
cost advantages of commodity infrastructure
with high reliability, availability and
serviceability.
Thank you!