Using AWS for Backup and Restore
Backup in the cloud, Backup to the cloud, and Recovery options
Neel Mitra, Solutions Architect
Sep 7, 2017
Backup and recovery before the cloud
[Diagram: application servers -> media server with local disk -> tape storage -> data bunker]
Backup challenges in today’s age
• IDC estimates the volume of digital data will grow 40% to 50% per year. By 2020, IDC predicts the number will have reached 40,000 EB, or 40 zettabytes (ZB).
• The world’s information is doubling every two years.
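The two growth figures above agree with each other, as a quick calculation shows. The sketch below compounds a 40% and a 50% annual rate over two years and derives the doubling time; the function names are illustrative and not from the original deck.

```python
import math


def growth_factor(annual_rate: float, years: int) -> float:
    """Total growth multiple after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years


def doubling_time_years(annual_rate: float) -> float:
    """Years needed for the data volume to double at a constant annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


for rate in (0.40, 0.50):
    print(
        f"{rate:.0%} per year: x{growth_factor(rate, 2):.2f} after 2 years, "
        f"doubling every {doubling_time_years(rate):.2f} years"
    )
```

At 40% per year the volume grows by a factor of about 1.96 in two years, and at 50% by 2.25, so "doubling every two years" sits inside the IDC range.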
Primary Storage
Amazon EFS
• Primary Storage provides file, block and object storage targets. Targets can either be extensions into the on-premises environment or a pure cloud implementation.
• Primary storage provides first level storage of data to customer workloads
• Storage for a variety of customer workloads
• File distribution services
• Gateway for IP storage protocols
• Replication of storage via native replication mechanisms
Backup and Recovery
Backup and recovery use cases protect data from logical errors such as system failure,
application error, or accidental deletion. Backups can run from on premises to the cloud,
either directly to a cloud target or via a gateway appliance, or entirely within the cloud.
Backup is not archive
• Backup represents a point in time copy of the data.
• Archived data is the only authoritative copy of the data.
Archive
• The Archive use case allows the migration of important but infrequently used data to storage
devices of the appropriate cost and resiliency. This frees existing “primary” storage for new or
frequently accessed data, achieving both a potential cost and performance advantage for the
customer.
• Archives move data between different classifications of storage
• Archive is not backup/recovery
– Backup represents a point in time copy of the data. There may be many copies of the data depending on the number of backups
that have been completed.
Backup vs. Archive
                                                    Backup        Archive
Number of copies for one piece of data              Many          1
Growth of the repository over time                  Exponential   Linear
Contains “the” copy of data?                        No            Yes
Point in time copy of data?                         Yes           Yes
Select individual pieces of data based on policy    Not really    Yes
Backups held for “long” periods of time
Remember:
• A Backup makes a copy of the data and keeps as many copies as needed.
• An Archive moves data between different classifications of storage but does not make any copies.
• Long term backups make life hard for business…
• Costly
• Hard to track
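The copy-versus-move distinction can be made concrete with a small local-filesystem sketch. It is only an analogy under assumed names (`backup`, `archive`, the directory layout); it does not call any AWS service.

```python
import shutil
import tempfile
from pathlib import Path


def backup(src: Path, backup_dir: Path, label: str) -> Path:
    """Copy src into backup_dir as a point-in-time copy; the original stays in place."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest = backup_dir / f"{src.name}.{label}"
    shutil.copy2(src, dest)
    return dest


def archive(src: Path, archive_dir: Path) -> Path:
    """Move src into archive_dir; the archived file becomes the only copy."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / src.name
    shutil.move(str(src), str(dest))
    return dest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data = root / "primary" / "report.csv"
    data.parent.mkdir()
    data.write_text("v1")
    backup(data, root / "backups", "mon")      # first point-in-time copy
    data.write_text("v2")
    backup(data, root / "backups", "tue")      # second copy; the repository grows
    archived = archive(data, root / "archive")  # no copy is made, the data moves
    print(sorted(p.name for p in (root / "backups").iterdir()))
    print("still on primary:", data.exists(), "| in archive:", archived.exists())
```

Each backup run adds another copy, which is why a backup repository grows much faster than an archive holding a single authoritative copy.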
Backup to the Cloud & Recovery
What should I use and when?
• Amazon S3: Durable object storage for all types of data
• Amazon Glacier: Archival storage for infrequently accessed data
• AWS Storage Gateway: Hybrid Storage service
• AWS Snowball: Petabyte-scale data transport solution
Economics
• Pay as you go
• No upfront investment
• No commitment
• No risky capacity planning
Easy to use
• Self service administration
• SDKs for simple integration
Reduce risk
• Durable and secure
• Avoid risks of physical media handling
Agility, scale
• Reduce time to market
• Focus on your business, not your infrastructure
Backup and recovery to the cloud
[Diagram: two paths from on premises to Amazon S3, Amazon S3-IA, and Amazon Glacier over the Internet or AWS Direct Connect.
Cloud Connector: application servers -> media server with cloud connector (local disk) -> AWS.
Cloud Gateway: application servers -> media server (local disk) -> cloud gateway -> AWS.]
Cloud Connector
1. Direct Amazon S3 / Glacier API/SDK
2. Amazon S3 lifecycle integration
3. Third-party tools and gateways
These are only a few examples of APN Technology partners with S3 connectors
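Option 1, writing directly through the SDK, can be sketched as a single `put_object` call. This is a minimal sketch, assuming the boto3 S3 client interface; the bucket name, key, and the `RecordingClient` stand-in are hypothetical and exist only so the example runs without AWS credentials. In real use you would pass `boto3.client("s3")` instead.

```python
import io
from typing import Any, BinaryIO, Dict, List


def upload_backup(s3_client: Any, fileobj: BinaryIO, bucket: str, key: str,
                  storage_class: str = "STANDARD_IA") -> None:
    """Write one backup object to S3 through the SDK's put_object call."""
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=fileobj,
        StorageClass=storage_class,  # e.g. STANDARD or STANDARD_IA
    )


class RecordingClient:
    """Offline stand-in for an S3 client; it records calls instead of sending them."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def put_object(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        return {}


client = RecordingClient()
upload_backup(client, io.BytesIO(b"backup bytes"),
              bucket="example-backup-bucket", key="db/2017-09-07.dump")
print(client.calls[0]["Key"], client.calls[0]["StorageClass"])
```

Passing the client in as an argument keeps the upload logic testable and leaves credential handling to the caller. Option 2 (lifecycle integration) would then move these objects to colder storage without further code.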
Cloud Gateways for Backup
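[Diagram: on the customer premises, an application server writes to gateway appliances (AWS Storage Gateway, AltaVault appliance), which connect over the Internet or AWS Direct Connect to the AWS Storage Gateway back-end and to Amazon S3, Amazon Glacier, and Amazon EBS snapshots; an AltaVault appliance can also run in EC2]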
AWS Storage Gateway
Hybrid Storage Service
Storage Gateway for Backup & Recovery
• Deploy appliance ‘locally’ as a VM in ESX, Hyper-V or EC2
• Connect to local applications, including backup servers
• Backup & archive in S3, Glacier
• Backup volumes as EBS Snapshots
• Restore on-premises or in the cloud
• Works with major backup vendors
[Diagram: Storage Gateway connects to Amazon S3, Amazon Glacier, and Amazon EBS snapshots; virtual tapes are stored in Amazon S3 and archived tapes are stored in Amazon Glacier]
Backing up to AWS via Storage Gateway
3 options to write on-premises backups to AWS
From a backup server in the customer environment (customer premises or EC2):
• File Gateway: files stored in the customer bucket (S3 Standard, S3-IA, Glacier)
• Volume Gateway: iSCSI volumes stored in S3, with EBS Snapshots
• Tape Gateway: VTL with tapes stored in S3 and Glacier
File Gateway
[Diagram: application server -> (NFS v3 / v4.1) -> File Gateway on customer premises -> (HTTPS) -> S3 Standard, S3 Standard - Infrequent Access, Glacier]
Data (including backups) stored and retrieved from your S3 buckets
1-1 mapping from files-to-objects
File metadata stored in object metadata
Bucket access managed by IAM role you own and manage
Use S3 Lifecycle Policies, versioning, or CRR to manage data
Volume Gateway
On-premises volume storage backed by Amazon S3 with EBS snapshots
[Diagram: application server -> (iSCSI) -> Volume Gateway on customer premises -> (HTTPS) -> Storage Gateway bucket in Amazon S3 -> Amazon EBS snapshots]
Block storage in S3 accessed via the volume gateway
Compression of data in-transit and at-rest
Backup on-premises volumes to EBS snapshots
Create on-premises volumes from EBS snapshots
Up to 1PB of total volume storage per gateway
Can be used by backup apps, e.g. Veeam, to write to AWS and recover in EC2
Volume Gateway
GATEWAY-CACHED
[Diagram: in the customer data center, users, clients, and application servers (iSCSI initiators) connect to the AWS Storage Gateway VM (iSCSI target), which holds an upload buffer and cache storage; the gateway connects over HTTPS to the AWS Storage Gateway service, with volume storage backed by Amazon S3 and Amazon EBS snapshots]
Volume Gateway
GATEWAY-STORED
[Diagram: in the customer data center, users, clients, and application servers (iSCSI initiators) connect to the AWS Storage Gateway VM (iSCSI target), which holds an upload buffer and volume storage; the gateway connects to the AWS Storage Gateway service and Amazon EBS snapshots]
Tape Gateway
Virtual tape storage in Amazon S3 and Glacier with VTL management
[Diagram: backup server -> (iSCSI) -> Tape Gateway on customer premises, presenting a tape drive and media changer -> (HTTPS) -> virtual tapes stored in Amazon S3 and archived tapes stored in Amazon Glacier]
Virtual tape storage in S3 and Glacier accessed via tape gateway
Compression of data in-transit and at-rest
Up to 1 PB total tape storage per gateway, unlimited archive capacity
Supports leading backup applications
3-5 hour retrieval of virtual tapes from Glacier
Backup, archive, and disaster recovery
Cost effective storage in AWS with local or cloud restore
“Tapes are a headache. AWS Storage Gateway
provided the most cost-effective and simple alternative.
We switched from physical to virtual tape backup simply by dropping the
gateway’s virtual appliance into our existing Veeam workflow.
Setting it all up took three hours, at most.
We even got disaster recovery by using a bi-coastal data center.”
-Jesse Martinich, Network Services Manager, SOU
AWS Snowball
Petabyte-scale data transport solution
What is AWS Snowball?
Petabyte-scale data transport
• Ruggedized case (“8.5G impact”)
• 80 TB
• 10 GE network
• Rain- and dust-resistant
• Tamper-resistant case and electronics
• All data encrypted end-to-end
• E-ink shipping label
How it works
How fast is Snowball?
With 5 Snowballs on 5 x 10G connections, copying 250 TB takes less than 1 day; the whole
transfer takes less than 1 week, including shipping.
Number of days to transfer 250 TB through the Internet at typical utilizations
              Internet connection speed
Utilization   1 Gbps   500 Mbps   300 Mbps   150 Mbps
25%           95       190        316        632
50%           47       95         158        316
75%           32       63         105        211
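The day counts in the table above follow from dividing the data size by the effective link rate. The sketch below reproduces them; the unit convention (1 TB = 1024 GB, 1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second) is an assumption, chosen because it matches the published figures, and is not stated in the deck.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_tb: float, link_mbps: float, utilization: float) -> int:
    """Days needed to move data_tb over a link, rounded to the nearest day.

    Assumed units: 1 TB = 1024 GB, 1 GB = 10**9 bytes, 1 Mbps = 10**6 bit/s.
    """
    bits = data_tb * 1024 * 10**9 * 8
    effective_bps = link_mbps * 10**6 * utilization
    return round(bits / effective_bps / SECONDS_PER_DAY)


speeds_mbps = (1000, 500, 300, 150)
for utilization in (0.25, 0.50, 0.75):
    row = [transfer_days(250, speed, utilization) for speed in speeds_mbps]
    print(f"{utilization:.0%}", *row)
```

The same function shows why shipping devices wins at this scale: even a fully dedicated 1 Gbps link needs several weeks for 250 TB.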
Customer Use Case: Backup and Archive with Snowball
[Diagram: objects rawdata1, rawdata2, and rawdata3 in “My S3 bucket” are archived to Amazon Glacier after 30 days and deleted after 7 years]
PetroBank Archive Service Migrated from Tape to Cloud
Cost effective storage in AWS with local data access
Self service loading of data
Reduced time-to-data by days or weeks
Cut storage archive costs by 90%
[Diagram: PetroBank application servers load data into Amazon S3 Standard through AWS Snowball (initial bulk transfer) and through File Gateway over AWS Direct Connect (continuous file access and upload, with local cache); lifecycle policies migrate data across storage tiers to Amazon S3 Infrequent Access and Amazon Glacier; AWS Lambda runs automated functions, including inventory]
Backup in the cloud
What should I use and when?
• Amazon EBS: Block storage for use with Amazon EC2
• Amazon S3: Durable object storage for all types of data
• Amazon Glacier: Archival storage for infrequently accessed data
• Amazon EFS: File storage for use with Amazon EC2
Reduce risk
• Durable and secure
• Avoid risks of physical media handling
Economics
• Pay as you go
• No upfront investment
• No commitment
• No risky capacity planning
Easy to use
• Self service administration
• SDKs for simple integration
Agility, scale
• Reduce time to market
• Focus on your business, not your infrastructure
Amazon EBS
Block storage for use with Amazon EC2
Amazon EBS Lifecycle
[Diagram: in the AWS Cloud, EBS volumes attached to EC2 instances in an Availability Zone are captured with Create Snapshot as EBS snapshots stored in Amazon S3; new EBS volumes are cloned from a snapshot]
How Do Snapshots Work?
[Diagram: over time, Snapshot 1, Snapshot 2, and Snapshot 3 are taken of an EBS volume; Blocks 1 to 4 of the volume map to Chunks 1 to 4 stored in S3]
Benefits of using EBS snapshots
More durable than an EBS volume
• Stored in Amazon S3
Incremental (space-efficient)
• First snapshot is a clone
• Pay only for what you use
Availability Zone-independent
• Clone into any AZ
Can be copied efficiently across regions
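The "incremental" point above can be illustrated with a toy model in which each snapshot maps block positions to stored chunks and only changed blocks produce new chunks. This is a simplified sketch, not how EBS is implemented: the class name, the in-memory chunk store, and the content comparison used to detect changes are all assumptions made for illustration.

```python
from typing import Dict, List


class SnapshotStore:
    """Toy incremental snapshots: each snapshot maps block index -> chunk id."""

    def __init__(self) -> None:
        self.chunks: Dict[int, bytes] = {}          # chunk id -> data (the "S3" side)
        self.snapshots: List[Dict[int, int]] = []   # one block -> chunk map per snapshot
        self._next_chunk_id = 0

    def create_snapshot(self, volume: List[bytes]) -> int:
        """Store only blocks changed since the last snapshot; return chunks written."""
        previous = self.snapshots[-1] if self.snapshots else {}
        current: Dict[int, int] = {}
        written = 0
        for index, block in enumerate(volume):
            old_chunk = previous.get(index)
            if old_chunk is not None and self.chunks[old_chunk] == block:
                current[index] = old_chunk          # unchanged: reuse the existing chunk
            else:
                self.chunks[self._next_chunk_id] = block
                current[index] = self._next_chunk_id
                self._next_chunk_id += 1
                written += 1
        self.snapshots.append(current)
        return written

    def restore(self, snapshot_number: int) -> List[bytes]:
        """Rebuild the full volume as it was when the given snapshot was taken."""
        mapping = self.snapshots[snapshot_number]
        return [self.chunks[mapping[i]] for i in range(len(mapping))]


store = SnapshotStore()
volume = [b"A", b"B", b"C", b"D"]
print("snapshot 1 wrote", store.create_snapshot(volume), "chunks")  # full copy
volume[1] = b"B2"
print("snapshot 2 wrote", store.create_snapshot(volume), "chunks")  # changed block only
print(store.restore(0), store.restore(1))
```

Every snapshot still restores a complete volume, even though later snapshots store only what changed, which is what makes them space-efficient.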
AWS Database Backups
RDS for MySQL, PostgreSQL, MariaDB, Oracle, SQL Server
• Scheduled daily backup of entire instance
• Archive database change logs
• 35 day retention for backups
• Multiple copies in each AZ where you have instances for a deployment
Aurora
• Automatic, continuous, incremental backups
• Point-in-time restore
• No impact on database performance
• 35 day retention
DIY on EC2
• Engine specific (RMAN, BAK)
• Third party (GoldenGate, Commvault)
Amazon S3
Durable object storage for all types of data
Amazon S3 Lifecycle
• Use Amazon S3 for reliable, durable primary storage
• Use Amazon S3 Infrequent Access Storage (S3-IA) for secondary backups at a lower cost
• Use Amazon Glacier for lowest-cost, durable cold storage of archival data
S3 lifecycle policies
Key prefix “logs/”
Transition objects to Amazon Glacier 30 days after creation
Delete 365 days after creation date
<LifecycleConfiguration>
  <Rule>
    <ID>archive-in-30-days</ID>
    <Prefix>logs/</Prefix>
    <Status>Enabled</Status>
    <Transition>
      <Days>30</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
    <Expiration>
      <Days>365</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
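To see what the rule above means for a given object, the sketch below parses the same XML and reports where an object of a given age would sit. It is a simplified reading of the rule for illustration: the `storage_state` helper is hypothetical, and S3 applies lifecycle actions asynchronously, so real timing can lag the day counts.

```python
import xml.etree.ElementTree as ET

LIFECYCLE_XML = """
<LifecycleConfiguration>
  <Rule>
    <ID>archive-in-30-days</ID>
    <Prefix>logs/</Prefix>
    <Status>Enabled</Status>
    <Transition>
      <Days>30</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
    <Expiration>
      <Days>365</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
"""


def storage_state(key: str, age_days: int, config_xml: str = LIFECYCLE_XML) -> str:
    """Return 'STANDARD', the transition storage class, or 'EXPIRED' for an object."""
    root = ET.fromstring(config_xml.strip())
    state = "STANDARD"
    for rule in root.findall("Rule"):
        if rule.findtext("Status") != "Enabled":
            continue  # disabled rules have no effect
        if not key.startswith(rule.findtext("Prefix", default="")):
            continue  # the rule applies only to keys under its prefix
        expiration = rule.findtext("Expiration/Days")
        if expiration is not None and age_days >= int(expiration):
            return "EXPIRED"
        transition = rule.findtext("Transition/Days")
        if transition is not None and age_days >= int(transition):
            state = rule.findtext("Transition/StorageClass")
    return state


for age in (10, 30, 364, 365):
    print(age, storage_state("logs/app.log", age))
print(400, storage_state("images/photo.png", 400))
```

Objects outside the `logs/` prefix never match the rule, so they stay in their original storage class regardless of age.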
Cross-region replication: Details
Replication status
• HEAD operation on a source object to determine replication status
• Replicated objects will not be re-replicated
• Use Amazon S3 COPY to replicate existing objects
Access control
• Object ACL updates are replicated
• Objects with Amazon managed encryption key replicated
• KMS encryption not replicated
Cost
• Usual charges for storage, requests, and inter-region data transfer for the replicated copy of data
• Replicate into Standard-IA or Amazon Glacier
Delete operation
• DELETE without object version ID
  – Marker replicated
• DELETE specific object version ID
  – Marker NOT replicated
Versioning with cross-region replication
[Diagram: source key A/vid1 holds versions Vid1-v1 through Vid1-v4; destination key B/vid1 holds replicated versions Vid1-v1 and Vid1-v2]
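The delete behaviour listed for cross-region replication can be traced with a toy model of two versioned buckets. This sketch models only what the slide states (new versions and delete markers replicate; deleting a specific version does not); the `VersionedBucket` class is hypothetical and is not the S3 API, so check the current S3 documentation for how your replication configuration handles deletes.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class VersionedBucket:
    """Toy versioned bucket; a version whose data is None is a delete marker."""

    def __init__(self, replica: Optional["VersionedBucket"] = None) -> None:
        self.history: Dict[str, List[Tuple[str, Optional[bytes]]]] = {}
        self.replica = replica              # destination bucket, if replication is on
        self._ids = itertools.count(1)      # source of new version IDs

    def _store(self, key: str, version_id: str, data: Optional[bytes]) -> None:
        self.history.setdefault(key, []).append((version_id, data))

    def _write(self, key: str, data: Optional[bytes]) -> str:
        version_id = f"v{next(self._ids)}"
        self._store(key, version_id, data)
        if self.replica is not None:
            self.replica._store(key, version_id, data)  # replicated with the same ID
        return version_id

    def put(self, key: str, data: bytes) -> str:
        """Add a new version; it is replicated to the destination."""
        return self._write(key, data)

    def delete(self, key: str, version_id: Optional[str] = None) -> None:
        if version_id is None:
            # DELETE without a version ID adds a delete marker, which replicates.
            self._write(key, None)
        else:
            # DELETE of a specific version affects the source only.
            self.history[key] = [v for v in self.history.get(key, [])
                                 if v[0] != version_id]

    def get(self, key: str) -> Optional[bytes]:
        """Return the latest version's data, or None if absent or delete-marked."""
        versions = self.history.get(key, [])
        return versions[-1][1] if versions else None


destination = VersionedBucket()
source = VersionedBucket(replica=destination)
source.put("vid1", b"take 1")
v2 = source.put("vid1", b"take 2")
source.delete("vid1", version_id=v2)   # version delete: source only
print(source.get("vid1"), destination.get("vid1"))
source.delete("vid1")                  # delete marker: replicated
print(source.get("vid1"), destination.get("vid1"))
```

Because version-specific deletes stay on the source, the destination keeps a copy that a mistaken or malicious delete on the source cannot remove.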
Why Amazon Web Services
Druva runs inSync Cloud on AWS using
Amazon Elastic Compute Cloud
(Amazon EC2) for compute, Amazon
Elastic Block Store (Amazon EBS) for
storage volume, Amazon Relational
Database Service (Amazon RDS) for
configuration management, and Amazon
Simple Storage Service (Amazon S3) for
storage.
Amazon Glacier
Archival storage for infrequently accessed data
Amazon Glacier Lifecycle
1 Create vault
2 Configure access policies
  User policy
  Effect: Allow
  Resource: arn:aws:glacier:<accountId>:vaults
  Action: glacier:UploadArchive
3 Upload archives
  UploadArchive(data) -> Archive ID
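The three steps can be sketched in code as follows. This is a minimal sketch, assuming the boto3 Glacier client interface (`create_vault`, `upload_archive`); the slide abbreviates the resource ARN, so the region, account ID, and vault name below are placeholder parameters, and `StubGlacier` is a hypothetical offline stand-in so the example runs without AWS credentials.

```python
import json
from typing import Any, Dict


def upload_policy(region: str, account_id: str, vault_name: str) -> str:
    """Step 2: IAM user policy (as JSON) that allows uploads into one vault."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "glacier:UploadArchive",
            "Resource": f"arn:aws:glacier:{region}:{account_id}:vaults/{vault_name}",
        }],
    }
    return json.dumps(policy, indent=2)


def archive_data(glacier_client: Any, vault_name: str, data: bytes) -> str:
    """Steps 1 and 3: create the vault, upload one archive, return its archive ID."""
    glacier_client.create_vault(vaultName=vault_name)
    response = glacier_client.upload_archive(vaultName=vault_name, body=data)
    return response["archiveId"]


class StubGlacier:
    """Offline stand-in for a Glacier client; returns a made-up archive ID."""

    def create_vault(self, **kwargs: Any) -> Dict[str, Any]:
        return {}

    def upload_archive(self, **kwargs: Any) -> Dict[str, Any]:
        return {"archiveId": "example-archive-id"}


print(upload_policy("us-west-2", "123456789012", "examplevault"))
print(archive_data(StubGlacier(), "examplevault", b"archive bytes"))
```

The archive ID returned by the upload is the only handle for later retrieval, so a real workflow would record it in an inventory of its own.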
Using vault lock policy with vault access policy
Vault lock policy (Compliance/Governance)
• Lockable/Immutable policy
• Cannot be updated/deleted after lockdown
Use vault lock policy to:
• Deploy regulatory controls such as records retention
• Enforce data access through multi-factor authentication
Vault access policy (Flexibility)
• Can be updated/deleted
Use vault access policy to:
• Designate third-party access
• Grant temporary read only permissions when necessary
Vault lock best practices
• Map one vault to a single retention range
– Group regulatory data by retention: 1-year vault, 6-year vault, etc.
• Create new vault and lock it before storing production data
– Enforce the full ArchiveAgeInDays on all new archives
– Leave no “gap” on existing archives
• Thoroughly test a vault lock policy before locking it down (Abort/Initiate)
• Implement only the most restrictive controls with vault lock
– Leave the flexible controls to vault access policy
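A retention control of the kind described above is typically written as a Deny on archive deletion keyed on `glacier:ArchiveAgeInDays`. The sketch below builds such a policy document for one vault per retention range; the region, account ID, vault names, and the `Sid` are placeholders, and the helper function is illustrative rather than part of any AWS SDK.

```python
import json
from typing import Any, Dict


def retention_lock_policy(region: str, account_id: str, vault_name: str,
                          retention_days: int) -> Dict[str, Any]:
    """Vault lock policy that denies deleting archives younger than retention_days."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "deny-delete-before-retention",
            "Principal": "*",
            "Effect": "Deny",
            "Action": "glacier:DeleteArchive",
            "Resource": f"arn:aws:glacier:{region}:{account_id}:vaults/{vault_name}",
            "Condition": {
                "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
            },
        }],
    }


# One vault per retention range, as recommended above.
for name, days in (("records-1-year", 365), ("records-6-year", 6 * 365)):
    policy = retention_lock_policy("us-west-2", "123456789012", name, days)
    print(json.dumps(policy, indent=2))
```

Keeping the locked policy down to this single restrictive statement leaves everything else, such as who may read or upload, to the vault access policy, which can still be changed later.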
Amazon Glacier received a third-party assessment
from Cohasset Associates on how Amazon Glacier
with Vault Lock can be used to meet the
requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c).
SoundCloud: leveraging Glacier for audio transcoding
• World’s leading social sound platform
• Audio files must be transcoded and stored in multiple formats
[Diagram: S3 and Glacier]
Amazon EFS
File storage for use with Amazon EC2
Amazon EFS Backup
• Automated EFS backups based on a
schedule that you define (for example,
hourly, daily, weekly, or monthly)
• Automated rotation of the backups,
where the oldest backup is replaced
with the newest backup based on the
number of backups that you want to
retain
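The rotation rule described above, where the newest backup displaces the oldest once the retention count is reached, can be sketched with a bounded queue. The class and backup labels are illustrative; this models only the bookkeeping, not the copy of EFS data itself.

```python
from collections import deque
from typing import Deque, List, Optional


class BackupRotation:
    """Keep at most `retain` backups; adding one more drops the oldest."""

    def __init__(self, retain: int) -> None:
        if retain < 1:
            raise ValueError("retain must be at least 1")
        self._backups: Deque[str] = deque(maxlen=retain)

    def add(self, backup_id: str) -> Optional[str]:
        """Record a new backup; return the ID of the backup it replaced, if any."""
        full = len(self._backups) == self._backups.maxlen
        replaced = self._backups[0] if full else None
        self._backups.append(backup_id)  # a full deque discards its oldest entry
        return replaced

    def retained(self) -> List[str]:
        """Backups currently kept, oldest first."""
        return list(self._backups)


rotation = BackupRotation(retain=3)
for day in range(1, 6):
    replaced = rotation.add(f"daily-{day}")
    print(f"added daily-{day}, replaced {replaced}")
print(rotation.retained())
```

With a retention count of 3 and a daily schedule, the oldest restorable point is never more than three days back, which is the trade-off to weigh against storage cost.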
Amazon EFS Restore
• Restore a backup copy of an Amazon
EFS file system
• Restores can be done in parallel to
meet the recovery time objective
• Restore individual files from EFS
Backups
Why EFS for Database Backup
Can be used with native backup commands
- e.g. dump, RMAN, “hot-backup” mode
Copy is stored to another storage target for availability
- production copy runs on EBS
- backup copy is on EFS
Can be managed by the database administrators
- to meet their specific recovery points
- easy to restore online
High performance network shares provide for fast recovery vs. tape
Saves licensing costs and workload from traditional backup software
The Arcesium platform leverages Amazon EFS for shared data storage
between applications and for storing and analyzing operational data.
“Arcesium is a financial services SaaS platform that requires resilient,
secure, and scalable file storage. Amazon EFS offers us a powerful
way to operate and scale file storage for our Amazon EC2 instances,
which has allowed us to build out our platform quickly without
compromising quality.”
-- Gaurav Suri, CEO
“We are growing by leaps and bounds, and our core offering is all about better
support delivery. During the course of developing our next-generation internal
support system, we never wanted to worry about scale again, yet we had
existing architectural commitments that meant a distributed file solution was
required. Atlassian chose Amazon EFS because it was the only option
available that scaled both capacity and performance – without the up-front
payments or the management overhead of traditional models. This allows our
support teams to focus on what matters most - helping our customers.”
- Sri Viswanath, CTO
Customer References
Public Sector – King County
• Most populous county in Washington State
• Replace tape solution for backup from 17 agencies
• Meet compliance requirement
• Saved $1MM in first year, no more tape refresh or
management churn
https://aws.amazon.com/solutions/case-studies/king-county/
China Expansion – iQIYI
• 2nd largest Online Video Service – 100MM+ monthly viewers
• Self managed Swift cluster out of capacity
• 5PB media assets/stats, secondary back up on Glacier
https://aws.amazon.com/cn/solutions/case-studies/iqiyi/
AWS External Resources
• AWS Storage Solution Pages
– Backup, Archive and Disaster Recovery
• AWS Storage Competency and Storage Test Drives
– AWS Storage Competency
– APN Partner-provided labs
• AWS Marketplace Storage for in-cloud use cases
– AWS Online Software Store
https://aws.amazon.com/training
• Select Partner Microsites – additional in plan
– www.netapp.com/aws
– www.commvault.com/aws
– www.averesystems.com/aws
Thank you!