
Deep Dive:

Amazon DynamoDB
Richard Westby-Nunn – Technical Account Manager, Enterprise Support

28 June 2017

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Agenda

DynamoDB 101 - Tables, Partitioning
Indexes
Scaling
Data Modeling
DynamoDB Streams
Scenarios and Best Practices
Recent Announcements
Amazon DynamoDB

Fully Managed
NoSQL Document or Key-Value
Scales to Any Workload
Fast and Consistent
Access Control
Event-Driven Programming


Tables, Partitioning

Table

A table holds items; items hold attributes
An item collection = all items for a partition key

Partition key
• Mandatory
• Key-value access pattern
• Determines data distribution

Sort key
• Optional
• Model 1:N relationships
• Enables rich query capabilities: ==, <, >, >=, <=, "begins with",
"between", sorted results, counts, top/bottom N values, paged responses
Partition table

Partition key uniquely identifies an item
Partition key is used for building an unordered hash index
Table can be partitioned for scale

(diagram: items with Id = 1 (Jim), Id = 2 (Andy), Id = 3 (Kim) hash to
points Hash(1) = 7B, Hash(2) = 48, Hash(3) = CD across the 0000-FF key
space)

Partition-sort key table

Partition key and sort key together uniquely identify an item
Within an unordered partition key-space, data is sorted by the sort key
No limit on the number of items (∞) per partition key
• Except if you have local secondary indexes

(diagram: orders are hashed across Partitions 1-3 by Customer#
(Hash(2) = 48, Hash(1) = 7B, Hash(3) = CD) and sorted within each
customer's item collection by Order#, e.g. Customer# = 2 has Order# = 10 /
Item = Pen and Order# = 11 / Item = Shoes)

Partitions are three-way replicated

(diagram: each partition, Partition 1 through Partition N, is stored as
three replicas, Replica 1, Replica 2, and Replica 3, each holding the same
items)

Eventually Consistent Reads

A read is served by one of a partition's three replicas, chosen at
random: Rand(1,3)

(diagram: Replicas 1-3 of Partition A, 1000 RCUs each)

Strongly Consistent Reads

A read must first locate the primary replica of the partition and is
served only by that primary

(diagram: Replicas 1-3 of Partition A, 1000 RCUs each, one marked Primary)
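Not from the deck: a minimal boto3 sketch of choosing between the two read
models, assuming a hypothetical 'Employees' table with partition key 'Id'.

import boto3

# Hypothetical table for illustration
table = boto3.resource('dynamodb').Table('Employees')

# Eventually consistent read (the default): may be served by any replica
item = table.get_item(Key={'Id': 2})

# Strongly consistent read: served by the primary replica,
# at twice the RCU cost of an eventually consistent read
item = table.get_item(Key={'Id': 2}, ConsistentRead=True)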
Indexes

Local secondary index (LSI)

Alternate sort key attribute
Index is local to a partition key
10 GB max per partition key, i.e. LSIs limit the # of sort keys!

Projections, for a table with key A1 (partition), A2 (sort) and attributes
A3, A4, A5:

KEYS_ONLY: A1 (partition), A3 (sort), A2 (table key)
INCLUDE A3: A1 (partition), A4 (sort), A2 (table key), A3 (projected)
ALL: A1 (partition), A5 (sort), A2 (table key), A3, A4 (projected)
Global secondary index (GSI)

Alternate partition (+ sort) key
Index is across all table partition keys
RCUs/WCUs provisioned separately for GSIs
Online indexing: GSIs can be added to an existing table

Projections, for a table with partition key A1 and attributes A2, A3, A4, A5:

KEYS_ONLY: A2 (partition), A1 (table key)
INCLUDE A3: A5 (partition), A4 (sort), A1 (table key), A3 (projected)
ALL: A4 (partition), A5 (sort), A1 (table key), A2, A3 (projected)
LSI or GSI?

LSI can be modeled as a GSI
If data size in an item collection > 10 GB, use GSI
If eventual consistency is okay for your scenario, use GSI!
Scaling
Scaling

Throughput
• Provision any amount of throughput to a table

Size
• Add any number of items to a table
- Max item size is 400 KB
- LSIs limit the number of sort keys due to the 10 GB limit

Scaling is achieved through partitioning


Throughput

Provisioned at the table level


• Write capacity units (WCUs) are measured in 1 KB per second
• Read capacity units (RCUs) are measured in 4 KB per second
• RCUs measure strongly consistent reads
• Eventually consistent reads cost 1/2 of strongly consistent reads
Read and write throughput limits are independent
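A worked example (not from the slides) of the arithmetic these rules
imply, in Python:

import math

def wcus_needed(item_size_kb, writes_per_sec):
    # Writes are billed in 1 KB units, rounded up per item
    return math.ceil(item_size_kb) * writes_per_sec

def rcus_needed(item_size_kb, reads_per_sec, eventually_consistent=False):
    # Strongly consistent reads are billed in 4 KB units, rounded up
    # per item; eventually consistent reads cost half as much
    units = math.ceil(item_size_kb / 4.0) * reads_per_sec
    return units / 2.0 if eventually_consistent else units

print(wcus_needed(3, 100))   # 300 WCUs for 100 writes/s of 3 KB items
print(rcus_needed(3, 100))   # 100 RCUs, strongly consistent
print(rcus_needed(3, 100, eventually_consistent=True))   # 50 RCUs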
Partitioning Math

Number of Partitions
By Capacity (Total RCU / 3000) + (Total WCU / 1000)

By Size Total Size / 10 GB

Total Partitions CEILING(MAX (Capacity, Size))


Partitioning Example

Table size = 8 GB, RCUs = 5000, WCUs = 500


Number of Partitions
By Capacity (5000 / 3000) + (500 / 1000) = 2.17

By Size 8 / 10 = 0.8

Total Partitions CEILING(MAX (2.17, 0.8)) = 3

RCUs per partition = 5000 / 3 = 1666.67
WCUs per partition = 500 / 3 = 166.67
Data per partition = 8 / 3 = 2.67 GB

RCUs and WCUs are uniformly spread across partitions
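The same formula as a small Python sketch, reproducing the example above:

import math

def total_partitions(rcu, wcu, size_gb):
    # Partitions needed by capacity and by size; take the larger
    by_capacity = rcu / 3000.0 + wcu / 1000.0
    by_size = size_gb / 10.0
    return math.ceil(max(by_capacity, by_size))

print(total_partitions(5000, 500, 8))   # CEILING(MAX(2.17, 0.8)) = 3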
Burst capacity is built-in

"Save up" unused capacity, then consume the saved-up capacity in bursts
of up to 300 seconds (e.g. 1200 provisioned CUs × 300 s = 360k CUs)

(chart: provisioned vs. consumed capacity units over time)
Burst capacity may not be sufficient

(chart: provisioned vs. consumed vs. attempted capacity units over time;
once the 300-second burst (1200 × 300 = 360k CUs) is exhausted, requests
are throttled)

Don't completely depend on burst capacity… provision sufficient throughput


What causes throttling?

Non-uniform workloads
• Hot keys / hot partitions
• Very large bursts

Dilution of throughput across partitions caused by mixing hot data with
cold data
• Use a table per time period for storing time series data so WCUs and
RCUs are applied to the hot data set
Example: Key Choice or Uniform Access

(heat map: partition key-space vs. time, showing heat per partition)
Getting the most out of DynamoDB throughput

"To get the most out of DynamoDB throughput, create tables where the
partition key has a large number of distinct values, and values are
requested fairly uniformly, as randomly as possible."
—DynamoDB Developer Guide

1. Key Choice: high key cardinality ("uniqueness")
2. Uniform Access: access is evenly spread over the key-space
3. Time: requests arrive evenly spaced in time
Example: Time-based Access (heat map)

Example: Uniform Access (heat map)
Data Modeling

Store data based on how you will access it!


1:1 relationships or key-values

Use a table or GSI with a partition key


Use GetItem or BatchGetItem API

Example: Given a user or email, get attributes


Users Table
Partition key Attributes
UserId = bob Email = bob@gmail.com, JoinDate = 2011-11-15
UserId = fred Email = fred@yahoo.com, JoinDate = 2011-12-01

Users-Email-GSI
Partition key Attributes
Email = bob@gmail.com UserId = bob, JoinDate = 2011-11-15
Email = fred@yahoo.com UserId = fred, JoinDate = 2011-12-01
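Not from the deck: a boto3 sketch of both lookups against the example
tables above. Note that GetItem works only on the base table; the email
lookup goes through a Query on the GSI partition key.

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('Users')

# Given a user, get attributes: key-value GetItem on the base table
user = table.get_item(Key={'UserId': 'bob'}).get('Item')

# Given an email, get attributes: Query the GSI on its partition key
resp = table.query(
    IndexName='Users-Email-GSI',
    KeyConditionExpression=Key('Email').eq('bob@gmail.com'),
)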
1:N relationships or parent-children

Use a table or GSI with partition and sort key


Use Query API

Example: Given a device, find all readings between epoch X, Y

Device-measurements
Part. Key Sort key Attributes
DeviceId = 1 epoch = 5513A97C Temperature = 30, pressure = 90
DeviceId = 1 epoch = 5513A9DB Temperature = 30, pressure = 90
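A minimal boto3 sketch of this query, using the example table and epoch
values above:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('Device-measurements')

# Given a device, find all readings between epoch X and Y:
# equality on the partition key plus a range on the sort key
resp = table.query(
    KeyConditionExpression=Key('DeviceId').eq(1) &
                           Key('epoch').between('5513A97C', '5513A9DB'),
)
readings = resp['Items']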
N:M relationships

Use a table and GSI with partition and sort key elements switched
Use Query API
Example: Given a user, find all games. Or given a game, find all users.

User-Games-Table Game-Users-GSI
Part. Key Sort key Part. Key Sort key
UserId = bob GameId = Game1 GameId = Game1 UserId = bob
UserId = fred GameId = Game2 GameId = Game2 UserId = fred
UserId = bob GameId = Game3 GameId = Game3 UserId = bob
Documents (JSON)

New data types (M, L, BOOL, NULL) introduced to support JSON

Document SDKs
• Simple programming model
• Conversion to/from JSON
• Java, JavaScript, Ruby, .NET

JavaScript-to-DynamoDB type mapping: string → S, number → N,
boolean → BOOL, null → NULL, array → L, object → M

Cannot create an index on elements of a JSON object stored in a Map
• They need to be modeled as top-level table attributes to be used in
LSIs and GSIs

Set, Map, and List have no element limit, but depth is 32 levels
Rich expressions

Projection expression
• Query/Get/Scan: ProductReviews.FiveStar[0]
Filter expression
• Query/Scan: #V > :num (#V is a placeholder for the reserved word VIEWS)
Conditional expression
• Put/Update/DeleteItem: attribute_not_exists (#pr.FiveStar)
Update expression
• UpdateItem: set Replies = Replies + :num
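Not from the deck: a boto3 sketch combining an update expression with a
condition expression; the 'ProductReviews' table and its 'ProductId' key
are hypothetical.

import boto3

table = boto3.resource('dynamodb').Table('ProductReviews')

# Increment a counter, but only if the item already has one; otherwise
# the write is rejected with a ConditionalCheckFailedException
table.update_item(
    Key={'ProductId': '123'},
    UpdateExpression='SET Replies = Replies + :num',
    ConditionExpression='attribute_exists(Replies)',
    ExpressionAttributeValues={':num': 1},
)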
DynamoDB Streams
DynamoDB Streams

Stream of updates to a table
Asynchronous
Exactly once
Strictly ordered
• Per item

Highly durable
• Scale with table
24-hour lifetime
Sub-second latency
DynamoDB Streams and AWS Lambda
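The deck shows this integration as a diagram; a minimal sketch of a
Lambda handler subscribed to a table's stream might look like the
following (field names follow the DynamoDB Streams record format;
NewImage/OldImage are present only for stream view types that include
them).

# Hypothetical Lambda handler for a DynamoDB Streams event source
def handler(event, context):
    for record in event['Records']:
        name = record['eventName']   # INSERT, MODIFY, or REMOVE
        if name in ('INSERT', 'MODIFY'):
            # New item state, in DynamoDB attribute-value format
            print(name, record['dynamodb'].get('NewImage'))
        else:
            print(name, record['dynamodb'].get('OldImage'))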
Scenarios and Best Practices
Event Logging
Storing time series data
Time series tables

Hot data
Current table: Events_table_2016_April
Event_id (partition key), Timestamp (sort key), Attribute1 ... AttributeN
RCUs = 10000, WCUs = 10000

Events_table_2016_March
RCUs = 1000, WCUs = 100

Cold data
Older tables:
Events_table_2016_February
RCUs = 100, WCUs = 1
Events_table_2016_January
RCUs = 10, WCUs = 1

Don't mix hot and cold data; archive cold data to Amazon S3
DynamoDB TTL

Hot data
Current table: Events_table_2016_April
Event_id (partition key), Timestamp (sort key), myTTL (e.g. 1489188093),
... AttributeN
RCUs = 10000, WCUs = 10000

Cold data
Events_Archive
Event_id (partition key), Timestamp (sort key), Attribute1 ... AttributeN
RCUs = 100, WCUs = 1
Use DynamoDB TTL and Streams to archive
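Not shown in the deck: enabling TTL on the current table via boto3. The
attribute name matches the 'myTTL' epoch-seconds attribute above.

import boto3

client = boto3.client('dynamodb')

# Items whose 'myTTL' timestamp has passed are deleted in the background;
# the deletes appear on the table's stream, where they can be archived
client.update_time_to_live(
    TableName='Events_table_2016_April',
    TimeToLiveSpecification={'Enabled': True, 'AttributeName': 'myTTL'},
)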


Isolate cold data from hot data

Pre-create daily, weekly, monthly tables


Provision required throughput for current table
Writes go to the current table
Turn off (or reduce) throughput for older tables
OR move items to separate table with TTL

Dealing with time series data


Real-Time Voting

Write-heavy items

Scaling bottleneck: all voters update one of just two items

(diagram: the Votes Table is provisioned with 200,000 WCUs spread across
Partitions 1..N at 1000 WCUs each, but Candidate A and Candidate B each
live on a single partition)
Write sharding

Each voter writes to a random shard of their candidate's item:

UpdateItem: "CandidateA_" + rand(0, 10)
ADD 1 to Votes

(diagram: the Votes Table now holds sharded items Candidate A_1 ...
Candidate A_8 and Candidate B_1 ... Candidate B_8 spread across
partitions; a boto3 sketch follows)
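A boto3 sketch of the sharded write, assuming a hypothetical 'Votes'
table with partition key 'CandidateId':

import random
import boto3

table = boto3.resource('dynamodb').Table('Votes')

def record_vote(candidate):
    # Spread writes for one hot logical key across 10 sharded item keys
    shard_key = '%s_%d' % (candidate, random.randint(0, 9))
    table.update_item(
        Key={'CandidateId': shard_key},
        UpdateExpression='ADD Votes :one',
        ExpressionAttributeValues={':one': 1},
    )

record_vote('CandidateA')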
Shard aggregation

A periodic process (1) sums the shards (Candidate A_1 ... Candidate A_8)
and (2) stores the total (e.g. Candidate A, Total: 2.5M) as a single item
for voters to read
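And the matching aggregation sketch, under the same assumptions as the
write-sharding example:

import boto3

table = boto3.resource('dynamodb').Table('Votes')

def total_votes(candidate, num_shards=10):
    # Periodic process, step 1: read every shard and sum the counters
    total = 0
    for shard in range(num_shards):
        key = '%s_%d' % (candidate, shard)
        resp = table.get_item(Key={'CandidateId': key})
        total += resp.get('Item', {}).get('Votes', 0)
    return total

# Step 2: store the total as a single item for cheap reads
table.put_item(Item={'CandidateId': 'CandidateA_total',
                     'Votes': total_votes('CandidateA')})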
Shard write-heavy partition keys
Trade off read cost for write scalability
Consider throughput per partition key and per partition

Your write workload is not horizontally scalable
Product Catalog

Popular items (read)

Scaling bottleneck:

SELECT Id, Description, ...
FROM ProductCatalog
WHERE Id="POPULAR_PRODUCT"

(diagram: 100,000 reads/sec from shoppers against the ProductCatalog
Table's 50 partitions, 2000 RCUs each; spread evenly that would be
≈ 2000 reads/sec per partition, but popular Products A and B concentrate
the traffic on their own single partitions)
Request Distribution Per Partition Key

(chart: requests per second by item primary key; a few hot keys dominate
the DynamoDB request distribution)

DynamoDB Requests

SELECT Id, Description, ...
FROM ProductCatalog
WHERE Id="POPULAR_PRODUCT"

(diagram: users hit DynamoDB directly; Partitions 1 and 2 of the
ProductCatalog Table absorb all reads)

Request Distribution Per Partition Key

(chart: requests per second by item primary key; with a cache in front,
cache hits absorb the hot keys and DynamoDB requests flatten out)
Amazon DynamoDB Accelerator (DAX)

Extreme Performance | Highly Scalable | Fully Managed
Ease of Use | Flexible | Secure


Multiplayer Online Gaming
Query filters vs.
composite key indexes
Multivalue sorts and filters

Secondary index: partition key = Opponent, sort key = Date (querying
Bob's games)

Opponent Date GameId Status Host


Alice 2016-10-02 d9bl3 DONE David
Carol 2016-10-08 o2pnb IN_PROGRESS Bob
Bob 2016-09-30 72f49 PENDING Alice
Bob 2016-10-03 b932s PENDING Carol
Bob 2016-10-03 ef9ca IN_PROGRESS David
Approach 1: Query filter
SELECT * FROM Game
WHERE Opponent='Bob'
ORDER BY Date DESC
FILTER ON Status='PENDING'
Secondary Index

Opponent Date GameId Status Host


Alice 2016-10-02 d9bl3 DONE David
Carol 2016-10-08 o2pnb IN_PROGRESS Bob
Bob 2016-09-30 72f49 PENDING Alice
Bob 2016-10-03 b932s PENDING Carol
Bob 2016-10-03 ef9ca IN_PROGRESS David (filtered out)
Use query filter
Send back less data “on the wire”
Simplify application code
Simple SQL-like expressions
• AND, OR, NOT, ()

Your index isn’t entirely selective


Approach 2: Composite key

Status Date StatusDate


DONE 2016-10-02 DONE_2016-10-02
IN_PROGRESS 2016-10-08 IN_PROGRESS_2016-10-08
IN_PROGRESS 2016-10-03 IN_PROGRESS_2016-10-03
PENDING 2016-09-30 PENDING_2016-09-30
PENDING 2016-10-03 PENDING_2016-10-03
Approach 2: Composite key

Secondary Index

Opponent StatusDate GameId Host


Alice DONE_2016-10-02 d9bl3 David
Carol IN_PROGRESS_2016-10-08 o2pnb Bob
Bob IN_PROGRESS_2016-10-03 ef9ca David
Bob PENDING_2016-09-30 72f49 Alice
Bob PENDING_2016-10-03 b932s Carol
Approach 2: Composite key
SELECT * FROM Game
WHERE Opponent='Bob'
AND StatusDate BEGINS_WITH 'PENDING'
Secondary Index

Opponent StatusDate GameId Host


Alice DONE_2016-10-02 d9bl3 David
Carol IN_PROGRESS_2016-10-08 o2pnb Bob
Bob IN_PROGRESS_2016-10-03 ef9ca David
Bob PENDING_2016-09-30 72f49 Alice
Bob PENDING_2016-10-03 b932s Carol
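Approach 2 as a boto3 query (index name again hypothetical). The
composite sort key makes the key condition itself selective, so no
filter expression is needed:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('Game')

resp = table.query(
    IndexName='Opponent-StatusDate-index',
    KeyConditionExpression=Key('Opponent').eq('Bob') &
                           Key('StatusDate').begins_with('PENDING'),
)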
Sparse indexes

Scan sparse partition GSIs

Game-scores-table (Id = partition key):

Id User Game Score Date Award
1 Bob G1 1300 2015-12-23 Champ
2 Bob G1 1450 2015-12-23
3 Jay G1 1600 2015-12-24
4 Mary G1 2000 2015-10-24 Champ
5 Ryan G2 123 2015-03-10
6 Jones G2 345 2015-03-20

Award-GSI (Award = partition key; only items with an Award appear):

Award Id User Score
Champ 1 Bob 1300
Champ 4 Mary 2000
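A boto3 sketch of the sparse-index scan; because only items carrying an
Award attribute exist in the GSI, the scan touches far fewer items than
scanning the whole table and filtering:

import boto3

table = boto3.resource('dynamodb').Table('Game-scores-table')

resp = table.scan(IndexName='Award-GSI')
champions = resp['Items']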
Replace filter with indexes

Concatenate attributes to form useful secondary index keys

Take advantage of sparse indexes

You want to optimize a query as much as possible
Scaling with
DynamoDB at Ocado
/OcadoTechnology
Who is Ocado
Ocado is the world’s largest dedicated online grocery
retailer.
Here in the UK we deliver to over 500,000 active
customers shopping with us.
Each day we pick and pack over 1,000,000 items for
25,000 customers

/OcadoTechnology
What does Ocado Technology Do?
We create the websites and webapps for Ocado,
Morrisons, Fetch, Sizzled, Fabled and more
Design and build robotics and software for our automated warehouses.
Optimise routes for thousands of miles of deliveries
each week

/OcadoTechnology
Ocado Smart Platform
We’ve also developed a proprietary solution for
operating an online retail business.
This has been built with the Cloud and AWS in
mind.
Allows us to offer our solutions to international retailers and scale to
serve customers around the globe.

/OcadoTechnology
Our Challenges
Starting from scratch, how do we design a modern
application suite architecture?
How do we minimise bottlenecks and allow
ourselves to scale?
How can we support our customers (retailers) and
their customers without massively increasing our
size?

/OcadoTechnology
Application Architecture
How about them microservices?
Applications as masters of their own state
In the past we had ETLs moving data around
We wanted to control how data moved through the
platform
For us this means RESTful services or asynchronous messaging

/OcadoTechnology
Name-spaced Microservices
Using IAM policies each application can have it’s
own name-spaced AWS resources
Applications need to stateless themselves, all state
must be stored in the database
Decouple how data is stored from how it’s
accessed.

/OcadoTechnology
Opting for DynamoDB
Every microservice needs it’s own database.
Easy to manage by developers themselves
Need to be highly available and as fast as needed

/OcadoTechnology
Things we wanted to avoid
Shared resources or infrastructure between apps
Manual intervention in order to scale
Downtime

/OcadoTechnology
Recommendations
Allow applications to create their own DynamoDB
tables.
Allow them to tune their own capacity too
Use namespacing to control access
arn:aws:dynamodb:region:account-id:table/mymicroservice-*

/OcadoTechnology
Scaling Up
While managing costs
Managed Service vs DIY
We’re a team of 5 engineers
Supporting over 1,000 developers
Working on over 200 applications

/OcadoTechnology
Scaling With DynamoDB
Use account-level throughput limits as a cost safety control, not as
planned usage.
Encourage developers to scale based on a metric
using the SDK.
Make sure scale down is happening too.

/OcadoTechnology
DynamoDB Autoscaling NEW!
Recently released is the DynamoDB Autoscaling
feature.
It allows you to set min and max throughput and a
target utilisation

/OcadoTechnology
Some Stats - Jan 2017
Total Tables: 2,045
Total Storage: 283.42 GiB
Number of Items: 1,000,374,288
Total Provisioned Throughput (Reads): 49,850
Total Provisioned Throughput (Writes): 20,210

/OcadoTechnology
Some Stats - Jun 2017
Total Tables: 3,073
Total Storage: 1.78 TiB
Number of Items: 2,816,558,766
Total Provisioned Throughput (Reads): 63,421
Total Provisioned Throughput (Writes): 32,569

/OcadoTechnology
Looking to the Future
What additional features do we want?
Backup
Despite DynamoDB being a managed service we
still have a desire to backup our data
Highly available services don’t protect you from
user error
Having a second copy of live databases makes
everyone feel confident

/OcadoTechnology
Introducing Hippolyte
Hippolyte is an at-scale DynamoDB backup
solution
Designed for point-in-time backups of hundreds of
tables or more
Scales capacity up so backups complete as fast as needed.

/OcadoTechnology
Available Now
Github: https://github.com/ocadotechnology/hippolyte

Blog Post: http://hippolyte.oca.do

/OcadoTechnology
Thank you
Recent Announcements
Amazon DynamoDB now Supports Cost Allocation Tags

AWS DMS Adds Support for MongoDB and Amazon DynamoDB

Announcing VPC Endpoints for Amazon DynamoDB (Public Preview)

Announcing Amazon DynamoDB Auto Scaling


Amazon DynamoDB AutoScaling
Automate capacity management for tables and GSIs

Enabled by default for all new tables and indexes

Can manually configure for existing tables and indexes

Requires DynamoDBAutoScale Role

Full CLI / API Support

No additional cost beyond existing DynamoDB and CloudWatch alarms
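Not in the deck: a sketch of configuring auto scaling through the
Application Auto Scaling API, which is what the console drives under the
hood. Table name and role ARN are hypothetical.

import boto3

autoscaling = boto3.client('application-autoscaling')

# Register the table's read capacity as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace='dynamodb',
    ResourceId='table/MyTable',
    ScalableDimension='dynamodb:table:ReadCapacityUnits',
    MinCapacity=5,
    MaxCapacity=1000,
    RoleARN='arn:aws:iam::123456789012:role/DynamoDBAutoscaleRole',
)

# Track 70% consumed-to-provisioned read utilisation
autoscaling.put_scaling_policy(
    PolicyName='ReadScalingPolicy',
    ServiceNamespace='dynamodb',
    ResourceId='table/MyTable',
    ScalableDimension='dynamodb:table:ReadCapacityUnits',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 70.0,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'DynamoDBReadCapacityUtilization',
        },
    },
)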


Amazon DynamoDB Resources

Amazon DynamoDB FAQ: https://aws.amazon.com/dynamodb/faqs/

Amazon DynamoDB Documentation (includes the Developer Guide):

https://aws.amazon.com/documentation/dynamodb/

AWS Blog: https://aws.amazon.com/blogs/aws/


Thank you!
