Amazon DynamoDB for Developers (김일호) - AWS DB Day
DynamoDB for developers
김일호, Solutions Architect
Time : 15:10 – 16:30
Agenda
§Tip 1. DynamoDB Indexes (LSI, GSI)
§Tip 2. DynamoDB Scaling
§Tip 3. DynamoDB Data Modeling
§Scenario based Best Practice
§DynamoDB Streams
§Nexon use-case
Tip 1. DynamoDB Indexes (LSI, GSI)
Tip 2. DynamoDB Scaling
Tip 3. DynamoDB Data Modeling
Scenario based Best Practice
DynamoDB Streams
Nexon use-case
Local secondary index (LSI)
§Alternate range key attribute
§Index is local to a hash key (or partition)

Table: A1 (hash), A2 (range), attributes A3, A4, A5

LSIs share the table's hash key and re-sort each partition by an alternate range key:
§KEYS_ONLY: A1 (hash), A3 (range), A2 (table key)
§INCLUDE A3: A1 (hash), A4 (range), A2 (table key), A3 (projected)
§ALL: A1 (hash), A5 (range), A2 (table key), A3 (projected), A4 (projected)

10 GB max per hash key, i.e. LSIs limit the # of range keys!
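The LSI layout above can be expressed as CreateTable parameters. The sketch below follows the DynamoDB CreateTable API shape and shows the KEYS_ONLY variant; the table and index names (ExampleTable, A3-LSI) are illustrative, not from the slide.

```python
# CreateTable parameters for the table above (hash key A1, range key A2)
# with one LSI re-sorting each partition by A3. The resulting dict can be
# passed to boto3's create_table; here we only construct it.
create_table_params = {
    "TableName": "ExampleTable",                      # illustrative name
    "AttributeDefinitions": [
        {"AttributeName": "A1", "AttributeType": "S"},
        {"AttributeName": "A2", "AttributeType": "S"},
        {"AttributeName": "A3", "AttributeType": "S"},  # LSI range key
    ],
    "KeySchema": [
        {"AttributeName": "A1", "KeyType": "HASH"},
        {"AttributeName": "A2", "KeyType": "RANGE"},
    ],
    "LocalSecondaryIndexes": [
        {
            "IndexName": "A3-LSI",                     # illustrative name
            "KeySchema": [
                {"AttributeName": "A1", "KeyType": "HASH"},   # same hash key as table
                {"AttributeName": "A3", "KeyType": "RANGE"},  # alternate range key
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}
```

Note the LSI declares the same HASH attribute as the table; only the RANGE attribute differs.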
Global secondary index (GSI)
§Alternate hash (+range) key
§Index is across all table hash keys (partitions)

Table: A1 (hash), attributes A2, A3, A4, A5

GSIs on the same table:
§INCLUDE A3: A5 (hash), A4 (range), A1 (table key), A3 (projected)
§ALL: A4 (hash), A5 (range), A1 (table key), A2 (projected), A3 (projected)
§KEYS_ONLY: A2 (hash), A1 (table key)

RCUs/WCUs provisioned separately for GSIs
Online indexing
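Unlike an LSI, a GSI declares its own hash key and its own provisioned throughput, separate from the table's. A sketch of the GlobalSecondaryIndexes entry for the ALL-projection GSI above (hash A4, range A5); the index name and capacity numbers are illustrative.

```python
# GSI definition following the DynamoDB CreateTable/UpdateTable API shape.
# The GSI re-keys the data (A4/A5) and carries its own RCUs/WCUs.
gsi_definition = {
    "IndexName": "A4-A5-GSI",                          # illustrative name
    "KeySchema": [
        {"AttributeName": "A4", "KeyType": "HASH"},    # alternate hash key
        {"AttributeName": "A5", "KeyType": "RANGE"},
    ],
    "Projection": {"ProjectionType": "ALL"},
    "ProvisionedThroughput": {                         # separate from the table's
        "ReadCapacityUnits": 100,
        "WriteCapacityUnits": 50,
    },
}
```

Sizing these WCUs matters: as the next slide notes, an under-provisioned GSI throttles writes to the base table.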
How do GSI updates work?
§Client writes to the primary table (spread across its partitions)
§The global secondary index is updated asynchronously (step 2 in the diagram: update in progress)
If GSIs don't have enough write capacity, table writes will be throttled!
LSI or GSI?
§LSI can be modeled as a GSI
§If data size in an item collection > 10 GB, use GSI
§If eventual consistency is okay for your scenario, use GSI!
Tip 1. DynamoDB Indexes (LSI, GSI)
Tip 2. DynamoDB Scaling
Tip 3. DynamoDB Data Modeling
Scenario based Best Practice
DynamoDB Streams
Nexon use-case
Scaling
§Throughput
§ Provision any amount of throughput to a table
§Size
§ Add any number of items to a table
§ Max item size is 400 KB
§ LSIs limit the number of range keys due to 10 GB limit
§Scaling is achieved through partitioning
Throughput
§Provisioned at the table level
§ Write capacity units (WCUs) are measured in 1 KB writes per second
§ Read capacity units (RCUs) are measured in 4 KB reads per second
§ RCUs measure strongly consistent reads
§ Eventually consistent reads cost 1/2 of strongly consistent reads
§Read and write throughput limits are independent
Partitioning math
By capacity: (Total RCU / 3000) + (Total WCU / 1000)
By size: Total Size / 10 GB
Total partitions: CEILING(MAX(Capacity, Size))
Partitioning example
Table size = 8 GB, RCUs = 5000, WCUs = 500
By capacity: (5000 / 3000) + (500 / 1000) = 2.17
By size: 8 / 10 = 0.8
Total partitions: CEILING(MAX(2.17, 0.8)) = 3
RCUs per partition = 5000/3 = 1666.67
WCUs per partition = 500/3 = 166.67
Data per partition = 8/3 = 2.67 GB
RCUs and WCUs are uniformly spread across partitions
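The partitioning formulas and the worked example can be reproduced in a few lines; this is a sketch of the historical sizing rules quoted on the slide, not a guaranteed internal behavior of the service.

```python
import math

def partition_count(total_rcu, total_wcu, size_gb):
    """Estimated partition count from the slide's formulas:
    by capacity (RCU/3000 + WCU/1000), by size (GB/10), ceiling of the max."""
    by_capacity = total_rcu / 3000 + total_wcu / 1000
    by_size = size_gb / 10
    return math.ceil(max(by_capacity, by_size))

# The worked example: 8 GB, 5000 RCUs, 500 WCUs -> 3 partitions
partitions = partition_count(5000, 500, 8)
per_partition_rcu = 5000 / partitions   # capacity is spread evenly: ~1666.67
per_partition_wcu = 500 / partitions    # ~166.67
```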
Allocation of partitions
§A partition split occurs when
§ Provisioned throughput settings are increased
§ Storage requirements increase
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html
Example: hot keys (heat map of partition activity over time)
Example: periodic spike (heat map of partition activity over time)
Getting the most out of DynamoDB throughput
“To get the most out of DynamoDB throughput, create tables where the hash key element has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.”
—DynamoDB Developer Guide
§Space: access is evenly spread over the key-space
§Time: requests arrive evenly spaced in time
What causes throttling?
§ If sustained throughput goes beyond provisioned throughput per partition
§ Non-uniform workloads
§ Hot keys/hot partitions
§ Very large bursts
§ Mixing hot data with cold data
§ Use a table per time period
§ From the example before:
§ Table created with 5000 RCUs, 500 WCUs
§ RCUs per partition = 1666.67
§ WCUs per partition = 166.67
§ If sustained throughput > (1666 RCUs or 166 WCUs) per key or partition, DynamoDB may throttle requests
§ Solution: Increase provisioned throughput
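The per-partition throttling condition above is easy to check numerically; a small sketch under the slide's assumption that capacity is split evenly across partitions:

```python
def may_throttle(request_rate, total_capacity, partitions):
    """A partition gets an even 1/N share of provisioned capacity; sustained
    traffic above that share on one key/partition risks throttling."""
    per_partition = total_capacity / partitions
    return request_rate > per_partition

# From the example: 5000 RCUs over 3 partitions -> ~1666 RCUs per partition.
hot_key = may_throttle(2000, 5000, 3)   # one key drawing 2000 reads/s: throttled
ok_key = may_throttle(1000, 5000, 3)    # 1000 reads/s fits within the share
```

This also shows why raising total provisioned throughput helps: it raises every partition's share, not just the hot one's.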
Tip 1. DynamoDB Indexes (LSI, GSI)
Tip 2. DynamoDB Scaling
Tip 3. DynamoDB Data Modeling
Scenario based Best Practice
DynamoDB Streams
Nexon use-case
1:1 relationships or key-values
§Use a table or GSI with a hash key
§Use GetItem or BatchGetItem API
§Example: Given an SSN or license number, get attributes
Users	Table
Hash	key Attributes
SSN	=	123-45-6789 Email	=	johndoe@nowhere.com,	License =	TDL25478134
SSN	=	987-65-4321 Email	=	maryfowler@somewhere.com,	License =	TDL78309234
Users-Email-GSI
Hash	key Attributes
License =	TDL78309234 Email	=	maryfowler@somewhere.com,	SSN	=	987-65-4321
License =	TDL25478134 Email	=	johndoe@nowhere.com,	SSN	=	123-45-6789
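A GetItem call for the 1:1 pattern fetches one item by its hash key. The sketch below builds the request parameters in the DynamoDB API shape (the helper name is illustrative; the table and key come from the slide):

```python
def get_item_request(table_name, hash_key_name, hash_key_value):
    """Build GetItem parameters for a single-item lookup by hash key."""
    return {
        "TableName": table_name,
        "Key": {hash_key_name: {"S": hash_key_value}},
    }

# Look up a user by SSN in the Users table from the slide.
req = get_item_request("Users", "SSN", "123-45-6789")
```

The same shape with the GSI's key (License) plus an `IndexName` covers the reverse lookup via Query.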
1:N relationships or parent-children
§Use a table or GSI with hash and range key
§Use Query API
Example:
§ Given a device, find all readings between epoch X, Y
Device-measurements
Hash	Key Range	key Attributes
DeviceId	=	1 epoch	=	5513A97C Temperature	=	30,	pressure	=	90
DeviceId	=	1 epoch	=	5513A9DB Temperature	=	30,	pressure	=	90
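The 1:N query ("all readings between epoch X and Y") maps to a Query with a BETWEEN key condition on the range key. A sketch of the request parameters (placeholder names `:d`, `:x`, `:y` are illustrative):

```python
def readings_between(device_id, epoch_x, epoch_y):
    """Query parameters: all readings for one device between two epochs."""
    return {
        "TableName": "Device-measurements",
        "KeyConditionExpression": "DeviceId = :d AND epoch BETWEEN :x AND :y",
        "ExpressionAttributeValues": {
            ":d": {"N": str(device_id)},
            ":x": {"S": epoch_x},
            ":y": {"S": epoch_y},
        },
    }

q = readings_between(1, "5513A97C", "5513A9DB")
```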
N:M relationships
§Use a table and GSI with hash and range key elements switched
§Use Query API
Example: Given a user, find all games. Or given a game, find all users.
User-Games-Table
Hash	Key Range	key
UserId	=	bob GameId	=	Game1
UserId	=	fred GameId	=	Game2
UserId	=	bob GameId	=	Game3
Game-Users-GSI
Hash	Key Range	key
GameId	=	Game1 UserId	=	bob
GameId	=	Game2 UserId	=	fred
GameId	=	Game3 UserId	=	bob
Documents (JSON)
§ New data types (M, L, BOOL, NULL) introduced to support JSON
§ Document SDKs
§ Simple programming model
§ Conversion to/from JSON
§ Java, JavaScript, Ruby, .NET
§ Cannot index (S, N) elements of a JSON object stored in M
§ Only top-level table attributes can be used in LSIs and GSIs without Streams/Lambda

Type mapping (JavaScript to DynamoDB):
string: S, number: N, boolean: BOOL, null: NULL, array: L, object: M
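The conversion a document SDK performs can be sketched as a recursive mapping onto the wire types above. This is illustrative only; real SDKs (e.g. boto3's TypeSerializer) handle more cases such as sets and binary data.

```python
def to_attr(value):
    """Map a native value to DynamoDB's attribute-value format
    (S, N, BOOL, NULL, L, M), mirroring the type table above."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):          # must check bool before int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}         # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attr(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attr(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value)}")

doc = to_attr({"stars": 5, "tags": ["audio", "mp3"], "public": True})
```

Nested values end up inside an M, which is why (per the bullet above) they cannot be indexed directly.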
Rich expressions
§Projection expression to get just some of the attributes
§ Query/Get/Scan: ProductReviews.FiveStar[0]
ProductReviews: {
    FiveStar: [
        "Excellent! Can't recommend it highly enough! Buy it!",
        "Do yourself a favor and buy this."
    ],
    OneStar: [
        "Terrible product! Do not buy this."
    ]
}
Rich expressions
§Filter expression
§ Query/Scan: #VIEWS > :num
§Update expression
§ UpdateItem: set Replies = Replies + :num
Rich expressions
§ Conditional expression
§ Put/Update/DeleteItem
§ attribute_not_exists (#pr.FiveStar)
§ attribute_exists(Pictures.RearView)
1. DynamoDB first looks for an item whose primary key matches that of the item to be written.
2. Only if the search returns nothing is the partition key absent, so attribute_not_exists succeeds.
3. Otherwise, the attribute_not_exists condition fails and the write is prevented.
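The put-if-absent idiom described above can be sketched as PutItem parameters (the helper name and example item are illustrative):

```python
def put_if_absent(table_name, item, hash_key_name):
    """PutItem parameters that refuse to overwrite an existing item:
    the condition fails whenever an item with this primary key exists."""
    return {
        "TableName": table_name,
        "Item": item,
        "ConditionExpression": f"attribute_not_exists({hash_key_name})",
    }

req = put_if_absent("Users", {"SSN": {"S": "123-45-6789"}}, "SSN")
```

A failed condition surfaces as a ConditionalCheckFailedException rather than a silent overwrite.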
Tip 1. DynamoDB Indexes (LSI, GSI)
Tip 2. DynamoDB Scaling
Tip 3. DynamoDB Data Modeling
Scenario based Best Practice
DynamoDB Streams
Nexon use-case
Game logging
Storing time series data
Time series tables

Each monthly table has the same schema: Event_id (hash key), Timestamp (range key), Attribute1 … Attribute N

Current table (hot data):
§Events_table_2015_April: RCUs = 10000, WCUs = 10000

Older tables (cold data):
§Events_table_2015_March: RCUs = 1000, WCUs = 100
§Events_table_2015_February: RCUs = 100, WCUs = 1
§Events_table_2015_January: RCUs = 10, WCUs = 1

Don't mix hot and cold data; archive cold data to Amazon S3
Use a table per time period
§Pre-create daily, weekly, monthly tables
§Provision required throughput for current table
§Writes go to the current table
§Turn off (or reduce) throughput for older tables
Important when: dealing with time series data
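Routing writes to the current period's table reduces to a naming function; a sketch following the monthly naming scheme above:

```python
from datetime import date

MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]

def events_table_for(day):
    """Name of the monthly table that should receive writes for `day`,
    matching the Events_table_<year>_<Month> scheme above."""
    return f"Events_table_{day.year}_{MONTH_NAMES[day.month - 1]}"

current = events_table_for(date(2015, 4, 15))
```

The same function, evaluated ahead of time, tells you which tables to pre-create and which older ones to dial down.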
Item shop catalog
Popular items (read)

Scaling bottlenecks
§ItemShopCatalog Table: Partition 1 … Partition 50, 2000 RCUs each
§Gamers run: SELECT Id, Description, ... FROM ItemShopCatalog
§Products A and B are hot: the request distribution per hash key is heavily skewed (chart: requests per second by item primary key)

Cache popular items
§User -> Cache -> DynamoDB
§With a cache in front of the ItemShopCatalog table, reads of popular items become cache hits instead of DynamoDB requests (chart: DynamoDB requests vs. cache hits by item primary key)
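The cache-in-front pattern is a standard cache-aside loop: check the cache, fall back to the table, populate the cache. A minimal sketch, where `fetch_from_table` stands in for a DynamoDB GetItem call:

```python
def make_cached_reader(fetch_from_table):
    """Wrap a table-read function with a simple in-memory cache-aside layer."""
    cache = {}

    def read(item_id):
        if item_id in cache:
            return cache[item_id], "cache hit"
        item = fetch_from_table(item_id)    # would be a DynamoDB read
        cache[item_id] = item
        return item, "dynamodb read"

    return read

# Illustrative fetcher; a real one would call GetItem on ItemShopCatalog.
read = make_cached_reader(lambda item_id: {"Id": item_id, "Description": "sword"})
first = read("ProductA")    # misses the cache, hits DynamoDB
second = read("ProductA")   # served from the cache
```

In production the in-memory dict would be ElastiCache or similar, with a TTL so catalog updates propagate.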
Multiplayer online gaming
Query filters vs.
composite key indexes
GameId Date Host Opponent Status
d9bl3 2014-10-02 David Alice DONE
72f49 2014-09-30 Alice Bob PENDING
o2pnb 2014-10-08 Bob Carol IN_PROGRESS
b932s 2014-10-03 Carol Bob PENDING
ef9ca 2014-10-03 David Bob IN_PROGRESS
Games Table (hash key: GameId)
Multiplayer online game data
Query for incoming game requests
§DynamoDB indexes provide hash and range
§What about queries for two equalities and a range?

SELECT * FROM Game
WHERE Opponent='Bob'      (hash)
AND Status='PENDING'      (?)
ORDER BY Date DESC        (range)
Secondary Index (hash key: Opponent, range key: Date)
Opponent Date GameId Status Host
Alice 2014-10-02 d9bl3 DONE David
Carol 2014-10-08 o2pnb IN_PROGRESS Bob
Bob 2014-09-30 72f49 PENDING Alice
Bob 2014-10-03 b932s PENDING Carol
Bob 2014-10-03 ef9ca IN_PROGRESS David

Approach 1: Query filter
SELECT * FROM Game
WHERE Opponent='Bob'
ORDER BY Date DESC
FILTER ON Status='PENDING'
The query reads all of Bob's games; the IN_PROGRESS rows are filtered out after the read.
Needle in a haystack

Use query filter
§Send back less data “on the wire”
§Simplify application code
§Simple SQL-like expressions
§ AND, OR, NOT, ()
Important when: your index isn’t entirely selective
Approach 2: composite key
Status + Date = StatusDate
DONE + 2014-10-02 = DONE_2014-10-02
IN_PROGRESS + 2014-10-08 = IN_PROGRESS_2014-10-08
IN_PROGRESS + 2014-10-03 = IN_PROGRESS_2014-10-03
PENDING + 2014-09-30 = PENDING_2014-09-30
PENDING + 2014-10-03 = PENDING_2014-10-03
Secondary Index (hash key: Opponent, range key: StatusDate)
Opponent StatusDate GameId Host
Alice DONE_2014-10-02 d9bl3 David
Carol IN_PROGRESS_2014-10-08 o2pnb Bob
Bob IN_PROGRESS_2014-10-03 ef9ca David
Bob PENDING_2014-09-30 72f49 Alice
Bob PENDING_2014-10-03 b932s Carol

Approach 2: composite key
SELECT * FROM Game
WHERE Opponent='Bob'
AND StatusDate BEGINS_WITH 'PENDING'

Needle in a sorted haystack
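The composite-key approach can be sketched as a key builder plus a begins_with Query. Parameter shapes follow the DynamoDB Query API; the index name and the `:o`/`:s` placeholders are illustrative.

```python
def status_date(status, date_str):
    """Build the StatusDate composite range key, as in the table above."""
    return f"{status}_{date_str}"

def pending_games_query(opponent):
    """Query parameters for 'incoming (PENDING) games against `opponent`',
    using begins_with on the composite range key instead of a filter."""
    return {
        "TableName": "Games",
        "IndexName": "OpponentStatusDate-GSI",   # illustrative index name
        "KeyConditionExpression":
            "Opponent = :o AND begins_with(StatusDate, :s)",
        "ExpressionAttributeValues": {
            ":o": {"S": opponent},
            ":s": {"S": "PENDING"},
        },
    }

key = status_date("PENDING", "2014-10-03")
q = pending_games_query("Bob")
```

Because the key condition is evaluated before the read, only Bob's PENDING rows are read and billed, unlike the query-filter approach.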
Sparse indexes
Id	
(Hash)
User Game Score Date Award
1 Bob G1 1300 2012-12-23
2 Bob G1 1450 2012-12-23
3 Jay G1 1600 2012-12-24
4 Mary G1 2000 2012-10-24 Champ
5 Ryan G2 123 2012-03-10
6 Jones G2 345 2012-03-20
Game-scores-table
Award	
(Hash)
Id User Score
Champ 4 Mary 2000
Award-GSI
Scan	sparse	hash	GSIs
Replace filter with indexes
§Concatenate attributes to form useful secondary index keys (e.g., Status + Date)
§Take advantage of sparse indexes
Important when: you want to optimize a query as much as possible
Big data analytics
with DynamoDB
Transactional Data Processing
§DynamoDB is well-suited for transactional processing:
• High concurrency
• Strong consistency
• Atomic updates of single items
• Conditional updates for de-dupe and optimistic concurrency
• Supports both key/value and JSON document schema
• Capable of handling large table sizes with low latency data access
Case 1: Store and Index Metadata for Objects Stored in Amazon S3
Case 1: Use Case
§We have a large number of digital audio files stored in
Amazon S3 and we want to make them searchable:
§Use DynamoDB as the primary data store for the
metadata.
§Index and query the metadata using Elasticsearch.
Case 1: Steps to Implement
1. Create a Lambda function that reads the metadata
from the ID3 tag and inserts it into a DynamoDB
table.
2. Enable S3 notifications on the S3 bucket storing the
audio files.
3. Enable streams on the DynamoDB table.
4. Create a second Lambda function that takes the
metadata in DynamoDB and indexes it using
Elasticsearch.
5. Enable the stream as the event source for the
Lambda function.
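The second Lambda function from the steps above can be sketched as a handler that pulls new images out of DynamoDB stream records and hands them to an indexing function. The event shape follows DynamoDB Streams records; `index_document` stands in for an Elasticsearch client call.

```python
def handle_stream_event(event, index_document):
    """Index the NewImage of every inserted/modified item in a stream batch.
    Returns how many documents were indexed."""
    indexed = 0
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            new_image = record["dynamodb"]["NewImage"]
            index_document(new_image)   # would call Elasticsearch here
            indexed += 1
    return indexed

# Simulate one stream record; in Lambda, `event` arrives from the stream.
docs = []
event = {"Records": [{"eventName": "INSERT",
                      "dynamodb": {"NewImage": {"Title": {"S": "Song A"}}}}]}
count = handle_stream_event(event, docs.append)
```

This assumes the stream's view type includes new images (NEW_IMAGE or NEW_AND_OLD_IMAGES).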
Case 1: Key Takeaways
§DynamoDB + Elasticsearch = Durable, scalable, highly available database with rich query capabilities.
§Use Lambda functions to respond to events in both
DynamoDB streams and Amazon S3 without having to
manage any underlying compute infrastructure.
Case 2 – Execute Queries Against Multiple Data Sources Using
DynamoDB and Hive
Case 2: Use Case
We want to enrich our audio file metadata stored in
DynamoDB with additional data from the Million Song
dataset:
§Million song data set is stored in text files.
§ID3 tag metadata is stored in DynamoDB.
§Use Amazon EMR with Hive to join the two datasets
together in a query.
Case 2: Steps to Implement
1. Spin up an Amazon EMR cluster
with Hive.
2. Create an external Hive table
using the
DynamoDBStorageHandler.
3. Create an external Hive table
using the Amazon S3 location of
the text files containing the
Million Song project metadata.
4. Create and run a Hive query that
joins the two external tables
together and writes the joined
results out to Amazon S3.
5. Load the results from Amazon
S3 into DynamoDB.
Case 2: Key Takeaways
§Use Amazon EMR to quickly provision a Hadoop
cluster with Hive and to tear it down when done.
§Use of Hive with DynamoDB allows items in
DynamoDB tables to be queried/joined with data from a
variety of sources.
Case 3 – Store and Analyze Sensor Data with
DynamoDB and Amazon Redshift
Case 3: Use Case
A large number of sensors are taking readings at regular intervals.
You need to aggregate the data from each reading into a data
warehouse for analysis:
• Use Amazon Kinesis to ingest the raw sensor data.
• Store the sensor readings in DynamoDB for fast access and real-time dashboards.
• Store raw sensor readings in Amazon S3 for durability and
backup.
• Load the data from Amazon S3 into Amazon Redshift using AWS
Lambda.
Case 3: Steps to Implement
1. Create two Lambda functions to
read data from the Amazon
Kinesis stream.
2. Enable the Amazon Kinesis
stream as an event source for
each Lambda function.
3. Write data into DynamoDB in
one of the Lambda functions.
4. Write data into Amazon S3 in the
other Lambda function.
5. Use the aws-lambda-redshift-loader to load the data in Amazon S3 into Amazon Redshift in batches.
Case 3: Key Takeaways
§ Amazon Kinesis + Lambda + DynamoDB = Scalable, durable,
highly available solution for sensor data ingestion with very low
operational overhead.
§ DynamoDB is well-suited for near-realtime queries of recent
sensor data readings.
§ Amazon Redshift is well-suited for deeper analysis of sensor data
readings spanning longer time horizons and very large numbers of
records.
§ Using Lambda to load data into Amazon Redshift provides a way
to perform ETL in frequent intervals.
Tip 1. DynamoDB Indexes (LSI, GSI)
Tip 2. DynamoDB Scaling
Tip 3. DynamoDB Data Modeling
Scenario based Best Practice
DynamoDB Streams
Nexon use-case
DynamoDB Streams
§Stream of updates to a table
§Asynchronous
§Exactly once
§Strictly ordered
§ Per item
§Highly durable
§ Scale with table
§24-hour lifetime
§Sub-second latency
View types
Example: UpdateItem (Name = John, Destination = Pluto) on an item whose Destination was Mars
§Old image (before update): Name = John, Destination = Mars
§New image (after update): Name = John, Destination = Pluto
§Old and new images: Name = John, Destination = Mars and Name = John, Destination = Pluto
§Keys only: Name = John
DynamoDB Streams and Amazon Kinesis Client Library
§Table: Partition 1 … Partition 5
§Stream: Shard 1 … Shard 4, each consumed by a KCL worker
§Updates from the DynamoDB client application flow through the table into the stream, where the Amazon Kinesis Client Library application processes them
Cross-region replication
§DynamoDB Streams + the open source cross-region replication library
§Example: US East (N. Virginia) table replicated to Asia Pacific (Sydney) and EU (Ireland)
DynamoDB Streams and AWS Lambda
§Triggers: a Lambda function is invoked to notify change
§Feed derivative tables, Amazon CloudSearch, Amazon Elasticsearch Service, Amazon ElastiCache
Analytics with DynamoDB Streams
§Collect and de-dupe data in DynamoDB
§Aggregate data in-memory and flush periodically
§Perform real-time aggregation and analytics
§Downstream: Amazon EMR, Amazon Redshift, cross-region replication
Tip 1. DynamoDB Indexes (LSI, GSI)
Tip 2. DynamoDB Scaling
Tip 3. DynamoDB Data Modeling
Scenario based Best Practice
DynamoDB Streams
Nexon use-case
HIT chose DynamoDB
§HIT Main Game DB
• 33	Tables
• CloudWatch Alarms
• Flexible	Dashboards
Architecture
§The key point: all gaming users directly access DynamoDB
170,000 concurrent users
A main game DB (33 tables, NRT)
Game, PvP, Raid, etc.
Unreal Engine + AWS .NET SDK
Main DynamoDB tables
Mail, Jewel, Friend, Account, Character, Skills, Mission, Item
Your idea :)
Your application with DynamoDB
(diagram: your application surrounded by its DynamoDB tables)
Thank you!