Lambda & DynamoDB Best Practices
a talk by Yan Cui
“Best practice is usually just someone else’s opinion”
- random person on the internet
The “goodness” of a practice is tied to
the context in which it is applied
“Good ideas that work for most people, most of the time”
Yan Cui
http://theburningmonk.com
@theburningmonk
AWS user for 10 years
Developer Advocate @
Independent Consultant
advise
training delivery
running serverless in production since 2016
01. Observability from the start
Observability: a measure of how well the internal state of a system can be inferred from its external outputs
Everything fails, all the time
[Timeline: happened → system repaired; the gap in between is user impact]
Reduce MTTR: identify & resolve issues
MTTDiscovery
“What alerts should I have?”
It depends on what you’re building…
Lambda
error rate %
throttle count
DLQ error count
iterator age
regional concurrency
API Gateway
p90/95/99 latency
success rate %
4xx rate %
5xx rate %
SQS
message age
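As an illustration of the "error rate %" alert, here is a minimal sketch that builds the parameters for a CloudWatch `PutMetricAlarm` call using metric math over the standard `AWS/Lambda` `Errors` and `Invocations` metrics. The helper name `lambdaErrorRateAlarm`, the threshold and the evaluation windows are assumptions chosen for the example, not recommendations from the talk.

```javascript
// Sketch: parameters for a CloudWatch alarm on a Lambda function's error rate.
// The helper name, threshold and evaluation settings are illustrative assumptions.
function lambdaErrorRateAlarm(functionName, thresholdPercent = 1) {
  // Per-minute sum of a standard AWS/Lambda metric for one function.
  const metric = (metricName) => ({
    Metric: {
      Namespace: 'AWS/Lambda',
      MetricName: metricName,
      Dimensions: [{ Name: 'FunctionName', Value: functionName }],
    },
    Period: 60, // seconds
    Stat: 'Sum',
  });

  return {
    AlarmName: `${functionName}-error-rate`,
    ComparisonOperator: 'GreaterThanThreshold',
    Threshold: thresholdPercent,
    EvaluationPeriods: 5,
    DatapointsToAlarm: 3,
    TreatMissingData: 'notBreaching', // no invocations means no datapoints
    Metrics: [
      { Id: 'errors', MetricStat: metric('Errors'), ReturnData: false },
      { Id: 'invocations', MetricStat: metric('Invocations'), ReturnData: false },
      {
        Id: 'errorRate',
        Expression: '100 * errors / invocations',
        Label: 'error rate %',
        ReturnData: true, // the alarm evaluates this expression only
      },
    ],
  };
}
```

The returned object could be passed to the CloudWatch client's `putMetricAlarm` operation, or translated into the equivalent infrastructure-as-code resource.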
[Timeline: happened → system repaired; user impact]
finding root cause
Logs are over-rated
the needle is here
somewhere…
This is my approach nowadays
+ high-value structured logs + metrics + alerts
most of my troubleshooting
errors are captured and categorized
frequency and trends
did errors correlate to a deployment?
invocation event, env vars, logs, etc.
Lambda invocations + every IO-request
complex (non-IO) biz logic
logs and traces side-by-side
logs from all the functions
system metrics for AWS services
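To make "high-value structured logs" concrete, here is a minimal sketch of a logger that writes one JSON object per line, so log entries can be filtered by field rather than by text search. The function `createLogger` and the field names (`level`, `message`, `errorName`, and so on) are assumptions for illustration, not part of the talk.

```javascript
// Minimal structured logger: one JSON object per line.
// Field names and the createLogger helper are illustrative assumptions.
function createLogger(baseContext = {}, write = console.log) {
  const log = (level, message, fields = {}) => {
    write(JSON.stringify({
      level,
      message,
      timestamp: new Date().toISOString(),
      ...baseContext, // e.g. service name, function name, correlation ID
      ...fields,
    }));
  };

  return {
    info: (message, fields) => log('INFO', message, fields),
    error: (message, err, fields) =>
      log('ERROR', message, {
        ...fields,
        errorName: err.name,
        errorMessage: err.message,
      }),
  };
}

// Usage: attach shared context once, then log events with their own fields.
const logger = createLogger({ service: 'orders' });
logger.info('order placed', { orderId: 'o-1' });
```

Passing `write` as a parameter keeps the logger easy to test; in a Lambda function the default `console.log` sends each line to CloudWatch Logs.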
02. One account per team per environment
Mind the shared limits
Resource limits:
no. of DynamoDB tables
no. of API Gateway regional APIs
no. of API Gateway edge-optimized APIs
no. of Kinesis shards
no. of IAM roles
no. of S3 buckets
no. of CloudFormation stacks
no. of SNS subscription filters
no. of SSM parameters
…
Throughput limits:
DynamoDB read & write
API Gateway requests/second
Lambda concurrent executions
SSM parameter ops/second
…
Compartmentalise security breaches
One account per Team per Environment
Isolate critical/high-throughput services
to their own accounts
org-formation
infrastructure-as-code
CloudFormation-like YML syntax
template landing zones
> org-formation update
> org-formation perform-tasks
https://github.com/OlafConijn/AwsOrganizationFormation
03. Load secrets at runtime
SSM Parameter Store
Secret 1
Secret 2
IAM
Environment:
SECRET_1: …
SECRET_2: …
Environment:
SECRET_1: …
SECRET_2: …
yay!
Secrets should NEVER be in plain text in env variables
SSM Parameter Store + IAM: Secret 1, Secret 2
fetch at cold start, cache, invalidate every x mins
https://github.com/middyjs/middy
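The "fetch at cold start, cache, invalidate every x mins" pattern can be sketched without any library as below. This is not how middy implements it; `createSecretCache` and `fetchSecrets` are hypothetical names, and `fetchSecrets` stands in for whatever call loads and decrypts the parameters (for example a wrapper around SSM `GetParameters`).

```javascript
// Cache secrets in memory and refresh them once a time-to-live has passed.
// `fetchSecrets` is a hypothetical async function supplied by the caller.
// `now` is injectable so the expiry logic can be tested without waiting.
function createSecretCache(fetchSecrets, ttlMs, now = Date.now) {
  let cached = null;
  let expiresAt = 0;

  return async function getSecrets() {
    if (cached === null || now() >= expiresAt) {
      cached = await fetchSecrets(); // first call happens on cold start
      expiresAt = now() + ttlMs;
    }
    return cached;
  };
}

// Usage: create the cache outside the handler so it survives warm invocations.
const getSecrets = createSecretCache(
  async () => ({ apiKey: 'loaded-at-runtime' }), // placeholder loader
  5 * 60 * 1000, // refresh every 5 minutes (an arbitrary example value)
);
```

Because the secrets live only in the function's memory, nothing sensitive appears in the function configuration. The sketch does not deduplicate concurrent refreshes, which a production version may want to add.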
SSM Parameter Store + IAM: Secret 1, Secret 2
switch to Higher Throughput for production
Secrets Manager + IAM: Secret 1, Secret 2
built-in rotation, more expensive
04. Principle of least privilege
Zero-trust networking
network boundary: full-trust
compromised nodes give attackers access to our entire system
zero-trust networking: trust no-one
authenticate and authorize every request
use IAM to protect internal APIs
network security is a bonus, not the only line of defense
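A least-privilege IAM policy names the exact actions and the exact resource a function needs, with no wildcards. The sketch below builds such a policy document for a function that only reads from one DynamoDB table; the helper name `readOnlyTablePolicy` and the example account, region and table are hypothetical.

```javascript
// Build an IAM policy document that allows reads on a single DynamoDB table.
// The helper name and the example values below are illustrative assumptions.
function readOnlyTablePolicy({ region, accountId, tableName }) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        // Only the operations this function actually performs.
        Action: ['dynamodb:GetItem', 'dynamodb:Query'],
        // Only the table it reads from, never "*".
        Resource: `arn:aws:dynamodb:${region}:${accountId}:table/${tableName}`,
      },
    ],
  };
}

const policy = readOnlyTablePolicy({
  region: 'eu-west-1',
  accountId: '123456789012',
  tableName: 'orders',
});
console.log(JSON.stringify(policy, null, 2));
```

Giving each function its own role with a policy like this limits what an attacker can reach if a single function is compromised.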
05. Parallelise where you can
No dependency
faster!
cheaper!
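When two IO calls do not depend on each other, they can run concurrently with `Promise.all`. In this sketch `getUser` and `getOrders` are hypothetical stand-ins that simulate IO with a timer; real code would call a database or another service.

```javascript
// Simulate an IO call that resolves with `value` after `ms` milliseconds.
const sleep = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

// Hypothetical independent calls: neither needs the other's result.
const getUser = (id) => sleep(50, { id, name: 'example user' });
const getOrders = (id) => sleep(50, [{ orderId: 1, userId: id }]);

// Sequential: total time is roughly the sum of both calls.
async function loadSequential(id) {
  const user = await getUser(id);
  const orders = await getOrders(id);
  return { user, orders };
}

// Parallel: both calls start immediately, total time is roughly the slower one.
async function loadParallel(id) {
  const [user, orders] = await Promise.all([getUser(id), getOrders(id)]);
  return { user, orders };
}
```

Both functions return the same result. Since Lambda bills by execution duration, the shorter wall-clock time of the parallel version is also where the "cheaper" comes from.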
06. Quick wins
Set environment variable AWS_NODEJS_CONNECTION_REUSE_ENABLED to "1"
(for Node.js functions running AWS SDK v2.x)
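What the environment variable switches on is HTTP keep-alive, which can also be configured in code. A minimal sketch using only Node's built-in `https` module; the `clientOptions` name is an assumption, and the object follows the `httpOptions.agent` shape accepted by service client constructors in the `aws-sdk` package.

```javascript
const https = require('https');

// One shared agent that keeps sockets open between requests, so later calls
// skip the TCP and TLS handshakes.
const keepAliveAgent = new https.Agent({ keepAlive: true });

// Options in the shape the `aws-sdk` package's service clients accept,
// e.g. new DynamoDB(clientOptions). The SDK itself is not loaded here.
const clientOptions = { httpOptions: { agent: keepAliveAgent } };
```

Setting the environment variable is the simpler route, since it needs no code change.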
Use Database Proxies when working with RDS
Smaller deployment artefact === faster cold start
Adding more memory DOESN’T help reduce cold start duration
(except for JVM functions)
Trim your dependencies
Use Lambda Layers as a deployment optimization
NOT as a package manager
Instead of
const AWS = require('aws-sdk')
require only the client you need
const DynamoDB = require('aws-sdk/clients/dynamodb')
(for Node.js functions running AWS SDK v2.x)
Prefer Lambda Destinations over DLQs
DLQ: payload
Lambda Destinations: payload, context(s), and response
07. DynamoDB
Use DocumentClient instead of AWS.DynamoDB
(for Node.js functions running AWS SDK v2.x)
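The difference shows up in the request parameters. The two objects below describe the same item: the low-level client expects every attribute wrapped in a DynamoDB type descriptor, while DocumentClient accepts plain JavaScript values. The table name and attributes are made up for the example.

```javascript
// Low-level AWS.DynamoDB client: each attribute carries a type descriptor,
// and numbers are sent as strings.
const lowLevelPutParams = {
  TableName: 'users', // hypothetical table
  Item: {
    userId: { S: 'u-123' },
    age: { N: '42' },
    tags: { L: [{ S: 'admin' }] },
  },
};

// DocumentClient: the same item as native JavaScript values.
const documentClientPutParams = {
  TableName: 'users',
  Item: {
    userId: 'u-123',
    age: 42,
    tags: ['admin'],
  },
};
```

Responses follow the same convention, so with DocumentClient the items returned by reads can be used directly without unwrapping.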
Use PAY_PER_REQUEST billing mode as default
Store large blobs in S3
Use BatchGetItem and BatchWriteItem to read/write multiple items
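Batch operations have per-request limits: BatchGetItem accepts up to 100 keys and BatchWriteItem up to 25 put or delete requests. A minimal sketch of splitting a list of items into valid BatchWriteItem requests follows; the helper names `chunk` and `toBatchWriteRequests` are assumptions for illustration.

```javascript
const BATCH_GET_LIMIT = 100; // max keys per BatchGetItem request
const BATCH_WRITE_LIMIT = 25; // max put/delete requests per BatchWriteItem request

// Split an array into consecutive slices of at most `size` elements.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Build one BatchWriteItem parameter object per group of up to 25 items.
function toBatchWriteRequests(tableName, items) {
  return chunk(items, BATCH_WRITE_LIMIT).map((batch) => ({
    RequestItems: {
      [tableName]: batch.map((Item) => ({ PutRequest: { Item } })),
    },
  }));
}
```

A batch call can succeed only partially, so the caller should check `UnprocessedItems` (or `UnprocessedKeys` for reads) in each response and retry whatever is left.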
Avoid Scan unless you absolutely have to
Use caching to avoid DynamoDB calls
Use high cardinality keys as hash key
Use ULIDs as sort key
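ULIDs work well as sort keys because they sort lexicographically by creation time while staying unique. The sketch below is a simplified ULID-style generator written for illustration, not the reference implementation: it encodes the millisecond timestamp in 10 Crockford base32 characters followed by 16 random characters, and it does not guarantee ordering for IDs created within the same millisecond. The function name `ulidLike` is an assumption.

```javascript
const crypto = require('crypto');

// Crockford base32 alphabet (no I, L, O, U), in ascending ASCII order.
const ALPHABET = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';

// Encode a millisecond timestamp as 10 base32 characters, most significant first.
function encodeTime(ms) {
  let out = '';
  for (let i = 0; i < 10; i++) {
    out = ALPHABET[ms % 32] + out;
    ms = Math.floor(ms / 32);
  }
  return out;
}

// 16 random base32 characters (80 bits); byte % 32 is unbiased since 256 = 8 * 32.
function encodeRandom() {
  const bytes = crypto.randomBytes(16);
  let out = '';
  for (let i = 0; i < 16; i++) {
    out += ALPHABET[bytes[i] % 32];
  }
  return out;
}

// 26-character identifier: timestamp part first, so string order follows time order.
function ulidLike(now = Date.now()) {
  return encodeTime(now) + encodeRandom();
}
```

With identifiers like this as the sort key, a Query returns items in creation order, and the timestamp prefix allows range conditions over a time window.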
Use SSE with KMS CMK
Enable point-in-time recovery
Learn single-table design patterns
gumroad.com/a/279377011
But don’t turn it into a religion
single-table design
Steep learning curve.
Difficult to add new access patterns.
Can’t monitor usage cost by entity type.
Difficult to use DynamoDB streams.
“But what about all the cost savings from Single-Table Design?!”
Only matters when running at scale.
The “goodness” of a practice is tied to
the context in which it is applied
A best practice for Amazon is probably not best for you.
https://theburningmonk.com/hire-me
Advise
Training Delivery
“Fundamentally, Yan has improved our team by increasing our
ability to derive value from AWS and Lambda in particular.”
Nick Blair
Tech Lead
@theburningmonk
theburningmonk.com
github.com/theburningmonk