KEMBAR78
[D3T1S02] Aurora Limitless Database Introduction | PDF
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Jihoon Kim
Database Specialist Solutions Architect
AWS
Aurora Limitless Database
Introduction
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Common scaling dimensions
Object size Object count Query volume
Time
QPS
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Sharding
Application
A-Z
A-F G-K L-P Q-U V-Z
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Sharding brings … challenges
Application
Consistency?
Querying? Maintenance?
?
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Amazon Aurora Limitless Database
Application
Scaling Managed
Lim
ited
Preview
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Amazon Aurora Limitless Database
Single interface
Transactionally
Consistent
Millions of
transactions
Distributed
Serverless
Petabytes of
Data
Scaling Managed
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Using Limitless Database
cust_id
name
email
order_id
cust_id
amount
tax_rate_id
tax_rate_id
city
state
country
tax_rate
Order Tax Rate
Customer
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Using Limitless Database
Order Tax Rate
Customer
Shard 1 Shard 2 Shard 3
Tax Rate
Customer Customer
Tax Rate
Order
Order
Tax Rate
Customer
Order
Sharded Reference
collocated
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Create sharded customer table
SET rds_aurora.limitless_create_table_mode='sharded';
SET rds_aurora.limitless_create_table_shard_key='{“cust_id"}';
CREATE TABLE customer (
cust_id INT PRIMARY KEY NOT NULL,
name TEXT,
email VARCHAR(100)
);
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Create co-located order table
SET rds_aurora.limitless_create_table_mode='sharded';
SET rds_aurora.limitless_create_table_shard_key='{“cust_id"}';
SET rds_aurora.limitless_create_table_collocate_with='customer';
CREATE TABLE order (
order_id INT NOT NULL,
cust_id INT NOT NULL,
amount DOUBLE NOT NULL,
tax_rate_id DOUBLE,
PRIMARY KEY (order_id, cust_id)
);
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Create reference table tax_rate
SET rds_aurora.limitless_create_table_mode='reference';
CREATE TABLE tax_rate (
tax_rate_id INT PRIMARY KEY NOT NULL,
city TEXT NOT NULL,
state TEXT,
country TEXT NOT NULL,
tax_rate DOUBLE NOT NULL
);
SET rds_aurora.limitless_create_table_mode='standard';
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Architecture overview
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Standard Aurora architecture
Aurora cluster
Aurora distributed storage
Reader instances
Writer instance
1
2 3
1. Aurora volume on distributed
storage
2. An Aurora writer instance
3. Optional reader instances for
availability and read scaling
4. Limitless Database introduces
the “shard group” concept
Limitless Database shard group
4
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Limitless Database shard group
Aurora cluster
Data access shards
Limitless Database shard group
Aurora distributed storage
Distributed transaction routers
primary writer
Contained within your Aurora cluster
Encapsulates Limitless Database
infrastructure for your cluster
Provides an endpoint for applications
Scales resources within configured
limit based on load and data size
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Distributed transaction routers
Aurora cluster
Data access shards
Limitless Database shard group
Aurora distributed storage
Distributed transaction routers
primary writer
Serve all application traffic
Scale vertically and horizontally
based on load
Know schema and key range
placement
Assign time for transaction snapshot
and drive distributed commits
Perform initial planning of query and
aggregate results from multi-shard
queries
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Data access shards
Aurora cluster
Data access shards
Limitless Database shard group
Aurora distributed storage
Distributed transaction routers
primary writer
Own portion of sharded table key
space and have full copies of
reference tables
Scale vertically and split based on
load
Perform local planning and execution
of query fragments
Execute local transaction logic
Backed by Aurora distributed storage
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Creating a shard group
aws rds create-db-shard-group
--db-cluster-identifier proddb
--db-shard-group-identifier proddb-sg
--max-acu 600
--compute-redundancy 2
Total scale of all routers and shards will be <= max-acu
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Compute redundancy
Aurora Cluster
Availability Zone 1 Availability Zone 2 Availability Zone 3
Shard Group
Aurora distributed storage
Distributed Transaction Routers
Data Access Shards
S1 S2 S3
S3 S1 S2
S2 S3 S1
Compute redundancy 0
Compute redundancy 1
Compute redundancy 2
Separately configure HA
for the primary writer
PW
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Data distribution
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Sharded tables
set rds_aurora.limitless_create_table_mode='sharded';
set rds_aurora.limitless_create_table_shard_key='{bid}’;
create table pgbench_branches(
bid int not null,
bbalance int,
filler char(88));
postgres_limitless=> d+ pgbench_branches
Partitioned table "public.pgbench_branches"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target |
Description
----------+---------------+-----------+----------+---------+----------+-------------+--------------+--------
-----
bid | integer | | not null | | plain | | |
bbalance | integer | | | | plain | | |
filler | character(88) | | | | extended | | |
Partition key: HASH (bid)
Partitions: pgbench_branches_fs00001 FOR VALUES FROM (MINVALUE) TO ('-4611686018427387904'),
pgbench_branches_fs00002 FOR VALUES FROM ('-4611686018427387904') TO ('0'),
pgbench_branches_fs00003 FOR VALUES FROM ('0') TO ('4611686018427387904'),
pgbench_branches_fs00004 FOR VALUES FROM ('4611686018427387904') TO (MAXVALUE)
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Hash-range partitioning
Shard key is hashed to 64-bits
Ranges of 64-bit space are
assigned to shards
Shards own table fragments
Routers have table fragment
references, but no data
pgbench_branches fragments
pgbench_branches
MINVALUE
-4611686018427387904
-4611686018427387904
0
0
4611686018427387904
4611686018427387904
MAXVALUE
Distributed transaction routers
Data access shards
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Table slicing
Table fragments are partitioned
into sub-range slices
Not directly visible to users
Improve intra-shard parallelism
Relocate on horizontal scale out
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Aurora distributed storage
Horizontal scale out
“Shard split” occurs due to
utilization or storage size
Collocated table slices moved
together
Leverages Aurora storage level
cloning and replication
Routers can be added or removed
accounts and branches fragments
fragment references
MINVALUE
-4611686018427387904
-4611686018427387904
0
0
4611686018427387904
4611686018427387904
MAXVALUE
Distributed transaction routers
Data access shards
4611686018427387904
9223372036854775808
9223372036854775808
MAXVALUE
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Reference tables
set rds_aurora.limitless_create_table_mode='reference’;
create table pgbench_rates(
pid int not null primary key,
term int,
rate numeric not null);
pgbench_rates
pgbench_rates
pgbench_rates
pgbench_rates
pgbench_rates
Distributed transaction routers
Data access shards
Strongly consistent (ACID writes)
Enables join pushdown
Frequent read/join, infrequent write
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Transactions
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Transaction support
Support for READ COMMITED and REPEATABLE READ
…with a consistent view as in a single system
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Isolation level refresher
READ COMMITTED
See the latest committed data
before your query began
Every query in a transaction could
see different data
REPEATABLE READ
See the latest committed data
before your transaction began
Every query in a transaction sees
the same data
Design goal: Retain PostgreSQL transaction semantics
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Challenges in a distributed database
Coordination limits
scalability
Query fragments execute
at different times
Transaction scope
unknown until commit
Maintain order Consistent restores
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Solved with bounded clocks
Based on EC2 TimeSync service
v Current time (approximate)
v Earliest possible time
v Latest possible time
Integrated into PostgreSQL
v Tuple visibility based on time of
snapshot and commit
v Global read-after-write
v One-phase & two-phase commit
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Repeatable read – distributed (with clocks)
Transaction T1
BEGIN TRANSACTION ISOLATION LEVEL
REPEATABLE READ;
SELECT abalance FROM pgbench_accounts
WHERE bid = 619 and aid = 61890340;
704
SELECT abalance FROM pgbench_accounts
WHERE bid = 801 and aid = 80044011;
1
Transaction T2
BEGIN;
SELECT abalance FROM pgbench_accounts
WHERE bid = 801 and aid = 80044011 FOR
UPDATE;
1
UPDATE pgbench_accounts SET abalance =
1001 WHERE bid = 801 and aid = 80044011;
COMMIT;
Transaction T3
SELECT abalance FROM pgbench_accounts
WHERE bid = 801 and aid = 80044011;
1001
1) router gets time t100
2) execute on shard w/bid
619 using snapshot@t100
1) router gets time t103
2) execute on shard w/bid
801 using snapshot@t103
1) router uses 1PC on shard
2) shard assigns commit@t110
3) acks commit when
a) writes durable on disk
b) earliest possible time > t110
1) router gets time t125
2) execute on shard w/bid
801 using snapshot@t125
1) execute on shard w/bid
801 using snapshot@t100
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Read committed – distributed
Transaction T1
SELECT SUM(abalance) FROM
pgbench_accounts;
10000000
Transaction T3
SELECT SUM(abalance) FROM
pgbench_accounts;
10000000
Transaction T2
BEGIN;
UPDATE pgbench_accounts SET abalance =
abalance – 500 WHERE bid = 619 and aid =
61890340;
UPDATE pgbench_accounts SET abalance =
abalance + 500 WHERE bid = 801 and aid =
80044011;
COMMIT;
1) router gets time t100
2) executes sum() on each shard
using snapshot@t100
3) aggregates the result
1) router gets time t103
2) execute on shard w/bid
619 using snapshot@t103
1) router gets time t107
2) execute on shard w/bid
801 using snapshot@t107
1) router determines 2PC, asks
shards to prepare
2) shard w/619 prepares@t118
shard w/801 prepares@t112
3) router assigns commit@t120
4) acks commit when
a) writes durable on disk
b) earliest possible time > t120
5) router tells shards to
commit@t120
1) router gets time t116
2) executes sum() on each shard
using snapshot@t116
3) aggregates the result
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Transactions conclusion
Same RC/RR semantics as PostgreSQL
All reads are consistent, w/o quorum, even on failover
Commits w/single shard writes scale linearly (millions/sec)
Distributed commits are atomic
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Queries & SQL compatibility
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Fundamentally Aurora PostgreSQL
PostgreSQL parser and semantics
Broad surface area coverage Selected extensions
PostgreSQL wire compatible
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Query execution basics
PostgreSQL foreign tables
foundation
Enhancements in core engine
A custom foreign data wrapper
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Query flow
Router
1. Receives query from client
2. Plans what can be sent to shards
and any joins that must be done
3. Sends partial queries to shards with
transaction context
7. Router does final joins, filters, and
aggregations as necessary
Shard
4. Receives partial query from router
5. Plans local joins and scans
6. Execute and sent results to router
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Single shard queries
Best performance when router determines query goes to a single shard
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Parallel operations
Parallel operations speed up on multi-shard
Some examples:
Create index
Analyze
Vacuum
Aggregates (sum, min, max, etc.)
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Thank you!
© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Please complete the session
survey in the mobile app
Thank you!
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.

[D3T1S02] Aurora Limitless Database Introduction

  • 1.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. Jihoon Kim Database Specialist Solutions Architect AWS Aurora Limitless Database Introduction
  • 2.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Common scaling dimensions Object size Object count Query volume Time QPS
  • 3.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Sharding Application A-Z A-F G-K L-P Q-U V-Z
  • 4.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Sharding brings … challenges Application Consistency? Querying? Maintenance? ?
  • 5.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Amazon Aurora Limitless Database Application Scaling Managed Lim ited Preview
  • 6.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Amazon Aurora Limitless Database Single interface Transactionally Consistent Millions of transactions Distributed Serverless Petabytes of Data Scaling Managed
  • 7.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Using Limitless Database cust_id name email order_id cust_id amount tax_rate_id tax_rate_id city state country tax_rate Order Tax Rate Customer
  • 8.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Using Limitless Database Order Tax Rate Customer Shard 1 Shard 2 Shard 3 Tax Rate Customer Customer Tax Rate Order Order Tax Rate Customer Order Sharded Reference collocated
  • 9.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Create sharded customer table SET rds_aurora.limitless_create_table_mode='sharded'; SET rds_aurora.limitless_create_table_shard_key='{“cust_id"}'; CREATE TABLE customer ( cust_id INT PRIMARY KEY NOT NULL, name TEXT, email VARCHAR(100) );
  • 10.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Create co-located order table SET rds_aurora.limitless_create_table_mode='sharded'; SET rds_aurora.limitless_create_table_shard_key='{“cust_id"}'; SET rds_aurora.limitless_create_table_collocate_with='customer'; CREATE TABLE order ( order_id INT NOT NULL, cust_id INT NOT NULL, amount DOUBLE NOT NULL, tax_rate_id DOUBLE, PRIMARY KEY (order_id, cust_id) );
  • 11.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Create reference table tax_rate SET rds_aurora.limitless_create_table_mode='reference'; CREATE TABLE tax_rate ( tax_rate_id INT PRIMARY KEY NOT NULL, city TEXT NOT NULL, state TEXT, country TEXT NOT NULL, tax_rate DOUBLE NOT NULL ); SET rds_aurora.limitless_create_table_mode='standard';
  • 12.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Architecture overview
  • 13.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Standard Aurora architecture Aurora cluster Aurora distributed storage Reader instances Writer instance 1 2 3 1. Aurora volume on distributed storage 2. An Aurora writer instance 3. Optional reader instances for availability and read scaling 4. Limitless Database introduces the “shard group” concept Limitless Database shard group 4
  • 14.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Limitless Database shard group Aurora cluster Data access shards Limitless Database shard group Aurora distributed storage Distributed transaction routers primary writer Contained within your Aurora cluster Encapsulates Limitless Database infrastructure for your cluster Provides an endpoint for applications Scales resources within configured limit based on load and data size
  • 15.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Distributed transaction routers Aurora cluster Data access shards Limitless Database shard group Aurora distributed storage Distributed transaction routers primary writer Serve all application traffic Scale vertically and horizontally based on load Know schema and key range placement Assign time for transaction snapshot and drive distributed commits Perform initial planning of query and aggregate results from multi-shard queries
  • 16.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Data access shards Aurora cluster Data access shards Limitless Database shard group Aurora distributed storage Distributed transaction routers primary writer Own portion of sharded table key space and have full copies of reference tables Scale vertically and split based on load Perform local planning and execution of query fragments Execute local transaction logic Backed by Aurora distributed storage
  • 17.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Creating a shard group aws rds create-db-shard-group --db-cluster-identifier proddb --db-shard-group-identifier proddb-sg --max-acu 600 --compute-redundancy 2 Total scale of all routers and shards will be <= max-acu
  • 18.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Compute redundancy Aurora Cluster Availability Zone 1 Availability Zone 2 Availability Zone 3 Shard Group Aurora distributed storage Distributed Transaction Routers Data Access Shards S1 S2 S3 S3 S1 S2 S2 S3 S1 Compute redundancy 0 Compute redundancy 1 Compute redundancy 2 Separately configure HA for the primary writer PW
  • 19.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Data distribution
  • 20.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Sharded tables set rds_aurora.limitless_create_table_mode='sharded'; set rds_aurora.limitless_create_table_shard_key='{bid}’; create table pgbench_branches( bid int not null, bbalance int, filler char(88)); postgres_limitless=> d+ pgbench_branches Partitioned table "public.pgbench_branches" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description ----------+---------------+-----------+----------+---------+----------+-------------+--------------+-------- ----- bid | integer | | not null | | plain | | | bbalance | integer | | | | plain | | | filler | character(88) | | | | extended | | | Partition key: HASH (bid) Partitions: pgbench_branches_fs00001 FOR VALUES FROM (MINVALUE) TO ('-4611686018427387904'), pgbench_branches_fs00002 FOR VALUES FROM ('-4611686018427387904') TO ('0'), pgbench_branches_fs00003 FOR VALUES FROM ('0') TO ('4611686018427387904'), pgbench_branches_fs00004 FOR VALUES FROM ('4611686018427387904') TO (MAXVALUE)
  • 21.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Hash-range partitioning Shard key is hashed to 64-bits Ranges of 64-bit space are assigned to shards Shards own table fragments Routers have table fragment references, but no data pgbench_branches fragments pgbench_branches MINVALUE -4611686018427387904 -4611686018427387904 0 0 4611686018427387904 4611686018427387904 MAXVALUE Distributed transaction routers Data access shards
  • 22.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Table slicing Table fragments are partitioned into sub-range slices Not directly visible to users Improve intra-shard parallelism Relocate on horizontal scale out
  • 23.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Aurora distributed storage Horizontal scale out “Shard split” occurs due to utilization or storage size Collocated table slices moved together Leverages Aurora storage level cloning and replication Routers can be added or removed accounts and branches fragments fragment references MINVALUE -4611686018427387904 -4611686018427387904 0 0 4611686018427387904 4611686018427387904 MAXVALUE Distributed transaction routers Data access shards 4611686018427387904 9223372036854775808 9223372036854775808 MAXVALUE
  • 24.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Reference tables set rds_aurora.limitless_create_table_mode='reference’; create table pgbench_rates( pid int not null primary key, term int, rate numeric not null); pgbench_rates pgbench_rates pgbench_rates pgbench_rates pgbench_rates Distributed transaction routers Data access shards Strongly consistent (ACID writes) Enables join pushdown Frequent read/join, infrequent write
  • 25.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Transactions
  • 26.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Transaction support Support for READ COMMITED and REPEATABLE READ …with a consistent view as in a single system
  • 27.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Isolation level refresher READ COMMITTED See the latest committed data before your query began Every query in a transaction could see different data REPEATABLE READ See the latest committed data before your transaction began Every query in a transaction sees the same data Design goal: Retain PostgreSQL transaction semantics
  • 28.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Challenges in a distributed database Coordination limits scalability Query fragments execute at different times Transaction scope unknown until commit Maintain order Consistent restores
  • 29.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Solved with bounded clocks Based on EC2 TimeSync service v Current time (approximate) v Earliest possible time v Latest possible time Integrated into PostgreSQL v Tuple visibility based on time of snapshot and commit v Global read-after-write v One-phase & two-phase commit
  • 30.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Repeatable read – distributed (with clocks) Transaction T1 BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; SELECT abalance FROM pgbench_accounts WHERE bid = 619 and aid = 61890340; 704 SELECT abalance FROM pgbench_accounts WHERE bid = 801 and aid = 80044011; 1 Transaction T2 BEGIN; SELECT abalance FROM pgbench_accounts WHERE bid = 801 and aid = 80044011 FOR UPDATE; 1 UPDATE pgbench_accounts SET abalance = 1001 WHERE bid = 801 and aid = 80044011; COMMIT; Transaction T3 SELECT abalance FROM pgbench_accounts WHERE bid = 801 and aid = 80044011; 1001 1) router gets time t100 2) execute on shard w/bid 619 using snapshot@t100 1) router gets time t103 2) execute on shard w/bid 801 using snapshot@t103 1) router uses 1PC on shard 2) shard assigns commit@t110 3) acks commit when a) writes durable on disk b) earliest possible time > t110 1) router gets time t125 2) execute on shard w/bid 801 using snapshot@t125 1) execute on shard w/bid 801 using snapshot@t100
  • 31.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Read committed – distributed Transaction T1 SELECT SUM(abalance) FROM pgbench_accounts; 10000000 Transaction T3 SELECT SUM(abalance) FROM pgbench_accounts; 10000000 Transaction T2 BEGIN; UPDATE pgbench_accounts SET abalance = abalance – 500 WHERE bid = 619 and aid = 61890340; UPDATE pgbench_accounts SET abalance = abalance + 500 WHERE bid = 801 and aid = 80044011; COMMIT; 1) router gets time t100 2) executes sum() on each shard using snapshot@t100 3) aggregates the result 1) router gets time t103 2) execute on shard w/bid 619 using snapshot@t103 1) router gets time t107 2) execute on shard w/bid 801 using snapshot@t107 1) router determines 2PC, asks shards to prepare 2) shard w/619 prepares@t118 shard w/801 prepares@t112 3) router assigns commit@t120 4) acks commit when a) writes durable on disk b) earliest possible time > t120 5) router tells shards to commit@t120 1) router gets time t116 2) executes sum() on each shard using snapshot@t116 3) aggregates the result
  • 32.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Transactions conclusion Same RC/RR semantics as PostgreSQL All reads are consistent, w/o quorum, even on failover Commits w/single shard writes scale linearly (millions/sec) Distributed commits are atomic
  • 33.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Queries & SQL compatibility
  • 34.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Fundamentally Aurora PostgreSQL PostgreSQL parser and semantics Broad surface area coverage Selected extensions PostgreSQL wire compatible
  • 35.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Query execution basics PostgreSQL foreign tables foundation Enhancements in core engine A custom foreign data wrapper
  • 36.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Query flow Router 1. Receives query from client 2. Plans what can be sent to shards and any joins that must be done 3. Sends partial queries to shards with transaction context 7. Router does final joins, filters, and aggregations as necessary Shard 4. Receives partial query from router 5. Plans local joins and scans 6. Execute and sent results to router
  • 37.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Single shard queries Best performance when router determines query goes to a single shard
  • 38.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Parallel operations Parallel operations speed up on multi-shard Some examples: Create index Analyze Vacuum Aggregates (sum, min, max, etc.)
  • 39.
    © 2024, AmazonWeb Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Thank you! © 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved. Please complete the session survey in the mobile app Thank you! © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.