Running complex data queries in a distributed system
Copyright © ArangoDB Inc., 2018
One Engine, one Query Language.
Multiple Data Models.
Hello, my name is Jan!
I work for ArangoDB Inc. in Cologne, Germany
I am one of the developers of ArangoDB,
the distributed, multi-model database
About me
Running complex queries
in a distributed system
Until recently, there was a tradeoff to consider when choosing an
OLTP database
Database tradeoffs
Complex queries, joins
Transactional guarantees
Highly available
Scalable
traditional
relational
“NoSQL”
In the last few years, there has been a trend towards distributed
databases adopting complex query functionality and transactions
Database trends
Complex queries, joins
Transactional guarantees
Highly available
Scalable
traditional
relational
“NoSQL”
Highly available
Scalable
Transactional guarantees
Complex queries, joins
“NewSQL”
(insert buzzword of choice)
● Distributed databases primer
● Organizing queries in a distributed database
● Distributed ACID transactions
● Q & A
Today I will only consider OLTP databases
Sorry, no Spark/Hadoop!
Agenda
Distributed databases
primer
A distributed database is a cluster of database nodes
The overall dataset is partitioned into smaller chunks (“shards”)
Adding new nodes to the database increases its capacity (scale out)
Distributed databases
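As a sketch of how such partitioning works, a document key can be mapped to a shard with a stable hash. The function name and scheme below are illustrative, not ArangoDB's actual sharding implementation:

```python
import hashlib

# Illustrative only: map a document key to one of N shards via a
# stable hash, so every node computes the same shard for a given key.
def shard_for(key: str, num_shards: int) -> int:
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

A stable hash (rather than Python's randomized built-in `hash`) matters here: routing must be reproducible across nodes and restarts.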
Sharding example
node A node B node C
Shards: S1, S2 Shards: S3, S4 Shards: S5, S6, S7
3 nodes (A, B, C), 7 shards (S1, S2, S3, S4, S5, S6, S7)
shards
Adding a node = increased capacity
node A node B node C
Shards: S1, S2 Shards: S3, S4 Shards: S5, S6
4 nodes (A, B, C, D), 8 shards (S1, S2, S3, S4, S5, S6, S7, S8)
shards
node D
Shards: S7, S8
What about data loss?
node A node B node C
Shards: S1, S2 Shards: S3, S4 Shards: S5, S6, S7
3 nodes (A, B, C), 7 shards (S1, S2, S3, S4, S5, S6, S7)
shards
Node failure = data loss
node A node B node C
Shards: S1, S2 Shards: S3, S4 Shards: S5, S6, S7
3 nodes (A, B, C), 7 shards (S1, S2, S3, S4, S5, S6, S7)
shards
Shards example with replicas
node A node B node C
Shards: S1, S2
Replicas: S4, S6, S7
Shards: S3, S4
Replicas: S2, S5
Shards: S5, S6, S7
Replicas: S1, S3
shards
replicas
3 nodes (A, B, C), 7 shards (S1, S2, S3, S4, S5, S6, S7)
Node failure with a replica setup
node A node B node C
Shards: S1, S2
Replicas: S4, S6, S7
Shards: S3, S4
Replicas: S2, S5
Shards: S5, S6, S7
Replicas: S1, S3
shards
replicas
3 nodes (A, B, C), 7 shards (S1, S2, S3, S4, S5, S6, S7)
Promoting replicas
node A node B node C
Shards: S1, S2, S4
Replicas: S4, S6, S7
Shards: S3, S4
Replicas: S2, S5
Shards: S3, S5, S6, S7
Replicas: S1, S3
shards
replicas
2 nodes (A, C), 7 shards (S1, S2, S3, S4, S5, S6, S7)
Creating new replicas
node A node B node C
Shards: S1, S2, S4
Replicas: S3, S5, S6, S7
Shards: S3, S4
Replicas: S2, S5
Shards: S3, S5, S6, S7
Replicas: S1, S2, S4
shards
replicas
2 nodes (A, C), 7 shards (S1, S2, S3, S4, S5, S6, S7)
Organizing queries in a
distributed database
A typical distributed query will involve multiple nodes, and requires
communication between them
There is normally a coordinating node per query, which is
responsible for
● triggering data processing steps on the other nodes
● putting together the partial results from the other nodes
● sending the merged result back to the client
● shutting down the query on the other nodes
Query coordination
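The coordinator responsibilities above can be sketched as a scatter-gather loop. All names here are hypothetical, and a real coordinator issues the per-shard requests as parallel RPCs rather than a sequential loop:

```python
# Hypothetical sketch: the coordinator triggers processing on each
# data node, gathers the partial results, and merges them for the client.
def run_query(shards, predicate):
    partials = []
    for shard in shards:            # in practice: parallel network calls
        partials.append([doc for doc in shard if predicate(doc)])
    # merge the partial results into the final answer for the client
    return [doc for part in partials for doc in part]
```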
Query coordination example
3 data nodes
Query coordinator node:
fetches data from nodes
merges the results
sends result to client
shuts down query on nodes
Query result
data nodes:
return data of
shards
For each inter-node communication, there will be a network
roundtrip (latency++)
One of the major goals when running distributed queries is to
minimize the amount of network communication, e.g. by
● restricting the query to as few shards as possible
● pushing filter conditions to the shards
● pre-aggregating data on the shards
Operations on different shards can also be executed in parallel to
reduce overall latency
Distributed query considerations
The following are some example queries from ArangoDB
ArangoDB is a multi-model NoSQL database, which supports
documents, graphs and key-values
It can be run in single-server or distributed (cluster) mode
ArangoDB provides its own query language AQL, which is similar to
SQL, but has a different syntax
ArangoDB query examples
A simple ArangoDB query with a filter condition:
FOR u IN users
FILTER u.active == true
RETURN u
which is equivalent to SQL’s
SELECT * FROM users u WHERE u.active = 1
The coordinator will push the filter condition to the shards,
so they will only return data that satisfies the filter condition
Query example (filter)
Query example (filter)
3 data nodes
Query: FOR u IN users
FILTER u.active == true RETURN u
coordinator:
fetches data from all shards
merges the results
Query result
data nodes:
return filtered
data of shards
Now a query using a filter on a shard key attribute:
FOR u IN users
FILTER u._key == “jsteemann”
RETURN u
which is equivalent to SQL’s
SELECT * FROM users u WHERE u._key = “jsteemann”
The coordinator will restrict the query to the one shard the data is
located on, push the filter condition to the shard and fetch the results
from there
Query example (filter on shard key)
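A minimal sketch of this routing, assuming hash sharding on the key. Function names and the toy cluster below are made up for illustration:

```python
import hashlib

# Illustrative only: with a filter on the shard key, the coordinator
# can compute the owning shard locally and contact just that node.
def target_shard(shard_key: str, num_shards: int) -> int:
    return int(hashlib.sha1(shard_key.encode()).hexdigest(), 16) % num_shards

def lookup(shards, shard_key):
    # only one shard is queried instead of all of them
    return shards[target_shard(shard_key, len(shards))].get(shard_key)

# toy cluster: three shards, one document stored on its owning shard
shards = [{}, {}, {}]
shards[target_shard("jsteemann", 3)]["jsteemann"] = {"name": "Jan"}
```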
Query example (filter on shard key)
3 data nodes
Query: FOR u IN users FILTER
u._key == “jsteemann” RETURN u
coordinator:
fetches data from single
shard
Query result
single data node:
returns filtered
data of shard
Another ArangoDB query, now with a sort condition and a projection:
FOR u IN users
SORT u.name
RETURN u.name
which is equivalent to SQL’s
SELECT u.name FROM users u ORDER BY u.name
The coordinator will push the sort condition and the projection to all
shards, and combines the locally sorted results from the shards into a
totally ordered result (using merge-sort)
Query example (sorting)
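The coordinator's final step is a k-way merge of the pre-sorted shard streams; Python's `heapq.merge` implements exactly this, so it can serve as a sketch:

```python
import heapq

# Each shard sorts its projection locally; the coordinator merge-sorts
# the sorted streams into one totally ordered result without re-sorting.
def merge_sorted_shards(shard_results):
    return list(heapq.merge(*shard_results))
```

Because each input stream is already sorted, the merge is linear in the result size instead of requiring a full sort on the coordinator.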
Query example (sorting)
3 data nodes
Query: FOR u IN users
SORT u.name RETURN u.name
coordinator:
fetches data from all shards
merge-sorts the results
Query result
data nodes:
return sorted and
projected data of
shards
One more ArangoDB query, now using aggregation:
FOR u IN users
COLLECT year = DATE_YEAR(u.dob)
AGGREGATE count = COUNT(u.dob)
RETURN { year, count }
which is equivalent to SQL’s
SELECT YEAR(u.dob) AS year, COUNT(u.dob) AS count
FROM users u GROUP BY year
The coordinator will push the aggregation to all shards, and combines
the already aggregated results from the shards into a single result
Query example (aggregation)
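Combining the pre-aggregated shard results might look like this sketch, where each shard returns a partial year→count map (names are illustrative):

```python
from collections import Counter

# Shards pre-aggregate locally; the coordinator only sums the partial
# counts per group instead of re-scanning raw documents.
def combine_partials(partials):
    total = Counter()
    for partial in partials:
        total.update(partial)   # adds counts per key
    return dict(total)
```

This works because COUNT is decomposable: partial counts can be summed. Aggregates like AVG need the shards to ship (sum, count) pairs instead.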
Query example (aggregation)
3 data nodes
Query: FOR u IN users COLLECT ...
RETURN { year, count }
coordinator:
fetches data from all shards
aggregates the aggregates
Query result
data nodes:
return aggregated
data of shards
One final ArangoDB query, now with an equi-join:
FOR u IN users FOR p IN purchases
FILTER u._key == p.user
RETURN { user: u, purchase: p }
which is equivalent to SQL’s
SELECT u.* AS user, p.* AS purchase
FROM users u, purchases p WHERE u._key = p.user
The coordinator will query all shards of the “purchases” collection, and
these will reach out to the coordinator again to get data from all shards
of the “users” collection
Query example (join)
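Locally, the per-shard join step amounts to a hash join. This sketch models it with plain dicts, using the field names from the query above:

```python
# Sketch: build a hash table on the users' _key, then probe it with
# each purchase row — the same equi-join the shards perform on the
# user data obtained via the coordinator.
def equi_join(users, purchases):
    users_by_key = {u["_key"]: u for u in users}
    return [
        {"user": users_by_key[p["user"]], "purchase": p}
        for p in purchases
        if p["user"] in users_by_key
    ]
```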
Query example (join)
3 + 2 data nodes
Query: FOR u IN users ...
RETURN { user: u, purchase: p }
coordinator:
fetches data from all shards
of “purchases”
merges the results
Query result
“purchases” data nodes:
fetch data from the coordinator above
fetch data of shards for “purchases”
join them
coordinator:
fetches data from all shards
of “users”
merges the results
“users” data nodes:
return data of
shards for “users”
Distributed
ACID transactions
With transactions, complex operations on multiple data items can be
executed in an all-or-nothing fashion
If something goes wrong, the database will do an automatic
cleanup of partially executed operations
With transactions, the database will ensure consistency of data and
protect us from anomalies, no matter if there are other concurrent
operations on the same data
Key take-away: transactions make application developers’ lives easier
Benefits of transactions
Some distributed databases also support ACID transactions
or have plans to add them:
● Google Cloud Spanner (Database as a service)
● CockroachDB
● FoundationDB
● FaunaDB (closed source)
● ...
● MongoDB (announced for future releases, with limitations)
Distributed databases with transactions
While a distributed transaction is ongoing, it may make modifications
on different nodes
These changes need to be ineffective (hidden) until the transaction
actually commits
On commit, the transaction’s changes must become instantly
visible on all nodes at the same time
Atomicity
Distributed databases normally store the status of transactions
(pending, committed, aborted) in a private section of the key space,
e.g.:
Key Value
T0 committed
T1 aborted
T2 pending
When a transaction commits, its status key is atomically updated
from “pending” to “committed”
Atomicity
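The status-key mechanism can be sketched as follows. This is a simplification: a real system performs the flip as an atomic compare-and-swap in the key space, and the names here are illustrative:

```python
# Sketch of transaction status stored in a private key space.
status = {"T0": "committed", "T1": "aborted", "T2": "pending"}

def commit(txn_id):
    # the single atomic flip that makes all of the transaction's
    # writes visible at once
    if status.get(txn_id) != "pending":
        raise ValueError("only pending transactions can commit")
    status[txn_id] = "committed"

def is_visible(txn_id):
    return status.get(txn_id) == "committed"

commit("T2")
```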
Databases that provide consistency normally serialize all write
operations for a key on the designated “leader” node for its shard
The state of data on the leader shard is then a consistent
“source of truth” for that shard
Write operations are replicated from leaders to replicas in the same
order as applied on the leader
Replicas are thus exact copies of the leader shards and can take over
any time
Consistency – designated leaders
Leader-only writes
Query: put(“amount”, 10)
Query: put(“amount”, 42)
Leader determines the order of the
operations for the same key and
executes them one after the other,
e.g.:
1. put(“amount”, 10)
2. put(“amount”, 42)
Query: put(“amount”, 42)
10
42
Shard leaders can change over time, e.g. in case of node failures,
planned maintenance
It is necessary that all nodes in the cluster have the same view on
who is the current leader for a specific shard, and which are the
shard’s current replicas
Shard leadership
The nodes in the cluster normally use a “consensus protocol” to
exchange status messages
Paxos and Raft are the most commonly used consensus protocols in
distributed databases
These protocols are designed to handle network partitions and node
failures, and will work reliably if a majority of nodes is still available
and can still exchange messages with each other
Consensus protocols
To ensure consistency, transactions that modify the same data must
be put into an unambiguous order
Having an unambiguous global order allows having a cross-node
consistent view on the data
This is hard to achieve because the transactions can start on different
nodes in parallel
Ordering transactions
Each transaction is assigned a timestamp when it is started
This same timestamp will be used later as the transaction’s commit
timestamp
The timestamps of transactions will be used for ordering them
Rule: a transaction with a lower timestamp happened before a
transaction with a higher timestamp
Ordering transactions using timestamps
Timestamps created by different nodes are not reliably comparable
due to clock skew
The solution to make them comparable in most cases is to define an
“uncertainty interval” (which is the maximum tolerable clock skew)
If the timestamp difference is outside of the “uncertainty interval”,
two timestamps are safely comparable
Two timestamps with a difference inside the uncertainty interval are
not safely comparable, and their relative order is unknown
Clock skew
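The comparison rule can be sketched as follows; the skew bound is an assumed value chosen only for illustration:

```python
MAX_CLOCK_SKEW_MS = 250  # assumed uncertainty interval, illustrative

# Timestamps are safely ordered only when they differ by more than
# the maximum tolerable clock skew; otherwise the order is unknown.
def compare_timestamps(ts_a, ts_b):
    if abs(ts_a - ts_b) <= MAX_CLOCK_SKEW_MS:
        return None  # inside the uncertainty interval: not comparable
    return -1 if ts_a < ts_b else 1
```

A `None` result is what forces a conflict resolution step (abort or restart with a new timestamp), as described below.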
If the transactions could have influence on each other, this is an
(actual or a potential) read or write conflict, and one of the
transactions must be aborted or restarted
A transaction restart also means assigning a new, higher timestamp
Consistency using timestamps
To ensure isolation, a running transaction must not overwrite or
remove data that another ongoing transaction may still see
Write operations are stored in a multi-version data structure, which
can handle multiple values for the same key at the same time
Any transaction that reads or writes a key needs to find the “correct”
version of it
Isolation
Key Transaction ID Value
“amount” T0 10
“amount” T1 42
“name” T17 “test”
“page” T2 “index.html”
“page” T50 <removed>
Any operation can identify whether it can “see” an operation from
another transaction, simply by looking up the status and timestamp
of the corresponding transaction
Isolation – multi-versioning
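A reader's version lookup over the table above can be sketched like this; the `visible` predicate stands in for the status-and-timestamp check described in the text:

```python
# Sketch of the multi-version structure: each key holds a list of
# (transaction id, value) versions, newest last; None marks a removal.
versions = {
    "amount": [("T0", 10), ("T1", 42)],
    "page": [("T2", "index.html"), ("T50", None)],
}

def read(key, visible):
    # pick the newest version whose transaction the reader may "see"
    for txn_id, value in reversed(versions.get(key, [])):
        if visible(txn_id):
            return value
    return None
```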
Durability
To ensure durability, every write operation (and also transaction status
changes) needs to be persisted on multiple nodes (leader + replicas)
A commit is only considered successful if acknowledged by a
configurable number of nodes
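The acknowledgement count can be sketched as below; "write concern" is borrowed as a generic term for the configurable threshold, not a specific product's API:

```python
# Sketch: a write is durable once enough nodes (leader + replicas)
# have persisted and acknowledged it.
def replicate(replicas, entry, write_concern):
    acks = 0
    for log, is_up in replicas:   # (node's log, whether the node is reachable)
        if is_up:
            log.append(entry)     # stand-in for persisting on that node
            acks += 1
    return acks >= write_concern
```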
In the last few years, there has been a trend towards distributed
databases adopting complex query functionality and transactions
Database trends
Complex queries, joins
Transactional guarantees
Highly available
Scalable
traditional
relational
“NoSQL”
Highly available
Scalable
Transactional guarantees
Complex queries, joins
“NewSQL”
(insert buzzword of choice)
Thank you very much!
Any questions?
Please star ArangoDB on GitHub:
https://github.com/arangodb/arangodb
Participate in ArangoDB’s community survey to win a t-shirt:
https://arangodb.com/community-survey/
#arangodb | jan@arangodb.com
Icons made by Freepik (www.freepik.com) from www.flaticon.com,
licensed by CC 3.0 BY
Links / credits
