Using ScyllaDB for Extreme Scale Workloads
Tzach Livyatan, VP Product, ScyllaDB
Attila Tóth, Developer Advocate, ScyllaDB
Poll
How often do you scale your database?
Presenters
Attila Tóth, Developer Advocate
+ Working as a software engineer / dev advocate in the data space for 6+ years
+ Lives in Budapest, Hungary
Tzach Livyatan, VP Product
+ Working as a product manager for ages.
+ Lives in Tel Aviv, Israel
Agenda
+ Why ScyllaDB?
+ Scylla Use Cases
+ Design For High Throughput and Low Latency
+ Coming Soon
Why ScyllaDB?
Best High Availability in the industry
Best Disaster Recovery in the industry
Best scalability in the industry
Best Price/Performance in the industry
Auto-tune: out-of-the-box performance
Compatible with Cassandra & DynamoDB
The power of Cassandra at the speed of Redis with the usability of DynamoDB
No Lock-in
Open Source Software
400+ Gamechangers Leverage ScyllaDB
NoSQL - By Availability vs Consistency
Pick two: Consistency, Availability, Partition Tolerance
Or use a more granular model, like PACELC
NoSQL - By data model
Key-value (simple DB), Document store, Wide Column, Graph store
What is important for data-intensive applications?
High Throughput, Low Latency, Predictable Cost
Predictable performance at scale
+ Low Latency: our low-level design plus adaptive capabilities keep P99s predictably low.
+ High Throughput: sustain millions of ops/sec with low P99s. No item or partition size limits, no throttling down your workloads.
+ Global Scale: operate at a global scale with high availability, fewer nodes and reduced administration.
ScyllaDB Architecture
Active/active, replicated, auto-sharded
Tunable, Eventual Consistency
Figure: applications issuing requests at CL = Local Quorum and at CL = One
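To make the two consistency levels concrete, here is a minimal Python sketch of the replica arithmetic, assuming a single data center with replication factor 3. It only models the two levels named on the slide (ONE and LOCAL_QUORUM, the CQL spelling of Local Quorum) and ignores failures, hinted handoff and repair; the function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a majority of `replication_factor` copies."""
    return replication_factor // 2 + 1


def replicas_required(consistency_level: str, local_rf: int) -> int:
    """Local-DC replicas that must answer at the given consistency level.

    Only the two levels shown on the slide are modeled (a simplification).
    """
    if consistency_level == "ONE":
        return 1
    if consistency_level == "LOCAL_QUORUM":
        return quorum(local_rf)
    raise ValueError(f"not modeled in this sketch: {consistency_level}")


def reads_see_latest_write(write_cl: str, read_cl: str, local_rf: int) -> bool:
    """True if every read replica set must overlap every write replica set (R + W > RF)."""
    w = replicas_required(write_cl, local_rf)
    r = replicas_required(read_cl, local_rf)
    return r + w > local_rf


if __name__ == "__main__":
    rf = 3  # assumed replication factor in the local data center
    for write_cl, read_cl in [("LOCAL_QUORUM", "LOCAL_QUORUM"), ("ONE", "ONE")]:
        print(write_cl, read_cl, reads_see_latest_write(write_cl, read_cl, rf))
```

The usual rule of thumb falls out directly: LOCAL_QUORUM for both reads and writes guarantees overlap within the data center, while CL = One trades that guarantee for lower latency.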
Scylla Architecture
External cache vs. ScyllaDB cache
ScyllaDB embedded caching
Enable/disable cache per table:
CREATE TABLE caching (
    k int PRIMARY KEY,
    v1 int,
    v2 int
) WITH caching = {'enabled': 'true'};  -- use 'false' to turn the cache off for this table
Disable cache per query:
SELECT * FROM users BYPASS CACHE;
SELECT name FROM users WHERE userid IN (199, 200, 207) BYPASS CACHE;
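The semantics of the two knobs can be modeled with a toy read-through cache. This Python sketch is hypothetical and not ScyllaDB's implementation: a dict stands in for SSTables, `enabled` plays the role of the per-table setting, and `bypass_cache=True` mimics BYPASS CACHE, which neither serves the read from the cache nor populates it, so a one-off scan does not evict hot rows.

```python
from collections import OrderedDict


class ToyRowCache:
    """Toy read-through LRU cache in front of a dict that stands in for SSTables.

    Hypothetical model: it mimics the documented intent of BYPASS CACHE
    (skip the cache on read and do not populate it), not ScyllaDB internals.
    """

    def __init__(self, storage: dict, capacity: int = 2, enabled: bool = True):
        self.storage = storage      # stand-in for on-disk data
        self.capacity = capacity
        self.enabled = enabled      # like the per-table caching option
        self.rows = OrderedDict()   # cached rows, least recently used first
        self.disk_reads = 0

    def read(self, key, bypass_cache: bool = False):
        use_cache = self.enabled and not bypass_cache
        if use_cache and key in self.rows:
            self.rows.move_to_end(key)  # refresh LRU position on a hit
            return self.rows[key]
        self.disk_reads += 1
        value = self.storage[key]
        if use_cache:  # only populate when the cache is in play for this read
            self.rows[key] = value
            if len(self.rows) > self.capacity:
                self.rows.popitem(last=False)  # evict the least recently used row
        return value


if __name__ == "__main__":
    cache = ToyRowCache({1: "hot", 199: "a", 200: "b", 207: "c"})
    cache.read(1)                       # the hot row is now cached
    for userid in (199, 200, 207):      # one-off scan, like ... BYPASS CACHE
        cache.read(userid, bypass_cache=True)
    print(list(cache.rows), cache.disk_reads)
```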
ScyllaDB vs. DynamoDB: 1/5th the cost, 20x higher throughput
ScyllaDB vs. Google Bigtable: 1/5th the cost, 26x higher throughput
ScyllaDB vs. Cassandra: 5x higher throughput, 2-20x lower latency
What a Difference a Database Makes
From Redis + Elasticsearch to ScyllaDB
<1ms P99
Zero downtime
TCO
“ScyllaDB provides a baseline that simplifies the whole <config> process and reduces risk and anxiety. Once in production, rather than rely on constant human intervention, ScyllaDB becomes self-tuning, dynamically adapting to real-world workloads.”
- Mark Smith, Discord
size, fewer nodes
8x throughput, ms P99
operational complexity
“This not only reduced TCO, but also reduced the pain that the database
engineering team was taking to actually maintain the cluster in a healthy state.”
– Niraj Kothari, Dir. Platforms Engineering
55 C* nodes to 6!
80% EC2 costs
5x growth in clusters
TCO
Speed of Redis
From Redis to ScyllaDB for Data Stores, Fraud Detection, Ad Targeting
Scalability
962 C* nodes to 78
60% TCO
95% latency
“By moving to ScyllaDB Enterprise software running on AWS EC2 infrastructure and on-premises, Comcast improved P99 latency by more than 95% and were able to rip out a UI cache layer”
<1ms avg Latency
From Redis to Cassandra to ScyllaDB Cloud
4-8 ms P99
Fault Tolerance
Real-time workloads on 3 AWS nodes
Out-of-order solved
Process all Zillow data in <1 day with no performance hit to real-time
“No one even realizes we are processing the
entirety of Zillow’s property and listings data.”
– Dan Podhola, Principal Engineer
“It was comparable to the solution with Kafka, and we didn’t have to
add, manage, and maintain another data product in our ecosystem.”
– Daniel Belenky, Palo Alto Networks
operational complexity
operational costs
(for 1,000+ dbs!)
app throughput
Demo Time!
1M operations/sec
+ GitHub: https://github.com/scylladb/1m-ops-demo
+ You can do the demo yourself with:
+ ScyllaDB Cloud or
+ ScyllaDB Enterprise - running under your own AWS account
+ Add your AWS credentials in variables.tf
+ Then run Terraform
+ Config:
+ Loader instances: 3 (i4i.8xlarge)
+ ScyllaDB nodes: 3 (i4i.8xlarge)
+ us-east-1
Clone the repo!
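A quick back-of-the-envelope split of the demo load, as a Python sketch. Two numbers are assumptions not stated on the slide: an i4i.8xlarge has 32 vCPUs (taken here as one shard per vCPU, ignoring any cores set aside for interrupt handling), and the workload is write-only with replication factor 3, so every client write lands on all three nodes.

```python
# Back-of-the-envelope split of the 1M ops/sec demo.
# Assumptions (not from the slide): 32 vCPUs per i4i.8xlarge with one shard
# per vCPU, and a write-only workload with replication factor 3.
TARGET_OPS_PER_SEC = 1_000_000
SCYLLA_NODES = 3
VCPUS_PER_NODE = 32
REPLICATION_FACTOR = 3

# Each client write is applied on REPLICATION_FACTOR replicas.
replica_writes_per_node = TARGET_OPS_PER_SEC * REPLICATION_FACTOR / SCYLLA_NODES
replica_writes_per_shard = replica_writes_per_node / VCPUS_PER_NODE

print(f"replica writes per node:  {replica_writes_per_node:,.0f}/sec")
print(f"replica writes per shard: {replica_writes_per_shard:,.0f}/sec")
```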
ScyllaDB Design
Horizontal & Vertical Scaling
Deep Technical Advancements
+ Built in C++ (no Java overhead)
+ System and DC Aware
+ Sharding Per Core
+ Shard-Aware Drivers
+ Auto-Tuning
Unique Close-to-Metal Architecture: Network, Processor, NUMA, Storage
+ 1000s of Nodes per Cluster
+ 2000 Clusters
+ K8S Deployment
+ 60TB per Node
+ 256 Cores per Node
+ 1B Operations per Second
ScyllaDB Design Decisions
1 C++ instead of Java
2 All Things Async
3 Shard per Core
4 Unified Cache
5 I/O Scheduler
6 Autonomous
ScyllaDB Design Decisions: 3 Shard per Core
Figure: threads vs. shards
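With shard per core, every partition is owned by exactly one core, which processes it without locks, and shard-aware drivers can route a request straight to that core. The Python sketch below shows the idea under simplifying assumptions: blake2b stands in for the Murmur3 partitioner, and the token-to-shard step is a plain scaling of the 64-bit token range onto the shard count, not ScyllaDB's exact code.

```python
import hashlib


def token_for(partition_key: bytes) -> int:
    """Signed 64-bit token for a partition key (blake2b stands in for Murmur3)."""
    digest = hashlib.blake2b(partition_key, digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def shard_for(token: int, nr_shards: int) -> int:
    """Owning shard (core) of a token.

    Shift the signed token into [0, 2**64) and scale it onto [0, nr_shards),
    so each shard owns one contiguous slice of the token range.
    """
    unsigned = token + 2**63
    return (unsigned * nr_shards) >> 64


if __name__ == "__main__":
    from collections import Counter

    nr_shards = 8  # e.g. one shard per core on an 8-core node
    keys = [f"user-{i}".encode() for i in range(10_000)]
    per_shard = Counter(shard_for(token_for(k), nr_shards) for k in keys)
    print(sorted(per_shard.items()))
```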
ScyllaDB Design Decisions: 4 Unified Cache
Legacy NoSQL: key cache, row cache, on-heap / off-heap, Linux page cache, SSTables (complex tuning)
ScyllaDB: unified cache, SSTables
ScyllaDB Design Decisions: 4 Unified Cache (continued)
Figure: a page fault, step by step (app thread, kernel, SSD): page fault, suspend thread, initiate I/O, context switch; when the I/O completes: interrupt, context switch, map page, resume thread.
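The page-fault sequence above blocks the whole thread while the SSD works. ScyllaDB's "All Things Async" approach is built on Seastar futures in C++; as a rough analogy only, this Python asyncio sketch shows one event loop keeping several simulated 100 ms reads in flight at once instead of waiting for each in turn. `read_from_disk` is a hypothetical stand-in that just sleeps.

```python
import asyncio
import time


async def read_from_disk(key: str, latency_s: float = 0.1) -> str:
    # Simulated I/O: awaiting yields the loop to other tasks instead of
    # blocking it, unlike a page fault, which suspends the whole thread.
    await asyncio.sleep(latency_s)
    return f"value-of-{key}"


async def serve(keys):
    # Issue all reads at once; a single event loop keeps them all in flight.
    return await asyncio.gather(*(read_from_disk(k) for k in keys))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(serve(["a", "b", "c", "d", "e"]))
    elapsed = time.perf_counter() - start
    print(results)
    print(f"5 reads of ~0.1 s each finished in {elapsed:.2f} s")
```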
ScyllaDB Design Decisions: 5 I/O Scheduler
Figure: query, commitlog and compaction I/O each go into their own queue; a userspace I/O scheduler dispatches from the queues to the disk.
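A toy model of the userspace I/O scheduler in Python: one queue per class, and the next request always comes from the backlogged class that has used the least disk time relative to its shares. The class names, share values and unit request cost are hypothetical, and the sketch skips what a real scheduler must handle (for example, a class that was idle for a long time catching up in a burst).

```python
from collections import deque
from fractions import Fraction


class ToyIOScheduler:
    """Toy userspace I/O scheduler: one queue per class, proportional shares.

    The next request always comes from the non-empty class with the lowest
    consumed cost / shares, so backlogged classes get disk time in proportion
    to their shares. Fractions keep the bookkeeping exact.
    """

    def __init__(self, shares: dict):
        self.shares = shares
        self.queues = {name: deque() for name in shares}
        self.consumed = {name: Fraction(0) for name in shares}

    def submit(self, io_class: str, request, cost: int = 1) -> None:
        self.queues[io_class].append((request, cost))

    def dispatch(self):
        """Pop the next (io_class, request) to send to the disk, or None if idle."""
        ready = [name for name, queue in self.queues.items() if queue]
        if not ready:
            return None
        chosen = min(ready, key=lambda name: self.consumed[name])
        request, cost = self.queues[chosen].popleft()
        self.consumed[chosen] += Fraction(cost, self.shares[chosen])
        return chosen, request


if __name__ == "__main__":
    # Hypothetical shares: queries get twice the disk time of the other classes.
    sched = ToyIOScheduler({"query": 200, "commitlog": 100, "compaction": 100})
    for i in range(100):
        for io_class in ("query", "commitlog", "compaction"):
            sched.submit(io_class, f"{io_class}-{i}")
    dispatched = [sched.dispatch()[0] for _ in range(40)]
    print({c: dispatched.count(c) for c in sched.shares})
```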
ScyllaDB Design Decisions: 6 Autonomous
Figure: the Seastar scheduler divides CPU, SSD and WAN bandwidth among memtable flushes, compaction, queries, repair and the commitlog. A compaction backlog monitor and a memory monitor feed back into it to adjust the priority of compaction and memtable flushing.
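The monitors that adjust priority describe a feedback loop: the bigger the compaction backlog, the more shares compaction gets. Here is a minimal Python sketch of such a rule as a piecewise-linear function, assuming a normalized backlog value and invented control points; it is not ScyllaDB's actual controller.

```python
def compaction_shares(backlog: float,
                      points=((0.0, 50), (1.0, 200), (2.0, 1000))) -> float:
    """Scheduler shares for compaction as a piecewise-linear function of backlog.

    `backlog` is a hypothetical normalized measure (0 = nothing to compact).
    The control points are invented: with little backlog compaction stays in
    the background; as backlog grows it gets more shares so it can catch up.
    Inputs outside the table are clamped to the end points.
    """
    if backlog <= points[0][0]:
        return float(points[0][1])
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if backlog <= x1:
            return y0 + (y1 - y0) * (backlog - x0) / (x1 - x0)
    return float(points[-1][1])


if __name__ == "__main__":
    for b in (0.0, 0.5, 1.0, 1.5, 3.0):
        print(b, compaction_shares(b))
```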
https://play.instruqt.com/scylladb/invite/fwtkeaxygujs
Coming Soon!
Tablets
Resharding is cheap.
SSTables split at tablet boundary.
Reassign tablets to shards (logical operation).
Implementation - Metadata
+ Introduce a new layer of indirection - the tablets table
+ Each table has its own token range to node mapping
+ Mapping can change independently of node addition and removal
+ Different tables can have different tablet counts
+ Managed by Raft
Figure: a query looks up its token in system.tablets to find the tablet's replica set
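The metadata idea, one token-range-to-replica mapping per table, can be sketched as a sorted list of range boundaries plus a replica set per tablet. The `TabletMap` class below is hypothetical: it does not reproduce the schema of the tablets table or its Raft management, only the lookup and the fact that moving one tablet replica touches one entry and nothing else.

```python
import bisect
from dataclasses import dataclass

MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1


@dataclass
class TabletMap:
    """Hypothetical per-table tablet map: token range -> replica set.

    last_tokens[i] is the inclusive upper token of tablet i (sorted, ending at
    MAX_TOKEN); tablet 0 starts at MIN_TOKEN. replicas[i] holds (node, shard)
    pairs for tablet i.
    """
    last_tokens: list
    replicas: list

    def tablet_for(self, token: int) -> int:
        # First tablet whose upper bound is >= token.
        return bisect.bisect_left(self.last_tokens, token)

    def replicas_for(self, token: int) -> tuple:
        return self.replicas[self.tablet_for(token)]

    def migrate(self, tablet: int, src: tuple, dst: tuple) -> None:
        """Move one replica of one tablet; every other entry stays untouched."""
        self.replicas[tablet] = tuple(dst if r == src else r for r in self.replicas[tablet])


if __name__ == "__main__":
    # Two tables with different tablet counts (RF=2 for brevity).
    users = TabletMap([-1, MAX_TOKEN],
                      [(("n1", 0), ("n2", 1)), (("n2", 0), ("n3", 1))])
    events = TabletMap([-2**62 - 1, -1, 2**62 - 1, MAX_TOKEN],
                       [(("n1", 1), ("n3", 0))] * 4)
    users.migrate(1, ("n3", 1), ("n4", 0))  # e.g. after adding node n4
    print(users.replicas_for(42), events.tablet_for(0))
```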
Implementation - Data Path
+ Each tablet replica is isolated into its own memtable + SSTables
+ Forms its own little Log-Structured Merge Tree
+ With its own compaction
+ Can be migrated as a unit
+ Migration: copy the unit
+ Cleanup: delete the unit
+ Split/merge as the table grows/shrinks
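Because each tablet is its own unit, a growing table can be handled by splitting tablets. Here is a Python sketch of one possible split rule, assuming sizes are known per tablet and data is spread evenly across a tablet's token range (both simplifications); merging would be the reverse. The function name and size threshold are hypothetical.

```python
MIN_TOKEN = -2**63


def split_oversized(last_tokens: list, sizes: list, max_size: float):
    """Split every tablet larger than max_size into two halves of its token range.

    Hypothetical policy: split at the midpoint token and assume data is spread
    evenly, so each half inherits half the size. last_tokens[i] is the inclusive
    upper token of tablet i; tablet 0 starts at MIN_TOKEN.
    Returns the new (last_tokens, sizes).
    """
    new_last, new_sizes = [], []
    first = MIN_TOKEN
    for last, size in zip(last_tokens, sizes):
        if size > max_size and last > first:
            mid = (first + last) // 2          # left half: [first, mid]
            new_last += [mid, last]            # right half: [mid + 1, last]
            new_sizes += [size / 2, size / 2]
        else:
            new_last.append(last)
            new_sizes.append(size)
        first = last + 1
    return new_last, new_sizes


if __name__ == "__main__":
    # Two tablets; the first has grown past the (hypothetical) 5 GB limit.
    print(split_oversized([-1, 2**63 - 1], [10.0, 3.0], max_size=5.0))
```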
Implementation - Load Balancer
+ Hosted on one node
+ But can be migrated freely if the node is down
+ Synchronized via Raft
+ Collects statistics on tables and tablets
+ Migrates to balance space
+ Evacuates nodes to decommission
+ Migrates to balance CPU load
+ Rebuilds and repairs
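Here is a toy version of two of the balancer's jobs from the list above, evacuating a node and evening out load, as a greedy Python loop. Assumptions: one replica per tablet, and tablet count as a stand-in for space and CPU load; the real balancer works from collected statistics and is not reproduced here.

```python
from collections import Counter


def balance(placement: dict, nodes: list, drain=frozenset()):
    """Greedy toy tablet balancer; returns (moves, new_placement).

    placement maps tablet id -> node holding it (one replica per tablet, for
    brevity). Tablets on nodes in `drain` are evacuated first, as for a
    decommission; then the most loaded node hands a tablet to the least loaded
    one until their tablet counts differ by at most one.
    """
    placement = dict(placement)
    targets = [n for n in nodes if n not in drain]
    moves = []
    while True:
        load = Counter({n: 0 for n in nodes})
        load.update(placement.values())
        dst = min(targets, key=lambda n: load[n])
        draining = [t for t, n in placement.items() if n in drain]
        if draining:
            tablet = draining[0]
        else:
            src = max(targets, key=lambda n: load[n])
            if load[src] - load[dst] <= 1:
                return moves, placement
            tablet = next(t for t, n in placement.items() if n == src)
        moves.append((tablet, placement[tablet], dst))
        placement[tablet] = dst


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3"]
    moves, placement = balance({t: "n1" for t in range(6)}, nodes)
    print(moves)
    moves, placement = balance(placement, nodes, drain=frozenset({"n3"}))
    print(moves, placement)
```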
Demo Time!
Tablets
https://play.instruqt.com/scylladb/invite/fwtkeaxygujs
Upcoming: Tablet File-based streaming
+ Similar to Cassandra Zero-copy Streaming
+ But better ;-)
+ Tablets are always owned by the replica
+ Simply copy, done.
+ Up to 75% faster than Open Source for Streaming
Performance Improvements
+ Up to 1.5x Higher Throughput than Open Source
+ Up to 35% Lower Latencies (mean and P99)
Network (RPC) Compression Improvements
+ Improved network compression for RPC traffic
+ Option of Zstd instead of LZ4
+ Periodically trained dictionaries, instead of compressing each message on its own
+ See Łukasz Paszkowski's talk "Cheating the Cloud: 50% Savings with Compression Dictionaries" at P99 CONF
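Why a shared dictionary helps with small RPC messages can be shown with Python's standard zlib module, which accepts a preset dictionary (`zdict`). This is a stand-in: the feature described above uses Zstd with periodically trained dictionaries, while here the "dictionary" is just a sample of earlier, hypothetical messages. Each message is compressed independently, with and without the dictionary.

```python
import zlib
from typing import Optional

# Hypothetical small RPC messages: each alone is too short to compress well.
messages = [
    f'{{"verb":"mutation","keyspace":"ks1","table":"users","key":{i},"value":"v{i}"}}'.encode()
    for i in range(200)
]

# Stand-in for a trained dictionary: a sample of earlier traffic.
dictionary = b"".join(messages[:20])


def compress(msg: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress one message on its own, optionally with a preset dictionary."""
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return comp.compress(msg) + comp.flush()


def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Inverse of compress(); the same dictionary must be supplied."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    later = messages[20:]  # traffic the dictionary was not built from
    raw = sum(len(m) for m in later)
    plain = sum(len(compress(m)) for m in later)
    with_dict = sum(len(compress(m, dictionary)) for m in later)
    print(f"raw {raw} B, per-message zlib {plain} B, with dictionary {with_dict} B")
```

Both ends must hold the identical dictionary, which is why such a scheme needs a way to distribute it and refresh it as traffic changes.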
Serverless (VM-based)
Typeless, Sizeless, Limitless
Consistent metadata + Elasticity = Much More
Poll
How long does it take for you to scale
your existing database?
Keep Learning
Visit our blog for more on ScyllaDB engineering: scylladb.com/category/engineering
ONLINE | MARCH 11 + 12, 2025
CALL FOR SPEAKERS
Thank you
for joining us today.
@scylladb scylladb/
slack.scylladb.com
@scylladb company/scylladb/
scylladb/
