Cassandra vs. ScyllaDB: Evolutionary Differences
Felipe Mendes, Technical Director at ScyllaDB
Guilherme Nogueira, Technical Director at ScyllaDB
Cassandra vs. ScyllaDB
Evolutionary Differences
Introductions
Felipe Mendes, Technical Director at ScyllaDB
+ Published Author on Linux and Databases
+ Helps teams solve their most challenging problems
+ Years of experience with Linux and distributed systems
Guilherme Nogueira, Technical Director at ScyllaDB
+ Previously Solutions Architect
+ Publishing
+ Streaming
+ Automotive
Poll
How experienced are you with ScyllaDB/Cassandra?
Similarities and Differences
ScyllaDB & Cassandra – Similarities
+ Distributed Peer-to-Peer
+ Automatic Sharding
+ Global Replication
+ Cassandra Query Language
+ Wide-Column
+ Compatible Ecosystem
+ Anti-Entropy
+ LSM Engine
Beyond Cassandra
Intriguing ScyllaDB Capabilities You Might Have Overlooked
+ Tablets
+ Raft
+ Workload Prioritization
+ Repair-based Operations
+ Incremental Compaction – Low SSTable Amplification
+ Row-based Cache
+ DynamoDB Compatibility
+ Concurrency and Rate-limiters
Tablets
[Diagram: tablets A, B, C replicated across nodes]
+ Abstraction: Smaller table "fragments"
+ Span a contiguous token range
+ Dynamically shrink/expand (geometric avg size)
+ Migrated as a single unit
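A tablet's "contiguous token range" idea can be sketched in a few lines. This is a conceptual illustration only, not ScyllaDB's implementation: it assumes a fixed tablet count and evenly split ranges, whereas real tablets shrink and expand dynamically.

```python
# Conceptual sketch (NOT ScyllaDB's implementation): tablets split the full
# token ring into contiguous fragments that can be migrated independently.

def tablet_for_token(token, tablet_count,
                     ring_min=-2**63, ring_max=2**63 - 1):
    """Map a token to the index of the contiguous tablet that owns it."""
    ring_size = ring_max - ring_min + 1
    tablet_size = ring_size // tablet_count
    index = (token - ring_min) // tablet_size
    return min(index, tablet_count - 1)  # clamp the final partial range
```

For example, with 4 tablets the lowest token lands in tablet 0 and the highest in tablet 3; because each tablet owns one contiguous slice, the whole slice can be moved between nodes as a single unit.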
RAFT for metadata
■ Strongly consistent system.token_metadata
■ Fault tolerant storage
[Diagram: nodes A, B, C bootstrapping; read barriers guard metadata reads]
Workload Prioritization
[Chart: latency with no prioritization vs. with Workload Prioritization]
Repair? Tombstones? Data Resurrection?
+ Worst things a database can do:
+ Lose data
+ Corrupt data
+ Resurrect data
+ Not a problem with ScyllaDB
+ We take your data seriously
+ We know repair is painful
Faster, Safer Node Operations with Repair vs Streaming
Incremental Compaction
[Diagram: size-tiered SSTables A…Z and a…z merged incrementally into SSTable runs A+a, B+b]
+ We observed problems with legacy compaction strategies:
+ STCS has high space amplification (and low write amplification)
+ LCS has high write amplification (and low space amplification)
+ We wanted to benefit from both approaches
+ By borrowing SSTable Runs from LCS
+ And applying them over size-tiers
+ Merely replacing increasingly larger SSTables with increasingly longer SSTable Runs
Designing Access Methods: The RUM Conjecture
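The transient-space benefit above can be sketched numerically. This is an illustrative back-of-the-envelope model under assumed sizes, not a ScyllaDB measurement: classic whole-SSTable compaction must retain every input until the merged output is sealed, while incremental compaction releases exhausted fixed-size fragments of a run as it goes.

```python
def transient_space_gb(input_gb, fragment_gb=None):
    """Rough worst-case extra disk needed while compacting `input_gb` of data.

    Classic STCS keeps all input SSTables until the merged output is sealed,
    so transient space is roughly the total input size. Incremental
    compaction operates on fixed-size fragments of an SSTable run and
    deletes exhausted input fragments as it progresses, so transient space
    stays near a couple of fragments.
    """
    if fragment_gb is None:      # classic, whole-SSTable compaction
        return input_gb
    return 2 * fragment_gb       # rough bound: one input + one output fragment in flight
```

Under this model, compacting a 256GB tier needs about 256GB of headroom classically, but only about 2GB with 1GB run fragments, which is what enables much higher safe disk utilization.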
+ ScyllaDB has a fast cache
+ Efficient access & maintenance
+ Thanks to colocation with the replica and its design
+ Takes care of consistency guarantees
+ Handles complexities of data and query model
Row-based Cache
[Diagram: read path – memtable and row cache in RAM; SSTables on disk]
We Compared ScyllaDB and Memcached and… We Lost?
DynamoDB-compatible API (Alternator)
+ Run DynamoDB-compatible workloads anywhere:
+ on AWS
+ on Google Cloud, Azure, or on-prem
+ DynamoDB Streams, Global Tables
+ Supports Load Balancing
+ ScyllaDB Spark Migrator to move data anywhere
+ Cassandra has no comparable feature
Per-Partition Rate-Limiting
Retaining Goodput with Query Rate Limiting
■ Malicious/misbehaving users
■ Parts of your system going awry due to bugs
The system does not have to satisfy these requests, and they should not affect the whole system too much.
■ A maximum read/write rate can be set for a table.
■ ScyllaDB will reject some operations in an effort to keep the rate of successful requests under the limit.
ALTER TABLE ks.tbl
WITH per_partition_rate_limit = {
'max_writes_per_second': 100,
'max_reads_per_second': 200
};
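The user-visible behaviour of that table option can be illustrated with a toy limiter. This is a simplified fixed-window counter, NOT ScyllaDB's internal algorithm; the class name and structure are made up for illustration.

```python
import time
from collections import defaultdict

class PartitionRateLimiter:
    """Toy per-partition limiter: reject operations above max_per_second.

    A simplified fixed-window counter that only illustrates the
    user-visible behaviour of per_partition_rate_limit -- excess
    requests to a hot partition are rejected, other partitions
    are unaffected.
    """
    def __init__(self, max_per_second):
        self.max_per_second = max_per_second
        self.counts = defaultdict(int)
        self.window = None

    def allow(self, partition_key, now=None):
        now = time.monotonic() if now is None else now
        window = int(now)
        if window != self.window:   # new one-second window: reset counters
            self.window = window
            self.counts.clear()
        self.counts[partition_key] += 1
        return self.counts[partition_key] <= self.max_per_second
```

With a limit of 100 writes per second, the first 100 writes to a hot partition in a window succeed and the rest are rejected, while writes to other partitions keep their full budget.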
Poll
How large are your clusters?
Comparing Performance
Setup
+ DB Nodes
+ 3x AWS i4i.4xlarge (Cassandra 5.0.2, ScyllaDB 2024.2)
+ 16vCPU, 128GB RAM per node
+ 1.5TB used (~45%)
+ Schema: Blob(key<10>, c0<200>, c1<200>, c2<200>, c3<200>, c4<200>)
+ RF=3
+ LOCAL_QUORUM
+ Loader
+ AWS c6in.8xlarge – Rust Latte
+ Implied scheduling – see (pkolaczk/latte#120)
cassandra_latest.yml
Where do we start?
+ Multiple iterations/settings
+ Pick the best and carry out remaining tests
Thread Pools
+ Quite a pain to fine-tune
+ Single funnel
+ Writes and Reads won't scale independently
Cache Workload
+ Key cache: 2G
+ Row cache: 51G
+ Bummer: To use or not? :-(
Hot/Cold Overwrites
+ Hot set: 40M rows
+ Cold set: Remainder
+ 64% hot reads, 16% hot writes – 16% cold reads, 4% cold writes
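The hot/cold mix above can be expressed as a weighted operation picker, which is roughly how a loader chooses the next request. The function below is a hypothetical sketch (not the Latte workload definition itself), using the 64/16/16/4 percentages from the slide.

```python
import random

# The benchmark's hot/cold overwrite mix: 64% hot reads, 16% hot writes,
# 16% cold reads, 4% cold writes (hot set assumed to be 40M rows).
MIX = [("hot_read", 0.64), ("hot_write", 0.16),
       ("cold_read", 0.16), ("cold_write", 0.04)]

def pick_operation(rng):
    """Pick one operation according to the weighted mix."""
    r = rng.random()
    cumulative = 0.0
    for op, weight in MIX:
        cumulative += weight
        if r < cumulative:
            return op
    return MIX[-1][0]  # guard against floating-point rounding
```

Over many iterations the observed operation frequencies converge to the configured percentages, so the database sees a cache-friendly hot set with a steady trickle of cold, disk-bound accesses.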
Scaling
+ ScyllaDB – Decouples topology changes from streaming
+ Add nodes with time ~ 0
+ Streaming happens in parallel
+ Load gradually shifts, via tablet-aware drivers
+ Cassandra – topology changes rely on streaming
+ You add/remove a single node, and wait
+ Then another, and wait…
+ Time grows incrementally
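The contrast above reduces to serial vs. parallel streaming. The model below is a deliberate simplification (it assumes uniform streaming time per node and ignores contention), but it captures why Cassandra's scale-out time grows with node count while ScyllaDB's stays roughly flat.

```python
def cassandra_scale_time(nodes_to_add, stream_hours_per_node):
    """Cassandra adds nodes one at a time: you bootstrap, wait, repeat,
    so elapsed time grows linearly with the number of nodes."""
    return nodes_to_add * stream_hours_per_node

def scylladb_scale_time(nodes_to_add, stream_hours_per_node):
    """With tablets, all new nodes join the topology immediately and
    stream in parallel, so (ideally) elapsed time is about one node's
    streaming time regardless of how many nodes are added."""
    return stream_hours_per_node
```

For example, adding 6 nodes at 2 hours of streaming each takes about 12 hours serially but about 2 hours in parallel under this model.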
Differences
+ Cassandra 4.0 docs say:
+ "To run any Zero Copy streaming benchmark the
stream_throughput_outbound_megabits_per_sec must be set to a really high value"
+ Cassandra 5 docs say:
+ "To run any Zero Copy streaming benchmark the stream_throughput_outbound must be set to a
really high value"
+ Instaclustr says:
+ entire_sstable_stream_throughput_outbound!
+ Hint: Instaclustr is correct.
What Happened?
... and then you’ve got to Cleanup
+ Not needed for ScyllaDB
+ Boom!
[Chart: Bootstrap vs. Bootstrap + Cleanup times]
Oh! By the way...
+ Our Cassandra cluster got inconsistent :-(
+ How to benchmark this?
+ Fixed after a rolling restart
+ Quite annoying
Demo time
● Starting from 3 x i4i.4xlarge
○ 2TB pre-replication dataset, RF=3
○ ~56K ops/s for Cassandra
○ ~200K ops/s for ScyllaDB
● Scaling to:
○ ScyllaDB: + 3 x i4i.32xlarge
○ Cassandra: + 69 x i4i.4xlarge
Scaling to tackle 2M ops/s
[Charts: Bootstrap and Bootstrap + Cleanup times – ScyllaDB 26x faster]
Cassandra scaling time: < 300GB transferred, becomes linear
Cassandra node join process
ScyllaDB Scaling
● Process starts instantly and the node joins the cluster
● Load balancer continuously distributes tablets and load
● Client drivers are notified and route requests according to tablet movements
Costs
+ Throughput
+ Spiky and bounded – Batch, ETL
+ ScyllaDB offers unparalleled throughput
+ Latency sensitive
+ Focus of our testing – Real-time and unpredictable
+ ScyllaDB reacts faster to opportunities
+ Storage dense – Tablets + Advanced Compression allow for up to 90% disk utilization
+ Dictionary-based compression
+ Data governance / Retention requirements
+ ScyllaDB maximizes both disks and cache
Different savings for different workloads
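The storage-density claim translates directly into node count. The sketch below is illustrative arithmetic with assumed numbers: 3.75TB of local NVMe per node, a ~50% utilization ceiling as a common Cassandra headroom guideline for compaction, and the 90% ceiling the slide claims for tablets plus advanced compression.

```python
import math

def nodes_needed(dataset_tb, disk_tb_per_node, max_util):
    """How many nodes a dataset requires at a given safe
    disk-utilization ceiling."""
    return math.ceil(dataset_tb / (disk_tb_per_node * max_util))
```

Under these assumptions, a 100TB post-replication dataset on 3.75TB-disk nodes needs 54 nodes at 50% utilization but only 30 nodes at 90%, which is where the storage-dense savings come from.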
Run df on your Cassandra nodes for a SURPRISE
Observability
Cassandra - DIY, Community or 3rd Party
+ No centralized offer for monitoring
+ Community versions available (Metrics Collector for Apache Cassandra)
+ Lacks 5.0 support
+ Each team comes up with their own set of dashboards and alerts
+ JMX complexity
+ Newer versions expose some metrics in a Prometheus-friendly format, bypassing JMX
+ JMX is still largely needed for Java metrics and maintenance operations
+ Monitoring-as-a-Service varies in coverage and detail
+ Datadog, AxiomOps, Dynatrace, New Relic
ScyllaDB - Out of the box monitoring
+ Easy, out of the box with Scylla Monitoring stack
+ Prometheus + collection rules and alerts
+ Loki + alerts
+ Alert Manager + alerting rules
+ Grafana + powerful dashboards
+ New release = new features = new dashboards
+ Stay up-to-date for the latest and greatest
Poll
How do you monitor your clusters?
Which observability tools do you use, custom-built or 3rd party?
Operational Simplicity
Cassandra setup and tuning
+ sysctl
+ ulimit | hugepages | kernel params
+ disks
+ scheduler | read_ahead_kb | non-rotational
+ JVM ☕
+ Java version | GC tuning | heap size
+ cassandra.yaml
+ main configuration file
ScyllaDB setup and tuning
+ sysctl ✅ automatic
+ scylla_setup
+ memory, scheduling, network
+ disks parameters ✅ automatic
+ iotune, part of scylla_setup
+ best concurrency settings to maximize disk utilization
+ hardware interrupts handling ✅ automatic
+ dedicated vCPUs to handle hardware interrupts via irqbalance
+ allows shards to run without interference
+ jvm ✅ absent
+ scylla.yaml ✅ simple changes
+ usually just customized for enabling features (Alternator, encryption settings)
Repairs, backups
+ ScyllaDB: ScyllaDB Manager – Backup, Restore, Repair
+ Cassandra: Backup/restore – Medusa / K8ssandra; Repairs – Reaper
Vector functionality
Vector
● Both implement VECTOR<float, dim> type
● Approximate Nearest Neighbour (ANN) queries
● Similar CQL syntax: SELECT … ORDER BY col ANN OF vector LIMIT K
However, upon a closer look…
Vector - Cassandra implementation
● Implemented using Storage-Attached Index (SAI) and JVector
○ Shared SSTable and compaction lifecycles
● Built on Indexes
○ Susceptible to same issues
○ Data locality, large partitions
● Shared data paths (storage, chunk cache)
Vector - ScyllaDB Cloud implementation
● External service
● In-memory data
● Rust-based service
● Leverages USearch library
● Fully-managed in the Cloud
Vector - ScyllaDB Cloud implementation
And back to
our demo
Wrap Up
+ Benchmarks are complicated
+ Be wary of sustained latencies on Apache Cassandra
+ Measure sustained response times
+ Our testing has limitations; it is impossible to test everything
+ ScyllaDB outperforms Apache Cassandra 5.0 in every aspect
+ Performance, Scaling, Costs
+ Admin (Tip: check out what the process to upgrade to C*5 looks like ;-)
+ Plus Workload Prioritization, Alternator, frictionless monitoring, no GC, …
+ Both databases evolved on their own paths
+ ScyllaDB focused on maintaining high performance and scalability and on adding vector features, all the while lowering costs
+ Cassandra is built for commodity hardware, aiming at a general-purpose NoSQL database for use cases with broader latency tolerance
Summary
Keep Learning
+ ScyllaDB X Cloud – Fast Scaling. Max Efficiency. Lower Cost. – 8 AM PT - 10 AM PT | 15:00 GMT - 17:00 GMT
+ ScyllaDB University Live – LIVE LEARNING – November 12
+ All Things Performance – ONLINE | OCT 22 + 23, 2025 – p99conf.io
+ scylladb.com/events
Thank you
for joining us today.
@scylladb | slack.scylladb.com
