4 use cases for C* to Scylla

Cassandra -> Scylla
4 Key Use Cases Where Users will See Immediate Benefit
Greg Matza

Which C* Use Cases will See Immediate Benefit With Scylla?
+ Is your Dataset > 10 TB?
+ Do you have > 40k read ops/sec?
+ Is your Application sensitive to Long-tail latency?
+ Do you have a Caching layer in front of Cassandra?

Scylla Supports Huge Datasets
- Amazon’s new i3en instances have up to 60 TB of NVMe storage
- Scylla can use all of this disk, with benchmarks of:
  - 15 hours to add a 45 TB node (10 hrs ingestion + 5 hrs compaction)
  - 6 hours to stream a new node, splitting one 45 TB node into two 22.5 TB nodes (4 hrs streaming + 2 hrs cleanup)
  - >1 million ops per second per node with an 80% cache miss rate and 99p latency stable at 2 ms
  - Detailed benchmark data here and here
- Cassandra is typically limited to 1-2 TB per node.

Scylla Supports Huge Datasets
- Scenario:
  - 40 TB raw data, RF=3, TWCS
  - 2 datacenters
  - Total data stored is 40 TB * 3 replicas * 2 DCs = 240 TB
            TB/node   Node calculation               # nodes   Node type    Annual cost per node   Total cost
Cassandra   1.5       240 TB / 1.5 TB = 160 nodes    160       i3.2xl       $5k                    $800k
Scylla      45        240 TB / 45 TB = 5.3 nodes     6         i3en.24xl    $50k                   $300k
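
The sizing arithmetic in this scenario can be reproduced with a short Python sketch. The function names are illustrative, and the per-node capacities and annual costs are simply the figures quoted on the slide, not independently verified prices.

```python
import math

def nodes_for_storage(raw_tb, replication_factor, datacenters, tb_per_node):
    """Return (total TB stored, node count) for a storage-bound cluster."""
    total_tb = raw_tb * replication_factor * datacenters
    # Round up: a fractional node still means buying a whole node.
    return total_tb, math.ceil(total_tb / tb_per_node)

# Scenario figures from the slide: 40 TB raw, RF=3, 2 datacenters.
total_tb, cassandra_nodes = nodes_for_storage(40, 3, 2, tb_per_node=1.5)
_, scylla_nodes = nodes_for_storage(40, 3, 2, tb_per_node=45)

# Annual cost per node as quoted on the slide (assumed, in USD).
cassandra_cost = cassandra_nodes * 5_000
scylla_cost = scylla_nodes * 50_000

print(f"Total stored: {total_tb} TB")
print(f"Cassandra: {cassandra_nodes} nodes, ${cassandra_cost:,}/yr")
print(f"Scylla:    {scylla_nodes} nodes, ${scylla_cost:,}/yr")
```

Changing `tb_per_node` or the replication settings shows how quickly node count, and therefore cost, grows when per-node capacity is the limiting factor.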

Scylla Supports Heavy Reads
- On i3 hardware, Scylla handles read throughput at approximately the same rate as write throughput
  - Scylla can handle sustained read or write throughput of 10,000 ops/core (1 KB payload, NVMe disk)
  - Throughput scales linearly with the number of cores
- Cassandra typically has per-node limitations on read throughput
  - Cassandra can handle sustained read throughput of about 20,000 ops/node (1 KB payload, NVMe disk)
  - Larger core counts, faster networking, or better I/O do not significantly increase throughput

Scylla Supports Heavy Reads
- Scenario:
  - 80k reads/sec + 20k writes/sec, 1 TB raw data, RF=3
  - Both Scylla and Cassandra are running on i3.4xlarge
  - Given RF=3, each application-layer operation is counted as 3 ops against the cluster, as it will act on all 3 replicas
  - Total operations are (80k reads * 3 replicas) + (20k writes * 3 replicas) = 300k ops/sec (240k reads + 60k writes)
            Limiting factor per node   Node calculation    # of nodes   Annual cost per node   Total cost
Cassandra   20k reads                  240k/20k = 12       12           $5k                    $60k
Scylla      80k ops                    300k/80k = 3.75     4            $5k                    $20k
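
The throughput-bound sizing follows the same pattern. This is a minimal Python sketch with illustrative helper names. The per-node limits (20k reads for Cassandra, 80k total ops for Scylla) and the $5k annual node cost are the slide's figures, taken as given.

```python
import math

def replica_level_ops(app_reads, app_writes, replication_factor):
    """Convert application-level ops/sec into replica-level ops/sec."""
    return app_reads * replication_factor, app_writes * replication_factor

# Scenario: 80k reads/sec + 20k writes/sec at RF=3.
replica_reads, replica_writes = replica_level_ops(80_000, 20_000, 3)
total_ops = replica_reads + replica_writes

# Cassandra is sized on reads only (slide figure: 20k reads/node).
cassandra_nodes = math.ceil(replica_reads / 20_000)
# Scylla is sized on total ops (slide figure: 80k ops/node on i3.4xlarge).
scylla_nodes = math.ceil(total_ops / 80_000)

cost_per_node = 5_000  # annual USD per node, as quoted on the slide
print(f"Replica-level load: {total_ops} ops/sec")
print(f"Cassandra: {cassandra_nodes} nodes, ${cassandra_nodes * cost_per_node:,}/yr")
print(f"Scylla:    {scylla_nodes} nodes, ${scylla_nodes * cost_per_node:,}/yr")
```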

Long-tail latency sensitive
- Due to garbage collection, compaction, repair, and other operations, Cassandra typically has tightly bounded average latency, but its 95p or 99p latencies show regular 5x to 20x spikes
- Scylla has no garbage collection, includes its own on-board caching, and actively manages its own I/O and CPU scheduling. This, among other things, allows it to deliver tightly bounded 95p, 99p, or even max latency.
  - I/O and CPU scheduling actively manage tasks in a prioritized manner, so background tasks like compaction or repair are almost always(*) queued behind queries and writes.
  - *Almost always, because we do have a backpressure mechanism: if you are in danger of losing your node due to OOM or out-of-disk, we will prioritize the tasks needed to save the node above queries.
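
The prioritization idea can be illustrated with a toy priority queue in Python. This is only a sketch of the concept described above, not Scylla's actual scheduler. The class, the priority constants, and the `node_at_risk` flag are all hypothetical names introduced for illustration.

```python
import heapq
from itertools import count

# Hypothetical priority classes; a lower number runs first.
FOREGROUND = 0   # queries and writes
BACKGROUND = 1   # compaction, repair

class ToyScheduler:
    def __init__(self):
        self._queue = []
        self._seq = count()  # tie-breaker keeps FIFO order within a class

    def submit(self, name, priority, node_at_risk=False):
        # Backpressure: if the node is in danger (e.g. nearly out of disk),
        # background work needed to save it is promoted above foreground work.
        if node_at_risk and priority == BACKGROUND:
            priority = FOREGROUND - 1
        heapq.heappush(self._queue, (priority, next(self._seq), name))

    def run_all(self):
        """Drain the queue and return task names in execution order."""
        order = []
        while self._queue:
            _, _, name = heapq.heappop(self._queue)
            order.append(name)
        return order

sched = ToyScheduler()
sched.submit("compaction", BACKGROUND)
sched.submit("read", FOREGROUND)
sched.submit("write", FOREGROUND)
normal_order = sched.run_all()      # foreground work runs first

sched.submit("read", FOREGROUND)
sched.submit("compaction", BACKGROUND, node_at_risk=True)
at_risk_order = sched.run_all()     # node-saving work jumps the queue

print(normal_order)
print(at_risk_order)
```

In the normal case the compaction task waits even though it was submitted first. With the at-risk flag set, it runs ahead of the read.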

Long-tail latency sensitive
- Scenario:
  - “Customer 360” use case
  - Customer: Top 3 North American Telecom
  - 3 nodes, 8 vCPU/64 GB RAM
  - 1.4 TB dataset
  - 20k reads/sec
  - Test run by a long-time C* DBA as part of a Scylla vs. Cassandra POC
[Figure: Read latencies, 99p. Cassandra’s latency vs. Scylla’s latency]

Scylla Does Not Require a Caching Layer
- Read-heavy or latency-sensitive use cases with Cassandra usually require Redis, Memcached, or another caching layer to meet those requirements
- Scylla has a built-in caching layer, allowing for simpler application-side logic and lower node counts
  - no cache invalidation issues
  - no cold cache issues
  - no try/catch application logic on cache misses
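
To make the application-side difference concrete, here is a minimal Python sketch of the cache-aside logic that an external caching layer typically requires, next to the direct read that replaces it. Plain dictionaries stand in for the cache and the database. The function names are illustrative and do not correspond to any real driver API.

```python
def cached_read(key, cache, db):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    try:
        return cache[key]              # cache hit
    except KeyError:                   # cache miss
        value = db[key]
        cache[key] = value             # populate the cache for the next reader
        return value

def cached_write(key, value, cache, db):
    """Write to the database and invalidate the cached copy."""
    db[key] = value
    cache.pop(key, None)               # skip this and readers see stale data

def direct_read(key, db):
    """With a built-in cache, the application just reads from the database."""
    return db[key]

db = {"user:1": "alice"}
cache = {}

first = cached_read("user:1", cache, db)    # miss, then populates the cache
cached_write("user:1", "bob", cache, db)    # must remember to invalidate
second = cached_read("user:1", cache, db)   # miss again after invalidation
third = direct_read("user:1", db)           # no cache bookkeeping at all

print(first, second, third)
```

The miss handling and the invalidation step in `cached_write` are the parts that go away when the database manages its own cache.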

Scylla Does Not Require a Caching Layer
- Scenario:
  - Comcast needed <10 ms max latency at 200k ops/sec with a balanced read/write mix
  - It was implemented in Cassandra with 60 nodes of Varnish (cache) + 600 nodes of Cassandra
  - Scylla replaced the entire infrastructure with only 60 nodes
  - Case: https://www.scylladb.com/tech-talk/comcast-grow-small-get-big-experiences-with-scylla/
                 Cassandra                       Scylla
Version          Apache Cassandra 2.1.8          Scylla Enterprise 2018.1.11
Data layer       600 nodes i3.2xlarge            60 nodes i3.2xlarge
Caching layer    60 nodes Varnish m4.4xlarge     No caching
OpEx             $3.7 million/yr                 $328k/yr
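
As a quick check on the scale of the change, the reduction ratios implied by these figures can be computed directly. This Python sketch uses only the numbers quoted above. The variable names are illustrative.

```python
# Node counts and OpEx as quoted in the comparison above.
cassandra_total_nodes = 600 + 60      # data layer + Varnish caching layer
scylla_total_nodes = 60               # no separate caching layer

cassandra_opex = 3_700_000            # USD per year
scylla_opex = 328_000                 # USD per year

node_ratio = cassandra_total_nodes / scylla_total_nodes
opex_ratio = cassandra_opex / scylla_opex
opex_saving_pct = 100 * (1 - scylla_opex / cassandra_opex)

print(f"Node reduction: {node_ratio:.0f}x")
print(f"OpEx reduction: {opex_ratio:.1f}x ({opex_saving_pct:.0f}% lower)")
```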
