Comparing Apache Cassandra 4.0, 3.0, and ScyllaDB
Comparing Apache Cassandra 4, 3, and Scylla Open Source
Presenters
Karol Baryła
Karol is a Junior Software Engineer at ScyllaDB. He often
participates in security CTF competitions as a member of team
"Armia Prezesa" where he solves web security and reverse
engineering tasks. He is currently pursuing an MSc in Computer
Science at the University of Warsaw.
Piotr Grabowski
Piotr is a software engineer working at ScyllaDB. From a young age,
he participated in many competitive programming contests. Piotr
holds a BSc in Computer Science from the University of Warsaw and
is now pursuing an MSc. For the past year, he has worked on Kafka
connectors and the Scylla Java Driver.
Agenda
1. Testing methodology
2. Cassandra 3 vs 4
3. Scylla vs Cassandra 4
4. Conclusions
About ScyllaDB
+ The Real-Time Big Data Database
+ Drop-in replacement for Apache Cassandra and Amazon DynamoDB
+ Outstanding performance & low tail latency
+ Open Source, Enterprise and Cloud options
+ Founded by the creators of KVM hypervisor
+ HQs: Palo Alto, CA, USA; Herzelia, Israel; Warsaw, Poland
On July 27th, 2021, the Cassandra team released version 4.0, 6 years after
the release of version 3.0.
Let’s see how much Cassandra improved during those 6 years, and how well it
holds up against Scylla 4.4 now.
Cassandra 4.0 new features
1. Increased speed and scalability
a. Zero Copy Streaming: streams data up to 5x faster
b. Up to 25% faster throughput on reads and writes
2. Support for JDK 11
3. New configuration settings, better security and observability
4. Better compression settings (support for Zstd)
5. A shift to a 12-month release cycle
Methodology
Benchmarked operations
1. Latency at different throughputs
a. Gaussian distribution
b. Disk-intensive distribution
c. Memory-intensive distribution
2. Adding a single new node
3. Doubling cluster size
4. Replacing a node
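The latency results in this deck are reported as P99 (99th percentile) values. As a minimal sketch of what that figure means, the snippet below computes a percentile with the nearest-rank method. The function name and the sample values are hypothetical and purely illustrative; the slides do not say which percentile method the benchmark tooling used.

```python
import math


def percentile_nearest_rank(samples_ms, pct):
    """Return the pct-th percentile of latency samples (nearest-rank method).

    Illustrative helper only: the deck does not state how its P99 values
    were computed.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    # Nearest rank: the smallest value such that at least pct% of the
    # samples are less than or equal to it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Synthetic latencies of 1..100 ms, NOT measurements from the benchmark.
synthetic_latencies_ms = list(range(1, 101))
p99 = percentile_nearest_rank(synthetic_latencies_ms, 99)
print(f"P99 = {p99} ms")
```

With this definition, "P99 < 10 ms" means that at least 99% of requests completed in under 10 ms.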
Benchmarks setup: 3 vs 3 and 4 vs 40
+ 3 vs 3:
+ Cluster nodes: 3x i3.4xlarge (16vCPU, 122GiB RAM, up to 10Gbps network, 2x1.9TB NVMe)
+ Loader nodes: 3x c5n.9xlarge (36vCPU, 96GiB RAM, up to 50Gbps network)
+ 4 vs 40:
+ Scylla cluster: 4x i3.metal (72vCPU, 512GiB RAM, up to 25Gbps network, 8x1.9TB NVMe)
+ Cassandra cluster: 40x i3.4xlarge (16vCPU, 122GiB RAM, up to 10Gbps network, 2x1.9TB NVMe)
+ Loader nodes: 15x c5n.9xlarge (36vCPU, 96GiB RAM, up to 50Gbps network)
+ Java version: JDK 16 (Cassandra 4.0), JDK 8 (Cassandra 3.11)
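To put the 4 vs 40 setup in perspective, the per-node figures above can be summed into cluster totals. The sketch below does only that arithmetic; the `Cluster` class and its field names are hypothetical helpers, and the inputs are the per-node numbers listed on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical container for the per-node specs listed on the slide."""
    name: str
    nodes: int
    vcpus_per_node: int
    ram_gib_per_node: int
    disks_per_node: int
    disk_tb: float

    def total_vcpus(self) -> int:
        return self.nodes * self.vcpus_per_node

    def total_ram_gib(self) -> int:
        return self.nodes * self.ram_gib_per_node

    def total_nvme_tb(self) -> float:
        return self.nodes * self.disks_per_node * self.disk_tb


# Per-node values taken from the "4 vs 40" bullets above.
scylla = Cluster("Scylla, 4x i3.metal", 4, 72, 512, 8, 1.9)
cassandra = Cluster("Cassandra, 40x i3.4xlarge", 40, 16, 122, 2, 1.9)

for cluster in (scylla, cassandra):
    print(
        f"{cluster.name}: {cluster.total_vcpus()} vCPU, "
        f"{cluster.total_ram_gib()} GiB RAM, "
        f"{cluster.total_nvme_tb():.1f} TB NVMe"
    )
```

By these totals the Cassandra cluster has roughly 2.2x the vCPUs, 2.4x the RAM and 2.5x the raw NVMe capacity of the Scylla cluster.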
Cassandra 3 vs 4
Gaussian distribution, mixed workload (two slides)
Disk-intensive, write-only
Disk-intensive, read-only
Memory-intensive, write-only
Memory-intensive, read-only
Adding nodes
Adding a single new node
Doubling cluster size
Replacing node
What causes latency improvements?
+ Cassandra 3 officially supports only Java 8
+ Cassandra 4 officially supports Java 8 and Java 11
+ Java 11 introduced ZGC as an experimental feature
+ ZGC has been considered production ready since Java 15
+ We used Java 16 in the benchmarks to use the full power of ZGC
+ ZGC has extremely short pauses, which reduces Cassandra’s tail latencies
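The Java version matters for how ZGC is switched on. As a rough sketch, the helper below returns the standard HotSpot flags for enabling ZGC on a given Java major version: on Java 11 to 14 the collector is experimental and has to be unlocked first, while from Java 15 the single flag is enough. The function name is hypothetical, and the slides do not show the exact JVM options the presenters passed to Cassandra.

```python
from typing import List


def zgc_jvm_flags(java_major: int) -> List[str]:
    """Return HotSpot flags that enable ZGC for a Java major version.

    Hypothetical helper for illustration; not the presenters' configuration.
    """
    if java_major < 11:
        raise ValueError("ZGC is not available before Java 11")
    if java_major < 15:
        # ZGC was experimental in Java 11-14 and must be unlocked.
        return ["-XX:+UnlockExperimentalVMOptions", "-XX:+UseZGC"]
    # Production-ready since Java 15.
    return ["-XX:+UseZGC"]


for version in (11, 16):
    print(version, " ".join(zgc_jvm_flags(version)))
```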
Quick Poll
How much data do you have under management in your own transactional
database systems?
+ <1 terabyte
+ 1 to 50 terabytes
+ 50-100 terabytes
+ >100 terabytes
Scylla vs Cassandra 4 vs Cassandra 3
Gaussian distribution, mixed workload (two slides)
Disk-intensive, write-only
Disk-intensive, read-only
Memory-intensive, write-only
Memory-intensive, read-only
Adding nodes
Adding a single new node
Doubling cluster size
Replacing node
Major compaction
Scylla on 4 nodes vs Cassandra 4 on 40 nodes
Cost of the clusters
Disk-intensive, mixed workload (two slides)
Increasing cluster size by 25%
Major compaction
Conclusions
Summary of results
+ Cassandra 4 has much better tail latencies than Cassandra 3.
+ Scylla performs 3-4 times better than Cassandra when adding/replacing nodes.
+ Scylla adds 25% capacity to a 40 TB optimized cluster 11x faster than Cassandra 4.0.
+ Scylla performs major compaction 32x faster than Cassandra 4.0.
+ Scylla has 2x-5x better throughput than Cassandra 4.0 on the same 3-node cluster.
+ Scylla has 3x-8x better throughput than Cassandra 4.0 on the same 3-node cluster while P99 <10ms.
+ A 40 TB cluster is 2.5x cheaper with Scylla while providing 42% more throughput under P99 latency of 10 ms.
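The last bullet combines a cost ratio and a throughput gain. As a small worked example, the sketch below multiplies the two to get throughput per unit of cost. It assumes that "2.5x cheaper" means the Cassandra cluster costs 2.5 times as much as the Scylla cluster and that "42% more throughput" means 1.42 times the throughput; the function name is hypothetical.

```python
def throughput_per_cost_gain(cost_ratio: float, throughput_gain_pct: float) -> float:
    """Combine a cost ratio and a throughput gain into one factor.

    cost_ratio: how many times more the other cluster costs (assumed 2.5).
    throughput_gain_pct: extra throughput in percent (assumed 42).
    """
    return cost_ratio * (1 + throughput_gain_pct / 100)


gain = throughput_per_cost_gain(2.5, 42)
print(f"Throughput per unit of cost: {gain:.2f}x")
```

Under those assumptions the Scylla cluster delivers about 3.55x the throughput per unit of cost (2.5 × 1.42).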
Should I upgrade to Scylla or Cassandra 4?
1. Upgrading is hard, so why not upgrade to Scylla right away?
a. Upgrading is problematic anyway: you should make backups, and you risk downtime.
b. Migrating from Cassandra to Scylla is a bit more involved, but the benefits are worth it.
2. Upgrading to Scylla will save you money in the long run.
3. Scylla offers better performance and lower latencies compared to Cassandra 4.
4. Scylla offers exciting new features:
a. Scylla CDC
b. Kubernetes support with Scylla Operator
c. Scylla Cloud
Experience Scylla for Yourself
Download Scylla Open Source: scylladb.com/download
Learn more: https://university.scylladb.com/
Q&A
Stay in touch
karol.baryla@scylladb.com
piotr.grabowski@scylladb.com
Join us at P99 CONF (p99conf.io), October 6-7, 2021
United States
545 Faber Place
Palo Alto, CA 94303
Israel
11 Galgalei Haplada
Herzelia, Israel
www.scylladb.com
@scylladb
Thank You!
