ScyllaDB @ Apache BigData, May 2016
Cloudius Systems presents:
NoSQL goes NATIVE
Don Marti (@dmarti), Tzach Livyatan (@TzachL)
Scylla: A new Open Source NoSQL Database
Capable of 1,800,000 operations per second PER NODE
With predictable, low latencies
Compatible with Apache Cassandra
BACKGROUND
● SQL: structured, no scale
● Document store: no structure, some scale
● Column store: some structure, scale out, awesome HA/DR
● Key-value: simple, scale, not a real DB
SOLUTION: SCYLLA DB
THE POWER OF CASSANDRA AT THE SPEED OF REDIS
AWESOME REDUNDANCY & HA
+ Multi DC
+ Spark
+ CQL
+ Auto sharding
+ Wide rows
RESULTS: THROUGHPUT/SCALE UP
Benchmark configuration
● Server type: Rackspace Bare Metal IO Class v1
● CPU: Dual 2.8 GHz, 10 core Intel® Xeon® E5-2680 v2
● RAM: 128 GB
● Networking: Redundant 10 Gb/s connections in a high availability bond
● Data Disks: 2 * 1.6 TB PCIe flash cards
● OS: CentOS 7.2.1511, Kernel version: 3.10.0-327.10.1.el7.x86_64
● Java
○ Cassandra - Oracle jdk-8u65
○ Scylla - OpenJDK 1.8 (used only for scylla-jmx)
Source: http://www.scylladb.com/technology/ycsb-cassandra-scylla/
LATENCY - AVG
LATENCY - P99
LATENCY - P99
FULLY COMPATIBLE
WHAT WOULD YOU DO WITH 1 MILLION TPS?
Shrink your cluster by a factor of 10
Handle 10X traffic spikes on Black Friday
Faster repairs, faster scale out.
Get the most out of your data - Run more queries
Run administration operations while serving traffic
Stop using caches in front of the database
TECHNOLOGY: HOW IT WORKS
SCYLLA IS QUITE DIFFERENT
Shard-per-core, no locks, no threads, zero-copy
Reactor programming with C++14
Our own efficient, DB-aware cache, not using Linux page cache
Better storage engine
Max out all HW resources - NUMA friendly, multiqueue NICs, etc
Userspace I/O scheduler
Based on Seastar project
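The shard-per-core point can be illustrated with a small model. This is not Scylla or Seastar code (both are C++); it is a hypothetical Python sketch in which each shard owns a private dict and an inbox, and every key is routed to exactly one shard by hashing, so no lock is needed around the data. The names `Shard`, `shard_for`, and `ShardedStore`, and the choice of CRC32 as the hash, are assumptions made for illustration.

```python
import zlib
from collections import deque


class Shard:
    """One shard: private data plus an inbox. Only this shard touches `data`."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.data = {}          # owned exclusively by this shard, so no lock
        self.inbox = deque()    # requests forwarded to this shard

    def run_pending(self):
        """Drain the inbox, applying each request to the private data."""
        results = []
        while self.inbox:
            op, key, value = self.inbox.popleft()
            if op == "put":
                self.data[key] = value
                results.append(None)
            elif op == "get":
                results.append(self.data.get(key))
        return results


def shard_for(key, shard_count):
    """Map a key to its owning shard (CRC32 is an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


class ShardedStore:
    def __init__(self, shard_count):
        self.shards = [Shard(i) for i in range(shard_count)]

    def submit(self, op, key, value=None):
        """Queue a request on the shard that owns `key`; return that shard."""
        owner = self.shards[shard_for(key, len(self.shards))]
        owner.inbox.append((op, key, value))
        return owner


store = ShardedStore(shard_count=4)
store.submit("put", "user:1", "alice").run_pending()
owner = store.submit("get", "user:1")
print(owner.run_pending())  # prints ['alice']
```

Because the same key always hashes to the same shard, the read lands on the shard that handled the write, and the shards never share mutable state.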
SCYLLA ARCHITECTURE COMPARISON
[Figure: Cassandra on a traditional stack versus Scylla on Seastar's sharded stack]
Traditional stack (Cassandra): application threads over shared memory, with TCP/IP, the scheduler, and the NIC queues in the kernel. Lock contention, cache contention, NUMA unfriendly.
Seastar's sharded stack: each core runs its own application, TCP/IP stack, task scheduler, and NIC queue in userspace over DPDK, with smp queues between cores. The kernel isn't involved. No contention, linear scaling, NUMA friendly.
Per core: database, task scheduler, and queues in userspace, above the kernel.
Scylla has its own task scheduler
[Figure: traditional stack versus Scylla's stack]
Traditional stack: on each CPU, a scheduler switches between threads, each with its own stack.
● Thread is a function pointer
● Stack is a byte array from 64 KB to megabytes
● Context switch cost is high. Large stacks pollute the caches
Scylla's stack: each CPU runs its own queue of promises and tasks.
● Promise is a pointer to an eventually computed value
● Task is a pointer to a lambda function
● No sharing, millions of parallel events
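The promise and task definitions above can be made concrete with a toy model. The sketch below is hypothetical Python, not Seastar's API: `Reactor` and `Promise` are names invented here, and a single class stands in for what a real implementation may split into separate promise and future objects. It shows a promise holding a value that arrives later, and tasks (plain callables) running to completion from a single queue, with no threads and no context switches.

```python
from collections import deque


class Reactor:
    """A single-threaded task queue: tasks run to completion, one at a time."""

    def __init__(self):
        self.tasks = deque()

    def schedule(self, task):
        self.tasks.append(task)

    def run(self):
        while self.tasks:
            task = self.tasks.popleft()
            task()


class Promise:
    """Holds a value that will be computed later, plus callbacks waiting on it."""

    def __init__(self, reactor):
        self.reactor = reactor
        self.resolved = False
        self.value = None
        self.callbacks = []

    def then(self, fn):
        """Return a new promise for fn(value), computed once this one resolves."""
        nxt = Promise(self.reactor)

        def task():
            nxt.set_value(fn(self.value))

        if self.resolved:
            self.reactor.schedule(task)
        else:
            self.callbacks.append(task)
        return nxt

    def set_value(self, value):
        """Resolve the promise and queue every waiting callback as a task."""
        self.value = value
        self.resolved = True
        for task in self.callbacks:
            self.reactor.schedule(task)
        self.callbacks.clear()


reactor = Reactor()
request = Promise(reactor)
reply = request.then(lambda row: row.upper()).then(lambda text: text + "!")
reactor.schedule(lambda: request.set_value("hello"))
reactor.run()
print(reply.value)  # prints HELLO!
```

Each pending operation costs one small object and one queued callable rather than a thread with its own stack, which is the contrast the slide draws.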
Unified cache
[Figure: cache layers in Cassandra versus Scylla]
Cassandra: key cache, row cache (on-heap / off-heap), Linux page cache, SSTables. Issues called out: page faults, parasitic rows, tuning.
Scylla: unified cache, SSTables.
Scylla has an I/O scheduler
[Figure: traditional stack versus Scylla stack. Labels: "Max useful disk concurrency" and "I/O scheduled by priority here"]
Source: http://www.scylladb.com/2016/04/14/io-scheduler-1/
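The two labels on this slide, a cap at the maximum useful disk concurrency and scheduling by priority, can be sketched as follows. This is a hypothetical Python model, not Scylla's scheduler: the class `IoScheduler`, the request names, and the strict lowest-number-first ordering are assumptions, since the slide does not describe the actual policy.

```python
import heapq
import itertools


class IoScheduler:
    """Holds requests in a priority queue; only `max_in_flight` reach the disk."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.waiting = []                   # heap of (priority, seq, name)
        self.seq = itertools.count()        # tie-breaker keeps FIFO order

    def submit(self, priority, name):
        """Queue a request; lower `priority` numbers are served first."""
        heapq.heappush(self.waiting, (priority, next(self.seq), name))

    def dispatch(self):
        """Send queued requests to the disk until the concurrency cap is hit."""
        sent = []
        while self.waiting and self.in_flight < self.max_in_flight:
            _, _, name = heapq.heappop(self.waiting)
            self.in_flight += 1
            sent.append(name)
        return sent

    def complete(self):
        """Record that one in-flight request finished, freeing a slot."""
        if self.in_flight == 0:
            raise RuntimeError("no request in flight")
        self.in_flight -= 1


sched = IoScheduler(max_in_flight=2)
sched.submit(2, "compaction-read")
sched.submit(2, "compaction-write")
sched.submit(1, "query-read")
print(sched.dispatch())   # prints ['query-read', 'compaction-read']
sched.complete()
print(sched.dispatch())   # prints ['compaction-write']
```

Requests beyond the cap wait in the scheduler's own queue, where they can still be reordered, instead of sitting in a device queue where a late high-priority request could not overtake them.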
Scylla has an I/O scheduler
total, 14825839, 25002, 25002, 25002, 0.5, 0.3, 0.5, 5.0, 12.8, 22.2, 592.6, 0.00076
total, 14851605, 24980, 24980, 24980, 0.5, 0.3, 0.6, 6.7, 12.9, 21.6, 593.6, 0.00076
total, 14877443, 25004, 25004, 25004, 0.5, 0.3, 0.5, 6.5, 17.5, 38.8, 594.7, 0.00076
total, 14903361, 25017, 25017, 25017, 0.5, 0.3, 0.5, 6.9, 29.9, 39.9, 595.7, 0.00076
total, 14927655, 23553, 23553, 23553, 4.6, 0.3, 34.3, 66.8, 203.0, 255.9, 596.7, 0.00076
total, 14956055, 26384, 26384, 26384, 5.0, 0.4, 27.2, 53.9, 81.5, 99.9, 597.8, 0.00077
total, 14981910, 24987, 24987, 24987, 0.5, 0.3, 0.7, 6.2, 13.5, 25.0, 598.8, 0.00077
total, 15007673, 25003, 25003, 25003, 0.4, 0.3, 0.5, 3.7, 12.5, 24.5, 599.9, 0.00077
total, 15033484, 25006, 25006, 25006, 0.4, 0.3, 0.5, 3.8, 12.4, 32.8, 600.9, 0.00077
total, 15059256, 25004, 25004, 25004, 0.4, 0.3, 0.5, 2.2, 14.9, 33.0, 601.9, 0.00076
total, 15085126, 24994, 24994, 24994, 0.4, 0.3, 0.5, 2.4, 10.4, 19.4, 603.0, 0.00076
total, 15110948, 24988, 24988, 24988, 0.5, 0.3, 0.6, 4.1, 10.3, 19.9, 604.0, 0.00076
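The rows above have no header on the slide. Assuming they are interval output from a stress tool with columns in the order type, total ops, op/s, pk/s, row/s, mean, median, .95, .99, .999, max, time, stderr (this column naming is an assumption, not something the slide states), a short Python sketch can parse them and pick out the intervals where tail latency jumps. The helper names `parse_row` and `spikes` are hypothetical.

```python
# Column names are an assumption: the slide shows no header row.
COLUMNS = ["type", "total_ops", "ops_per_s", "pk_per_s", "row_per_s",
           "mean", "median", "p95", "p99", "p999", "max", "time", "stderr"]


def parse_row(line):
    """Split one comma-separated row into a dict keyed by COLUMNS."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = {"type": fields[0]}
    for name, text in zip(COLUMNS[1:], fields[1:]):
        row[name] = float(text)
    return row


def spikes(rows, p99_limit):
    """Return the rows whose assumed p99 column exceeds `p99_limit`."""
    return [row for row in rows if row["p99"] > p99_limit]


# Four rows copied from the slide.
sample = """\
total, 14903361, 25017, 25017, 25017, 0.5, 0.3, 0.5, 6.9, 29.9, 39.9, 595.7, 0.00076
total, 14927655, 23553, 23553, 23553, 4.6, 0.3, 34.3, 66.8, 203.0, 255.9, 596.7, 0.00076
total, 14956055, 26384, 26384, 26384, 5.0, 0.4, 27.2, 53.9, 81.5, 99.9, 597.8, 0.00077
total, 14981910, 24987, 24987, 24987, 0.5, 0.3, 0.7, 6.2, 13.5, 25.0, 598.8, 0.00077
"""

rows = [parse_row(line) for line in sample.splitlines()]
for row in spikes(rows, p99_limit=50.0):
    print(row["time"], row["ops_per_s"], row["p99"])
```

With a limit of 50, the two rows at time 596.7 and 597.8 are flagged; the rows on either side stay in single digits for the same column while throughput holds near 25,000 operations per second.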
Compatibility (and speed): Repair
Jan 13 17:33:44 server-02.localdomain scylla[7730]: [shard 27] stream_session - [Stream #cbbf8dc1-ba1b-11e5-a51a-00000000000b ID#0] Creating new streaming plan for repair-in
Jan 13 17:33:44 server-02.localdomain scylla[7730]: [shard 27] stream_session - [Stream #cbbf8dc1-ba1b-11e5-a51a-00000000000b ID#0] Received streaming plan for repair-in
Jan 13 17:33:45 server-02.localdomain scylla[7730]: [shard 27] stream_session - [Stream #cbc02a01-ba1b-11e5-a51a-00000000000b ID#0] Creating new streaming plan for repair-in
Jan 13 17:33:45 server-02.localdomain scylla[7730]: [shard 27] stream_session - [Stream #cbc02a01-ba1b-11e5-a51a-00000000000b ID#0] Received streaming plan for repair-in
SCYLLA as an INFRASTRUCTURE
Scale up with the number of cores
Kernel bypass for direct networking and block I/O
Good match for upcoming Non Volatile Memory technology
High availability with gossip and flexible replication
Runs everywhere: Physical, virtual, containers
Integrates with microservices through its own httpd
Monitoring Scylla
Connections to Apache Ecosystem: today
Connections to Apache Ecosystem: Soon
WHAT’S NEXT?
❏ Build a community
❏ Core database improvements
❏ VERTICAL: Spark, Solr, distributed SQL engines
❏ HORIZONTAL: Microservice integration, more protocols
❏ Upcoming releases: Scylla 1.0.3, Scylla 1.1
SCYLLA, NoSQL GOES NATIVE
Thank you.
