KEMBAR78
Cassandra 3.0 Data Modeling | PDF
@PatrickMcFadin
Patrick McFadin

Chief Evangelist, DataStax
Cassandra 3.0 Data Modeling
1
A brief history of CQL
You
CQL 3.0 - Cassandra 1.2
• Goodbye CQL 2.0!
• Custom secondary indexes
• Empty IN
CQL 3.1 - Cassandra 2.0
• Aliases
• CREATE <table> IF NOT EXISTS
• INSERT IF NOT EXISTS
• UPDATE IF
• DELETE IF EXISTS
• IN supports cluster columns
LWT
CQL 3.2 - Cassandra 2.1
• User Defined Types
• Collection Indexing
• Indexes can use contains
• Tuples?
User Dened Types
CREATE TYPE video_metadata (
height int,
width int,
video_bit_rate set<text>,
encoding text
);
User Dened Types
CREATE TABLE videos (
videoid uuid,
userid uuid,
name varchar,
description varchar,
location text,
location_type int,
preview_thumbnails map<text,text>,
tags set<varchar>,
metadata set <frozen<video_metadata>>,
added_date timestamp,
PRIMARY KEY (videoid)
);
CQL 3.3 - Cassandra 2.2
• Date and Time are now types
• TinyInt and SmallInt
• User Defined Functions
• Aggregates
• User Defined Aggregates
User Dened Functions
CREATE TABLE video_rating (
videoid uuid,
rating_counter counter,
rating_total counter,
PRIMARY KEY (videoid)
);
CREATE OR REPLACE FUNCTION
avg_rating (rating_counter counter, rating_total counter)
CALLED ON NULL INPUT
RETURNS double
LANGUAGE java AS
'return Double.valueOf(rating_total.doubleValue()/
rating_counter.doubleValue());';
User Dened Functions
SELECT avg_rating(rating_counter, rating_total) AS avg_rating
FROM video_rating
WHERE videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
Aggregates
CREATE TABLE video_ratings_by_user (
videoid uuid,
userid uuid,
rating int,
PRIMARY KEY (videoid, userid)
);
SELECT count(userid)
FROM video_ratings_by_user
WHERE videoed = 49f64d40-7d89-4890-b910-dbf923563a33
CQL 3.4 - Cassandra 3.x
• CAST operator
• Per Partition Limit
• Materialized Views
• SASI
Materialized View
CREATE TABLE videos (
videoid uuid,
userid uuid,
name varchar,
description varchar,
location text,
location_type int,
preview_thumbnails map<text,text>,
tags set<varchar>,
metadata set <frozen<video_metadata>>,
added_date timestamp,
PRIMARY KEY (videoid)
);
Lookup by this?
Materialized View
CREATE TABLE videos_by_location (
videoid uuid,
userid uuid,
location text,
added_date timestamp,
PRIMARY KEY (location, videoid)
);
Roll your own
Materialized View
CREATE MATERIALIZED VIEW videos_by_location
AS SELECT userid, added_date, videoid, location
FROM videos
WHERE videoId IS NOT NULL AND location IS NOT NULL
PRIMARY KEY(location, videoid);
Cassandra rolls for you
Materialized View Perf
Materialized View Perf
5 Materialized Views vs 5 tables writes async
Materialized View
SELECT location, videoid
FROM videos_by_location ;
location | videoid
-------------------------------------------------+--------------------------------------
http://www.youtube.com/watch?v=px6U2n74q3g | 06049cbb-dfed-421f-b889-5f649a0de1ed
http://www.youtube.com/watch?v=qphhxujn5Es | 873ff430-9c23-4e60-be5f-278ea2bb21bd
/us/vid/0c/0c3f7e87-f6b6-41d2-9668-2b64d117102c | 0c3f7e87-f6b6-41d2-9668-2b64d117102c
/us/vid/b3/b3a76c6b-7c7f-4af6-964f-803a9283c401 | 99051fe9-6a9c-46c2-b949-38ef78858dd0
/us/vid/b3/b3a76c6b-7c7f-4af6-964f-803a9283c401 | b3a76c6b-7c7f-4af6-964f-803a9283c401
http://www.youtube.com/watch?v=HdJlsOZVGwM | 49f64d40-7d89-4890-b910-dbf923563a33
/us/vid/41/416a5ddc-00a5-49ed-adde-d99da9a27c0c | 416a5ddc-00a5-49ed-adde-d99da9a27c0c
SASI
CREATE TABLE users (
userid uuid,
firstname varchar,
lastname varchar,
email text,
created_date timestamp,
PRIMARY KEY (userid)
);
Lookup by this?
Storage Attached Secondary Index
SASI
SASI
CREATE CUSTOM INDEX ON users (firstname)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'analyzer_class':
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'
};
SASI
CREATE CUSTOM INDEX ON users (lastname)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'CONTAINS'};
SASI
CREATE CUSTOM INDEX ON users (created_date)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'SPARSE'};
SASI Indexes
Client
INSERT INTO users(userid,firstname,lastname,email,created_date)

VALUES (9761d3d7-7fbd-4269-9988-6cfd4e188678,’Patrick’,’McFadin’,
’patrick@datastax.com’,’2015-06-01’);
userid 1
userid 2
Memtable
SSTable
SSTable
SSTable
SASI Index
Node
Data
lastname
lastname
firstname
firstname
email
email
created_date
created_date
SASI Index
SASI Index
Indexer
SASI Queries
SELECT * FROM users WHERE firstname LIKE 'pat%';
SELECT * FROM users WHERE lastname LIKE ‘%Fad%';
SELECT * FROM users WHERE email LIKE '%data%';
SELECT * FROM users
WHERE created_date > '2011-6-15'
AND created_date < '2011-06-30';
userid | created_date | email | firstname | lastname
--------------------------------------+---------------------------------+----------------------+-----------+----------
9761d3d7-7fbd-4269-9988-6cfd4e188678 | 2011-06-20 20:50:00.000000+0000 | patrick@datastax.com | Patrick | McFadin
SASI Guidelines
• Multiple fields to search
• No more than 1000 rows returned
• You know the partition key
• Indexing static columns
Use SASI when…
SASI Guidelines
• Searching large partitions
• Tight SLA on reads
• Search for analytics
• Ordering search is important
Don’t Use SASI when…
Cassandra 3.0 Data Modeling

Cassandra 3.0 Data Modeling

  • 1.
    @PatrickMcFadin Patrick McFadin
 Chief Evangelist,DataStax Cassandra 3.0 Data Modeling 1
  • 2.
    A brief historyof CQL You
  • 3.
    CQL 3.0 -Cassandra 1.2 • Goodbye CQL 2.0! • Custom secondary indexes • Empty IN
  • 4.
    CQL 3.1 -Cassandra 2.0 • Aliases • CREATE <table> IF NOT EXISTS • INSERT IF NOT EXISTS • UPDATE IF • DELETE IF EXISTS • IN supports cluster columns LWT
  • 5.
    CQL 3.2 -Cassandra 2.1 • User Defined Types • Collection Indexing • Indexes can use contains • Tuples?
  • 6.
    User Dened Types CREATETYPE video_metadata ( height int, width int, video_bit_rate set<text>, encoding text );
  • 7.
    User Dened Types CREATETABLE videos ( videoid uuid, userid uuid, name varchar, description varchar, location text, location_type int, preview_thumbnails map<text,text>, tags set<varchar>, metadata set <frozen<video_metadata>>, added_date timestamp, PRIMARY KEY (videoid) );
  • 8.
    CQL 3.3 -Cassandra 2.2 • Date and Time are now types • TinyInt and SmallInt • User Defined Functions • Aggregates • User Defined Aggregates
  • 9.
    User Dened Functions CREATETABLE video_rating ( videoid uuid, rating_counter counter, rating_total counter, PRIMARY KEY (videoid) ); CREATE OR REPLACE FUNCTION avg_rating (rating_counter counter, rating_total counter) CALLED ON NULL INPUT RETURNS double LANGUAGE java AS 'return Double.valueOf(rating_total.doubleValue()/ rating_counter.doubleValue());';
  • 10.
    User Dened Functions SELECTavg_rating(rating_counter, rating_total) AS avg_rating FROM video_rating WHERE videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
  • 11.
    Aggregates CREATE TABLE video_ratings_by_user( videoid uuid, userid uuid, rating int, PRIMARY KEY (videoid, userid) ); SELECT count(userid) FROM video_ratings_by_user WHERE videoed = 49f64d40-7d89-4890-b910-dbf923563a33
  • 12.
    CQL 3.4 -Cassandra 3.x • CAST operator • Per Partition Limit • Materialized Views • SASI
  • 13.
    Materialized View CREATE TABLEvideos ( videoid uuid, userid uuid, name varchar, description varchar, location text, location_type int, preview_thumbnails map<text,text>, tags set<varchar>, metadata set <frozen<video_metadata>>, added_date timestamp, PRIMARY KEY (videoid) ); Lookup by this?
  • 14.
    Materialized View CREATE TABLEvideos_by_location ( videoid uuid, userid uuid, location text, added_date timestamp, PRIMARY KEY (location, videoid) ); Roll your own
  • 15.
    Materialized View CREATE MATERIALIZEDVIEW videos_by_location AS SELECT userid, added_date, videoid, location FROM videos WHERE videoId IS NOT NULL AND location IS NOT NULL PRIMARY KEY(location, videoid); Cassandra rolls for you
  • 16.
  • 17.
    Materialized View Perf 5Materialized Views vs 5 tables writes async
  • 18.
    Materialized View SELECT location,videoid FROM videos_by_location ; location | videoid -------------------------------------------------+-------------------------------------- http://www.youtube.com/watch?v=px6U2n74q3g | 06049cbb-dfed-421f-b889-5f649a0de1ed http://www.youtube.com/watch?v=qphhxujn5Es | 873ff430-9c23-4e60-be5f-278ea2bb21bd /us/vid/0c/0c3f7e87-f6b6-41d2-9668-2b64d117102c | 0c3f7e87-f6b6-41d2-9668-2b64d117102c /us/vid/b3/b3a76c6b-7c7f-4af6-964f-803a9283c401 | 99051fe9-6a9c-46c2-b949-38ef78858dd0 /us/vid/b3/b3a76c6b-7c7f-4af6-964f-803a9283c401 | b3a76c6b-7c7f-4af6-964f-803a9283c401 http://www.youtube.com/watch?v=HdJlsOZVGwM | 49f64d40-7d89-4890-b910-dbf923563a33 /us/vid/41/416a5ddc-00a5-49ed-adde-d99da9a27c0c | 416a5ddc-00a5-49ed-adde-d99da9a27c0c
  • 19.
    SASI CREATE TABLE users( userid uuid, firstname varchar, lastname varchar, email text, created_date timestamp, PRIMARY KEY (userid) ); Lookup by this?
  • 20.
  • 21.
  • 22.
    SASI CREATE CUSTOM INDEXON users (firstname) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false' };
  • 23.
    SASI CREATE CUSTOM INDEXON users (lastname) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS'};
  • 24.
    SASI CREATE CUSTOM INDEXON users (created_date) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'SPARSE'};
  • 25.
    SASI Indexes Client INSERT INTOusers(userid,firstname,lastname,email,created_date)
 VALUES (9761d3d7-7fbd-4269-9988-6cfd4e188678,’Patrick’,’McFadin’, ’patrick@datastax.com’,’2015-06-01’); userid 1 userid 2 Memtable SSTable SSTable SSTable SASI Index Node Data lastname lastname firstname firstname email email created_date created_date SASI Index SASI Index Indexer
  • 26.
    SASI Queries SELECT *FROM users WHERE firstname LIKE 'pat%'; SELECT * FROM users WHERE lastname LIKE ‘%Fad%'; SELECT * FROM users WHERE email LIKE '%data%'; SELECT * FROM users WHERE created_date > '2011-6-15' AND created_date < '2011-06-30'; userid | created_date | email | firstname | lastname --------------------------------------+---------------------------------+----------------------+-----------+---------- 9761d3d7-7fbd-4269-9988-6cfd4e188678 | 2011-06-20 20:50:00.000000+0000 | patrick@datastax.com | Patrick | McFadin
  • 27.
    SASI Guidelines • Multiplefields to search • No more than 1000 rows returned • You know the partition key • Indexing static columns Use SASI when…
  • 28.
    SASI Guidelines • Searchinglarge partitions • Tight SLA on reads • Search for analytics • Ordering search is important Don’t Use SASI when…