Apache Cassandra - Future without Boundaries (part 3)
August 6, 2015 www.ExigenServices.com
Apache Cassandra – Future without
Boundaries
V. Architecture (part 2)
SEDA Architecture
SEDA – Staged event-driven architecture
1. Every unit of work is split into several
stages that are executed in parallel
threads.
2. Each stage consists of an input and an output
event queue, an event handler and a stage
controller.
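The stage structure described above can be sketched roughly as follows. This is a minimal illustration, not Cassandra's implementation: the `Stage` class, the `STOP` sentinel and the two toy handlers are hypothetical names, and the stage controller is reduced to a fixed number of worker threads.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel that tells a worker to shut down


class Stage:
    """One stage: an input queue, an event handler and worker threads."""

    def __init__(self, name, handler, output=None, workers=2):
        self.name = name
        self.handler = handler
        self.input = queue.Queue()
        self.output = output  # input queue of the next stage, if any
        self.threads = [threading.Thread(target=self._run) for _ in range(workers)]

    def start(self):
        for t in self.threads:
            t.start()

    def _run(self):
        while True:
            event = self.input.get()
            if event is STOP:
                break
            result = self.handler(event)
            if self.output is not None:
                self.output.put(result)

    def stop(self):
        # One sentinel per worker; events queued earlier are handled first.
        for _ in self.threads:
            self.input.put(STOP)
        for t in self.threads:
            t.join()


def run_pipeline(events):
    results = queue.Queue()
    second = Stage("format", lambda n: f"value={n}", output=results)
    first = Stage("double", lambda n: n * 2, output=second.input)
    first.start()
    second.start()
    for e in events:
        first.input.put(e)
    first.stop()   # every event has left the first stage after this
    second.stop()
    out = []
    while not results.empty():
        out.append(results.get())
    return sorted(out)  # worker threads may finish in any order
```

Each unit of work passes through both stages, and the stages only communicate through queues, which is what lets every stage be sized and tuned separately.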
SEDA Architecture advantages
• Well-conditioned behavior under system load
• Prevents resources from being overcommitted
SEDA in Cassandra - Usages
1. Read
2. Mutation
3. Gossip
4. Anti-entropy
….
SEDA in Cassandra - Design
• The StageManager holds a map from stage
names to Java 5 thread pool executors.
• Each stage controller and its queue are represented by a
ThreadPoolExecutor that can be configured
through JMX.
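The idea of a map from stage names to thread pools can be sketched in a few lines. The sketch below uses Python's `concurrent.futures` rather than Java's executors; the `StageManager` class shape, the stage names and the pool sizes are illustrative assumptions, and there is no JMX equivalent here.

```python
from concurrent.futures import ThreadPoolExecutor


class StageManager:
    """Maps stage names to thread pools (names and sizes are illustrative)."""

    def __init__(self, pool_sizes):
        self._stages = {
            name: ThreadPoolExecutor(max_workers=size, thread_name_prefix=name)
            for name, size in pool_sizes.items()
        }

    def submit(self, stage, task, *args):
        # Work is queued inside the executor that belongs to the stage.
        return self._stages[stage].submit(task, *args)

    def shutdown(self):
        for pool in self._stages.values():
            pool.shutdown(wait=True)


manager = StageManager({"READ": 4, "MUTATION": 4, "GOSSIP": 1})
future = manager.submit("READ", lambda key: f"row for {key}", "ivan")
read_result = future.result()
manager.shutdown()
```

Because every stage owns its own pool, a burst of reads cannot exhaust the threads reserved for gossip or mutations.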
VI. Advanced column types
TTL column attribute
• A TTL column is a column whose value expires after
a given period of time (the TTL is given in seconds).
• Useful for storing short-lived data such as session tokens.
set test[row1][col2] = 'val2' with ttl=60;
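The expiry behavior can be illustrated with a small sketch. `TtlColumn` is a hypothetical class for illustration only; the clock is injected so the example does not have to wait 60 real seconds.

```python
import time


class TtlColumn:
    """A column value that stops being readable once its TTL has elapsed."""

    def __init__(self, value, ttl_seconds, now=time.time):
        self._now = now
        self.value = value
        self.expires_at = now() + ttl_seconds

    def get(self):
        # Return the value, or None once the TTL has elapsed.
        return self.value if self._now() < self.expires_at else None


clock = [0.0]  # fake clock, in seconds
col = TtlColumn("val2", ttl_seconds=60, now=lambda: clock[0])
before = col.get()   # still within the TTL
clock[0] = 61.0
after = col.get()    # TTL has passed
```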
Counter column
• In an eventually consistent environment, old versions of
a column value are overwritten by newer ones, but
counter updates must be cumulative.
• Counter columns are intended to support
increment and decrement operations in an eventually
consistent environment without losing any of
them.
CounterColumn internals
CounterColumn structure:
name
…….
[
(replicaId1, counter1, logical clock1),
(replicaId2, counter2, logical clock2),
………………..
(replicaId3, counter3, logical clock3)
]
CounterColumn write (before)
UPDATE CounterCF SET count_me = count_me + 2
WHERE key = 'counter1'
[
(A, 10, 2),
(B, 3, 4),
(C, 6, 7)
]
CounterColumn write (after)
• A is the leader replica for this write
[
(A, 10 + 2, 2 + 1),
(B, 3, 4),
(C, 6, 7)
]
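The before/after states of this write can be reproduced with a short sketch. The function name `apply_increment` and the tuple layout `(replica_id, count, clock)` are assumptions that mirror the slide notation.

```python
def apply_increment(shards, leader, delta):
    """Return a new shard list with the leader's count and clock advanced."""
    return [
        (rid, count + delta, clock + 1) if rid == leader else (rid, count, clock)
        for rid, count, clock in shards
    ]


before = [("A", 10, 2), ("B", 3, 4), ("C", 6, 7)]
after = apply_increment(before, leader="A", delta=2)
```

Only the leader's tuple changes; the tuples of B and C are left untouched.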
CounterColumn Read
All Memtables and SSTables are read and merged using the
following algorithm:
• All tuples with the local replicaId are summed; for each
foreign replica, the tuple with the maximum logical clock
value is chosen.
• Counters of foreign replicas are updated during
read repair, during the replicate-on-write procedure, or
by the anti-entropy service (AES).
CounterColumn read - example
• Memtable – (A, 12, 4) (B, 3, 5) (C, 10, 3)
• SSTable1 – (A, 5, 3) (B, 1, 6) (C, 5, 4)
• SSTable2 – (A, 2, 2) (B, 2, 4) (C, 6, 2)
Result (for each replica the tuple with the highest logical clock is taken, i.e. none of A, B, C is treated as the local replica):
(A, 12, 4) + (B, 1, 6) + (C, 5, 4) = 12 + 1 + 5 = 18
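The merge rule can be checked with a small sketch. The function `read_counter` and its `local_replica` parameter are hypothetical; when no local replica is given, every replica is treated as foreign, which is the case that reproduces the result shown above.

```python
def read_counter(sources, local_replica=None):
    """Merge counter tuples (replica_id, count, clock) from several sources."""
    local_total = 0
    newest = {}  # foreign replica id -> (count, clock) with the highest clock
    for shards in sources:
        for rid, count, clock in shards:
            if rid == local_replica:
                local_total += count          # local tuples are summed
            elif rid not in newest or clock > newest[rid][1]:
                newest[rid] = (count, clock)  # foreign: keep the newest tuple
    return local_total + sum(count for count, _ in newest.values())


memtable = [("A", 12, 4), ("B", 3, 5), ("C", 10, 3)]
sstable1 = [("A", 5, 3), ("B", 1, 6), ("C", 5, 4)]
sstable2 = [("A", 2, 2), ("B", 2, 4), ("C", 6, 2)]
total = read_counter([memtable, sstable1, sstable2])  # 12 + 1 + 5 = 18
```

If A were the local replica, its three tuples would be summed instead (12 + 5 + 2 = 19), giving 19 + 1 + 5 = 25.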
VII. Working with Cassandra
Installing and launching Cassandra
• Download from
http://cassandra.apache.org/download/
Installing and launching Cassandra
• Launching the server:
bin/cassandra.bat (Windows)
bin/cassandra (Linux/Unix)
– use the “-f” option to run the server in the foreground, so that all of the server
logs are printed to standard out
– by default it starts a single-node cluster called “Test Cluster” listening
on port 9160
Installing and launching Cassandra
• Starting the command-line client interface:
bin/cassandra-cli.bat (Windows)
bin/cassandra-cli (Linux/Unix)
– the prompt [username@keyspace] is shown at the beginning of every line
Creating a cluster
In the configuration file cassandra.yaml, specify:
• seeds – the list of seed nodes for the cluster
• rpc_address and listen_address – the network
addresses used for client connections and for inter-node communication
Creating a cluster
• initial_token – defines the node's position on the ring and thus its token range
• auto_bootstrap – enables automatic migration of data
to the new node
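Evenly spaced initial_token values can be computed with a short sketch. It assumes RandomPartitioner, whose token space runs from 0 to 2^127; the helper name `balanced_tokens` is hypothetical.

```python
RING_SIZE = 2 ** 127  # token space of RandomPartitioner (assumed partitioner)


def balanced_tokens(node_count):
    """Return one initial_token per node, spaced evenly around the ring."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


tokens = balanced_tokens(2)
# tokens[0] is 0 and tokens[1] is 2**126, so each node owns half of the ring
```

Each value would go into the initial_token setting of one node's cassandra.yaml.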
nodetool ring
Use nodetool to view the ring configuration:
~$ nodetool -h localhost -p 8080 ring
Address        Status  State   Load     Owns   Range     Ring
                                               850705…
10.203.71.154  Up      Normal  2.53 KB  50.00  0         |<--|
10.203.55.186  Up      Normal  1.33 KB  50.00  850705…   |-->|
Connecting to server
• Connect from the command line:
connect <HOSTNAME>/<PORT> [<USERNAME> '<PASSWORD>'];
Examples:
connect localhost/9160;
connect 127.0.0.1/9160 user 'password';
• Connect when starting the command-line client:
cassandra-cli
-h, --host <HOSTNAME>
-p, --port <PORT>
-k, --keyspace <KEYSPACE>
-u, --username <USERNAME>
-pw, --password <PASSWORD>
Describing environment
• show cluster name;
• show keyspaces;
• show api version;
• describe cluster;
• describe keyspace [<KEYSPACE>];
Create keyspace
• create keyspace <KEYSPACE>;
• create keyspace <KEYSPACE> with
<ATTR1> = <VAL1> and
<ATTR2> = <VAL2> ...;
• Attributes:
– placement_strategy
– strategy_options
– …
Create keyspace
Example:
create keyspace Keyspace1
with placement_strategy =
'org.apache.cassandra.locator.SimpleStrategy'
and strategy_options =
[{replication_factor: 4}];
Update keyspace
• Update attributes of an existing keyspace:
update keyspace <KEYSPACE> with
<ATTR1> = <VAL1> and
<ATTR2> = <VAL2> ...;
Switch to keyspace
• use <KEYSPACE>;
• use <KEYSPACE> [<USERNAME> '<PASSWORD>'];
• If you don't specify a username and password, the
credentials supplied to the 'connect' statement will
be used
• If the server doesn't support authentication, it will
ignore the credentials
Switch to keyspace
• Example:
use Keyspace1 user1 'qwerty123';
After switching to a keyspace you'll see [user1@Keyspace1] at the
beginning of every line
Create column family
• create column family <COL_FAMILY>;
• create column family <COL_FAMILY> with
<ATTR1> = <VAL1> and
<ATTR2> = <VAL2> ...;
• Example:
create column family Users with
column_type = Super and
comparator = UTF8Type and
rows_cached = 1000;
Update column family
• Once a column family is created, you can update its
attributes:
update column family <COL_FAMILY> with
<ATTR1> = <VAL1> and
<ATTR2> = <VAL2> ...;
Writing data
• To write data, use the set command:
set Customers['ivan']['name'] = 'Ivan';
set Customers['makar']['info']['age'] = 96;
Reading data
• To read data, use the get command:
get Customers['ivan']['name'];
- this will display 'Ivan'
get Customers['makar'];
- this will display all columns for key 'makar'
Reading data
• To list a range of rows, use the list command:
list Customers;
list Customers[a:];
list Customers[a:c] limit 40;
- you can limit the number of rows that will be displayed (default - 100)
Reading data
• To get the number of columns, use the count command:
count Customers['ivan'];
- this will display the number of columns for key 'ivan'
Deleting data
• To delete a row, a column or a subcolumn, use the del
command:
del Customers['ivan'];
- this will delete all columns for key 'ivan'
del Customers['ivan']['name'];
- this will delete the column 'name' for key 'ivan'
del Customers['ivan']['accounts']['2312784829312343'];
- this will delete the subcolumn with that account number from the 'accounts'
column for key 'ivan'
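The set, get, count and del commands above address data by row key, column name and, for super columns, subcolumn name. A toy in-memory model of that addressing is sketched below; the `ColumnFamily` class is purely illustrative and ignores timestamps, comparators and persistence.

```python
class ColumnFamily:
    """Toy model: row key -> column name -> value (or nested subcolumns)."""

    def __init__(self):
        self.rows = {}

    def set(self, key, *names, value):
        node = self.rows.setdefault(key, {})
        for name in names[:-1]:
            node = node.setdefault(name, {})  # descend into a super column
        node[names[-1]] = value

    def get(self, key, *names):
        node = self.rows[key]
        for name in names:
            node = node[name]
        return node

    def count(self, key):
        return len(self.rows.get(key, {}))

    def delete(self, key, *names):
        if not names:                 # del Customers['ivan'];
            self.rows.pop(key, None)
            return
        node = self.rows[key]
        for name in names[:-1]:
            node = node[name]
        node.pop(names[-1], None)


customers = ColumnFamily()
customers.set("ivan", "name", value="Ivan")
customers.set("makar", "info", "age", value=96)
name = customers.get("ivan", "name")
makar_columns = customers.count("makar")   # one column: 'info'
customers.delete("ivan", "name")
ivan_columns = customers.count("ivan")     # no columns left
```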
Deleting data
• To delete all data in a column family, use the truncate
command:
truncate Customers;
Drop column family or keyspace
drop column family Customers;
drop keyspace Keyspace1;
Comparators and validators
• Comparators – compare column names (and so define their sort order)
• Validators – validate column values
Comparators and validators
• You can specify a comparator for the columns of a column family
and one for all of its subcolumns (a single one for all of them)
• You can specify validators for each known
column of the column family
• You can specify a default validator for the column
family, which is used for columns that have no
validator of their own
• You can specify a key validator, which validates
row keys
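The effect of the comparator on column order can be seen with a tiny sketch. Plain Python sorting stands in for the real comparator classes here: text ordering plays the role of a UTF8Type-style comparator and integer ordering the role of a LongType-style one.

```python
names = ["10", "9", "100"]

as_text = sorted(names)              # compares names as text
as_numbers = sorted(names, key=int)  # compares names as integers

# as_text    -> ['10', '100', '9']
# as_numbers -> ['9', '10', '100']
```

The same column names come back in a different order depending on the comparator, which matters for any range read over columns.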
Attributes of column family
– column_type: can be Standard or Super
(default - Standard)
– comparator: specifies how column names will be
compared for sort order
– column_metadata: defines the validation and indexes
for known columns
– default_validation_class: validator to use for values in
columns which are not listed in the column_metadata.
(default – BytesType)
– key_validation_class: validator for keys
Column metadata
You can define validators for each known column in the
family
create column family User
with column_metadata = [
{column_name: name, validation_class: UTF8Type},
{column_name: age, validation_class: IntegerType},
{column_name: birth, validation_class: UTF8Type}
];
Columns not listed in this section are validated with
default_validation_class
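The fallback from per-column validators to the default validator can be sketched as follows. The validator functions are simplified stand-ins written for this example, not the behavior of the real UTF8Type, IntegerType or BytesType classes.

```python
def is_utf8(value):
    return isinstance(value, str)


def is_integer(value):
    return isinstance(value, int) and not isinstance(value, bool)


def accept_any(value):
    # Stand-in for the BytesType default: any value is accepted.
    return True


# Mirrors the column_metadata of the User column family above.
column_metadata = {"name": is_utf8, "age": is_integer, "birth": is_utf8}


def validate(column, value, default_validator=accept_any):
    """Use the column's own validator, or the default if it has none."""
    return column_metadata.get(column, default_validator)(value)
```

For example, `validate("age", "old")` is rejected, while a value for the unlisted column `nickname` goes through the default validator and is accepted.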
Secondary indexes
• Allow queries by column value:
get users where name = 'Some user';
• Can be built in the background
Creating index
• Define it in the column metadata
For example, in cassandra-cli:
create column family users with
comparator=UTF8Type and column_metadata=[{
column_name: birth_date,
validation_class: LongType,
index_type: KEYS
}];
Some restrictions
• Cassandra uses hash indexes instead of B-tree
indexes.
Thus, the where clause must contain at least one indexed column
compared with the “=” operator.
So, you can't use
get users where birth_date > 1970;
but you can use
get users where birth_date = 1990 and karma > 50;
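Why an equality condition is required can be seen in a small sketch of a hash index. The `IndexedFamily` class is a hypothetical model: the index can only look up rows by an exact value, and any further conditions are applied as filters on the rows found that way. Re-inserting a row with a changed indexed value is not handled.

```python
from collections import defaultdict


class IndexedFamily:
    """Rows plus a hash index on one column (simplified model)."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}
        self.index = defaultdict(set)  # indexed value -> row keys

    def insert(self, key, columns):
        self.rows[key] = columns
        if self.indexed_column in columns:
            self.index[columns[self.indexed_column]].add(key)

    def where(self, equals, filters=()):
        """equals: {column: value}; filters: predicates applied afterwards."""
        if self.indexed_column not in equals:
            raise ValueError("need an '=' condition on an indexed column")
        candidates = self.index.get(equals[self.indexed_column], set())
        return sorted(
            k for k in candidates
            if all(self.rows[k].get(c) == v for c, v in equals.items())
            and all(pred(self.rows[k]) for pred in filters)
        )


users = IndexedFamily("birth_date")
users.insert("u1", {"birth_date": 1990, "karma": 70})
users.insert("u2", {"birth_date": 1990, "karma": 10})
users.insert("u3", {"birth_date": 1985, "karma": 90})
# birth_date = 1990 and karma > 50
matches = users.where({"birth_date": 1990}, [lambda row: row["karma"] > 50])
```

A query with only a range condition has no exact value to hash, so this model rejects it, just as the first query above is rejected.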
Index types
• KEYS
• BITMAP (will be supported in future releases)
Id  Gender       Bitmaps
                 F  M
1   Female       1  0
2   Male         0  1
3   Male         0  1
4   Unspecified  0  0
5   Female       1  0
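The bitmaps in the table can be built with a short sketch. The helper `build_bitmaps` is a hypothetical name; it keeps one bit list per indexed value, with one bit per row.

```python
rows = [(1, "Female"), (2, "Male"), (3, "Male"), (4, "Unspecified"), (5, "Female")]


def build_bitmaps(rows, values):
    """One bitmap per value: bit i is 1 when row i holds that value."""
    return {v: [1 if gender == v else 0 for _, gender in rows] for v in values}


bitmaps = build_bitmaps(rows, ["Female", "Male"])
# Ids of the rows whose bit is set in the Female bitmap
female_ids = [rid for (rid, _), bit in zip(rows, bitmaps["Female"]) if bit]
```

Row 4 has a zero in both bitmaps, matching the "Unspecified" line of the table.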
Q&A
Resources
• Home of the Apache Cassandra project
http://cassandra.apache.org/
• Apache Cassandra Wiki http://wiki.apache.org/cassandra/
• Documentation provided by DataStax
http://www.datastax.com/docs/0.8/
• A good explanation of creating secondary indexes
http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
• Eben Hewitt, “Cassandra: The Definitive Guide”, O'Reilly,
2010, ISBN: 978-1-449-39041-9
Authors
• Lev Sivashov - lsivashov@gmail.com
• Andrey Lomakin - lomakin.andrey@gmail.com,
twitter: @Andrey_Lomakin
LinkedIn: http://www.linkedin.com/in/andreylomakin
• Artem Orobets - enisher@gmail.com
twitter: @Dr_EniSh
• Anton Veretennik - tennik@gmail.com
