THE BEGINNER’S GUIDE TO noSQL
THE WHY
WE ARE STORING MORE DATA NOW THAN WE EVER HAVE BEFORE
CONNECTIONS BETWEEN OUR DATA ARE GROWING ALL THE TIME
WE DON’T MAKE THINGS KNOWING THE STRUCTURE FROM DAY 1
SERVER ARCHITECTURE IS NOW AT A STAGE WHERE WE CAN TAKE ADVANTAGE OF IT
[Chart. Axes: Size, Complexity. Labels: salary lists, most web applications, social networks, semantic trading, relational databases]
NOSQL USE CASES
LARGE DATA VOLUMES
  MASSIVELY DISTRIBUTED ARCHITECTURE REQUIRED TO STORE THE DATA
  GOOGLE, AMAZON, FACEBOOK, 100K SERVERS
EXTREME QUERY WORKLOAD
  IMPOSSIBLE TO EFFICIENTLY DO JOINS AT THAT SCALE WITH AN RDBMS
SCHEMA EVOLUTION
  SCHEMA FLEXIBILITY IS NOT TRIVIAL AT A LARGE SCALE BUT IT CAN BE WITH NOSQL
NOSQL PROS AND CONS
PROS
  MASSIVE SCALABILITY
  HIGH AVAILABILITY
  LOWER COST
  SCHEMA FLEXIBILITY
  SPARSE AND SEMI-STRUCTURED DATA
CONS
  LIMITED QUERY CAPABILITIES
  NOT STANDARDISED (PORTABILITY MAY BE AN ISSUE)
  STILL A DEVELOPING TECHNOLOGY
FOUR EMERGING TRENDS IN NOSQL DATABASES
BIGTABLE
KEY VALUE
GRAPHDB
DOCUMENT
BUT FIRST… IMAGINE A LIBRARY
LOTS OF DIFFERENT FLOORS
DIFFERENT SECTIONS ON EACH FLOOR
DIFFERENT BOOKSHELVES IN EACH SECTION
LOTS OF BOOKS ON EACH SHELF
LOTS OF PAGES IN EACH BOOK
LOTS OF WORDS ON EACH PAGE
EVERYTHING IS WELL ORGANISED AND EVERYTHING HAS A SPACE
WHAT HAPPENS IF WE BUY TOO MANY BOOKS!?
(THE WORLD EXPLODES AND THE KITTENS WIN)
WHAT HAPPENS IF WE WANT TO STORE CDS ALL OF A SUDDEN!?
(THE WORLD EXPLODES AND THE KITTENS WIN)
WHAT HAPPENS IF WE WANT TO GET RID OF ALL BOOKS THAT MENTION KITTENS?
(KITTENS STILL WIN)
BIG TABLE
BEHAVES LIKE A STANDARD RELATIONAL DATABASE BUT WITH A SLIGHT CHANGE
DESIGNED TO WORK WITH A LOT OF DATA…A REALLY BIG CRAP TON
CREATED BY GOOGLE AND NOW USED BY LOTS OF OTHERS
http://research.google.com/archive/bigtable.html
http://research.google.com/archive/spanner.html
THIS IS A STANDARD RELATIONAL DATABASE
THIS IS A BIG TABLE DATABASE
(AND NOW THE NAME MAKES SENSE!)
BIG TABLE
“A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.”
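To make the quoted definition a bit more concrete, here is a minimal in-memory sketch of a sorted map indexed by (row key, column key, timestamp) whose values are plain bytes. It is only a toy under stated assumptions: the class name `TinyBigtable`, its methods and the example keys are invented for illustration, and nothing here is distributed or persistent the way the real system is.

```python
from bisect import bisect_left, insort


class TinyBigtable:
    """Toy sorted map: (row key, column key, timestamp) -> bytes.

    Hypothetical illustration only; the names are not Bigtable's API.
    """

    def __init__(self):
        self._cells = {}  # (row, column, timestamp) -> bytes
        self._keys = []   # the same keys, kept in sorted order

    def put(self, row, column, timestamp, value: bytes):
        key = (row, column, timestamp)
        if key not in self._cells:
            insort(self._keys, key)  # keep the key list sorted
        self._cells[key] = value

    def get(self, row, column):
        """Return the newest value stored for a cell, or None."""
        # All versions of one cell sit next to each other, ordered by
        # timestamp, so the newest is just before this insertion point.
        i = bisect_left(self._keys, (row, column, float("inf")))
        if i > 0 and self._keys[i - 1][:2] == (row, column):
            return self._cells[self._keys[i - 1]]
        return None

    def scan(self, start_row, end_row):
        """Return (key, value) pairs for rows in [start_row, end_row)."""
        lo = bisect_left(self._keys, (start_row,))
        hi = bisect_left(self._keys, (end_row,))
        return [(k, self._cells[k]) for k in self._keys[lo:hi]]


table = TinyBigtable()
table.put("com.example", "contents:html", 1, b"<html>v1")
table.put("com.example", "contents:html", 2, b"<html>v2")
table.put("org.sample", "anchor:text", 1, b"hi")
latest = table.get("com.example", "contents:html")
```

The map is "sparse" in the sense that a row only stores the columns it actually has, and "sorted" in the sense that a range of neighbouring row keys can be read with one scan.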
KEY VALUE
AGAIN, DESIGNED TO WORK WITH A LOT OF DATA
EACH BIT OF DATA IS STORED IN A SINGLE COLLECTION
EACH COLLECTION CAN HAVE DIFFERENT TYPES OF DATA
[Illustration: keys A, B, C, D, E]
OUR VALUES ARE HIDDEN INSIDE THE KEYS. TO FIND OUT WHAT THEY ARE WE NEED TO QUERY THEM
What is in Key B?
The Triangle
(VOLDEMORT)
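The "What is in Key B?" exchange can be sketched in a few lines. This is a minimal illustration, assuming a store that only understands keys and treats every value as opaque bytes; `KeyValueStore` and `keys_holding` are hypothetical names, not the API of Voldemort or any other product.

```python
class KeyValueStore:
    """Toy key/value store: the store never looks inside a value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value: bytes):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)  # None when the key is unknown

    def keys(self):
        return list(self._data)


def keys_holding(store, wanted: bytes):
    """Find keys by value: the client has to fetch and check every key."""
    return [k for k in store.keys() if store.get(k) == wanted]


store = KeyValueStore()
store.put("A", b"circle")
store.put("B", b"triangle")
store.put("C", b"square")

answer = store.get("B")  # asking "What is in Key B?"
```

Looking up by key is a single call, but asking the opposite question (which keys hold a triangle?) means walking every key, which is the gap the document store slides address next.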
DOCUMENT STORE
DESIGNED TO WORK WITH A LOT OF DATA (BEGINNING TO NOTICE A THEME?)
VERY SIMILAR TO A KEY VALUE DATABASE
MAIN DIFFERENCE IS THAT YOU CAN ACTUALLY SEE THE VALUES
[Illustration: items A, B, C, D, E]
Bring me the triangles
Yes m’lord.
SIDENOTE
REMEMBER HOW SQL DATABASES ARE LIBRARIES?
NOSQL IS MORE LIKE A BAG OF CATS!
SIDENOTE
colour: tabby
name: Gunther
colour: ginger
name: Mylo
colour: grey
name: Ruffus
age: kitten
colour: ginger(ish)
name: Fred
age: kitten
colour: ginger(ish)
name: Quentin
legs: 3
WE CAN ADD IN FIELDS AS AND WHEN WE NEED THEM
Bring me the KITTENS!
Of course m’lord.
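The "bag of cats" and the kitten query can be sketched with plain dictionaries. This is a rough illustration rather than any particular database's API: `find` is a hypothetical helper, and which fields belong to which cat is our reading of the slide, so treat the grouping as an assumption.

```python
# Each document is a free-form record; no two need the same fields.
# Assumption: the grouping of fields per cat is inferred from the slide.
cats = [
    {"name": "Gunther", "colour": "tabby"},
    {"name": "Mylo", "colour": "ginger"},
    {"name": "Ruffus", "colour": "grey", "age": "kitten"},
    {"name": "Fred", "colour": "ginger(ish)", "age": "kitten"},
    {"name": "Quentin", "colour": "ginger(ish)", "legs": 3},
]


def find(documents, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [
        doc
        for doc in documents
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


kittens = find(cats, age="kitten")  # "Bring me the KITTENS!"

# Adding a field later needs no schema change: just store it.
cats[0]["favourite_toy"] = "string"
```

Because the store can see inside each document, the query runs against the values themselves, and documents without an `age` field are simply skipped.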
GRAPH DATABASE
FOCUS HERE IS ON MODELLING THE STRUCTURE OF THE DATA
INSPIRED BY GRAPH THEORY (GO MATHS!)
SCALES REALLY WELL TO THE STRUCTURE OF THE DATA
GRAPH DATABASE
NODES
  name: “Michael”, twitter: “@mrmike”
  name: “John”, twitter: “@mrjohn”
  brand: “Toyota”, currentState: “Broken”
  brand: “Vauxhall”, currentState: “Working”
RELATIONSHIPS
  WORKS_WITH
  WORKS_WITH
  OWNS (propertyType: “car”)
  OWNS (propertyType: “car”)
  CARSHARES IN
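The nodes and relationships above can be sketched as a tiny property graph. The slide text does not say which node each relationship connects, so the endpoints below are assumptions made for illustration, as are the class names, the `usable_cars` helper and the spelling `CARSHARES_IN`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    properties: dict


@dataclass
class Relationship:
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


class Graph:
    """Toy property graph: relationships carry a type and properties."""

    def __init__(self):
        self.relationships = []

    def relate(self, start, rel_type, end, **properties):
        rel = Relationship(start, rel_type, end, properties)
        self.relationships.append(rel)
        return rel

    def neighbours(self, node, rel_type):
        """Follow outgoing relationships of one type from a node."""
        return [
            r.end
            for r in self.relationships
            if r.start is node and r.type == rel_type
        ]


michael = Node({"name": "Michael", "twitter": "@mrmike"})
john = Node({"name": "John", "twitter": "@mrjohn"})
toyota = Node({"brand": "Toyota", "currentState": "Broken"})
vauxhall = Node({"brand": "Vauxhall", "currentState": "Working"})

g = Graph()
# Assumed endpoints: the slide only lists the relationship names.
g.relate(michael, "WORKS_WITH", john)
g.relate(john, "WORKS_WITH", michael)
g.relate(michael, "OWNS", toyota, propertyType="car")
g.relate(john, "OWNS", vauxhall, propertyType="car")
g.relate(michael, "CARSHARES_IN", vauxhall)


def usable_cars(graph, person):
    """Cars a person owns or carshares in that are currently working."""
    cars = graph.neighbours(person, "OWNS") + graph.neighbours(person, "CARSHARES_IN")
    return [c for c in cars if c.properties["currentState"] == "Working"]
```

A question such as "which working car can Michael get to?" becomes a short walk along relationships instead of a chain of joins.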
[Chart. Axes: Size, Complexity. Labels: key/value store, bigtable clone, document database, graph database. Annotation: >90% of use cases]
WHEN TO USE NOSQL AND WHEN TO USE SQL
THE BASICS
High availability and disaster recovery are a must
Understand the pros and cons of each design model
Don’t pick something just because it is new
Do you remember the Zune?
Don’t pick something based JUST on performance
SQL: THE GOOD
High performance for transactions. Think ACID
Highly structured, very portable
Small amounts of data (SMALL IS LESS THAN 500GB)
Supports many tables with different types of data
Can fetch ordered data
Compatible with lots of tools
SQL: ACID
ATOMICITY
CONSISTENCY
ISOLATION
DURABILITY
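As a small illustration of the "A" and "C" in ACID, here is a sketch using Python's built-in `sqlite3` module. The `accounts` table, the names and the amounts are made up for the example; the point is only that a transaction which breaks a rule halfway through is undone as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK failed, so the credit above was rolled back


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 500)  # alice cannot afford this
```

The failed transfer credits bob first and then trips the `CHECK` on alice's balance, yet afterwards both balances are exactly what they were before, because the whole transaction is rolled back.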
SQL: THE BAD
Complex queries take a long time
The relational model takes a long time to learn
Not really scalable
Not suited for rapid development
noSQL: THE GOOD
Fits well for volatile data
High read and write throughput
Scales really well
Rapid development is possible
In general it’s faster than SQL
noSQL: BASE
BASICALLY AVAILABLE
SOFT STATE
EVENTUALLY CONSISTENT
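"Eventually consistent" is easier to see in a toy model. The sketch below assumes a made-up store with three replicas where a write lands on one replica immediately and reaches the others only when `sync` runs; real systems replicate in the background with far more machinery, so this is an illustration of the idea and nothing more.

```python
class EventuallyConsistentStore:
    """Toy model: writes hit replica 0 first and spread out later."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica index, key, value) still to deliver

    def write(self, key, value):
        self.replicas[0][key] = value  # accepted straight away
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica=0):
        return self.replicas[replica].get(key)  # may be stale

    def sync(self):
        """Deliver queued updates, in order, to the other replicas."""
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("cat:fred", "kitten")

stale = store.read("cat:fred", replica=2)  # not replicated yet
store.sync()
fresh = store.read("cat:fred", replica=2)  # now every replica agrees
```

The store stays available for reads and writes the whole time; what it gives up is the guarantee that every reader sees the newest value immediately.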
noSQL: THE BAD
Key/Value pairs need to be packed/unpacked all the time
Security is still catching up with what SQL databases offer
Lack of relations from one key to another
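The packing and unpacking point can be shown with a short sketch. It assumes values are stored as bytes and uses JSON as the encoding; the dictionary `store` merely stands in for a key/value database, and `save` and `load` are hypothetical helper names.

```python
import json

store = {}  # stand-in for a key/value database: str key -> bytes value


def save(key, record):
    """Pack a record into bytes before handing it to the store."""
    store[key] = json.dumps(record).encode("utf-8")


def load(key):
    """Unpack the bytes back into a record after reading them."""
    return json.loads(store[key].decode("utf-8"))


save("cat:quentin", {"name": "Quentin", "colour": "ginger(ish)", "legs": 3})

# Changing one field means unpacking, editing and packing the whole value.
record = load("cat:quentin")
record["legs"] = 4
save("cat:quentin", record)
```

Every read and every update pays for a decode and an encode in application code, which is the overhead the bullet above is pointing at.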
tl;dr
SQL: works great, can’t scale for large data
noSQL: works great, doesn’t fit all situations
so use both, but think about when you want to use them!
FINALLY
A lot of this content is lovingly ripped from lots of other (more impressive) presentations that are already on SlideShare - you should check them out!
