Introduction to NoSQL Databases
  San Diego NoSQL Meetup – Aug 2010
  By Derek Stainer
  http://nosqldatabases.com
Agenda
  • Introduction
  • Objective
  • Explore NoSQL Databases
  • Conclusion
Introduction
  • UCSD graduate in Computer Science
  • Java developer for 10 years
  • Creator of http://nosqldatabases.com
  • Curator of NoSQL information
Objective
  • Take a deeper dive into each type of NoSQL database
  • Discuss 1-2 NoSQL databases in each family
NoSQL Taxonomy
  • Key/Value
  • Document
  • Column
  • Graph
  • Others
      • Geospatial
      • File System
      • Object
Key/Value Databases
  • Global collection of key/value pairs
  • Inspired by Amazon’s Dynamo and distributed hash tables
  • Designed to handle massive load
  • Multiple types:
      • In memory, e.g. Memcache
      • On disk, e.g. Redis, SimpleDB
      • Eventually consistent, e.g. Dynamo, Voldemort
Key/Value: Voldemort
  • Created by LinkedIn, now open source
  • Inspired by Amazon’s Dynamo
  • Written in Java
  • Pluggable storage: BerkeleyDB, in memory, MySQL
  • Pluggable serialization: JSON, Thrift, Protocol Buffers, etc.
  • Cluster rebalancing
Key/Value: Voldemort [cont’d]
  • Versioning based on vector clocks
      • Reconciliation occurs on reads
  • Partitioning and replication based on Dynamo
      • Consistent hashing
      • Virtual nodes
      • Gossip
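The versioning scheme listed above (vector clocks, with reconciliation on reads) can be sketched in a few lines of Python. This is not Voldemort's Java code. The function names `increment`, `descends`, and `reconcile` are hypothetical, and a clock is assumed to be a plain dict that maps a node name to a counter.

```python
from typing import Dict, List, Tuple

VectorClock = Dict[str, int]       # node name -> writes coordinated by that node
Version = Tuple[VectorClock, str]  # (clock, value); hypothetical representation

def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a new clock with `node`'s counter bumped, as a write through `node` would."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` has seen every event recorded in `b` (a >= b in every component)."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def reconcile(versions: List[Version]) -> List[Version]:
    """Read-time reconciliation: drop each version that a strictly newer one supersedes.

    A single survivor means the replicas agree. Several survivors are concurrent
    siblings that the client application has to merge and write back.
    """
    survivors: List[Version] = []
    for clock, value in versions:
        superseded = any(descends(other, clock) and other != clock for other, _ in versions)
        if not superseded and (clock, value) not in survivors:
            survivors.append((clock, value))
    return survivors

# Usage: two writes branch off the same ancestor through different nodes.
v1 = increment({}, "A")   # {"A": 1}
v2 = increment(v1, "A")   # {"A": 2}: supersedes v1
v3 = increment(v1, "B")   # {"A": 1, "B": 1}: concurrent with v2
print(reconcile([(v1, "x"), (v2, "y"), (v3, "z")]))
# [({'A': 2}, 'y'), ({'A': 1, 'B': 1}, 'z')]
```

When `reconcile` returns more than one version, the replicas have diverged concurrently and the reader must resolve the conflict. That is why this design puts the reconciliation cost on reads rather than on writes.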
Other Key/Value Stores
  • Amazon’s Dynamo
  • Riak
  • Redis
  • Memcache
  • SimpleDB
Document Databases
  • Similar to a key/value database, with one major difference: the value is a document
  • Inspired by Lotus Notes
  • Flexible schema: any number of fields can be added
  • Documents stored in JSON or BSON formats
  • Examples: CouchDB, MongoDB
Sample Document
  {
      "day": [ 2010, 1, 23 ],
      "products": {
          "apple": { "price": 10, "quantity": 6 },
          "kiwi": { "price": 20, "quantity": 2 }
      },
      "checkout": 100
  }
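With the missing commas and the leading zero corrected, the sample document is valid JSON and any JSON parser can load it. This short Python check confirms that `checkout` equals 10 × 6 + 20 × 2 = 100, and shows that adding a field requires no schema change. The `checkout_total` helper and the added `customer` field are hypothetical.

```python
import json

SAMPLE = """
{
    "day": [ 2010, 1, 23 ],
    "products": {
        "apple": { "price": 10, "quantity": 6 },
        "kiwi": { "price": 20, "quantity": 2 }
    },
    "checkout": 100
}
"""

def checkout_total(doc: dict) -> int:
    """Recompute the checkout total from the embedded product sub-documents."""
    return sum(item["price"] * item["quantity"] for item in doc["products"].values())

doc = json.loads(SAMPLE)
assert checkout_total(doc) == doc["checkout"]  # 10*6 + 20*2 == 100

# Flexible schema: a new field is just another key; no table definition changes.
doc["customer"] = {"name": "hypothetical-customer"}
print(sorted(doc))  # ['checkout', 'customer', 'day', 'products']
```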
Document: CouchDB
  • Development began around 2005 by Damien Katz, a former Lotus Notes developer
  • Couch: Cluster Of Unreliable Commodity Hardware
  • Top-level Apache project
  • Commercially supported by CouchIO
  • Licensed under the Apache License
  • Written in Erlang
  • Documents are stored in JSON
Document: CouchDB [cont’d]
  • B-tree storage engine
  • MVCC model, no locking
  • No joins and no primary or foreign keys (UUIDs are auto-assigned)
  • Built-in bi-directional replication
      • A node can even go offline, come back, and sync its changes
  • Custom persistent views using MapReduce
  • REST API
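CouchDB views are normally written as JavaScript map/reduce functions and stored on the server. The Python sketch below only mimics how such a view behaves when it is queried with grouping (`group=true`): each document is mapped to (key, value) rows, the rows are kept sorted by key, and the values for each key are reduced. `run_view`, `units_by_product`, and the order documents are all hypothetical.

```python
from itertools import groupby
from typing import Any, Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]

def run_view(docs: Iterable[Doc],
             map_fn: Callable[[Doc], Iterable[Tuple[Any, Any]]],
             reduce_fn: Callable[[List[Any]], Any]) -> List[Tuple[Any, Any]]:
    """Map every document, sort the emitted rows by key (view indexes are
    B-trees ordered by key), then reduce the values for each distinct key."""
    rows = sorted((kv for doc in docs for kv in map_fn(doc)), key=lambda kv: kv[0])
    return [(key, reduce_fn([value for _, value in group]))
            for key, group in groupby(rows, key=lambda kv: kv[0])]

# Hypothetical order documents, shaped like the sample document.
orders = [
    {"_id": "a1", "day": [2010, 1, 23], "products": {"apple": {"price": 10, "quantity": 6}}},
    {"_id": "b2", "day": [2010, 1, 23], "products": {"kiwi": {"price": 20, "quantity": 2}}},
    {"_id": "c3", "day": [2010, 1, 24], "products": {"apple": {"price": 10, "quantity": 1}}},
]

def units_by_product(doc: Doc):
    # Plays the role of emit(product, quantity) in a CouchDB map function.
    for name, item in doc["products"].items():
        yield name, item["quantity"]

print(run_view(orders, units_by_product, sum))  # [('apple', 7), ('kiwi', 2)]
```

Because the view is persistent, CouchDB can update it incrementally as documents change. This sketch recomputes the whole view on every call.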
Document: MongoDB
  • Development started in 2007
  • Commercially supported and developed by 10gen
  • Stores documents using BSON
  • Supports ad hoc queries
      • Can query against embedded objects and arrays
  • Supports multiple types of indexing
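Querying against embedded objects and arrays means a filter can reach inside a document using dot notation. The Python sketch below is a rough illustration of that behavior, not MongoDB's query engine. It supports equality matches only: no operators such as `$gt` and no positional paths such as `day.0`. The helpers `resolve` and `matches` are hypothetical, and the test document is the sample document from this deck.

```python
from typing import Any, Dict, List

def resolve(value: Any, path: List[str]) -> List[Any]:
    """Follow a dotted path, fanning out across arrays the way dot notation does."""
    if not path:
        return [value]
    if isinstance(value, list):
        # An array part-way along the path: apply the remaining path to each element.
        return [hit for elem in value for hit in resolve(elem, path)]
    if isinstance(value, dict) and path[0] in value:
        return resolve(value[path[0]], path[1:])
    return []  # the path does not exist in this document

def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Equality-only filter: every 'a.b.c': expected pair must match.
    A field that holds an array matches if it equals `expected` or contains it."""
    for dotted, expected in query.items():
        hits = resolve(doc, dotted.split("."))
        if not any(h == expected or (isinstance(h, list) and expected in h) for h in hits):
            return False
    return True

doc = {"day": [2010, 1, 23],
       "products": {"apple": {"price": 10, "quantity": 6},
                    "kiwi": {"price": 20, "quantity": 2}},
       "checkout": 100}
print(matches(doc, {"products.apple.price": 10}))   # True
print(matches(doc, {"day": 2010}))                  # True: the array contains 2010
print(matches(doc, {"products.kiwi.quantity": 6}))  # False
```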
Document: MongoDB [cont’d]
  • Officially supported drivers available for multiple languages: C, C++, Java, JavaScript, Perl, PHP, Python and Ruby
  • Community-supported drivers include Scala, Node.js, Haskell, Erlang, Smalltalk
  • Replication uses a master/slave model
  • Scales horizontally via sharding
  • Written in C++
Column Family Databases
  • Each key is associated with multiple attributes (i.e., columns)
  • Hybrid row/column stores
  • Inspired by Google Bigtable
  • Examples: HBase, Cassandra
Column: HBase
  • Based on Google’s Bigtable
  • Apache top-level project (TLP)
  • Cloudera (certifications, EC2 AMIs, etc.)
  • Layered over HDFS (Hadoop Distributed File System)
  • Input/output for MapReduce jobs
  • APIs: Thrift, REST
Column: HBase [cont’d]
  • Automatic partitioning
  • Automatic re-balancing/re-partitioning
  • Fault tolerant
      • HDFS multiple replicas
  • Highly distributed
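Automatic partitioning in HBase means a table is divided into regions, each covering a contiguous, sorted range of row keys, and a region splits when it grows too large. The toy Python model below illustrates that idea with deliberate simplifications. It splits on a row count (`max_rows`, a hypothetical parameter) rather than on store-file size, it keeps everything in memory on one machine, and it ignores HDFS replication and region reassignment.

```python
import bisect
from typing import Dict, List, Optional

class RegionTable:
    """Toy range-partitioned table: regions[i] holds the row keys in
    [start_keys[i], start_keys[i + 1]); the last region is open-ended."""

    def __init__(self, max_rows: int = 4) -> None:
        self.max_rows = max_rows              # hypothetical split threshold
        self.start_keys: List[str] = [""]     # one region covering every key
        self.regions: List[Dict[str, str]] = [{}]

    def _locate(self, row_key: str) -> int:
        # Index of the rightmost region whose start key is <= row_key.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key: str, value: str) -> None:
        idx = self._locate(row_key)
        self.regions[idx][row_key] = value
        if len(self.regions[idx]) > self.max_rows:
            self._split(idx)                  # automatic re-partitioning

    def get(self, row_key: str) -> Optional[str]:
        return self.regions[self._locate(row_key)].get(row_key)

    def _split(self, idx: int) -> None:
        # Split at the middle row key; the upper half becomes a new region.
        keys = sorted(self.regions[idx])
        mid_key = keys[len(keys) // 2]
        upper = {k: self.regions[idx].pop(k) for k in keys if k >= mid_key}
        self.start_keys.insert(idx + 1, mid_key)
        self.regions.insert(idx + 1, upper)

table = RegionTable(max_rows=4)
for i in range(10):
    table.put(f"row{i:02d}", f"v{i}")
print(table.start_keys)    # ['', 'row02', 'row04', 'row06']
print(table.get("row05"))  # v5
```

Because regions hold sorted key ranges, a scan from one row key to another touches only the regions that cover that range.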
Column: HBase [cont’d]
  [Diagram; credit: Lars George]
Column: Cassandra
  • Created at Facebook for Inbox search
  • Facebook -> Google Code -> ASF
  • Commercial support available from Riptano
  • Features taken from both Dynamo and Bigtable
      • Dynamo: consistent hashing, partitioning, replication
      • Bigtable: column families, MemTables, SSTables
Column: Cassandra [cont’d]
  • Symmetric nodes
  • No single point of failure
  • Linearly scalable
  • Ease of administration
  • Flexible/automated provisioning
  • Flexible replica replacement
  • High availability
  • Eventual consistency
      • However, consistency is tunable
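Tunable consistency comes down to how many replicas must acknowledge a read (R) and a write (W), out of the replication factor N. When R + W > N, every read set overlaps every write set in at least one replica, so reads see the latest acknowledged write. The sketch assumes N = 3 and maps Cassandra's ONE, QUORUM, and ALL consistency levels to replica counts. It ignores node failures, hinted handoff, and read repair.

```python
def quorum(replication_factor: int) -> int:
    """QUORUM means a strict majority of the replicas."""
    return replication_factor // 2 + 1

def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every R-replica read set must overlap every W-replica write set."""
    return r + w > n

N = 3  # assumed replication factor
LEVELS = {"ONE": 1, "QUORUM": quorum(N), "ALL": N}

for read_level, r in LEVELS.items():
    for write_level, w in LEVELS.items():
        verdict = "consistent" if reads_see_latest_write(N, r, w) else "eventual"
        print(f"read {read_level:<6} write {write_level:<6} -> {verdict}")
```

QUORUM reads combined with QUORUM writes (2 + 2 > 3) are the usual middle ground. ONE/ONE gives the lowest latency but only eventual consistency.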
Column: Cassandra [cont’d]
  • Partitioning
      • Random
          • Good distribution of data between nodes
          • Range scans not possible
      • Order preserving
          • Can lead to unbalanced nodes
          • Supports range scans in natural key order
      • Custom
  • Extremely fast reads/writes (low latency)
  • Thrift API
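The trade-off between the random and order-preserving partitioners can be sketched with a token ring. Cassandra's RandomPartitioner derives tokens from an MD5 hash of the key. For simplicity, this Python sketch uses the full 128-bit digest. The node names, node tokens, and the `user000` to `user299` keys are hypothetical. A node owns the keys whose tokens fall after the previous node's token, up to and including its own.

```python
import bisect
import hashlib
from collections import Counter
from typing import Any, List, Tuple

Ring = List[Tuple[Any, str]]  # (token, node name), sorted by token

def md5_token(key: str) -> int:
    """Random-partitioner-style token: the key's MD5 digest read as an integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

def owner(ring: Ring, token: Any) -> str:
    """First node clockwise whose token is >= the key's token, wrapping around."""
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(tokens, token)
    return ring[idx % len(ring)][1]

keys = [f"user{i:03d}" for i in range(300)]

# Random partitioning: node tokens spread evenly over the 128-bit hash space.
SPACE = 2 ** 128
random_ring: Ring = [(i * SPACE // 3, node) for i, node in enumerate("ABC")]
random_load = Counter(owner(random_ring, md5_token(k)) for k in keys)

# Order-preserving partitioning: the key itself is the token.
ordered_ring: Ring = [("g", "A"), ("n", "B"), ("z", "C")]
ordered_load = Counter(owner(ordered_ring, k) for k in keys)

print("random:", dict(random_load))            # expected to be roughly even
print("order-preserving:", dict(ordered_load)) # {'C': 300}: every 'user...' key sorts between 'n' and 'z'
```

Hashing balances the load but scatters adjacent keys across nodes, which is why range scans are not possible. The order-preserving ring keeps neighboring keys together, so range scans work, but here every `user` key piles onto a single node.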
Column: Cassandra [cont’d]
  • Column
      • Basic unit of storage
  • Column Family
      • Collection of like records
      • Record-level atomicity
      • Indexed
  • Keyspace
      • Top-level namespace
      • Usually one per application
Column: Cassandra [cont’d]
  [Diagram; credit: Eric Evans]
Column: Cassandra [cont’d]
  • Column details
      • Name: byte[]
          • Queried against
          • Determines sort order
      • Value: byte[]
          • Opaque to Cassandra
      • Timestamp: long
          • Conflict resolution (last write wins)
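A column, as described above, is a (name, value, timestamp) triple. Columns in a row are kept sorted by name, and conflicts are resolved by last write wins. The Python sketch below models just that much. `Column`, `Row`, and the `Users`/`jsmith` names are hypothetical, and names and values are raw `bytes`. When two timestamps are equal, this sketch keeps the stored column; Cassandra applies its own tie-break.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Column:
    name: bytes     # compared byte-wise; determines sort order within a row
    value: bytes    # opaque to the store
    timestamp: int  # supplied by the client; used only for conflict resolution

class Row:
    """Columns of one row, kept sorted by name (a toy model, not Cassandra's code)."""

    def __init__(self) -> None:
        self._names: List[bytes] = []
        self._columns: Dict[bytes, Column] = {}

    def write(self, col: Column) -> None:
        current = self._columns.get(col.name)
        if current is None:
            bisect.insort(self._names, col.name)
            self._columns[col.name] = col
        elif col.timestamp > current.timestamp:
            self._columns[col.name] = col  # last write wins
        # Otherwise the incoming write is older (or tied) and is discarded.

    def slice(self, start: bytes, finish: bytes) -> List[Column]:
        """Columns whose names fall in [start, finish], in name order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [self._columns[n] for n in self._names[lo:hi]]

# Keyspace: column family name -> {row key -> Row} (all names hypothetical).
keyspace: Dict[str, Dict[str, Row]] = {"Users": {}}
row = keyspace["Users"].setdefault("jsmith", Row())
row.write(Column(b"email", b"old@example.com", timestamp=1))
row.write(Column(b"email", b"new@example.com", timestamp=3))
row.write(Column(b"email", b"stale@example.com", timestamp=2))  # older: discarded
row.write(Column(b"age", b"42", timestamp=1))
print([(c.name, c.value) for c in row.slice(b"a", b"z")])
# [(b'age', b'42'), (b'email', b'new@example.com')]
```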
Graph Databases
  • Inspired by Euler’s graph theory: G = (V, E)
  • Focused on modeling the structure of the data
  • Property graph data model
  • Examples: Neo4j, InfiniteGraph
Sample Property Graph
  [Figure: sample property graph; credit: Todd Hoff]
Graph: Neo4j
  • Data model: property graph
      • Nodes: Person, Place, Thing, etc.
      • Relationships: Lives, Likes, Owns, etc.
      • Properties on both
  • Primary operation is graph traversal between nodes
  • Written in Java
  • Embedded database
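Traversal over a property graph, where both nodes and relationships carry properties, can be sketched in plain Python. This is not the Neo4j API: `PropertyGraph`, `traverse`, the `KNOWS`/`LIVES` relationship types, and the people are all hypothetical. The traversal is a breadth-first search that follows a single relationship type up to a depth limit. This is the kind of friends-of-friends query behind the social-graph use case.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class Node:
    label: str                                        # e.g. Person, Place, Thing
    props: Dict[str, object] = field(default_factory=dict)

@dataclass
class Relationship:
    start: str
    rel_type: str                                     # e.g. LIVES, LIKES, OWNS
    end: str
    props: Dict[str, object] = field(default_factory=dict)

class PropertyGraph:
    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.out: Dict[str, List[Relationship]] = {}  # outgoing relationships per node

    def add_node(self, node_id: str, label: str, **props: object) -> None:
        self.nodes[node_id] = Node(label, props)
        self.out.setdefault(node_id, [])

    def relate(self, start: str, rel_type: str, end: str, **props: object) -> None:
        self.out[start].append(Relationship(start, rel_type, end, props))

    def traverse(self, start: str, rel_type: str, max_depth: int) -> List[str]:
        """Breadth-first traversal along `rel_type` relationships; returns the node
        ids reached within `max_depth` hops, excluding `start`."""
        seen: Set[str] = {start}
        found: List[str] = []
        queue: deque = deque([(start, 0)])
        while queue:
            node_id, depth = queue.popleft()
            if depth == max_depth:
                continue
            for rel in self.out[node_id]:
                if rel.rel_type == rel_type and rel.end not in seen:
                    seen.add(rel.end)
                    found.append(rel.end)
                    queue.append((rel.end, depth + 1))
        return found

g = PropertyGraph()
g.add_node("alice", "Person", name="Alice")
g.add_node("bob", "Person", name="Bob")
g.add_node("carol", "Person", name="Carol")
g.add_node("sd", "Place", name="San Diego")
g.relate("alice", "KNOWS", "bob", since=2008)
g.relate("bob", "KNOWS", "carol")
g.relate("alice", "LIVES", "sd")
print(g.traverse("alice", "KNOWS", max_depth=2))  # ['bob', 'carol']
```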
Graph: Neo4j [cont’d]
  • Disk-based
      • Graph stored in a custom binary format
  • Transactional
      • JTA/JTS, XA, 2PC, MVCC
  • Scales
      • Billions of nodes/relationships/properties per JVM
  • Robust
      • 6+ years in 24/7 production
Graph: Neo4j [cont’d]
  • Multiple language bindings
      • Jython, CPython
      • JRuby (including RESTful API)
      • Clojure
      • Scala (including RESTful API)
  • Uses
      • Social graph, e.g. Facebook
      • Recommendation engines
      • Financial audit
Graph: Neo4j [cont’d]
  • Licensed under AGPLv3
  • Dual commercial license available
      • First server is free
      • Second server is inexpensive
  • Commercial support provided by Neo Technology
Other Graph Databases
  • InfiniteGraph
  • HyperGraphDB
  • sones
Conclusion
Thank You!
References
  • NoSQL Databases - Part 1 – Landscape, Vineet Gupta: http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape.html
  • NoSQL for Dummies, Tobias Ivarsson: http://www.slideshare.net/thobe/nosql-for-dummies
  • NoSQL Databases, Marin Dimitrov: http://www.slideshare.net/marin_dimitrov/nosql-databases-3584443
  • CouchDB vs. MongoDB, Gabriele Lana: http://www.slideshare.net/gabriele.lana/couchdb-vs-mongodb-2982288
  • HBase, Ryan Rawson: http://www.slideshare.net/adorepump/hbase-nosql
  • Introduction to Cassandra, Gary Dusbabek: http://www.slideshare.net/gdusbabek/introduction-to-cassandra-june-2010
  • Cassandra Explained, Eric Evans: http://www.slideshare.net/jericevans/cassandra-explained
  • Towards Robust Distributed Systems, Eric Brewer: http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
  • Cassandra - A Decentralized Structured Storage System, Avinash Lakshman and Prashant Malik (LADIS 2009): http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
References [cont’d]
  • Bigtable: A Distributed Storage System for Structured Data, Google Inc.: http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/bigtable-osdi06.pdf
  • Dynamo: Amazon’s Highly Available Key-value Store, Amazon Inc.: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
  • HBase Architecture 101 – Storage, Lars George: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
  • BASE: An ACID Alternative, Dan Pritchett

Editor's Notes

  • #2 Surveying the NoSQL Landscape, By Derek Stainer
  • #15 Indexing types include single-key, compound, unique, non-unique, and geospatial indexes