Introduction to NoSQL
AWS Sydney Meetup 2012
• Agenda
  – Introductions
     •   18.00 – 20.00
     •   First Wednesday of the month
     •   Call for volunteers to co-organize the meetup group
     •   Future presentations
  – Presentations
     • Introduction to NoSQL - Darrell King, AWS Architect
     • EMR and DynamoDB – Sohail Khan, AWS/Salesforce Consultant
  – Q&A Session
NoSQL Definition
• NoSQL is a broad class of databases that differ
  from the classic RDBMS in some significant
  ways, the most important being that they do not
  use SQL as their primary query language.
  – NoSQL means Not Only SQL, as in: in the future,
    our backends will consist of not only SQL
    databases but also key-value stores, graph
    databases and more.
NoSQL Drivers
• Google, Facebook and Twitter
   – Real-time results from large volumes of data
   – Performance and real-time response more important than consistency
• RDBMS Problems
   – Inability to scale
   – Demands of big data and elastic provisioning
• Big Data
   – Big data is a term applied to data sets whose size is beyond the
     ability of commonly used software tools to capture, manage,
     and process the data within a tolerable elapsed time. Big data
     sizes are a constantly moving target currently ranging from a
     few dozen terabytes to many petabytes of data in a single data
     set.
NoSQL
• NoSQL is all about scalability
  – Scaling to size
  – Scaling to complexity
• Handles heavy read/write (R/W) workloads
• Eventual consistency
NoSQL
– Eric Brewer’s CAP theorem says that of consistency,
  availability, and partition tolerance, a distributed
  system can guarantee at most two out of three. (Partition
  tolerance means the system continues to work unless there is
  a total network failure. A few nodes can fail and the system keeps going.)

– Consistency means that each client always has the same view of the data.
– Availability means that all clients can always read and write.
– Partition tolerance means that the system works well across physical network
  partitions.
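
The trade-off above can be sketched with a toy two-node store. This is an illustration only: the `Cluster` class, its "CP"/"AP" modes and the shared logical clock are assumptions made for the example, not how any particular product works. In CP mode the cluster refuses writes during a partition; in AP mode it accepts them and reconciles afterwards by keeping the newest version.

```python
import itertools


class Cluster:
    """Toy two-replica key-value store illustrating the CAP trade-off."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]          # two nodes, each holding a full copy
        self.partitioned = False          # True when the nodes cannot reach each other
        self._clock = itertools.count(1)  # simplification: one shared logical clock

    def write(self, node, key, value):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse the write rather than let replicas diverge.
            raise RuntimeError("unavailable during partition")
        entry = (next(self._clock), value)
        # During a partition only the local node is updated (AP mode).
        targets = [self.replicas[node]] if self.partitioned else self.replicas
        for replica in targets:
            replica[key] = entry

    def read(self, node, key):
        entry = self.replicas[node].get(key)
        return None if entry is None else entry[1]

    def heal(self):
        """End the partition and converge: the highest version of each key wins."""
        self.partitioned = False
        merged = {}
        for replica in self.replicas:
            for key, entry in replica.items():
                if key not in merged or entry[0] > merged[key][0]:
                    merged[key] = entry
        self.replicas = [dict(merged), dict(merged)]


ap = Cluster("AP")
ap.write(0, "cart", "book")
ap.partitioned = True
ap.write(1, "cart", "book+pen")     # accepted by node 1 only
stale = ap.read(0, "cart")          # node 0 still returns the old value
ap.heal()
converged = ap.read(0, "cart")      # both nodes now agree on the newest write
```

The AP run shows eventual consistency in miniature: a stale read is possible while the partition lasts, and the replicas agree again once it heals.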
Emerging Categories of NoSQL
1. Key-value stores without an explicit data model
  –   many based on Amazon's Dynamo key-value store.

2. Stores influenced by Google's BigTable database
  –   which supports Google products such as Google Maps and Google
      Reader.

3. Document databases store highly structured
   self-describing objects

4. Graph databases store complex relationships
  –   such as those found in social networks.
http://blogs.neotechnology.com/emil/2009/11/nosql-scaling-to-size-and-scaling-to-complexity.html
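
To make the four categories concrete, here is a rough sketch of the same kind of user data in each shape, using plain Python structures. The keys, field names and the `friends_of_friends` helper are invented for illustration; real products add their own storage and query layers on top.

```python
import json

# 1. Key-value: an opaque value looked up by key; the store knows nothing
#    about what is inside the value.
key_value = {"user:42": b'{"name": "Ada", "city": "Sydney"}'}
profile = json.loads(key_value["user:42"])  # the application decodes it

# 2. BigTable-style: row key -> column family -> column -> value.
column_family = {
    "user:42": {
        "profile": {"name": "Ada", "city": "Sydney"},
        "activity": {"last_login": "2012-02-01"},
    }
}

# 3. Document: a self-describing nested object that can be queried by field.
document = {
    "_id": 42,
    "name": "Ada",
    "address": {"city": "Sydney"},
    "tags": ["aws", "nosql"],
}

# 4. Graph: nodes plus explicit relationships (edges) between them.
nodes = {42: {"name": "Ada"}, 7: {"name": "Linus"}, 9: {"name": "Grace"}}
follows = {42: {7}, 7: {9}, 9: set()}


def friends_of_friends(graph, start):
    """Return nodes two hops from start, excluding start and its direct neighbours."""
    direct = graph[start]
    two_hops = set()
    for neighbour in direct:
        two_hops |= graph[neighbour]
    return two_hops - direct - {start}


suggestions = friends_of_friends(follows, 42)
```

The relationship query in the last few lines is the kind of traversal that is awkward in the first three models and natural in a graph database.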
Changing Landscape
Amazon DynamoDB
• Fully managed NoSQL database
• Released January 18th 2012
• Service provisioned by throughput rather than
  storage
• Hardware: SSDs allow predictable performance
   – Interesting that AWS mentioned hardware at all

• Similar to a managed version of Cassandra
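
As a rough illustration of what provisioning by throughput means, the sketch below estimates how many capacity units a workload needs. The unit sizes are assumptions taken from AWS's current published definitions (one read unit covers a strongly consistent read of up to 4 KB per second, an eventually consistent read costs half, and one write unit covers a write of up to 1 KB per second); they are not stated in these slides and may differ from the figures at launch. The function names are hypothetical.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # assumed size covered by one read capacity unit
WRITE_UNIT_BYTES = 1024      # assumed size covered by one write capacity unit


def read_capacity_units(item_bytes, reads_per_second, strongly_consistent=True):
    """Estimate read units: round item size up to whole units, then scale by rate."""
    units_per_read = math.ceil(item_bytes / READ_UNIT_BYTES)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = total / 2    # eventually consistent reads cost half as much
    return math.ceil(total)


def write_capacity_units(item_bytes, writes_per_second):
    """Estimate write units: round item size up to whole units, then scale by rate."""
    return math.ceil(item_bytes / WRITE_UNIT_BYTES) * writes_per_second


strong = read_capacity_units(6000, 100)                                # 2 units x 100 reads
eventual = read_capacity_units(6000, 100, strongly_consistent=False)   # half of that
writes = write_capacity_units(2500, 10)                                # 3 units x 10 writes
```

Note that storage size does not appear anywhere in the calculation, which is the point of the slide: you size the service by request rate and item size.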
Amazon DynamoDB
• Consistency
   – DynamoDB writes are always consistent
   – Reads are either strongly consistent or eventually consistent
• Durability
   – All writes occur to disk, not memory
   – A write is only committed once it exists in at least two
     physical data centers
• Availability
   – Regional Service
   – Spans multiple AZs
   – All data continuously replicated to multiple AZs
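
The consistency and durability bullets above can be modelled with a toy table. This is a sketch of the behaviour as described on the slide, not DynamoDB's actual implementation: the `ToyTable` class, the three-replica layout and the background queue are all assumptions. A write is acknowledged once two replicas hold it, and a third catches up later.

```python
class ToyTable:
    """Three replicas (one per AZ); a write is acknowledged after reaching two."""

    def __init__(self):
        self.replicas = [{}, {}, {}]
        self.pending = []   # (replica_index, key, value) updates not yet applied

    def put(self, key, value):
        # Persist to two replicas before acknowledging the write.
        self.replicas[0][key] = value
        self.replicas[1][key] = value
        # The third copy is updated later, in the background.
        self.pending.append((2, key, value))

    def get(self, key, consistent_read=False, replica=2):
        if consistent_read:
            # Strongly consistent: answered by a replica that holds
            # every acknowledged write.
            return self.replicas[0].get(key)
        # Eventually consistent: the chosen replica answers, even if it lags.
        return self.replicas[replica].get(key)

    def replicate(self):
        """Apply all queued background updates."""
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


table = ToyTable()
table.put("order:1", "shipped")
strong_read = table.get("order:1", consistent_read=True)   # sees the write at once
lagging_read = table.get("order:1")                        # may miss it for a while
table.replicate()
caught_up_read = table.get("order:1")                      # now sees it too
```

In the real service the choice is made per request: the GetItem operation accepts a `ConsistentRead` flag, which plays the role of `consistent_read` here.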
Amazon Elastic MapReduce
• Aim
  – Process vast amounts of data
• Hosted
  – Hadoop framework (clusters), including Hive
  – EC2 and S3
• Examples
  – Web Indexing, Data mining, Log file analysis
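
Log file analysis, one of the examples above, maps naturally onto the MapReduce pattern that Hadoop runs across a cluster. The sketch below runs the same three phases in a single Python process; the log format (client, path, status code separated by spaces) and the function names are assumptions for illustration, and this is not Hadoop or EMR code.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (status_code, 1) for one log line; malformed lines emit nothing."""
    parts = line.split()
    if len(parts) >= 3:
        yield parts[2], 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


LOG = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /missing 404",
    "10.0.0.1 /about.html 200",
    "garbage",
]
counts = run_job(LOG)
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, a framework can spread the calls across many machines, which is what makes the approach suitable for vast amounts of data.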
Elastic MapReduce with DynamoDB
•   Seamless Integration
•   Complementing technologies
•   Managing, analysing and monetising Big Data
•   What it fixes
    – Administration, maintenance and upfront costs
    – Effortless scalability
Source/Further Reading
• NoSQL Ecosystem
    – http://blog.nahurst.com/visual-guide-to-nosql-systems
• NOSQL: scaling to size and scaling to complexity
    – http://blogs.neotechnology.com/emil/2009/11/nosql-scaling-to-size-and-scaling-to-complexity.html
• Google: MoreSQL is Real
    – http://williamedwardscoder.tumblr.com/post/16399069781/google-moresql-is-real
• Visual Guide to NoSQL Systems
    – http://blog.nahurst.com/visual-guide-to-nosql-systems
• Brewer’s Keynote
    – http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
• Overview of NoSQL
    – http://youtu.be/sh1YACOK_bo
Source/Further Reading
• CAP Theorem
    – http://mysqlha.blogspot.com.au/2010/04/cap-theorem.html
• Plain English Intro to CAP Theorem
    – http://ksat.me/a-plain-english-introduction-to-cap-theorem/
• Availability and Partition Tolerance
    – http://ksat.me/a-plain-english-introduction-to-cap-theorem/
• Gilbert and Lynch’s 2002 SIGACT News paper proving the CAP theorem
    – http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1495&rep=rep1&type=pdf
• NOSQL for Dummies
    – http://www.slideshare.net/thobe/nosql-for-dummies

Editor's Notes

  • #2 Hi and thank you all for coming this evening. My name is Darrell King and this here is Sohail. We are AWS consultants based here in Sydney and working for a consultancy company called ProQuest. ProQuest have also been sponsoring this event up until now. This is our first event this year and it looks like a good turnout. Thanks for coming.
  • #3 ACID vs BASE
    ACID:
      – Atomic: Everything in a transaction succeeds or the entire transaction is rolled back.
      – Consistent: A transaction cannot leave the database in an inconsistent state.
      – Isolated: Transactions cannot interfere with each other.
      – Durable: Completed transactions persist, even when servers restart, etc.
    NoSQL databases are based on BASE:
      – Basic Availability
      – Soft state
      – Eventual consistency
  • #4 Although NoSQL is a new term that has only been around for a year or so, the approach itself has been around for years, addressing the scalability limits of the RDBMS. Google, Facebook, Amazon and other huge web sites developed non-relational databases that sacrificed consistency for availability and scalability.
  • #6 Consistency means that each client always has the same view of the data.Availability means that all clients can always read and write.Partition tolerance means that the system works well across physical network partitions.
  • #8 Within the NoSQL zoo, there are several distinct family trees. Some NoSQL databases are pure key-value stores without an explicit data model, with many based on Amazon's Dynamo key-value store. Others are heavily influenced by Google's BigTable database, which supports Google products such as Google Maps and Google Reader. Document databases store highly structured self-describing objects, usually in a JSON-like format. Finally, graph databases store complex relationships such as those found in social networks.
  • #10 Some other categories.
  • #11 Designed to address:
      – Management
      – Performance
      – Scalability
      – Reliability
    Replicated across AZs
  • #13 Examples:
      – Web indexing
      – Data mining
      – Log file analysis
      – Data warehousing
      – Machine learning
      – Financial analysis
      – Scientific simulation