MongoDB Introduction and Data Modelling
© 2011 Xpanxion all rights reserved
GLOBAL SOFTWARE ENGINEERING EXCELLENCE
MongoDB
<Version 5.1>
17 April 2013
Internal
<Internal Restricted/Confidential(when filled) >
- Sachin Bhosale
The Evolution of Databases
1990: RDBMS
2000: RDBMS, OLAP/BI
2010: RDBMS, NoSQL, OLAP/BI, Hadoop
(Diagram labels: Operational Data, Data Warehouse)
Big Data
• "Big Data" describes data sets so large and complex that they are impractical to manage with traditional software tools. Big Data relates to data creation, storage, retrieval and analysis that is remarkable in terms of volume, velocity, and variety.
• Volume - A typical PC might have had 10 gigabytes of storage in 2000. Today, Facebook ingests 500 terabytes of new data every day.
• Velocity - Clickstreams and ad impressions capture user behavior at millions of events per second; high-frequency stock trading algorithms reflect market changes within microseconds.
• Variety - Big Data isn't just numbers, dates, and strings. It is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media.
Big Data Technologies
                 Operational          Analytical
Latency          10 ms - 100 ms       1 min - 100 min
Concurrency      1,000 - 100,000      1 - 10
Access Pattern   Writes and Reads     Reads
Queries          Selective            Unselective
Data Scope       Operational          Retrospective
End User         Customer             Data Scientist
Technology       NoSQL                MapReduce, MPP Database
Relational Database Challenges
Data Types
• Unstructured data
• Semi-structured data
• Polymorphic data
Volume of Data
• Petabytes of data
• Trillions of records
• Tens of millions of queries per second
Agile Development
• Iterative
• Short development cycles
• New workloads
New Architectures
• Horizontal scaling
• Commodity servers
• Cloud computing
NoSQL Categories
• Key-value: Redis
• Wide-column: Cassandra
• Document: MongoDB
• Graph: Neo4j
Which one is the best?
What is MongoDB?
• MongoDB is a ___________ database
    • Document
    • Open source
    • High performance
    • Horizontally scalable
    • Full featured
Document Database
• Not for .PDF & .DOC files
• A document is essentially an associative array
    • Document == JSON object
    • Document == PHP Array
    • Document == Python Dictionary
    • Document == Ruby Hash
    • etc.
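As a minimal sketch of the "Document == Python Dictionary" line above: the record below is a plain dict that round-trips through JSON unchanged. The field names and values (tags, address, the city) are illustrative assumptions, not taken from the deck.

```python
import json

# A hypothetical student record; every field name here is illustrative.
student = {
    "_id": 1,
    "name": "Sachin",
    "score": 110,
    "tags": ["mongodb", "nosql"],     # arrays are allowed as values
    "address": {"city": "Pune"},      # a document can nest another document
}

# Serialize the dict to a JSON object and parse it back.
as_json = json.dumps(student)
round_tripped = json.loads(as_json)
```

The nested dict and the list survive the round trip, which is the sense in which a document behaves like an associative array.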
Open Source
• MongoDB is an open source project
• On GitHub
• Licensed under the AGPL
• Commercial licenses available
• Started & sponsored by 10gen
High Performance
• Written in C++
• Extensive use of memory-mapped files, i.e. read-through, write-through memory caching
• Runs nearly everywhere
• Data serialized as BSON (fast parsing)
• Full support for primary & secondary indexes
• Document model = less work
Horizontally Scalable
Full Featured
• Ad hoc queries
• Real-time aggregation
• Rich query capabilities
• Traditionally consistent
• Geospatial features
• Support for most programming languages
    • JavaScript, Python, Ruby, PHP, Perl, Java, Scala, C#, C, C++
• Flexible schema
MongoDB Installation
• Get the MongoDB distribution for your platform and version from http://www.mongodb.org/downloads
• MongoDB requires a data folder to store its files. The default location for the MongoDB data directory is C:\data\db (Windows) or /data/db (Linux).
• Running MongoDB
    Windows
        C:\mongodb\bin\mongod.exe --dbpath d:\test\data
    Linux
        ./bin/mongod --dbpath /data/mongodb
MongoDB Package Components - 1
• Core Processes
    • mongod
    • mongos
    • mongo
• Binary Import and Export Tools
    • mongodump
    • mongorestore
    • bsondump
    • mongooplog
MongoDB Package Components - 2
• Data Import and Export Tools
    • mongoimport
    • mongoexport
• Diagnostic Tools
    • mongostat
    • mongotop
    • mongosniff
    • mongoperf
• GridFS
    • mongofiles
Mongo Shell
• Embedded JavaScript interpreter (SpiderMonkey / V8)
    • vars / functions / data structs + types
• Global functions and objects
    • ObjectId("...")
    • new Date()
    • Object.bsonsize()
• MongoDB driver exposed
    • db["collection"].find/count/update
    • short-hand for collections
• JSON-like stuff
    • Doesn't require quoted keys
    • Don't copy and paste too much
Terminology
Core MongoDB Operations (CRUD) - 1
• CREATE
    • insert() - the primary method for inserting one or more documents into a MongoDB collection
        db.studs.insert({_id : 1, name : "Sachin", score : 110})
    • save() - performs an insert if the document to save does not contain the _id field
        db.studs.save({name : "Sachin", score : 110})
• READ
    • find() - returns a cursor over the documents that match the query
        db.collection.find( <query>, <projection> )
    • findOne() - selects a single document from a collection and returns that document
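The create and read semantics above can be sketched with a small in-memory model. TinyCollection is a hypothetical stand-in written for illustration only; it is not the MongoDB or PyMongo API, it matches on equality only, and it uses a random hex string where MongoDB would generate an ObjectId.

```python
import uuid

class TinyCollection:
    """In-memory stand-in for a collection (illustrative, not MongoDB)."""

    def __init__(self):
        self.docs = {}                              # maps _id -> document

    def insert(self, doc):
        doc = dict(doc)
        doc.setdefault("_id", uuid.uuid4().hex)     # assumed stand-in for ObjectId
        if doc["_id"] in self.docs:
            raise KeyError("duplicate _id: %r" % (doc["_id"],))
        self.docs[doc["_id"]] = doc
        return doc["_id"]

    def save(self, doc):
        # Without _id this behaves like insert(); with _id it replaces or creates.
        if "_id" not in doc:
            return self.insert(doc)
        self.docs[doc["_id"]] = dict(doc)
        return doc["_id"]

    def find(self, query=None, projection=None):
        # Equality-only matching; yields documents lazily, loosely like a cursor.
        for doc in self.docs.values():
            if all(doc.get(k) == v for k, v in (query or {}).items()):
                if projection:
                    yield {k: doc[k] for k in doc if k == "_id" or k in projection}
                else:
                    yield dict(doc)

    def find_one(self, query=None):
        return next(self.find(query), None)


studs = TinyCollection()
studs.insert({"_id": 1, "name": "Sachin", "score": 110})
new_id = studs.save({"name": "Sachin", "score": 110})   # no _id, so an id is generated
```

After these two calls the model holds two documents named "Sachin": one with the explicit _id 1 and one with a generated _id.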
Core MongoDB Operations (CRUD) - 2
• UPDATE
    • update() - updates a single document by default; with the multi option, update() updates all documents in the collection that match the query criteria
    • Update Operators
        • Fields - $inc, $rename, $set, $unset
        • Array - $addToSet, $pop, $pullAll, $pull, $push
    • save() - performs a special type of update(), depending on the _id field of the specified document: if a document with that _id exists it is replaced, otherwise the document is inserted
    • Examples
        db.bios.update( { _id: 3 }, { $unset: { birth: 1 } }, { multi: true } )
        db.bios.update( { _id: 1 }, { $set: { 'contribs.1': 'ALGOL 58' } } )
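To make the operator semantics concrete, here is a rough pure-Python sketch of what $set, $unset, $inc and $push do to a single document, including dotted paths such as 'contribs.1'. The function apply_update and the sample document are assumptions made for illustration; the real server supports many more operators and edge cases.

```python
def apply_update(doc, update):
    """Apply a small subset of update operators to a dict, in place."""
    for op, changes in update.items():
        for path, value in changes.items():
            *parents, last = path.split(".")
            target = doc
            for key in parents:
                # Numeric path parts index into arrays, e.g. 'contribs.1'.
                target = target[int(key)] if isinstance(target, list) else target[key]
            if isinstance(target, list):
                if op != "$set":
                    raise ValueError("this sketch only supports $set on array elements")
                target[int(last)] = value
            elif op == "$set":
                target[last] = value
            elif op == "$unset":
                target.pop(last, None)              # the value given to $unset is ignored
            elif op == "$inc":
                target[last] = target.get(last, 0) + value
            elif op == "$push":
                target.setdefault(last, []).append(value)
            else:
                raise ValueError("unsupported operator: " + op)
    return doc


# A hypothetical document shaped loosely like the bios examples above.
bio = {"_id": 1, "birth": "unknown", "contribs": ["Fortran", "ALGOL"], "score": 110}
apply_update(bio, {"$set": {"contribs.1": "ALGOL 58"}})
apply_update(bio, {"$unset": {"birth": 1}})
apply_update(bio, {"$inc": {"score": 5}})
apply_update(bio, {"$push": {"contribs": "FP"}})
```

Each call changes only the named fields and leaves the rest of the document untouched, which is the main difference from replacing the whole document with save().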
Core MongoDB Operations (CRUD) - 3
• DELETE
    • remove() - deletes documents from a collection
        db.collection.remove( <query>, <justOne> )
    • Remove all documents
        db.bios.remove({})
    • Remove a single document that matches a condition
        db.bios.remove( { turing: true }, 1 )
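The effect of the justOne flag can be sketched in a few lines of Python. The remove function below is a hypothetical model over a list of dicts with equality-only matching; it is not a driver call.

```python
def remove(docs, query=None, just_one=False):
    """Delete matching dicts from the list `docs`; return how many were removed."""
    query = query or {}
    removed = 0
    for doc in list(docs):                  # iterate over a copy while deleting
        if all(doc.get(k) == v for k, v in query.items()):
            docs.remove(doc)
            removed += 1
            if just_one:
                break                       # stop after the first match
    return removed


# Illustrative data: three documents, two of which match { turing: true }.
bios = [
    {"_id": 1, "turing": True},
    {"_id": 2, "turing": True},
    {"_id": 3, "turing": False},
]
first = remove(bios, {"turing": True}, just_one=True)
remaining_ids = [doc["_id"] for doc in bios]
rest = remove(bios)                         # an empty query matches every document
```

With just_one set, only the first matching document is deleted; the empty query then clears what is left.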
Data Modeling
• Data in MongoDB has a flexible schema.
• Collections do not enforce document structure:
    • documents in the same collection do not need to have the same set of fields or structure, and
    • common fields in a collection's documents may hold different types of data.
• MongoDB does not support
    • Joins - across multiple collections
    • Transactions - across multiple documents
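A small sketch of what "flexible schema" means in practice: the three dicts below could live in the same collection even though their fields and field types differ. The names, the email field and the field_types helper are hypothetical and exist only to illustrate the point.

```python
from collections import defaultdict

# Three hypothetical documents that could share one collection.
people = [
    {"_id": 1, "name": "Sachin", "score": 110},
    {"_id": 2, "name": "Asha", "score": "A+"},                  # same field, different type
    {"_id": 3, "name": "Ravi", "email": "ravi@example.com"},    # different set of fields
]

def field_types(docs):
    """Map each field name to the sorted type names seen for it."""
    seen = defaultdict(set)
    for doc in docs:
        for key, value in doc.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(names) for key, names in seen.items()}

summary = field_types(people)
```

Here summary["score"] lists both int and str, so any code reading the collection has to be prepared for either.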
Data Modeling Considerations
• Inherent properties and requirements of the application objects and the relationships between them
• MongoDB data models must also reflect
    • how data will grow and change over time, and
    • the kinds of queries your application will perform
• These considerations and requirements force you to make a number of multi-factored decisions:
    • normalization and de-normalization
    • indexing strategy
    • representation of data in arrays in BSON
Data Modeling Decisions
Data modeling decisions involve determining how to structure the
documents to model the data effectively.
• Embedding
    • To de-normalize data, store two related pieces of data in a single document.
• Referencing
    • To normalize data, store references between two documents to indicate a relationship between the data represented in each document.
• Atomicity
    • MongoDB only provides atomic operations on the level of a single document.
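The two choices above can be sketched with plain dicts. All names here (user_embedded, author_id, post_with_author and the sample values) are assumptions for illustration; they show the shape of the data, not a prescribed schema.

```python
# Embedding (de-normalized): the address lives inside the user document,
# so one read returns everything and one single-document write changes it.
user_embedded = {
    "_id": "u1",
    "name": "Sachin",
    "address": {"city": "Pune", "zip": "411001"},
}

# Referencing (normalized): the post stores only the author's _id,
# and the application resolves the reference with a second lookup.
users = {"u1": {"_id": "u1", "name": "Sachin"}}
posts = [{"_id": "p1", "title": "Hello", "author_id": "u1"}]

def post_with_author(post, users_by_id):
    """Application-side 'join': follow author_id to the user document."""
    resolved = dict(post)
    resolved["author"] = users_by_id[post["author_id"]]
    return resolved

joined = post_with_author(posts[0], users)
```

Because atomicity is per document, data that must change together is a natural candidate for embedding, while data shared by many documents is a candidate for referencing.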
Aggregation
• MongoDB introduced the aggregation framework, which provides a powerful and flexible set of tools for many data aggregation tasks without having to use map-reduce.
• While map-reduce is powerful, it is often more difficult than necessary for many simple aggregation tasks, such as totaling or averaging field values.
    db.collection.mapReduce()
• Pipeline Operators and Indexes
    $match, $sort, $limit, $skip, $project, $unwind, $group
    db.articles.aggregate( [
        { $project : { author : 1, tags : 1 } },
        { $unwind : "$tags" },
        { $group : {
            _id : { tags : "$tags" },
            authors : { $addToSet : "$author" }
        } }
    ] )
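To show what the three stages in the pipeline above do to the data, here is a pure-Python approximation over a list of dicts. The sample articles and the helper names (project, unwind, group_add_to_set) are hypothetical; the sketch sorts each author list for a deterministic result, whereas $addToSet itself does not promise an order.

```python
# Hypothetical input documents for the articles collection.
articles = [
    {"_id": 1, "author": "bob", "tags": ["fun", "good"], "title": "A"},
    {"_id": 2, "author": "ann", "tags": ["fun"], "title": "B"},
]

def project(docs, fields):
    # Keep _id plus the requested fields, like { $project: { author: 1, tags: 1 } }.
    return [{k: d[k] for k in d if k == "_id" or k in fields} for d in docs]

def unwind(docs, field):
    # Emit one copy of the document per array element, like { $unwind: "$tags" }.
    return [{**d, field: item} for d in docs for item in d.get(field, [])]

def group_add_to_set(docs, key, collect):
    # Group by `key` and gather distinct `collect` values, like $group with $addToSet.
    groups = {}
    for d in docs:
        groups.setdefault(d[key], set()).add(d[collect])
    return [{"_id": {key: k}, collect + "s": sorted(v)} for k, v in groups.items()]

stage1 = project(articles, {"author", "tags"})
stage2 = unwind(stage1, "tags")
result = group_add_to_set(stage2, "tags", "author")
```

Each stage consumes the output of the previous one, which is the pipeline idea: the final result holds one document per tag with the set of authors who used it.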
Blog Project with MongoDB
• Blogger with the following functionality
    • Signup
    • New Post
    • Login
    • Logout
• It uses Python, the PyMongo driver, and MongoDB
Questions?
Thank You
