The document discusses MongoDB concepts including:
- MongoDB uses a document-oriented data model with dynamic schemas and supports embedding and linking of related data.
- Replication allows for high availability and data redundancy across multiple nodes.
- Sharding provides horizontal scalability by distributing data across nodes in a cluster.
- MongoDB follows an immediate consistency model, which the talk contrasts with eventual consistency in systems such as CouchDB and Cassandra.
Introduction to MongoDB by Norberto Leite, Senior Solutions Architect.
An overview of topics: Replication, Scalability, Consistency, Durability, Flexibility.
Transition of data needs from initial states to more complex requirements.
Overview of document-oriented database with key features of performance, consistency, and scalability.
Tradeoff between scalability and functionality in various database technologies.
Replication necessity for failover, backups, batch jobs, and high availability.
Functionality of replica sets including data protection, high availability, and automated failover.
Introduction to scalability in systems.
Discussion of horizontal scalability in MongoDB.
Details of sharding for data distribution across nodes, auto-balancing, and query routing.
Caching strategies in sharding for managing data efficiently.
Introduction to concepts of consistency and durability in data handling.
Comparison of eventual and immediate consistency models with examples.
Factors determining data durability in MongoDB and associated strategies.
Discussion on the flexibility of data models in MongoDB.
Benefits of using JSON for data representation and its compatibility with object-oriented languages.
Differences in schema design between relational databases and MongoDB including embedding.
Example of using embedding in MongoDB to structure data.
Advantages of JSON for scalability and distribution without performance penalties.
Resources for further information, support, and community engagement with MongoDB.
Why do we need replication?
• Failover
• Backups
• Secondary batch jobs
• High availability
Thursday, 25 October 12
Replica Sets
Data Availability across nodes
• Data Protection
    • Multiple copies of the data
    • Spread across Data Centers, AZs
• High Availability
    • Automated Failover
    • Automated Recovery
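To make automated failover concrete, here is a minimal sketch in plain JavaScript. It is a toy in-memory model, not MongoDB's actual election protocol: the host names, the `electPrimary` helper, and the priority-based rule are all assumptions made for illustration. The one idea it borrows is that a new primary is chosen only while a majority of members is reachable.

```javascript
// Toy model of a three-member replica set (hypothetical host names).
const members = [
  { host: "node-a:27017", healthy: true, priority: 2 },
  { host: "node-b:27017", healthy: true, priority: 1 },
  { host: "node-c:27017", healthy: true, priority: 1 },
];

// Hypothetical helper: pick the healthy member with the highest priority,
// but only if a strict majority of the set is still healthy.
function electPrimary(set) {
  const healthy = set.filter((m) => m.healthy);
  if (healthy.length <= set.length / 2) return null; // no majority, no primary
  return healthy.reduce((best, m) => (m.priority > best.priority ? m : best));
}

const before = electPrimary(members).host; // node-a is primary
members[0].healthy = false;                // simulate node-a failing
const after = electPrimary(members).host;  // another node takes over
members[1].healthy = false;                // a second failure
const noMajority = electPrimary(members);  // only one of three left

console.log(before, after, noMajority);
```

With one of three members left, the sketch refuses to elect a primary, which is why spreading copies across data centers matters for availability.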
Sharding
Data Distribution across nodes
• Data location transparent to your code
• Data distribution is automatic
• Data re-distribution is automatic
• Aggregate system resources horizontally
• No code changes
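The idea that data location is transparent to application code can be sketched with a toy range-based router in plain JavaScript. Everything here is assumed for illustration: the shard names, the key ranges, and the `routeToShard` helper are hypothetical and do not reflect MongoDB's internal routing code.

```javascript
// Toy routing table: each chunk owns a half-open key range [min, max).
// Shard names and ranges are hypothetical.
const chunks = [
  { min: "", max: "h", shard: "shard0" },
  { min: "h", max: "p", shard: "shard1" },
  { min: "p", max: "\uffff", shard: "shard2" },
];

// Hypothetical router: find the chunk whose range contains the shard key.
function routeToShard(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk ? chunk.shard : null;
}

// The caller only supplies a key; it never names a shard itself.
const targets = ["alice", "norberto", "zoe"].map(routeToShard);
console.log(targets);
```

Because callers go through the router, chunks could be moved between shards by editing the table alone, which is the sense in which re-distribution needs no code changes.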
Two choices for consistency
• Eventual consistency
    • Allow updates when a system has been partitioned
    • Resolve conflicts later
    • Example: CouchDB, Cassandra
• Immediate consistency
    • Limit the application of updates to a single master node for a given slice of data
    • Another node can take over after a failure is detected
    • Avoids the possibility of conflicts
    • Example: MongoDB
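The difference between the two choices can be illustrated with a small toy model in plain JavaScript. This is only a sketch: the last-write-wins rule and the `write` helper are assumptions chosen for brevity, not the conflict-resolution or write path of any of the databases named above.

```javascript
// Eventual consistency (toy): two replicas accept writes during a partition
// and reconcile afterwards with an assumed last-write-wins policy.
const replicaA = { value: "red", ts: 1 };
const replicaB = { value: "blue", ts: 2 };
const resolved = replicaA.ts > replicaB.ts ? replicaA : replicaB;

// Immediate consistency (toy): only the single master accepts writes,
// so two conflicting versions never exist in the first place.
function write(node, isPrimary, value) {
  if (!isPrimary) throw new Error("not primary: write rejected");
  node.value = value;
}

const primary = { value: null };
write(primary, true, "red");

let rejected = false;
try {
  write({ value: null }, false, "blue"); // a non-master node refuses the write
} catch (e) {
  rejected = true;
}

console.log(resolved.value, primary.value, rejected);
```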
Durability
• For how long is my data available?
• When do I know that my data is safe?
• Where?
• MongoDB style
    • Fire and Forget
    • Get Last Error
    • Journal Sync
    • Replica Safe
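In current MongoDB drivers these four levels correspond roughly to write concern settings built from the `w` and `j` options. The mapping below is an interpretation of the slide's labels rather than something the slides state, and the `waitsBeyondPrimaryMemory` helper is hypothetical; treat it as a sketch.

```javascript
// Assumed mapping from the slide's labels to write concern options.
const writeConcerns = {
  fireAndForget: { w: 0 },          // do not wait for any acknowledgement
  getLastError: { w: 1 },           // wait for the primary to acknowledge
  journalSync: { w: 1, j: true },   // also wait for the journal write on disk
  replicaSafe: { w: 2 },            // wait for the primary plus one secondary
};

// Hypothetical helper: does the write wait for more than the primary's memory?
function waitsBeyondPrimaryMemory(wc) {
  return (
    wc.j === true ||
    wc.w === "majority" ||
    (typeof wc.w === "number" && wc.w >= 2)
  );
}

for (const [label, wc] of Object.entries(writeConcerns)) {
  console.log(label, JSON.stringify(wc), waitsBeyondPrimaryMemory(wc));
}
```

Each step down the list trades write latency for a stronger answer to "when do I know that my data is safe?". Check your driver's documentation for how a write concern is passed on individual operations.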
Data Model
• Why JSON?
    • Provides a simple, well-understood encapsulation of data
    • Maps simply to the objects in your OO language
    • Linking & Embedding to describe relationships
JSON
place1 = {
    name : "10gen HQ",
    address : "578 Broadway 7th Floor",
    city : "New York",
    zip : "10011",
    tags : [ "business", "tech" ]
}
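As a small illustration of how such a document maps onto a language-level object, the sketch below round-trips the same data through JSON text using only the standard `JSON.stringify` and `JSON.parse` functions. No database is involved; the variable names `text` and `copy` are chosen for this example.

```javascript
// The document from the slide, written as a plain JavaScript object.
const place1 = {
  name: "10gen HQ",
  address: "578 Broadway 7th Floor",
  city: "New York",
  zip: "10011",
  tags: ["business", "tech"],
};

// Serialize to JSON text and parse it back into a new object.
const text = JSON.stringify(place1);
const copy = JSON.parse(text);

// Fields and the embedded array are reachable with ordinary property access.
console.log(copy.city);
console.log(copy.tags.length);
```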
Schema Design
Relational Database
Schema Design
MongoDB: embedding and linking
Schemas in MongoDB
Design documents that simply map to your application
post = {author: "Hergé",
        date: new Date(),
        text: "Destination Moon",
        tags: ["comic", "adventure"]}
> db.posts.save(post)
JSON & Scaleout
• Embedding removes the need for
    • Distributed joins
    • Two-phase commit
• Enables data to be distributed across many nodes without penalty
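Why embedding avoids distributed joins can be shown with plain JavaScript objects standing in for documents. The comment data, the `postId` field, and the two in-memory arrays are hypothetical; this is a sketch of the two designs, not driver code.

```javascript
// Embedded design: comments live inside the post document.
const embeddedPost = {
  author: "Hergé",
  text: "Destination Moon",
  comments: [
    { by: "reader1", text: "A classic" }, // hypothetical data
    { by: "reader2", text: "Great art" },
  ],
};

// Linked design: comments are separate documents that reference the post.
const posts = [{ _id: 1, author: "Hergé", text: "Destination Moon" }];
const comments = [
  { postId: 1, by: "reader1", text: "A classic" },
  { postId: 1, by: "reader2", text: "Great art" },
];

// Embedded: one lookup returns the post together with its comments.
const fromEmbedded = embeddedPost.comments.map((c) => c.by);

// Linked: a second lookup, an application-side join, is needed.
const post = posts.find((p) => p._id === 1);
const fromLinked = comments
  .filter((c) => c.postId === post._id)
  .map((c) => c.by);

console.log(fromEmbedded, fromLinked);
```

Both designs return the same commenters, but in the embedded one everything sits in a single document, so it can be stored on one node and read without coordinating across nodes.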