Migrating from RDBMS to MongoDB | PPTX
RDBMS to MongoDB
Migration Best Practices
Mrinal Sarkar
Solutions Architect
mrinal.sarkar@mongodb.com
What We’ll Cover
• Relational Challenges
• Migration Roadmap
• Schema Design
• Application Integration
• Data Migration
• Operational Considerations
• Resources to Get Started
Relational Challenges
Relational
Expressive Query Language & Secondary Indexes
Strong Consistency
Enterprise Management & Integrations
Relational Database Challenges
• Data Types
– Unstructured data
– Semi-structured data
– Polymorphic data
• Agile Development
– Iterative
– Short development cycles
– New workloads
• Volume of Data
– Terabytes to petabytes of data
– Billions of records
– Thousands of queries per second
• New Architectures
– Horizontal scaling
– Commodity servers
– Cloud computing
The World Has Changed
• Data and risk: up
• Time and cost: down
NoSQL
Scalability & Performance
Always On, Global Deployments
Flexibility
Expressive Query Language & Secondary Indexes
Strong Consistency
Enterprise Management & Integrations
Nexus Architecture
Scalability & Performance
Always On, Global Deployments
Flexibility
Expressive Query Language & Secondary Indexes
Strong Consistency
Enterprise Management & Integrations
Migration Steps
Migration Roadmap
• Backed by Free, Online MongoDB Training
• Paid Consulting, Services and Support available
Schema Design
Definitions
RDBMS      MongoDB
Database   Database
Table      Collection
Row        Document
Index      Index
JOIN       Embedded document, document references, or $lookup to combine data from different collections
SQL to Aggregation Mapping
Mapping Chart:
http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/
Mapping MongoDB Query Language to SQL
Mapping Chart:
http://docs.mongodb.org/manual/reference/sql-comparison/
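A few rows of that mapping can be sketched in Python. The filter documents below follow the shapes shown in the mapping chart; the `people` sample data and the `matches` and `find` helpers are hypothetical, and the matcher is a toy that covers only equality, range operators and `$in`. It is not how MongoDB evaluates queries.

```python
import operator

# SQL statement -> equivalent MongoDB filter document.
SQL_TO_FILTER = {
    "SELECT * FROM people WHERE status = 'A'": {"status": "A"},
    "SELECT * FROM people WHERE age > 25": {"age": {"$gt": 25}},
    "SELECT * FROM people WHERE age > 25 AND age <= 50": {"age": {"$gt": 25, "$lte": 50}},
    "SELECT * FROM people WHERE status IN ('A', 'B')": {"status": {"$in": ["A", "B"]}},
}

_COMPARATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(doc, flt):
    """Toy matcher: True if doc satisfies every condition in flt."""
    for field, cond in flt.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(cond, dict):
            # Operator document such as {"$gt": 25, "$lte": 50}.
            for op, operand in cond.items():
                if op == "$in":
                    if value not in operand:
                        return False
                elif not _COMPARATORS[op](value, operand):
                    return False
        elif value != cond:
            # Plain value means equality.
            return False
    return True


# Hypothetical sample documents.
people = [
    {"name": "Ann", "age": 31, "status": "A"},
    {"name": "Bob", "age": 22, "status": "B"},
    {"name": "Cy", "age": 57, "status": "C"},
]


def find(docs, flt):
    """Return the names of the documents that match the filter."""
    return [d["name"] for d in docs if matches(d, flt)]
```

With a real driver, the same filter dictionaries would be passed to the collection's find call instead of the toy `find` above.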
Modeling Relationships: Embedding and Referencing
• Embedding
– For 1:1 or 1:Many (where “many” viewed with the parent)
– Ownership and containment
– Document limit of 16MB, consider document growth
– Atomicity of updates
• Referencing
– _id field is referenced in the related document
– Application runs 2nd query to retrieve the data
– Data duplication vs performance gain
– Object referenced by many different sources
– Models complex Many : Many & hierarchical structures
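The two modeling options can be sketched with plain Python dictionaries. The row layouts, the `owner_id` field name and the helper functions are assumptions made for illustration; they are not taken from the deck.

```python
# Relational-style rows (hypothetical sample data).
person_rows = [
    {"person_id": 1, "first_name": "Paul", "surname": "Miller", "city": "London"},
]
car_rows = [
    {"car_id": 10, "person_id": 1, "model": "Bentley", "year": 1973},
    {"car_id": 11, "person_id": 1, "model": "Rolls Royce", "year": 1965},
]


def embed(person, cars):
    """Embedding: child rows become an array of sub-documents in the parent."""
    doc = {"_id": person["person_id"]}
    doc.update({k: v for k, v in person.items() if k != "person_id"})
    doc["cars"] = [
        {"model": c["model"], "year": c["year"]}
        for c in cars
        if c["person_id"] == person["person_id"]
    ]
    return doc


def reference(person, cars):
    """Referencing: children stay separate and carry the parent's _id."""
    person_doc = {"_id": person["person_id"]}
    person_doc.update({k: v for k, v in person.items() if k != "person_id"})
    car_docs = [
        {"_id": c["car_id"], "owner_id": c["person_id"],
         "model": c["model"], "year": c["year"]}
        for c in cars
        if c["person_id"] == person["person_id"]
    ]
    return person_doc, car_docs


embedded = embed(person_rows[0], car_rows)
person_doc, car_docs = reference(person_rows[0], car_rows)

# Embedded: one read returns the person together with the cars.
models_embedded = [c["model"] for c in embedded["cars"]]

# Referenced: a second lookup by owner_id is needed to fetch the cars.
models_referenced = [c["model"] for c in car_docs if c["owner_id"] == person_doc["_id"]]
```

Both paths return the same car models; the difference is whether the application needs one read or two, and whether the cars can be updated atomically together with their owner.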
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Data Models: Relational to Document
Relational MongoDB
Referencing Documents
RDBMS
Document Model Benefits
MongoDB
{
_id : ObjectId("4c4ba5e5e8aabf3"),
employee_name: "Dunham, Justin",
department : "Marketing",
title : "Product Manager, Web",
report_up: "Neray, Graham",
pay_band: "C",
benefits : [
{ type : "Health",
plan : "PPO Plus" },
{ type : "Dental",
plan : "Standard" }
]
}
Anatomy of a BSON Document
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: '+447557505611',
  city: 'London',
  location: [45.123, 47.232],
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Callouts:
• Fields
• Typed field values
• Fields can contain arrays
• Fields can contain an array of sub-documents
Document Model Benefits
Agility and flexibility
• Data model supports business change
• Rapidly iterate to meet new requirements
Intuitive, natural data representation
• Eliminates ORM layer
• Developers are more productive
Reduces the need for joins, disk seeks
• Programming is simpler
{
  _id: ObjectId("4c4ba5e5e8aabf3"),
  employee_name: "Dunham, Justin",
  department: "Marketing",
  title: "Product Manager, Web",
  report_up: "Neray, Graham",
  pay_band: "C",
  benefits: [
    { type: "Health",
      plan: "PPO Plus" },
    { type: "Dental",
      plan: "Standard" }
  ]
}
MongoDB is Fully Featured
Indexing in MongoDB
• MongoDB indexing will be familiar to DBAs
– B-Tree Indexes, Secondary Indexes
• Single biggest tunable performance factor
– Define indexes by identifying common queries
– Use MongoDB explain to ensure index coverage
– MongoDB profiler logs all slow queries
Index types:
• Compound
• Unique
• Array
• TTL
• Geospatial
• Hashed
• Sparse
• Partial (new in version 3.2)
• Text Search
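Each index type in the list can be expressed as an index specification for the `createIndexes` database command. The sketch below only builds the command document in Python; the collection name `people`, the field names and the index names are assumptions, and nothing is sent to a server.

```python
def create_indexes_command(collection, indexes):
    """Build a createIndexes command document for the given index specs."""
    return {"createIndexes": collection, "indexes": indexes}


# One hypothetical specification per index type named on the slide.
index_specs = [
    # Compound: key order matters, and Python dicts preserve insertion order.
    {"key": {"surname": 1, "first_name": 1}, "name": "surname_first_name"},
    # Unique: rejects a second document with the same email.
    {"key": {"email": 1}, "name": "email_unique", "unique": True},
    # Array (multikey): indexing an array field creates one entry per element.
    {"key": {"Profession": 1}, "name": "profession_multikey"},
    # TTL: documents expire this many seconds after the indexed date value.
    {"key": {"created_at": 1}, "name": "created_at_ttl", "expireAfterSeconds": 3600},
    # Geospatial.
    {"key": {"location": "2dsphere"}, "name": "location_geo"},
    # Hashed: often used as a hashed shard key.
    {"key": {"customer_id": "hashed"}, "name": "customer_id_hashed"},
    # Sparse: only documents that contain the field are indexed.
    {"key": {"cell": 1}, "name": "cell_sparse", "sparse": True},
    # Partial: only documents matching the filter expression are indexed.
    {"key": {"city": 1}, "name": "city_partial",
     "partialFilterExpression": {"year_of_birth": {"$gt": 1980}}},
    # Text search.
    {"key": {"bio": "text"}, "name": "bio_text"},
]

command = create_indexes_command("people", index_specs)
```

In practice a driver helper would usually issue this command; writing the document out makes the per-type options (`unique`, `expireAfterSeconds`, `sparse`, `partialFilterExpression`) easy to compare side by side.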
Further Reading
http://docs.mongodb.org/manual/data-modeling/
Application Integration
Drivers & Ecosystem
Morphia
MEAN Stack
Python, Perl, Ruby
Support for the most popular languages and frameworks
MongoDB Aggregation Framework
• Ad-hoc reporting, grouping and aggregations, without the complexity of MapReduce
– Max, Min, Averages, Sum, Union, Redact, GeoNear
• Similar functionality to SQL GROUP BY
• Processes a stream of documents
• Series of operators
– Filter or transform data
– Input/output chain
• Supports single servers & shards
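As an illustration of the operator chain, the pipeline below is the aggregation counterpart of `SELECT cust_id, SUM(price) AS total FROM orders WHERE status = 'A' GROUP BY cust_id ORDER BY total DESC`. The `orders` data and the `run_equivalent` helper are hypothetical; the helper reproduces in plain Python what the three stages compute so the result can be inspected without a server.

```python
from collections import defaultdict

# Each stage consumes the output of the previous one.
pipeline = [
    {"$match": {"status": "A"}},                                  # WHERE
    {"$group": {"_id": "$cust_id", "total": {"$sum": "$price"}}},  # GROUP BY + SUM
    {"$sort": {"total": -1}},                                     # ORDER BY ... DESC
]

# Hypothetical sample documents.
orders = [
    {"cust_id": "a", "status": "A", "price": 25},
    {"cust_id": "a", "status": "A", "price": 10},
    {"cust_id": "b", "status": "A", "price": 50},
    {"cust_id": "b", "status": "D", "price": 99},
]


def run_equivalent(docs):
    """Plain-Python equivalent of the pipeline above, for illustration only."""
    totals = defaultdict(int)
    for doc in docs:
        if doc["status"] == "A":                      # $match
            totals[doc["cust_id"]] += doc["price"]    # $group with $sum
    grouped = [{"_id": key, "total": value} for key, value in totals.items()]
    return sorted(grouped, key=lambda g: g["total"], reverse=True)  # $sort


result = run_equivalent(orders)
```

With a driver, the `pipeline` list would be passed to the collection's aggregate call and the server would do the filtering, grouping and sorting.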
High Availability: Replica Sets
Replica Set – 2 to 50 copies
Addresses availability considerations:
High Availability
Disaster Recovery
Maintenance
Workload Isolation: operational & analytics
Scalability via Sharding
Multiple query optimization models
Each sharding option appropriate for different apps
Elastic and self-balancing
Shard Key Selection:
http://docs.mongodb.org/manual/tutorial/choose-a-shard-key/
https://docs.mongodb.org/ecosystem/tools/hadoop/
MongoDB Connector for BI
Visualize and explore multi-dimensional
documents using SQL-based BI tools.
The connector does the following:
• Provides the BI tool with the schema of the
MongoDB collection to be visualized
• Translates SQL statements issued by the BI tool
into equivalent MongoDB queries that are sent to
MongoDB for processing
• Converts the results into the tabular format
expected by the BI tool, which can then visualize
the data based on user requirements
Data Integrity
Data Governance with Document Validation
Implement data governance without
sacrificing agility that comes from
dynamic schema
• Enforce data quality across multiple teams
and applications
• Use familiar MongoDB expressions to
control document structure
• Validation is optional and can be as simple
as a single field, all the way to every field,
including existence, data types, and
regular expressions
Document Validation Example
The example on the left adds a
rule to the contacts collection that
validates:
• The year of birth is no later than
1994
• The document contains a phone
number and / or an email address
• When present, the phone number
and email addresses are strings
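The slide's example image is not preserved in this transcript. A validator consistent with the three rules might look like the sketch below, written as the `collMod` command document in Python. The field names `year_of_birth`, `phone` and `email` are assumptions, and `is_valid` is a simplified plain-Python mirror of the rules rather than MongoDB's own evaluation.

```python
validator = {
    "$and": [
        # Rule 1: year of birth is no later than 1994.
        {"year_of_birth": {"$lte": 1994}},
        # Rule 2: at least one of phone / email is present.
        {"$or": [{"phone": {"$exists": True}}, {"email": {"$exists": True}}]},
        # Rule 3: when present, each of them must be a string.
        {"$or": [{"phone": {"$exists": False}}, {"phone": {"$type": "string"}}]},
        {"$or": [{"email": {"$exists": False}}, {"email": {"$type": "string"}}]},
    ]
}

# Command document that would attach the rule to an existing collection.
coll_mod_command = {"collMod": "contacts", "validator": validator}


def is_valid(doc):
    """Simplified mirror of the three rules, for illustration only."""
    year = doc.get("year_of_birth")
    if not isinstance(year, (int, float)) or year > 1994:
        return False
    if "phone" not in doc and "email" not in doc:
        return False
    return all(isinstance(doc[f], str) for f in ("phone", "email") if f in doc)
```

For example, `is_valid({"year_of_birth": 1990, "email": "a@example.com"})` passes, while a document with a numeric `phone` or with neither contact field is rejected.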
Data Durability: Write Concern & Journal
• Configurable per operation
• Combining write concern levels with journaling allows multiple levels of guarantees
Write Concern describes the level of
acknowledgement requested from
MongoDB for write operations
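Because write concern is configurable per operation, it can be pictured as a small document attached to each write. The sketch below builds `insert` command documents with three commonly used write concerns; the level names, the `orders` collection and the `insert_command` helper are hypothetical.

```python
# Increasing durability guarantees, at increasing latency.
WRITE_CONCERNS = {
    # Acknowledged by the primary only.
    "acknowledged": {"w": 1},
    # Acknowledged by the primary after the write reaches its journal.
    "journaled": {"w": 1, "j": True},
    # Acknowledged by a majority of replica set members, journaled,
    # giving up after 5 seconds instead of blocking indefinitely.
    "majority": {"w": "majority", "j": True, "wtimeout": 5000},
}


def insert_command(collection, documents, level):
    """Build an insert command document carrying the chosen write concern."""
    return {
        "insert": collection,
        "documents": documents,
        "writeConcern": WRITE_CONCERNS[level],
    }


# A low-value event and a critical record can use different guarantees.
click_event = insert_command("orders", [{"sku": "abc", "qty": 1}], "acknowledged")
payment = insert_command("orders", [{"sku": "abc", "paid": True}], "majority")
```

Drivers usually expose the same options as a write concern setting on the client, database, collection or individual operation.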
Migration and Operations
Traditional ETL
[Diagram: Source Database, ETL]
Incremental Migration, Live
[Diagram: Legacy Database, MongoDB Database]
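One possible shape for an incremental migration is a checkpointed copy loop: load everything once, then repeatedly copy only the rows changed since the last run while the legacy system stays live. The sketch below uses an in-memory SQLite database as the legacy source and a plain dictionary as a stand-in for the MongoDB collection. The table names, the `updated_at` column and the helper functions are all assumptions; the deck does not describe its mechanism in this transcript.

```python
import sqlite3


def setup_legacy():
    """Create a tiny legacy schema: customers with a child phone table."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER);
        CREATE TABLE phone (customer_id INTEGER, number TEXT);
        INSERT INTO customer VALUES (1, 'Paul Miller', 100), (2, 'Ann Lee', 100);
        INSERT INTO phone VALUES (1, '+447557505611'), (2, '+15550100');
    """)
    return conn


def migrate_since(conn, checkpoint, target):
    """Copy customers changed after `checkpoint` into `target`; return the new checkpoint."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customer "
        "WHERE updated_at > ? ORDER BY updated_at",
        (checkpoint,),
    ).fetchall()
    for row in rows:
        phones = [
            r["number"]
            for r in conn.execute(
                "SELECT number FROM phone WHERE customer_id = ? ORDER BY number",
                (row["id"],),
            )
        ]
        # Upsert by _id: child rows are embedded as an array in the document.
        target[row["id"]] = {"_id": row["id"], "name": row["name"], "phones": phones}
        checkpoint = max(checkpoint, row["updated_at"])
    return checkpoint


legacy = setup_legacy()
mongo_stand_in = {}  # stand-in for a MongoDB collection keyed by _id

checkpoint = migrate_since(legacy, 0, mongo_stand_in)           # initial load
legacy.execute("UPDATE customer SET name = 'Ann Lee-Smith', updated_at = 200 WHERE id = 2")
checkpoint = migrate_since(legacy, checkpoint, mongo_stand_in)  # incremental catch-up
```

A production version would also need to handle deletes and rows that share a timestamp with the checkpoint, which this sketch ignores.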
Operations
• Configuration, Provisioning, Monitoring and Backup
• High Availability & Disaster Recovery
• Scalability
• Hardware selection
– Commodity Servers: Prioritize RAM, Fast CPUs & SSD
• Security
– Access Control, Authentication, Encryption
Download the Whitepaper
MongoDB Operations Best Practices
Ops Manager & Cloud Manager
Single-click provisioning, scaling &
upgrades, admin tasks
Monitoring, with charts, dashboards and
alerts on 100+ metrics
Backup and restore, with point-in-time
recovery, support for sharded clusters
The Best Way to Manage MongoDB
Up to 95% Reduction in Operational Overhead
MongoDB Compass
For fast schema discovery and
visual construction of ad-hoc
queries
• Visualize schema
– Frequency of fields
– Frequency of types
– Determine validator rules
• View Documents
• Graphically build queries
• Authenticated access
Migration Roadmap
• Backed by Free, Online MongoDB Training
• Paid Consulting, Services and Support available
Getting Started
MongoDB Enablement
Consulting, training, and professional services throughout your project lifecycle
For Operations
For Developers
Design & Development
Pre-Production (Test, QA, Deployment)
Production Expansion
Dedicated Consulting Engineer | Custom Projects
Operations Rapid Start
Production Readiness
MongoDB Private Cloud Accelerator
Health Check
Development Rapid Start
Performance Evaluation and Tuning
For Both
Developer Training
Essentials Training
Administrator Training
Advanced Developer Training
Advanced Administrator Training
Migration in Action
eCommerce Application
• Migration from MS-SQL
• Project completed in 8 months versus the 18 months originally planned
• High availability, performance and reliability at a fraction of the cost
• Lower latency
• Faster dev cycles
Content Management
• Migration from Oracle
• 80% cost reduction with commodity hardware
• 900% performance improvement
• Development cycles in weeks vs. tens of months
Customer Data Mgmt & Analytics
• Multi-RDBMS migration
• 95% faster in identifying matches
• 50% increase in paying subscribers
• 60% increase in unique website visits
Summary
• MongoDB brings the best of both relational and NoSQL data models
• MongoDB is a full-featured database platform
• MongoDB helps you reduce your project time, cost and risks
• Migrating to MongoDB is easier than before with enterprise-level consulting, training and support
Download the Guide
https://www.mongodb.com/collateral/rdbms-mongodb-migration-guide

Editor's Notes

  • #5 A lot of people expect us to come in and bash relational databases or say we don’t think they’re good. That’s simply not true. Relational databases have laid the foundation for what you’d want out of a database, and we absolutely think some of their capabilities remain critical today. Expressive query language & secondary indexes: users should be able to access and manipulate their data in sophisticated ways, and you need a query language that lets you do all that out of the box. Indexes are a critical part of providing efficient access to data. We believe these are table stakes for a database. Strong consistency: strong consistency has become second nature for how we think about building applications, and for good reason. The database should always provide access to the most up-to-date copy of the data. Strong consistency is the right way to design a database. Enterprise management and integrations: finally, databases are just one piece of the puzzle, and they need to fit into the enterprise IT stack. Organizations need a database that can be secured, monitored, automated, and integrated with their existing IT infrastructure and staff, such as operations teams, DBAs, and data analysts.
  • #7 But of course the world has changed a lot since the 1980s, when the relational database first came about. First of all, data and risk are significantly up. In terms of data: 90% of all data ever created was created in the last 2 years; 80% of enterprise data is unstructured, meaning it doesn’t fit into the neat tables of a relational database; and unstructured data is growing at twice the rate of structured data. At the same time, the risks of running a database are higher than ever before. You are now faced with more users (apps have shifted from small internal departmental systems with thousands of users to large external audiences with millions of users), no downtime (it’s no longer the case that apps only need to be available during standard business hours; they must be up 24/7), and users all across the globe (your users are everywhere, and they are always connected). On the other hand, time and costs are way down. There’s less time to build apps than ever before. You’re being asked to ship apps in a few months, not years: development methods have shifted from a waterfall process to an iterative process that ships new functionality in weeks, and in some cases multiple times per day at companies like Facebook and Amazon. And costs are way down too. Companies want to pay for value over time, so they have shifted to open-source business and SaaS models that allow this, and they want to use cloud and commodity resources to reduce the time to provision their infrastructure and to lower their total cost of ownership.
  • #8 Because the relational database was not designed for modern applications, starting about 10 years ago a number of companies began to build their own databases that are fundamentally different. The market calls these NoSQL. NoSQL databases were designed for this new world… Flexibility. All of them have some kind of flexible data model to allow for faster iteration and to accommodate the data we see dominating modern applications. While they all have different approaches, what they have in common is they want to be more flexible. Scalability + Performance. Similarly, they were all built with a focus on scalability, so they all include some form of sharding or partitioning. And they're all designed to deliver great performance. Some are better at reads, some are better at writes, but more or less they all strive to have better performance than a relational database. Always-On Global Deployments. Lastly, NoSQL databases are designed for highly available systems that provide a consistent, high quality experience for users all over the world. They are designed to run on many computers, and they include replication to automatically synchronize the data across servers, racks, and data centers. However, when you take a closer look at these NoSQL systems, it turns out they have thrown out the baby with the bathwater. They have sacrificed the core database capabilities you’ve come to expect and rely on in order to build fully functional apps, like rich querying and secondary indexes, strong consistency, and enterprise management.
  • #9 MongoDB was built to address the way the world has changed while preserving the core database capabilities required to build modern applications. Our vision is to leverage the work that Oracle and others have done over the last 40 years to make relational databases what they are today, and to take the reins from here. We pick up where they left off, incorporating the work that internet pioneers like Google and Amazon did to address the requirements of modern applications. MongoDB is the only database that harnesses the innovations of NoSQL and maintains the foundation of relational databases – and we call this our Nexus Architecture.
  • #22 Rich queries, text search, geospatial, aggregation, mapreduce are types of things you can build based on the richness of the query model.
  • #44 Determine validator rules: You can use the tool to figure out what you want to set as validation rules
  • #47 We offer professional services for all teams throughout your development lifecycle, from design to production.