Sizing MongoDB Clusters
#MDBW17
SIZING MONGODB CLUSTERS
Jay Runkel
Principal Solutions Architect
jay.runkel@mongodb.com
@jayrunkel
AGENDA
• Sizing Objective
• IOPS, Query Processing, Working Set
• Sizing Methodology
• Sizing Example
SIZING OBJECTIVE
SIZING
Do I need to shard?
What size servers should I use?
What will my monthly Atlas/AWS/Azure/Google costs be?
When will I need to add a new shard or upgrade my servers?
How much data can my servers support?
How many queries can my servers support?
Will we be able to meet our query latency requirements?
YOUR BOSS COMES TO YOU…
• Large coffee chain: PlanetDollar
• Collect mobile app performance data
• Every tap, click, gesture will generate an event
• 2 Year History
• Perform analytics
‒ Historical
‒ Near real-time (executive dashboards)
• Support usage
• 3000 – 5000 events per second
“I need a budget for the monthly Atlas costs.”
THE ONLY ACCURATE WAY TO SIZE A CLUSTER
• Build a prototype
• Run performance tests using actual data and queries on hardware with specs
similar to production servers
• EVERY OTHER APPROACH IS A GUESS
• Including the one I am presenting today
SOMETIMES, IT IS NECESSARY TO GUESS
• Early in project, but
‒ Need to order hardware
‒ Estimate costs to determine “Go/No Go” decision
• Schema design
‒ Compare the hardware requirements for different schemas
MONGODB CLUSTERS LOOK LIKE THIS
[Diagram: the application, via the driver, talks to a sharded cluster: three config servers plus replica sets of one primary and two secondaries]
OUR SOLUTION WILL CONSIST OF
• # of shards
• Specifications of each server
‒ CPU
‒ Storage
o Size
o Performance: IOPS
‒ Memory
‒ Network
BACKGROUND
IOPS, Query Processing, Working Set
OUR SOLUTION WILL CONSIST OF
• # of shards
• Specifications of each server
‒ CPU
‒ Storage
o Size
o Performance: IOPS
‒ Memory
‒ Network
IOPS
• IOPS: input/output operations per second
• A measure of storage throughput
• Random access
• Most workloads “randomly” access documents in a collection
STORAGE PERFORMANCE
Type IOPS
7200 rpm SATA ~ 75 – 100
15000 rpm SAS ~ 175 – 210
RAID-10 (24 x 7200 RPM SAS) 2000
Amazon EBS 250 – 500
Amazon EBS Provisioned IOPS 10000 - 20000
SSD 50000
Flash Storage 100K – 400K (or more)
http://en.wikipedia.org/wiki/IOPS
HARDEST PART OF SIZING IS IOPS
• How many IOPS do we need?
• If you want the real answer, run a test
• How to estimate?
PROCESSING A QUERY
1. Select index
2. Load relevant index entries from disk
3. Identify documents using index
4. Retrieve documents from disk
5. Filter documents
6. Return documents
PROCESSING A QUERY (CONT.)
1. Select index
2. Load relevant index entries from disk (IO)
3. Identify documents using index
4. Retrieve documents from disk (IO)
5. Filter documents
6. Return documents
BUT MONGODB HAS A CACHE
[Diagram: the same query steps, with indexes and documents cached in memory; index and collection files live on the file system]
Disk access is only necessary if indexes or documents are not in cache
WORKING SET
[Diagram: the same query steps; memory holds indexes and frequently accessed documents]
Working Set = indexes plus frequently accessed documents
If RAM is larger than the working set, IO is reduced
THIS IS ALL GREAT, BUT HOW DO WE ESTIMATE IOPS?
MONGODB SIMPLIFIED MODEL
Assume:
• Working Set < RAM < Data Size
• Memory contains indexes only
[Diagram: memory holds only indexes; collection and index files live on disk]
FIND QUERIES WITH SIMPLIFIED MODEL
Assume appropriate indexes exist.
To resolve a find:
• Navigate in-memory indexes
• Retrieve each document from disk
1 IO per document returned
INSERTS WITH SIMPLIFIED MODEL
To resolve an insert:
• Write document to disk
• Update each index file
IOPS = 1 + # of indexes
DELETES WITH SIMPLIFIED MODEL
To resolve a delete:
• Navigate in-memory indexes
• Mark document deleted
• Update each index file
IOPS = 1 + # of indexes
UPDATES WITH SIMPLIFIED MODEL
To resolve an update:
• Navigate in-memory indexes
• Mark old document version deleted
• Insert new document version
• Update each index file
IOPS = 2 + # of indexes
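The per-operation costs of the simplified model reduce to a few one-liners. This is an illustrative sketch of the deck's arithmetic, not MongoDB internals; the index count is a parameter:

```python
# Simplified-model IO costs: indexes fit in RAM, so index navigation is
# free; only document reads/writes and index-file updates hit disk.

def find_iops(docs_returned: int) -> int:
    """1 IO per document retrieved from disk."""
    return docs_returned

def insert_iops(num_indexes: int) -> int:
    """1 IO for the document write, plus 1 per index file updated."""
    return 1 + num_indexes

def delete_iops(num_indexes: int) -> int:
    """1 IO to mark the document deleted, plus 1 per index file."""
    return 1 + num_indexes

def update_iops(num_indexes: int) -> int:
    """2 IOs (mark old version deleted, write new version), plus 1 per index."""
    return 2 + num_indexes

print(insert_iops(3))  # 4
print(update_iops(3))  # 5
```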
THE SIMPLIFIED MODEL IS TOO SIMPLISTIC
• Working Set
• Checkpoints
• Document size relative to block size
• Indexed Arrays
• Journal, Log
CHECKPOINTS
• WiredTiger write process:
1. Update document in RAM (cache)
2. Write to journal (disk)
3. Periodically, write dirty documents to disk (checkpoint)
o 60 seconds or 2 GB (whichever comes first)
Example: between checkpoints 1 and 2, writes B, C, A → 3 writes, 3 documents written; between checkpoints 2 and 3, writes A, C, A → 3 writes, only 2 documents written (the two updates to A coalesce in cache).
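The coalescing behavior can be illustrated with a toy sketch; the document IDs mirror the example above:

```python
# Within a checkpoint interval, repeated updates to the same document are
# coalesced in cache, so only distinct dirty documents are written to disk
# when the checkpoint runs.

def checkpoint_writes(writes):
    """Return (application writes, documents written at checkpoint)."""
    return len(writes), len(set(writes))

print(checkpoint_writes(["B", "C", "A"]))  # (3, 3)
print(checkpoint_writes(["A", "C", "A"]))  # (3, 2) -- A coalesces
```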
HOW ARE WE GOING TO GET THERE?
• Estimate total requirements (using simplified model):
‒ RAM
‒ CPU
‒ Disk Space
‒ IOPS
• Adjust based upon working set, checkpoints, etc.
• Design (sharded) cluster that provides these totals
SIZING PROCESS
METHODOLOGY
Application Requirements → [magic happens] → Cluster Sizing
• Number of shards
• Server specs
METHODOLOGY (CONT.)
1. Collection Size
2. Working Set
3. Queries -> IOPS
4. Adjust based upon working set, checkpoints, etc.
5. Using candidate server specs, calculate # of shards
6. Review, iterate, repeat
Build a spreadsheet; multiple iterations may be required.
SIZING SPREADSHEET
1. Assumptions
2. Data Size
3. Working Set
‒ Index Size
‒ Frequently Accessed Documents
4. Queries – IOPS
5. Shard Calculations
COLLECTION ANALYSIS
‒ # of documents
‒ Data size
‒ Index size
‒ WT compression
CALCULATE THE NUMBER OF DOCUMENTS
Application Description → # of Documents in Collection
• “There will be 20M documents in the collection by the end of 2017” → 20,000,000
• “We expect to insert 10K documents per day with a 1 year retention period” → 365 * 10,000 = 3,650,000
• “We have 3000 devices, each producing 1 event per minute, and we need to keep a 90 day history” → 3000 * 60 * 24 * 90 = 388,800,000
CALCULATE THE NUMBER OF DOCUMENTS (CONT.)
• PlanetDollar: 2 year history; each day, 5000 inserts per second for 5 hours and 3000 inserts per second for 19 hours → 2 * 365 * (5000 * 5 * 3600 + 3000 * 19 * 3600) = 215,496,000,000
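The arithmetic above is easy to double-check in a few lines; all figures come from the slides:

```python
# Document-count estimates from the examples above.

# 10K inserts/day with a 1-year retention period
docs_retention = 365 * 10_000                 # 3,650,000

# 3000 devices, 1 event per minute, 90-day history
docs_devices = 3000 * 60 * 24 * 90            # 388,800,000

# PlanetDollar: 2-year history; per day, 5000/s for 5 h and 3000/s for 19 h
docs_planetdollar = 2 * 365 * (5000 * 5 * 3600 + 3000 * 19 * 3600)

print(docs_planetdollar)  # 215496000000 (~215.5 billion documents)
```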
CALCULATE THE DATA SIZE
• Data Size = # of documents * Average document size
• This information is available in db.stats(), Compass, Ops Manager, Cloud
Manager, Atlas, etc.
WHAT IF THERE AREN’T ANY DOCUMENTS?
• Write some code
‒ Programmatically generate a large data set
o 5-10% of expected size
‒ Measure
o Collection size
o Index size
o Compression
DETERMINE COLLECTION AND DATA SIZE
• Use db.collection.stats()
‒ Take data size, index size and extrapolate to production size
‒ Calculate compression ratio
db.collection.stats()
{
  count: 10000,
  size: 70388956,
  avgObjSize: 7038,
  storageSize: 25341952,
  …
  totalIndexSize: 147456
}

Parameter | Formula | Value
# of documents | | 2.5B
avgObjSize | | 7038 bytes
Collection Size | 2.5B * 7038 | 1.76E13 bytes
WT Compression | 25341952 / 70388956 | 0.36
Collection Storage | 2.5B * 7038 * 0.36 | 6.33E12 bytes
Index Size Per Doc | 147456 / 10000 | ~15 bytes
Collection Index Size | 2.5B * 15 / 1024^3 | ~35 GB
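The same extrapolation can be sketched from the db.collection.stats() fields above; the 2.5B target document count is the slide's assumption:

```python
# Extrapolate from a 10,000-document sample to the production collection.
sample = {"count": 10_000, "size": 70_388_956,
          "storageSize": 25_341_952, "totalIndexSize": 147_456}
target_docs = 2_500_000_000  # assumed production size

avg_obj_size = sample["size"] / sample["count"]              # ~7039 bytes
compression = sample["storageSize"] / sample["size"]         # ~0.36
index_per_doc = sample["totalIndexSize"] / sample["count"]   # ~14.7 bytes

collection_size = target_docs * avg_obj_size                 # ~1.76e13 bytes
collection_storage = collection_size * compression           # ~6.33e12 bytes
# ~34 GB with the exact ratio; the slide rounds up to 15 bytes/doc -> 35 GB
index_size_gb = target_docs * index_per_doc / 1024**3

print(round(compression, 2))  # 0.36
```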
SIZING SPREADSHEET
1. Assumptions
2. Data Size
3. Working Set
‒ Index Size
‒ Frequently Accessed Documents
4. Queries – IOPS
5. Shard Calculations
WORKING SET
• Working Set = indexes plus the set of documents accessed frequently
‒ We know the index size from the previous analysis
• Estimate the working set: given the queries, what are the frequently accessed documents?
PLANETDOLLAR WORKING SET
Query Analysis
• Dashboards look at the last minute of data
• Customer support debugging tools inspect the last hour’s worth of data
• Reports (run once per day) inspect the last year’s worth of data
Active Documents = 1 hour’s worth of data
5000 * 3600 * 1 KB = 18M KB ≈ 17 GB
Run reports on secondaries
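In code, the hot-document estimate, assuming ~1 KB per event document as above:

```python
# One hour of hot data at 5000 inserts/sec, ~1 KB per document (assumed).
inserts_per_sec = 5000
doc_size_kb = 1

hot_kb = inserts_per_sec * 3600 * doc_size_kb   # 18,000,000 KB
hot_gb = hot_kb / 1024**2                       # ~17 GB

print(round(hot_gb, 1))  # 17.2
```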
SIZING SPREADSHEET
1. Assumptions
2. Data Size
3. Working Set
‒ Index Size
‒ Frequently Accessed Documents
4. Queries – IOPS
5. Shard Calculations
IOPS CALCULATION
+ # of documents returned per second
+ # of documents updated per second (x2)
+ # of indexes impacted by each update
+ # of inserts per second
+ # of indexes impacted by each insert
+ # of deletes per second
+ # of indexes impacted by each delete
- multiple updates occurring within a checkpoint
- % of find query results already in cache
= Total IOPS
PLANETDOLLAR QUERIES
• 5000 inserts per second
• 5000 deletes per second
Dashboards (aggregations: 100 per minute)
• Total events per minute across all users (current minute)
• Total events per minute per region (current minute)
• Total events per store per minute (current minute)
Debugging Tool (ad hoc – 5 per second)
• Find all events for a user in last 60 minutes (100 events returned, on average)
Analytics (reports generated once per day)
• For all stores and regions, count events per day, year over year (last 2 years)
• For all stores and regions, events per day for the last 365 days
IOPS FOR INSERTS AND DELETES
• Each insert: update the collection and each of the 3 indexes → 4 IOPS
• Each delete: update the collection and each of the 3 indexes → 4 IOPS
• 5000 inserts/sec and 5000 deletes/sec:
(4 * 5000) + (4 * 5000) = 40,000 IOPS
IOPS FOR PLANETDOLLAR AGGREGATIONS
• Example: total events per minute across all users (current minute)
• How many documents will be read from disk?
‒ 5000 per second * 60 seconds = 300,000
• Most of this data will be in cache, but some IOPS will likely be required
IOPS FOR FIND
• Find all events for a user in last 60 minutes
‒ 5 per second
‒ 100 documents per query
• # IOPS = 5 * 100 = 500 IOPS
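Putting the PlanetDollar pieces together; per the earlier slide, the aggregation reads are assumed to be mostly cache hits and are omitted here:

```python
# PlanetDollar IOPS estimate from the preceding slides:
# inserts and deletes cost 1 + 3 IOs each (collection + 3 indexes),
# and the debugging tool runs 5 finds/sec returning ~100 documents each.

insert_delete_iops = (1 + 3) * 5000 + (1 + 3) * 5000   # 40,000
find_iops = 5 * 100                                    # 500

print(insert_delete_iops + find_iops)  # 40500
```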
HOW MANY CPUS DO I NEED?
• CPU utilized for:
‒ Compress/decompress
‒ Encrypt/Decrypt
‒ Aggregation queries
‒ General query processing
• In most cases, RAM requirements → large servers → many cores
• Possible exception: aggregation queries
‒ One core per query
‒ # cores >> # of simultaneous aggregation queries
SIZING SPREADSHEET
1. Assumptions
2. Data Size
3. Working Set
‒ Index Size
‒ Frequently Accessed Documents
4. Queries – IOPS
5. Shard Calculations
SHARD CALCULATIONS
• At this point you have:
1. Required storage capacity
2. Working Set Size
3. IOPS Estimate
4. Some idea about class of server (or VM) the customer plans to deploy
• Determine number of required shards
DISK SPACE: HOW MANY SHARDS DO I NEED?
• The sum of disk space across shards must be greater than the required storage size
Example:
• Data size = 9 TB
• WiredTiger compression ratio: 0.33
• Storage size = 3 TB
• Server disk capacity = 2 TB
→ 2 shards required
Recommendation: provision 2x the compressed data size in disk.
RAM: HOW MANY SHARDS DO I NEED?
Example
Working Set = 428 GB
Server RAM = 128 GB
428/128 = 3.34
4 Shards Required
IOPS: HOW MANY SHARDS DO I NEED?
Example
Require: 50K IOPS
AWS Instance: 20K IOPS
3 Shards Required
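Each of the three shard calculations is a ceiling division; provision for the largest. The numbers are the slide examples:

```python
# Shard count per resource: ceiling of required / per-server capacity.
import math

def shards_needed(required, per_server):
    return math.ceil(required / per_server)

disk = shards_needed(3, 2)            # 3 TB storage / 2 TB per server -> 2
ram = shards_needed(428, 128)         # 428 GB working set / 128 GB RAM -> 4
iops = shards_needed(50_000, 20_000)  # 50K IOPS / 20K per instance    -> 3

print(max(disk, ram, iops))  # 4 shards satisfy all three constraints
```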
PLANETDOLLAR
EXAMPLE
https://github.com/jayrunkel/mdbw2017Sizing
ASSUMPTIONS
COLLECTION SIZE
WORKING SET
QUERIES/IOPS
SHARD CALCULATIONS
[Spreadsheet walkthrough slides; see the GitHub repository above]
SIZING SUMMARY
1. Calculate:
‒ Collection size
‒ Index size
2. Estimate Working Set
3. Use simplified model to estimate IOPS
4. Revise (working set coverage, checkpoints, etc.)
5. Calculate shards
Editor's Notes

  • #9 Sharding is transparent to applications; whether there is one or one hundred shards, the application code for querying MongoDB is the same. Applications issue queries to a query router that dispatches the query to the appropriate shards. For key-value queries that are based on the shard key, the query router will dispatch the query to the shard that manages the document with the requested key. When using range-based sharding, queries that specify ranges on the shard key are only dispatched to shards that contain documents with values within the range. For queries that don’t use the shard key, the query router will dispatch the query to all shards and aggregate and sort the results as appropriate. Multiple query routers can be used with a MongoDB system, and the appropriate number is determined based on performance and availability requirements of the application.
  • #32 Do we really need all indexes in RAM? How big is the set of frequently accessed documents? How often will documents be in RAM?