MongoDB v3.0 Deep Dive
{ Name: ‘Bryan Reinero’,
Title: ‘Developer Advocate’,
Twitter: ‘@blimpyacht’,
Email: ‘bryan@mongdb.com’ }
Agenda
• Storage Engine API
• MMAPv1
• WiredTiger
• Document Level Concurrency
• Index Improvements
• The Future
Storage Engine API
• Allows different storage engines to be plugged in
– Different workloads require different performance characteristics
– mmapv1 is not ideal for all workloads
– More flexibility
• Can mix storage engines in the same replica set or sharded cluster
• Opportunity to integrate further (HDFS, native encryption, hardware-optimized …)
Storage Engine API
• StorageEngine – Top-level class for creating a storage engine
• RecoveryUnit – Durability interface. Ensures data is persisted. On-disk information is mutated through this interface
• DatabaseCatalogEntry – MongoDB logical database
• CollectionCatalogEntry – MongoDB collection
• SortedDataInterface – Index implementation. Not all indexes are B-trees
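To make the division of responsibilities concrete, here is a minimal Python sketch of an engine plugged in behind a small interface. It is an illustration only: the real API is a set of C++ classes, and the method names used here (`new_recovery_unit`, `insert`, `find`, `commit`) as well as the `DictEngine` toy engine are hypothetical, chosen for this example.

```python
from abc import ABC, abstractmethod


class RecoveryUnit(ABC):
    """Hypothetical durability hook: writes become visible on commit()."""

    @abstractmethod
    def commit(self) -> None: ...


class StorageEngine(ABC):
    """Hypothetical top-level interface the server programs against."""

    @abstractmethod
    def new_recovery_unit(self) -> RecoveryUnit: ...

    @abstractmethod
    def insert(self, collection, key, doc, ru) -> None: ...

    @abstractmethod
    def find(self, collection, key): ...


class _DictRecoveryUnit(RecoveryUnit):
    def __init__(self, engine):
        self._engine = engine
        self.pending = []  # writes buffered until commit()

    def commit(self) -> None:
        for collection, key, doc in self.pending:
            self._engine._data.setdefault(collection, {})[key] = doc
        self.pending.clear()


class DictEngine(StorageEngine):
    """Toy in-memory engine; any class honoring the interface could replace it."""

    def __init__(self):
        self._data = {}

    def new_recovery_unit(self):
        return _DictRecoveryUnit(self)

    def insert(self, collection, key, doc, ru):
        ru.pending.append((collection, key, doc))

    def find(self, collection, key):
        return self._data.get(collection, {}).get(key)


def store_document(engine, collection, key, doc):
    """Caller code depends only on the abstract interface."""
    ru = engine.new_recovery_unit()
    engine.insert(collection, key, doc, ru)
    ru.commit()


engine = DictEngine()
store_document(engine, "talks", 6, {"_id": 6, "title": "deep dive"})
print(engine.find("talks", 6))
```

Because `store_document` never names a concrete engine, swapping `DictEngine` for another implementation requires no change on the caller's side, which is the point of the plug-in design.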
MongoDB Storage Engines
• MongoDB 2.6 and earlier
– A single storage mechanism based on memory-mapped files
– The "mmapv1" storage engine
• MongoDB 3.0 has a few more options
– mmapv1 (default)
– wiredTiger
– in_memory (experimental only)
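For readers unfamiliar with memory-mapped files, the sketch below shows the general technique using Python's standard `mmap` module. It is not MongoDB code; the file name `example.0` and the 4096-byte size are arbitrary choices for the illustration.

```python
import mmap
import os
import tempfile

# Create a small file to stand in for a database data file (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "example.0")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

with open(path, "r+b") as f:
    # Map the file into memory: reads and writes become plain memory accesses.
    with mmap.mmap(f.fileno(), 4096) as mm:
        mm[0:5] = b"hello"  # in-place update through the mapping
        mm.flush()          # ask the OS to write the dirty pages to disk

# Read the file back through the ordinary file API.
with open(path, "rb") as f:
    on_disk = f.read(5)
print(on_disk)
```

The operating system decides which pages stay in RAM, which is why an engine built this way relies on the file system cache instead of managing its own.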
MMAPv1
[Image: https://angrytechnician.files.wordpress.com/2009/05/memory.jpg]
What is WiredTiger?
• Storage engine company founded by BerkeleyDB alums
• Recently acquired by MongoDB
• Available as a storage engine option in MongoDB 3.0
Why is WiredTiger Awesome?
• Document-level concurrency
• On-disk compression
• Consistency without journaling
• Better performance on many workloads
– Especially write-heavy ones
Improving Concurrency
• 2.0 and earlier – Global lock
• 2.2 to 2.6 – Database-level locking
• 3.0 MMAPv1 – Collection-level locking
• 3.0 WiredTiger – Document-level concurrency
– Writes no longer block all other writes
– Higher concurrency leads to more CPU usage
Lock Free Algorithms
Example data (key: value): 1: 4, 2: 2, 3: 5, 4: 7, 5: 4, 6: 9, 7: 3, 8: 6, 9: 2, 10: 1, 11: 1, 12: 5, 13: 4, 14: 5, 15: 5
http://ses.library.usyd.edu.au/bitstream/2123/5353/1/michael-cahill-2009-thesis.pdf
• Concurrent reads of key 8 all return 6: Read0(8) = 6, Read1(8) = 6, Read2(8) = 6
• Two concurrent writers: write3(8, $inc) and write4(8, $inc)
– Both read key 8 and see the value 6
– write4 computes 7; its Compare & Swap succeeds, so key 8 now holds 7
– write3 also computes 7 from its stale read of 6; its compare fails (Compare & !swap) because key 8 no longer holds 6
– write3 must Re-read & Retry: it reads 7 and computes 8
Document Level Concurrency Control
• On the retry, write3's Compare & Swap succeeds: key 8 now holds 8
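The read, compute, compare-and-swap, retry sequence from the Lock Free Algorithms slides can be sketched in a few lines of Python. This is a simulation under stated assumptions, not WiredTiger's implementation: Python has no compare-and-swap instruction, so the `Cell` class (a name invented for this example) emulates the atomic step with a small internal lock that stands in for the hardware primitive.

```python
import threading


class Cell:
    """One value with an emulated atomic compare-and-swap."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()  # stands in for the hardware atomic

    def read(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Install `new` only if the cell still holds `expected`."""
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def increment(cell):
    """Optimistic $inc: read, compute, CAS, and retry on conflict."""
    attempts = 0
    while True:
        attempts += 1
        seen = cell.read()
        if cell.compare_and_swap(seen, seen + 1):
            return attempts


# Replay the slide sequence by hand for key 8.
key8 = Cell(6)
seen3 = key8.read()                             # write3 reads 6
seen4 = key8.read()                             # write4 reads 6
ok4 = key8.compare_and_swap(seen4, seen4 + 1)   # write4 wins: 6 becomes 7
ok3 = key8.compare_and_swap(seen3, seen3 + 1)   # write3 fails: cell holds 7, not 6
attempts3 = increment(key8)                     # write3 re-reads 7 and installs 8
print(ok4, ok3, key8.read())
```

No writer ever waits on another writer's lock for the duration of an update; a writer that loses the race simply redoes its small computation, which is where the extra CPU usage under high concurrency comes from.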
Read More
http://ses.library.usyd.edu.au/bitstream/2123/5353/1/michael-cahill-2009-thesis.pdf
WiredTiger Concurrency
• Fine grained
• Lock free
• Wait free
• Stone cold
• Superfly
Compression
• Data is compressed on disk
• 2 supported algorithms
– snappy: default. Good compression, relatively low overhead
– zlib: better compression, higher CPU overhead
• Indexes are compressed using prefix compression
– Indexes stay compressed in memory as well
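The two ideas on this slide, block compression and prefix compression, can be illustrated with a short Python sketch. It shows the general techniques only, not WiredTiger's on-disk format: the `prefix_compress` and `prefix_decompress` helpers and the sample keys are invented for the example, and only zlib is demonstrated because snappy is not part of Python's standard library.

```python
import zlib


def prefix_compress(sorted_keys):
    """Store each key as (shared_prefix_len, suffix) relative to the previous key."""
    out, prev = [], ""
    for key in sorted_keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        out.append((shared, key[shared:]))
        prev = key
    return out


def prefix_decompress(entries):
    """Rebuild the full keys from (shared_prefix_len, suffix) pairs."""
    keys, prev = [], ""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


# Sorted index keys share long prefixes, so only the differing tail is stored.
keys = ["user:1000", "user:1001", "user:1002", "user:2000"]
packed = prefix_compress(keys)
print(packed)

# Block compression of repetitive document data with zlib.
block = b'{"categories": ["database", "distributed"]}' * 50
compressed = zlib.compress(block)
print(len(block), len(compressed))
```

Prefix-compressed keys can be compared and walked without inflating a whole block, which is one reason this form of compression can be kept in memory while block compression applies to data on disk.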
Tuning WiredTiger
[Diagram: total RAM split among the WiredTiger cache (default 50% of RAM), the file system cache, and non-mapped memory]
Knobs
• WiredTiger cache size
• Compression
– snappy
– zlib
– off
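As a rough sketch of how these knobs might be set at startup, the helper below assembles a `mongod` argument list without launching anything. The flag names `--wiredTigerCacheSizeGB` and `--wiredTigerCollectionBlockCompressor` (with `none` as the value for "off") are the ones I understand MongoDB 3.0 to accept; treat them as assumptions and check them against the documentation for your version. The `mongod_args` function and the `/data/wt` path are hypothetical.

```python
def mongod_args(dbpath, cache_size_gb=None, compressor="snappy"):
    """Build a mongod argument list for WiredTiger (does not start a server)."""
    if compressor not in ("snappy", "zlib", "none"):
        raise ValueError(f"unsupported compressor: {compressor}")
    args = [
        "mongod",
        "--dbpath", dbpath,
        "--storageEngine", "wiredTiger",
        "--wiredTigerCollectionBlockCompressor", compressor,
    ]
    if cache_size_gb is not None:
        # Leave the flag out entirely to keep the default cache size.
        args += ["--wiredTigerCacheSizeGB", str(cache_size_gb)]
    return args


args = mongod_args("/data/wt", cache_size_gb=4, compressor="zlib")
print(" ".join(args))
```

Memory not given to the WiredTiger cache remains available to the file system cache, so a larger cache is not automatically better.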
Indexing Improvements
Example data under MMAPv1 (key: value): 1: 4, 2: 2, 3: 5, 4: 7, 5: 4, 6: 9, 7: 3, 8: 6, 9: 2, 10: 1, 11: 1, 12: 5, 13: 4, 14: 5, 15: 5
• Document 6 before the update:
{
_id: 6,
categories: [
"database",
"distributed",
"document store"
]
}
• Update applied:
{$push:
{
categories:
"sharded"
}
}
• Document 6 after the update:
{
_id: 6,
categories: [
"database",
"distributed",
"document store",
"sharded"
]
}
• MMAPv1: the grown record (6: 9) no longer fits in place and moves to a new location, so every index entry pointing at it must be updated
• WiredTiger: the RecordId != DiskLoc, so index entries do not hold a disk location and need no update when a record moves
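The difference between indexing by disk location and indexing by a logical RecordId can be shown with two toy stores. Both classes are hypothetical simplifications written for this example (a single index on `_id`, a Python list standing in for disk offsets); they only count how often the index has to be rewritten.

```python
class DiskLocStore:
    """MMAPv1-style toy: the index holds the record's physical position."""

    def __init__(self):
        self.slots = []          # list position plays the role of a disk location
        self.index = {}          # key -> slot position
        self.index_updates = 0

    def insert(self, key, doc):
        self.slots.append(doc)
        self.index[key] = len(self.slots) - 1
        self.index_updates += 1

    def grow(self, key, doc):
        """The grown document no longer fits, so it moves to a new slot."""
        self.slots[self.index[key]] = None
        self.slots.append(doc)
        self.index[key] = len(self.slots) - 1  # index must be repointed
        self.index_updates += 1

    def find(self, key):
        return self.slots[self.index[key]]


class RecordIdStore:
    """WiredTiger-style toy: the index holds a logical RecordId."""

    def __init__(self):
        self.records = {}        # RecordId -> document, wherever it is stored
        self.index = {}          # key -> RecordId
        self.index_updates = 0
        self._next_id = 1

    def insert(self, key, doc):
        rid = self._next_id
        self._next_id += 1
        self.records[rid] = doc
        self.index[key] = rid
        self.index_updates += 1

    def grow(self, key, doc):
        # The RecordId is unchanged, so the index is left alone.
        self.records[self.index[key]] = doc

    def find(self, key):
        return self.records[self.index[key]]


small = {"_id": 6, "categories": ["database", "distributed", "document store"]}
big = {"_id": 6, "categories": small["categories"] + ["sharded"]}

mmap_store, wt_store = DiskLocStore(), RecordIdStore()
for store in (mmap_store, wt_store):
    store.insert(6, small)
    store.grow(6, big)
    print(type(store).__name__, store.index_updates)
```

With several indexes on a collection, the cost in the first design is paid once per index for every document move.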
Consistency without Journaling
• MMAPv1 uses a write-ahead log (journal) to guarantee consistency
• WiredTiger doesn't have this need: there are no in-place updates
– Write-ahead log is committed at checkpoints
• 2 GB or 60 seconds by default, configurable
– No journal commit interval: writes go to the journal as they come in
– Better for insert-heavy workloads
• Replication guarantees durability
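The checkpoint trigger described above (2 GB or 60 seconds, whichever comes first) amounts to a simple either-or test. The sketch below is an assumption-laden illustration: the `CheckpointPolicy` class and its field names are invented here, and the slide does not say exactly which bytes are counted, so the first argument is just "bytes written since the last checkpoint".

```python
from dataclasses import dataclass


@dataclass
class CheckpointPolicy:
    max_bytes: int = 2 * 1024**3   # 2 GB default, per the slide
    max_seconds: float = 60.0      # 60 s default, per the slide

    def due(self, bytes_since_checkpoint, seconds_since_checkpoint):
        """A checkpoint is due when either threshold has been reached."""
        return (bytes_since_checkpoint >= self.max_bytes
                or seconds_since_checkpoint >= self.max_seconds)


policy = CheckpointPolicy()
print(policy.due(100, 10))            # neither threshold reached
print(policy.due(100, 61))            # time threshold reached
print(policy.due(3 * 1024**3, 1))     # size threshold reached
```

Both thresholds are fields, reflecting the slide's note that the defaults are configurable.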
7x-10x Performance, 50%-80% Less Storage
How: WiredTiger Storage Engine
• Same data model, same query language, same ops
• Write performance gains driven by document-level concurrency control
• Storage savings driven by native compression
• 100% backwards compatible
• Non-disruptive upgrade
[Chart: Performance, MongoDB 3.0 vs. MongoDB 2.6]
https://www.mongodb.com/blog/post/high-performance-benchmarking-mongodb-and-nosql-systems
Playing Nice Together
• Cannot
– Copy database files between storage engines
– Just restart with a different storage engine on the same dbpath
• Yes we can!
– Initial sync from a replica set works perfectly!
– mongodump/mongorestore
• Rolling upgrade of a replica set to WiredTiger:
– Shut down a secondary
– Delete its dbpath
– Relaunch with --storageEngine=wiredTiger
– Roll over to the next member
AND BEYOND THE INFINITE
VERSION 3.2
Storage Engine
Storage Engines
• WiredTiger (now default)
• In-Memory
• Encryption at Rest
Tools
• Schema visualizer
Features
• $lookup (Enterprise)
• Read Committed
• Schema Validation Rules
• Partial Indexes
Features now available in v3.1.6 Community release
Thanks!
{ name: ‘Bryan Reinero’,
title: ‘Developer Advocate’,
twitter: ‘@blimpyacht’,
code: ‘github.com/breinero’,
email: ‘bryan@mongdb.com’ }


Editor's Notes

  • #5
    https://github.com/mongodb/mongo/blob/master/src/mongo/db/storage/storage_engine.h
    https://github.com/mongodb/mongo/blob/master/src/mongo/db/storage/recovery_unit.h
    https://github.com/mongodb/mongo/blob/master/src/mongo/db/storage/sorted_data_interface.h
    https://github.com/mongodb/mongo/blob/master/src/mongo/db/catalog/index_catalog.h
    Mathias Stearn’s https://www.mongodb.com/presentations/write-yourself-storage-engine-40-minutes
  • #45 Read Committed