A MongoDB White Paper

Performance Best Practices for MongoDB
MongoDB 3.6
November 2017
Table of Contents
Introduction
MongoDB Pluggable Storage Engines
Hardware
Application Patterns
Schema Design & Indexes
Disk I/O
Considerations for Benchmarks
MongoDB Atlas: Database as a Service For MongoDB
MongoDB Stitch: Backend as a Service
We Can Help
Resources

Introduction

MongoDB is a high-performance, scalable database designed for a broad array of modern applications. It is used by organizations of all sizes to power online, operational applications where low latency, high throughput, and continuous availability are critical requirements of the system.

This guide outlines considerations for achieving performance at scale in a MongoDB system across a number of key dimensions, including hardware, application patterns, schema design and indexing, disk I/O, Amazon EC2, and designing for benchmarks. While this guide is broad in scope, it is not exhaustive. Following the recommendations in this guide will reduce the likelihood of encountering common performance limitations, but it does not guarantee good performance in your application.

This guide is aimed at users managing everything themselves. A dedicated guide is provided for users of the MongoDB database as a service – MongoDB Atlas Best Practices.

MongoDB works closely with users to help them optimize their systems. Users should monitor their systems to identify bottlenecks and limitations. There are a variety of tools available, the most comprehensive of which are MongoDB Ops Manager and Cloud Manager, discussed later in this guide.

For a discussion on the architecture of MongoDB and some of its underlying assumptions, see the MongoDB Architecture Guide. For a discussion on operating a MongoDB system, see the MongoDB Operations Best Practices.

MongoDB Pluggable Storage Engines

MongoDB 3.0 exposed a new storage engine API, enabling the integration of pluggable storage engines that extend MongoDB with new capabilities, and enable optimal use of specific hardware architectures. MongoDB ships with multiple supported storage engines:

• The default WiredTiger storage engine. For most applications, WiredTiger's granular concurrency control
and native compression will provide the best all-around performance and storage efficiency for the broadest range of applications.

• The Encrypted storage engine, protecting highly sensitive data, without the performance or management overhead of separate file system encryption. The Encrypted storage engine is based upon WiredTiger and so throughout this document, statements regarding WiredTiger also apply to the Encrypted storage engine. This engine is part of MongoDB Enterprise Advanced.

• The In-Memory storage engine, delivering predictable latency coupled with real-time analytics for the most demanding applications. This engine is part of MongoDB Enterprise Advanced.

• The MMAPv1 storage engine, an improved version of the storage engine used in pre-3.x MongoDB releases. MMAPv1 was the default storage engine in MongoDB 3.0 and earlier.

Any of these storage engines can coexist within a single MongoDB replica set, making it easy to evaluate and migrate between them. Upgrades to the WiredTiger storage engine are non-disruptive for existing replica set deployments; applications will be 100% compatible, and migrations can be performed with zero downtime through a rolling upgrade of the MongoDB replica set. WiredTiger is the default storage engine for new MongoDB deployments; if another engine is preferred then start the mongod using the --storageEngine option. If a 3.2 (or later) mongod process is started and one or more databases already exist then MongoDB will use whichever storage engine those databases were created with.

Review the documentation for a checklist and full instructions on the migration process.

While each storage engine is optimized for different workloads, users still leverage the same MongoDB query language, data model, scaling, security, and operational tooling independent of the engine they use. As a result, most best practices in this guide apply to all of the supported storage engines. Any differences in recommendations between the storage engines are noted.

Hardware

You can run MongoDB anywhere – from ARM (64 bit) processors through to commodity x86 CPUs, all the way up to IBM Power and zSeries platforms.

Most users scale out their systems by using many commodity servers operating together as a cluster. MongoDB provides native replication to ensure availability; auto-sharding to uniformly distribute data across servers; and in-memory computing to provide high performance without resorting to a separate caching layer. The following considerations will help you optimize the hardware of your MongoDB system.

Ensure your working set fits in RAM. As with most databases, MongoDB performs best when the working set (indexes and most frequently accessed data) fits in RAM. RAM size is the most important factor for hardware; other optimizations may not significantly improve the performance of the system if there is insufficient RAM. If your working set exceeds the RAM of a single server, consider sharding your system across multiple servers. Use the db.serverStatus() command to view an estimate of the current working set size.

Use SSDs for write-heavy applications. Most disk access patterns in MongoDB do not have sequential properties, and as a result, customers may experience substantial performance gains by using SSDs. Good results and strong price to performance have been observed with SATA, PCIe, and NVMe SSDs. Commodity SATA spinning drives are comparable to higher cost spinning drives due to the random access patterns of MongoDB: rather than spending more on expensive spinning drives, that money may be more effectively spent on more RAM or SSDs. Another benefit of using SSDs is the performance benefit of flash over hard disk drives if the working set no longer fits in memory.

While data files benefit from SSDs, MongoDB's journal files are good candidates for fast, conventional disks due to their high sequential write profile.

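As a rough illustration of the working-set guidance in this section, the sketch below estimates how many servers would be needed for the working set (indexes plus frequently accessed data) to fit in RAM. The function name, the `usable_fraction` parameter, and the sample figures are assumptions made for the example; they are not MongoDB settings or measurements.

```python
import math

GIB = 1024 ** 3


def servers_needed(index_bytes: int, hot_data_bytes: int,
                   ram_per_server_bytes: int,
                   usable_fraction: float = 0.5) -> int:
    """Estimate the number of servers required for the working set to fit in RAM.

    usable_fraction is an assumed share of each server's RAM that is
    realistically available to hold the working set; tune it for your system.
    """
    working_set = index_bytes + hot_data_bytes
    usable = ram_per_server_bytes * usable_fraction
    if usable <= 0:
        raise ValueError("usable RAM per server must be positive")
    # At least one server, rounded up so the working set is fully covered.
    return max(1, math.ceil(working_set / usable))


# Hypothetical sizes: 40 GiB of indexes, 120 GiB of hot documents, 64 GiB servers.
print(servers_needed(40 * GIB, 120 * GIB, 64 * GIB))
```

A result greater than 1 suggests that either more RAM per server or sharding across several servers is worth evaluating; real sizing should be based on measurements from your own workload.
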
Most MongoDB deployments should use RAID-10. RAID-5 and RAID-6 have limitations and may not provide sufficient performance. RAID-0 provides good read and write performance, but insufficient fault tolerance. MongoDB's replica sets allow deployments to provide stronger availability for data, and should be considered with RAID and other factors to meet the desired availability SLA.

Configure compression for storage and I/O-intensive workloads. MongoDB natively supports compression when using the WiredTiger storage engine. Compression reduces storage footprint by as much as 80%, and enables higher IOPS, as fewer bits are read from disk. As with any compression algorithm, administrators trade storage efficiency for CPU overhead, and so it is important to test the impacts of compression in your own environment.

MongoDB offers administrators a range of compression options for both documents and indexes. The default Snappy compression algorithm provides a balance between high document and journal compression ratios (typically around 70%, dependent on data types) with low CPU overhead, while the optional zlib library will achieve higher compression, but incur additional CPU cycles as data is written to and read from disk. Indexes use prefix compression by default, which serves to reduce the in-memory footprint of index storage, freeing up more of the RAM for frequently accessed documents. Testing has shown a typical 50% compression ratio using the prefix algorithm, though users are advised to test with their own data sets. Administrators can modify the default compression settings for all collections and indexes. Compression is also configurable on a per-collection and per-index basis during collection and index creation.

Combine multiple storage & compression types. MongoDB provides features to facilitate the management of data lifecycles, including Time to Live indexes, and capped collections. In addition, by using MongoDB Zones, administrators can build highly efficient tiered storage models to support the data lifecycle. By assigning shards to Zones, administrators can balance query latency with storage density and cost by assigning data sets based on a value such as a timestamp to specific storage devices:

• Recent, frequently accessed data can be assigned to high performance SSDs with Snappy compression enabled.

• Older, less frequently accessed data is tagged to lower-throughput hard disk drives where it is compressed with zlib to attain maximum storage density with a lower cost-per-bit.

• As data ages, MongoDB automatically migrates it between storage tiers, without administrators having to build tools or ETL processes to manage data movement.

Allocate CPU hardware budget for faster CPUs. MongoDB will deliver better performance on faster CPUs. The MongoDB WiredTiger storage engine is better able to saturate multi-core processor resources than the MMAPv1 storage engine.

Dedicate each server to a single role in the system. For best performance, users should run one mongod process per host. With appropriate sizing and resource allocation using virtualization or container technologies, multiple MongoDB processes can run on a single server without contending for resources. If using the WiredTiger storage engine, administrators will need to calculate the appropriate cache size for each instance by evaluating what portion of total RAM each of them should use, and splitting the default cache_size between each.

The size of the WiredTiger cache is tunable through the storage.wiredTiger.engineConfig.cacheSizeGB setting and should be large enough to hold your entire working set. If the cache does not have enough space to load additional data, WiredTiger evicts pages from the cache to free up space. By default, storage.wiredTiger.engineConfig.cacheSizeGB is set to 60% of available RAM minus 1 GB; caution should be taken if raising the value as it takes resources from the OS, and WiredTiger performance can actually degrade as the filesystem cache becomes less effective.

For availability, multiple members of the same replica set should not be co-located on the same physical hardware or share any single point of failure such as a power supply.

Use multiple query routers. Use multiple mongos processes spread across multiple servers. A common deployment is to co-locate the mongos process on application servers, which allows for local communication between the application and the mongos process. The appropriate number of mongos processes will depend on the nature of the application and deployment.

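To make the cache-sizing guidance in this section concrete, the sketch below works through the arithmetic for several mongod instances sharing one host. It uses the default quoted in this guide, read here as 60% of RAM minus 1 GB; that reading, the function names, and the even split between instances are assumptions for illustration, and the default for your MongoDB version should be confirmed in the documentation.

```python
def default_cache_gb(total_ram_gb: float) -> float:
    """Default cache size as quoted in this guide: 60% of RAM minus 1 GB.

    Assumption: the figure is read as (0.6 * RAM) - 1, floored at zero.
    """
    return max(0.0, 0.6 * total_ram_gb - 1.0)


def per_instance_cache_gb(total_ram_gb: float, instances: int) -> float:
    """Split the default cache evenly between co-located mongod instances."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return round(default_cache_gb(total_ram_gb) / instances, 2)


# Hypothetical host: 64 GB of RAM shared by two mongod processes.
share = per_instance_cache_gb(64, 2)
# Each instance would then be started with this value for
# storage.wiredTiger.engineConfig.cacheSizeGB (about 18.7 GB here).
print(share)
```

An even split is only a starting point; instances with larger working sets may deserve a larger share, provided the total stays within what the host can spare.
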
Exploit multiple cores. The WiredTiger storage engine is multi-threaded and can take advantage of many CPU cores. Specifically, the total number of active threads (i.e. concurrent operations) relative to the number of CPUs can impact performance:

• Throughput increases as the number of concurrent active operations increases up to and beyond the number of CPUs.

• Throughput eventually decreases as the number of concurrent active operations exceeds the number of CPUs by some threshold amount.

The threshold amount depends on your application. You can determine the optimum number of concurrent active operations for your application by experimenting and measuring throughput and latency.

Due to its concurrency model, the MMAPv1 storage engine does not require many CPU cores. As such, increasing the number of cores can help but does not provide significant return.

Disable NUMA. Running MongoDB on a system with Non-Uniform Memory Access (NUMA) can cause a number of operational problems, including slow performance for periods of time and high system process usage.

When running MongoDB servers and clients on NUMA hardware, you should configure a memory interleave policy so that the host behaves in a non-NUMA fashion.

Network Compression. As a distributed database, MongoDB relies on efficient network transport during query routing and inter-node replication. MongoDB 3.4 introduced a new option to compress the wire protocol used for intra-cluster communications; MongoDB 3.6 extended this to cover compression of network traffic between the client and the database. Based on the snappy compression algorithm, network traffic can be compressed by up to 70%, providing major performance benefits in bandwidth-constrained environments, and reducing networking costs.

Compression is off by default, but can be enabled by setting networkMessageCompressors to snappy.

Compressing and decompressing network traffic requires CPU resources – typically low single digit percentage overhead. Compression is ideal for those environments where performance is bottlenecked by bandwidth, and sufficient CPU capacity is available.

Application Patterns

MongoDB is an extremely flexible database due to its dynamic schema and rich query model. The system provides extensive secondary indexing capabilities to optimize query performance. Users should consider the flexibility and sophistication of the system in order to make the right trade-offs for their application. The following considerations will help you optimize your application patterns.

Issue updates to only modify fields that have changed. Rather than retrieving the entire document in your application, updating fields, then saving the document back to the database, instead issue the update to specific fields. This has the advantage of less network usage and reduced database overhead.

Avoid negation in queries. Like most database systems, MongoDB does not index the absence of values and negation conditions may require scanning all documents. If negation is the only condition and it is not selective (for example, querying an orders table where 99% of the orders are complete to identify those that have not been fulfilled), all records will need to be scanned.

Use covered queries when possible. Covered queries return results from the indexes directly without accessing documents and are therefore very efficient. For a query to be covered all the fields included in the query must be present in an index, and all the fields returned by the query must also be present in that index. To determine whether a query is a covered query, use the explain() method. If the explain() output displays true for the indexOnly field, the query is covered by an index, and MongoDB queries only that index to match the query and return the results.

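The covered-query rule described above (every field in the query and every field returned must be present in one index) can be sketched as a simple set check. This is a deliberate simplification for illustration: the function name and sample field names are hypothetical, and the check ignores other conditions the server applies, so explain() remains the way to confirm how a real query is resolved.

```python
from typing import Iterable


def is_covered(query_fields: Iterable[str],
               returned_fields: Iterable[str],
               index_fields: Iterable[str]) -> bool:
    """Return True if all filtered and returned fields are part of the index.

    Note: _id is returned by default, so include it in returned_fields
    unless the projection explicitly excludes it.
    """
    indexed = set(index_fields)
    return set(query_fields) <= indexed and set(returned_fields) <= indexed


# Hypothetical compound index on (status, customerId).
index = ["status", "customerId"]

# Filter on status, return only customerId (with _id excluded): covered.
print(is_covered(["status"], ["customerId"], index))

# Same query, but _id is still returned and is not in the index: not covered.
print(is_covered(["status"], ["customerId", "_id"], index))
```

The second case is a common reason a query that looks covered still reads documents: the projection needs to exclude _id when the index does not contain it.
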
Test every query in your application with explain(). MongoDB provides an explain plan capability that shows information about how a query will be, or was, resolved, including:

• The number of documents returned
• The number of documents read
• Which indexes were used
• Whether the query was covered, meaning no documents needed to be read to return results
• Whether an in-memory sort was performed, which indicates an index would be beneficial
• The number of index entries scanned
• How long the query took to resolve in milliseconds (when using the executionStats mode)
• Which alternative query plans were rejected (when using the allPlansExecution mode)

The explain plan will show 0 milliseconds if the query was resolved in less than 1 ms, which is typical in well-tuned systems. When the explain plan is called, prior cached query plans are abandoned, and the process of testing multiple indexes is repeated to ensure the best possible plan is used. The query plan can be calculated and returned without first having to run the query. This enables DBAs to review which plan will be used to execute the query, without having to wait for the query to run to completion.

MongoDB Compass provides the ability to visualize explain plans, presenting key information on how a query performed – for example the number of documents returned, execution time, index usage, and more. Each stage of the execution pipeline is represented as a node in a tree, making it simple to view explain plans from queries distributed across multiple nodes.

Figure 1: MongoDB Compass visual query plan for performance optimization across distributed clusters

Update multiple array elements in a single operation. With fully expressive array updates, developers can now perform complex array manipulations against matching elements of an array – including elements embedded in nested arrays – all in a single update operation. MongoDB
3.6 adds a new arrayFilters option, allowing the update to specify which elements to modify in the array field.

Avoid scatter-gather queries. In sharded systems, queries that cannot be routed to a single shard must be broadcast to multiple shards for evaluation. Because these queries involve multiple shards for each request they do not scale well as more shards are added.

Choose the appropriate write guarantees. MongoDB allows administrators to specify the level of persistence guarantee when issuing writes to the database, which is called the write concern. The following options can be configured on a per-connection, per-database, per-collection, or even per-operation basis. The options are as follows:

• Write Acknowledged: This is the default write concern. The mongod will confirm the execution of the write operation, allowing the client to catch network, duplicate key, Document Validation, and other exceptions.

• Journal Acknowledged: The mongod will confirm the write operation only after it has flushed the operation to the journal on the primary. This confirms that the write operation can survive a mongod crash and ensures that the write operation is durable on disk.

• Replica Acknowledged: It is also possible to wait for acknowledgment of writes to other replica set members. MongoDB supports writing to a specific number of replicas. This also ensures that the write is written to the journal on the secondaries. Because replicas can be deployed across racks within data centers and across multiple data centers, ensuring writes propagate to additional replicas can provide extremely robust durability.

• Majority: This write concern waits for the write to be applied to a majority of replica set members. This also ensures that the write is recorded in the journal on these replicas – including on the primary.

• Data Center Awareness: Using tag sets, sophisticated policies can be created to ensure data is written to specific combinations of replicas prior to acknowledgment of success. For example, you can create a policy that requires writes to be written to at least three data centers on two continents, or two servers across two racks in a specific data center. For more information see the MongoDB Documentation on Data Center Awareness.

Only read from primaries unless you can tolerate eventual consistency. Updates are typically replicated to secondaries quickly, depending on network latency. However, reads on the secondaries will not be consistent with reads on the primary. Note that the secondaries are not idle as they must process all writes replicated from the primary. To increase read capacity in your operational system consider sharding. Secondary reads can be useful for analytics and ETL applications as this approach will isolate traffic from operational workloads. You may choose to read from secondaries if your application can tolerate eventual consistency.

Choose the right read-concern. To ensure isolation and consistency, the readConcern can be set to majority to indicate that data should only be returned to the application if it has been replicated to a majority of the nodes in the replica set, and so cannot be rolled back in the event of a failure.

MongoDB 3.4 adds a new readConcern level of “Linearizable”. The linearizable read concern ensures that a node is still the primary member of the replica set at the time of the read, and that the data it returns will not be rolled back if another node is subsequently elected as the new primary member. Configuring this read concern level can have a significant impact on latency, therefore a maxTimeMS value should be supplied in order to time out long-running operations.

Use causal consistency where needed. Introduced in MongoDB 3.6, causal consistency guarantees that every read operation within a client session will always see the previous write operation, regardless of which replica is serving the request. You can minimize any latency impact by using causal consistency only where it is needed.

Use the most recent drivers from MongoDB. MongoDB supports drivers for nearly a dozen languages. These drivers are engineered by the same team that maintains the database kernel. Drivers are updated more frequently than the database, typically every two months. Always use the most recent version of the drivers when possible. Install native extensions if available for your
language. Join the MongoDB community mailing list to keep track of updates.

Ensure uniform distribution of shard keys. When shard keys are not uniformly distributed for reads and writes, operations may be limited by the capacity of a single shard. When shard keys are uniformly distributed, no single shard will limit the capacity of the system.

Use hash-based sharding when appropriate. For applications that issue range-based queries, range-based sharding is beneficial because operations can be routed to the fewest shards necessary, usually a single shard. However, range-based sharding requires a good understanding of your data and queries, which in some cases may not be practical. Hash-based sharding ensures a uniform distribution of reads and writes, but it does not provide efficient range-based operations.

Schema Design & Indexes

MongoDB uses a binary document data model called BSON that is based on the JSON standard. Unlike flat tables in a relational database, MongoDB's document data model is closely aligned to the objects used in modern programming languages, and in most cases it removes the need for complex transactions or joins due to the advantages of having related data for an entity or object contained within a single document, rather than spread across multiple tables. There are best practices for modeling data as documents, and the right approach will depend on the goals of your application. The following considerations will help you make the right choices in designing the schema and indexes for your application.

Store all data for a record in a single document. MongoDB provides ACID compliance at the document level. When data for a record is stored in a single document the entire record can be retrieved in a single seek operation, which is very efficient. In some cases it may not be practical to store all data in a single document, or it may negatively impact other operations. Make the trade-offs that are best for your application.

Avoid large documents. The maximum size for documents in MongoDB is 16 MB. In practice, most documents are a few kilobytes or less. Consider documents more like rows in a table than the tables themselves. Rather than maintaining lists of records in a single document, instead make each record a document. For large media items, such as video or images, consider using GridFS, a convention implemented by all the drivers that automatically stores the binary data across many smaller documents.

Avoid unbounded document growth – MMAPv1. When a document is updated in the MongoDB MMAPv1 storage engine, the data is updated in-place if there is sufficient space. If the size of the document is greater than the allocated space, then the document may need to be re-written in a new location. The process of moving documents and updating their associated indexes can be I/O-intensive and can unnecessarily impact performance.

To anticipate future growth, the usePowerOf2Sizes attribute is enabled by default on each collection. This setting automatically configures MongoDB to round up allocation sizes to the powers of 2. This setting reduces the chances of increased disk I/O at the cost of using some additional storage.

An additional strategy is to manually pad the documents to provide sufficient space for document growth. If the application will add data to a document in a predictable fashion, the fields can be created in the document before the values are known in order to allocate the appropriate amount of space during document creation. Padding will minimize the relocation of documents and thereby minimize over-allocation. Learn more by reviewing the record allocation strategies in the documentation.

Avoiding unbounded document growth is a best practice in schema design for any database, but the specific considerations above are not relevant to the default WiredTiger storage engine, which rewrites the document for each update.

Avoid large indexed arrays. Rather than storing a large array of items in an indexed field, store groups of values across multiple fields. Updates will be more efficient.

Avoid unnecessarily long field names. Field names are repeated across documents and consume space. By using smaller field names your data will consume less space, which allows for a larger number of documents to fit in RAM. Note that with WiredTiger's native compression, long
field names have less of an impact on the amount of disk space used but the impact on RAM is the same.

Use caution when considering indexes on low-cardinality fields. Queries on fields with low cardinality can return large result sets. Avoid returning large result sets when possible. Compound indexes may include values with low cardinality, but the value of the combined fields should exhibit high cardinality.

Eliminate unnecessary indexes. Indexes are resource-intensive: even with compression enabled they consume RAM, and as fields are updated their associated indexes must be maintained, incurring additional disk I/O overhead.

Remove indexes that are prefixes of other indexes. Compound indexes can be used for queries on leading fields within an index. For example, a compound index on last name, first name can also be used to filter queries that specify last name only. In this example an additional index on last name only is unnecessary.

Use a compound index rather than index intersection. For best performance when querying via multiple predicates, compound indexes will generally be a better option.

Use partial indexes. Reduce the size and performance overhead of indexes by only including documents that will be accessed through the index. For example, create a partial index on the orderID field that only includes order documents with an orderStatus of "In progress", or only index the emailAddress field for documents where it exists.

Avoid regular expressions that are not left anchored or rooted. Indexes are ordered by value. Leading wildcards are inefficient and may result in full index scans. Trailing wildcards can be efficient if there are sufficient case-sensitive leading characters in the expression.

Use index optimizations available in the WiredTiger storage engine. As discussed earlier, the WiredTiger engine compresses indexes by default. In addition, administrators have the flexibility to place indexes on their own separate volume, allowing for faster disk paging and lower contention.

Understand any existing document schema – MongoDB Compass. If there is an existing MongoDB database that needs to be understood and optimized then MongoDB Compass is an invaluable tool.

The MongoDB Compass GUI allows users to understand the structure of existing data in the database and perform ad hoc queries against it – all with zero knowledge of MongoDB's query language. By understanding what kind of data is present, you're better placed to determine what indexes might be appropriate.

Without Compass, users wishing to understand the shape of their data would have to connect to the MongoDB shell and write queries to reverse engineer the document structure, field names and data types.

MongoDB Compass is included with MongoDB Professional and MongoDB Enterprise Advanced.

Figure 2: Document structure and contents exposed by MongoDB Compass

Ops Manager 3.6 introduces the Data Explorer to examine the database’s schema by running queries to review document structure, viewing collection metadata, and inspecting index usage statistics.

Identify & remove obsolete indexes. To understand the effectiveness of the existing indexes being used, an $indexStats aggregation stage can be used to determine how frequently each index is used. MongoDB Compass visualizes index coverage, enabling you to determine which specific fields are indexed, their type, size, and how often they are used. Ops Manager 3.6 adds a performance advisor which continuously highlights
slow-running queries and provides intelligent index Disable access time settings. Most file systems will
recommendations to improve performance. Using Ops maintain metadata for the last time a file was accessed.
Manager automation, the administrator can then roll out the While this may be useful for some applications, in a
recommended indexes automatically, without incurring any database it means that the file system will issue a write
application downtime. every time the database accesses a page, which will
negatively impact the performance and throughput of the
system.
Disk I/O
Don't use Huge Pages. Do not use Huge Pages virtual
memory pages, MongoDB performs better with normal
While MongoDB performs all read and write operations virtual memory pages.
through in-memory data structures, data is persisted to
disk and queries on data not already in RAM trigger a read Use RAI
RAID1
D10.
0. Most MongoDB deployments should use
from disk. As a result, the performance of the storage RAID-10. RAID-5 and RAID-6 have limitations and may
sub-system is a critical aspect of any system. Users should not provide sufficient performance. RAID-0 provides good
take care to use high-performance storage and to avoid read and write performance, but insufficient fault tolerance.
networked storage when performance is a primary goal of MongoDB's replica sets allow deployments to provide
the system. The following considerations will help you use stronger availability for data, and should be considered with
the best storage configuration, including OS and file RAID and other factors to meet the desired availability
system settings. SLA.

By using separate storage devices for the journal and data
files you can increase the overall throughput of the disk
subsystem. Because the disk I/O of the journal files tends
to be sequential, SSD may not provide a substantial
improvement and standard spinning disks may be more
cost effective.

Readahead size should be set to 0 for WiredTiger. Use
the blockdev --setra <value> command to set the
readahead block size to 0 when using the WiredTiger
storage engine. A readahead value of 32 (16 kB) typically
works well when using MMAPv1.
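
The arithmetic behind the readahead values can be sketched in a few lines of Python. blockdev --setra takes its value in 512-byte sectors, which is why a value of 32 corresponds to 16 kB. The device path /dev/sdb and the helper names are hypothetical, and the command is only built here, not executed.

```python
from typing import List

SECTOR_BYTES = 512  # blockdev --setra counts 512-byte sectors

# 0 for WiredTiger and 32 sectors (16 kB) for MMAPv1, per the guidance in this section.
RECOMMENDED_SECTORS = {"wiredTiger": 0, "mmapv1": 32}


def readahead_kib(sectors: int) -> float:
    """Convert a --setra value (in sectors) to KiB."""
    return sectors * SECTOR_BYTES / 1024


def setra_command(device: str, sectors: int) -> List[str]:
    """Build (but do not run) the blockdev invocation for a device."""
    if sectors < 0:
        raise ValueError("readahead must be zero or positive")
    return ["blockdev", "--setra", str(sectors), device]
```

The returned list could be handed to subprocess.run on a host where you have root privileges, for example setra_command("/dev/sdb", RECOMMENDED_SECTORS["wiredTiger"]).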
If the readahead size is larger than the size of the data
requested, a larger block will be read from disk. This is
wasteful, as most disk I/O in MongoDB is random, and it has
two undesirable consequences which negatively affect
performance:

1. The size of the read will consume RAM unnecessarily.
2. More time will be spent reading data than is necessary.

Use multiple devices for different databases – WiredTiger.
Set directoryForIndexes so that indexes are stored in
separate directories from collections and directoryPerDB
to use a different directory for each database. The various
directories can then be mapped to different storage devices,
thus increasing overall throughput.
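
A minimal sketch of how the directoryPerDB and directoryForIndexes options might be passed to mongod follows. It builds a command line with the --directoryperdb and --wiredTigerDirectoryForIndexes flags and lists the per-database subdirectories that could then be mounted on separate devices. The dbpath /data/db, the database name, and the helper names are illustrative; the collection and index subdirectory names follow the MongoDB manual and should be verified against your version.

```python
import posixpath
from typing import Dict, List


def mongod_args(dbpath: str) -> List[str]:
    """Command line enabling per-database and per-index directories."""
    return [
        "mongod",
        "--dbpath", dbpath,
        "--storageEngine", "wiredTiger",
        "--directoryperdb",                 # one directory per database
        "--wiredTigerDirectoryForIndexes",  # indexes kept apart from collections
    ]


def expected_layout(dbpath: str, databases: List[str]) -> Dict[str, Dict[str, str]]:
    """Directories that could each be mapped to a different storage device."""
    return {
        db: {
            "collection": posixpath.join(dbpath, db, "collection"),
            "index": posixpath.join(dbpath, db, "index"),
        }
        for db in databases
    }
```

For a hypothetical database named sales, expected_layout("/data/db", ["sales"]) lists /data/db/sales/collection and /data/db/sales/index as candidate mount points.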

Note that using different storage devices will affect your
ability to create snapshot-style backups of your data, since
the files will be on different devices and volumes.

Use EXT4 or XFS file systems; avoid EXT3. EXT3 is
quite old and is not optimal for most database workloads.
For example, MMAPv1 preallocates space for data. In
EXT3 preallocation will actually write 0s to the disk to
allocate the space, which is time consuming. In EXT4 and
XFS preallocation is performed as a logical operation,
which is much more efficient.

With the WiredTiger storage engine, use of XFS is strongly
recommended to avoid performance issues that have been
observed when using EXT4 with WiredTiger.

• Implement multi-temperature storage & data
locality using MongoDB Zones. MongoDB Zones
(described as tag-aware sharding in earlier MongoDB
releases) allow precise control over where data is
physically stored, accommodating a range of
deployment scenarios – for example by geography, by
hardware configuration, or by application. Administrators
can continuously refine data placement rules by
modifying shard key ranges, and MongoDB will

automatically migrate the data to its new Zone.
MongoDB 3.4 added new helper functions and additional
options in Ops Manager and Cloud Manager to
configure Zones, essential for managing large
deployments.

Considerations for Benchmarks

Generic benchmarks can be misleading and
misrepresentative of a technology and how well it will
perform for a given application. MongoDB instead
recommends that users model and benchmark their
applications using data, queries, hardware, and other
aspects of the system that are representative of their
intended application. The following considerations will help
you develop benchmarks that are meaningful for your
application.

Model your benchmark on your application. The
queries, data, system configurations, and performance
goals you test in a benchmark exercise should reflect the
goals of your production system. Testing assumptions that
do not reflect your production system is likely to produce
misleading results.

Monitor everything to locate your bottlenecks. It is
important to understand the bottleneck for a benchmark.
Depending on many factors, any component of the overall
system could be the limiting factor. A variety of popular
tools can be used with MongoDB – many are listed in the
manual.

The most comprehensive tool for monitoring MongoDB is
Ops Manager, available as a part of MongoDB Enterprise
Advanced. Featuring charts, custom dashboards, and
automated alerting, Ops Manager tracks 100+ key
database and systems metrics including operations
counters, memory and CPU utilization, replication status,
open connections, queues, and any node status. The
metrics are securely reported to Ops Manager where they
are processed, aggregated, alerted, and visualized in a
browser, letting administrators easily determine the health
of MongoDB in real-time. The benefits of Ops Manager are
also available in the SaaS-based Cloud Manager, hosted by
MongoDB in the cloud. Organizations that run on
MongoDB Enterprise Advanced can choose between Ops
Manager and Cloud Manager for their deployments.

Figure 3: Ops Manager & Cloud Manager provide
real-time visibility into MongoDB performance.

From MongoDB 3.4, Ops Manager allows telemetry data to
be collected every 10 seconds, up from the previous
minimum interval of 60 seconds.

Create chunks before loading, or use hash-based
sharding. If range queries are part of your benchmark use
range-based sharding and create chunks before loading.
Without pre-splitting, data may be loaded into a shard then
moved to a different shard as the load progresses. By
pre-splitting the data, documents will be loaded in parallel
into the appropriate shards. If your benchmark does not
include range queries, you can use hash-based sharding to
ensure a uniform distribution of writes.

Disable the balancer for bulk loading. Prevent the
balancer from rebalancing unnecessarily during bulk loads
to improve performance.
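
The pre-splitting, hash-based sharding, and balancer advice can be sketched as plain command documents in Python. The database commands named here (enableSharding, shardCollection, split, balancerStop, balancerStart) are MongoDB admin commands; the namespace bench.events, the user_id field, and the helper functions are hypothetical. The sketch only builds the documents. Sending them, for example with PyMongo's client.admin.command(doc) against a mongos, is an assumption left to the reader.

```python
from typing import Any, Dict, List

# Sent before and after the bulk load, respectively.
BALANCER_STOP = {"balancerStop": 1}
BALANCER_START = {"balancerStart": 1}


def split_points(low: int, high: int, chunks: int) -> List[int]:
    """Evenly spaced split points for an integer shard key in [low, high)."""
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    return [low + (high - low) * i // chunks for i in range(1, chunks)]


def presplit_commands(ns: str, field: str, points: List[int]) -> List[Dict[str, Any]]:
    """Range-based sharding: shard the collection, then split at each point."""
    db_name = ns.split(".", 1)[0]
    commands: List[Dict[str, Any]] = [
        {"enableSharding": db_name},
        {"shardCollection": ns, "key": {field: 1}},
    ]
    commands += [{"split": ns, "middle": {field: p}} for p in points]
    return commands


def hashed_sharding_command(ns: str, field: str) -> Dict[str, Any]:
    """Hash-based sharding for benchmarks without range queries."""
    return {"shardCollection": ns, "key": {field: "hashed"}}
```

One ordering detail matters in this sketch: the empty chunks created by the splits still need to reach their target shards, via the balancer or manual chunk moves, before BALANCER_STOP is sent for the load. Otherwise all chunks stay on one shard and the load is not parallel.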
Prime the system for several minutes. In a production
MongoDB system the working set should fit in RAM, and
all reads and writes will be executed against RAM.
MongoDB must first page the working set into RAM, so
prime the system with representative queries for several
minutes before running the tests to get an accurate sense
for how MongoDB will perform in production.

In addition to monitoring, Ops Manager and Cloud Manager
provide automated deployment, upgrades, on-line index
builds, data exploration, and cross-shard on-line backups.

Profiling. MongoDB provides a profiling capability called
Database Profiler, which logs fine-grained information
about database operations. The profiler can be enabled to

log information for all events or only those events whose
duration exceeds a configurable threshold (whose default
is 100 ms). Profiling data is stored in a capped collection
where it can easily be searched for relevant events. It may
be easier to query this collection than to parse the log files.
MongoDB Ops Manager and Cloud Manager can be used
to visualize output from the profiler when identifying slow
queries.

Ops Manager and Cloud Manager include a Visual Query
Profiler that provides a quick and convenient way for
operations teams and DBAs to analyze specific queries or
query families. The Visual Query Profiler (as shown in
Figure 4) displays how query and write latency varies over
time – making it simple to identify slower queries with
common access patterns and characteristics, as well as
identify any latency spikes. A single click in the Ops
Manager UI activates the profiler, which then consolidates
and displays metrics from every node in a single screen.

Figure 4: Visual Query Profiling in MongoDB Ops & Cloud
Manager

The Visual Query Profiler will analyze the data –
recommending additional indexes and optionally adding them
through an automated, rolling index build.

Ops Manager 3.6 adds a performance advisor which
continuously highlights slow-running queries and provides
intelligent index recommendations to improve performance.
Using Ops Manager automation, the administrator can then
roll out the recommended indexes automatically, without
incurring any application downtime.

MongoDB Compass visualizes index coverage, enabling
you to determine which specific fields are indexed, their
type, size, and how often those indexes are used.

Use mongoperf to characterize your storage system.
mongoperf is a free, open-source tool that allows users to
simulate direct disk I/O as well as memory mapped I/O,
with configurable options for number of threads, size of
documents, and other factors. This tool can help you to
understand what sort of throughput is possible with your
system, for disk-bound I/O as well as memory-mapped
I/O.

Follow configuration best practices. Review the
MongoDB production notes for the latest guidance on
packages, hardware, networking, and operating system
tuning.

MongoDB Atlas: Database as a Service For MongoDB

MongoDB Atlas is a cloud database service that makes it
easy to deploy, operate, and scale MongoDB in the cloud
by automating time-consuming administration tasks such
as database setup, security implementation, scaling,
patching, and more.

MongoDB Atlas is available on-demand through a
pay-as-you-go model and billed on an hourly basis.

It’s easy to get started – use a simple GUI to select the
public cloud provider, region, instance size, and features
you need. MongoDB Atlas provides:

• Security features to protect your data, with fine-grained
access control and end-to-end encryption

• Built in replication for always-on availability.
Cross-region replication within a public cloud can be
enabled to help tolerate the failure of an entire cloud
region.

• Fully managed, continuous and consistent backups with
point in time recovery to protect against data corruption,
and the ability to query backups in-place without full
restores

• Fine-grained monitoring and customizable alerts for
comprehensive performance visibility

• One-click scale up, out, or down on demand. MongoDB
Atlas can provision additional storage capacity as
needed without manual intervention.

• Automated patching and single-click upgrades for new
major versions of the database, enabling you to take
advantage of the latest and greatest MongoDB features

• Live migration to move your self-managed MongoDB
clusters into the Atlas service with minimal downtime

MongoDB Atlas can be used for everything from a quick
Proof of Concept, to test/QA environments, to powering
production applications. The user experience across
MongoDB Atlas, Cloud Manager, and Ops Manager is
consistent, ensuring that disruption is minimal if you decide
to manage MongoDB yourself and migrate to your own
infrastructure.

Built and run by the same team that engineers the
database, MongoDB Atlas is the best way to run MongoDB
in the cloud. Learn more or deploy a free cluster now. This
paper is aimed at people managing their own MongoDB
instances; performance best practices for MongoDB Atlas
are described in a dedicated paper – MongoDB Atlas Best
Practices.

MongoDB Stitch: Backend as a Service

MongoDB Stitch is a backend as a service (BaaS), giving
developers a REST-like API to MongoDB, and
composability with other services, backed by a robust
system for configuring fine-grained data access controls.
Stitch provides native SDKs for JavaScript, iOS, and
Android.

Built-in integrations give your application frontend access
to your favorite third party services: Twilio, AWS S3, Slack,
Mailgun, PubNub, Google, and more. For ultimate flexibility,
you can add custom integrations using MongoDB Stitch's
HTTP service.

MongoDB Stitch allows you to compose multi-stage
pipelines that orchestrate data across multiple services,
where each stage acts on the data before passing its
results on to the next.

Unlike other BaaS offerings, MongoDB Stitch works with
your existing as well as new MongoDB clusters, giving you
access to the full power and scalability of the database. By
defining appropriate data access rules, you can selectively
expose your existing MongoDB data to other applications
through MongoDB Stitch's API.

Take advantage of the free tier to get started; when you
need more bandwidth, the usage-based pricing model
ensures you only pay for what you consume. Learn more
and try it out for yourself.

We Can Help

We are the MongoDB experts. Over 4,300 organizations
rely on our commercial products, including startups and
more than half of the Fortune 100. We offer software and
services to make your life easier:

MongoDB Enterprise Advanced is the best way to run
MongoDB in your data center. It's a finely-tuned package
of advanced software, support, certifications, and other
services designed for the way you do business.

MongoDB Atlas is a database as a service for MongoDB,
letting you focus on apps instead of ops. With MongoDB
Atlas, you only pay for what you use with a convenient
hourly billing model. With the click of a button, you can
scale up and down when you need to, with no downtime,
full security, and high performance.

MongoDB Stitch is a backend as a service (BaaS), giving
developers full access to MongoDB, declarative read/write
controls, and integration with their choice of services.

MongoDB Cloud Manager is a cloud-based tool that helps
you manage MongoDB on your own infrastructure. With
automated provisioning, fine-grained monitoring, and
continuous backups, you get a full management suite that
reduces operational overhead, while maintaining full control
over your databases.

MongoDB Professional helps you manage your
deployment and keep it running smoothly. It includes
support from MongoDB engineers, as well as access to
MongoDB Cloud Manager.

Development Support helps you get up and running quickly.
It gives you a complete package of software and services
for the early stages of your project.

MongoDB Consulting packages get you to production
faster, help you tune performance in production, help you
scale, and free you up to focus on your next release.

MongoDB Training helps you become a MongoDB expert,
from design to operating mission-critical systems at scale.
Whether you're a developer, DBA, or architect, we can
make you better at MongoDB.

Resources

For more information, please visit mongodb.com or contact
us at sales@mongodb.com.

Case Studies (mongodb.com/customers)
Presentations (mongodb.com/presentations)
Free Online Training (university.mongodb.com)
Webinars and Events (mongodb.com/events)
Documentation (docs.mongodb.com)
MongoDB Enterprise Download (mongodb.com/download)
MongoDB Atlas database as a service for MongoDB (mongodb.com/cloud)
MongoDB Stitch backend as a service (mongodb.com/cloud/stitch)

US 866-237-8815 • INTL +1-650-440-4474 • info@mongodb.com


© 2017 MongoDB, Inc. All rights reserved.
