MongoDB Documentation

Release 2.0.6

MongoDB Documentation Project

September 13, 2012

CONTENTS

I About MongoDB Documentation
  1 About MongoDB
  2 About the Documentation Project
    2.1 This Manual
    2.2 Contributing to the Documentation
    2.3 Writing Documentation

II Installing MongoDB
  3 Installation Guides
    3.1 Install MongoDB on RedHat Enterprise, CentOS, or Fedora Linux
    3.2 Install MongoDB on Ubuntu
    3.3 Install MongoDB on Debian
    3.4 Install MongoDB on Linux
    3.5 Install MongoDB on OS X
    3.6 Install MongoDB on Windows
  4 Release Notes

III Replication
  5 Documentation
    5.1 Replication Fundamentals
    5.2 Replica Set Administration
    5.3 Replication Architectures
    5.4 Application Development with Replica Sets
    5.5 Replication Internals
  6 Tutorials
    6.1 Deploy a Replica Set
    6.2 Add Members to a Replica Set
    6.3 Deploy a Geographically Distributed Replica Set
    6.4 Change the Size of the Oplog
    6.5 Convert a Replica Set to a Replicated Shard Cluster
    6.6 Change Hostnames in a Replica Set
    6.7 Convert a Secondary to an Arbiter
  7 Reference

IV Sharding
  8 Documentation
    8.1 Sharding Fundamentals
    8.2 Shard Cluster Administration
    8.3 Shard Cluster Architectures
    8.4 Sharding Internals
  9 Tutorials
    9.1 Deploy a Shard Cluster
    9.2 Add Shards to an Existing Cluster
    9.3 Remove Shards from an Existing Shard Cluster
    9.4 Enforce Unique Keys for Sharded Collections
  10 Reference

V Administration
  11 Core Competencies
    11.1 Run-time Database Configuration
    11.2 Using MongoDB with SSL Connections
    11.3 Monitoring Database Systems
    11.4 Importing and Exporting MongoDB Data
    11.5 Backup and Restoration Strategies
  12 Tutorials
    12.1 Recover MongoDB Data following Unexpected Shutdown
    12.2 Convert a Replica Set to a Replicated Shard Cluster

VI Indexes
  13 Documentation
    13.1 Indexing Overview
    13.2 Indexing Operations
    13.3 Indexing Strategies

VII Aggregation
  14 Aggregation Framework
    14.1 Overview
    14.2 Framework Components
    14.3 Use
    14.4 Optimizing Performance
    14.5 Sharded Operation
    14.6 Limitations
  15 Aggregation Framework Examples
    15.1 Requirements
    15.2 Data Model
    15.3 Examples
  16 Aggregation Framework Reference
    16.1 Pipeline
    16.2 Expressions

VIII Application Development
  17 Application Development
    17.1 Drivers
    17.2 Database References
  18 Patterns
    18.1 Perform Two Phase Commits
    18.2 Expire Data from Collections by Setting TTL

IX Using the MongoDB Shell
  19 mongo Shell
  20 MongoDB Shell Interface

X Use Cases
  21 Operational Intelligence
    21.1 Storing Log Data
    21.2 Pre-Aggregated Reports
    21.3 Hierarchical Aggregation
  22 Product Data Management
    22.1 Product Catalog
    22.2 Inventory Management
    22.3 Category Hierarchy
  23 Content Management Systems
    23.1 Metadata and Asset Management
    23.2 Storing Comments
  24 Python Application Development
    24.1 Write a Tumblelog Application with Django MongoDB Engine
    24.2 Write a Tumblelog Application with Flask and MongoEngine

XI Frequently Asked Questions
  25 FAQ: MongoDB Fundamentals
    25.1 What kind of Database is MongoDB?
    25.2 What languages can I use to work with MongoDB?
    25.3 Does MongoDB support SQL?
    25.4 What are typical uses for MongoDB?
    25.5 Does MongoDB support transactions?
    25.6 Does MongoDB require a lot of RAM?
    25.7 How do I configure the cache size?
    25.8 Are writes written to disk immediately, or lazily?
    25.9 Does MongoDB handle caching?
    25.10 What language is MongoDB written in?
    25.11 What are the 32-bit limitations?
  26 FAQ: MongoDB for Application Developers
    26.1 What is a namespace?
    26.2 How do you copy all objects from one collection to another?
    26.3 If you remove a document, does MongoDB remove it from disk?
    26.4 When does MongoDB write updates to disk?
    26.5 How do I do transactions and locking in MongoDB?
    26.6 How do you aggregate data with MongoDB?
    26.7 Why does MongoDB log so many "Connection Accepted" events?
    26.8 Does MongoDB run on Amazon EBS?
    26.9 Why are MongoDB's data files so large?
    26.10 How does MongoDB address SQL or Query injection?
    26.11 How does MongoDB provide concurrency?
    26.12 What is the compare order for BSON types?
  27 FAQ: Sharding with MongoDB
    27.1 Is sharding appropriate for a new deployment?
    27.2 How does sharding work with replication?
    27.3 What happens to unsharded collections in sharded databases?
    27.4 How does MongoDB distribute data across shards?
    27.5 What happens if a client updates a document in a chunk during a migration?
    27.6 What happens to queries if a shard is inaccessible or slow?
    27.7 How does MongoDB distribute queries among shards?
    27.8 How does MongoDB sort queries in sharded environments?
    27.9 How does MongoDB ensure unique _id field values when using a shard key other than _id?
    27.10 I've enabled sharding and added a second shard, but all the data is still on one server. Why?
    27.11 Is it safe to remove old files in the moveChunk directory?
    27.12 How many connections does each mongos need?
    27.13 Why does mongos hold connections?
    27.14 Where does MongoDB report on connections used by mongos?
    27.15 What does writebacklisten in the log mean?
    27.16 How should administrators deal with failed migrations?
    27.17 What is the process for moving, renaming, or changing the number of config servers?
    27.18 When do the mongos servers detect config server changes?
    27.19 Is it possible to quickly update mongos servers after updating a replica set configuration?
    27.20 What does the maxConns setting on mongos do?
    27.21 How do indexes impact queries in sharded systems?
    27.22 Can shard keys be randomly generated?
    27.23 Can shard keys have a non-uniform distribution of values?
    27.24 Can you shard on the _id field?
    27.25 Can shard keys be in ascending order, like dates or timestamps?
    27.26 What do moveChunk commit failed errors mean?
  28 FAQ: Replica Sets and Replication in MongoDB
    28.1 What kinds of replication does MongoDB support?
    28.2 What do the terms "primary" and "master" mean?
    28.3 What do the terms "secondary" and "slave" mean?
    28.4 How long does replica set failover take?
    28.5 Does replication work over the Internet and WAN connections?
    28.6 Can MongoDB replicate over a "noisy" connection?
    28.7 What is the preferred replication method: master/slave or replica sets?
    28.8 What is the preferred replication method: replica sets or replica pairs?
    28.9 Why use journaling if replication already provides data redundancy?
    28.10 Are write operations durable without getLastError?
    28.11 How many arbiters do replica sets need?
    28.12 What information do arbiters exchange with the rest of the replica set?
    28.13 Which members of a replica set vote in elections?
    28.14 Do hidden members vote in replica set elections?
  29 FAQ: MongoDB Storage
    29.1 What are memory mapped files?
    29.2 How do memory mapped files work?
    29.3 How does MongoDB work with memory mapped files?
    29.4 What are page faults?
    29.5 What is the difference between soft and hard page faults?
    29.6 What tools can I use to investigate storage use in MongoDB?
    29.7 What is the working set?

XII Reference
  30 MongoDB Interface
    30.1 Query, Update, Projection, and Aggregation Operators
    30.2 Database Commands
    30.3 JavaScript Methods
  31 Manual Pages
    31.1 mongod Manual
    31.2 mongos Manual
    31.3 Configuration File Options
    31.4 mongo Manual
    31.5 mongodump Manual
    31.6 mongorestore Manual
    31.7 mongoimport Manual
    31.8 mongoexport Manual
    31.9 mongostat Manual
    31.10 mongotop Manual
    31.11 mongooplog Manual
    31.12 mongosniff Manual
    31.13 mongofiles Manual
    31.14 bsondump Manual
    31.15 mongod.exe Manual
    31.16 mongos.exe Manual
  32 Status and Reporting
    32.1 Server Status Output Index
    32.2 Server Status Reference
    32.3 Database Statistics Reference
    32.4 Collection Statistics Reference
    32.5 Collection Validation Data
    32.6 Connection Pool Statistics Reference
    32.7 Replica Status Reference
    32.8 Replica Set Configuration
    32.9 Replication Info Reference
    32.10 Current Operation Reporting
  33 General Reference
    33.1 MongoDB Limits and Thresholds
    33.2 Glossary
  34 Release Notes
    34.1 Release Notes for MongoDB 2.2

Index

Part I

About MongoDB Documentation

CHAPTER

ONE

ABOUT MONGODB
MongoDB is a document-oriented database management system designed for performance, horizontal scalability, high availability, and advanced queryability. See the following wiki pages for more information about MongoDB:
• Introduction
• Philosophy
• About

If you want to download MongoDB, see the downloads page. If you'd like to learn how to use MongoDB with your programming language of choice, see the introduction to the drivers.

CHAPTER

TWO

ABOUT THE DOCUMENTATION PROJECT


2.1 This Manual
The MongoDB documentation project aims to provide a complete manual for the MongoDB database server describing its use, behavior, operation, and administration. These docs will eventually replace MongoDB's original documentation.

2.1.1 Licensing
This manual is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (i.e. CC-BY-NC-SA) license. The MongoDB Manual is copyright 2011-2012 10gen, Inc.

2.1.2 Version and Revisions


This version of the manual reflects version 2.0.6 of MongoDB. See the MongoDB Documentation Project Page for an overview of all editions and output formats of the MongoDB Manual. You can see the full revision history and track ongoing improvements and additions for all versions of the manual from its GitHub repository.

This edition of the manual reflects the master branch of the documentation as of the 90aa10fb684d48138891e2a76b9b783092d14bb1 revision. This branch is explicitly accessible via http://docs.mongodb.org/master and you can always reference the commit of the current manual in the release.txt file.

The most up-to-date, current, and stable version of the manual is always available at http://docs.mongodb.org/manual/.

2.2 Contributing to the Documentation


The entire source of the documentation is available in the docs repository along with all of the other MongoDB project repositories on GitHub. You can clone the repository by issuing the following command at your system shell:

git clone git://github.com/mongodb/docs.git

If you have a GitHub account, you can fork this repository and issue pull requests, and someone on the documentation team will merge in your contributions promptly. Before the project can accept your changes to the Manual, you have to complete the MongoDB/10gen Contributor Agreement.

This project tracks issues at MongoDB's DOCS project. If you see a problem with the documentation, please report it there.

2.3 Writing Documentation


The MongoDB Manual uses Sphinx, a sophisticated documentation engine built upon Python Docutils. The original reStructured Text files, as well as all necessary Sphinx extensions and build tools, are available in the same repository as the documentation. You can view the documentation style guide and the build instructions in reStructured Text files in the top-level of the documentation repository. If you have any questions, please feel free to open a Jira Case.

Part II

Installing MongoDB

CHAPTER

THREE

INSTALLATION GUIDES
MongoDB runs on most platforms, and supports 32-bit and 64-bit architectures. 10gen, the MongoDB makers, provides both binaries and packages. Choose your platform below:

3.1 Install MongoDB on RedHat Enterprise, CentOS, or Fedora Linux


3.1.1 Synopsis
This tutorial outlines the basic installation process for deploying MongoDB on RedHat Enterprise Linux, CentOS Linux, Fedora Linux and related systems. This procedure uses .rpm packages as the basis of the installation. 10gen publishes packages of the MongoDB releases as .rpm packages for easy installation and management for users of these systems. While some of these distributions include their own MongoDB packages, the 10gen packages are generally more up to date.

This tutorial includes: an overview of the available packages, instructions for configuring the package manager, the process for installing packages from the 10gen repository, and preliminary MongoDB configuration and operation.

See Also: The documentation of the following related processes and concepts. Other installation tutorials:
• /tutorial/install-mongodb-on-debian-or-ubuntu-linux
• Install MongoDB on Debian
• Install MongoDB on Ubuntu
• Install MongoDB on Linux
• Install MongoDB on OS X
• Install MongoDB on Windows

3.1.2 Package Options


The 10gen repository contains four packages:
• mongo-10gen: This package contains MongoDB tools from the latest stable release. Install this package on all production MongoDB hosts and optionally on other systems from which you may need to administer MongoDB systems.
• mongo-10gen-server: This package contains the mongod and mongos daemons from the latest stable release and associated configuration and init scripts.
• mongo18-10gen: This package contains MongoDB tools from the previous release. Install this package on all production MongoDB hosts and optionally on other systems from which you may need to administer MongoDB systems.
• mongo18-10gen-server: This package contains the mongod and mongos daemons from the previous stable release and associated configuration and init scripts.

The MongoDB tools included in the mongo-10gen packages are:
• mongo
• mongodump
• mongorestore
• mongoexport
• mongoimport
• mongostat
• mongotop
• bsondump

3.1.3 Installing MongoDB


Configure Package Management System (YUM)

Create a /etc/yum.repos.d/10gen.repo file to hold information about your repository. If you are running a 64-bit system (recommended), place the following configuration in the /etc/yum.repos.d/10gen.repo file:

[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64
gpgcheck=0
enabled=1

If you are running a 32-bit system, which isn't recommended for production deployments, place the following configuration in the /etc/yum.repos.d/10gen.repo file:

[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/i686
gpgcheck=0
enabled=1
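The two repository definitions differ only in the architecture directory at the end of the baseurl, so the choice can be scripted. The following is a minimal sketch and not part of the 10gen packages: the function name repo_definition is hypothetical, and it only prints the file contents so that you can review them before writing anything under /etc/yum.repos.d.

```shell
# Print the 10gen yum repository definition for a machine architecture.
# repo_definition is an illustrative name, not a MongoDB or yum tool.
repo_definition() {
    case "$1" in
        x86_64) arch_dir="x86_64" ;;  # 64-bit (recommended)
        i?86)   arch_dir="i686" ;;    # 32-bit (not recommended for production)
        *)
            echo "unsupported architecture: $1" >&2
            return 1
            ;;
    esac
    cat <<EOF
[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/${arch_dir}
gpgcheck=0
enabled=1
EOF
}
```

Running repo_definition "$(uname -m)" > /etc/yum.repos.d/10gen.repo as root would then install the definition for the current machine, assuming uname -m reports one of the architectures handled above.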

Installing Packages

Issue the following command (as root or with sudo) to install the latest stable version of MongoDB and the associated tools:

yum install mongo-10gen mongo-10gen-server

When this command completes, you have successfully installed MongoDB! Continue for configuration and start-up suggestions.

3.1.4 Configure MongoDB


These packages configure MongoDB using the /etc/mongod.conf file in conjunction with the control script. You can find the init script at /etc/rc.d/init.d/mongod. This MongoDB instance will store its data files in /var/lib/mongo and its log files in /var/log/mongo, and run using the mongod user account.

Note: If you change the user that runs the MongoDB process, you will need to modify the access control rights to the /var/lib/mongo and /var/log/mongo directories.
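As an illustration of the note above, the ownership change can be sketched as a small shell function. The function name reassign_mongo_dirs and the account name appmongo in the usage line are hypothetical. The sketch assumes that changing the owner with chown is sufficient on your system, and it does not touch group ownership or permission bits.

```shell
# Give each listed directory (and everything in it) to a new owner.
# Usage: reassign_mongo_dirs USER DIR [DIR...]
reassign_mongo_dirs() {
    new_owner="$1"
    shift
    for dir in "$@"; do
        # Stop at the first failure so a partial change is easy to notice.
        chown -R "$new_owner" "$dir" || return 1
    done
}
```

For example, as root: reassign_mongo_dirs appmongo /var/lib/mongo /var/log/mongo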

3.1.5 Control MongoDB


Start MongoDB

Start the mongod process by issuing the following command (as root, or with sudo):
service mongod start

You can verify that the mongod process has started successfully by checking the contents of the log file at /var/log/mongo/mongod.log. Optionally, you can ensure that MongoDB starts following a system reboot by issuing the following command (with root privileges):
chkconfig mongod on
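The log check described above for verifying startup can be sketched as a shell function. This is an illustration only: the function name mongod_started is hypothetical, and the sketch assumes that mongod writes a "waiting for connections" line to its log once it is ready to serve clients.

```shell
# Report whether a mongod log shows that the server finished starting.
# Assumption: a ready mongod logs a line containing "waiting for connections".
mongod_started() {
    log_file="${1:-/var/log/mongo/mongod.log}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    if grep -q "waiting for connections" "$log_file"; then
        echo "mongod is accepting connections"
    else
        # Show the tail of the log to help spot errors from the server.
        echo "no startup message yet; recent log lines:"
        tail -n 20 "$log_file"
        return 1
    fi
}
```

Called without an argument, the function reads the default log path used by these packages. Because the log usually accumulates across restarts, an old startup line can also match, so treat the result as a quick hint rather than a health check.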

Stop MongoDB

Stop the mongod process by issuing the following command (as root, or with sudo):
service mongod stop

Restart MongoDB

You can restart the mongod process by issuing the following command (as root, or with sudo):
service mongod restart

Follow the state of this process by watching the output in the /var/log/mongo/mongod.log file for errors or important messages from the server.

Control mongos

As of the current release, there are no control scripts for mongos. mongos is only used in sharding deployments and typically does not run on the same systems where mongod runs. You can use the mongod init script referenced above to derive your own mongos control script.
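One way to start deriving such a control script is sketched below. It is an outline under stated assumptions rather than a packaged init script: the function name mongos_ctl, the config server address cfg1.example.net:27019, and the log and PID file paths are all placeholders. The sketch relies on the mongos options --configdb, --fork, --logpath, --logappend, and --pidfilepath.

```shell
# Placeholder settings; adjust every value for your own deployment.
MONGOS_BIN="${MONGOS_BIN:-mongos}"
CONFIGDB="${CONFIGDB:-cfg1.example.net:27019}"
LOGPATH="${LOGPATH:-/var/log/mongo/mongos.log}"
PIDFILE="${PIDFILE:-/var/run/mongos.pid}"

mongos_ctl() {
    case "$1" in
        start)
            # --fork detaches the process and needs a log destination.
            "$MONGOS_BIN" --configdb "$CONFIGDB" --fork \
                --logpath "$LOGPATH" --logappend --pidfilepath "$PIDFILE"
            ;;
        stop)
            # Send the default TERM signal to the recorded process, if any.
            if [ -f "$PIDFILE" ]; then
                kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
            else
                echo "no PID file at $PIDFILE" >&2
                return 1
            fi
            ;;
        *)
            echo "usage: mongos_ctl {start|stop}" >&2
            return 2
            ;;
    esac
}
```

A production script would also need status reporting, a dedicated service account, and the conventions of your init system, none of which this sketch attempts.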

3.1.6 Using MongoDB


Among the tools included in the mongo-10gen package is the mongo shell. You can connect to your MongoDB instance by issuing the following command at the system prompt:
mongo

This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the test collection of the (default) test database and then retrieve that document.
> db.test.save( { a: 1 } )
> db.test.find()

See Also: mongo and /reference/javascript

3.2 Install MongoDB on Ubuntu


3.2.1 Synopsis
This tutorial outlines the basic installation process for installing MongoDB on Ubuntu Linux systems. This tutorial uses .deb packages as the basis of the installation. 10gen publishes packages of the MongoDB releases as .deb packages for easy installation and management for users of Ubuntu systems. Although Ubuntu does include MongoDB packages, the 10gen packages are generally more up to date.

This tutorial includes: an overview of the available packages, instructions for configuring the package manager, the process for installing packages from the 10gen repository, and preliminary MongoDB configuration and operation.

Note: If you use an older Ubuntu that does not use Upstart (i.e. any version before 9.10 "Karmic"), please follow the instructions in the Install MongoDB on Debian tutorial.

See Also: The documentation of the following related processes and concepts. Other installation tutorials:
• Install MongoDB on RedHat Enterprise, CentOS, or Fedora Linux
• Install MongoDB on Debian
• Install MongoDB on Linux
• Install MongoDB on OS X
• Install MongoDB on Windows

3.2.2 Package Options


The 10gen repository contains three packages:
• mongodb-10gen: This package contains the latest stable release. Use this for production deployments.
• mongodb20-10gen: This package contains the stable release of the v2.0 branch.
• mongodb18-10gen: This package contains the stable release of the v1.8 branch.

You cannot install these packages concurrently with each other or with the mongodb package that your release of Ubuntu may include.

10gen also provides packages for unstable or development versions of MongoDB. Use the mongodb-10gen-unstable package to test the latest development release of MongoDB, but do not use this version in production.

3.2.3 Installing MongoDB


Configure Package Management System (APT)

The Ubuntu package management tools (i.e. dpkg and apt) ensure package consistency and authenticity by requiring that distributors sign packages with GPG keys. Issue the following command to import the 10gen public GPG key:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

Create the /etc/apt/sources.list.d/10gen.list file and include the following line for the 10gen repository:
deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen
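One way to create that file from a script is sketched below. The function name is hypothetical, and the target path is passed as a parameter so that the sketch can be tried on a scratch file first; writing to /etc/apt/sources.list.d/10gen.list itself requires root privileges.

```shell
# Write the 10gen repository line to an APT sources file.
# Pass /etc/apt/sources.list.d/10gen.list (as root) for a real installation.
write_10gen_source() {
    local target="$1"
    echo "deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen" > "$target"
}

# Try it on a scratch file and show the result.
scratch="$(mktemp)"
write_10gen_source "$scratch"
cat "$scratch"
rm -f "$scratch"
```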

Now issue the following command to reload your repository:


sudo apt-get update

Install Packages

Issue the following command to install the latest stable version of MongoDB:
sudo apt-get install mongodb-10gen

When this command completes, you have successfully installed MongoDB! Continue for configuration and start-up suggestions.

3.2.4 Configure MongoDB


These packages configure MongoDB using the /etc/mongodb.conf file in conjunction with the control script. You will find the control script at /etc/init/mongodb.conf.

This MongoDB instance will store its data files in /var/lib/mongodb and its log files in /var/log/mongodb, and run using the mongodb user account.

Note: If you change the user that runs the MongoDB process, you will need to modify the access control rights to the /var/lib/mongodb and /var/log/mongodb directories.


3.2.5 Controlling MongoDB


Starting MongoDB

You can start the mongod process by issuing the following command:
sudo service mongodb start

You can verify that mongod has started successfully by checking the contents of the log file at /var/log/mongodb/mongodb.log.

Stopping MongoDB

As needed, you may stop the mongod process by issuing the following command:
sudo service mongodb stop

Restarting MongoDB

You may restart the mongod process by issuing the following command:
sudo service mongodb restart
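After a start or restart, the log check described above can be scripted. The sketch below assumes that mongod writes the same "waiting for connections" message to its log file that it prints to the console once it is ready; the function name is hypothetical, and the message text may differ between versions.

```shell
# Return success if the given log file contains mongod's startup message.
# The message text is an assumption; adjust it if your version logs
# something different.
mongod_has_started() {
    local logfile="$1"
    grep -q "waiting for connections" "$logfile"
}

# Example against the default log location used by the 10gen packages.
if mongod_has_started /var/log/mongodb/mongodb.log 2>/dev/null; then
    echo "mongod is accepting connections"
else
    echo "no startup message found yet"
fi
```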

Controlling mongos

As of the current release, there are no control scripts for mongos. mongos is only used in sharding deployments and typically does not run on the same systems where mongod runs. You can use the mongodb script referenced above to derive your own mongos control script.

3.2.6 Using MongoDB


Among the tools included with the MongoDB package is the mongo shell. You can connect to your MongoDB instance by issuing the following command at the system prompt:
mongo

This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the test collection of the (default) test database.
> db.test.save( { a: 1 } )
> db.test.find()

See Also: mongo and /reference/javascript

3.3 Install MongoDB on Debian


3.3.1 Synopsis
This tutorial outlines the basic process for installing MongoDB on Debian systems, using .deb packages as the basis of the installation. 10gen publishes packages of the MongoDB releases as .deb packages for easy installation and management for users of Debian systems. While some of these distributions include their own MongoDB packages, the 10gen packages are generally more up to date.

This tutorial includes: an overview of the available packages, instructions for configuring the package manager, the process for installing packages from the 10gen repository, and preliminary MongoDB configuration and operation.

Note: If you're running a version of Ubuntu Linux prior to 9.10 Karmic, use this tutorial. Other Ubuntu users will want to follow the Install MongoDB on Ubuntu (page 12) tutorial.

See Also: The documentation of the following related processes and concepts.

Other installation tutorials:
- Install MongoDB on RedHat Enterprise, CentOS, or Fedora Linux (page 9)
- Install MongoDB on Ubuntu (page 12)
- Install MongoDB on Linux (page 17)
- Install MongoDB on OS X (page 19)
- Install MongoDB on Windows (page 22)

3.3.2 Package Options


The 10gen repository contains three packages:

mongodb-10gen
    This package contains the latest stable release. Use this for production deployments.

mongodb20-10gen
    This package contains the stable release of the v2.0 branch.

mongodb18-10gen
    This package contains the stable release of the v1.8 branch.

You cannot install these packages concurrently with each other or with the mongodb package that your release of Debian may include.

10gen also provides packages for unstable or development versions of MongoDB. Use the mongodb-10gen-unstable package to test the latest development release of MongoDB, but do not use this version in production.

3.3.3 Installing MongoDB


Configure Package Management System (APT)

The Debian package management tools (i.e. dpkg and apt) ensure package consistency and authenticity by requiring that distributors sign packages with GPG keys. Issue the following command to import the 10gen public GPG Key:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

Create the /etc/apt/sources.list.d/10gen.list file and include the following line for the 10gen repository:


deb http://downloads-distro.mongodb.org/repo/debian-sysvinit dist 10gen

Now issue the following command to reload your repository:


sudo apt-get update

Install Packages

Issue the following command to install the latest stable version of MongoDB:
sudo apt-get install mongodb-10gen

When this command completes, you have successfully installed MongoDB! Continue for configuration and start-up suggestions.

3.3.4 Configure MongoDB


These packages configure MongoDB using the /etc/mongodb.conf file in conjunction with the control script. You can find the control script at /etc/init.d/mongodb.

This MongoDB instance will store its data files in /var/lib/mongodb and its log files in /var/log/mongodb, and run using the mongodb user account.

Note: If you change the user that runs the MongoDB process, you will need to modify the access control rights to the /var/lib/mongodb and /var/log/mongodb directories.

3.3.5 Controlling MongoDB


Starting MongoDB

Issue the following command to start mongod:
sudo /etc/init.d/mongodb start

You can verify that mongod has started successfully by checking the contents of the log file at /var/log/mongodb/mongodb.log.

Stopping MongoDB

Issue the following command to stop mongod:
sudo /etc/init.d/mongodb stop

Restarting MongoDB

Issue the following command to restart mongod:


sudo /etc/init.d/mongodb restart


Controlling mongos

As of the current release, there are no control scripts for mongos. mongos is only used in sharding deployments and typically does not run on the same systems where mongod runs. You can use the mongodb script referenced above to derive your own mongos control script.

3.3.6 Using MongoDB


Among the tools included with the MongoDB package is the mongo shell. You can connect to your MongoDB instance by issuing the following command at the system prompt:
mongo

This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the test collection of the (default) test database.
> db.test.save( { a: 1 } )
> db.test.find()

See Also: mongo and /reference/javascript

3.4 Install MongoDB on Linux


3.4.1 Synopsis
10gen provides compiled versions of MongoDB for use on Linux that provide a simple option for users who cannot use packages. This tutorial outlines the basic installation of MongoDB using these compiled versions and an initial usage guide.

See Also: The documentation of the following related processes and concepts.

Other installation tutorials:
- Install MongoDB on RedHat Enterprise, CentOS, or Fedora Linux (page 9)
- Install MongoDB on Ubuntu (page 12)
- Install MongoDB on Debian (page 14)
- Install MongoDB on OS X (page 19)
- Install MongoDB on Windows (page 22)

3.4.2 Download MongoDB


Note: You should place the MongoDB binaries in a central location on the file system that is easy to access and control. Consider /opt or /usr/local/bin.

In a terminal session, begin by downloading the latest release. In most cases you will want to download the 64-bit version of MongoDB.


curl http://downloads.mongodb.org/linux/mongodb-linux-x86_64-x.y.z.tgz > mongo.tgz

If you need to run the 32-bit version, use the following command.
curl http://downloads.mongodb.org/linux/mongodb-linux-i686-x.y.z.tgz > mongo.tgz

Note: Replace x.y.z with the current stable version (i.e. 2.0.6). You may also choose to install a development release, in which case you will need to specify that version number above.

Once you've downloaded the release, issue the following command to extract the files from the archive:
tar -zxvf mongo.tgz
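When the download is scripted, the version and architecture can be filled in with a small helper rather than edited by hand. The following is a sketch under that assumption; the function name is hypothetical, and the URL pattern is the one shown in the curl commands above.

```shell
# Build the download URL for a Linux build, following the URL pattern above.
# arch is x86_64 (64-bit) or i686 (32-bit); version is e.g. 2.0.6.
mongodb_linux_url() {
    local arch="$1" version="$2"
    echo "http://downloads.mongodb.org/linux/mongodb-linux-${arch}-${version}.tgz"
}

mongodb_linux_url x86_64 2.0.6

# To download and unpack (requires network access):
#   curl "$(mongodb_linux_url x86_64 2.0.6)" > mongo.tgz && tar -zxvf mongo.tgz
```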

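Optional

You may use the following command to copy the extracted folder into a more generic location.
cp -R -n mongodb-linux-[platform]-[version]/ mongodb

Replace [platform] with i686 or x86_64 depending on the build you downloaded, and [version] with 2.0.6 or the version of MongoDB that you are installing.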

You can find the mongod binary, and the binaries of all of the associated MongoDB utilities, in the bin/ directory within the extracted directory.

Using MongoDB

Before you start mongod for the first time, you will need to create the data directory. By default, mongod writes data to the /data/db/ directory. To create this directory, use the following command:
mkdir -p /data/db

Note: Ensure that the system account that will run the mongod process has read and write permissions to this directory. If mongod runs under the mongo user account, issue the following command to change the owner of this folder:
chown mongo /data/db
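To confirm that the permissions are in place, a check along the following lines can be run as the account that will run mongod. This is a minimal sketch; the function name is hypothetical, and the example uses a scratch directory rather than /data/db so that it can be tried without root privileges.

```shell
# Create a data directory if needed and confirm that the current user can
# read and write it. Run this as the account that will run mongod.
prepare_data_dir() {
    local dir="$1"
    mkdir -p "$dir" || return 1
    [ -r "$dir" ] && [ -w "$dir" ]
}

# Try the check on a scratch location first.
scratch="$(mktemp -d)"
if prepare_data_dir "$scratch/data/db"; then
    echo "data directory is ready: $scratch/data/db"
fi
rm -rf "$scratch"
```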

If you use an alternate location for your data directory, ensure that this user can write to your chosen data path. You can specify, and create, an alternate path using the --dbpath (page 487) option to mongod and the above command.

The 10gen builds of MongoDB contain no control scripts or method to control the mongod process. You may wish to create control scripts, modify your path, and/or create symbolic links to the MongoDB programs in your /usr/local/bin or /usr/bin directory for easier use.

For testing purposes, you can start a mongod directly in the terminal without creating a control script:
mongod --config /etc/mongod.conf

Note: The above command assumes that the mongod binary is accessible via your system's search path, and that you have created a default configuration file located at /etc/mongod.conf.
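If you have not yet created a configuration file, the sketch below writes a minimal one. It assumes the key=value format used for the Windows mongod.cfg example later in this chapter and uses only settings named in this manual (dbpath, logpath, logappend). The function name, output path, and values are placeholders to adapt.

```shell
# Write a minimal mongod configuration file using key=value settings.
# The values and the output path are placeholders, not recommended defaults.
write_mongod_conf() {
    local conf="$1" dbpath="$2" logpath="$3"
    cat > "$conf" <<EOF
dbpath=${dbpath}
logpath=${logpath}
logappend=true
EOF
}

# Write to a scratch file and show the result.
scratch="$(mktemp)"
write_mongod_conf "$scratch" /data/db /var/log/mongodb/mongod.log
cat "$scratch"
rm -f "$scratch"
```

Writing the file to /etc/mongod.conf instead requires root privileges.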


Among the tools included with this MongoDB distribution is the mongo shell. You can use this shell to connect to your MongoDB instance by issuing the following command at the system prompt:
./bin/mongo

Note: The ./bin/mongo command assumes that the mongo binary is in the bin/ sub-directory of the current directory. This is the directory into which you extracted the .tgz file.

This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the test collection of the (default) test database and then retrieve that record:
> db.test.save( { a: 1 } )
> db.test.find()

See Also: mongo and /reference/javascript

3.5 Install MongoDB on OS X


3.5.1 Synopsis
This tutorial outlines the basic installation process for deploying MongoDB on Macintosh OS X systems. This tutorial provides two main methods of installing the MongoDB server (i.e. mongod) and associated tools: first using the community package management tools, and second using builds of MongoDB provided by 10gen.

See Also: The documentation of the following related processes and concepts.

Other installation tutorials:
- Install MongoDB on RedHat Enterprise, CentOS, or Fedora Linux (page 9)
- Install MongoDB on Ubuntu (page 12)
- Install MongoDB on Debian (page 14)
- Install MongoDB on Linux (page 17)
- Install MongoDB on Windows (page 22)

3.5.2 Installing with Package Management


Both community package management tools, Homebrew and MacPorts, require some initial setup and configuration. This configuration is beyond the scope of this document. You only need to use one of these tools. If you want to use package management, and do not already have a system installed, Homebrew is typically easier and simpler to use.

Homebrew

Homebrew installs binary packages based on published formulae. Issue the following command at the system shell to update the brew package manager:


brew update

Use the following command to install the MongoDB package into your Homebrew system.
brew install mongodb

MacPorts

MacPorts distributes build scripts that allow you to easily build packages and their dependencies on your own system. The compilation process can take a significant period of time depending on your system's capabilities and existing dependencies. Issue the following command in the system shell:
port install mongodb

Using MongoDB from Homebrew and MacPorts

The packages installed with Homebrew and MacPorts contain no control scripts or interaction with the system's process manager. If you have configured Homebrew or MacPorts correctly, including setting your PATH, the MongoDB applications and utilities will be accessible from the system shell. Start the mongod process in a terminal (for testing or development) or using a process management tool.
mongod

Then open the mongo shell by issuing the following command at the system prompt:
mongo

This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the test collection of the (default) test database and then retrieve that record.
> db.test.save( { a: 1 } )
> db.test.find()

See Also: mongo and /reference/javascript

3.5.3 Installing from 10gen Builds


10gen provides binaries of all MongoDB software compiled for OS X, which may provide a more straightforward installation process.

Download MongoDB

In a terminal session, begin by downloading the latest release. In most cases you will want to download the 64-bit version of MongoDB.
curl http://downloads.mongodb.org/osx/mongodb-osx-x86_64-x.y.z.tgz > mongo.tgz

If you need to run the 32-bit version, use the following command:


curl http://downloads.mongodb.org/osx/mongodb-osx-i386-x.y.z.tgz > mongo.tgz

Note: Replace x.y.z with the current stable version (i.e. 2.0.6). You may also choose to install a development release, in which case you will need to specify that version number above.

Note: The mongod process will not run on older Macintosh computers with PowerPC (i.e. non-Intel) processors. While 32-bit builds of MongoDB are fine for testing purposes, it is impossible to use multi-gigabyte databases with 32-bit systems. All recent Macintosh systems (including all Intel-based systems) have support for 64-bit builds of MongoDB.

Once you've downloaded the release, issue the following command to extract the files from the archive:
tar -zxvf mongo.tgz

Optional

You may use the following command to move the extracted folder into a more generic location.
mv -n mongodb-osx-[platform]-[version]/ /path/to/new/location/

Replace [platform] with i386 or x86_64 depending on your system and the version you downloaded, and [version] with 2.0.6 or the version of MongoDB that you are installing.

You can find the mongod binary, and the binaries of all of the associated MongoDB utilities, in the bin/ directory within the archive.

Using MongoDB from 10gen Builds

Before you start mongod for the first time, you will need to create the data directory. By default, mongod writes data to the /data/db/ directory. To create this directory and set the appropriate permissions, use the following commands:
sudo mkdir -p /data/db
sudo chown `id -u` /data/db

You can specify an alternate path for data files using the --dbpath (page 487) option to mongod.

The 10gen builds of MongoDB contain no control scripts or method to control the mongod process. You may wish to create control scripts, modify your path, and/or create symbolic links to the MongoDB programs in your /usr/local/bin directory for easier use.

For testing purposes, you can start a mongod directly in the terminal without creating a control script:
mongod --config /etc/mongod.conf

Note: This command assumes that the mongod binary is accessible via your system's search path, and that you have created a default configuration file located at /etc/mongod.conf.

Among the tools included with this MongoDB distribution is the mongo shell. You can use this shell to connect to your MongoDB instance by issuing the following command at the system prompt from inside of the directory where you extracted mongo:

./bin/mongo

Note: The ./bin/mongo command assumes that the mongo binary is in the bin/ sub-directory of the current directory. This is the directory into which you extracted the .tgz file.

This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the test collection of the (default) test database and then retrieve that record:
> db.test.save( { a: 1 } )
> db.test.find()

See Also: mongo and /reference/javascript

3.6 Install MongoDB on Windows


3.6.1 Synopsis
This tutorial provides a method for installing and running the MongoDB server (i.e. mongod.exe) on the Microsoft Windows platform through the Command Prompt and outlines the process for setting up MongoDB as a Windows Service. Operating MongoDB with Windows is similar to MongoDB on other platforms. Most components share the same operational patterns.

3.6.2 Procedure
Download MongoDB for Windows

Download the latest production release of MongoDB from the MongoDB downloads page. There are three builds of MongoDB for Windows:

- MongoDB for Windows Server 2008 R2 edition only runs on Windows Server 2008 R2, Windows 7 64-bit, and newer versions of Windows. This build takes advantage of recent enhancements to the Windows Platform and cannot operate on older versions of Windows.
- MongoDB for Windows 64-bit runs on any 64-bit version of Windows newer than Windows XP, including Windows Server 2008 R2 and Windows 7 64-bit.
- MongoDB for Windows 32-bit runs on any 32-bit version of Windows newer than Windows XP. 32-bit versions of MongoDB are only intended for older systems and for use in testing and development systems.

Changed in version 2.2: MongoDB does not support Windows XP. Please use a more recent version of Windows to use more recent releases of MongoDB.

Note: Always download the correct version of MongoDB for your Windows system. The 64-bit versions of MongoDB will not work with 32-bit Windows. 32-bit versions of MongoDB are suitable only for testing and evaluation purposes and only support databases smaller than 2GB.


You can find the architecture of your version of the Windows platform using the following command in the Command Prompt:
wmic os get osarchitecture

In Windows Explorer, find the MongoDB download file, typically in the default Downloads directory. Extract the archive to C:\ by right clicking on the archive and selecting Extract All and browsing to C:\.

Note: The folder name will be either:
C:\mongodb-win32-i386-[version]

Or:
C:\mongodb-win32-x86_64-[version]

In both examples, replace [version] with the version of MongoDB downloaded.

Set up the Environment

Start the Command Prompt by selecting the Start Menu, then All Programs, then Accessories, then right click Command Prompt, and select Run as Administrator from the popup menu. In the Command Prompt, issue the following commands:
cd \
move C:\mongodb-win32-* C:\mongodb

Note: MongoDB is self-contained and does not have any other system dependencies. You can run MongoDB from any folder you choose. You may install MongoDB in any directory (e.g. D:\test\mongodb).

MongoDB requires a data folder to store its files. The default location for the MongoDB data directory is C:\data\db. Create this folder using the Command Prompt. Issue the following command sequence:
md data
md data\db

Note: You may specify an alternate path for \data\db with the dbpath (page 497) setting for mongod.exe, as in the following example:
C:\mongodb\bin\mongod.exe --dbpath d:\test\mongodb\data

If your path includes spaces, enclose the entire path in double quotations, for example:
C:\mongodb\bin\mongod.exe --dbpath "d:\test\mongo db data"

Start MongoDB

To start MongoDB, execute from the Command Prompt:


C:\mongodb\bin\mongod.exe


This will start the main MongoDB database process. The "waiting for connections" message in the console output indicates that the mongod.exe process is running successfully.

Note: Depending on the security level of your system, Windows will issue a Security Alert dialog box about blocking "some features" of C:\mongodb\bin\mongod.exe from communicating on networks. All users should select "Private Networks, such as my home or work network" and click "Allow access". For additional information on security and MongoDB, please read the Security and Authentication wiki page.

Warning: Do not allow mongod.exe to be accessible to public networks without running in "Secure Mode" (i.e. auth (page 497)). MongoDB is designed to be run in trusted environments and the database does not enable authentication or "Secure Mode" by default.

Connect to MongoDB using the mongo.exe shell. Open another Command Prompt and issue the following command:
C:\mongodb\bin\mongo.exe

Note: Executing the command start C:\mongodb\bin\mongo.exe will automatically start the mongo.exe shell in a separate Command Prompt window.

The mongo.exe shell will connect to mongod.exe running on the localhost interface and port 27017 by default. At the mongo.exe prompt, issue the following two commands to insert a record in the test collection of the default test database and then retrieve that record:
> db.test.save( { a: 1 } )
> db.test.find()

See Also: mongo and /reference/javascript. If you want to develop applications using .NET, see the C# Language Center wiki page for more information.

3.6.3 MongoDB as a Windows Service


New in version 2.0.

Set up MongoDB as a Windows Service so that the database will start automatically following each reboot cycle.

Note: mongod.exe added support for running as a Windows service in version 2.0, and mongos.exe added support for running as a Windows Service in version 2.1.1.

Configure the System

You should specify two options when running MongoDB as a Windows Service: a path for the log output (i.e. logpath (page 496)) and a configuration file (page 494).

1. Create a specific directory for MongoDB log files:
md C:\mongodb\log

2. Create a configuration file for the logpath (page 496) option for MongoDB in the Command Prompt by issuing this command:


echo logpath=C:\mongodb\log\mongo.log > C:\mongodb\mongod.cfg

While these steps are optional, creating a specific location for log files and using the configuration file are good practice.

Note: Consider setting the logappend (page 496) option. If you do not, mongod.exe will delete the contents of the existing log file when starting.

Changed in version 2.2: The default logpath (page 496) and logappend (page 496) behavior will change in the 2.2 release.

Install and Run the MongoDB Service

Run all of the following commands in Command Prompt with Administrative Privileges:

1. To install the MongoDB service:
C:\mongodb\bin\mongod.exe --config C:\mongodb\mongod.cfg --install

Modify the path to the mongod.cfg file as needed. For the --install (page 529) option to succeed, you must specify a logpath (page 496) setting or the --logpath (page 486) run-time option.

2. To run the MongoDB service:
net start MongoDB

Note: If you wish to use an alternate path for your dbpath (page 497), specify it in the configuration file (e.g. C:\mongodb\mongod.cfg) that you specified in the --install (page 529) operation. You may also specify --dbpath (page 487) on the command line; however, always prefer the configuration file. If the dbpath directory does not exist, mongod.exe will not be able to start. The default value for dbpath (page 497) is \data\db.

Stop or Remove the MongoDB Service

To stop the MongoDB service:


net stop MongoDB

To remove the MongoDB service:


C:\mongodb\bin\mongod.exe --remove


CHAPTER

FOUR

RELEASE NOTES
You should always install the latest, stable version of MongoDB. Stable versions have an even-numbered minor version number. For example, v2.2 is stable, v2.0 and v1.8 were previously the stable versions, and v2.1 and v2.3 are development versions.

Current Stable Release
- Release Notes for MongoDB 2.2 (page 585)

Previous Stable Releases
- /release-notes/2.0
- /release-notes/1.8

Current Development Release (v2.3-series)
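The even-numbered rule for stable versions is easy to check mechanically, for example in a deployment script that should refuse development builds. A minimal sketch follows; the function name is hypothetical.

```shell
# Report whether a MongoDB version string belongs to a stable series,
# using the rule that stable releases have an even minor version number.
is_stable_version() {
    local version rest minor
    version="${1#v}"         # accept both "2.2" and "v2.2"
    rest="${version#*.}"     # drop the major number
    minor="${rest%%.*}"      # keep only the minor number
    [ $(( minor % 2 )) -eq 0 ]
}

for v in v2.2 v2.0.6 v2.1 v2.3; do
    if is_stable_version "$v"; then
        echo "$v: stable"
    else
        echo "$v: development"
    fi
done
```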


Part III

Replication


Database replication ensures redundancy, backup, and automatic failover. Replication occurs through groups of servers known as replica sets. This page lists the documents, tutorials, and reference pages that describe replica sets.

For an overview, see Replication Fundamentals (page 33). To work with members, see Replica Set Administration (page 38). To configure deployment architecture, see Replication Architectures (page 47). To modify read and write operations, see Application Development with Replica Sets (page 50). For procedures for performing certain replication tasks, see the Tutorials (page 61) list.


CHAPTER

FIVE

DOCUMENTATION
The following is the outline of the main documentation:

5.1 Replication Fundamentals


A MongoDB replica set is a cluster of mongod instances that replicate amongst one another and ensure automated failover. Most replica sets consist of two or more mongod instances, with at most one of these designated as the primary and the rest as secondary members. Clients direct all writes to the primary, while the secondary members replicate from the primary asynchronously.

Database replication with MongoDB adds redundancy, helps to ensure high availability, simplifies certain administrative tasks such as backups, and may increase read capacity. Most production deployments use replication.

If you're familiar with other database systems, you may think about replica sets as a more sophisticated form of traditional master-slave replication. [1] In master-slave replication, a master node accepts writes while one or more slave nodes replicate those write operations and thus maintain data sets identical to the master. For MongoDB deployments, the member that accepts write operations is the primary, and the replicating members are secondaries.

MongoDB's replica sets provide automated failover. If a primary fails, the remaining members will automatically try to elect a new primary.

A replica set can have up to 12 members, but only 7 members can have votes. For information regarding non-voting members, see non-voting members (page 42).

See Also: Replica Set Implementation Details (page 57) and the Replication (page 31) index for a list of all documents in this manual that contain information related to the operation and use of MongoDB's replica sets.

5.1.1 Member Configurations


All replica sets use at most a single primary and one or more secondary members. You can configure secondary members in a variety of ways, as listed here. For details, see Member Configurations (page 39) in the Replica Set Administration (page 38) document.

You can configure a member as any of the following:

- Secondary-Only: These members cannot become primary. See Secondary-Only Members (page 39).
- Hidden: These members are invisible to client applications. See Hidden Members (page 40).
- Delayed: These members apply operations from the primary's oplog with a specified delay. See Delayed Members (page 40).
- Arbiters: These members do not hold data and exist solely to participate in elections (page 34). See Arbiters (page 41).
- Non-Voting: These members cannot vote in elections. See Non-Voting Members (page 42).

[1] MongoDB also provides conventional master/slave replication. Master/slave replication operates by way of the same mechanism as replica sets, but lacks the automatic failover capabilities. While replica sets are the recommended solution for production, a replica set can support only 12 members in total. If your deployment requires more than 11 slave members, you'll need to use master/slave replication.

5.1.2 Failover
Replica sets feature automated failover. If the primary goes offline or becomes unresponsive and a majority of the original set members can still connect to each other, the set will elect a new primary. For details, see Failover (page 34).

Elections

When any failover occurs, an election takes place to decide which member should become primary. Elections provide a mechanism for the members of a replica set to autonomously select a new primary without administrator intervention. The election allows replica sets to recover from failover situations very quickly and robustly.

Whenever the primary becomes unreachable, the secondary members trigger an election. The first member to receive votes from a majority of the set will become primary. The most important feature of replica set elections is that a majority of the original number of members in the replica set must be present for an election to succeed. If you have a three-member replica set, the set can elect a primary when two or three members can connect to each other. If two members in the replica set go offline, then the remaining member will remain a secondary.

Note: When the current primary steps down and triggers an election, the mongod instances will close all client connections. This ensures that the clients maintain an accurate view of the replica set and helps prevent rollbacks.

See Also: The Elections (page 58) section in the Replication Internals (page 57) document, as well as the Failover and Recovery (page 46) section in the Replica Set Administration (page 38) document.

Member Priority

In a replica set, every member has a priority that helps determine eligibility for election (page 34) to primary. By default, all members have a priority of 1, unless you modify the members[n].priority (page 562) value. All members have a single vote in elections.

Warning: Always configure the members[n].priority (page 562) value to control which members will become primary. Do not configure members[n].votes (page 563) except to permit more than 7 secondary members.

See Also: Member Priority Configuration (page 43)
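The majority rule for elections reduces to simple arithmetic. The sketch below assumes that every member of the set has exactly one vote, which is the default described above; the function names are hypothetical, and the sketch does not model non-voting members or priorities.

```shell
# Smallest number of members that forms a majority of a set of the given size.
majority_needed() {
    local total="$1"
    echo $(( total / 2 + 1 ))
}

# Succeed if the members that can still reach each other form a majority
# of the original set, which is the condition for electing a primary.
can_elect_primary() {
    local total="$1" reachable="$2"
    [ "$reachable" -ge "$(majority_needed "$total")" ]
}

majority_needed 3    # prints 2
can_elect_primary 3 2 && echo "3-member set, 2 reachable: can elect a primary"
can_elect_primary 3 1 || echo "3-member set, 1 reachable: member stays secondary"
```

Under the same assumption, a four-member set needs three reachable members, so adding a fourth member to a three-member set does not increase the number of failures the set can tolerate.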

5.1.3 Consistency
In MongoDB, all read operations issued to the primary of a replica set are consistent with the last write operation.


If clients configure the read preference to permit secondary reads, read operations can return from secondary members that have not replicated more recent updates or operations. In these situations the query results may reflect a previous state.

This behavior is sometimes characterized as eventual consistency because the secondary member's state will eventually reflect the primary's state, and MongoDB cannot guarantee strict consistency for read operations from secondary members.

There is no way to guarantee consistency for reads from secondary members, except by configuring the client and driver to ensure that write operations succeed on all members before completing successfully.

This section provides an overview of the concepts that underpin database consistency and the MongoDB mechanisms to ensure that users have access to consistent data.

Rollbacks

In some failover situations, a primary will have accepted write operations that have not replicated to the secondaries by the time the failover occurs. This case is rare and typically occurs as a result of a network partition with replication lag. When the former primary rejoins the replica set and attempts to continue replication as a secondary, it must revert, or roll back, these operations to maintain database consistency across the replica set.

MongoDB writes the rollback data to a BSON file in the database's dbpath (page 497) directory. Use bsondump (page 528) to read the contents of these rollback files and then manually apply the changes to the new primary. There is no way for MongoDB to appropriately and fairly handle rollback situations without manual intervention. Even after the member completes the rollback and returns to secondary status, administrators will need to apply or decide to ignore the rollback data.

The best strategy for avoiding all rollbacks is to ensure write propagation (page 50) to all or some of the members in the set. Using these kinds of policies prevents situations that might create rollbacks.

Warning: A mongod instance will not roll back more than 300 megabytes of data. If your system needs to roll back more than 300 MB, you will need to manually intervene to recover this data.

Application Concerns

Client applications are indifferent to the configuration and operation of replica sets. While specific configuration depends to some extent on the client drivers (page 225), there is often minimal or no difference between applications using replica sets or standalone instances.

There are two major concepts that are important to consider when working with replica sets:

1. Write Concern (page 50). By default, MongoDB clients receive no response from the server to confirm successful write operations. Most drivers provide a configurable safe mode, where the server will return a response for all write operations using getLastError. For replica sets, write concern is configurable to ensure that secondary members of the set have replicated operations before the write returns.

2. Read Preference (page 51). By default, read operations issued against a replica set return results from the primary. Users may configure read preference on a per-connection basis to prefer that read operations return on the secondary members.

Read preference and write concern have particular consistency (page 34) implications.

See Also:


Application Development with Replica Sets (page 50), Write Concern (page 50), and Read Preference (page 51).
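As a rough illustration of the write concern idea described above, the following sketch builds the command document a client could send with getLastError after a write. The helper name buildGetLastError and its parameters are hypothetical and not part of any driver; w and wtimeout are getLastError options, and the way a driver exposes them (often through its safe mode setting) varies.

```javascript
// Hypothetical helper (not part of any driver): builds the command document
// for getLastError with a replication requirement.
//   w          - number of members, including the primary, that must have
//                the write before the command returns
//   wtimeoutMs - optional limit, in milliseconds, on how long to wait
function buildGetLastError(w, wtimeoutMs) {
  if (typeof w !== "number" || w % 1 !== 0 || w < 1) {
    throw new Error("w must be a positive integer");
  }
  var cmd = { getLastError: 1, w: w };
  if (wtimeoutMs !== undefined) {
    cmd.wtimeout = wtimeoutMs;
  }
  return cmd;
}

// Wait for the primary and one secondary, for at most five seconds.
var safeWrite = buildGetLastError(2, 5000);
```

In the mongo shell, a document like safeWrite could be passed to db.runCommand() immediately after a write on the same connection. Higher values of w trade write latency for a smaller chance of rollback.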

5.1.4 Administration and Operations


This section provides a brief overview of relevant concerns for administrators of replica set deployments.

See Also:
• Replica Set Administration (page 38)
• Replication Architectures (page 47)

Oplog

The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases. MongoDB applies database operations on the primary and then records the operations on the primary's oplog. The secondary members then replicate this log and apply the operations to themselves in an asynchronous process. All replica set members contain a copy of the oplog, allowing them to maintain the current state of the database. Operations in the oplog are idempotent.

By default, the size of the oplog is as follows:
• For 64-bit Linux, Solaris, and FreeBSD systems, MongoDB will allocate 5% of the available free disk space to the oplog. If this amount is smaller than a gigabyte, then MongoDB will allocate 1 gigabyte of space.
• For 64-bit OS X systems, MongoDB allocates 183 megabytes of space to the oplog.
• For 32-bit systems, MongoDB allocates about 48 megabytes of space to the oplog.

Before oplog creation, you can specify the size of your oplog with the oplogSize (page 501) option. After you start a replica set member for the first time, you can only change the size of the oplog by using the Change the Size of the Oplog (page 71) tutorial.

In most cases, the default oplog size is sufficient. For example, if an oplog that is 5% of free disk space fills up in 24 hours of operations, then secondaries can stop copying entries from the oplog for up to 24 hours before they require full resyncing. However, most replica sets have much lower operation volumes, and their oplogs can hold a much larger number of operations.

The following workload patterns affect how MongoDB uses space in the oplog:
• Update operations that affect multiple documents at once. The oplog must translate multi-updates into individual operations in order to maintain idempotency. This can use a great deal of oplog space without a corresponding increase in disk utilization.
• Deleting roughly the same amount of data as you insert. In this situation the database will not grow significantly in disk utilization, but the size of the operation log can be quite large.
• A significant portion of in-place updates. In-place updates create a large number of operations but do not change the quantity of data on disk.

If you can predict your replica set's workload to resemble one of the above patterns, then you may want to consider creating an oplog that is larger than the default. Conversely, if your MongoDB-based application mostly performs reads and writes only a small amount of data, you may find that you need a much smaller oplog.

See Also:


The Oplog (page 57) topic in the Replication Internals (page 57) document.

Deployment

Without replication, a standalone MongoDB instance represents a single point of failure, and any disruption of the MongoDB system will render the database unusable and potentially unrecoverable. Replication increases the reliability of the database instance, and replica sets are capable of distributing reads to secondary members depending on read preference. For database workloads dominated by read operations (i.e. read heavy), replica sets can greatly increase the capability of the database system.

The minimum requirements for a replica set include two members with data, for a primary and a secondary, and an arbiter (page 41). In most circumstances, however, you will want to deploy three data members.

For those deployments that rely heavily on distributing reads to secondary instances, add additional members to the set as load increases. As your deployment grows, consider adding or moving replica set members to secondary data centers or to geographically distinct locations for additional redundancy. While many architectures are possible, always ensure that the quorum of members required to elect a primary remains in your main facility.

Depending on your operational requirements, you may consider adding members configured for a specific purpose, including a delayed member to help provide protection against human errors and change control, a hidden member to provide an isolated member for reporting and monitoring, and/or a secondary-only member (page 39) for dedicated backups.

The process of establishing a new replica set member can be resource intensive on existing members. As a result, deploy new members to existing replica sets significantly before current demand saturates the existing members.

Note: Journaling provides single-instance write durability and greatly improves the reliability and durability of a database. Unless MongoDB runs with journaling, when a MongoDB instance terminates ungracefully, the database can end in a corrupt and unrecoverable state. You should assume that a database running without journaling that suffers a crash or unclean shutdown is in a corrupt or inconsistent state. Use journaling; however, do not forego proper replication because of journaling. 64-bit versions of MongoDB after version 2.0 have journaling enabled by default.

Security

In most cases, replica set administrators do not have to keep additional considerations in mind beyond the normal security precautions that all MongoDB administrators must take. However, ensure that:
• Your network configuration will allow every member of the replica set to contact every other member of the replica set.
• If you use MongoDB's authentication system to limit access to your infrastructure, you configure a keyFile (page 496) on all members to permit authentication.

See Also: Replica Set Security (page 44)

Architectures

The architecture and design of the replica set deployment can have a great impact on the set's capacity and capability. This section provides a general overview of best practices for replica set architectures.


This document provides an overview of the complete functionality of replica sets, which highlights the flexibility of the replica set and its configuration. However, for most production deployments a conventional 3-member replica set with members[n].priority (page 562) values of 1 is sufficient.

While the additional flexibility discussed below is helpful for managing a variety of operational complexities, it always makes sense to let those complex requirements dictate complex architectures, rather than add unnecessary complexity to your deployment.

Consider the following factors when developing an architecture for your replica set:
• Ensure that the members of the replica set will always be able to elect a primary. Run an odd number of members, or run an arbiter on one of your application servers if you have an even number of members.
• With geographically distributed members, be aware of where the quorum of members will be in case of likely network partitions. Attempt to ensure that the set can elect a primary among the members in the primary data center.
• Consider including a hidden (page 40) or delayed member (page 40) in your replica set to support dedicated functionality, like backups, reporting, and testing.
• Consider keeping one or two members of the set in an off-site data center, but make sure to configure the priority (page 34) to prevent them from becoming primary.

See Also: Replication Architectures (page 47) for more information regarding replica set architectures.
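The quorum consideration above can be checked with a little arithmetic. The sketch below assumes each member has 1 vote unless configured otherwise and that electing a primary requires more than half of all votes. The site property and the host names are hypothetical annotations for this example only; site is not a replica set configuration field.

```javascript
// A member has 1 vote unless its configuration says otherwise.
function votesOf(member) {
  return member.votes === undefined ? 1 : member.votes;
}

// Votes required to elect a primary: more than half of all votes in the set.
function majorityNeeded(members) {
  var total = members.reduce(function (sum, m) { return sum + votesOf(m); }, 0);
  return Math.floor(total / 2) + 1;
}

// True if the members located in `site` hold a majority by themselves,
// that is, if they could still elect a primary when cut off from the rest.
function siteCanElectPrimary(members, site) {
  var local = members
    .filter(function (m) { return m.site === site; })
    .reduce(function (sum, m) { return sum + votesOf(m); }, 0);
  return local >= majorityNeeded(members);
}

var members = [
  { host: "mongo0.example.net:27017", site: "main" },
  { host: "mongo1.example.net:27017", site: "main" },
  { host: "mongo2.example.net:27017", site: "offsite", priority: 0 }
];
var mainOk = siteCanElectPrimary(members, "main");       // 2 of 3 votes
var offsiteOk = siteCanElectPrimary(members, "offsite"); // 1 of 3 votes
```

With this layout a partition between the two sites leaves the main data center able to elect a primary, while the off-site member alone cannot.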

5.2 Replica Set Administration


Replica sets automate most administrative tasks associated with database replication. Nevertheless, several operations related to deployment and systems management still require administrator intervention. This document provides an overview of those tasks, in addition to a collection of troubleshooting suggestions for administrators of replica sets.

See Also:
• rs.status() (page 479) and db.isMaster() (page 469)
• Replica Set Reconfiguration Process (page 564)
• rs.conf() (page 477) and rs.reconfig() (page 478)
• Replica Set Configuration (page 561)

The following tutorials provide task-oriented instructions for specific administrative tasks related to replica set operation.
• Change Hostnames in a Replica Set (page 79)
• Change the Size of the Oplog (page 71)
• Convert a Replica Set to a Replicated Shard Cluster (page 167)
• Convert a Secondary to an Arbiter (page 83)
• Deploy a Geographically Distributed Replica Set (page 66)
• Deploy a Replica Set (page 61)
• Add Members to a Replica Set (page 64)


5.2.1 Member Configurations


All replica sets have a single primary and one or more secondaries. Replica sets allow you to configure secondary members in a variety of ways. This section describes these configurations.

Note: A replica set can have up to 12 members, but only 7 members can have votes. For configuration information regarding non-voting members, see Non-Voting Members (page 42).

Warning: The rs.reconfig() (page 478) shell command can force the current primary to step down, which causes an election (page 34). When the primary steps down, the mongod closes all client connections. While this typically takes 10-20 seconds, attempt to make these changes during scheduled maintenance periods.

See Also: The Elections (page 34) topic in the Replication Fundamentals (page 33) document, and the Elections (page 58) topic in the Replication Internals (page 57) document.

Secondary-Only Members

The secondary-only configuration prevents a secondary member in a replica set from ever becoming a primary in a failover. You can set secondary-only mode for any member of the set. For example, you may want to configure all members of a replica set located outside of the main data centers as secondary-only to prevent these members from ever becoming primary.

To configure a member as secondary-only, set its members[n].priority (page 562) value to 0. Any member with a members[n].priority (page 562) equal to 0 will never seek election (page 34) and cannot become primary in any situation. For more information on priority levels, see Member Priority (page 34).

As an example of modifying member priorities, assume a four-member replica set with member _id values of 0, 1, 2, and 3. Use the following sequence of operations in the mongo shell to modify member priorities:
cfg = rs.conf()
cfg.members[0].priority = 0
cfg.members[1].priority = 0.5
cfg.members[2].priority = 1
cfg.members[3].priority = 2
rs.reconfig(cfg)

This sets the following:
• Member 0 to a priority of 0 so that it can never become primary.
• Member 1 to a priority of 0.5, which makes it less likely to become primary than other members but doesn't prohibit the possibility.
• Member 2 to a priority of 1, which is the default value. Member 2 becomes primary if no member with a higher priority is eligible.
• Member 3 to a priority of 2. Member 3 becomes primary, if eligible, under most circumstances.

Note: If your replica set has an even number of members, add an arbiter (page 41) to ensure that members can quickly obtain a majority of votes in an election for primary.

See Also: members[n].priority (page 562) and Replica Set Reconfiguration (page 564).
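To make the effect of these priorities concrete, here is a minimal sketch that lists which members of a configuration document may become primary and in what order of preference. The function names and host names are made up for illustration, and the ordering reflects configured preference only; a real election also depends on which members are reachable and up to date.

```javascript
// Priority defaults to 1 when a member does not set it.
function priorityOf(member) {
  return member.priority === undefined ? 1 : member.priority;
}

// Hosts that may become primary, most preferred first.
// Members with a priority of 0 are excluded.
function primaryCandidates(cfg) {
  return cfg.members
    .filter(function (m) { return priorityOf(m) > 0; })
    .sort(function (a, b) { return priorityOf(b) - priorityOf(a); })
    .map(function (m) { return m.host; });
}

// Same priorities as the four-member example; host names are made up.
var cfg = {
  _id: "rs0",
  members: [
    { _id: 0, host: "m0.example.net:27017", priority: 0 },
    { _id: 1, host: "m1.example.net:27017", priority: 0.5 },
    { _id: 2, host: "m2.example.net:27017", priority: 1 },
    { _id: 3, host: "m3.example.net:27017", priority: 2 }
  ]
};
var order = primaryCandidates(cfg);
```

In the mongo shell, the same function could be applied to the document returned by rs.conf().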


Hidden Members

Hidden members are part of a replica set but cannot become primary and are invisible to client applications. However, hidden members do vote in elections (page 34).

Hidden members are ideal for instances that will have significantly different usage patterns than the other members and require separation from normal traffic. Typically, hidden members provide reporting, dedicated backups, and dedicated read-only testing and integration support.

Hidden members have members[n].priority (page 562) set to 0 and have members[n].hidden (page 562) set to true.

To configure a hidden member, use the following sequence of operations in the mongo shell:
cfg = rs.conf()
cfg.members[0].priority = 0
cfg.members[0].hidden = true
rs.reconfig(cfg)

After reconfiguring the set, the member with the _id of 0 has a priority of 0 so that it cannot become primary. The other members in the set will not advertise the hidden member in the isMaster or db.isMaster() (page 469) output.

Note: You must send the rs.reconfig() (page 478) command to a set member that can become primary. In the above example, if you issue the rs.reconfig() (page 478) operation to the member with the _id of 0, the operation fails.

See Also: Replica Set Read Preference (page 51) and Replica Set Reconfiguration (page 564).

Delayed Members

Delayed members copy and apply operations from the primary's oplog with a specified delay. If a member has a delay of one hour, then the latest entry in this member's oplog will not be more recent than one hour old, and the state of data for the member will reflect the state of the set an hour earlier.

Example: If the current time is 09:52 and the secondary is delayed by an hour, no operation will be more recent than 08:52.

Delayed members may help recover from various kinds of human error. Such errors may include inadvertently deleted databases or botched application upgrades. Consider the following factors when determining the amount of slave delay to apply:
• Ensure that the length of the delay is equal to or greater than your maintenance windows.
• Ensure that the size of the oplog is sufficient to capture more than the number of operations that typically occur in that period of time. For more information on oplog size, see the Oplog (page 36) topic in the Replication Fundamentals (page 33) document.

Delayed members must have a priority set to 0 to prevent them from becoming primary in their replica sets. These members should also be hidden (page 40) to prevent your application from seeing or querying them.

To configure a replica set member with a one hour delay, use the following sequence of operations in the mongo shell:


cfg = rs.conf()
cfg.members[0].priority = 0
cfg.members[0].slaveDelay = 3600
rs.reconfig(cfg)

After the replica set reconfigures, the set member with the _id of 0 has a priority of 0 and cannot become primary. The slaveDelay (page 562) value delays both replication and the member's oplog by 3600 seconds (1 hour). Setting slaveDelay (page 562) to a non-zero value also sets hidden (page 562) to true for this member so that it does not receive application queries in normal operations.

Warning: The length of the secondary slaveDelay (page 562) must fit within the window of the oplog. If the oplog is shorter than the slaveDelay (page 562) window, the delayed member cannot successfully replicate operations.

See Also: members[n].slaveDelay (page 562), Replica Set Reconfiguration (page 564), Oplog (page 36), Changing Oplog Size (page 44) in this document, and the Change the Size of the Oplog (page 71) tutorial.

Arbiters

Arbiters are special mongod instances that do not hold a copy of the data and thus cannot become primary. Arbiters exist solely to participate in elections (page 34).

Note: Because of their minimal system requirements, you may safely deploy an arbiter on a system with another workload such as an application server or monitoring member.

Warning: Do not run arbiter processes on a system that is an active primary or secondary of its replica set.

Arbiters never receive the contents of any collection but do have the following interactions with the rest of the replica set:
• Credential exchanges that authenticate the arbiter with the replica set. All MongoDB processes within a replica set use keyfiles. These exchanges are encrypted.
• Exchanges of replica set configuration data and votes. These are not encrypted; the only cryptographically secure exchange is authentication.

If your MongoDB deployment uses SSL, then all communications between arbiters and the other members of the replica set are secure. See the documentation for Using MongoDB with SSL Connections (page 143) for more information. Run all arbiters on secure networks, as with all MongoDB components.
To start an arbiter, use the following command:
mongod --replSet [setname]

Replace [setname] with the name of the replica set that the arbiter will join. Then in the mongo shell, while connected to the current primary, issue the following command:
rs.addArb("[hostname]:[port]")

Replace the [hostname]:[port] string with the arbiter's hostname and port.

See Also: replSet (page 501), --replSet (page 490), and rs.addArb() (page 476).


Non-Voting Members

You may choose to change the number of votes that each member has in elections (page 34) for primary. In general, all members should have only 1 vote to prevent intermittent ties, deadlock, or the wrong members from becoming primary. Use replica set priorities (page 34) to control which members are more likely to become primary.

To disable a member's ability to vote in elections, use the following command sequence in the mongo shell:
cfg = rs.conf()
cfg.members[3].votes = 0
cfg.members[4].votes = 0
cfg.members[5].votes = 0
rs.reconfig(cfg)

This sequence gives 0 votes to set members with the _id values of 3, 4, and 5. This setting allows the set to elect these members as primary but does not allow them to vote in elections. If you have three non-voting members, you can add three additional voting members to your set. Place voting members so that your designated primary or primaries can reach a majority of votes in the event of a network partition.

Note: In general and when possible, all members should have only 1 vote. This prevents intermittent ties, deadlocks, or the wrong members from becoming primary. Use Replica Set Priorities (page 34) to control which members are more likely to become primary.

See Also: members[n].votes (page 563) and Replica Set Reconfiguration (page 564).
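The member limits noted at the start of this section (at most 12 members, of which at most 7 may vote) can be checked before calling rs.reconfig(). The following is a sketch under the assumption that a member votes unless its votes value is 0; checkMemberLimits and the host names are hypothetical.

```javascript
var MAX_MEMBERS = 12;        // limits as stated in this section
var MAX_VOTING_MEMBERS = 7;

// A member votes unless its configuration sets votes to 0.
function isVoting(member) {
  return (member.votes === undefined ? 1 : member.votes) > 0;
}

// Returns a list of human-readable problems; an empty list means the
// configuration document is within both limits.
function checkMemberLimits(cfg) {
  var problems = [];
  var voting = cfg.members.filter(isVoting).length;
  if (cfg.members.length > MAX_MEMBERS) {
    problems.push("more than " + MAX_MEMBERS + " members");
  }
  if (voting > MAX_VOTING_MEMBERS) {
    problems.push("more than " + MAX_VOTING_MEMBERS + " voting members");
  }
  return problems;
}

// Ten members with made-up host names, all voting by default.
var cfg = { _id: "rs0", members: [] };
for (var i = 0; i < 10; i++) {
  cfg.members.push({ _id: i, host: "m" + i + ".example.net:27017" });
}
var before = checkMemberLimits(cfg);   // ten voting members: over the limit

// Make the last three members non-voting.
cfg.members[7].votes = 0;
cfg.members[8].votes = 0;
cfg.members[9].votes = 0;
var after = checkMemberLimits(cfg);    // seven voting members: within limits
```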

5.2.2 Procedures
Adding Members

Before adding a new member to an existing replica set, do one of the following to prepare the new member's data directory:
• Make sure the new member's data directory does not contain data. The new member will copy the data from an existing member. If the new member is in a recovering state, it must exit and become a secondary before MongoDB can copy all data as part of the replication process. This process takes time but does not require administrator intervention.
• Manually copy the data directory from an existing member. The new member becomes a secondary member and will catch up to the current state of the replica set after a short interval. Copying the data over manually shortens the amount of time for the new member to become current.

If you copy the data manually, ensure that you can copy the data directory to the new member and begin replication within the window allowed by the oplog (page 36). If the amount of time between the most recent operation in the copied data and the most recent operation to the database exceeds the length of the oplog on the existing members, then the new instance will have to completely re-synchronize. Use db.printReplicationInfo() (page 470) to check the current state of replica set members with regards to the oplog.

For the procedure to add a member to a replica set, see Add Members to a Replica Set (page 64).
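The oplog window condition above amounts to a comparison of timestamps. The sketch below is an illustration, not a MongoDB API: the function names are hypothetical, and the times would come from your own records and from the oplog information that db.printReplicationInfo() reports for an existing member.

```javascript
// Both arguments are Date objects:
//   lastOpInCopy   - time of the most recent operation contained in the
//                    copied data directory
//   oplogFirstTime - time of the oldest entry still held in the oplog of
//                    the member the new instance will replicate from
// The copy can catch up only if its last operation is still in that oplog.
function canCatchUpFromCopy(lastOpInCopy, oplogFirstTime) {
  return lastOpInCopy.getTime() >= oplogFirstTime.getTime();
}

// Length of the oplog window in hours, from its first and last entry times.
function oplogWindowHours(oplogFirstTime, oplogLastTime) {
  return (oplogLastTime.getTime() - oplogFirstTime.getTime()) / 3600000;
}

var oplogFirst = new Date("2012-09-12T08:00:00Z");
var oplogLast = new Date("2012-09-13T08:00:00Z");
var copyTaken = new Date("2012-09-12T20:00:00Z");

var windowHours = oplogWindowHours(oplogFirst, oplogLast);
var copyIsUsable = canCatchUpFromCopy(copyTaken, oplogFirst);
```

If the check fails, start the new member with an empty data directory instead and let it perform a full synchronization.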


Removing Members

You may remove a member of a replica set at any time. Use the rs.remove() (page 478) function in the mongo shell while connected to the current primary. Issue the db.isMaster() (page 469) command when connected to any member of the set to determine the current primary. Use a command in either of the following forms to remove the member:
rs.remove("mongo2.example.net:27017")
rs.remove("mongo3.example.net")

This operation disconnects the shell briefly and forces a re-connection as the replica set renegotiates which member will be primary. The shell displays an error even if this command succeeds.

You can re-add a removed member to a replica set at any time using the procedure for adding replica set members (page 42). Additionally, consider using the replica set reconfiguration procedure (page 564) to change the members[n].host (page 561) value to rename a member in a replica set directly.

Replacing a Member

Use this procedure to replace a member of a replica set when the host name has changed. This procedure preserves all existing configuration for a member, except its hostname/location.

You may need to replace a replica set member if you want to replace an existing system and only need to change the hostname rather than completely replace all configured options related to the previous member.

Use rs.reconfig() (page 478) to change the value of the members[n].host (page 561) field to reflect the new hostname or port number. rs.reconfig() (page 478) will not change the value of members[n]._id (page 561).
cfg = rs.conf()
cfg.members[0].host = "mongo2.example.net:27019"
rs.reconfig(cfg)

Warning: Any replica set configuration change can trigger the current primary to step down, which forces an election (page 34). This causes the current shell session, and clients connected to this replica set, to produce an error even when the operation succeeds.
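When the position of a member in the members array is not known in advance, the member can be looked up by its current host value instead. The following sketch works on a plain copy of the configuration document; withNewHost and the host names are hypothetical, and the result would still need to be passed to rs.reconfig() in the mongo shell.

```javascript
// Returns a copy of cfg in which the member whose host equals oldHost has
// its host changed to newHost. Every other field, including _id, is kept.
function withNewHost(cfg, oldHost, newHost) {
  var copy = JSON.parse(JSON.stringify(cfg)); // config is plain data
  var found = false;
  copy.members.forEach(function (m) {
    if (m.host === oldHost) {
      m.host = newHost;
      found = true;
    }
  });
  if (!found) {
    throw new Error("no member with host " + oldHost);
  }
  return copy;
}

var cfg = {
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1.example.net:27017" },
    { _id: 1, host: "mongo2.example.net:27017", priority: 0 }
  ]
};
var updated = withNewHost(cfg, "mongo2.example.net:27017", "mongo2.example.net:27019");
```

Working on a copy keeps the original document available for comparison before the change is applied.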

Adjusting Priority

To change the value of the members[n].priority (page 562) setting in the replica set configuration, use the following sequence of commands in the mongo shell:
cfg = rs.conf()
cfg.members[0].priority = 0.5
cfg.members[1].priority = 2
cfg.members[2].priority = 2
rs.reconfig(cfg)

The first operation uses rs.conf() (page 477) to set the local variable cfg to the contents of the current replica set configuration, which is a document. The next three operations change the members[n].priority (page 562) value in the cfg document for members[n]._id (page 561) of 0, 1, or 2. The final operation calls rs.reconfig() (page 478) with the argument of cfg to initialize the new configuration.

If a member has members[n].priority (page 562) set to 0, it is ineligible to become primary and will not seek election. Hidden members (page 40), delayed members (page 40), and arbiters (page 41) all have members[n].priority (page 562) set to 0.


All members have a members[n].priority (page 562) equal to 1 by default.

The value of members[n].priority (page 562) can be any floating point (i.e. decimal) number between 0 and 1000. Priorities are only used to determine the preference in election. The priority value is used only in relation to other members. With the exception of members with a priority of 0, the absolute value of the members[n].priority (page 562) value is irrelevant.

Replica sets will preferentially elect and maintain the primary status of the member with the highest members[n].priority (page 562) setting.

Warning: Replica set reconfiguration can force the current primary to step down, leading to an election for primary in the replica set. Elections cause the current primary to close all open client connections. Perform routine replica set reconfiguration during scheduled maintenance windows.

See Also: The Replica Reconfiguration Usage (page 564) example revolves around changing the priorities of the members of a replica set.

Changing Oplog Size

The following is an overview of the procedure for changing the size of the oplog:

1. Shut down the current primary instance in the replica set and then restart it on a different port and in standalone mode.
2. Create a backup of the old (current) oplog. This is optional.
3. Save the last entry from the old oplog.
4. Drop the old oplog.
5. Create a new oplog of a different size.
6. Insert the previously saved last entry from the old oplog into the new oplog.
7. Restart the server as a member of the replica set on its usual port.
8. Apply this procedure to any other member of the replica set that could become primary.

For a detailed procedure, see Change the Size of the Oplog (page 71).
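When choosing a new oplog size, it can help to compare it with the default described in the Oplog (page 36) topic and with the write volume of the set. The sketch below encodes those default rules as stated there. The platform labels and function names are hypothetical, megabytes and gigabytes are assumed to be binary units (the source does not say), and no upper bound on the default is modeled.

```javascript
var MB = 1024 * 1024;   // assumption: binary megabytes
var GB = 1024 * MB;

// platform is one of the hypothetical labels "32bit", "osx64", or
// "other64" (64-bit Linux, Solaris, and FreeBSD).
function defaultOplogBytes(platform, freeDiskBytes) {
  if (platform === "32bit") {
    return 48 * MB;                        // "about 48 megabytes"
  }
  if (platform === "osx64") {
    return 183 * MB;
  }
  // 5% of the available free disk space, but never less than 1 gigabyte
  return Math.max(freeDiskBytes / 20, 1 * GB);
}

// Rough number of hours of operations an oplog can hold, given how many
// bytes of oplog entries the workload produces per hour.
function oplogWindowHours(oplogBytes, oplogBytesPerHour) {
  return oplogBytes / oplogBytesPerHour;
}

var small = defaultOplogBytes("other64", 10 * GB);    // 5% is 0.5 GB, so 1 GB
var large = defaultOplogBytes("other64", 100 * GB);   // 5% is 5 GB
var hours = oplogWindowHours(large, 1 * GB);          // at 1 GB of entries per hour
```

A delayed member or a long maintenance window needs an oplog whose window, estimated this way, comfortably exceeds the delay or the length of the window.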

5.2.3 Security
In most cases, the most effective ways to control access and to secure the connection between members of a replica set depend on network-level access control. Use your environment's firewall and network routing to ensure that only traffic from clients and other replica set members can reach your mongod instances. If needed, use virtual private networks (VPNs) to ensure secure connections over wide area networks (WANs).

Additionally, MongoDB provides an authentication mechanism for mongod and mongos instances connecting to replica sets. These instances enable authentication by specifying a shared key file that serves as a shared password.

New in version 1.8: Added support for authentication for replica sets (1.9.1 for sharded replica sets).

To enable authentication, add the following option to your configuration file:
keyFile = /srv/mongodb/keyfile

Note: You may choose to set these run-time configuration options using the --keyFile (page 486) (or mongos --keyFile (page 493)) options on the command line.


Setting keyFile (page 496) enables authentication and specifies a key file for the replica set members to use when authenticating to each other. The content of the key file is arbitrary but must be the same on all members of the replica set and on all mongos instances that connect to the set.

The key file must be less than one kilobyte in size and may only contain characters in the base64 set. The key file must not have group or world permissions on UNIX systems. Use the following command to use the OpenSSL package to generate random content for use in a key file:
openssl rand -base64 753

Note: Key file permissions are not checked on Windows systems.
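The content rules for the key file (less than one kilobyte, base64 characters only) can be checked before distributing the file. This is a sketch under two assumptions that the text above does not state: that whitespace such as the line breaks produced by openssl is ignored, and that the file is ASCII so that its length in characters equals its size in bytes. The function name is hypothetical, and file permissions are not checked here.

```javascript
// Returns a list of problems with the given key file content; an empty
// list means the content satisfies the size and character rules.
function checkKeyFileContent(text) {
  var problems = [];
  var key = text.replace(/\s+/g, "");   // assumption: whitespace is ignored
  if (text.length >= 1024) {
    problems.push("content must be less than one kilobyte");
  }
  if (!/^[A-Za-z0-9+\/=]+$/.test(key)) {
    problems.push("content may only contain base64 characters");
  }
  return problems;
}

var good = checkKeyFileContent("c2VjcmV0\nS2V5Zm9yUmVwbGljYVNldA==\n");
var badChars = checkKeyFileContent("not valid!");
var tooLong = checkKeyFileContent(new Array(1025).join("A")); // 1024 characters
```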

5.2.4 Troubleshooting
This section defines reasonable troubleshooting processes for common operational challenges. While there is no single cause or guaranteed response strategy for any of these symptoms, the following sections provide good places to start a troubleshooting investigation with replica sets.

See Also: Monitoring Database Systems (page 146).

Replication Lag

Replication lag is a delay between an operation on the primary and the application of that operation from the oplog to the secondary. Such lag can be a significant issue and can seriously affect MongoDB replica set deployments. Excessive replication lag makes lagged members ineligible to quickly become primary and increases the possibility that distributed read operations will be inconsistent.

Identify replication lag by checking the value of members[n].optimeDate for each member of the replica set using the rs.status() (page 479) function in the mongo shell.

You can also monitor how fast replication occurs by watching the oplog time in the replica graph in the MongoDB Monitoring Service. Also see the documentation for MMS.

Possible causes of replication lag include:

Network Latency

Check the network routes between the members of your set to ensure that there is no packet loss or network routing issue. Use tools including ping to test latency between set members and traceroute to expose the routing of packets between network endpoints.

Disk Throughput

If the file system and disk device on the secondary are unable to flush data to disk as quickly as the primary, then the secondary will have difficulty keeping state. Disk-related issues are incredibly prevalent on multitenant systems, including virtualized instances, and can be transient if the system accesses disk devices over an IP network (as is the case with Amazon's EBS system). Use system-level tools to assess disk status, including iostat or vmstat.

Concurrency


In some cases, long-running operations on the primary can block replication on secondaries. You can use write concern to prevent write operations from returning if replication cannot keep up with the write load. Use the database profiler to see if there are slow queries or long-running operations that correspond to the incidences of lag.

Appropriate Write Concern

If you are performing a large data ingestion or bulk load operation that requires a large number of writes to the primary, the secondaries may not be able to read the oplog fast enough to keep up with changes. Setting some level of write concern (page 50) can slow the overall progress of the batch, but will prevent the secondary from falling too far behind.

To prevent this, use write concern so that MongoDB will perform a safe write (i.e. call getLastError) after every 100, 1,000, or other designated number of operations. This provides an opportunity for secondaries to catch up with the primary. Using safe writes, even in batches, can impact write throughput; however, calling getLastError will prevent the secondaries from falling too far behind the primary.

For more information see:
• Write Concern (page 50).
• The Oplog (page 36) topic in the Replication Fundamentals (page 33) document.
• The Oplog (page 57) topic in the Replication Internals (page 57) document.
• The Changing Oplog Size (page 44) topic in this document.
• The Change the Size of the Oplog (page 71) tutorial.

Failover and Recovery

Replica sets feature automated failover. If the primary goes offline or becomes unresponsive and a majority of the original set members can still connect to each other, the set will elect a new primary.

While failover is automatic, replica set administrators should still understand exactly how this process works. The section below describes failover in detail.

In most cases, failover occurs without administrator intervention seconds after the primary either steps down, becomes inaccessible, or becomes otherwise ineligible to act as primary. If your MongoDB deployment does not fail over according to expectations, consider the following operational errors:
• No remaining member is able to form a majority. This can happen as a result of network partitions that render some members inaccessible. Design your deployment to ensure that a majority of set members can elect a primary in the same facility as core application systems.
• No member is eligible to become primary. Members must have a members[n].priority (page 562) setting greater than 0, have a state that is less than ten seconds behind the last operation to the replica set, and generally be more up to date than the voting members.

In many senses, rollbacks (page 35) represent a graceful recovery from an impossible failover and recovery situation.

Rollbacks occur when a primary accepts writes that other members of the set do not successfully replicate before the primary steps down. When the former primary begins replicating again it performs a rollback. Rollbacks remove those operations from the instance that were never replicated to the set so that the data set is in a consistent state. The mongod program writes rolled back data to a BSON file that you can view using bsondump and apply manually using mongorestore.

You can prevent rollbacks by ensuring safe writes by using the appropriate write concern.

See Also:


The Elections (page 34) topic in the Replication Fundamentals (page 33) document, and the Elections (page 58) topic in the Replication Internals (page 57) document.

5.3 Replication Architectures


There is no single ideal replica set architecture for every deployment or environment. Indeed the flexibility of replica sets might be their greatest strength. This document describes the most commonly used deployment patterns for replica sets. The descriptions are not necessarily mutually exclusive, and you can combine features of each architecture in your own deployment.

See Also: Replica Set Administration (page 38) and Replica Set Configuration (page 561).

5.3.1 Three Member Sets


The minimum recommended architecture for a replica set consists of:
- One primary, and
- Two secondary members, either of which can become the primary at any time.

This makes failover (page 34) possible and ensures that two full and independent copies of the data set exist at all times. If the primary fails, the replica set elects another member as primary and continues replication until the primary recovers.

Note: While not recommended, the minimum supported configuration for replica sets includes one primary, one secondary, and one arbiter (page 41). The arbiter requires fewer resources and lowers costs but sacrifices operational flexibility and redundancy.

See Also: Deploy a Replica Set (page 61).

5.3.2 Sets with Four or More Members


To increase redundancy or to provide additional resources for distributing secondary read operations, you can add members to a replica set. When adding members, ensure the following architectural conditions are true:
- The set has an odd number of voting members. If you have an even number of voting members, deploy an arbiter (page 41) to create an odd number.
- The set has no more than 7 voting members at a time.
- Members that cannot function as primaries in a failover have their priority (page 562) values set to 0. If a member cannot function as a primary because of resource or network latency constraints, a priority (page 562) value of 0 prevents it from being a primary. Any member with a priority value greater than 0 is available to be a primary.
- A majority of the set's members operate in the main data center.

See Also: Add Members to a Replica Set (page 64).
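The conditions above that can be checked mechanically (an odd number of voting members, no more than 7 voting members, and at least one member eligible to become primary) can be sketched as a small validation function. This is a minimal illustration in plain JavaScript that runs under Node.js, not part of MongoDB. The function name checkArchitecture and the example hosts are hypothetical; the field names votes, priority, and arbiterOnly follow the replica set configuration document, and the sketch assumes both votes and priority default to 1 when absent.

```javascript
// Hypothetical helper (not a MongoDB API): checks a replica set configuration
// document against the mechanically checkable conditions listed above.
function checkArchitecture(cfg) {
  const problems = [];

  // Assumption: a member votes unless its votes field is explicitly 0.
  const voting = cfg.members.filter(
    (m) => (m.votes === undefined ? 1 : m.votes) > 0
  );
  if (voting.length % 2 === 0) {
    problems.push("even number of voting members: consider adding an arbiter");
  }
  if (voting.length > 7) {
    problems.push("more than 7 voting members");
  }

  // Arbiters hold no data, and priority 0 members can never become primary.
  const electable = cfg.members.filter(
    (m) => !m.arbiterOnly && (m.priority === undefined ? 1 : m.priority) > 0
  );
  if (electable.length === 0) {
    problems.push("no member is eligible to become primary");
  }
  return problems;
}

// Example with hypothetical hosts: four data-bearing members (an even count).
const exampleCfg = {
  _id: "rs0",
  members: [
    { _id: 0, host: "a.example.net:27017" },
    { _id: 1, host: "b.example.net:27017" },
    { _id: 2, host: "c.example.net:27017" },
    { _id: 3, host: "d.example.net:27017", priority: 0 },
  ],
};
console.log(checkArchitecture(exampleCfg));
```

The remaining condition, that a majority of members operate in the main data center, depends on information that the configuration document does not record, so the sketch leaves it out.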


5.3.3 Geographically Distributed Sets


A geographically distributed replica set provides data recovery should one data center fail. These sets include at least one member in a secondary data center. This member has its priority (page 562) set (page 564) to 0 to prevent it from ever becoming primary. In many circumstances, these deployments consist of the following:
- One primary in the first (i.e., primary) data center.
- One secondary member in the primary data center. This member can become the primary member at any time.
- One secondary member in a secondary data center. This member is ineligible to become primary. Set its members[n].priority (page 562) to 0.

If the primary is unavailable, the replica set will elect a new primary from the primary data center. If the connection between the primary and secondary data centers fails, the member in the secondary center cannot independently become the primary. If the primary data center fails, you can manually recover the data set from the secondary data center. With proper write concern there will be no data loss and downtime can be minimal.

When you add a secondary data center, keep an odd number of members overall to prevent ties during elections for primary. For example, if you have three members in the primary data center and add a member in a secondary center, you create an even number. To restore an odd number and prevent ties, deploy an arbiter (page 41) in your primary data center.

See Also: Deploy a Geographically Distributed Replica Set (page 66)
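Setting members[n].priority to 0 for the member in the secondary data center amounts to a small change to the replica set configuration document. The following sketch shows that change as a pure JavaScript function that runs under Node.js. The helper name withPriorityZero and the host names are hypothetical and used only for illustration.

```javascript
// Hypothetical helper: returns a copy of a replica set configuration in which
// the member with the given host has priority 0, so it can never be elected
// primary. The original configuration object is left unchanged.
function withPriorityZero(cfg, host) {
  return Object.assign({}, cfg, {
    members: cfg.members.map((m) =>
      m.host === host ? Object.assign({}, m, { priority: 0 }) : m
    ),
  });
}

// Example configuration with made-up host names: two members in the primary
// data center (dc1) and one in the secondary data center (dc2).
const geoCfg = {
  _id: "rs0",
  members: [
    { _id: 0, host: "dc1-a.example.net:27017" },
    { _id: 1, host: "dc1-b.example.net:27017" },
    { _id: 2, host: "dc2-a.example.net:27017" },
  ],
};

const geoUpdated = withPriorityZero(geoCfg, "dc2-a.example.net:27017");
console.log(JSON.stringify(geoUpdated.members[2]));
```

In the mongo shell the same change is normally made directly on the object returned by rs.conf(), for example by setting cfg.members[2].priority = 0 and then calling rs.reconfig(cfg) while connected to the primary.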

5.3.4 Non-Production Members


In some cases it may be useful to maintain a member that has an always up-to-date copy of the entire data set but that cannot become primary. You might create such a member to provide backups, to support reporting operations, or to act as a cold standby. Such members fall into one or more of the following categories:
- Low-Priority: These members have members[n].priority (page 562) settings such that they are either unable to become primary or very unlikely to become primary. In all other respects these low-priority members are identical to other replica set members. (See: Secondary-Only Members (page 39).)
- Hidden: These members cannot become primary and the set excludes them from the output of db.isMaster() (page 469) and from the output of the database command isMaster. Excluding hidden members from such outputs prevents clients and drivers from using hidden members for secondary reads. (See: Hidden Members (page 40).)
- Voting: This changes the number of votes that a member of the replica set has in elections. In general, use priority to control the outcome of elections, as weighting votes introduces operational complexities and risks. Only modify the number of votes when you need to have more than 7 members in a replica set. (See: Non-Voting Members (page 42).)

Note: All members of a replica set vote in elections except for non-voting (page 42) members. Priority, hidden, or delayed status does not affect a member's ability to vote in an election.


Backups
For some deployments, keeping a replica set member for dedicated backup purposes is operationally advantageous. Ensure this member is close, from a networking perspective, to the primary or likely primary. Ensure that the replication lag is minimal or non-existent. To do this, create a dedicated hidden member (page 40) for the purpose of creating backups. If this member runs with journaling enabled, you can safely use standard block level backup methods (page 157) to create a backup of this member. Otherwise, if your underlying system does not support snapshots, you can connect mongodump to create a backup directly from the secondary member. In these cases, use the --oplog (page 507) option to ensure a consistent point-in-time dump of the database state.

See Also: Backup and Restoration Strategies (page 156).

Delayed Replication
Delayed members are special mongod instances in a replica set that apply operations from the oplog on a delay to provide a running historical snapshot of the data set, or a rolling backup. Typically these members provide protection against human error, such as unintentionally deleted databases and collections or failed application upgrades or migrations. Otherwise, delayed members function identically to secondary members, with the following operational differences: they are not eligible for election to primary and do not receive secondary queries. Delayed members do vote in elections for primary. See Replica Set Delayed Nodes (page 40) for more information about configuring delayed replica set members.

Reporting
Typically hidden members provide a substrate for reporting purposes, because the replica set segregates these instances from the cluster. Since no secondary reads reach hidden members, they receive no traffic beyond what replication requires. While hidden members are not electable as primary, they are still able to vote in elections for primary.
If your operational parameters require this kind of reporting functionality, see Hidden Replica Set Nodes (page 40) and members[n].hidden (page 562) for more information.

Cold Standbys
For some sets, it may not be possible to initialize a new member in a reasonable amount of time. In these situations, it may be useful to maintain a secondary member with an up-to-date copy for the purpose of replacing another member in the replica set. In most cases, these members can be ordinary members of the replica set, but in large sets, with varied hardware availability, or given some patterns of geographical distribution (page 48), you may want to use a member with a different priority, hidden, or voting status. Cold standbys may be valuable when they have a different hardware specification than your primary and hot standby secondary members, or connect via a different network than the main set. In these cases, deploy members with priority equal to 0 to ensure that they will never become primary. These members will vote in elections for primary but will never be eligible for election to primary. Consider likely failover scenarios, such as inter-site network partitions, and ensure there will be members eligible for election as primary and a quorum of voting members in the main facility.

Note: If your set already has 7 members, set the members[n].votes (page 563) value to 0 for these members, so that they won't vote in elections.


See Also: Secondary Only (page 39), and Hidden Nodes (page 40).

5.3.5 Arbiters
Deploy an arbiter when a replica set would otherwise lack a sufficient number of members to elect a primary. While replica sets with 2 members are not recommended for production environments, in these circumstances, and in any replica set with an even number of members, deploy an arbiter. To add an arbiter, while connected to the current primary in the mongo shell, issue the following command:
rs.addArb("[hostname]:[port]")

Because arbiters do not hold a copy of the data, they have minimal resource requirements and do not require dedicated hardware. Do not add an arbiter to a set if you have an odd number of voting members that hold data, to prevent tied elections. See Also: Arbiters (page 41), replSet (page 501), mongod --replSet (page 490), and rs.addArb() (page 476).

5.4 Application Development with Replica Sets


From the perspective of a client application, whether a MongoDB instance is running as a single server (i.e. standalone) or a replica set is transparent. However, replica sets offer some configuration options for write and read operations. [2] This document describes those options and their implications.

5.4.1 Write Concern


When a client sends a write operation to a database server, by default the operation returns without waiting for the operation to succeed or complete. To check if write operations have succeeded, use the getLastError command. getLastError supports the following options that allow you to tailor the level of write concern provided by the command's return or acknowledgment:
- no options. Confirms that the mongod instance received the write operations. When your application receives this response, the mongod instance has committed the write operation to the in-memory representation of the database. This provides a simple and low-latency level of write concern and will allow your application to detect situations where the mongod instance becomes inaccessible or insertion errors caused by duplicate key errors (page 182).
- j or journal option. Confirms the above, and that the mongod instance has written the data to the on-disk journal. This ensures that the data is durable if mongod or the server itself crashes or shuts down unexpectedly.
- fsync option. Deprecated since version 1.8. Do not use the fsync option. Confirms that the mongod has flushed the in-memory representation of the data to the disk. Instead, use the j option to ensure durability.
- w option. Only use with replica sets. Confirms that the write operation has replicated to the specified number of replica set members. You may specify a specific number of servers or specify majority to ensure that the write propagates to a majority of set members. The default value of w is 1.
[2] Shard clusters where the shards are also replica sets provide the same configuration options with regard to write and read operations.


You may combine multiple options, such as j and w, into a single getLastError operation. Many drivers have a safe mode or write concern that automatically issues getLastError after write operations to ensure the operations complete. Safe mode provides confirmation of write operations, but safe writes can take longer to return and are not required in all applications. Consider the following operations:
db.runCommand( { getLastError: 1, w: "majority" } )
db.getLastErrorObj("majority")

These equivalent getLastError operations ensure that write operations return only after a write operation has replicated to a majority of the members of the set.

Note: If you specify a w value greater than the number of available non-arbiter replica set members, the operation will block until those members become available. This could cause the operation to block forever. To specify a timeout threshold for the getLastError operation, use the wtimeout argument.

You can also configure your own default getLastError behavior for the replica set. Use the settings.getLastErrorDefaults (page 563) setting in the replica set configuration (page 561). For instance:
cfg = rs.conf()
cfg.settings.getLastErrorDefaults = {w: "majority", j: true}
rs.reconfig(cfg)

When the new configuration is active, the getLastError operation waits for the write operation to complete on a majority of the set members before returning. Specifying j: true makes getLastError wait for a complete commit of the operations to the journal before returning. The getLastErrorDefaults setting only affects getLastError commands with no other arguments.

Note: Use of inappropriate write concern can lead to rollbacks (page 35) in the case of replica set failover (page 34). Always ensure that your operations have specified the required write concern for your application.
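As an illustration of combining the w, j, and wtimeout options, and of requesting acknowledgment only after every batch of writes during a bulk load, consider the following sketch. It is plain JavaScript that runs under Node.js and does not talk to a database. The helper names getLastErrorCommand and loadInBatches are hypothetical, and the insert and acknowledge callbacks stand in for whatever your driver provides. The option names are those of the getLastError command, with wtimeout given in milliseconds.

```javascript
// Builds a getLastError command document from write concern options.
// Only the options that the caller supplies are included.
function getLastErrorCommand(options) {
  const cmd = { getLastError: 1 };
  if (options.w !== undefined) cmd.w = options.w;
  if (options.j) cmd.j = true;
  if (options.wtimeout !== undefined) cmd.wtimeout = options.wtimeout;
  return cmd;
}

// Sends every document through insert() and calls acknowledge() with a
// getLastError command after each full batch and after the final document.
// Both callbacks are hypothetical hooks supplied by the caller.
function loadInBatches(docs, batchSize, insert, acknowledge) {
  let acknowledgments = 0;
  docs.forEach((doc, i) => {
    insert(doc);
    const isBatchEnd = (i + 1) % batchSize === 0;
    const isLast = i === docs.length - 1;
    if (isBatchEnd || isLast) {
      // Wait for a majority of members, but give up after 5 seconds
      // instead of blocking forever.
      acknowledge(getLastErrorCommand({ w: "majority", wtimeout: 5000 }));
      acknowledgments += 1;
    }
  });
  return acknowledgments;
}

// Dry run: 250 documents in batches of 100, recording the commands that
// would be sent instead of contacting a server.
const sentCommands = [];
const acknowledgmentCount = loadInBatches(
  Array.from({ length: 250 }, (_, i) => ({ _id: i })),
  100,
  () => {},
  (cmd) => sentCommands.push(cmd)
);
console.log(acknowledgmentCount, JSON.stringify(sentCommands[0]));
// prints: 3 {"getLastError":1,"w":"majority","wtimeout":5000}
```

A larger batch size favors throughput, while a smaller one gives secondaries more frequent opportunities to catch up; the right value depends on the deployment.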

5.4.2 Read Preference


Read preference describes how MongoDB clients route read operations to secondary members of a replica set.

Background
By default, an application directs its read operations to the primary member in a replica set. Reading from the primary guarantees that read operations reflect the latest version of a document. However, for an application that does not require fully up-to-date data, you can improve read throughput by distributing some or all reads to secondary members of the replica set. The following are use cases where you might use secondary reads:
- Running systems operations that do not affect the front-end application, such as backups and reports.
- Providing low-latency queries for geographically distributed deployments. If one secondary is closer to an application server than the primary, you may see better performance for that application if you use secondary reads.
- Providing graceful degradation in failover (page 34) situations where a set has no primary for 10 seconds or more. In this use case, you should give the application the primaryPreferred (page 52) read preference, which lets the application continue to perform reads from secondaries while the set has no primary.


MongoDB drivers allow client applications to configure a read preference on a per-connection, per-collection, or per-operation basis. For more information about secondary read operations in the mongo shell, see the rs.slaveOk() (page 479) method. For more information about a driver's read preference configuration, see the appropriate Drivers (page 225) API documentation.

Note: Read preferences affect how an application selects which member to use for read operations. As a result, read preferences dictate whether the application receives stale or current data from MongoDB. Use appropriate write concern (page 50) policies to ensure proper data replication and consistency. If read operations account for a large percentage of your application's traffic, distributing reads to secondary members can improve read throughput. However, in most cases sharding (page 93) provides better support for larger scale operations, as shard clusters can distribute read and write operations across a group of machines.

Read Preference Modes
New in version 2.2.

MongoDB drivers (page 225) support five read preference modes:
- primary (page 52)
- primaryPreferred (page 52)
- secondary (page 53)
- secondaryPreferred (page 53)
- nearest (page 53)

You can specify a read preference mode on a per-collection or per-operation basis. The syntax for specifying the read preference mode is specific to the driver and to the idioms of the host language. Read preference modes are also available to clients connecting to a shard cluster through a mongos. The mongos instance obeys specified read preferences when connecting to the replica set that provides each shard in the cluster. In the mongo shell, the readPref() (page 451) cursor method provides access to read preferences.

Warning: All read preference modes except primary (page 52) may return stale data as secondaries replicate operations from the primary with some delay. Ensure that your application can tolerate stale data if you choose to use a non-primary (page 52) mode.

For more information, see read preference background (page 51) and read preference behavior (page 54). See also the documentation for your driver.

primary
All read operations use only the current replica set primary. This is the default. If the primary is unavailable, read operations produce an error or throw an exception. The primary (page 52) read preference mode is not compatible with read preferences that use tag sets (page 53). If you specify a tag set with primary (page 52), the driver produces an error.

primaryPreferred
In most situations, operations read from the primary member of the set. However, if the primary is unavailable, as is the case during failover situations, operations read from secondary members. When the read preference includes a tag set (page 53), the client reads first from the primary, if available, and then from secondaries that match the specified tags. If no secondaries have matching tags, the read operation produces an error.
Since the application may receive data from a secondary, read operations using the primaryPreferred (page 52) mode may return stale data in some situations.


Warning: Changed in version 2.2: mongos added full support for read preferences. When connecting to a mongos instance older than 2.2, using a client that supports read preference modes, primaryPreferred (page 52) will send queries to secondaries.

secondary
Operations read only from the secondary members of the set. If no secondaries are available, then this read operation produces an error or exception. Most sets have at least one secondary, but there are situations where there may be no available secondary. For example, a set with a primary, a secondary, and an arbiter may not have any secondaries if a member is in recovering state or unavailable. When the read preference includes a tag set (page 53), the client attempts to find secondary members that match the specified tag set and directs reads to a random secondary from among the nearest group (page 55). If no secondaries have matching tags, the read operation produces an error. [3] Read operations using the secondary (page 53) mode may return stale data.

secondaryPreferred
In most situations, operations read from secondary members, but in situations where the set consists of a single primary (and no other members), the read operation will use the set's primary. When the read preference includes a tag set (page 53), the client attempts to find a secondary member that matches the specified tag set and directs reads to a random secondary from among the nearest group (page 55). If no secondaries have matching tags, the read operation produces an error. Read operations using the secondaryPreferred (page 53) mode may return stale data.

nearest
The driver reads from the nearest member of the set according to the member selection (page 55) process. Reads in the nearest (page 53) mode do not consider the member's type and may read from both primaries and secondaries. Set this mode to minimize the effect of network latency on read operations without preference for current or stale data.
If you specify a tag set (page 53), the client attempts to find a member that matches the specified tag set and directs reads to a random member from among the nearest group (page 55). Read operations using the nearest (page 53) mode may return stale data.

Note: All operations read from a member of the nearest group of the replica set that matches the specified read preference mode. The nearest (page 53) mode prefers low latency reads over a member's primary or secondary status. For nearest (page 53), the client assembles a list of acceptable hosts based on tag set and then narrows that list to the host with the shortest ping time and all other members of the set that are within the local threshold, or acceptable latency. See Member Selection (page 55) for more information.
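The way each mode narrows the set of members that may serve a read can be summarized in a few lines of code. The sketch below is plain JavaScript that runs under Node.js; it is a simplified model and not how any particular driver is implemented. The function name candidatesForMode, the member objects, and their state field are assumptions made for illustration, and tag sets and latency are ignored here. An empty result corresponds to the cases above where the read operation produces an error.

```javascript
// Simplified model: returns the members that a read may target under the
// given read preference mode. Members in any state other than PRIMARY or
// SECONDARY (for example arbiters or recovering members) never qualify.
function candidatesForMode(mode, members) {
  const primary = members.filter((m) => m.state === "PRIMARY");
  const secondaries = members.filter((m) => m.state === "SECONDARY");
  switch (mode) {
    case "primary":
      return primary;
    case "primaryPreferred":
      return primary.length > 0 ? primary : secondaries;
    case "secondary":
      return secondaries;
    case "secondaryPreferred":
      return secondaries.length > 0 ? secondaries : primary;
    case "nearest":
      return primary.concat(secondaries);
    default:
      throw new Error("unknown read preference mode: " + mode);
  }
}

// Example: a three-member set during a failover, with no primary available.
const duringFailover = [
  { host: "a.example.net:27017", state: "RECOVERING" },
  { host: "b.example.net:27017", state: "SECONDARY" },
  { host: "c.example.net:27017", state: "SECONDARY" },
];
console.log(candidatesForMode("primary", duringFailover).length);          // 0
console.log(candidatesForMode("primaryPreferred", duringFailover).length); // 2
```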

[3] If your set has more than one secondary, and you use the secondary (page 53) read preference mode, consider the following effect. If you have a three member replica set (page 47) with a primary and two secondaries, and if one secondary becomes unavailable, all secondary (page 53) queries must target the remaining secondary. This will double the load on this secondary. Plan and provide capacity to support this as needed.

Tag Sets
Tag sets allow you to specify custom read preferences (page 51) so that your application can target read operations to specific members, based on custom parameters. A tag set for a read operation may resemble the following:

{ "disk": "ssd", "use": "reporting" }

To fulfill the request, a member would need to have both of these tags. Therefore the following tag sets would satisfy this requirement:
{ "disk": "ssd", "use": "reporting" }
{ "disk": "ssd", "use": "reporting", "rack": 1 }
{ "disk": "ssd", "use": "reporting", "rack": 4 }
{ "disk": "ssd", "use": "reporting", "mem": "64" }

However, the following tag sets would not be able to fulfill this query:
{ "disk": "ssd" }
{ "use": "reporting" }
{ "disk": "ssd", "use": "production" }
{ "disk": "ssd", "use": "production", "rack": 3 }
{ "disk": "spinning", "use": "reporting", "mem": "32" }
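The matching rule illustrated by these examples is that a member satisfies a request when its tags include every tag in the requested set with the same value; extra tags on the member do not matter. A minimal sketch of that rule, in plain JavaScript that runs under Node.js, follows. The function name matchesTagSet is hypothetical, and the sketch assumes tag values are compared with strict equality.

```javascript
// Returns true when memberTags contains every key of requestedTags with an
// identical value. Additional tags on the member are ignored.
function matchesTagSet(memberTags, requestedTags) {
  return Object.keys(requestedTags).every(
    (key) => memberTags[key] === requestedTags[key]
  );
}

const requested = { disk: "ssd", use: "reporting" };

// A member with both tags and an extra "rack" tag satisfies the request.
console.log(matchesTagSet({ disk: "ssd", use: "reporting", rack: 1 }, requested)); // true

// A member missing the "use" tag, or with a different value, does not.
console.log(matchesTagSet({ disk: "ssd" }, requested));                    // false
console.log(matchesTagSet({ disk: "ssd", use: "production" }, requested)); // false
```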

Therefore, tag sets make it possible to ensure that read operations target specific members in a particular data center or mongod instances designated for a particular class of operations, such as reporting or analytics. For information on configuring tag sets, see Tag Sets (page 565) in the Replica Set Configuration (page 561) document. You can specify tag sets with the following read preference modes:
- primaryPreferred (page 52)
- secondary (page 53)
- secondaryPreferred (page 53)
- nearest (page 53)

You cannot specify tag sets with the primary (page 52) read preference mode. Tags are not compatible with primary (page 52) and only apply when selecting (page 55) a secondary member of a set for a read operation. However, the nearest (page 53) read mode, when combined with a tag set, will select the nearest member that matches the specified tag set, which may be a primary or secondary. All interfaces use the same member selection logic (page 55) to choose the member to which to direct read operations, basing the choice on read preference mode and tag sets. For more information on how read preference modes (page 52) interact with tag sets, see the documentation for each read preference mode.

Behavior
Changed in version 2.2.
Auto-Retry

Connections between MongoDB drivers and mongod instances in a replica set must balance two concerns:
1. The client should attempt to prefer current results, and any connection should read from the same member of the replica set as much as possible.
2. The client should minimize the amount of time that the database is inaccessible as the result of a connection issue, networking problem, or failover in a replica set.

As a result, MongoDB drivers and mongos:


- Reuse a connection to a specific mongod for as long as possible after establishing a connection to that instance. This connection is pinned to this mongod.
- Attempt to reconnect to a new member, obeying existing read preference modes (page 52), if the connection to mongod is lost. Reconnections are transparent to the application itself. If the connection permits reads from secondary members, after reconnecting, the application can receive two sequential reads returning from different secondaries. Depending on the state of the individual secondary member's replication, the documents can reflect the state of your database at different moments.
- Return an error only after attempting to connect to three members of the set that match the read preference mode (page 52) and tag set (page 53). If there are fewer than three members of the set, the client will error after connecting to all existing members of the set. After this error, the driver selects a new member using the specified read preference mode. In the absence of a specified read preference, the driver uses PRIMARY.
- After detecting a failover situation, [4] the driver attempts to refresh the state of the replica set as quickly as possible.

Request Association

Reads from secondaries may reflect the state of the data set at different points in time because secondary members of a replica set may lag behind the current state of the primary by different amounts. To prevent subsequent reads from jumping around in time, the driver can associate application threads to a specific member of the set after the first read. The thread will continue to read from the same member until:
- The application performs a read with a different read preference.
- The thread terminates.
- The client receives a socket exception, as is the case when there's a network error or when the mongod closes connections during a failover. This triggers a retry (page 54), which may be transparent to the application.

If an application thread issues a query with the primaryPreferred (page 52) mode while the primary is inaccessible, the thread will carry the association with that secondary for the lifetime of the thread. The thread will associate with the primary, if available, only after issuing a query with a different read preference, even if a primary becomes available. By extension, if a thread issues a read with the secondaryPreferred (page 53) mode when all secondaries are down, it will carry an association with the primary. This application thread will continue to read from the primary even if a secondary becomes available later in the thread's lifetime.
Member Selection

Both clients, by way of their drivers, and mongos instances for shard clusters send periodic ping messages to all members of the replica set to determine latency from the application to each mongod instance. For any operation that targets a member other than the primary, the driver:
1. Assembles a list of suitable members, taking into account member type (i.e. secondary, primary, or all members).
2. Determines which suitable member is the closest to the client in absolute terms.
3. Builds a list of members that are within a defined ping distance (in milliseconds) of the absolute nearest member. [5]
4. Selects a member from these hosts at random. The member receives the read operation.

Once the application selects a member of the set to use for read operations, the driver continues to use this connection for read operations until the application specifies a new read preference or something interrupts the connection. See Request Association (page 55) for more information.

[4] When a failover occurs, all members of the set close all client connections, which produces a socket error in the driver. This behavior prevents or minimizes rollback.
[5] Applications can configure the threshold used in this stage. The default acceptable latency is 15 milliseconds. For mongos you can use the --localThreshold (page 494) or localThreshold (page 503) runtime options to set this value.
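The four member selection steps described above can be sketched in plain JavaScript that runs under Node.js. This is an illustration of the procedure, not the code of any driver. The function name selectMember, the pingMs field, and the injectable random parameter are assumptions made for the example; the default threshold of 15 milliseconds is the default acceptable latency stated above. The sketch also assumes that a member exactly at the threshold still counts as near.

```javascript
// candidates: the suitable members from step 1, each with a measured ping
// time in milliseconds (hypothetical pingMs field).
// thresholdMs: acceptable latency window around the nearest member.
// random: function returning a number in [0, 1); injectable for testing.
function selectMember(candidates, thresholdMs = 15, random = Math.random) {
  if (candidates.length === 0) return null;

  // Step 2: find the ping time of the absolute nearest member.
  const nearestPing = Math.min(...candidates.map((m) => m.pingMs));

  // Step 3: keep the members within the threshold of the nearest member.
  const nearGroup = candidates.filter(
    (m) => m.pingMs - nearestPing <= thresholdMs
  );

  // Step 4: pick one member of that group at random.
  return nearGroup[Math.floor(random() * nearGroup.length)];
}

// Example with made-up ping times: a and b fall within 15 ms of the nearest
// member, while c does not and is never selected.
const pingedMembers = [
  { host: "a.example.net:27017", pingMs: 5 },
  { host: "b.example.net:27017", pingMs: 18 },
  { host: "c.example.net:27017", pingMs: 40 },
];
console.log(selectMember(pingedMembers).host);
```

Choosing at random within the near group, instead of always taking the single closest member, spreads reads across members with comparable latency.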
Sharding and mongos

Changed in version 2.2: Before version 2.2, mongos did not support the read preference mode semantics (page 52).

In most shard clusters, a replica set provides each shard, so read preferences also apply. With regard to read preference, read operations in a shard cluster behave identically to those in unsharded replica sets. Unlike simple replica sets, in shard clusters all interactions with the shards pass from the clients to the mongos instances, which are the processes actually connected to the set members. mongos is responsible for applying the read preferences, which is transparent to applications. No configuration changes are required for full support of read preference modes in sharded environments, as long as the mongos is at least version 2.2. Each mongos maintains its own connection pool to the replica set members. As a result:
- A request without a specified preference uses primary (page 52), the default, unless the mongos reuses an existing connection that has a different mode set. Always explicitly set your read preference mode to prevent confusion.
- All nearest (page 53) and latency calculations reflect the connection between the mongos and the mongod instances, not the client and the mongod instances. This produces the desired result, because all results must pass through the mongos before returning to the client.
Database Commands

Because some database commands read and return data from the database, all of the official drivers support full read preference mode semantics (page 52) for the following commands:
- group
- mapReduce [6]
- aggregate
- collStats
- dbStats
- count
- distinct
- geoNear
- geoSearch
- geoWalk

[6] Only inline mapReduce operations that do not write data support read preference; otherwise these operations must run on the primary member.


Uses for non-Primary Read Preferences
You must exercise care when specifying read preference: modes other than primary (page 52) can and will return stale data. These secondary queries will not include the most recent write operations to the replica set's primary. Nevertheless, there are several common use cases for using non-primary (page 52) read preference modes:
- Reporting and analytics workloads. Having these queries target a secondary helps distribute load and prevents these operations from affecting the main workload of the primary. Also consider using secondary (page 53) in conjunction with a direct connection to a hidden member (page 40) of the set.
- Providing local reads for geographically distributed applications. If you have application servers in multiple data centers, you may consider having a geographically distributed replica set (page 48) and using a non-primary read preference or the nearest (page 53) mode to avoid network latency.

In many cases, providing extra capacity is not in and of itself sufficient justification for using read modes other than primary (page 52) and primaryPreferred (page 52). Furthermore, sharding (page 91) increases read and write capacity by distributing read and write operations across a group of machines.

5.5 Replication Internals


5.5.1 Synopsis
This document provides a more in-depth explanation of the internals and operation of replica set features. This material is not necessary for normal operation or application development but may be useful for troubleshooting and for further understanding MongoDB's behavior and approach.

5.5.2 Oplog
For an explanation of the oplog, see the Oplog (page 36) topic in the Replication Fundamentals (page 33) document. Under various exceptional situations, updates to a secondary's oplog might lag behind the desired performance time. See Replication Lag (page 45) for details. All members of a replica set send heartbeats (pings) to all other members in the set and can import operations to the local oplog from any other member in the set. Replica set oplog operations are idempotent. The following operations require idempotency:
- initial sync
- post-rollback catch-up
- sharding chunk migrations
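To see why idempotency matters for these cases, consider what happens when the same logged operation is applied more than once, as can occur when a member replays part of the oplog. The sketch below is plain JavaScript that runs under Node.js and uses a made-up operation format for illustration only; it is not the actual oplog entry format. It contrasts an operation that records a resulting value with one that records an increment.

```javascript
// Applies a single illustrative operation to a copy of a document.
// The { type, field, value | amount } shape is hypothetical.
function applyOperation(doc, op) {
  const result = Object.assign({}, doc);
  if (op.type === "set") {
    result[op.field] = op.value;
  } else if (op.type === "inc") {
    result[op.field] = (result[op.field] || 0) + op.amount;
  } else {
    throw new Error("unknown operation type: " + op.type);
  }
  return result;
}

const originalDoc = { _id: 1, views: 10 };
const setOp = { type: "set", field: "views", value: 11 };
const incOp = { type: "inc", field: "views", amount: 1 };

// Replaying the "set" form leaves the document unchanged the second time.
const setOnce = applyOperation(originalDoc, setOp);
const setTwice = applyOperation(setOnce, setOp);

// Replaying the "inc" form changes the document again on every application.
const incTwice = applyOperation(applyOperation(originalDoc, incOp), incOp);

console.log(setOnce.views, setTwice.views, incTwice.views);
// prints: 11 11 12
```

A log made only of operations of the first kind can be replayed from any earlier point and still converge on the same state.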

5.5.3 Data Integrity


Read Preferences

MongoDB uses single-master replication to ensure that the database remains consistent. However, clients may modify the read preferences (page 51) on a per-connection basis in order to distribute read operations to the secondary members of a replica set. Read-heavy deployments may achieve greater query throughput by distributing reads to secondary members. But keep in mind that replication is asynchronous; therefore, reads from secondaries may not always reflect the latest writes to the primary.

See Also: Consistency (page 34)

Note: Use db.getReplicationInfo() (page 469) from a secondary member and the replication status (page 566) output to assess the current state of replication and determine if there is any unintended replication delay.
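One way to reason about replication delay is to compare the last applied operation time of each secondary with that of the primary. The sketch below is plain JavaScript operating on a document shaped like rs.status() output. The function name replicationLagSeconds and the sample data are hypothetical, and the field names used (members, name, stateStr, optimeDate) are assumed to match your version's status output.

```javascript
// Computes how many seconds each secondary trails the primary, given a
// document shaped like rs.status() output (members[].stateStr, members[].optimeDate).
function replicationLagSeconds(status) {
  var primary = null;
  status.members.forEach(function (m) {
    if (m.stateStr === "PRIMARY") primary = m;
  });
  if (primary === null) return null; // no primary visible, lag is undefined

  var lag = {};
  status.members.forEach(function (m) {
    if (m.stateStr === "SECONDARY") {
      lag[m.name] = (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
    }
  });
  return lag;
}

// Hypothetical sample data standing in for real status output.
var sampleStatus = {
  set: "rs0",
  members: [
    { name: "mongodb0.example.net:27017", stateStr: "PRIMARY", optimeDate: new Date("2012-09-13T12:00:30Z") },
    { name: "mongodb1.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2012-09-13T12:00:28Z") },
    { name: "mongodb2.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2012-09-13T12:00:00Z") }
  ]
};

var lag = replicationLagSeconds(sampleStatus);
```

In the mongo shell the same function could be called as replicationLagSeconds(rs.status()). A large value for one member suggests unintended delay, unless that member is deliberately configured as delayed.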

5.5.4 Member Configurations


Replica sets can include members with the following four special configurations that affect membership behavior:

- Secondary-only (page 39) members have their priority (page 562) values set to 0 and thus are not eligible for election as primaries.
- Hidden (page 40) members do not appear in the output of db.isMaster() (page 469). This prevents clients from discovering and potentially querying the member in question.
- Delayed (page 40) members lag a fixed period of time behind the primary. These members are typically used for disaster recovery scenarios. For example, if an administrator mistakenly truncates a collection, and you discover the mistake within the lag window, then you can manually fail over to the delayed member.
- Arbiters (page 41) exist solely to participate in elections. They do not replicate data from the primary.

In almost every case, replica sets simplify the process of administering database replication. However, replica sets still have a unique set of administrative requirements and concerns. Choosing the right system architecture (page 47) for your data set is crucial.

See Also: The Member Configurations (page 39) topic in the Replica Set Administration (page 38) document.
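The four special configurations correspond to fields in each member's entry in the replica set configuration document. As a rough sketch, the plain JavaScript below labels member documents according to those fields. The function name describeMember and the sample hostnames are hypothetical; the field names (priority, hidden, slaveDelay, arbiterOnly) are those of the replica set configuration, but verify them against the configuration reference (page 561) for your version.

```javascript
// Labels a member document from a replica set configuration with the special
// configurations it uses.
function describeMember(member) {
  var labels = [];
  if (member.arbiterOnly === true) labels.push("arbiter");
  if (member.priority === 0 && member.arbiterOnly !== true) labels.push("secondary-only");
  if (member.hidden === true) labels.push("hidden");
  if (typeof member.slaveDelay === "number" && member.slaveDelay > 0) labels.push("delayed");
  return labels;
}

// Hypothetical members; hostnames are placeholders.
var exampleMembers = [
  { _id: 0, host: "mongodb0.example.net:27017" },
  { _id: 1, host: "mongodb1.example.net:27017", priority: 0 },
  { _id: 2, host: "mongodb2.example.net:27017", priority: 0, hidden: true, slaveDelay: 3600 },
  { _id: 3, host: "mongodb3.example.net:27017", arbiterOnly: true }
];

var described = exampleMembers.map(describeMember);
```

Note that a single member can combine several of these configurations: the third member in the sample is secondary-only, hidden, and delayed at the same time.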

5.5.5 Security
Administrators of replica sets also have unique monitoring (page 151) and security (page 44) concerns. The replica set functions in the mongo shell provide the tools necessary for replica set administration. In particular, use rs.conf() (page 477) to return a document that holds the replica set configuration (page 561) and use rs.reconfig() (page 478) to modify the configuration of an existing replica set.

5.5.6 Elections
Elections are the process replica set members use to select which member should become primary. A primary is the only member in the replica set that can accept write operations, including insert() (page 460), update() (page 464), and remove() (page 462).

The following events can trigger an election:

- You initialize a replica set for the first time.
- A primary steps down. A primary will step down in response to the replSetStepDown (page 441) command or if it sees that one of the current secondaries is eligible for election and has a higher priority. A primary also will step down when it cannot contact a majority of the members of the replica set. When the current primary steps down, it closes all open client connections to prevent clients from unknowingly writing data to a non-primary member.
- A secondary member loses contact with a primary. A secondary will call for an election if it cannot establish a connection to a primary.
- A failover occurs.

In an election, all members have one vote, including hidden (page 40) members, arbiters (page 41), and even recovering members. Any mongod can veto an election.

In the default configuration, all members have an equal chance of becoming primary; however, it's possible to set priority (page 562) values that weight the election. In some architectures, there may be operational reasons for increasing the likelihood of a specific replica set member becoming primary. For instance, a member located in a remote data center should not become primary. See: Member Priority (page 34) for more information.

Any member of a replica set can veto an election, even if the member is a non-voting member (page 42). A member of the set will veto an election under the following conditions:

- If the member seeking an election is not a member of the voter's set.
- If the member seeking an election is not up-to-date with the most recent operation accessible in the replica set.
- If the member seeking an election has a lower priority than another member in the set that is also eligible for election.
- If the current primary member has more recent operations (i.e. a higher optime) than the member seeking election, from the perspective of the voting member.
- The current primary will veto an election if it has the same or more recent operations (i.e. a higher or equal optime) than the member seeking election.

The first member to receive votes from a majority of members in a set becomes the next primary until the next election. Be aware of the following conditions and possible situations:

- Replica set members send heartbeats (pings) to each other every 2 seconds. If a heartbeat does not return for more than 10 seconds, the other members mark the delinquent member as inaccessible.
- Replica set members compare priorities only with other members of the set. The absolute value of priorities does not have any impact on the outcome of replica set elections, with the exception of the value 0, which indicates the member cannot become primary and cannot seek election. For details, see Adjusting Priority (page 43).
- A replica set member cannot become primary unless it has the highest optime of any visible member in the set.
- If the member of the set with the highest priority is within 10 seconds of the latest oplog entry, then the set will not elect a primary until the member with the highest priority catches up to the latest operation.

See Also: Non-voting members in a replica set (page 42), Adjusting Priority (page 43), and replica configuration (page 563).

Elections and Network Partitions

Members on either side of a network partition cannot see each other when determining whether a majority is available to hold an election. That means that if a primary steps down and neither side of the partition has a majority on its own, the set will not elect a new primary and the set will become read only. The best practice is to have a majority of servers in one data center and one server in another.
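The majority rule can be made concrete with a little arithmetic. The following plain JavaScript sketch computes the number of votes needed for a majority and checks which side of a partition, if any, can still elect a primary. The function names majority and sideWithMajority are hypothetical, and the sketch assumes every member holds exactly one vote.

```javascript
// Smallest number of votes that is a strict majority of totalVotes.
function majority(totalVotes) {
  return Math.floor(totalVotes / 2) + 1;
}

// Given the number of votes reachable on each side of a partition, returns the
// index of the side that can still elect a primary, or -1 if no side can.
function sideWithMajority(votesPerSide) {
  var total = votesPerSide.reduce(function (sum, v) { return sum + v; }, 0);
  var needed = majority(total);
  for (var i = 0; i < votesPerSide.length; i++) {
    if (votesPerSide[i] >= needed) return i;
  }
  return -1;
}

var threeTwoSplit = sideWithMajority([3, 2]); // five-member set split 3/2
var evenSplit = sideWithMajority([2, 2]);     // four-member set split 2/2
```

With a 3/2 split the first side retains a majority; with a 2/2 split neither side does, which is the read-only situation described above and the reason for keeping a majority of members in one data center.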


5.5.7 Syncing
In order to remain up-to-date with the current state of the replica set, set members sync, or copy, oplog entries from other members. When a new member joins a set or an existing member restarts, the member waits to receive heartbeats from other members. By default, the member syncs from the closest member of the set that is either the primary or another secondary with more recent oplog entries. This prevents two secondaries from syncing from each other. In version 2.0, secondaries only change sync targets if the connection between secondaries drops or produces an error. For example:

1. If you have two secondary members in one data center and a primary in a second facility, and if you start all three instances at roughly the same time (i.e. with no existing data sets or oplog), both secondaries will likely sync from the primary, as neither secondary has more recent oplog entries. If you restart one of the secondaries, then when it rejoins the set it will likely begin syncing from the other secondary, because of proximity.
2. If you have a primary in one facility and a secondary in an alternate facility, and if you add another secondary to the alternate facility, the new secondary will likely sync from the existing secondary because it is closer than the primary.
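The selection rule described above can be sketched as a small function. This is an illustration only, not mongod's implementation: the function name chooseSyncTarget, the record fields (isPrimary, optime, pingMs), and the choice to model "closest" as the lowest ping time are all assumptions made for the example.

```javascript
// Picks a sync target for `self` from `candidates`. A candidate qualifies if it
// is the primary or has a more recent optime than `self`; among those, the one
// with the lowest pingMs wins. "Closest" is modelled as lowest ping time, which
// is an assumption of this sketch.
function chooseSyncTarget(self, candidates) {
  var best = null;
  candidates.forEach(function (c) {
    var eligible = c.isPrimary || c.optime > self.optime;
    if (eligible && (best === null || c.pingMs < best.pingMs)) best = c;
  });
  return best === null ? null : best.name;
}

// Example 2 from the text: the new secondary shares a facility with an
// existing secondary, while the primary is in another facility.
var newMember = { name: "new-secondary", optime: 0 };
var target = chooseSyncTarget(newMember, [
  { name: "primary-other-facility", isPrimary: true, optime: 120, pingMs: 40 },
  { name: "secondary-same-facility", isPrimary: false, optime: 118, pingMs: 1 }
]);
```

Requiring a more recent optime for non-primary candidates is what keeps two equally current secondaries from choosing each other in this model.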


CHAPTER

SIX

TUTORIALS
The following tutorials describe certain replica set maintenance operations in detail:

6.1 Deploy a Replica Set


This tutorial describes the process for deploying a three-member replica set, with specific instructions for development and test as well as for production systems.

See Also: Replication Fundamentals (page 33) and Replication Architectures (page 47) for appropriate background.

6.1.1 Overview
Three member replica sets provide enough redundancy to survive most network partitions and other system failures. Additionally, these sets have sufficient capacity for many distributed read operations. Most deployments require no additional nodes or configuration.

6.1.2 Requirements
Three distinct systems, so that each system can run its own instance of mongod. For development systems you may run all three instances of the mongod process on a local system (e.g. a laptop) or within a virtual instance. For production environments, you should endeavor to maintain as much separation between the nodes as possible. For example, when using VMs in production, each node should live on separate host servers, served by redundant power circuits, and with redundant network paths.

6.1.3 Procedure
This procedure assumes that you already have instances of MongoDB installed on all systems that you hope to have as members of your replica set. If you have not already installed MongoDB, see the installation tutorials (page 9).

Development and Test Replica Set

Begin by starting three instances of mongod. For ephemeral tests and the purposes of this guide, you may run the mongod instances in separate windows of GNU Screen. OS X and most Linux distributions come with screen installed by default [1].

[1] GNU Screen is packaged as screen on Debian-based, Fedora/Red Hat-based, and Arch Linux systems.


Issue the following command to create the necessary data directories:


mkdir -p /srv/mongodb/rs0-0 /srv/mongodb/rs0-1 /srv/mongodb/rs0-2

Issue the following commands, each in a distinct screen window:


mongod --port 27017 --dbpath /srv/mongodb/rs0-0 --replSet rs0
mongod --port 27018 --dbpath /srv/mongodb/rs0-1 --replSet rs0
mongod --port 27019 --dbpath /srv/mongodb/rs0-2 --replSet rs0

These commands start members of a replica set named rs0, each running on a distinct port. If you are already using these ports, you can select different ports. See the documentation of the following options for more information: --port (page 485), --dbpath (page 487), and --replSet (page 490).

Connect to the first mongod instance with the mongo shell. If you're running this command remotely, replace localhost with the appropriate hostname. In a new shell session, enter the following:
mongo localhost:27017

Issue the following shell function to initiate a replica set consisting of the current node, using the default configuration:
rs.initiate()

Use the following shell function to display the current replica configuration (page 561):
rs.conf()

Now, issue the following sequence of commands to add two nodes to the replica set.
rs.add("localhost:27018")
rs.add("localhost:27019")

Congratulations, after these commands return you will have a fully functional replica set. The new replica set should successfully elect a primary node within a few seconds. Check the status of your replica set at any time with the rs.status() (page 479) operation. See the documentation of the following shell functions for more information: rs.initiate() (page 477), rs.conf() (page 477), rs.reconfig() (page 478) and rs.add() (page 476).

See Also: You may also consider the simple setup script as an example of a basic automatically configured replica set.

Production Replica Set

Production replica sets are very similar to the development or testing deployment described above, with the following differences:

- Each member of the replica set will reside on its own machine, and the MongoDB processes will all bind to port 27017, which is the standard MongoDB port.
- Specify all run-time configuration in configuration files (page 494) rather than as command line options (page 485).
- Each member of the replica set needs to be accessible by way of resolvable DNS or hostnames in the following scheme:

  mongodb0.example.net
  mongodb1.example.net
  mongodb2.example.net

Configure DNS names appropriately, or set up your system's /etc/hosts file to reflect this configuration. Use the following configuration for each MongoDB instance.
port = 27017
bind_ip = 10.8.0.10
dbpath = /srv/mongodb/
fork = true
replSet = rs0

You do not need to specify an interface with bind_ip (page 495). However, if you do not specify an interface, MongoDB will listen for connections on all available IPv4 interfaces. Modify bind_ip (page 495) to reflect a secure interface on your system that will be able to access all other members of the set and on which all other members of the replica set can access the current node. The DNS or host names need to point and resolve to this IP address. Configure network rules or a virtual private network (i.e. VPN) to permit this access.

For more documentation of the run time options used above, see: dbpath (page 497), port (page 495), replSet (page 501), bind_ip (page 495), and fork (page 497). Also consider any additional configuration options (page 494) that your deployment may require.

Store the configuration file on each system, at /etc/mongodb.conf or a related location on the file system. On each system issue the following command to start the mongod process:
mongod --config /etc/mongodb.conf

Note: In production deployments you likely want to use and configure a control script to manage this process based on this command. Control scripts are beyond the scope of this document.

Log in with the mongo shell to this host using the following command:
mongo

Issue the following shell function to initiate a replica set consisting of the current node, using the default configuration:
rs.initiate()

Use the following shell function to display the current replica configuration (page 561):
rs.config()

Now, issue the following sequence of commands to add two nodes to the replica set.
rs.add("mongodb1.example.net")
rs.add("mongodb2.example.net")

Congratulations, after these commands return you will have a fully functional replica set. New replica sets will elect a primary within a few moments.

See Also: The documentation of the following shell functions for more information: rs.initiate() (page 477), rs.conf() (page 477), rs.reconfig() (page 478), and rs.add() (page 476).
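After deploying a set, one way to confirm that it is healthy is to inspect the rs.status() output for exactly one primary and no unhealthy members. The following plain JavaScript is a minimal sketch of that check. The function name summarizeStatus and the sample data are hypothetical, and the field names used (members, name, health, stateStr) are assumed to match your version's status output.

```javascript
// Summarizes a document shaped like rs.status() output: which member is
// primary, and which members report health other than 1.
function summarizeStatus(status) {
  var summary = { primary: null, unhealthy: [] };
  status.members.forEach(function (m) {
    if (m.stateStr === "PRIMARY") summary.primary = m.name;
    if (m.health !== 1) summary.unhealthy.push(m.name);
  });
  return summary;
}

// Hypothetical status for the development set described in this tutorial.
var devStatus = {
  set: "rs0",
  members: [
    { _id: 0, name: "localhost:27017", health: 1, stateStr: "PRIMARY" },
    { _id: 1, name: "localhost:27018", health: 1, stateStr: "SECONDARY" },
    { _id: 2, name: "localhost:27019", health: 0, stateStr: "(not reachable/healthy)" }
  ]
};

var summary = summarizeStatus(devStatus);
```

In the mongo shell the function could be called as summarizeStatus(rs.status()). A null primary shortly after rs.initiate() may simply mean the election has not finished yet.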

6.2 Add Members to a Replica Set


6.2.1 Overview
This tutorial explains how to add an additional member to an existing replica set. Before adding a new member, see the Adding Members (page 42) topic in the Replica Set Administration (page 38) document. For background on replication deployment patterns, see the Replication Architectures (page 47) document.

6.2.2 Requirements
1. An active replica set.
2. A new MongoDB system capable of supporting your dataset, accessible by the active replica set through the network.

If neither of these conditions is satisfied, please use the MongoDB installation tutorial (page 9) and the Deploy a Replica Set (page 61) guide instead.

6.2.3 Procedures
The examples in this procedure use the following configuration:

- The active replica set is rs0.
- The new member to be added is mongodb3.example.net.
- The mongod instance default port is 27017.
- The mongodb.conf configuration file exists in the /etc directory and contains the following replica set information:
port = 27017
bind_ip = 10.8.0.10
dbpath = /srv/mongodb/
logpath = /var/log/mongodb.log
fork = true
replSet = rs0

For more information on configuration options, see Configuration File Options (page 494).

Add a Member to an Existing Replica Set

This procedure uses the above example configuration (page 64).

1. Deploy a new mongod instance, specifying the name of the replica set. You can do this one of two ways:


Using the mongodb.conf file. On the new system, issue a command that resembles the following:
mongod --config /etc/mongodb.conf

Using command line arguments. On the new system, issue a command that resembles the following:
mongod --replSet rs0

Take note of the host name and port information for the new mongod instance.

2. Open a mongo shell connected to the replica set's primary:
mongo

Note: The primary is the only member that can add or remove members from the replica set. If you do not know which member is the primary, log into any member of the replica set using mongo and issue the db.isMaster() (page 469) command to determine which member is in the isMaster.primary field. For example:
mongo mongodb0.example.net
db.isMaster()

If you are not connected to the primary, disconnect from the current client and reconnect to the primary.

3. In the mongo shell, issue the following command to add the new member to the replica set.
rs.add("mongodb3.example.net")

Note: You can also include the port number, depending on your setup:
rs.add("mongodb3.example.net:27017")

4. Verify that the member is now part of the replica set by calling the rs.config() method, which displays the replica set configuration (page 561):
rs.config()

You can use the rs.status() (page 479) function to provide an overview of replica set status (page 558).

Add a Member to an Existing Replica Set (Alternate Procedure)

Alternately, you can add a member to a replica set by specifying an entire configuration document with some or all of the fields in a members (page 561) document. For example:
rs.add({_id: 1, host: "mongodb3.example.net:27017", priority: 0, hidden: true})

This configures a hidden member that is accessible at mongodb3.example.net:27017. See host (page 561), priority (page 562), and hidden (page 562) for more information about these settings. When you specify a full configuration object with rs.add() (page 476), you must declare the _id field, which is not automatically populated in this case.
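Two details of these procedures lend themselves to a small sketch: confirming that the shell is connected to the primary, and choosing an _id that does not collide with an existing member when passing a full document to rs.add(). The plain JavaScript below is an illustration under stated assumptions. The function names connectedToPrimary and newMemberDocument and the sample data are hypothetical, and the isMaster fields shown (ismaster, secondary, primary) are assumed to match your version's output.

```javascript
// Returns true when a document shaped like db.isMaster() output indicates
// that the shell is connected to the primary.
function connectedToPrimary(isMasterDoc) {
  return isMasterDoc.ismaster === true;
}

// Builds a member document for rs.add(), choosing an _id one greater than the
// largest _id already present in a configuration shaped like rs.conf() output.
function newMemberDocument(cfg, host, options) {
  var maxId = -1;
  cfg.members.forEach(function (m) {
    if (m._id > maxId) maxId = m._id;
  });
  var member = { _id: maxId + 1, host: host };
  for (var key in options) {
    if (Object.prototype.hasOwnProperty.call(options, key)) member[key] = options[key];
  }
  return member;
}

// Hypothetical stand-ins for db.isMaster() and rs.conf() output.
var sampleIsMaster = { setName: "rs0", ismaster: false, secondary: true, primary: "mongodb0.example.net:27017" };
var sampleConf = {
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" },
    { _id: 1, host: "mongodb1.example.net:27017" },
    { _id: 2, host: "mongodb2.example.net:27017" }
  ]
};

var onPrimary = connectedToPrimary(sampleIsMaster);
var hiddenMember = newMemberDocument(sampleConf, "mongodb3.example.net:27017", { priority: 0, hidden: true });
```

In the mongo shell, the resulting document could be passed to rs.add(hiddenMember) once connectedToPrimary(db.isMaster()) returns true. In the sample, the shell is connected to a secondary, so the primary field shows where to reconnect.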

6.2.4 Production Notes


- In production deployments you likely want to use and configure a control script to manage this process based on this command.
- A member can be removed from a set and re-added later. If the removed member's data is still relatively fresh, it can recover and catch up from its old data set. See the rs.add() (page 476) and rs.remove() (page 478) helpers.
- If you have a backup or snapshot of an existing member, you can move the data files (i.e. /data/db or dbpath (page 497)) to a new system and use them to quickly initiate a new member. These files must be:
  - clean: the existing dataset must be from a consistent copy of the database from a member of the same replica set. See the Backup and Restoration Strategies (page 156) document for more information.
  - recent: the copy must be more recent than the oldest operation in the primary member's oplog. The new secondary must be able to become current using operations from the primary's oplog.
- There is a maximum of seven voting members (page 58) in any replica set. When adding more members to a replica set that already has seven votes, you must either:
  - add the new member as a non-voting member (page 42), or
  - remove votes from an existing member (page 563).
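The seven-vote limit can be checked before adding a member. The sketch below, in plain JavaScript, counts the votes in a configuration shaped like rs.conf() output and marks a new member as non-voting when the limit has been reached. The function names countVotes and withVoteLimit and the sample hostnames are hypothetical; the sketch assumes that a member without an explicit votes field holds one vote.

```javascript
// Counts votes in a configuration shaped like rs.conf() output. A member
// without an explicit votes field is counted as having one vote.
function countVotes(cfg) {
  return cfg.members.reduce(function (sum, m) {
    return sum + (typeof m.votes === "number" ? m.votes : 1);
  }, 0);
}

// Returns a copy of the member document to add, marked non-voting when the
// set already holds the maximum of seven votes described in the text.
function withVoteLimit(cfg, member) {
  var MAX_VOTES = 7;
  var copy = {};
  for (var key in member) {
    if (Object.prototype.hasOwnProperty.call(member, key)) copy[key] = member[key];
  }
  if (countVotes(cfg) >= MAX_VOTES) copy.votes = 0;
  return copy;
}

// Hypothetical seven-member configuration; hostnames are placeholders.
var fullConf = { _id: "rs0", members: [] };
for (var i = 0; i < 7; i++) {
  fullConf.members.push({ _id: i, host: "mongodb" + i + ".example.net:27017" });
}

var eighth = withVoteLimit(fullConf, { _id: 7, host: "mongodb7.example.net:27017" });
```

This covers only the first of the two options above. Removing a vote from an existing member instead requires a reconfiguration with rs.reconfig() (page 478).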

6.3 Deploy a Geographically Distributed Replica Set


This document describes the procedure for deploying a replica set with members in multiple locations, and addresses three-member replica sets, four-member replica sets, and replica sets with more than four members.

See Also: Replication Fundamentals (page 33) and Replication Architectures (page 47) for appropriate background. The Deploy a Replica Set (page 61) and Add Members to a Replica Set (page 64) tutorials provide documentation of related operations.

6.3.1 Overview
While replica sets provide basic protection against single-instance failure, when all of the members of a replica set reside within a single facility, the replica set is still susceptible to some classes of errors within that facility including power outages, networking distortions, and natural disasters. To protect against these classes of failures, deploy a replica set with one or more members in a geographically distinct facility or data center.

6.3.2 Requirements
For a three-member replica set you will need two instances in a primary facility (hereafter, Site A) and one member in a secondary facility (hereafter, Site B). Site A should be the same facility or very close to your primary application infrastructure (i.e. application servers, caching layer, users, etc.).

For a four-member replica set you will need two systems within Site A, two members in Site B (or one member in Site B, and one member in Site C), and a single arbiter in Site A.

If you wish to deploy additional members in the secondary facility or multiple secondary facilities, the requirements are the same with the following notes:

- Ensure that a majority of the total number of voting nodes (page 42) are within Site A. This includes secondary-only members (page 39) and arbiters (page 41).
- If you deploy a replica set with an even number of members, deploy an arbiter (page 41) within Site A.


6.3.3 Procedure
Although you may deploy more than one replica set member on a single system, this configuration reduces the redundancy and capacity of the replica set. Such deployments are typically for testing purposes and beyond the scope of this tutorial.

Three Member Replica Set

Consider the following features of this deployment:

- Each member of the replica set, except for the arbiter (see below), will reside on its own machine, and the MongoDB processes will all bind to port 27017, the standard MongoDB port.
- Configuration files (page 494) provide runtime configuration rather than command line options (page 485).
- Each member of the replica set needs to be accessible by way of resolvable DNS or hostnames in the following scheme:

  mongodb0.example.net
  mongodb1.example.net
  mongodb2.example.net

  Configure DNS names appropriately, or set up your system's /etc/hosts file to reflect this configuration. Ensure that one system (e.g. mongodb2.example.net) resides in Site B. Host all other systems in Site A.
- Ensure that network traffic can pass between all members in the network securely and efficiently. Consider the following:
  - Establish a virtual private network between the systems in Site A and Site B (and Site C if it exists) to encrypt all traffic between the sites so that it remains private. Ensure that your network topology routes all traffic between members within a single site over the local area network.
  - Configure authentication using auth (page 497) and keyFile (page 496), so that only servers and processes with authentication can connect to the replica set.
  - Configure networking and firewall rules to permit only traffic (incoming and outgoing packets) on the default MongoDB port (e.g. 27017) from within your deployment.

See Also: The Security (page 44) section for more information regarding security and firewalls.

Use the following configuration for each MongoDB instance:
port = 27017
bind_ip = 10.8.0.10
dbpath = /srv/mongodb/
fork = true
replSet = rs0/mongodb0.example.net,mongodb1.example.net,mongodb2.example.net

Modify the bind_ip (page 495) to reflect a secure interface on your system that will be able to access all other members of the set and that is accessible to all other members of the replica set. The DNS or host names need to point and resolve to this IP address. Configure network rules or a virtual private network (i.e. VPN) to permit this access.


Note: The portion of the replSet (page 501) following the / provides a seed list of known members of the replica set. mongod uses this list to fetch configuration changes following restarts. It is acceptable to omit this section entirely, and have the replSet (page 501) option resemble:
replSet = rs0

Store this file on each system, located at /etc/mongodb.conf on the file system. See the documentation of the configuration options used above: dbpath (page 497), port (page 495), replSet (page 501), bind_ip (page 495), and fork (page 497). Also consider any additional configuration options (page 494) that your deployment requires.

On each system issue the following command to start the mongod process:
mongod --config /etc/mongodb.conf

Note: In production deployments you likely want to use and configure a control script to manage this process based on this command. Control scripts are beyond the scope of this document.

Log in with the mongo shell to this host using the mongo command at the system prompt. Call the following shell helper to initiate a replica set consisting of the current instance, using the default configuration:
rs.initiate()

Use the following shell function to display the current replica configuration (page 561):
rs.config()

Now, issue the following sequence of commands to add the remaining members to the replica set. The following example assumes that the current primary is mongodb0.example.net.
rs.add("mongodb1.example.net")
rs.add("mongodb2.example.net")

Make sure that you have configured the member located in Site B (i.e. mongodb2.example.net) as a secondary-only member (page 39). First, issue the following command to determine the members[n]._id (page 561) value for mongodb2.example.net:
rs.config()

In the member array (page 561) for this host, note the members[n]._id (page 561) value and the member's position in the array. The next example assumes that both are 2. Next, in the shell connected to the replica set's primary, issue the following command sequence:
cfg = rs.conf()
cfg.members[2].priority = 0
rs.reconfig(cfg)
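The command sequence above indexes cfg.members by array position, which is not necessarily the same as the member's _id (for example, after members have been removed and re-added). As a hedged sketch, the plain JavaScript below looks up the position by hostname instead. The function name memberIndexByHost and the sample configuration are hypothetical; in the mongo shell, cfg would come from rs.conf().

```javascript
// Returns the position in cfg.members of the member whose host matches, or -1.
// A bare hostname also matches an entry that carries the default port 27017.
function memberIndexByHost(cfg, host) {
  for (var i = 0; i < cfg.members.length; i++) {
    var h = cfg.members[i].host;
    if (h === host || h === host + ":27017") return i;
  }
  return -1;
}

// Hypothetical configuration in which _id values and array positions differ.
var cfg = {
  _id: "rs0",
  version: 4,
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" },
    { _id: 3, host: "mongodb1.example.net:27017" },
    { _id: 4, host: "mongodb2.example.net:27017" }
  ]
};

var idx = memberIndexByHost(cfg, "mongodb2.example.net");
cfg.members[idx].priority = 0; // same kind of assignment as in the procedure
```

In a live shell the modified document would then be applied with rs.reconfig(cfg), subject to the step-down behavior described in the note that follows.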

Note: The rs.reconfig() (page 478) shell command can force the current primary to step down and cause an election in some situations. When the primary steps down, all clients will disconnect. This is the intended behavior. While this typically takes 10-20 seconds, attempt to make these changes during scheduled maintenance periods.

Congratulations! You have now deployed a geographically distributed three-member replica set.


Four Member Replica Set

Consider the following features of this deployment:

- Each member of the replica set, except for the arbiter (see below), will reside on its own machine, and the MongoDB processes will all bind to port 27017, the standard MongoDB port.
- Configuration files (page 494) provide runtime configuration rather than command line options (page 485).
- Each member of the replica set needs to be accessible by way of resolvable DNS or hostnames in the following scheme:

  mongodb0.example.net
  mongodb1.example.net
  mongodb2.example.net
  mongodb3.example.net
  mongodb4.example.net

  Configure DNS names appropriately, or set up your system's /etc/hosts file to reflect this configuration. Ensure that one system (e.g. mongodb2.example.net) resides in Site B. Host all other systems in Site A.
- One host (e.g. mongodb4.example.net) will be an arbiter and can run on a system that is also used for an application server or some other shared purpose.
- There are three possible architectures for this replica set:
  - Two members in Site A, two secondary-only members (page 39) in Site B, and an arbiter in Site A.
  - Three members in Site A and one secondary-only member (page 39) in Site B.
  - Two members in Site A, one secondary-only member (page 39) in Site B, one secondary-only member (page 39) in Site C, and an arbiter in Site A.

  In most cases the first architecture is preferable because it is the least complex.
- Ensure that network traffic can pass between all members in the network securely and efficiently. Consider the following:
  - Establish a virtual private network between the systems in Site A and Site B (and Site C if it exists) to encrypt all traffic between the sites so that it remains private. Ensure that your network topology routes all traffic between members within a single site over the local area network.
  - Configure authentication using auth (page 497) and keyFile (page 496), so that only servers and processes with authentication can connect to the replica set.
  - Configure networking and firewall rules to permit only traffic (incoming and outgoing packets) on the default MongoDB port (e.g. 27017) from within your deployment.

See Also: The Security (page 44) section for more information regarding security practices with replica sets.

Use the following configuration for each MongoDB instance:
port = 27017
bind_ip = 10.8.0.10
dbpath = /srv/mongodb/
fork = true
replSet = rs0/mongodb0.example.net,mongodb1.example.net,mongodb2.example.net,mongodb3.example.net

Modify the bind_ip (page 495) to reflect a secure interface on your system that will be able to access all other members of the set and that is accessible to all other members of the replica set. The DNS or host names need to point and resolve to this IP address. Configure network rules or a virtual private network (i.e. VPN) to permit this access.

Note: The portion of the replSet (page 501) following the / provides a seed list of known members of the replica set. mongod uses this list to fetch configuration changes following restarts. It is acceptable to omit this section entirely, and have the replSet (page 501) option resemble:
replSet = rs0

Store this file on each system, located at /etc/mongodb.conf on the file system. See the documentation of the configuration options used above: dbpath (page 497), port (page 495), replSet (page 501), bind_ip (page 495), and fork (page 497). Also consider any additional configuration options (page 494) that your deployment requires.

On each system issue the following command to start the mongod process:
mongod --config /etc/mongodb.conf

Note: In production deployments you likely want to use and configure a control script to manage this process based on this command. Control scripts are beyond the scope of this document.

Log in with the mongo shell to this host using the mongo command at the system prompt. Call the following shell helper to initiate a replica set consisting of the current instance using the default configuration:
rs.initiate()

Use the following shell function to display the current replica configuration (page 561):
rs.config()

Now, issue the following sequence of commands to add the remaining instances to the replica set. The following example assumes that the current primary is mongodb0.example.net.
rs.add("mongodb1.example.net")
rs.add("mongodb2.example.net")
rs.add("mongodb3.example.net")

In the same shell session, issue the following command to add the arbiter (i.e. mongodb4.example.net):
rs.addArb("mongodb4.example.net")

Make sure that you have configured the member located in Site B (i.e. mongodb2.example.net) as a secondary-only member (page 39). First, issue the following command to determine the members[n]._id (page 561) value for mongodb2.example.net:
rs.config()

In the member array (page 561) for this host, note the members[n]._id (page 561) value and the member's position in the array. The next example assumes that both are 2. Next, in the shell connected to the replica set's primary, issue the following command sequence:
cfg = rs.conf()
cfg.members[2].priority = 0
rs.reconfig(cfg)


Note: The rs.reconfig() (page 478) shell command can force the current primary to step down and cause an election in some situations. When the primary steps down, all clients will disconnect. This is the intended behavior. While this typically takes 10-20 seconds, attempt to make these changes during scheduled maintenance periods.

Congratulations! You have now deployed a geographically distributed four-member replica set.

Larger Replica Set Considerations

The procedure for deploying a geographically distributed set with more than three or four members resembles the above procedures. However, consider the following:

- Never deploy more than seven voting members.
- Use the procedure for a four member replica set if you have an even number of members. Ensure that Site A always has a majority of the members by deploying the arbiter within Site A. For six member sets, deploy at least three voting members in addition to the arbiter in Site A, and the remaining members in alternate sites.
- Use the procedure for a three member replica set if you have an odd number of members. Ensure that Site A always has a majority of the members of the set. For example, if a set has five members, deploy three members within the primary facility and two members in other facilities.
- If you have a majority of the members of the set outside of Site A and the network partitions to prevent communication between sites, the current primary in Site A will step down, even if none of the members outside of Site A are eligible to become primary.
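The sizing guidance for larger sets reduces to a short calculation. The plain JavaScript sketch below takes the number of data-bearing members and reports whether an arbiter is needed and how many votes Site A must hold. The function name planSiteA and the shape of its result are hypothetical, and the sketch assumes that every member, including the arbiter, holds exactly one vote.

```javascript
// Given the number of data-bearing members, reports whether an arbiter is
// needed to make the vote count odd, and how many votes Site A must hold to
// keep a majority. Returns null beyond the seven-voter limit.
function planSiteA(dataMembers) {
  var arbiter = dataMembers % 2 === 0;
  var totalVotes = dataMembers + (arbiter ? 1 : 0);
  if (totalVotes > 7) return null;
  return {
    arbiterInSiteA: arbiter,
    totalVotes: totalVotes,
    minVotesInSiteA: Math.floor(totalVotes / 2) + 1
  };
}

var five = planSiteA(5); // odd: no arbiter needed
var six = planSiteA(6);  // even: arbiter added in Site A
```

For five members the result calls for three votes in Site A. For six members it calls for four, which matches the guidance of three voting members plus the arbiter in Site A.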

6.4 Change the Size of the Oplog


The oplog exists internally as a capped collection, so you cannot modify its size in the course of normal operations. In most cases the default oplog size (page 36) is an acceptable size; however, in some situations you may need a larger or smaller oplog. For example, you might need to change the oplog size if your applications perform large numbers of multi-updates or deletes in short periods of time.

This tutorial describes how to resize the oplog. For a detailed explanation of oplog sizing, see the Oplog (page 36) topic in the Replication Fundamentals (page 33) document. For details on how oplog size affects delayed members and replication lag, see the Delayed Members (page 40) topic and the Replication Lag (page 45) topic in Replica Set Administration (page 38).

6.4.1 Overview
The following is an overview of the procedure for changing the size of the oplog:
1. Shut down the current primary instance in the replica set and then restart it on a different port and in standalone mode.
2. Create a backup of the old (current) oplog. This is optional.
3. Save the last entry from the old oplog.
4. Drop the old oplog.
5. Create a new oplog of a different size.
6. Insert the previously saved last entry from the old oplog into the new oplog.
7. Restart the server as a member of the replica set on its usual port.
8. Apply this procedure to any other member of the replica set that could become primary.

6.4.2 Procedure
The examples in this procedure use the following configuration:
- The active replica set is rs0.
- The replica set is running on port 27017.
- The replica set is running with a data directory (page 497) of /srv/mongodb.

To change the size of the oplog for a replica set, use the following procedure for every member of the set that may become primary.

1. Shut down the mongod instance and restart it in standalone mode running on a different port.

Note: Shutting down the primary member of the set will trigger a failover situation and another member in the replica set will become primary. In most cases, it is least disruptive to modify the oplogs of all the secondaries before modifying the primary.

To shut down the current primary instance, use a command that resembles the following:
mongod --dbpath /srv/mongodb --shutdown

To restart the instance on a different port and in standalone mode (i.e. without replSet (page 501) or --replSet), use a command that resembles the following:
mongod --port 37017 --dbpath /srv/mongodb

2. Back up the existing oplog on the standalone instance. Use the following command:
mongodump --db local --collection oplog.rs --port 37017

Connect to the instance using the mongo shell:


mongo --port 37017

3. Save the last entry from the old (current) oplog.
(a) In the mongo shell, enter the following command to use the local database to interact with the oplog:
use local

(b) Use the db.collection.save() (page 463) operation to save the last entry in the oplog to a temporary collection:
db.temp.save( db.oplog.rs.find().sort( {$natural : -1} ).limit(1).next() )

You can see this oplog entry in the temp collection by issuing the following command:
db.temp.find()

4. Drop the old oplog.rs collection in the local database. Use the following command:
db.oplog.rs.drop()

This will return true on the shell.


5. Use the create command to create a new oplog of a different size. Specify the size argument in bytes. A value of 2147483648 will create a new oplog that's 2 gigabytes:
db.runCommand( { create : "oplog.rs", capped : true, size : 2147483648 } )

Upon success, this command returns the following status:


{ "ok" : 1 }

6. Insert the previously saved last entry from the old oplog into the new oplog:
db.oplog.rs.save( db.temp.findOne() )

To confirm the entry is in the new oplog, issue the following command:
db.oplog.rs.find()

7. Restart the server as a member of the replica set on its usual port:
mongod --dbpath /srv/mongodb --shutdown
mongod --replSet rs0 --dbpath /srv/mongodb

The replica member will recover and catch up and then will be eligible for election to primary. To step down the temporary primary that took over when you initially shut down the server, use the rs.stepDown() (page 479) method. This will force an election for primary. If the server's priority (page 34) is higher than all other members in the set and if it has successfully caught up, then it will likely become primary.

8. Repeat this procedure for all other members of the replica set that are or could become primary.
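Because the create command in step 5 takes its size in bytes, it can help to compute the value rather than type it. The following is a minimal sketch in plain JavaScript; the helper names gigabytesToBytes and buildCreateOplogCommand are hypothetical, and the resulting document is what you would pass to db.runCommand() in the local database.

```javascript
// Hypothetical helper: convert a size in gigabytes to the byte count
// expected by the size field of the create command.
function gigabytesToBytes(gb) {
  return gb * 1024 * 1024 * 1024;
}

// Build the command document used in step 5 of the procedure.
function buildCreateOplogCommand(sizeInGb) {
  return {
    create: "oplog.rs",
    capped: true,
    size: gigabytesToBytes(sizeInGb)
  };
}

console.log(gigabytesToBytes(2)); // 2147483648
console.log(JSON.stringify(buildCreateOplogCommand(2)));
```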

6.5 Convert a Replica Set to a Replicated Shard Cluster


6.5.1 Overview
Following this tutorial, you will convert a single 3-member replica set to a shard cluster that consists of 2 shards. Each shard will consist of an independent 3-member replica set.

The tutorial uses a test environment running on a local UNIX-like system. You should feel encouraged to follow along at home. If you need to perform this process in a production environment, notes throughout the document indicate procedural differences.

The procedure, from a high level, is as follows:
1. Create or select a 3-member replica set and insert some data into a collection.
2. Start the config databases and create a shard cluster with a single shard.
3. Create a second replica set with three new mongod instances.
4. Add the second replica set to the shard cluster.
5. Enable sharding on the desired collection or collections.

6.5.2 Process
Install MongoDB according to the instructions in the MongoDB Installation Tutorial (page 9).


Deploy a Replica Set with Test Data

If you have an existing MongoDB replica set deployment, you can omit this step and continue from Deploy Sharding Infrastructure (page 169). Use the following sequence of steps to configure and deploy a replica set and to insert test data.

1. Create the following directories for the first replica set instance, named firstset:
- /data/example/firstset1
- /data/example/firstset2
- /data/example/firstset3

To create directories, issue the following command:
mkdir -p /data/example/firstset1 /data/example/firstset2 /data/example/firstset3

2. In a separate terminal window or GNU Screen window, start three mongod instances by running each of the following commands:
mongod --dbpath /data/example/firstset1 --port 10001 --replSet firstset --oplogSize 700 --rest
mongod --dbpath /data/example/firstset2 --port 10002 --replSet firstset --oplogSize 700 --rest
mongod --dbpath /data/example/firstset3 --port 10003 --replSet firstset --oplogSize 700 --rest

Note: The --oplogSize 700 (page 490) option restricts the size of the operation log (i.e. oplog) for each mongod instance to 700MB. Without the --oplogSize (page 490) option, each mongod reserves approximately 5% of the free disk space on the volume. By limiting the size of the oplog, each instance starts more quickly. Omit this setting in production environments.

3. In a mongo shell session in a new terminal, connect to the mongod instance on port 10001 by running the following command. If you are in a production environment, first read the note below.
mongo localhost:10001/admin

Note: Above and hereafter, if you are running in a production environment or are testing this process with mongod instances on multiple systems, replace localhost with a resolvable domain, hostname, or the IP address of your system.

4. In the mongo shell, initialize the first replica set by issuing the following command:
db.runCommand({"replSetInitiate" :
    {"_id" : "firstset", "members" : [{"_id" : 1, "host" : "localhost:10001"},
                                       {"_id" : 2, "host" : "localhost:10002"},
                                       {"_id" : 3, "host" : "localhost:10003"}
    ]}})
{
    "info" : "Config now saved locally. Should come online in about a minute.",
    "ok" : 1
}

5. In the mongo shell, create and populate a new collection by issuing the following sequence of JavaScript operations:

use test
switched to db test
people = ["Marc", "Bill", "George", "Eliot", "Matt", "Trey", "Tracy", "Greg", "Steve", "Kristina
for(var i=0; i<1000000; i++){
    name = people[Math.floor(Math.random()*people.length)];
    user_id = i;
    boolean = [true, false][Math.floor(Math.random()*2)];
    added_at = new Date();
    number = Math.floor(Math.random()*10001);
    db.test_collection.save({"name":name, "user_id":user_id, "boolean":
}

The above operations add one million documents to the collection test_collection. This can take several minutes, depending on your system. The script adds the documents in the following form:

{ "_id" : ObjectId("4ed5420b8fc1dd1df5886f70"), "name" : "Greg", "user_id" : 4, "boolean" : true, "ad
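To inspect the shape of the test documents before inserting a million of them, the loop body above can be written as a standalone function. The following is a sketch in plain JavaScript under stated assumptions: makeTestDocument is a hypothetical name, the names array holds only the names that are fully visible in the example, and the added_at and number fields are assumed from the variables the loop computes.

```javascript
// Sketch only: builds the same kind of document as the loop above,
// as a pure function so it can be inspected before inserting.
// `names` holds only names that are fully visible in the example.
var names = ["Marc", "Bill", "George", "Eliot", "Matt", "Trey", "Tracy", "Greg", "Steve"];

// `random` defaults to Math.random; pass a fixed function to get
// repeatable documents.
function makeTestDocument(i, random) {
  random = random || Math.random;
  return {
    name: names[Math.floor(random() * names.length)],
    user_id: i,
    boolean: [true, false][Math.floor(random() * 2)],
    added_at: new Date(),
    number: Math.floor(random() * 10001)
  };
}

// With a fixed "random" value of 0.5 the result is deterministic.
var sample = makeTestDocument(4, function () { return 0.5; });
console.log(sample.name, sample.user_id, sample.boolean, sample.number);
// prints: Matt 4 false 5000
```

In the mongo shell, each generated document would be passed to db.test_collection.save(), as in the loop above.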

Deploy Sharding Infrastructure

This procedure creates the three config databases that store the cluster's metadata.

Note: For development and testing environments, a single config database is sufficient. In production environments, use three config databases. Because config instances store only the metadata for the shard cluster, they have minimal resource requirements.

1. Create the following data directories for three config database instances:
- /data/example/config1
- /data/example/config2
- /data/example/config3

Issue the following command at the system prompt:
mkdir -p /data/example/config1 /data/example/config2 /data/example/config3

2. In a separate terminal window or GNU Screen window, start the config databases by running the following commands:
mongod --configsvr --dbpath /data/example/config1 --port 20001
mongod --configsvr --dbpath /data/example/config2 --port 20002
mongod --configsvr --dbpath /data/example/config3 --port 20003

3. In a separate terminal window or GNU Screen window, start a mongos instance by running the following command:
mongos --configdb localhost:20001,localhost:20002,localhost:20003 --port 27017 --chunkSize 1

Note: If you are using the collection created earlier or are just experimenting with sharding, you can use a small --chunkSize (page 493) (1MB works well). The default chunkSize (page 503) of 64MB means that your cluster must have 64MB of data before MongoDB's automatic sharding begins working. In production environments, do not use a small chunk size.

The configdb (page 502) options specify the configuration databases (e.g. localhost:20001, localhost:20002, and localhost:20003). The mongos instance runs on the default MongoDB port (i.e. 27017), while the databases themselves are running on ports in the 10001 series. In this example, you may omit the --port 27017 (page 492) option, as 27017 is the default port.

4. Add the first shard in mongos. In a new terminal window or GNU Screen session, add the first shard, according to the following procedure:
(a) Connect to the mongos with the following command:
mongo localhost:27017/admin

(b) Add the first shard to the cluster by issuing the addShard command:
db.runCommand( { addShard : "firstset/localhost:10001,localhost:10002,localhost:10003" } )

(c) Observe the following message, which denotes success:


{ "shardAdded" : "firstset", "ok" : 1 }

Deploy a Second Replica Set

This procedure deploys a second replica set. This closely mirrors the process used to establish the first replica set above, omitting the test data.

1. Create the following data directories for the members of the second replica set, named secondset:
- /data/example/secondset1
- /data/example/secondset2
- /data/example/secondset3

2. In three new terminal windows, start three instances of mongod with the following commands:
mongod --dbpath /data/example/secondset1 --port 10004 --replSet secondset --oplogSize 700 --rest
mongod --dbpath /data/example/secondset2 --port 10005 --replSet secondset --oplogSize 700 --rest
mongod --dbpath /data/example/secondset3 --port 10006 --replSet secondset --oplogSize 700 --rest

Note: As above, the second replica set uses the smaller oplogSize (page 501) configuration. Omit this setting in production environments.

3. In the mongo shell, connect to one mongod instance by issuing the following command:
mongo localhost:10004/admin

4. In the mongo shell, initialize the second replica set by issuing the following command:
db.runCommand({"replSetInitiate" :
    {"_id" : "secondset",
     "members" : [{"_id" : 1, "host" : "localhost:10004"},
                  {"_id" : 2, "host" : "localhost:10005"},
                  {"_id" : 3, "host" : "localhost:10006"}
    ]}})
{
    "info" : "Config now saved locally. Should come online in about a minute.",
    "ok" : 1
}

5. Add the second replica set to the shard cluster. Connect to the mongos instance created in the previous procedure and issue the following sequence of commands:
use admin
db.runCommand( { addShard : "secondset/localhost:10004,localhost:10005,localhost:10006" } )

This command returns the following success message:


{ "shardAdded" : "secondset", "ok" : 1 }

6. Verify that both shards are properly configured by running the listShards command. The command and example output follow:
db.runCommand({listshards:1})
{
    "shards" : [
        {
            "_id" : "firstset",
            "host" : "firstset/localhost:10001,localhost:10003,localhost:10002"
        },
        {
            "_id" : "secondset",
            "host" : "secondset/localhost:10004,localhost:10006,localhost:10005"
        }
    ],
    "ok" : 1
}
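The replSetInitiate document and the addShard string in the steps above both derive from the same set name and host list, so they can be generated together. The following is a minimal sketch in plain JavaScript; buildReplSetConfig and buildShardSeed are hypothetical helper names, not MongoDB functions.

```javascript
// Hypothetical helpers for illustration; they only build the
// documents and strings shown in the steps above.

// Build the argument for replSetInitiate from a set name and hosts.
// Member _id values start at 1 to match the example.
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName,
    members: hosts.map(function (host, index) {
      return { _id: index + 1, host: host };
    })
  };
}

// Build the "<setName>/<host>,<host>,..." string that the addShard
// command takes for a replica set shard.
function buildShardSeed(setName, hosts) {
  return setName + "/" + hosts.join(",");
}

var secondsetHosts = ["localhost:10004", "localhost:10005", "localhost:10006"];
var initiateCommand = { replSetInitiate: buildReplSetConfig("secondset", secondsetHosts) };
var addShardCommand = { addShard: buildShardSeed("secondset", secondsetHosts) };

console.log(JSON.stringify(initiateCommand));
console.log(JSON.stringify(addShardCommand));
```

Each command document would then be passed to db.runCommand() against the admin database: the first on a member of the new set, the second on the mongos.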

Enable Sharding

MongoDB must have sharding enabled on both the database and collection levels.

Enabling Sharding on the Database Level

Issue the enableSharding command. The following example enables sharding on the test database:
db.runCommand( { enablesharding : "test" } )
{ "ok" : 1 }

Create an Index on the Shard Key

MongoDB uses the shard key to distribute documents between shards. Once selected, you cannot change the shard key. Good shard keys:
- have values that are evenly distributed among all documents,
- group documents that are often accessed at the same time into contiguous chunks, and
- allow for effective distribution of activity among shards.

Typically shard keys are compound, comprising some sort of hash and some sort of other primary key. Selecting a shard key depends on your data set, application architecture, and usage pattern, and is beyond the scope of this document. For the purposes of this example, we will shard on the number key. This typically would not be a good shard key for production deployments.

Create the index with the following procedure:
use test
db.test_collection.ensureIndex({number:1})


See Also: The Shard Key Overview (page 96) and Shard Key (page 116) sections.
Shard the Collection

Issue the following command:


use admin
db.runCommand( { shardcollection : "test.test_collection", key : {"number":1} })
{ "collectionsharded" : "test.test_collection", "ok" : 1 }

The collection test_collection is now sharded! Over the next few minutes the Balancer begins to redistribute chunks of documents. You can confirm this activity by switching to the test database and running db.stats() (page 472) or db.printShardingStatus() (page 470). As clients insert additional documents into this collection, mongos distributes the documents evenly between the shards.

In the mongo shell, issue the following commands to return statistics for the cluster:
use test
db.stats()
db.printShardingStatus()

Example output of the db.stats() (page 472) command:


{
    "raw" : {
        "firstset/localhost:10001,localhost:10003,localhost:10002" : {
            "db" : "test",
            "collections" : 3,
            "objects" : 973887,
            "avgObjSize" : 100.33173458522396,
            "dataSize" : 97711772,
            "storageSize" : 141258752,
            "numExtents" : 15,
            "indexes" : 2,
            "indexSize" : 56978544,
            "fileSize" : 1006632960,
            "nsSizeMB" : 16,
            "ok" : 1
        },
        "secondset/localhost:10004,localhost:10006,localhost:10005" : {
            "db" : "test",
            "collections" : 3,
            "objects" : 26125,
            "avgObjSize" : 100.33286124401914,
            "dataSize" : 2621196,
            "storageSize" : 11194368,
            "numExtents" : 8,
            "indexes" : 2,
            "indexSize" : 2093056,
            "fileSize" : 201326592,
            "nsSizeMB" : 16,
            "ok" : 1
        }
    },
    "objects" : 1000012,
    "avgObjSize" : 100.33176401883178,
    "dataSize" : 100332968,
    "storageSize" : 152453120,
    "numExtents" : 23,
    "indexes" : 4,
    "indexSize" : 59071600,
    "fileSize" : 1207959552,
    "ok" : 1
}

Example output of the db.printShardingStatus() (page 470) command:


--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
      { "_id" : "firstset", "host" : "firstset/localhost:10001,localhost:10003,localhost:10002" }
      { "_id" : "secondset", "host" : "secondset/localhost:10004,localhost:10006,localhost:10005"
  databases:
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "test", "partitioned" : true, "primary" : "firstset" }
          test.test_collection chunks:
              secondset    5
              firstset     186
[...]

In a few moments you can run these commands for a second time to demonstrate that chunks are migrating from firstset to secondset. When this procedure is complete, you will have converted a replica set into a shard cluster where each shard is itself a replica set.
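One way to watch the migration progress is to compare the per-shard object counts reported in the raw section of db.stats(). The following is a sketch in plain JavaScript; objectShareByShard is a hypothetical helper, and the short keys firstset and secondset stand in for the full host strings shown in the example output.

```javascript
// Sketch: given the "raw" section of db.stats() output from a mongos,
// report what percentage of the objects each shard currently holds.
function objectShareByShard(raw) {
  var total = 0;
  Object.keys(raw).forEach(function (shard) {
    total += raw[shard].objects;
  });
  var shares = {};
  Object.keys(raw).forEach(function (shard) {
    // Round to one decimal place.
    shares[shard] = Math.round((raw[shard].objects / total) * 1000) / 10;
  });
  return shares;
}

// Object counts copied from the example output above; the short keys
// stand in for the full "firstset/localhost:..." strings.
var exampleRaw = {
  firstset: { objects: 973887 },
  secondset: { objects: 26125 }
};

var shares = objectShareByShard(exampleRaw);
console.log(shares.firstset, shares.secondset); // 97.4 2.6
```

Running the same calculation again after a few minutes should show the two percentages moving toward each other as the balancer migrates chunks.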

6.6 Change Hostnames in a Replica Set


6.6.1 Synopsis
For most replica sets the hostnames [2] in the members[n].host (page 561) field never change. However, in some cases you must migrate some or all host names in a replica set as organizational needs change. This document presents two possible procedures for changing the hostnames in the members[n].host (page 561) field. Depending on your environment's availability requirements, you may:
1. Make the configuration change without disrupting the availability of the replica set. While this ensures that your application will always be able to read and write data to the replica set, this procedure can take a long time and may incur downtime at the application layer. [3] For this procedure, see Changing Hostnames while Maintaining the Replica Set's Availability (page 80).
2. Stop all members of the replica set running on the old hostnames or interfaces at once, make the configuration changes, and then start the members at the new hostnames or interfaces. While the set will be totally unavailable during the operation, the total maintenance window is often shorter. For this procedure, see Changing All Hostnames in Replica Set at Once (page 82).

See Also:
- Replica Set Configuration (page 561)
- Replica Set Reconfiguration Process (page 564)
- rs.conf() (page 477) and rs.reconfig() (page 478)

And the following tutorials:
- Deploy a Replica Set (page 61)
- Add Members to a Replica Set (page 64)

[2] Always use resolvable hostnames for the value of the members[n].host (page 561) field in the replica set configuration to avoid confusion and complexity.
[3] You will have to configure your applications so that they can connect to the replica set at both the old and new locations. This often requires a restart and reconfiguration at the application layer, which may affect the availability of your applications. This reconfiguration is beyond the scope of this document and makes the second option (page 82) preferable when you must change the hostnames of all members of the replica set at once.

6.6.2 Procedures
Given a replica set with three members:
- database0.example.com:27017 (the primary)
- database1.example.com:27017
- database2.example.com:27017

And with the following rs.config() output:
{
    "_id" : "rs",
    "version" : 3,
    "members" : [
        {
            "_id" : 0,
            "host" : "database0.example.com:27017"
        },
        {
            "_id" : 1,
            "host" : "database1.example.com:27017"
        },
        {
            "_id" : 2,
            "host" : "database2.example.com:27017"
        }
    ]
}

The following procedures change the members' hostnames as follows:
- mongodb0.example.net:27017 (the primary)
- mongodb1.example.net:27017
- mongodb2.example.net:27017

Use the most appropriate procedure for your deployment.

Changing Hostnames while Maintaining the Replica Set's Availability

This procedure uses the above assumptions (page 80).

1. For each secondary in the replica set, perform the following sequence of operations:
(a) Stop the secondary.
(b) Restart the secondary at the new location.
(c) Open a mongo shell connected to the replica set's primary. In our example, the primary runs on port 27017 so you would issue the following command:
mongo --port 27017

(d) Run the following reconfigure operation for the members[n].host (page 561) value, where n is 1:
cfg = rs.conf()
cfg.members[1].host = "mongodb1.example.net:27017"
rs.reconfig(cfg)

See Replica Set Configuration (page 561) for more information.
(e) Make sure your client applications are able to access the set at the new location and that the secondary has a chance to catch up with the other members of the set.

Repeat the above steps for each non-primary member of the set.

2. Open a mongo shell connected to the primary and step down the primary using replSetStepDown (page 441). In the mongo shell, use the rs.stepDown() (page 479) wrapper, as follows:
rs.stepDown()

3. When the step down succeeds, shut down the primary.

4. To make the final configuration change, connect to the new primary in the mongo shell and reconfigure the members[n].host (page 561) value where n is 0:
cfg = rs.conf()
cfg.members[0].host = "mongodb0.example.net:27017"
rs.reconfig(cfg)

5. Start the original primary.

6. Open a mongo shell connected to the primary.

7. To confirm the new configuration, call rs.config() in the mongo shell. Your output should resemble:
{
    "_id" : "rs",
    "version" : 4,
    "members" : [
        {
            "_id" : 0,
            "host" : "mongodb0.example.net:27017"
        },
        {
            "_id" : 1,
            "host" : "mongodb1.example.net:27017"
        },
        {
            "_id" : 2,
            "host" : "mongodb2.example.net:27017"
        }
    ]
}

Changing All Hostnames in Replica Set at Once

This procedure uses the above assumptions (page 80).

1. Stop all members in the replica set.

2. Restart each member on a different port and without using the --replSet run-time option. Changing the port number during maintenance prevents clients from connecting to this host while you perform maintenance. Use the member's usual --dbpath, which in this example is /data/db1. Use a command that resembles the following:
mongod --dbpath /data/db1/ --port 37017

3. For each member of the replica set, perform the following sequence of operations: (a) Open a mongo shell connected to the mongod running on the new, temporary port. For example, for a member running on a temporary port of 37017, you would issue this command:
mongo --port 37017

(b) Edit the replica set configuration manually. The replica set configuration is the only document in the system.replset collection in the local database. Edit the replica set configuration with the new hostnames and correct ports for all the members of the replica set. Consider the following sequence of commands to change the hostnames in a three-member set:
use local
cfg = db.system.replset.findOne( { "_id": "rs" } )
cfg.members[0].host = "mongodb0.example.net:27017"
cfg.members[1].host = "mongodb1.example.net:27017"
cfg.members[2].host = "mongodb2.example.net:27017"
db.system.replset.update( { "_id": "rs" } , cfg )

(c) Stop the mongod process on the member.

4. After reconfiguring all members of the set, start each mongod instance in the normal way: use the usual port number and use the --replSet option. For example:
mongod --dbpath /data/db1/ --port 27017 --replSet rs

5. Connect to one of the mongod instances using the mongo shell. For example:
mongo --port 27017

6. To confirm the new configuration, call rs.config() in the mongo shell. Your output should resemble:
{
    "_id" : "rs",
    "version" : 4,
    "members" : [
        {
            "_id" : 0,
            "host" : "mongodb0.example.net:27017"
        },
        {
            "_id" : 1,
            "host" : "mongodb1.example.net:27017"
        },
        {
            "_id" : 2,
            "host" : "mongodb2.example.net:27017"
        }
    ]
}
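Both procedures change only the members[n].host values of the configuration document. As an illustration, that edit can be expressed as a function that maps old hostnames to new ones. This is a sketch in plain JavaScript, not a MongoDB API: renameHosts is a hypothetical name, and the function returns a modified copy without applying it or touching the version field.

```javascript
// Sketch: return a copy of a replica set configuration document with
// each members[n].host replaced according to hostMap. Hosts that are
// not in the map are left unchanged.
function renameHosts(cfg, hostMap) {
  // A JSON round trip is enough to copy a plain configuration document.
  var copy = JSON.parse(JSON.stringify(cfg));
  copy.members.forEach(function (member) {
    if (Object.prototype.hasOwnProperty.call(hostMap, member.host)) {
      member.host = hostMap[member.host];
    }
  });
  return copy;
}

// The configuration and hostnames from the example in this section.
var oldCfg = {
  _id: "rs",
  version: 3,
  members: [
    { _id: 0, host: "database0.example.com:27017" },
    { _id: 1, host: "database1.example.com:27017" },
    { _id: 2, host: "database2.example.com:27017" }
  ]
};

var newCfg = renameHosts(oldCfg, {
  "database0.example.com:27017": "mongodb0.example.net:27017",
  "database1.example.com:27017": "mongodb1.example.net:27017",
  "database2.example.com:27017": "mongodb2.example.net:27017"
});

console.log(JSON.stringify(newCfg.members));
```

Keeping the mapping in one place makes it easier to apply the same change on every member when you edit system.replset by hand.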

6.7 Convert a Secondary to an Arbiter


If you have a secondary in a replica set that no longer needs to hold a copy of the data but that you want to retain in the set to ensure that the replica set will be able to elect a primary (page 34), you can convert the secondary into an arbiter (page 41). This document provides two equivalent procedures for this process.

6.7.1 Synopsis
Both of the following procedures are operationally equivalent. Choose whichever procedure you are most comfortable with:
1. You may operate the arbiter on the same port as the former secondary. In this procedure, you must shut down the secondary and remove its data before restarting and reconfiguring it as an arbiter. For this procedure, see Convert a Secondary to an Arbiter and Reuse the Port Number (page 83).
2. Run the arbiter on a new port. In this procedure, you can reconfigure the server as an arbiter before shutting down the instance running as a secondary. For this procedure, see Convert a Secondary to an Arbiter Running on a New Port Number (page 84).

See Also:
- Arbiters (page 41)
- rs.addArb() (page 476)
- Replica Set Administration (page 38)

6.7.2 Procedures
Convert a Secondary to an Arbiter and Reuse the Port Number

1. If your application is connecting directly to the secondary, modify the application so that MongoDB queries don't reach the secondary.

2. Shut down the secondary.

3. Remove the secondary from the replica set by calling the rs.remove() (page 478) method. Perform this operation while connected to the current primary in the mongo shell:

rs.remove("<hostname>:<port>")

4. Verify that the replica set no longer includes the secondary by calling the rs.config() method in the mongo shell:
rs.config()

5. Move the secondary's data directory to an archive folder. For example:


mv /data/db /data/db-old

Optional: You may remove the data instead.

6. Create a new, empty data directory to point to when restarting the mongod instance. You can reuse the previous name. For example:
mkdir /data/db

7. Restart the mongod instance for the secondary, specifying the port number, the empty data directory, and the replica set. You can use the same port number you used before. Issue a command similar to the following:
mongod --port 27021 --dbpath /data/db --replSet rs

8. In the mongo shell convert the secondary to an arbiter using the rs.addArb() (page 476) method:
rs.addArb("<hostname>:<port>")

9. Verify the arbiter belongs to the replica set by calling the rs.config() method in the mongo shell.
rs.config()

The arbiter member should include the following:


"arbiterOnly" : true

Convert a Secondary to an Arbiter Running on a New Port Number

1. If your application is connecting directly to the secondary or has a connection string referencing the secondary, modify the application so that MongoDB queries don't reach the secondary.

2. Create a new, empty data directory to be used with the new port number. For example:
mkdir /data/db-temp

3. Start a new mongod instance on the new port number, specifying the new data directory and the existing replica set. Issue a command similar to the following:
mongod --port 27021 --dbpath /data/db-temp --replSet rs

4. In the mongo shell connected to the current primary, convert the new mongod instance to an arbiter using the rs.addArb() (page 476) method:
rs.addArb("<hostname>:<port>")

5. Verify the arbiter has been added to the replica set by calling the rs.config() method in the mongo shell.


rs.config()

The arbiter member should include the following:


"arbiterOnly" : true

6. Shut down the secondary.

7. Remove the secondary from the replica set by calling the rs.remove() (page 478) method in the mongo shell:
rs.remove("<hostname>:<port>")

8. Verify that the replica set no longer includes the old secondary by calling the rs.config() method in the mongo shell:
rs.config()

9. Move the secondary's data directory to an archive folder. For example:


mv /data/db /data/db-old

Optional You may remove the data instead.
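The verification steps in both procedures look for "arbiterOnly" : true in the rs.config() output. As a sketch, the same check can be written as a small function over a configuration document. This is plain JavaScript for illustration: arbiterHosts is a hypothetical helper, and the configuration below, including its hostnames and version, is invented for the example.

```javascript
// Sketch: list the hosts of all arbiters in a replica set
// configuration document such as the one returned by rs.config().
function arbiterHosts(cfg) {
  return cfg.members
    .filter(function (member) { return member.arbiterOnly === true; })
    .map(function (member) { return member.host; });
}

// Hypothetical configuration for illustration; the hostnames are
// placeholders, not values from this tutorial.
var exampleCfg = {
  _id: "rs",
  version: 5,
  members: [
    { _id: 0, host: "host0.example.net:27017" },
    { _id: 1, host: "host1.example.net:27017" },
    { _id: 3, host: "host2.example.net:27021", arbiterOnly: true }
  ]
};

console.log(arbiterHosts(exampleCfg).join(", ")); // host2.example.net:27021
```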


CHAPTER

SEVEN

REFERENCE
The following describes the replica set configuration object:
- Replica Set Configuration (page 561)

The following describe status commands:
- Replica Status Reference (page 558)
- Replication Info Reference (page 566)


Part IV

Sharding


Sharding distributes a single logical database system across a cluster of machines. Sharding uses range-based partitioning to distribute documents based on a specific shard key.

This page lists the documents, tutorials, and reference pages that describe sharding.
- For an overview, see Sharding Fundamentals (page 93).
- To configure, maintain, and troubleshoot shard clusters, see Shard Cluster Administration (page 99).
- For deployment architectures, see Shard Cluster Architectures (page 114).
- For details on the internal operations of sharding, see Sharding Internals (page 116).
- For procedures for performing certain sharding tasks, see the Tutorials (page 123) list.


CHAPTER

EIGHT

DOCUMENTATION
The following is the outline of the main documentation:

8.1 Sharding Fundamentals


MongoDB's sharding system allows users to partition the data of a collection within a database to distribute documents across a number of mongod instances or shards. Sharded clusters allow increases in write capacity, provide the ability to support larger working sets, and raise the limits of total data size beyond the physical resources of a single node.

This document provides an overview of the fundamental concepts and operation of sharding with MongoDB.

See Also: The Sharding (page 91) index for a list of all documents in this manual that contain information related to the operation and use of shard clusters in MongoDB. This includes:
- Sharding Internals (page 116)
- Shard Cluster Administration (page 99)
- Shard Cluster Architectures (page 114)

If you are not yet familiar with sharding, see the Sharding FAQ (page 359) document.

8.1.1 Overview
Features

With sharding MongoDB automatically distributes data among a collection of mongod instances. Sharding, as implemented in MongoDB, has the following features:

Range-based Data Partitioning
MongoDB distributes documents among shards based on the value of the shard key (page 96). Each chunk represents a block of documents with values that fall within a specific range. When chunks grow beyond the chunk size (page 120), MongoDB divides the chunks into smaller chunks (i.e. splitting) based on the shard key.

Automatic Data Volume Distribution
The sharding system automatically balances data across the cluster without intervention from the application layer. Effective automatic sharding depends on a well chosen shard key (page 96), but requires no additional complexity, modifications, or intervention from developers.

Transparent Query Routing
Sharding is completely transparent to the application layer, because all connections to a shard cluster go through mongos. Sharding in MongoDB requires some basic initial configuration (page 100), but ongoing function is entirely transparent to the application.

Horizontal Capacity
Sharding increases capacity in two ways:
1. Effective partitioning of data can provide additional write capacity by distributing the write load over a number of mongod instances.
2. Given a shard key with sufficient cardinality (page 116), partitioning data allows users to increase the potential amount of data to manage with MongoDB and expand the working set.

A typical shard cluster consists of:
- 3 config servers that store metadata. The metadata maps chunks to shards.
- Two or more replica sets that hold data. These are the shards.
- A number of lightweight routing processes, called mongos (page 491) instances. The mongos process routes operations to the correct shard based on the cluster configuration.

Indications

While sharding is a powerful and compelling feature, it comes with significant infrastructure requirements (page 94) and some limited complexity costs. As a result, use sharding only as necessary, and when indicated by actual operational requirements. Consider the following overview of indications that it may be time to consider sharding.

You should consider deploying a shard cluster, if:
- your data set approaches or exceeds the storage capacity of a single node in your system.
- the size of your system's active working set will soon exceed the maximum amount of RAM for your system.
- your system has a large amount of write activity, a single MongoDB instance cannot write data fast enough to meet demand, and all other approaches have not reduced contention.

If these attributes are not present in your system, sharding will only add additional complexity to your system without providing much benefit.
When designing your data model, if you will eventually need a shard cluster, consider which collections you will want to shard and the corresponding shard keys.

Warning: It takes time and resources to deploy sharding, and if your system has already reached or exceeded its capacity, you will have a difficult time deploying sharding without impacting your application. As a result, if you think you will need to partition your database in the future, do not wait until your system is over capacity to enable sharding.

8.1.2 Requirements
Infrastructure

A shard cluster has the following components:

• Three config servers. These special mongod instances store the metadata for the cluster. The mongos instances cache this data and use it to determine which shard is responsible for which chunk. For testing purposes you may deploy a shard cluster with a single configuration server, but this is not recommended for production.

Chapter 8. Documentation

Warning: If you choose to run a single config server and it becomes inoperable for any reason, the cluster will be unusable.

• Two or more mongod instances, to hold data. These are normal mongod instances that hold all of the actual data for the cluster. Typically, one or more replica sets, consisting of multiple mongod instances, compose a shard cluster. The members of the replica set provide redundancy for the data and increase the overall reliability and robustness of the cluster.

Warning: MongoDB enables data partitioning (i.e. sharding) on a per collection basis. You must access all data in a shard cluster via the mongos instances.

• One or more mongos instances. These nodes cache cluster metadata from the config servers and direct queries from the application layer to the shards that hold the data.

Note: In most situations mongos instances use minimal resources, and you can run them on your application servers without impacting application performance. However, if you use the aggregation framework some processing may occur on the mongos instances, causing that mongos to require more system resources.

Data

Your cluster must manage a significant quantity of data for sharding to have an effect on your collection. The default chunk size is 64 megabytes [1], and the balancer (page 98) will not begin moving data until the imbalance of chunks in the cluster exceeds the migration threshold (page 120). Practically, this means that unless your cluster has enough data, chunks will remain on the same shard. You can set a smaller chunk size, or manually create splits in your collection (page 105) using the sh.splitFind() (page 483) and sh.splitAt() (page 483) operations in the mongo shell. Remember that the default chunk size and migration threshold are explicitly configured to prevent unnecessary splitting or migrations.

While there are some exceptional situations where you may need to shard a small collection of data, most of the time the additional complexity added by sharding is not worth the operational costs unless you need the additional concurrency/capacity for some reason. If you have a small data set, the chances are that a properly configured single MongoDB instance or replica set will be more than sufficient for your persistence layer needs.

Sharding and localhost Addresses

Because all components of a shard cluster must communicate with each other over the network, there are special restrictions regarding the use of localhost addresses: If you use either localhost or 127.0.0.1 as the host identifier, then you must use localhost or 127.0.0.1 for all host settings for any MongoDB instances in the cluster. This applies to both the host argument to addShard and the value to the mongos --configdb (page 493) run time option. If you mix localhost addresses with remote host addresses, MongoDB will produce errors.
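The localhost rule can be checked before any host string reaches addShard or --configdb. The following is a minimal sketch, not a MongoDB feature: the helper names isLocalHost and mixesLocalAndRemote are hypothetical, and the check assumes plain "host:port" strings (it does not handle IPv6 literals or the replicaSetName/seed form).

```javascript
// Hypothetical helpers, not part of MongoDB. They only inspect a list of
// "host:port" strings that you intend to use for shards and config servers.
function isLocalHost(hostAndPort) {
  // Drop an optional ":port" suffix and compare the host part.
  const host = hostAndPort.split(":")[0].toLowerCase();
  return host === "localhost" || host === "127.0.0.1";
}

function mixesLocalAndRemote(hosts) {
  const localCount = hosts.filter(isLocalHost).length;
  // Mixed means at least one local address and at least one remote address.
  return localCount > 0 && localCount < hosts.length;
}

const allLocal = ["localhost:27019", "127.0.0.1:27020"];
const mixed = ["localhost:27019", "mongodb0.example.net:27027"];

console.log(mixesLocalAndRemote(allLocal)); // false
console.log(mixesLocalAndRemote(mixed));    // true
```

A list for which mixesLocalAndRemote returns true is the kind of configuration the restriction above warns against.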
[1] While the default chunk size is 64 megabytes, the size is user configurable (page 493). When deciding chunk size, MongoDB (for defaults) and users (for custom values) must consider that smaller chunks offer the possibility of more even data distribution, but increase the likelihood of chunk migrations. Larger chunks decrease the need for migrations, but increase the amount of time required for a chunk migration. See the Chunk Size (page 120) section in the Sharding Internals (page 116) document for more information on this topic.

8.1. Sharding Fundamentals

8.1.3 Shard Keys


Shard keys refer to the field that exists in every document in a collection that MongoDB uses to distribute documents among the shards. Shard keys, like indexes, can be either a single field, or may be a compound key, consisting of multiple fields.

Remember, MongoDB's sharding is range-based: each chunk holds documents having a specific range of values for the shard key. Thus, choosing the correct shard key can have a great impact on the performance, capability, and functioning of your database and cluster.

Appropriate shard key choice depends on the schema of your data and the way that your application queries and writes data to the database.

The ideal shard key:

• is easily divisible, which makes it easy for MongoDB to distribute content among the shards. Shard keys that have a limited number of possible values are not ideal as they can result in some chunks that are unsplittable. See the Cardinality (page 116) section for more information.
• will distribute write operations among the cluster, to prevent any single shard from becoming a bottleneck. Shard keys that have a high correlation with insert time are poor choices for this reason; however, shard keys that have higher randomness satisfy this requirement better. See the Write Scaling (page 117) section for additional background.
• will make it possible for the mongos to return most query operations directly from a single specific mongod instance. Your shard key should be the primary field used by your queries, and fields with a high degree of randomness are poor choices for this reason. See the Query Isolation (page 117) section for specific examples.

The challenge when selecting a shard key is that there is not always an obvious choice. Often, an existing field in your collection may not be the optimal key. In those situations, computing a special purpose shard key into an additional field or using a compound shard key may help produce one that is more ideal.
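To make the range-based model concrete, the sketch below looks up which chunk, and therefore which shard, holds a given shard key value. It is an illustration only: the chunk table, the shard names, and the findChunk helper are hypothetical, and plain strings stand in for the minimum and maximum key bounds that MongoDB tracks internally. Each range is treated as including its lower bound and excluding its upper bound.

```javascript
// Hypothetical chunk table for a collection sharded on "username".
// Each chunk covers [min, max): min is inclusive, max is exclusive.
const chunks = [
  { min: "",  max: "h",      shard: "shard0" },
  { min: "h", max: "p",      shard: "shard1" },
  { min: "p", max: "\uffff", shard: "shard0" },
];

// Return the chunk whose range contains the shard key value, or null.
function findChunk(chunkTable, keyValue) {
  for (const chunk of chunkTable) {
    if (keyValue >= chunk.min && keyValue < chunk.max) {
      return chunk;
    }
  }
  return null;
}

console.log(findChunk(chunks, "alice").shard); // shard0
console.log(findChunk(chunks, "kate").shard);  // shard1
console.log(findChunk(chunks, "smith").shard); // shard0
```

Because neighbouring key values land in the same chunk, a key with few distinct values leaves nothing to split on, and a key that only ever increases sends every insert to the last chunk.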

8.1.4 Config Servers


Config servers maintain the shard metadata in a config database. The config database stores the relationship between chunks and where they reside within a shard cluster. Without a config database, the mongos instances would be unable to route queries or write operations within the cluster.

Config servers do not run as replica sets. Instead, a shard cluster operates with a group of three config servers that use a two-phase commit process that ensures immediate consistency and reliability. For testing purposes you may deploy a shard cluster with a single config server, but this is not recommended for production.

Warning: If you choose to run a single config server and it becomes unavailable for any reason, the cluster will be unusable.

The actual load on configuration servers is small because each mongos instance maintains a cached copy of the configuration database. MongoDB only writes data to the config server to:

• create splits in existing chunks, which happens as data in existing chunks exceeds the maximum chunk size.
• migrate a chunk between shards.

Additionally, all config servers must be available on initial setup of a shard cluster; each mongos instance must be able to write to the config.version collection.

If one or two configuration instances become unavailable, the cluster's metadata becomes read only. It is still possible to read and write data from the shards, but no chunk migrations or splits will occur until all three servers are accessible. At the same time, config server data is only read in the following situations:

• A new mongos starts for the first time, or an existing mongos restarts.
• After a chunk migration, the mongos instances update themselves with the new cluster metadata.

If all three config servers are inaccessible, you can continue to use the cluster as long as you don't restart the mongos instances until after the config servers are accessible again. If you restart the mongos instances and there are no accessible config servers, the mongos would be unable to direct queries or write operations to the cluster.

Because the configuration data is small relative to the amount of data stored in a cluster, the amount of activity is relatively low, and 100% up time is not required for a functioning shard cluster. As a result, backing up the config servers is not difficult. Backups of config servers are critical as shard clusters become totally inoperable when you lose all configuration instances and data. Precautions to ensure that the config servers remain available and intact are critical.

Note: Configuration servers store metadata for a single shard cluster. You must have a separate configuration server or servers for each shard cluster you operate.
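The availability rules above can be summarised as a small decision function. This is only a sketch of the behaviour described in this section, not code from MongoDB: the function name describeCluster and the field names in its result are hypothetical.

```javascript
// Hypothetical summary of the config server availability rules in this section.
const TOTAL_CONFIG_SERVERS = 3;

function describeCluster(availableConfigServers, mongosRestarted) {
  if (availableConfigServers === TOTAL_CONFIG_SERVERS) {
    // All config servers reachable: metadata can change.
    return { metadata: "read-write", routing: true, splitsAndMigrations: true };
  }
  if (availableConfigServers > 0) {
    // One or two config servers down: metadata is read only.
    return { metadata: "read-only", routing: true, splitsAndMigrations: false };
  }
  // No config server reachable: a running mongos keeps using its cached
  // metadata, but a restarted mongos cannot rebuild that cache.
  return { metadata: "unavailable", routing: !mongosRestarted, splitsAndMigrations: false };
}

console.log(describeCluster(3, false));
console.log(describeCluster(2, false));
console.log(describeCluster(0, true));
```

The last case is why the text advises against restarting mongos instances while all config servers are inaccessible.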

8.1.5 mongos and Querying


See Also: mongos Manual (page 491) and the mongos-only settings: test (page 503) and chunkSize (page 503).

Operations

The mongos provides a single unified interface to a sharded cluster for applications using MongoDB. Except for the selection of a shard key, application developers and administrators need not consider any of the internal details of sharding (page 116).

mongos caches data from the config server (page 96), and uses this to route operations from applications and clients to the mongod instances. mongos instances have no persistent state and consume minimal system resources. The most common practice is to run mongos instances on the same systems as your application servers, but you can maintain mongos instances on the shards or on other dedicated resources.

Note: Changed in version 2.1. Some aggregation operations using the aggregate command (i.e. db.collection.aggregate() (page 454)) will cause mongos instances to require more CPU resources than in previous versions. This modified performance profile may dictate alternate architecture decisions if you use the aggregation framework extensively in a sharded environment.

Routing

mongos uses information from config servers (page 96) to route operations to the cluster as efficiently as possible. In general, operations in a sharded environment are either:

1. Targeted at a single shard or a limited group of shards based on the shard key.
2. Broadcast to all shards in the cluster that hold documents in a collection.

When possible you should design your operations to be as targeted as possible. Operations have the following targeting characteristics:

• Query operations broadcast to all shards [2] unless the mongos can determine which shard or shards store this data. For queries that include the shard key, mongos can target the query at a specific shard. For example: For queries that contain a component of the shard key, the mongos may be able to target the query at a limited subset of shards.
• All insert() (page 460) operations target one shard.
• All single update() (page 464) operations target one shard. This includes upsert operations.
• The mongos broadcasts multi-update operations to every shard.
• The mongos broadcasts remove() (page 462) operations to every shard unless the operation specifies the shard key in full.

While some operations must broadcast to all shards, you can improve performance by using as many targeted operations as possible by ensuring that your operations include the shard key.
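The targeting rules listed above can be written down as a small classifier, which may help when reviewing an application's operations. This is a sketch, not mongos logic: targetOf and its helpers are hypothetical names, an operation is described only by its type and the fields it names, and "a component of the shard key" is assumed here to mean the leading field of the key.

```javascript
// Hypothetical classifier that mirrors the targeting rules in this section.
// shardKey: ordered list of shard key fields.
// op: { type, fields, multi } where fields are the fields named in the
// query or selector. None of these names are MongoDB APIs.
function includesFullKey(shardKey, fields) {
  return shardKey.every((field) => fields.includes(field));
}

function includesKeyPrefix(shardKey, fields) {
  // Assumption: "a component of the shard key" means its leading field.
  return fields.includes(shardKey[0]);
}

function targetOf(shardKey, op) {
  const fields = op.fields || [];
  switch (op.type) {
    case "insert":
      return "single shard";
    case "update":
      return op.multi ? "all shards" : "single shard";
    case "remove":
      return includesFullKey(shardKey, fields) ? "single shard" : "all shards";
    case "query":
      if (includesFullKey(shardKey, fields)) return "single shard";
      if (includesKeyPrefix(shardKey, fields)) return "subset of shards (possibly)";
      return "all shards";
    default:
      throw new Error("unknown operation type: " + op.type);
  }
}

const key = ["zipcode", "username"];
console.log(targetOf(key, { type: "query", fields: ["zipcode", "username"] }));
console.log(targetOf(key, { type: "query", fields: ["zipcode"] }));
console.log(targetOf(key, { type: "query", fields: ["email"] }));
console.log(targetOf(key, { type: "update", fields: ["email"], multi: true }));
```

Running an application's typical queries through a table like this is one way to see how many of them would broadcast under a candidate shard key.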

8.1.6 Balancing and Distribution


Balancing refers to the process that MongoDB uses to redistribute data within a shard cluster when some shards have a greater number of chunks than other shards. The balancing process attempts to minimize the impact that balancing can have on the cluster, by:

• Only moving one chunk at a time.
• Only initiating a balancing round when the difference in number of chunks between the shard with the greatest and the shard with the least number of chunks exceeds the migration threshold (page 120).

Additionally, you may disable the balancer on a temporary basis for maintenance and limit the window during which it runs to prevent the balancing process from impacting production traffic.

See Also: The Balancing Internals (page 119) and Balancer Operations (page 108) for more information on balancing.

Note: The balancing procedure for shard clusters is entirely transparent to the user and application layer. This documentation is only included for your edification and possible troubleshooting purposes.
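The two constraints above (one chunk at a time, and only when the imbalance exceeds the threshold) can be sketched as follows. This is an illustration of the rule as stated here, not the balancer's implementation: planBalancingRound is a hypothetical name, and the threshold of 8 used in the example is an assumed value; see the migration threshold section for the actual setting.

```javascript
// Difference between the most loaded and least loaded shard, in chunks.
function chunkImbalance(chunkCounts) {
  const counts = Object.values(chunkCounts);
  return Math.max(...counts) - Math.min(...counts);
}

// Hypothetical planner: returns one migration, or null if no round is needed.
function planBalancingRound(chunkCounts, migrationThreshold) {
  // A round starts only when the imbalance exceeds the threshold.
  if (chunkImbalance(chunkCounts) <= migrationThreshold) {
    return null;
  }
  const shards = Object.keys(chunkCounts);
  const from = shards.reduce((a, b) => (chunkCounts[b] > chunkCounts[a] ? b : a));
  const to = shards.reduce((a, b) => (chunkCounts[b] < chunkCounts[a] ? b : a));
  // Only one chunk moves at a time.
  return { from: from, to: to, chunksToMove: 1 };
}

const ASSUMED_THRESHOLD = 8; // assumption for illustration only

console.log(planBalancingRound({ shard0: 12, shard1: 3, shard2: 5 }, ASSUMED_THRESHOLD));
console.log(planBalancingRound({ shard0: 10, shard1: 3 }, ASSUMED_THRESHOLD));
```

With a small data set the second case is the common one: the imbalance never exceeds the threshold, so every chunk stays where it is.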

8.1.7 Security
Note: You should always run shard clusters in trusted networking environments that control access to the cluster using network rules and restrictions to ensure that only known traffic reaches your mongod and mongos instances.

Warning: Limitations
Changed in version 2.2: Read only authentication is fully supported in shard clusters. Previously, in version 2.0, shard clusters would not enforce read-only limitations.
Changed in version 2.0: Shard clusters support authentication. Previously, in version 1.8, shard clusters did not support authentication and access control. You must run your sharded systems in trusted environments.
[2] If a shard does not store chunks from a given collection, queries for documents in that collection are not broadcast to that shard.

To control access to a shard cluster, you must set the keyFile (page 496) option on all components of the shard cluster. Use the --keyFile (page 493) run-time option or the keyFile configuration option for all mongos, configuration instances, and shard mongod instances.

There are two classes of security credentials in a shard cluster: credentials for admin users (i.e. for the admin database) and credentials for all other databases. These credentials reside in different locations within the shard cluster and have different roles:

1. Admin database credentials reside on the config servers. To receive admin access to the cluster you must authenticate a session while connected to a mongos instance using the admin database.
2. Other database credentials reside on the primary shard for the database.

This means that you can authenticate to these users and databases while connected directly to the primary shard for a database. However, for clarity and consistency all interactions between the client and the database should use a mongos instance.

Note: Individual shards can store administrative credentials to their instance, which only permit access to a single shard. MongoDB stores these credentials in the shard's admin database and these credentials are completely distinct from the cluster-wide administrative credentials.

8.2 Shard Cluster Administration


This document provides a collection of basic operations and procedures for administering shard clusters. For a full introduction to sharding in MongoDB see Sharding Fundamentals (page 93), and for a complete overview of all sharding documentation in the MongoDB Manual, see Sharding (page 91). The Shard Cluster Architectures (page 114) document provides an overview of deployment possibilities to help deploy a shard cluster. Finally, the Sharding Internals (page 116) document provides a more detailed introduction to sharding when troubleshooting issues or understanding your cluster's behavior.

8.2. Shard Cluster Administration

Sharding Procedures:

Setup (page 100)
Cluster Management (page 102)
  Adding a Shard to a Cluster (page 102)
  Removing a Shard from a Cluster (page 103)
Chunk Management (page 104)
  Splitting Chunks (page 105)
  Create Chunks (Pre-Splitting) (page 105)
  Modifying Chunk Size (page 106)
  Migrating Chunks (page 107)
Balancer Operations (page 108)
  Check the Balancer Lock (page 108)
  Schedule the Balancing Window (page 109)
  Remove a Balancing Window Schedule (page 109)
  Disable the Balancer (page 109)
Config Server Maintenance (page 110)
  Upgrading from One Config Server to Three Config Servers (page 110)
  Migrating Config Servers with the Same Hostname (page 110)
  Migrating Config Servers with Different Hostnames (page 111)
  Replacing a Config Server (page 111)
  Backup Cluster Metadata (page 112)
Troubleshooting (page 112)
  All Data Remains on One Shard (page 112)
  One Shard Receives too much Traffic (page 113)
  The Cluster does not Balance (page 113)
  Migrations Render Cluster Unusable (page 113)
  Disable Balancing During Backups (page 114)

8.2.1 Setup
If you have an existing replica set, you can use the Convert a Replica Set to a Replicated Shard Cluster (page 167) tutorial as a guide. If you're deploying a shard cluster from scratch, see the Deploy a Shard Cluster (page 123) tutorial for more detail or use the following procedure as a quick starting point:

1. Provision the required hardware. The Requirements (page 94) section describes what you'll need to get started.

Warning: Sharding and localhost Addresses
If you use either localhost or 127.0.0.1 as the host identifier, then you must use localhost or 127.0.0.1 for all host settings for any MongoDB instances in the cluster. If you mix localhost addresses with remote host addresses, MongoDB will produce errors.

2. On all three (3) config server instances, issue the following command to start the mongod process:
mongod --configsvr

This starts a mongod instance running on TCP port 27019, with the data stored in the /data/configdb path. All other command line (page 485) and configuration file (page 494) options are available for config server instances.

Note: All config servers must be running and available when you first initiate a shard cluster.

3. Start a mongos instance by issuing the following command:

mongos --configdb config0.mongodb.example.net,config1.mongodb.example.net,config2.mongodb.exampl

4. Connect to the mongos instance using the mongo shell.


mongo mongos.mongodb.example.net

5. Add shards to the cluster. Run the following commands while connected to a mongos to initialize the cluster. First, you need to tell the cluster where to find the individual shards. You can do this using the addShard command.
sh.addShard( "[hostname]:[port]" )

For example:
sh.addShard( "mongodb0.example.net:27027" )

MongoDB will discover all other members of the replica set, if mongodb0.example.net:27027 is a member of a replica set.

Note: In production deployments, all shards should be replica sets.

Repeat this step for each new shard in your cluster.

Optional: You may specify a name as an argument to the addShard command, as follows:
db.runCommand( { addShard: "mongodb0.example.net", name: "mongodb0" } )

The sh.addShard() (page 480) helper takes only the host string, so use the addShard command form when you need to name the shard.

If you do not specify a shard name, then MongoDB will assign a name upon creation.

Note: Changed in version 2.0.3. Before version 2.0.3, you must specify the shard in the following form:
replicaSetName/<seed1>,<seed2>,<seed3>

For example, if the name of the replica set is repl0, then your sh.addShard (page 480) command would be:

sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:2

6. Enable sharding for any database that you want to shard. MongoDB enables sharding on a per-database basis. This is only a meta-data change and will not redistribute your data. To enable sharding for a given database, use the enableSharding command or the sh.enableSharding() (page 481) shell function.
db.runCommand( { enableSharding: [database] } )

Or:

sh.enableSharding([database])

Replace [database] with the name of the database you wish to enable sharding on.

Note: MongoDB creates databases automatically upon their first use. Once you enable sharding for a database, MongoDB assigns a primary shard for that database, where MongoDB stores all data before sharding begins.

7. Enable sharding on a per-collection basis. Finally, you must explicitly specify collections to shard. The collections must belong to a database for which you have enabled sharding. When you shard a collection, you also choose the shard key. To shard a collection, run the shardCollection command or the sh.shardCollection() (page 483) shell helper.
db.runCommand( { shardCollection: "[database].[collection]", key: "[shard-key]" } )

Or:
sh.shardCollection("[database].[collection]", "key")

For example:
db.runCommand( { shardCollection: "myapp.users", key: {username: 1} } )

Or:
sh.shardCollection("myapp.users", {username: 1})

The choice of shard key is incredibly important: it affects everything about the cluster from the efficiency of your queries to the distribution of data. Furthermore, you cannot change a collection's shard key once it has been set. See the Shard Key Overview (page 96) and the more in depth documentation of Shard Key Qualities (page 116) to help you select better shard keys. If you do not specify a shard key, MongoDB will shard the collection using the _id field.
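Steps 5 through 7 of the procedure above each send one command document. The sketch below only assembles those documents so that you can review them before running anything; the builder function names are hypothetical, and the host, database, collection, and key values are the example values used in this section.

```javascript
// Hypothetical builders for the command documents used in steps 5 to 7.
function addShardCommand(host, name) {
  const command = { addShard: host };
  if (name !== undefined) {
    command.name = name; // optional shard name
  }
  return command;
}

function enableShardingCommand(database) {
  return { enableSharding: database };
}

function shardCollectionCommand(database, collection, key) {
  return { shardCollection: database + "." + collection, key: key };
}

const setupCommands = [
  addShardCommand("mongodb0.example.net:27027", "mongodb0"),
  enableShardingCommand("myapp"),
  shardCollectionCommand("myapp", "users", { username: 1 }),
];

console.log(JSON.stringify(setupCommands, null, 2));

// In a mongo shell connected to a mongos, each document could then be run
// in order, for example:
//   setupCommands.forEach(function (c) { printjson(db.adminCommand(c)); });
```

Keeping the order fixed matters: the shard must exist before you enable sharding on a database, and the database must be enabled before you shard one of its collections.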

8.2.2 Cluster Management


Once you have a running shard cluster, you will need to maintain it. This section describes common maintenance procedures, including: how to add and remove nodes, how to manually split chunks, and how to disable the balancer for backups.

Adding a Shard to a Cluster

To add a shard to an existing shard cluster, use the following procedure:

1. Connect to a mongos in the cluster using the mongo shell.
2. First, you need to tell the cluster where to find the individual shards. You can do this using the addShard command or the sh.addShard() (page 480) helper:
sh.addShard( "[hostname]:[port]" )

Replace [hostname] and [port] with the hostname and TCP port number of where the shard is accessible. For example:

sh.addShard( "mongodb0.example.net:27027" )

Note: In production deployments, all shards should be replica sets.

Repeat for each shard in your cluster.

Optional: You may specify a name as an argument to the addShard command, as follows:
db.runCommand( { addShard: "mongodb0.example.net", name: "mongodb0" } )

The sh.addShard() (page 480) helper takes only the host string, so use the addShard command form when you need to name the shard.

If you do not specify a shard name, then MongoDB will assign a name upon creation. Changed in version 2.0.3. Note: It may take some time for chunks to migrate to the new shard. See the Balancing and Distribution (page 98) section for an overview of the balancing operation and the Balancing Internals (page 119) section for additional information.

Removing a Shard from a Cluster

To remove a shard from a shard cluster, you must:

• Migrate chunks to another shard or database.
• Ensure that this shard is not the primary shard for any databases in the cluster. If it is, move the primary status for these databases to other shards.
• Finally, remove the shard from the cluster's configuration.

Note: To successfully migrate data from a shard, the balancer process must be active.

The procedure to remove a shard is as follows:

1. Connect to a mongos in the cluster using the mongo shell.
2. Determine the name of the shard you will be removing. You must specify the name of the shard. You may have specified this shard name when you first ran the addShard command. If not, you can find out the name of the shard by running the listshards or printShardingStatus commands or the sh.status() (page 484) shell helper. The following examples will remove a shard named mongodb0 from the cluster.
3. Begin removing chunks from the shard. Start by running the removeShard command. This will start draining or migrating chunks from the shard you're removing to another shard in the cluster.

db.runCommand( { removeshard: "mongodb0" } )

This operation will return the following response immediately:


{ msg : "draining started successfully" , state: "started" , shard :"mongodb0" , ok : 1 }

Depending on your network capacity and the amount of data in the shard, this operation can take anywhere from a few minutes to several days to complete.

4. View progress of the migration. You can run the removeShard command again at any stage of the process to view the progress of the migration, as follows:
db.runCommand( { removeShard: "mongodb0" } )

The output should look something like this:


{ msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 42, dbs : 1 }, ok: 1 }

In the remaining sub-document { chunks: xx, dbs: y }, a counter displays the remaining number of chunks that MongoDB must migrate to other shards and the number of MongoDB databases that have primary status on this shard. Continue checking the status of the removeShard command until the remaining number of chunks to transfer is 0.

5. Move any databases to other shards in the cluster as needed. This is only necessary when removing a shard that is also the primary shard for one or more databases. Issue the following command at the mongo shell:
db.runCommand( { movePrimary: "myapp", to: "mongodb1" })

This command will migrate all remaining non-sharded data in the database named myapp to the shard named mongodb1.

Warning: Do not run the movePrimary until you have finished draining the shard.

The command will not return until MongoDB completes moving all data. The response from this command will resemble the following:
{ "primary" : "mongodb1", "ok" : 1 }

6. Run removeShard again to clean up all metadata information and finalize the shard removal, as follows:
db.runCommand( { removeshard: "mongodb0" } )

When successful, this command will return a document like this:


{ msg: "remove shard completed successfully" , stage: "completed", host: "mongodb0", ok : 1 }

Once the value of the stage field is completed, you may safely stop the processes comprising the mongodb0 shard.
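When you repeat the removeShard command to watch the drain, the only fields you need are in the remaining sub-document. The sketch below reads a response shaped like the examples in this procedure; drainingProgress is a hypothetical helper, and the sample document is copied from step 4.

```javascript
// Hypothetical helper: summarise a removeShard response document.
function drainingProgress(response) {
  const remaining = response.remaining || null;
  return {
    // null means the response carried no "remaining" sub-document.
    chunksLeft: remaining ? remaining.chunks : null,
    databasesLeft: remaining ? remaining.dbs : null,
    chunksDrained: remaining !== null && remaining.chunks === 0,
  };
}

// Sample response from step 4 of the procedure.
const ongoing = {
  msg: "draining ongoing",
  state: "ongoing",
  remaining: { chunks: 42, dbs: 1 },
  ok: 1,
};

console.log(drainingProgress(ongoing));
// { chunksLeft: 42, databasesLeft: 1, chunksDrained: false }
```

A non-zero databasesLeft after the chunks are drained indicates that step 5 (movePrimary) still applies to this shard.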

8.2.3 Chunk Management


This section describes various operations on chunks in shard clusters. MongoDB automates these processes; however, in some cases, particularly when you're setting up a shard cluster, you may need to create and manipulate chunks directly.

Splitting Chunks

Normally, MongoDB splits a chunk following inserts when a chunk exceeds the chunk size (page 120). The balancer may migrate recently split chunks to a new shard immediately if mongos predicts future insertions will benefit from the move.

MongoDB treats all chunks the same, whether split manually or automatically by the system.

Warning: You cannot merge or combine chunks once you have split them.

You may want to split chunks manually if:

• you have a large amount of data in your cluster and very few chunks, as is the case after creating a shard cluster from existing data.
• you expect to add a large amount of data that would initially reside in a single chunk or shard.

Example: You plan to insert a large amount of data with shard key values between 300 and 400, but all values of your shard key between 250 and 500 are in a single chunk.

Use sh.status() (page 484) to determine the current chunk ranges across the cluster.

To split chunks manually, use the split command with operators: middle and find. The equivalent shell helpers are sh.splitAt() (page 483) or sh.splitFind() (page 483).

Example: The following command will split the chunk that contains the value of 63109 for the zipcode field:
sh.splitFind( { "zipcode": 63109 } )

sh.splitFind() (page 483) will split the chunk that contains the first document returned that matches this query into two equally sized chunks. The query in sh.splitFind() (page 483) need not contain the shard key, though it almost always makes sense to query for the shard key in this case, and including the shard key will expedite the operation.

Use sh.splitAt() (page 483) to split a chunk in two using the queried document as the partition point:
sh.splitAt( { "zipcode": 63109 } )

However, the location of the document that this query finds with respect to the other documents in the chunk does not affect how the chunk splits.

Create Chunks (Pre-Splitting)

In most situations a shard cluster will create and distribute chunks automatically without user intervention. However, in a limited number of use profiles, MongoDB cannot create enough chunks or distribute data fast enough to support required throughput. Consider the following scenarios:

• you must partition an existing data collection that resides on a single shard.

• you must ingest a large volume of data into a shard cluster that isn't balanced, or where the ingestion of data will lead to an imbalance of data. This can arise in an initial data loading, or in a case where you must insert a large volume of data into a single chunk, for example when you must insert at the beginning or end of the chunk range, as happens with monotonically increasing or decreasing shard keys.

Preemptively splitting chunks increases cluster throughput for these operations, by reducing the overhead of migrating chunks that hold data during the write operation. MongoDB only creates splits after an insert operation, and can only migrate a single chunk at a time. Chunk migrations are resource intensive and further complicated by large write volume to the migrating chunk.

To create and migrate chunks manually, use the following procedure:

1. Split empty chunks in your collection by manually performing the split command on chunks.

Example: To create chunks for documents in the myapp.users collection, using the email field as the shard key, use the following operation in the mongo shell:
for ( var x=97; x<97+26; x++ ){
  for( var y=97; y<97+26; y+=6 ) {
    var prefix = String.fromCharCode(x) + String.fromCharCode(y);
    db.runCommand( { split : "myapp.users" , middle : { email : prefix } } );
  }
}

This assumes a collection size of 100 million documents.

2. Migrate chunks manually using the moveChunk command:

Example: To migrate all of the manually created user profiles evenly, putting each prefix chunk on the next shard from the other, run the following commands in the mongo shell:

var shServer = [ "sh0.example.net", "sh1.example.net", "sh2.example.net", "sh3.example.net",
for ( var x=97; x<97+26; x++ ){
  for( var y=97; y<97+26; y+=6 ) {
    var prefix = String.fromCharCode(x) + String.fromCharCode(y);
    db.adminCommand({moveChunk : "myapp.users", find : {email : prefix}, to : shServer[(y-97
  }
}
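The split points used in the two examples above can be computed and inspected before you run any split or moveChunk command. The sketch below reproduces the same two-letter prefixes as the loops above and then pairs each prefix with a shard. The helper names are hypothetical, the shard host names are placeholders, and the round-robin placement by position is an assumption for illustration rather than the placement rule of the original example.

```javascript
// Generate the two-letter prefixes used as split points: the first letter
// runs over a..z, the second letter advances in steps (6 in the example).
function splitPrefixes(step) {
  const prefixes = [];
  for (let x = 97; x < 97 + 26; x++) {
    for (let y = 97; y < 97 + 26; y += step) {
      prefixes.push(String.fromCharCode(x) + String.fromCharCode(y));
    }
  }
  return prefixes;
}

// Hypothetical placement rule: assign prefixes to shards in rotation.
function assignRoundRobin(prefixes, shards) {
  return prefixes.map((prefix, i) => ({ prefix: prefix, to: shards[i % shards.length] }));
}

const prefixes = splitPrefixes(6);
console.log(prefixes.length);      // 130
console.log(prefixes.slice(0, 5)); // [ 'aa', 'ag', 'am', 'as', 'ay' ]

// Placeholder shard hosts, not taken from a real deployment.
const plan = assignRoundRobin(prefixes, ["sh0.example.net", "sh1.example.net"]);
console.log(plan.slice(0, 3));
```

Each entry in the plan corresponds to one moveChunk command with find set to { email: prefix } and to set to the chosen shard.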

You can also let the balancer automatically distribute the new chunks.

Modifying Chunk Size

When you initialize a shard cluster, the default chunk size is 64 megabytes. This default chunk size works well for most deployments. However, if you notice that automatic migrations are incurring a level of I/O that your hardware cannot handle, you may want to reduce the chunk size. For the automatic splits and migrations, a small chunk size leads to more rapid and frequent migrations.

To modify the chunk size, use the following procedure:

1. Connect to any mongos in the cluster using the mongo shell.

2. Issue the following command to switch to the config database:


use config

3. Issue the following save() (page 463) operation:


db.settings.save( { _id:"chunksize", value: <size> } )

Where the value of <size> reflects the new chunk size in megabytes. Here, you're essentially writing a document whose values store the global chunk size configuration value.

Note: The chunkSize (page 503) and --chunkSize (page 493) options, passed at runtime to the mongos, do not affect the chunk size after you have initialized the cluster. To eliminate confusion you should always set chunk size using the above procedure and never use the runtime options.

Modifying the chunk size has several limitations:

• Automatic splitting only occurs when inserting documents or updating existing documents.
• If you lower the chunk size it may take time for all chunks to split to the new size.
• Splits cannot be undone.

If you increase the chunk size, existing chunks must grow through insertion or updates until they reach the new size.

Migrating Chunks

In most circumstances, you should let the automatic balancer migrate chunks between shards. However, you may want to migrate chunks manually in a few cases:

• If you create chunks by pre-splitting the data in your collection, you will have to migrate chunks manually to distribute chunks evenly across the shards. Use pre-splitting in limited situations, to support bulk data ingestion.
• If the balancer in an active cluster cannot distribute chunks within the balancing window, then you will have to migrate chunks manually.

See the chunk migration (page 121) section to understand the internal process of how chunks move between shards.

To migrate chunks, use the moveChunk command.

Note: To return a list of shards, use the listshards command. Specify shard names using the addShard command using the name argument. If you do not specify a name in the addShard command, MongoDB will assign a name automatically.

The following example assumes that the field username is the shard key for a collection named users in the myapp database, and that the value smith exists within the chunk you want to migrate.
To move this chunk, you would issue the following command from a mongo shell connected to any mongos instance.

db.adminCommand({moveChunk : "myapp.users", find : {username : "smith"}, to : "mongodb-shard3.example

This command moves the chunk that includes the shard key value "smith" to the shard named mongodb-shard3.example.net. The command will block until the migration is complete. See Create Chunks (Pre-Splitting) (page 105) for an introduction to pre-splitting.

New in version 2.2: The moveChunk command has the _secondaryThrottle parameter. When set to true, MongoDB ensures that secondary members have replicated operations before allowing new chunk migrations.
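The command document described above can be assembled programmatically, for instance when you script migrations from a driver. The following is a minimal sketch in plain JavaScript; buildMoveChunk is a hypothetical helper name, not part of the mongo shell, and it only builds the document without contacting a cluster.

```javascript
// Hypothetical helper (not part of the mongo shell): builds the command
// document that db.adminCommand() expects for a manual chunk migration.
function buildMoveChunk(namespace, findQuery, toShard, secondaryThrottle) {
  const cmd = { moveChunk: namespace, find: findQuery, to: toShard };
  // _secondaryThrottle is only available starting in version 2.2,
  // so include it only when the caller asks for it.
  if (secondaryThrottle !== undefined) {
    cmd._secondaryThrottle = Boolean(secondaryThrottle);
  }
  return cmd;
}

const moveCmd = buildMoveChunk(
  "myapp.users",
  { username: "smith" },
  "mongodb-shard3.example.net",
  true
);
console.log(JSON.stringify(moveCmd));
// In a mongo shell connected to a mongos you would then run:
//   db.adminCommand(moveCmd)
```

Keeping the document construction separate from the call makes it easy to log or review a migration before issuing it.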


Warning: The moveChunk command may produce the following error message:
The collection's metadata lock is already taken.

These errors occur when clients have too many open cursors that access the chunk you are migrating. You can either wait until the cursors complete their operation or close the cursors manually.

8.2.4 Balancer Operations


This section provides an overview of common administrative procedures related to balancing and the balancing process.

See Also: Balancing and Distribution (page 98) and the moveChunk command, which provides manual chunk migrations.

Check the Balancer Lock

To see if the balancer process is active in your shard cluster, do the following:
1. Connect to any mongos in the cluster using the mongo shell.
2. Issue the following command to switch to the config database:
use config

3. Use the following query to return the balancer lock:


db.locks.find( { _id : "balancer" } ).pretty()

You can also use the following shell helper to return the same information:
sh.getBalancerState().pretty()

When this command returns, you will see output like the following:
{
  "_id" : "balancer",
  "process" : "mongos0.example.net:1292810611:1804289383",
  "state" : 2,
  "ts" : ObjectId("4d0f872630c42d1978be8a2e"),
  "when" : "Mon Dec 20 2010 11:41:10 GMT-0500 (EST)",
  "who" : "mongos0.example.net:1292810611:1804289383:Balancer:846930886",
  "why" : "doing balance round"
}

Here's what this tells you:
• The balancer originates from the mongos running on the system with the hostname mongos0.example.net.
• The value in the state field indicates that a mongos has the lock. For version 2.0 and later, the value of an active lock is 2; for earlier versions the value is 1.

Note: Use the sh.isBalancerRunning() (page 481) helper in the mongo shell to determine if the balancer is running, as follows:
sh.isBalancerRunning()
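If you read the lock document from a driver rather than the shell, the interpretation rules above can be applied in code. The sketch below is plain JavaScript; describeBalancerLock is a hypothetical helper, and the assumption that the hostname is the first colon-separated part of the process field is based only on the sample output shown above.

```javascript
// Hypothetical helper: interprets a document from config.locks.
// activeState is 2 for version 2.0 and later, 1 for earlier versions.
function describeBalancerLock(lockDoc, activeState) {
  if (activeState === undefined) activeState = 2;
  if (!lockDoc) return { held: false, host: null, reason: null };
  const held = lockDoc.state === activeState;
  // Assumption: "process" looks like "<hostname>:<number>:<number>",
  // as in the sample output above.
  const host =
    typeof lockDoc.process === "string" ? lockDoc.process.split(":")[0] : null;
  return { held: held, host: host, reason: held ? lockDoc.why : null };
}

const sampleLock = {
  _id: "balancer",
  process: "mongos0.example.net:1292810611:1804289383",
  state: 2,
  why: "doing balance round"
};
console.log(describeBalancerLock(sampleLock));
```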


Schedule the Balancing Window

In some situations, particularly when your data set grows slowly and a migration can impact performance, it's useful to be able to ensure that the balancer is active only at certain times. Use the following procedure to specify a window during which the balancer will be able to migrate chunks:
1. Connect to any mongos in the cluster using the mongo shell.
2. Issue the following command to switch to the config database:
use config

3. Use an operation modeled on the following example update() (page 464) operation to modify the balancer's window:

db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "<start-time>", sto

Replace <start-time> and <end-time> with time values using two digit hour and minute values (e.g. HH:MM) that describe the beginning and end boundaries of the balancing window. MongoDB evaluates these times relative to the time zone of each individual mongos instance in the shard cluster. For instance, running the following will force the balancer to run between 11PM and 6AM local time only:

db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "23:00", stop : "6:

Note: The balancer window must be sufficient to complete the migration of all data inserted during the day. As data insert rates can change based on activity and usage patterns, it is important to ensure that the balancing window you select will be sufficient to support the needs of your deployment.
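To reason about a window that crosses midnight, such as 23:00 to 6:00, it can help to model it in code. The following plain JavaScript sketch validates HH:MM values, builds the activeWindow update document, and checks whether a given local time falls inside the window. The helper names are hypothetical, and treating the stop time as exclusive is an assumption of this sketch, not documented mongos behavior.

```javascript
// Accepts one or two digit hours so that values like "6:00" also pass.
const HHMM = /^([01]?\d|2[0-3]):[0-5]\d$/;

// Convert "HH:MM" to minutes since midnight, rejecting malformed values.
function toMinutes(t) {
  if (!HHMM.test(t)) throw new Error("invalid time: " + t);
  const parts = t.split(":").map(Number);
  return parts[0] * 60 + parts[1];
}

// Build the query and update documents for db.settings.update().
function buildWindowUpdate(start, stop) {
  toMinutes(start);
  toMinutes(stop);
  return {
    query: { _id: "balancer" },
    update: { $set: { activeWindow: { start: start, stop: stop } } }
  };
}

// Assumption: the start boundary is inclusive and the stop boundary exclusive.
function inWindow(now, start, stop) {
  const n = toMinutes(now), s = toMinutes(start), e = toMinutes(stop);
  return s <= e ? n >= s && n < e : n >= s || n < e;
}

console.log(JSON.stringify(buildWindowUpdate("23:00", "6:00")));
console.log(inWindow("02:15", "23:00", "6:00"));
```

In a mongo shell, the query and update members would be passed as the first two arguments of db.settings.update() after switching to the config database.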

Remove a Balancing Window Schedule

If you have set the balancing window (page 109) and wish to remove the schedule so that the balancer is always running, issue the following sequence of operations:
use config
db.settings.update({ _id : "balancer" }, { $unset : { activeWindow : true } })

Disable the Balancer

By default the balancer may run at any time and only moves chunks as needed. To disable the balancer for a short period of time and prevent all migration, use the following procedure:
1. Connect to any mongos in the cluster using the mongo shell.
2. Issue the following command to disable the balancer:
sh.setBalancerState(false)

3. Later, issue the following command to enable the balancer:


sh.setBalancerState(true)

Note: If a balancing round is in progress, the system will complete the current round before the balancer is officially disabled. After disabling, you can use the sh.getBalancerState() (page 482) shell function to determine whether the balancer is in fact disabled.


The above process and the sh.setBalancerState() (page 482) helper provide a wrapper on the following process, which may be useful if you need to run this operation from a driver that does not have helper functions:
1. Connect to any mongos in the cluster using the mongo shell.
2. Issue the following command to switch to the config database:
use config

3. Issue the following update to disable the balancer:


db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true );

4. To enable the balancer again, alter the value of stopped as follows:


db.settings.update( { _id: "balancer" }, { $set : { stopped: false } } , true );

8.2.5 Config Server Maintenance


Config servers store all shard cluster metadata, perhaps most notably, the mapping from chunks to shards. This section provides an overview of the basic procedures to migrate, replace, and maintain these servers.

See Also: Config Servers (page 96)

Upgrading from One Config Server to Three Config Servers

For redundancy, all production shard clusters should deploy three config server processes on three different machines. Do not use only a single config server for production deployments. Only use a single config server deployment for testing. You should upgrade to three config servers immediately if you are shifting to production. The following process shows how to upgrade from one to three config servers.
1. Shut down all existing MongoDB processes. This includes:
   • all mongod instances or replica sets that provide your shards.
   • the mongod instance that provides your existing config database.
   • all mongos instances in your cluster.
2. Copy the entire dbpath (page 497) file system tree from the existing config server to the two machines that will provide the additional config servers. These commands, issued on the system with the existing config database, mongo-config0.example.net, may resemble the following:
rsync -az /data/configdb mongo-config1.example.net:/data/configdb
rsync -az /data/configdb mongo-config2.example.net:/data/configdb

3. Start all three config servers, using the same invocation that you used for the single config server.
mongod --configsvr

4. Restart all shard mongod and mongos processes.

Migrating Config Servers with the Same Hostname

Use this process when you need to migrate a config server to a new system but the new system will be accessible using the same host name.


1. Shut down the config server that you're moving. This will render all config data for your cluster read only (page 96).
2. Change the DNS entry that points to the system that provided the old config server, so that the same hostname points to the new system. How you do this depends on how you organize your DNS and hostname resolution services.
3. Move the entire dbpath (page 497) file system tree from the old config server to the new config server. This command, issued on the old config server system, may resemble the following:
rsync -az /data/configdb mongo-config0.example.net:/data/configdb

4. Start the config instance on the new system. The default invocation is:
mongod --configsvr

When you start the third config server, your cluster will become writable and it will be able to create new splits and migrate chunks as needed.

Migrating Config Servers with Different Hostnames

Use this process when you need to migrate a config database to a new server and it will not be accessible via the same hostname. If possible, avoid changing the hostname so that you can use the previous procedure (page 110).
1. Shut down the config server (page 96) you're moving. This will render all config data for your cluster read only. Then copy the server's dbpath file system tree to the new system. This command may resemble the following:
rsync -az /data/configdb mongodb.config2.example.net:/data/configdb

2. Start the config instance on the new system. The default invocation is:
mongod --configsvr

3. Shut down all existing MongoDB processes. This includes:
   • all mongod instances or replica sets that provide your shards.
   • the mongod instances that provide your existing config databases.
   • all mongos instances in your cluster.
4. Restart all mongod processes that provide the shard servers.
5. Update the --configdb (page 493) parameter (or configdb (page 502)) for all mongos instances and restart all mongos instances.

Replacing a Config Server

Use this procedure only if you need to replace one of your config servers after it becomes inoperable (e.g. hardware failure.) This process assumes that the hostname of the instance will not change. If you must change the hostname of the instance, use the process for migrating a config server to a different hostname (page 111).
1. Provision a new system, with the same hostname as the previous host. You will have to ensure that the new system has the same IP address and hostname as the system it's replacing, or you will need to modify the DNS records and wait for them to propagate.


2. Shut down one (and only one) of the existing config servers. Copy all of this host's dbpath (page 497) file system tree from the current system to the system that will provide the new config server. This command, issued on the system with the data files, may resemble the following:
rsync -az /data/configdb mongodb.config2.example.net:/data/configdb

3. Restart the config server process that you used in the previous step to copy the data files to the new config server instance.
4. Start the new config server instance. The default invocation is:
mongod --configsvr

Backup Cluster Metadata

Because the shard cluster will remain operational [3] without one of the config database's mongod instances, creating a backup of the cluster metadata from the config database is straightforward:
1. Shut down one of the config databases.
2. Create a full copy of the data files (i.e. the path specified by the dbpath (page 497) option for the config instance.)
3. Restart the original configuration server.

See Also: Backup and Restoration Strategies (page 156).

8.2.6 Troubleshooting
The two most important factors in maintaining a successful shard cluster are: choosing an appropriate shard key (page 116) and sufficient capacity to support current and future operations (page 94). You can prevent most issues encountered with sharding by ensuring that you choose the best possible shard key for your deployment and that you always add capacity to your cluster well before the current resources become saturated. Continue reading for specific issues you may encounter in a production environment.

All Data Remains on One Shard

Your cluster must have sufficient data for sharding to make sense. Sharding works by migrating chunks between the shards until each shard has roughly the same number of chunks. The default chunk size is 64 megabytes. MongoDB will not begin migrations until the imbalance of chunks in the cluster exceeds the migration threshold (page 120). While the default chunk size is configurable with the chunkSize (page 503) setting, these behaviors help prevent unnecessary chunk migrations, which can degrade the performance of your cluster as a whole.

If you have just deployed a shard cluster, make sure that you have enough data to make sharding effective. If you do not have sufficient data to create more than eight 64 megabyte chunks, then all data will remain on one shard. Either lower the chunk size (page 120) setting, or add more data to the cluster.
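The "more than eight 64 megabyte chunks" rule of thumb above amounts to simple arithmetic. The following plain JavaScript sketch estimates a chunk count from a data size; both function names are hypothetical, and the estimate assumes every chunk is filled to the configured size, which real clusters rarely achieve, so treat the result as a rough lower bound rather than a prediction.

```javascript
// Rough estimate: number of chunks if every chunk were completely full.
function estimateChunks(dataSizeMB, chunkSizeMB) {
  if (chunkSizeMB === undefined) chunkSizeMB = 64; // default chunk size
  return Math.ceil(dataSizeMB / chunkSizeMB);
}

// Mirrors the rule of thumb in the text: data spreads across shards only
// once there are more than eight chunks (the threshold is a parameter so
// you can experiment with other values).
function likelyToMigrate(dataSizeMB, chunkSizeMB, threshold) {
  if (threshold === undefined) threshold = 8;
  return estimateChunks(dataSizeMB, chunkSizeMB) > threshold;
}

console.log(estimateChunks(400));       // 400 MB at the default chunk size
console.log(likelyToMigrate(400));      // too little data at 64 MB chunks
console.log(likelyToMigrate(400, 16));  // same data with a smaller chunk size
```

The last two calls illustrate the two remedies named above: the same 400 MB stays on one shard at the default size but clears the bar once the chunk size is lowered.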
[3] While one of the three config servers is unavailable, the cluster cannot split any chunks nor can it migrate chunks between shards. Your application will still be able to write data to the cluster. The Config Servers (page 96) section of the documentation provides more information on this topic.


As a related problem, the system will split chunks only on inserts or updates, which means that if you configure sharding and do not continue to issue insert and update operations, the database will not create any chunks. You can either wait until your application inserts data or split chunks manually (page 105).

Finally, if your shard key has a low cardinality (page 116), MongoDB may not be able to create sufficient splits among the data.

One Shard Receives too much Traffic

In some situations, a single shard or a subset of the cluster will receive a disproportionate portion of the traffic and workload. In almost all cases this is the result of a shard key that does not effectively allow write scaling (page 117).

It's also possible that you have "hot chunks." In this case, you may be able to solve the problem by splitting and then migrating parts of these chunks.

In the worst case, you may have to consider re-sharding your data and choosing a different shard key (page 118) to correct this pattern.

The Cluster does not Balance

If you have just deployed your shard cluster, you may want to consider the troubleshooting suggestions for a new cluster where data remains on a single shard (page 112).

If the cluster was initially balanced, but later developed an uneven distribution of data, consider the following possible causes:
• You have deleted or removed a significant amount of data from the cluster. If you have added additional data, it may have a different distribution with regards to its shard key.
• Your shard key has low cardinality (page 116) and MongoDB cannot split the chunks any further.
• Your data set is growing faster than the balancer can distribute data around the cluster. This is uncommon and typically is the result of:
   - a balancing window (page 109) that is too short, given the rate of data growth.
   - an uneven distribution of write operations (page 117) that requires more data migration. You may have to choose a different shard key to resolve this issue.
   - poor network connectivity between shards, which may lead to chunk migrations that take too long to complete. Investigate your network configuration and interconnections between shards.

Migrations Render Cluster Unusable

If migrations impact your cluster or application's performance, consider the following options, depending on the nature of the impact:
1. If migrations only interrupt your clusters sporadically, you can limit the balancing window (page 109) to prevent balancing activity during peak hours. Ensure that there is enough time remaining to keep the data from becoming out of balance again.
2. If the balancer is always migrating chunks to the detriment of overall cluster performance:
   • You may want to attempt decreasing the chunk size (page 106) to limit the size of the migration.
   • Your cluster may be over capacity, and you may want to attempt to add one or two shards (page 102) to the cluster to distribute load.


It's also possible that your shard key causes your application to direct all writes to a single shard. This kind of activity pattern can require the balancer to migrate most data soon after writing it. Consider redeploying your cluster with a shard key that provides better write scaling (page 117).

Disable Balancing During Backups

If MongoDB migrates a chunk during a backup (page 156), you can end up with an inconsistent snapshot of your shard cluster. Never run a backup while the balancer is active. To ensure that the balancer is inactive during your backup operation:
• Set the balancing window (page 109) so that the balancer is inactive during the backup. Ensure that the backup can complete while you have the balancer disabled.
• Manually disable the balancer (page 109) for the duration of the backup procedure.
Confirm that the balancer is not active using the sh.getBalancerState() (page 482) method before starting a backup operation. When the backup procedure is complete you can reactivate the balancer process.

8.3 Shard Cluster Architectures


This document describes the organization and design of shard cluster deployments. See Also: The Shard Cluster Administration (page 99) document, the Sharding Requirements (page 94) section, and the Sharding Tutorials (page 123) for more information on deploying and maintaining a shard cluster.

8.3.1 Deploying a Test Cluster


Warning: Use this architecture for testing and development only.

You can deploy a very minimal shard cluster for testing and development. Such a cluster will have the following components:
• 1 config server (page 96).
• At least one mongod instance (either replica sets or as a standalone node.)
• 1 mongos instance.

8.3.2 Deploying a Production Cluster


When deploying a shard cluster to production, you must ensure that the data is redundant and that your systems are highly available. To that end, a production-level shard cluster must have the following components:
• 3 config servers (page 96), each residing on a discrete system.
  Note: A single shard cluster must have exclusive use of its config servers (page 96). If you have multiple shard clusters, you will need to have a group of config servers for each cluster.


• 2 or more replica sets, for the shards.
  See Also: Replication Architectures (page 47) and Replication (page 31) for more information on replica sets.
• mongos instances. Typically, you will deploy a single mongos instance on each application server. Alternatively, you may deploy several mongos nodes and let your application connect to these via a load balancer.

See Also: The Adding a Shard to a Cluster (page 102) and Removing a Shard from a Cluster (page 103) procedures for more information.

8.3.3 Sharded and Non-Sharded Data


Sharding operates on the collection level. You can shard multiple collections within a database, or have multiple databases with sharding enabled. [4] However, in production deployments some databases and collections will use sharding, while other databases and collections will only reside on a single database instance or replica set (i.e. a shard.)

Note: Regardless of the data architecture of your shard cluster, ensure that all queries and operations use the mongos router to access the data cluster. Use the mongos even for operations that do not impact the sharded data.

Every database has a "primary" [5] shard that holds all un-sharded collections in that database. All collections that are not sharded reside on the primary for their database. Use the moveprimary command to change the primary shard for a database. Use the printShardingStatus command or the sh.status() (page 484) helper to see an overview of the cluster, which contains information about the chunk and database distribution within the cluster.

Warning: The moveprimary command can be expensive because it copies all non-sharded data to the new shard, during which that data will be unavailable for other operations.

When you deploy a new shard cluster, the first shard becomes the primary for all databases before enabling sharding. Databases created subsequently may reside on any shard in the cluster.

8.3.4 High Availability and MongoDB


A production (page 114) shard cluster has no single point of failure. This section introduces the availability concerns for MongoDB deployments, and highlights potential failure scenarios and available resolutions:

• Application servers or mongos instances become unavailable.
  If each application server has its own mongos instance, other application servers can continue to access the database. Furthermore, mongos instances do not maintain persistent state, and they can restart and become unavailable without losing any state or data. When a mongos instance starts, it retrieves a copy of the config database and can begin routing queries.
• A single mongod becomes unavailable in a shard.
  Replica sets (page 31) provide high availability for shards. If the unavailable mongod is a primary, then the replica set will elect (page 34) a new primary. If the unavailable mongod is a secondary and it reconnects within its recovery window (page 36), it can catch up with the rest of the set. In a three member replica set, even if a single member of the set experiences catastrophic failure, two other members have full copies of the data.
[4] As you configure sharding, you will use the enablesharding command to enable sharding for a database. This simply makes it possible to use the shardcollection command on a collection within that database.
[5] The term "primary" in the context of databases and sharding has nothing to do with the term primary in the context of replica sets.


  Always investigate availability interruptions and failures. If a system is unrecoverable, replace it and create a new member of the replica set as soon as possible to replace the lost redundancy.
• All members of a replica set become unavailable.
  If all members of a replica set within a shard are unavailable, all data held on that shard is unavailable. However, the data on all other shards will remain available, and it's possible to read and write data to the other shards. Your application must be able to deal with partial results, and you should investigate the cause of the interruption and attempt to recover the shard as soon as possible.
• One or two config databases become unavailable.
  Three distinct mongod instances provide the config database, using special two-phase commits to maintain consistent state between these mongod instances. Shard cluster operation will continue as normal, but chunk migration (page 98) stops and the cluster can create no new chunk splits (page 105). Replace the config server as soon as possible. If all config databases become unavailable, the cluster can become inoperable.

Note: All config servers must be running and available when you first initiate a shard cluster.

8.4 Sharding Internals


This document introduces lower level sharding concepts for users who are familiar with sharding generally and want to learn more about the internals of sharding in MongoDB. The Sharding Fundamentals (page 93) document provides an overview of higher level sharding concepts while the Shard Cluster Administration (page 99) provides an overview of common administrative tasks.

8.4.1 Shard Keys


Shard keys are the field in a collection that MongoDB uses to distribute documents within a shard cluster. See the overview of shard keys (page 96) for an introduction to these topics.

Cardinality

Cardinality, in the context of MongoDB, refers to the ability of the system to partition data into chunks. For example, consider a collection of data such as an "address book" that stores address records:

• Consider the use of a state field as a shard key:
  The state key's value holds the US state for a given address document. This field has a low cardinality as all documents that have the same value in the state field must reside on the same shard, even if a particular state's chunk exceeds the maximum chunk size.
  Since there are a limited number of possible values for the state field, MongoDB may distribute data unevenly among a small number of fixed chunks. This may have a number of effects:
   - If MongoDB cannot split a chunk because all of its documents have the same shard key, migrations involving these un-splittable chunks will take longer than other migrations, and it will be more difficult for your data to stay balanced.
   - If you have a fixed maximum number of chunks, you will never be able to use more than that number of shards for this collection.
• Consider the use of a zipcode field as a shard key:


  While this field has a large number of possible values, and thus has potentially higher cardinality, it's possible that a large number of users could have the same value for the shard key, which would make this chunk of users un-splittable.
  In these cases, cardinality depends on the data. If your address book stores records for a geographically distributed contact list (e.g. "Dry cleaning businesses in America,") then a value like zipcode would be sufficient. However, if your address book is more geographically concentrated (e.g. "ice cream stores in Boston Massachusetts,") then you may have a much lower cardinality.
• Consider the use of a phone-number field as a shard key:
  Phone number has a high cardinality. Because users will generally have a unique value for this field, MongoDB will be able to split as many chunks as needed.

While high cardinality is necessary for ensuring an even distribution of data, having a high cardinality does not guarantee sufficient query isolation (page 117) or appropriate write scaling (page 117). Please continue reading for more information on these topics.

Write Scaling

Some possible shard keys will allow your application to take advantage of the increased write capacity that the shard cluster can provide, while others do not. Consider the following example where you shard by the values of the default _id field, which is ObjectID.

ObjectID is computed upon document creation and is a unique identifier for the object. However, the most significant bits of data in this value represent a time stamp, which means that they increment in a regular and predictable pattern. Even though this value has high cardinality (page 116), when using this, any date, or other monotonically increasing number as the shard key, all insert operations will be storing data into a single chunk, and therefore, a single shard. As a result, the write capacity of this shard will define the effective write capacity of the cluster.

A shard key that increases monotonically will not hinder performance if you have a very low insert rate, or if most of your write operations are update() (page 464) operations distributed through your entire data set. Generally, choose shard keys that have both high cardinality and will distribute write operations across the entire cluster.

Typically, a computed shard key that has some amount of "randomness," such as ones that include a cryptographic hash (i.e. MD5 or SHA1) of other content in the document, will allow the cluster to scale write operations. However, random shard keys do not typically provide query isolation (page 117), which is another important characteristic of shard keys.

Querying

The mongos provides an interface for applications to interact with shard clusters that hides the complexity of data partitioning. A mongos receives queries from applications, and uses metadata from the config server (page 96) to route queries to the mongod instances with the appropriate data. While the mongos succeeds in making all querying operational in sharded environments, the shard key you select can have a profound effect on query performance.

See Also: The mongos and Sharding (page 97) and config server (page 96) sections for a more general overview of querying in sharded environments.
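As an illustration of the computed shard key described under Write Scaling, the sketch below derives an MD5 hex digest from another field in the application layer before the document is inserted. It is plain JavaScript for Node.js using the standard crypto module; the field name userHash and both helper names are hypothetical choices for this example, not a MongoDB convention.

```javascript
const crypto = require("crypto");

// Compute a hex MD5 digest of the given value. The digest is effectively
// random with respect to insertion order, so consecutive inserts land in
// different parts of the key space.
function computeShardKey(value) {
  return crypto.createHash("md5").update(String(value)).digest("hex");
}

// Return a copy of the document with the computed key stored in a
// hypothetical "userHash" field that you would then shard on.
function withShardKey(doc) {
  return Object.assign({}, doc, { userHash: computeShardKey(doc.username) });
}

const prepared = withShardKey({ username: "smith", state: "NY" });
console.log(prepared.userHash.length);
```

The trade-off noted above still applies: a query by username must also supply the computed userHash value for mongos to route it to a single shard.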
Query Isolation

The fastest queries in a sharded environment are those that mongos will route to a single shard, using the shard key and the cluster metadata from the config server (page 96). For queries that don't include the shard key, mongos must query all shards, wait for their response and then return the result to the application. These "scatter/gather" queries can be long running operations.

If your query includes the first component of a compound shard key [6], the mongos can route the query directly to a single shard, or a small number of shards, which provides better performance. Even if the shard key values that you query reside in different chunks, the mongos will route queries directly to specific shards.

To select a shard key for a collection:
• determine the most commonly included fields in queries for a given application
• find which of these operations are most performance dependent.

If this field has low cardinality (i.e. not sufficiently selective) you should add a second field to the shard key making a compound shard key. The data may become more splittable with a compound shard key.

See Also: mongos and Querying (page 97) for more information on query operations in the context of shard clusters.
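The difference between a targeted query and a scatter/gather query can be modeled with a small routing table. The following plain JavaScript sketch is a simplified illustration, not how mongos is implemented: the chunk ranges, shard names, and the targetShards function are all hypothetical, and it handles only equality matches on a single-field shard key.

```javascript
// Hypothetical chunk metadata: each chunk covers [min, max) of the shard
// key; null stands in for the lowest and highest possible key values.
const chunkTable = [
  { min: null, max: "h", shard: "shard0" },
  { min: "h", max: "q", shard: "shard1" },
  { min: "q", max: null, shard: "shard2" }
];

// Return the shards a query must visit.
function targetShards(chunks, query, shardKeyField) {
  if (Object.prototype.hasOwnProperty.call(query, shardKeyField)) {
    // Shard key present: find the one chunk whose range holds the value.
    const v = query[shardKeyField];
    const owner = chunks.find(function (c) {
      return (c.min === null || v >= c.min) && (c.max === null || v < c.max);
    });
    return owner ? [owner.shard] : [];
  }
  // No shard key: scatter/gather across every shard that holds a chunk.
  return Array.from(new Set(chunks.map(function (c) { return c.shard; })));
}

console.log(targetShards(chunkTable, { username: "smith" }, "username"));
console.log(targetShards(chunkTable, { email: "s@example.net" }, "username"));
```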
Sorting

If you use the sort() (page 452) method on a query in a sharded MongoDB environment and the sort is on a field that is not part of the shard key, mongos must send the query to all mongod instances in the cluster. mongos must wait for a response from every shard before it can merge the results and return data. If you require high performance sorted queries, ensure that the sort key is a component of the shard key.

Operations and Reliability

The most important considerations when choosing a shard key are:
• to ensure that MongoDB will be able to distribute data evenly among shards, and
• to scale writes across the cluster, and
• to ensure that mongos can isolate most queries to a specific mongod.

Furthermore:
• Each shard should be a replica set. If a specific mongod instance fails, the replica set members will elect another to be primary and continue operation. However, if an entire shard is unreachable or fails for some reason, that data will be unavailable.
• If the shard key allows the mongos to isolate most operations to a single shard, then the failure of a single shard will only render some data unavailable.
• If your shard key distributes data required for every operation throughout the cluster, then the failure of the entire shard will render the entire cluster unavailable.

In essence, this concern for reliability simply underscores the importance of choosing a shard key that isolates query operations to a single shard.

Choosing a Shard Key

It is unlikely that any single, naturally occurring key in your collection will satisfy all requirements of a good shard key. There are three options:
[6] In many ways, you can think of the shard key as a cluster-wide unique index. However, be aware that sharded systems cannot enforce cluster-wide unique indexes unless the unique field is in the shard key. Consider the Indexes wiki page for more information on indexes and compound indexes.


1. Compute a more ideal shard key in your application layer, and store this in all of your documents, potentially in the _id field.
2. Use a compound shard key that uses two or three values from all documents that provide the right mix of cardinality with scalable write operations and query isolation.
3. Determine that the impact of using a less than ideal shard key is insignificant in your use case given:
   • limited write volume,
   • expected data size, or
   • query patterns and demands.

From a decision making standpoint, begin by finding the field that will provide the required query isolation (page 117), ensure that writes will scale across the cluster (page 117), and then add an additional field to provide additional cardinality (page 116) if your primary key does not have sufficient split-ability.

Shard Key Indexes

All sharded collections must have an index that starts with the shard key. If you shard a collection that does not yet contain documents and that has no such index, the shardCollection command will create an index on the shard key. If the collection already contains documents, you must create an appropriate index before using shardCollection.

Changed in version 2.2: The index on the shard key no longer needs to be identical to the shard key. This index can be an index of the shard key itself as before, or a compound index where the shard key is the prefix of the index. This index cannot be a multikey index.

If you have a collection named people, sharded using the field { zipcode: 1 }, and you want to replace this with an index on the field { zipcode: 1, username: 1 }, then:
1. Create an index on { zipcode: 1, username: 1 }:

db.people.ensureIndex( { zipcode: 1, username: 1 } );

2. When MongoDB finishes building the index, you can safely drop the existing index on { zipcode: 1 }:

db.people.dropIndex( { zipcode: 1 } );

Warning: The index on the shard key cannot be a multikey index. As above, an index on { zipcode: 1, username: 1 } can only replace an index on zipcode if there are no array values for the username field. If you drop the last appropriate index for the shard key, recover by recreating an index on just the shard key.
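The prefix rule for shard key indexes can be checked mechanically. The following is a minimal sketch in plain JavaScript and is not part of MongoDB: the helper name isShardKeyPrefix is hypothetical, and it assumes key patterns are written as ordinary objects whose field order is significant. It does not detect multikey indexes, which depend on the stored data rather than on the key pattern.

```javascript
// Hypothetical helper (not a MongoDB API): returns true when the index key
// pattern starts with every field of the shard key, in the same order and
// with the same direction.
function isShardKeyPrefix(shardKey, indexKey) {
  var shardFields = Object.keys(shardKey);
  var indexFields = Object.keys(indexKey);
  if (shardFields.length > indexFields.length) {
    return false;
  }
  for (var i = 0; i < shardFields.length; i++) {
    var field = shardFields[i];
    if (indexFields[i] !== field || indexKey[field] !== shardKey[field]) {
      return false;
    }
  }
  return true;
}

var shardKey = { zipcode: 1 };
// The compound index from the example above starts with zipcode.
var compoundOk = isShardKeyPrefix(shardKey, { zipcode: 1, username: 1 });
// An index that starts with username does not have the shard key as a prefix.
var reversedOk = isShardKeyPrefix(shardKey, { username: 1, zipcode: 1 });
```

Under these assumptions compoundOk is true and reversedOk is false, which matches the replacement procedure described above: the { zipcode: 1, username: 1 } index can stand in for the { zipcode: 1 } index, but not the other way around.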

8.4.2 Cluster Balancer


The balancer (page 98) sub-process is responsible for redistributing chunks evenly among the shards and ensuring that each member of the cluster is responsible for the same volume of data. This section contains complete documentation of the balancer process and operations. For a higher level introduction see the Balancing (page 98) section.

Balancing Internals

A balancing round originates from an arbitrary mongos instance, because your shard cluster can have a number of mongos instances. When a balancer process is active, the responsible mongos acquires a lock by modifying a document in the config database.

By default, the balancer process is always running. When the number of chunks in a collection is unevenly distributed among the shards, the balancer begins migrating chunks from shards with more chunks to shards with fewer chunks. The balancer will continue migrating chunks, one at a time, until the data is evenly distributed among the shards.

While these automatic chunk migrations are crucial for distributing data, they carry some overhead in terms of bandwidth and workload, both of which can impact database performance. As a result, MongoDB attempts to minimize the effect of balancing by only migrating chunks when the distribution of chunks passes the migration thresholds (page 120).

The migration process ensures consistency and maximizes availability of chunks during balancing: when MongoDB begins migrating a chunk, the database begins copying the data to the new server and tracks incoming write operations. After copying the chunk's data, the source ("from") mongod sends all new writes to the receiving server. Finally, mongos updates the chunk record in the config database to reflect the new location of the chunk.

Migration Thresholds

Changed in version 2.2: The following thresholds appear first in 2.2; prior to this release, balancing would only commence if the shard with the most chunks had 8 more chunks than the shard with the least number of chunks.

In order to minimize the impact of balancing on the cluster, the balancer will not begin balancing until the distribution of chunks has reached certain thresholds. These thresholds apply to the difference in number of chunks between the shard with the greatest number of chunks and the shard with the least number of chunks. The balancer has the following thresholds:

Number of Chunks     Migration Threshold
Less than 20         2
21-80                4
Greater than 80      8

Once a balancing round starts, the balancer will not stop until the difference between the number of chunks on any two shards is less than two.

Note: You can restrict the balancer so that it only operates between specific start and end times. See Schedule the Balancing Window (page 109) for more information. The specification of the balancing window is relative to the local time zone of each individual mongos instance in the shard cluster.
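The migration thresholds can be read as a small decision function. The sketch below is plain JavaScript written for illustration and is not MongoDB source code. The function names are hypothetical, and two points are assumptions: the threshold table does not list exactly 20 chunks, so the sketch places 20 in the middle tier, and "reached" is interpreted as "greater than or equal to".

```javascript
// Sketch of the migration threshold table (not MongoDB source code).
// Assumption: exactly 20 chunks falls into the middle tier.
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) {
    return 2;
  }
  if (totalChunks <= 80) {
    return 4;
  }
  return 8;
}

// chunkCounts: the number of chunks of one collection on each shard.
// Assumption: balancing starts when the difference between the fullest and
// the emptiest shard is at least the threshold.
function shouldStartBalancing(chunkCounts) {
  var total = chunkCounts.reduce(function (sum, n) { return sum + n; }, 0);
  var most = Math.max.apply(null, chunkCounts);
  var least = Math.min.apply(null, chunkCounts);
  return (most - least) >= migrationThreshold(total);
}

// 11 chunks in total, difference of 1, threshold of 2.
var smallCluster = shouldStartBalancing([6, 5]);
// 110 chunks in total, difference of 10, threshold of 8.
var largeCluster = shouldStartBalancing([60, 50]);
```

With these inputs smallCluster is false and largeCluster is true. The sketch only models the decision to start a round; once a round starts, the stopping rule described above (a difference of less than two) applies instead.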

Chunk Size

The default chunk size in MongoDB is 64 megabytes. When chunks grow beyond the specified chunk size (page 120) a mongos instance will split the chunk in half. This will eventually lead to migrations, when chunks become unevenly distributed among the cluster. The mongos instances will initiate a round of migrations to redistribute data in the cluster.

Chunk size is arbitrary and must account for the following:

1. Small chunks lead to a more even distribution of data at the expense of more frequent migrations, which creates expense at the query routing (mongos) layer.
2. Large chunks lead to fewer migrations, which is more efficient both from the networking perspective and in terms of internal overhead at the query routing layer. Large chunks produce these efficiencies at the expense of a potentially more uneven distribution of data.

For many deployments it makes sense to avoid frequent and potentially spurious migrations at the expense of a slightly less evenly distributed data set, but this value is configurable (page 106). Be aware of the following limitations when modifying chunk size:

Automatic splitting only occurs when inserting documents or updating existing documents; if you lower the chunk size it may take time for all chunks to split to the new size.
Splits cannot be undone: if you increase the chunk size, existing chunks must grow through insertion or updates until they reach the new size.

Shard Size

By default, MongoDB will attempt to fill all available disk space with data on every shard as the data set grows. Monitor disk utilization in addition to other performance metrics, to ensure that the cluster always has capacity to accommodate additional data.

You can also configure a maximum size for any shard when you add the shard using the maxSize parameter of the addShard command. This will prevent the balancer from migrating chunks to the shard when the value of mem.mapped (page 542) exceeds the maxSize setting.

See Also: Monitoring Database Systems (page 146).

Chunk Migration

MongoDB migrates chunks in a shard cluster to distribute data evenly among shards. Migrations may be either:

Manual. In these migrations you must specify the chunk that you want to migrate and the destination shard. Only migrate chunks manually after initiating sharding, to distribute data during bulk inserts, or if the cluster becomes uneven. See Migrating Chunks (page 107) for more details.
Automatic. The balancer process handles most migrations when distribution of chunks between shards becomes uneven. See Migration Thresholds (page 120) for more details.

All chunk migrations use the following procedure:

1. The balancer process sends the moveChunk command to the source shard for the chunk. In this operation the balancer passes the name of the destination shard to the source shard.
2. The source initiates the move with an internal moveChunk command with the destination shard.
3. The destination shard begins requesting documents in the chunk, and begins receiving these documents.
4. After receiving the final document in the chunk, the destination shard initiates a synchronization process to ensure that all changes to the documents in the chunk on the source shard during the migration process exist on the destination shard.

When fully synchronized, the destination shard connects to the config database and updates the chunk location in the cluster metadata. After completing this operation, once there are no open cursors on the chunk, the source shard starts deleting its copy of documents from the migrated chunk.

When the _secondaryThrottle is true for moveChunk or the balancer, MongoDB ensures that one secondary member has replicated changes before allowing new chunk migrations.

CHAPTER

NINE

TUTORIALS
The following tutorials describe specific sharding procedures:

9.1 Deploy a Shard Cluster


9.1.1 Synopsis
This document describes how to deploy a shard cluster for a standalone mongod instance. To deploy a shard cluster for an existing replica set, see Convert a Replica Set to a Replicated Shard Cluster (page 167).

9.1.2 Requirements
Before deploying a shard cluster, see the requirements listed in Requirements for Shard Clusters (page 94).

Warning: Sharding and localhost Addresses

If you use either localhost or 127.0.0.1 as the hostname portion of any host identifier, for example as the host argument to addShard or the value to the mongos --configdb (page 493) run time option, then you must use localhost or 127.0.0.1 for all host settings. If you mix localhost addresses and remote host addresses, MongoDB will produce an error.
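A deployment script can catch the localhost mix-up described in the warning before any MongoDB process starts. The following plain JavaScript sketch is illustrative only: isLocalHost and hostsAreConsistent are hypothetical helper names, not MongoDB functions, and the sketch assumes every entry is a simple host or host:port string (replica set seed strings such as setName/host are not handled).

```javascript
// Hypothetical pre-flight check (not a MongoDB API).
// Returns true for the two localhost spellings named in the warning.
function isLocalHost(hostAndPort) {
  var host = hostAndPort.split(":")[0];
  return host === "localhost" || host === "127.0.0.1";
}

// Returns true when the list is safe to use: either every entry is a
// localhost address or none of them are.
function hostsAreConsistent(hosts) {
  var localCount = hosts.filter(isLocalHost).length;
  return localCount === 0 || localCount === hosts.length;
}

var allLocal = hostsAreConsistent(["localhost:27019", "127.0.0.1:27020"]);
var allRemote = hostsAreConsistent([
  "config0.mongodb.example.net",
  "config1.mongodb.example.net"
]);
var mixed = hostsAreConsistent([
  "localhost:27019",
  "config1.mongodb.example.net:27019"
]);
```

Here allLocal and allRemote are true while mixed is false. Collect every host setting you plan to use (config databases and shards alike) into one list before running the check, since the restriction applies to all host settings together.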

9.1.3 Procedure
Initiate Config Database Instances

Begin by configuring three config databases. These are very small mongod instances that provide cluster metadata. You must have exactly three instances in production deployments. For redundancy these instances should run on different systems and servers.

You must separate config database mongod instances to provide redundancy and to ensure that cluster metadata is secure and durable. Since config database mongod instances receive relatively little traffic and demand only a small portion of system resources, you can run the instances on systems that run other services, such as on shards or on servers that provide mongos.

To start a config database, type the following command at a system prompt:
mongod --configsvr

The --configsvr option stores the data files in the configdb/ sub-directory of the dbpath (page 497) directory. By default, the dbpath (page 497) directory is /data/db/. The config mongod instance is accessible via port 27019. In addition to configsvr (page 502), use other mongod runtime options (page 494) as needed.

Repeat this process for all three config databases.

Start mongos Instances

All operations against a shard cluster go through the mongos instance. The mongos instance routes queries and operations to the appropriate shards and interacts with the config database instances. mongos instances are lightweight, and a shard cluster can have multiple instances. Typically, you run one mongos instance on each of your application servers.

You must specify three config instances. Use resolvable host names for all hosts, using DNS or your system's hosts file to provide operational flexibility.

The mongos instance runs on the TCP port 27017, which is the default MongoDB port.

Use the following command at a system prompt to start a mongos instance:

mongos --configdb config0.mongodb.example.net,config1.mongodb.example.net,config2.mongodb.example.net

The above example assumes that you have config instances running on the following hosts:

config0.mongodb.example.net
config1.mongodb.example.net
config2.mongodb.example.net

Add Shards to the Cluster

In a production shard cluster, each shard is itself a replica set. You must deploy at least two replica sets for use as shards. For instructions on deploying replica sets, see Deploy a Replica Set (page 61).

When you have two active and functioning replica sets, perform the following procedure:

Using the mongo shell, log into a mongos. For example, if the mongos is accessible at mongos0.mongodb.example.net on port 27017 you would type:

mongo mongos0.mongodb.example.net

To add each shard to the cluster, use sh.addShard() (page 480). For example, to add two shards with the hostnames shard0.example.net and shard1.example.net on port 27017, call the following methods in the mongo shell session:
sh.addShard( "shard0.example.net" )
sh.addShard( "shard1.example.net" )

All shards should be replica sets.

Changed in version 2.0.3.

After version 2.0.3, you may use the above form to add replica sets to a cluster. The cluster will automatically discover the members of the replica set and adjust its configuration accordingly.

Before version 2.0.3, you must specify the shard in the following form: the replica set name, followed by a forward slash, followed by a comma-separated list of seeds for the replica set. For example, if the name of the replica set is repl0, then your sh.addShard (page 480) command might resemble:

sh.addShard( "repl0/shard0.example.net,shard1.example.net" )

The sh.addShard() (page 480) helper in the mongo shell provides a wrapper around the addShard database command.

Enable Sharding for Databases

While sharding operates on a per-collection basis, you must enable sharding for each database that holds a collection that you would like to shard. Use the following operation in a mongo shell session connected to a mongos instance in your cluster:
sh.enableSharding("records")

Where records is the name of the database that holds the collection you want to shard. sh.enableSharding() (page 481) is a wrapper around the enableSharding database command. You can enable sharding for multiple databases in your deployment.

Enable Sharding for Collections

You can enable sharding on a per-collection basis. Because MongoDB uses range based sharding, you must specify a shard key MongoDB can use to distribute your documents among the shards. For more information, see the sections of this manual that give an overview of shard keys (page 96) and that give an in-depth exploration of the features of good shard keys (page 96).

To enable sharding for a collection, use the sh.shardCollection() (page 483) helper in the mongo shell. The helper provides a wrapper around the shardCollection database command and has the following prototype form:
sh.shardCollection("[database].[collection]", "key")

Replace the [database].[collection] string with the full namespace of your collection, which consists of the name of your database, a dot (i.e. .), and the full name of the collection. The key represents your shard key, which you specify in the same form as you would an index. If you do not specify the key argument, MongoDB will use the _id field as the shard key.

Consider the following example invocations of sh.shardCollection() (page 483):
sh.shardCollection("records.people", { "zipcode": 1, "name": 1 } )
sh.shardCollection("people.addresses", { "state": 1, "_id": 1 } )
sh.shardCollection("assets.chairs", { "type": 1, "_id": 1 } )
sh.shardCollection("events.alerts", { "hashed_id": 1 } )

In order, these operations shard:

1. The people collection in the records database using the shard key { "zipcode": 1, "name": 1 }.

This shard key distributes documents by the value of the zipcode field. If a number of documents have the same value for this field, then that chunk will be splittable (page 116) by the values of the name field.

2. The addresses collection in the people database using the shard key { "state": 1, "_id": 1 }.

This shard key distributes documents by the value of the state field. If a number of documents have the same value for this field, then that chunk will be splittable (page 116) by the values of the _id field.

3. The chairs collection in the assets database using the shard key { "type": 1, "_id": 1 }.

This shard key distributes documents by the value of the type field. If a number of documents have the same value for this field, then that chunk will be splittable (page 116) by the values of the _id field.

4. The alerts collection in the events database using the shard key { "hashed_id": 1 }.

This shard key distributes documents by the value of the hashed_id field. Presumably this is a computed value that holds the hash of some value in your documents and is able to evenly distribute documents throughout your cluster.
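A computed field such as hashed_id has to be produced by the application before it inserts a document. The following plain JavaScript sketch shows one way this might look. Everything in it is an assumption made for illustration: the choice of the 32-bit FNV-1a hash, the helper names fnv1a and withHashedId, and the sample document with its host and level fields. It is not how MongoDB computes anything internally.

```javascript
// 32-bit FNV-1a hash of a string, returned as an unsigned integer.
// Illustration only; for simplicity just the low byte of each character
// code is mixed in, which is sufficient for ASCII input.
function fnv1a(text) {
  var hash = 0x811c9dc5;
  for (var i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i) & 0xff;
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Hypothetical application-side helper: returns a copy of the document
// with a hashed_id field derived from one of its existing fields.
function withHashedId(doc, sourceField) {
  var copy = {};
  Object.keys(doc).forEach(function (key) { copy[key] = doc[key]; });
  copy.hashed_id = fnv1a(String(doc[sourceField]));
  return copy;
}

// The sample document and its field names are invented for this sketch.
var alertDoc = withHashedId({ host: "db1.example.net", level: "warn" }, "host");
```

The resulting alertDoc carries the original fields plus a numeric hashed_id, so it could be inserted into a collection sharded on { "hashed_id": 1 }. The trade-off of any hashed key is that range queries on the original field can no longer be isolated to a single shard.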

9.2 Add Shards to an Existing Cluster


9.2.1 Synopsis
This document describes how to add a shard to an existing shard cluster. As your data set grows you must add additional shards to a cluster to provide additional capacity. For additional sharding procedures, see Shard Cluster Administration (page 99).

9.2.2 Concerns
Distributing chunks among your cluster requires some capacity to support the migration process. When adding a shard to your cluster, you should always ensure that your cluster has enough capacity to support the migration without affecting legitimate production traffic.

In production environments, all shards should be replica sets. Furthermore, all interaction with your sharded cluster should pass through a mongos instance. This tutorial assumes that you already have a mongo shell connection to a mongos instance.

9.2.3 Process
Tell the cluster where to find the individual shards. You can do this using the addShard command:
db.runCommand( { addShard: "mongodb0.example.net", name: "mongodb0" } )

Or you can use the sh.addShard() (page 480) helper in the mongo shell:
sh.addShard( "[hostname]:[port]" )

Replace [hostname] and [port] with the hostname and TCP port number of where the shard is accessible. For example:
sh.addShard( "mongodb0.example.net:27027" )

If mongodb0.example.net:27027 is a member of a replica set, MongoDB will discover all other members of the replica set.

Note: In production deployments, all shards should be replica sets.

Changed in version 2.0.3.

Before version 2.0.3, you must specify the shard in the following form:
replicaSetName/<seed1>,<seed2>,<seed3>

For example, if the name of the replica set is repl0, then your sh.addShard (page 480) command would be:

sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" )

Repeat this step for each shard in your cluster.

Optional

You may specify a name for the shard using the name field of the addShard command, as follows:
db.runCommand( { addShard: "mongodb0.example.net", name: "mongodb0" } )

If you do not specify a shard name, then MongoDB assigns a name upon creation.

Note: It may take some time for chunks to migrate to the new shard because the system must copy data from one mongod instance to another while maintaining data consistency. For an overview of the balancing operation, see the Balancing and Distribution (page 98) section. For additional information on balancing, see the Balancing Internals (page 119) section.
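The replicaSetName/<seed1>,<seed2>,<seed3> form used before version 2.0.3 is easy to mistype by hand. As a minimal sketch, the string can be assembled from its parts in plain JavaScript; replicaSetShardString is a hypothetical helper name, not a MongoDB function, and the host names are the ones used in the examples above.

```javascript
// Hypothetical helper: builds "setName/host1,host2,..." from its parts.
function replicaSetShardString(setName, seeds) {
  if (!setName || seeds.length === 0) {
    throw new Error("a replica set name and at least one seed are required");
  }
  return setName + "/" + seeds.join(",");
}

var shardString = replicaSetShardString("repl0", [
  "mongodb0.example.net:27027",
  "mongodb1.example.net:27017",
  "mongodb2.example.net:27017"
]);
// In a mongo shell session connected to a mongos, shardString is the value
// you would pass to sh.addShard().
```

Building the string from a list keeps the seed hosts in one place, which helps when the same list is also used for monitoring or connection strings.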

9.3 Remove Shards from an Existing Shard Cluster


9.3.1 Synopsis
This document describes the procedure for migrating data from a shard safely, when you need to decommission a shard. You may also need to remove shards as part of hardware reorganization and data migration.

Do not use this procedure to migrate an entire shard cluster to new hardware. To migrate an entire shard to new hardware, migrate individual shards as if they were independent replica sets.

To remove a shard, you will:

Move chunks off of the shard.
Ensure that this shard is not the primary shard for any databases in the cluster. If it is, move the primary status for these databases to other shards.
Remove the shard from the cluster's configuration.

9.3.2 Procedure
Complete this procedure by connecting to any mongos in the cluster using the mongo shell.

You can only remove a shard by its shard name. To discover or confirm the name of a shard, use the listshards command, printShardingStatus command, or sh.status() (page 484) shell helper.

The example commands in this document remove a shard named mongodb0.

Note: To successfully migrate data from a shard, the balancer process must be active. Check the balancer state using the sh.getBalancerState() (page 482) helper in the mongo shell. For more information, see the section on balancer operations (page 109).

Remove Chunks from the Shard

Start by running the removeShard command. This begins draining chunks from the shard you are removing.
db.runCommand( { removeshard: "mongodb0" } )

This operation returns immediately, with the following response:


{ msg : "draining started successfully" , state: "started" , shard :"mongodb0" , ok : 1 }

Depending on your network capacity and the amount of data in your cluster, this operation can take from a few minutes to several days to complete.

Check the Status of the Migration

To check the progress of the migration, run removeShard again at any stage of the process, as follows:
db.runCommand( { removeshard: "mongodb0" } )

The output resembles the following document:


{ msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 42, dbs : 1 }, ok: 1 }

In the remaining sub document, a counter displays the remaining number of chunks that MongoDB must migrate to other shards and the number of MongoDB databases that have primary status on this shard.

Continue checking the status of the removeshard command until the number of chunks remaining is 0. Then proceed to the next step.

Move Unsharded Databases

Databases with non-sharded collections store those collections on a single shard known as the primary shard for that database. The following step is necessary only when the shard to remove is also the primary shard for one or more databases.

Issue the following command at the mongo shell:
db.runCommand( { movePrimary: "myapp", to: "mongodb1" })

This command migrates all remaining non-sharded data in the database named myapp to the shard named mongodb1.

Warning: Do not run movePrimary until you have finished draining the shard.

This command will not return until MongoDB completes moving all data, which may take a long time. The response from this command will resemble the following:
{ "primary" : "mongodb1", "ok" : 1 }

Finalize the Migration

Run removeShard again to clean up all metadata information and finalize the removal, as follows:
db.runCommand( { removeshard: "mongodb0" } )

A success message appears at completion:

{ msg: "remove shard completed successfully" , state: "completed", host: "mongodb0", ok : 1 }

When the value of state is completed, you may safely stop the mongodb0 shard.
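The response documents shown in this procedure can drive a simple decision about what to do next. The sketch below is plain JavaScript and only interprets documents shaped like the examples above; nextRemovalStep and the returned step labels are hypothetical names chosen for this illustration, and the function does not talk to a database.

```javascript
// Hypothetical helper: decides the next step from a removeshard response
// document shaped like the examples in this procedure.
function nextRemovalStep(status) {
  if (status.ok !== 1) {
    return "investigate error";
  }
  if (status.state === "completed") {
    return "stop shard";
  }
  if (status.state === "ongoing" && status.remaining.chunks === 0) {
    // All chunks are drained. Databases that still have this shard as
    // their primary must be moved before the removal can be finalized.
    return status.remaining.dbs > 0 ? "run movePrimary" : "run removeshard again";
  }
  return "wait";
}

var draining = nextRemovalStep(
  { msg: "draining ongoing", state: "ongoing", remaining: { chunks: 42, dbs: 1 }, ok: 1 }
);
var drained = nextRemovalStep(
  { msg: "draining ongoing", state: "ongoing", remaining: { chunks: 0, dbs: 1 }, ok: 1 }
);
```

With the sample documents, draining is "wait" and drained is "run movePrimary". A monitoring script could call the real removeshard command periodically and feed each response to a function like this one.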

9.4 Enforce Unique Keys for Sharded Collections


9.4.1 Overview
The unique constraint on indexes ensures that only one document can have a value for a field in a collection. For sharded collections these unique indexes cannot enforce uniqueness (page 574) because insert and indexing operations are local to each shard. 1

If you need to ensure that a field is always unique in all collections in a sharded environment, there are two options:

1. Enforce uniqueness of the shard key (page 96). MongoDB can enforce uniqueness for the shard key. For compound shard keys, MongoDB will enforce uniqueness on the entire key combination, and not for a specific component of the shard key.
2. Use a secondary collection to enforce uniqueness. Create a minimal collection that only contains the unique field and a reference to a document in the main collection. If you always insert into a secondary collection before inserting to the main collection, MongoDB will produce an error if you attempt to use a duplicate key.

Note: If you have a small data set, you may not need to shard this collection and you can create multiple unique indexes. Otherwise you can shard on a single unique key.

Regardless of method, be aware that writes to the MongoDB database are fire and forget, or unsafe by default: they will not return errors to the client if MongoDB rejects a write operation because of a duplicate key or other error. As a result if you want to enforce unique keys you must use the safe write setting in your driver. See your driver's documentation (page 225) on getLastError for more information.

9.4.2 Unique Constraints on the Shard Key


Process

To shard a collection using the unique constraint, specify the shardcollection command in the following form:
db.runCommand( { shardcollection : "test.users" , key : { email : 1 } , unique : true } );

Remember that the _id field index is always unique. By default, MongoDB inserts an ObjectId into the _id field. However, you can manually insert your own value into the _id field and use this as the shard key. To use the _id field as the shard key, use the following operation:
db.runCommand( { shardcollection : "test.users" , key : { _id : 1 } } )

Warning: In any sharded collection where you are not sharding by the _id field, you must ensure uniqueness of the _id field. The best way to ensure _id is always unique is to use ObjectId, or another universally unique identifier (UUID).
1 If you specify a unique index on a sharded collection, MongoDB will be able to enforce uniqueness only among the documents located on a single shard at the time of creation.

Limitations

You can only enforce uniqueness on one single field in the collection using this method. If you use a compound shard key, you can only enforce uniqueness on the combination of component keys in the shard key.

In most cases, the best shard keys are compound keys that include elements that permit write scaling (page 117) and query isolation (page 117), as well as high cardinality (page 116). These ideal shard keys are not often the same keys that require uniqueness, so enforcing uniqueness on other fields requires a different approach.

9.4.3 Unique Constraints on Arbitrary Fields


If you cannot use a unique field as the shard key or if you need to enforce uniqueness over multiple fields, you must create another collection to act as a proxy collection. This collection must contain both a reference to the original document (i.e. its ObjectId) and the unique key.

If you must shard this proxy collection, then shard on the unique key using the above procedure (page 129); otherwise, you can simply create multiple unique indexes on the collection.

Process

Consider the following for the proxy collection:
{
  "_id" : ObjectId("..."),
  "email" : "..."
}

The _id field holds the ObjectId of the document it reflects, and the email field is the field on which you want to ensure uniqueness.

To shard this collection, use the following operation using the email field as the shard key:
db.runCommand( { shardcollection : "records.proxy" , key : { email : 1 } , unique : true } );

If you do not need to shard the proxy collection, use the following command to create a unique index on the email field:
db.proxy.ensureIndex( { "email" : 1 }, {unique : true} )

You may create multiple unique indexes on this collection if you do not plan to shard the proxy collection. To insert documents, use the following procedure in the JavaScript shell (page 503):
use records

primary_id = ObjectId()

db.proxy.insert({
   "_id" : primary_id,
   "email" : "example@example.net"
})

// if: the above operation returns successfully,
// then continue:

db.information.insert({
   "_id" : primary_id,
   "email": "example@example.net"
   // additional information...
})

You must insert a document into the proxy collection first. If this operation succeeds, the email field is unique, and you may continue by inserting the actual document into the information collection.

See Also: The full documentation of: db.collection.ensureIndex() (page 455), ensureIndex, and shardcollection.

Considerations

Your application must catch errors when inserting documents into the proxy collection and must enforce consistency between the two collections.

If the proxy collection requires sharding, you must shard on the single field on which you want to enforce uniqueness.

To enforce uniqueness on more than one field using sharded proxy collections, you must have one proxy collection for every field for which to enforce uniqueness. If you create multiple unique indexes on a single proxy collection, you will not be able to shard proxy collections.

CHAPTER

TEN

REFERENCE
The following reference section describes sharding commands: sharding-commands

Part V

Administration

This page lists the core administrative documentation and the administration tutorials. This page also provides links to administrative documentation for replica sets, sharding, and indexes.

CHAPTER

ELEVEN

CORE COMPETENCIES
The following documents outline basic MongoDB administrative topics:

11.1 Run-time Database Configuration


The command line (page 485) and configuration file (page 494) interfaces provide MongoDB administrators with a large number of options and settings for controlling the operation of the database system. This document provides an overview of common configurations and examples of best-practice configurations for common use cases.

While both interfaces provide access to the same collection of options and settings, this document primarily uses the configuration file interface. If you run MongoDB using a control script or a package for your operating system, you likely already have a configuration file located at /etc/mongodb.conf. Confirm this by checking the content of the /etc/init.d/mongod or /etc/rc.d/mongod script to ensure that the control scripts start the mongod with the appropriate configuration file (see below).

To start a MongoDB instance using this configuration, issue a command in one of the following forms:
mongod --config /etc/mongodb.conf
mongod -f /etc/mongodb.conf

Modify the values in the /etc/mongodb.conf file on your system to control the configuration of your database instance.

11.1.1 Starting, Stopping, and Running the Database


Consider the following basic configuration:
fork = true
bind_ip = 127.0.0.1
port = 27017
quiet = true
dbpath = /srv/mongodb
logpath = /var/log/mongodb/mongod.log
logappend = true
journal = true

For most standalone servers, this is a sufficient base configuration. It makes several assumptions, but consider the following explanation:

fork (page 497) is true, which enables a daemon mode for mongod, which detaches (i.e. forks) the MongoDB process from the current session and allows you to run the database as a conventional server.

bind_ip (page 495) is 127.0.0.1, which forces the server to only listen for requests on the localhost IP. Only bind to secure interfaces that the application-level systems can access with access control provided by system network filtering (i.e. firewall) systems.
port (page 495) is 27017, which is the default MongoDB port for database instances. MongoDB can bind to any port. You can also filter access based on port using network filtering tools.
Note: UNIX-like systems require superuser privileges to attach processes to ports lower than 1024.
quiet (page 495) is true. This disables all but the most critical entries in the output/log file. In normal operation this is the preferable setting to avoid log noise. In diagnostic or testing situations, set this value to false. Use setParameter to modify this setting during run time.
dbpath (page 497) is /srv/mongodb, which specifies where MongoDB will store its data files. /srv/mongodb and /var/lib/mongodb are popular locations. The user account that mongod runs under will need read and write access to this directory.
logpath (page 496) is /var/log/mongodb/mongod.log, which is where mongod will write its output. If you do not set this value, mongod writes all output to standard output (i.e. stdout).
logappend (page 496) is true, which ensures that mongod does not overwrite an existing log file following the server start operation.
journal (page 498) is true, which enables journaling. Journaling ensures single instance write-durability. 64-bit builds of mongod enable journaling by default. Thus, this setting may be redundant.

Given the default configuration, some of these values may be redundant. However, in many situations explicitly stating the configuration increases overall system intelligibility.

11.1.2 Security Considerations


The following collection of configuration options are useful for limiting access to a mongod instance. Consider the following:
bind_ip = 127.0.0.1,10.8.0.10,192.168.4.24
nounixsocket = true
auth = true

Consider the following explanation for these configuration decisions:

bind_ip (page 495) has three values: 127.0.0.1, the localhost interface; 10.8.0.10, a private IP address typically used for local networks and VPN interfaces; and 192.168.4.24, a private network interface typically used for local networks.
Because production MongoDB instances need to be accessible from multiple database servers, it is important to bind MongoDB to multiple interfaces that are accessible from your application servers. At the same time it's important to limit these interfaces to interfaces controlled and protected at the network layer.
nounixsocket (page 496) is true, which disables the UNIX socket, which is otherwise enabled by default. This limits access on the local system. This is desirable when running MongoDB on systems with shared access, but in most situations has minimal impact.
auth (page 497) is true, which enables the authentication system within MongoDB. If enabled you will need to log in, by connecting over the localhost interface for the first time, to create user credentials.

See Also: The Security and Authentication wiki page.


11.1.3 Replication and Sharding Configuration


Replication Configuration
Replica set configuration is straightforward, and only requires that the replSet (page 501) have a value that is consistent among all members of the set. Consider the following:
replSet = set0

Use descriptive names for sets. Once configured, use the mongo shell to add hosts to the replica set.
See Also: Replica set reconfiguration (page 564).
To enable authentication for the replica set, add the following option:
keyFile = /srv/mongodb/keyfile

New in version 1.8: for replica sets, and 1.9.1 for sharded replica sets.
Setting keyFile (page 496) enables authentication and specifies a key file for the replica set members to use when authenticating to each other. The content of the key file is arbitrary, but must be the same on all members of the replica set and on all mongos instances that connect to the set. The key file must be less than one kilobyte in size, may only contain characters in the base64 set, and must not have group or world permissions on UNIX systems.
See Also: The Replica set Reconfiguration (page 564) section for information regarding the process for changing a replica set during operation. Additionally, consider the Replica Set Security (page 44) section for information on configuring authentication with replica sets. Finally, see the Replication (page 31) index and the Replication Fundamentals (page 33) document for more information on replication in MongoDB and replica set configuration in general.
Sharding Configuration
Sharding requires a number of mongod instances with different configurations. The config servers store the cluster's metadata, while the cluster distributes data among one or more shard servers.
Note: Config servers are not replica sets.
Set up one or three config server instances as normal (page 139) mongod instances, and then add the following configuration option:
configsvr = true
bind_ip = 10.8.0.12
port = 27001

This creates a config server running on the private IP address 10.8.0.12 on port 27001. Make sure that there are no port conflicts, and that your config server is accessible from all of your mongos and mongod instances. To set up shards, configure two or more mongod instances using your base configuration (page 139), adding the shardsvr (page 502) setting:
shardsvr = true


Finally, to establish the cluster, configure at least one mongos process with the following settings:
configdb = 10.8.0.12:27001
chunkSize = 64

You can specify multiple configdb (page 502) instances by specifying hostnames and ports in the form of a comma separated list. In general, avoid modifying the chunkSize (page 503) from the default value of 64 [1], and ensure this setting is consistent among all mongos instances.
See Also: The Sharding wiki page for more information on sharding and shard cluster configuration.
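The key file requirements given for keyFile earlier in this section (less than one kilobyte, base64 characters only, no group or world permissions on UNIX systems) can be sketched in Python as follows. This is an illustration under stated assumptions, not an official procedure: the helper names write_key_file and key_file_problems are hypothetical, and the choice of 512 random bytes is arbitrary.

```python
import base64
import os
import stat
import string
import tempfile

BASE64_CHARACTERS = set(string.ascii_letters + string.digits + "+/=")


def write_key_file(path, num_random_bytes=512):
    """Write a base64 key smaller than one kilobyte, accessible to the owner only."""
    key = base64.b64encode(os.urandom(num_random_bytes))
    if len(key) >= 1024:
        raise ValueError("key file content must stay under one kilobyte")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(key)
    # Reset the mode explicitly in case the file already existed.
    os.chmod(path, 0o600)
    return path


def key_file_problems(path):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    with open(path, "rb") as handle:
        content = handle.read()
    if len(content) >= 1024:
        problems.append("file is one kilobyte or larger")
    text = content.decode("ascii", errors="replace").strip()
    if not set(text) <= BASE64_CHARACTERS:
        problems.append("file contains characters outside the base64 set")
    mode = os.stat(path).st_mode
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("file has group or world permissions")
    return problems


demo_path = os.path.join(tempfile.mkdtemp(), "keyfile")
write_key_file(demo_path)
print(key_file_problems(demo_path))
```

The same file would then need to be copied to every member of the replica set, since the content must be identical on all of them.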

11.1.4 Running Multiple Database Instances on the Same System


In many cases running multiple instances of mongod on a single system is not recommended. However, on some types of deployments [2] and for testing purposes you may need to run more than one mongod on a single system. In these cases, use a base configuration (page 139) for each instance, but consider the following configuration values:
dbpath = /srv/mongodb/db0/
pidfilepath = /srv/mongodb/db0.pid

The dbpath (page 497) value controls the location of the mongod instance's data directory. Ensure that each database has a distinct and well labeled data directory. The pidfilepath (page 496) controls where the mongod process places its pid file. Because this file tracks a specific mongod process, it is crucial that the file be unique and well labeled to make it easy to start and stop these processes. Create additional control scripts and/or adjust your existing MongoDB configuration and control script as needed to control these processes.
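One way to keep per-instance values distinct is to derive them from a single instance name. The Python sketch below generates the two settings shown above for several instances. The helper names are hypothetical, and adding a distinct port per instance is an assumption beyond the example above (two mongod processes cannot listen on the same port of the same interface).

```python
def instance_config(name, port, base_dir="/srv/mongodb"):
    """Build configuration text with a distinct data directory, pid file and port."""
    lines = [
        "dbpath = {0}/{1}/".format(base_dir, name),
        "pidfilepath = {0}/{1}.pid".format(base_dir, name),
        "port = {0}".format(port),
    ]
    return "\n".join(lines) + "\n"


def instance_configs(count, first_port=27017):
    """Return a dict mapping instance names (db0, db1, ...) to configuration text."""
    return {
        "db{0}".format(index): instance_config("db{0}".format(index), first_port + index)
        for index in range(count)
    }


configs = instance_configs(2)
print(configs["db0"])
```

Each generated fragment would be appended to a copy of the base configuration for that instance.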

11.1.5 Diagnostic Configurations


The following configuration options control various mongod behaviors for diagnostic purposes. The following settings have default values that are tuned for general production purposes:
slowms = 50
profile = 3
verbose = true
diaglog = 3
objcheck = true
cpu = true

Use the base configuration (page 139) and add these options as needed if you are experiencing some unknown issue or performance problem:
slowms (page 500) configures the threshold for the database profiler to consider a query slow. The default value is 100 milliseconds. Set a lower value if the database profiler does not return useful results. See the Optimization wiki page for more information on optimizing operations in MongoDB.
profile (page 499) sets the database profiler level. The profiler is not active by default because of the possible impact of the profiler itself on performance. Unless this setting has a value, queries are not profiled.
[1] Chunk size is 64 megabytes by default, which provides the ideal balance between the most even distribution of data, for which smaller chunk sizes are best, and minimizing chunk migration, for which larger chunk sizes are optimal.
[2] Single-tenant systems with SSD or other high performance disks may provide acceptable performance levels for multiple mongod instances. Additionally, you may find that multiple databases with small working sets may function acceptably on a single system.


verbose (page 495) enables a verbose logging mode that modifies mongod output and increases logging to include a greater number of events. Only use this option if you are experiencing an issue that is not reflected in the normal logging level. If you require additional verbosity, consider the following options:
v = true
vv = true
vvv = true
vvvv = true
vvvvv = true

Each additional v adds additional verbosity to the logging. The verbose option is equal to v = true.
diaglog (page 497) enables diagnostic logging. Level 3 logs all read and write operations.
objcheck (page 496) forces mongod to validate all requests from clients upon receipt. Use this option to ensure that invalid requests are not causing errors, particularly when running a database with untrusted clients. This option may affect database performance.
cpu (page 497) forces mongod to report the percentage of the last interval spent in write lock. The interval is typically 4 seconds, and each output line in the log includes both the actual interval since the last report and the percentage of time spent in write lock.

11.2 Using MongoDB with SSL Connections


This document outlines the use and operation of MongoDB's SSL support. SSL allows MongoDB clients to support encrypted connections to mongod instances.
Note: The default distribution of MongoDB does not contain support for SSL. As of the current release, to use SSL you must either:
build MongoDB locally passing the --ssl option to scons, or
use the MongoDB subscriber build.
These instructions outline the process for getting started with SSL and assume that you have already installed a build of MongoDB that includes SSL support and that your client driver supports SSL.

11.2.1 mongod SSL Configuration


Add the following command line options to your mongod invocation:
mongod --sslOnNormalPorts --sslPEMKeyFile <pem> --sslPEMKeyPassword <pass>

Replace <pem> with the path to your SSL certificate .pem file, and <pass> with the password you used to encrypt the .pem file. You may also specify these options in your mongodb.conf file with the following options:
sslOnNormalPorts = true
sslPEMKeyFile = /etc/ssl/mongodb.pem
sslPEMKeyPassword = pass

Modify these values to reflect the location of your actual .pem file and its password. You can use any existing SSL certificate, or you can generate your own SSL certificate using a command that resembles the following:


cd /etc/ssl/
openssl req -new -x509 -days 365 -nodes -out mongodb-cert.pem -keyout mongodb-cert.key

To create the combined .pem file that contains the .key file and the .pem certificate, use the following command:
cat mongodb-cert.key mongodb-cert.pem > mongodb.pem

11.2.2 Clients
Clients must have support for SSL to work with a mongod instance that has SSL support enabled. The current versions of the Python, Java, Ruby, and Node.js drivers have support for SSL, with full support coming in future releases of other drivers.
mongo
The mongo shell built with SSL support, which is distributed with the subscriber build, also supports SSL. Use the --ssl flag as follows:
mongo --ssl --host <host>

MMS
The MMS agent will also have to connect via SSL in order to gather its stats. Because the agent already utilizes SSL for its communications to the MMS servers, this is just a matter of enabling SSL support in MMS itself on a per host basis. Use the Edit host button (i.e. the pencil) on the Hosts page in the MMS console. Note that this feature is currently enabled on a group by group basis by 10gen. Please see the MMS Manual for more information about MMS configuration.
PyMongo
Add the ssl=True parameter to a PyMongo connection to create a MongoDB connection to an SSL MongoDB instance:
from pymongo import Connection
c = Connection(host="mongodb.example.net", port=27017, ssl=True)

To connect to a replica set, use the following operation:


from pymongo import ReplicaSetConnection
c = ReplicaSetConnection("mongodb.example.net:27017",
                         replicaSet="mysetname", ssl=True)

PyMongo also supports an ssl=true option for the MongoDB URI:


mongodb://mongodb.example.net:27017/?ssl=true

Java
Consider the following example sslApp.java class file:


import com.mongodb.*;
import javax.net.ssl.SSLSocketFactory;

public class sslApp {

    public static void main(String args[]) throws Exception {
        MongoOptions o = new MongoOptions();
        // Use the JVM's default SSL socket factory for connections to the server.
        o.socketFactory = SSLSocketFactory.getDefault();

        Mongo m = new Mongo( "localhost" , o );
        DB db = m.getDB( "test" );
        DBCollection c = db.getCollection( "foo" );
        System.out.println( c.findOne() );
    }
}

Ruby
Recent versions of the Ruby driver have support for connections to SSL servers. Install the latest version of the driver with the following command:
gem install mongo

Then connect to a standalone instance, using the following form:


require 'rubygems'
require 'mongo'
connection = Mongo::Connection.new('localhost', 27017, :ssl => true)

Replace connection with the following if you're connecting to a replica set:


connection = Mongo::ReplSetConnection.new(['localhost:27017'],
                                          ['localhost:27018'],
                                          :ssl => true)

Here, mongod instances run on localhost:27017 and localhost:27018.
Node.JS (node-mongodb-native)
In the node-mongodb-native driver, use the following invocation to connect to a mongod or mongos instance via SSL:
var db1 = new Db(MONGODB, new Server("127.0.0.1", 27017,
                                     { auto_reconnect: false, poolSize:4, ssl:ssl } ));

To connect to a replica set via SSL, use the following form:


var replSet = new ReplSetServers( [
    new Server( RS.host, RS.ports[1], { auto_reconnect: true } ),
    new Server( RS.host, RS.ports[0], { auto_reconnect: true } ),
    ],
  {rs_name:RS.name, ssl:ssl}
);

.NET
As of release 1.6, the .NET driver supports SSL connections with mongod and mongos instances. To connect using SSL, you must add an option to the connection string, specifying ssl=true as follows:
var connectionString = "mongodb://localhost/?ssl=true";
var server = MongoServer.Create(connectionString);

The .NET driver will validate the certificate against the local trusted certificate store, in addition to encrypting the connection to the server. This behavior may produce issues during testing if the server uses a self-signed certificate. If you encounter this issue, add the sslverifycertificate=false option to the connection string to prevent the .NET driver from validating the certificate, as follows:
var connectionString = "mongodb://localhost/?ssl=true&sslverifycertificate=false";
var server = MongoServer.Create(connectionString);

11.3 Monitoring Database Systems


Monitoring is a critical component of all database administration. A firm grasp of MongoDB's reporting will allow you to assess the state of your database and maintain your deployment without crisis. Additionally, a sense of MongoDB's normal operational parameters will allow you to diagnose issues as you encounter them, rather than waiting for a crisis or failure. This document provides an overview of the available tools and data provided by MongoDB as well as an introduction to diagnostic strategies, and suggestions for monitoring instances in MongoDB's replica sets and shard clusters.
Note: 10gen provides a hosted monitoring service which collects and aggregates these data to provide insight into the performance and operation of MongoDB deployments. See the MongoDB Monitoring Service (MMS) and the MMS documentation for more information.

11.3.1 Monitoring Tools


There are two primary methods for collecting data regarding the state of a running MongoDB instance. First, there is a set of tools distributed with MongoDB that provide real-time reporting of activity on the database. Second, several database commands return statistics regarding the current database state with greater fidelity. Each method allows you to collect data that answers a different set of questions, and each is useful in different contexts. This section provides an overview of these utilities and statistics, along with an example of the kinds of questions that each method is most suited to help you address.
Utilities
The MongoDB distribution includes a number of utilities that quickly return statistics about an instance's performance and activity. These are typically most useful for diagnosing issues and assessing normal operation.


mongotop

mongotop tracks and reports the current read and write activity of a MongoDB instance. mongotop provides per-collection visibility into use. Use mongotop to verify that activity and use match expectations. See the mongotop manual (page 520) for details.
mongostat

mongostat captures and returns counters of database operations. mongostat reports operations on a per-type (e.g. insert, query, update, delete, etc.) basis. This format makes it easy to understand the distribution of load on the server. Use mongostat to understand the distribution of operation types and to inform capacity planning. See the mongostat manual (page 516) for details.
REST Interface

MongoDB provides a REST interface that exposes diagnostic and monitoring information in a simple web page. Enable this by setting rest (page 499) to true, and access this page via the localhost interface using the port numbered 1000 more than the database port. In default configurations the REST interface is accessible on 28017. For example, to access the REST interface on a locally running mongod instance: http://localhost:28017
Statistics
The mongo shell provides a number of commands that return statistics about the state of the MongoDB instance. These data may provide finer granularity regarding the state of the MongoDB instance than the tools above. Consider using their output in scripts and programs to develop custom alerts, or modifying the behavior of your application in response to the activity of your instance.
serverStatus

Access serverStatus data (page 537) by way of the serverStatus command. This document contains a general overview of the state of the database, including disk usage, memory use, connections, journaling, and index accesses. The command returns quickly and does not impact MongoDB performance. While this output contains a (nearly) complete account of the state of a MongoDB instance, in most cases you will not run this command directly. Nevertheless, all administrators should be familiar with the data provided by serverStatus.
See Also: db.stats() (page 472) and serverStatus data (page 537).
replSetGetStatus

View the replSetGetStatus data (page 558) with the replSetGetStatus command (rs.status() (page 479) from the shell). The document returned by this command reflects the state and configuration of the replica set. Use this data to ensure that replication is properly configured, and to check the connections between the current host and the members of the replica set.


dbStats

The dbStats data (page 551) is accessible by way of the dbStats command (db.stats() (page 472) from the shell). This command returns a document that contains data that reflects the amount of storage used and data contained in the database, as well as object, collection, and index counters. Use this data to check and track the state and storage of a specific database. This output also allows you to compare utilization between databases and to determine average document size in a database.
collStats

The collStats data (page 552) is accessible using the collStats command (db.printCollectionStats() (page 470) from the shell). It provides statistics that resemble dbStats on the collection level: this includes a count of the objects in the collection, the size of the collection, the amount of disk space used by the collection, and information about the indexes.
Third Party Tools
A number of third party monitoring tools have support for MongoDB, either directly, or through their own plugins.
Self Hosted Monitoring Tools

These are monitoring tools that you must install, configure and maintain on your own servers, usually open source.

Tool | Plugin | Description
Ganglia | mongodb-ganglia | Shell script to report operations per second, memory usage, btree statistics, master/slave status and current connections.
Ganglia | gmond_python_modules | Parses output from the serverStatus and replSetGetStatus commands.
Motop | None | Realtime monitoring tool for several MongoDB servers. Shows current operations ordered by durations every second.
mtop | None | A top like tool.
Munin | mongo-munin | Retrieves server statistics.
Munin | mongomon | Retrieves collection statistics (sizes, index sizes, and each (configured) collection count for one DB).
Munin | munin-plugins Ubuntu PPA | Some additional munin plugins not in the main distribution.
Nagios | nagios-plugin-mongodb | A simple Nagios check script.
Zabbix | mikoomi-mongodb | Monitors availability, resource utilization, health, performance and other important metrics.

Also consider dex, an index and query analyzing tool for MongoDB that compares MongoDB log files and indexes to make indexing recommendations.
Hosted (SaaS) Monitoring Tools

These are monitoring tools provided as a hosted service, usually on a subscription billing basis.


Name | Notes
Scout | Several plugins including: MongoDB Monitoring, MongoDB Slow Queries and MongoDB Replica Set Monitoring.
Server Density | Dashboard for MongoDB, MongoDB specific alerts, replication failover timeline and iPhone, iPad and Android mobile apps.

11.3.2 Diagnosing Performance Issues


Degraded performance in MongoDB can be the result of an array of causes, and is typically a function of the relationship between the quantity of data stored in the database, the amount of system RAM, the number of connections to the database, and the amount of time the database spends in a lock state. In some cases performance issues may be transient and related to traffic load, data access patterns, or the availability of hardware on the host system for virtualized environments. Some users also experience performance limitations as a result of inadequate or inappropriate indexing strategies, or as a consequence of poor schema design patterns. In other situations, performance issues may indicate that the database is operating at capacity and that it's time to add additional capacity to the database.
Locks
MongoDB uses a locking system to ensure consistency; however, if certain operations are long-running, or a queue forms, performance slows as requests and operations wait for the lock. Because lock related slow downs can be intermittent, look to the data in the globalLock (page 540) section of the serverStatus response to assess whether the lock has been a challenge to your performance. If globalLock.currentQueue.total (page 541) is consistently high, then there is a chance that a large number of requests are waiting for a lock. This indicates a possible concurrency issue that might affect performance. If globalLock.totalTime is high in the context of uptime (page 538), then the database has existed in a lock state for a significant amount of time. If globalLock.ratio (page 541) is also high, MongoDB has likely been processing a large number of long running queries. Long queries are often the result of a number of factors: ineffective use of indexes, non-optimal schema design, poor query structure, system architecture issues, or insufficient RAM resulting in page faults (page 149) and disk reads.
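The lock checks described above lend themselves to a small script. The Python sketch below pulls the relevant fields out of a serverStatus document that has already been fetched (for example through a driver) and applies a simple "consistently high" rule to repeated queue samples. The function names, the sample values, and the threshold are hypothetical; only the field names come from the text above.

```python
def lock_indicators(server_status):
    """Collect the lock-related serverStatus fields discussed in this section."""
    global_lock = server_status.get("globalLock", {})
    current_queue = global_lock.get("currentQueue", {})
    return {
        "uptime": server_status.get("uptime"),
        "total_time": global_lock.get("totalTime"),
        "ratio": global_lock.get("ratio"),
        "queued": current_queue.get("total", 0),
    }


def consistently_high(samples, threshold):
    """True when every sampled value exceeds the threshold (an illustrative rule)."""
    return bool(samples) and all(value > threshold for value in samples)


# Hand-written sample shaped like part of a serverStatus document.
sample_status = {
    "uptime": 3600,
    "globalLock": {
        "totalTime": 3600000000,
        "ratio": 0.02,
        "currentQueue": {"total": 12, "readers": 8, "writers": 4},
    },
}

indicators = lock_indicators(sample_status)
print(indicators["queued"])
print(consistently_high([12, 15, 11], threshold=10))
```

Sampling the queue length several times, rather than once, matches the advice to look for values that are consistently high instead of reacting to a single spike.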
Memory Usage
Because MongoDB uses memory mapped files to store data, given a data set of sufficient size, the MongoDB process will allocate all memory available on the system for its use. Because of the way operating systems function, the amount of allocated RAM is not a useful reflection of MongoDB's state. While this is part of the design, and affords MongoDB superior performance, the memory mapped files make it difficult to determine if the amount of RAM is sufficient for the data set. Consider memory usage statuses (page 542) to better understand MongoDB's memory utilization. Check the resident memory use (i.e. mem.resident (page 542)): if this exceeds the amount of system memory and there's a significant amount of data on disk that isn't in RAM, you may have exceeded the capacity of your system. Also check the amount of mapped memory (i.e. mem.mapped (page 542)). If this value is greater than the amount of system memory, some operations will require disk access (page faults) to read data from virtual memory, with deleterious effects on performance.
Page Faults
Page faults represent the number of times that MongoDB requires data not located in physical memory, and must read from virtual memory. To check for page faults, see the extra_info.page_faults (page 543) value in the serverStatus command. This data is only available on Linux systems. Alone, page faults are minor and complete quickly; however, in aggregate, large numbers of page faults typically indicate that MongoDB is reading too much data from disk, and can point to a number of underlying causes. In many situations, MongoDB's read locks will yield after a page fault to allow other processes to read and avoid blocking while waiting for the next page to read into memory. This approach improves concurrency, and in high volume systems this also improves overall throughput. If possible, increasing the amount of RAM accessible to MongoDB may help reduce the number of page faults. If this is not possible, you may want to consider deploying a shard cluster and/or adding one or more shards to your deployment to distribute load among mongod instances.
Number of Connections
In some cases, the number of connections between the application layer (i.e. clients) and the database can overwhelm the ability of the server to handle requests, which can produce performance irregularities. Check the following fields in the serverStatus (page 537) document:
globalLock.activeClients (page 541) contains a counter of the total number of clients with active operations in progress or queued.
connections is a container for the following two fields:
connections.current (page 542): the total number of current clients that connect to the database instance.
connections.available (page 543): the total number of unused connections available for new clients.
Note: Unless limited by system-wide limits, MongoDB has a hard connection limit of 20 thousand connections. You can modify system limits using the ulimit command, or by editing your system's /etc/sysctl file.
If requests are high because there are many concurrent application requests, the database may have trouble keeping up with demand. If this is the case, then you will need to increase the capacity of your deployment.
For read-heavy applications, increase the size of your replica set and distribute read operations to secondary members. For write-heavy applications, deploy sharding and add one or more shards to a shard cluster to distribute load among mongod instances. Spikes in the number of connections can also be the result of application or driver errors. All of the MongoDB drivers supported by 10gen implement connection pooling, which allows clients to use and reuse connections more efficiently. An extremely high number of connections, particularly without a corresponding workload, is often indicative of a driver or other configuration error.
Database Profiling
MongoDB contains a database profiling system that can help identify inefficient queries and operations. Enable the profiler by setting the profile (page 499) value using the following command in the mongo shell:
db.setProfilingLevel(1)

See Also: The documentation of db.setProfilingLevel() (page 471) for more information about this command.
Note: Because the database profiler can have an impact on performance, only enable profiling for strategic intervals and as minimally as possible on production systems. You may enable profiling on a per-mongod basis. This setting will not propagate across a replica set or shard cluster.


The following profiling levels are available:

Level | Setting
0 | Off. No profiling.
1 | On. Only includes slow operations.
2 | On. Includes all operations.

See the output of the profiler in the system.profile collection of your database. You can specify the slowms (page 500) to set a threshold above which the profiler considers operations slow and thus includes them in the level 1 profiling data. You may configure slowms (page 500) at runtime, as an argument to the db.setProfilingLevel() (page 471) operation. Additionally, mongod records all slow queries to its log (page 496), as defined by slowms (page 500). The data in system.profile does not persist between mongod restarts. You can view the profiler's output by issuing the show profile command in the mongo shell, or with the following operation:
db.system.profile.find( { millis : { $gt : 100 } } )

This returns all operations that lasted longer than 100 milliseconds. Ensure that the value specified here (i.e. 100) is above the slowms (page 500) threshold.
See Also: The Optimization wiki page addresses strategies that may improve the performance of your database queries and operations.
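The same filter can be applied on the client side once profiler documents have been retrieved. The Python sketch below works on plain dictionaries shaped like system.profile entries, selecting operations slower than a threshold and counting them per namespace. The function names and the sample entries are hypothetical; the millis field comes from the query above, and the use of the ns and op fields is an assumption about the profiler output.

```python
from collections import Counter


def slow_operations(profile_entries, threshold_ms=100):
    """Return entries slower than threshold_ms, slowest first."""
    slow = [entry for entry in profile_entries if entry.get("millis", 0) > threshold_ms]
    return sorted(slow, key=lambda entry: entry["millis"], reverse=True)


def slow_counts_by_namespace(profile_entries, threshold_ms=100):
    """Count slow operations per namespace (the 'ns' field)."""
    return Counter(
        entry.get("ns", "unknown")
        for entry in slow_operations(profile_entries, threshold_ms)
    )


# Hand-written sample entries for illustration.
sample_entries = [
    {"op": "query", "ns": "test.foo", "millis": 120},
    {"op": "update", "ns": "test.bar", "millis": 35},
    {"op": "query", "ns": "test.foo", "millis": 450},
]

for entry in slow_operations(sample_entries):
    print(entry["ns"], entry["millis"])
print(slow_counts_by_namespace(sample_entries))
```

Grouping by namespace gives a quick view of which collections account for most of the slow operations.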

11.3.3 Replication and Monitoring


The primary administrative concern that requires monitoring with replica sets, beyond the requirements for any MongoDB instance, is replication lag. This refers to the amount of time that it takes a write operation on the primary to replicate to a secondary. Some very small delay period may be acceptable; however, as replication lag grows, two significant problems emerge:
First, operations that have occurred in the period of lag are not replicated to one or more secondaries. If you're using replication to ensure data persistence, exceptionally long delays may impact the integrity of your data set.
Second, if the replication lag exceeds the length of the operation log (oplog) then the secondary will have to resync all data from the primary and rebuild all indexes. In normal circumstances this is uncommon given the typical size of the oplog, but it's an issue to be aware of.
For causes of replication lag, see Replication Lag (page 45). Replication issues are most often the result of network connectivity issues between members or the result of a primary that does not have the resources to support application and replication traffic. To check the status of a replica, use the replSetGetStatus command or the following helper in the shell:
rs.status()

See the Replica Status Reference (page 558) document for a more in-depth overview of this output. In general, watch the value of optimeDate. Pay particular attention to the difference in time between the primary and the secondary members. The size of the operation log is only configurable during the first run using the --oplogSize (page 490) argument to the mongod command, or preferably the oplogSize (page 501) setting in the MongoDB configuration file. If you do not specify this on the command line before running with the --replSet (page 490) option, mongod will create a default sized oplog.


By default the oplog is 5% of total available disk space on 64-bit systems. See Also: Change the Size of the Oplog (page 71)
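The comparison of optimeDate values between the primary and the secondaries, as recommended in this section, can be sketched as follows. This Python example operates on a dictionary shaped like the rs.status() output; the function name replication_lag_seconds and the sample document are hypothetical, and the reliance on the members, name, stateStr, and optimeDate fields is an assumption about that output.

```python
from datetime import datetime


def replication_lag_seconds(status):
    """Map each secondary's name to its lag behind the primary, in seconds."""
    members = status.get("members", [])
    primaries = [m for m in members if m.get("stateStr") == "PRIMARY"]
    if not primaries:
        # Without a primary there is no reference point for lag.
        return {}
    primary_optime = primaries[0]["optimeDate"]
    return {
        member["name"]: (primary_optime - member["optimeDate"]).total_seconds()
        for member in members
        if member.get("stateStr") == "SECONDARY"
    }


# Hand-written sample shaped like part of a replica set status document.
sample_status = {
    "set": "set0",
    "members": [
        {"name": "db0.example.net:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2012, 9, 13, 12, 0, 30)},
        {"name": "db1.example.net:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2012, 9, 13, 12, 0, 0)},
    ],
}

print(replication_lag_seconds(sample_status))
```

A monitoring script could run this periodically and raise an alert when any value approaches the time window covered by the oplog.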

11.3.4 Sharding and Monitoring


In most cases the components of shard clusters benefit from the same monitoring and analysis as all other MongoDB instances. Additionally, shard clusters require monitoring to ensure that data is effectively distributed among nodes and that sharding operations are functioning appropriately.
See Also: See the Sharding wiki page for more information.
Config Servers
The config database provides a map of documents to shards. The cluster updates this map as chunks move between shards. When a configuration server becomes inaccessible, some sharding operations like moving chunks and starting mongos instances become unavailable. However, shard clusters remain accessible from already-running mongos instances. Because inaccessible configuration servers can have a serious impact on the availability of a shard cluster, you should monitor the configuration servers to ensure that your shard cluster remains well balanced and that mongos instances can restart.
Balancing and Chunk Distribution
The most effective shard clusters require that chunks are evenly balanced between the shards. MongoDB has a background balancer process that distributes data such that chunks are always optimally distributed among the shards. Issue the db.printShardingStatus() (page 470) or sh.status() (page 484) command to the mongos by way of the mongo shell. This returns an overview of the shard cluster including the database name, and a list of the chunks.
Stale Locks
In nearly every case, all locks used by the balancer are automatically released when they become stale. However, because any long lasting lock can block future balancing, it's important to ensure that all locks are legitimate. To check the lock status of the database, connect to a mongos instance using the mongo shell. Issue the following command sequence to switch to the config database and display all outstanding locks on the shard database:
use config
db.locks.find()

For active deployments, the above query might return a useful result set. The balancing process, which originates on a randomly selected mongos, takes a special balancer lock that prevents other balancing activity from transpiring. Use the following command, also to the config database, to check the status of the balancer lock.
db.locks.find( { _id : "balancer" } )

If this lock exists, make sure that the balancer process is actively using this lock.


11.4 Importing and Exporting MongoDB Data


Full database instance backups (page 156) are useful for disaster recovery protection and routine database backup operation; however, some cases require additional import and export functionality. This document provides an overview of the import and export tools provided in distributions for MongoDB administrators. These utilities are useful when you want to back up or export a portion of your database without capturing the state of the entire database. For more complex data migration tasks, you may want to write your own import and export scripts using a client driver to interact with the database itself.
Warning: Because these tools primarily operate by interacting with a running mongod instance, they can impact the performance of your running database. Not only do these backup processes create traffic for a running database instance, they also force the database to read all data through memory. When MongoDB reads infrequently used data, it can supplant more frequently accessed data, causing a deterioration in performance for the database's regular workload. mongoimport and mongoexport do not reliably preserve all rich BSON data types, because BSON is a superset of JSON. Thus, mongoimport and mongoexport cannot represent BSON data accurately in JSON. As a result, data exported or imported with these tools may lose some measure of fidelity. See the MongoDB Extended JSON wiki page for more information. Use with care.

11.4.1 Data Type Fidelity


JSON does not have the following data types that exist in BSON documents: data_binary, data_date, data_timestamp, data_regex, data_oid and data_ref. As a result, using any tool that decodes BSON documents into JSON will suffer some loss of fidelity.

If maintaining type fidelity is important, consider writing a data import and export system that does not force BSON documents into JSON form as part of the process. The following list shows how MongoDB represents each of these BSON types in JSON.

data_binary
{ "$binary" : "<bindata>", "$type" : "<t>" }

<bindata> is the base64 representation of a binary string. <t> is the hexadecimal representation of a single byte indicating the data type.

data_date
Date( <date> )

<date> is the JSON representation of a 64-bit signed integer for milliseconds since epoch.

data_timestamp
Timestamp( <t>, <i> )

<t> is the JSON representation of a 32-bit unsigned integer for seconds since epoch. <i> is a 32-bit unsigned integer for the increment.

data_regex
/<jRegex>/<jOptions>

<jRegex> is a string that may contain valid JSON characters and unescaped double quote (i.e. ") characters, but may not contain unescaped forward slash (i.e. /) characters. <jOptions> is a string that may contain only the characters g, i, m, and s.


data_oid
ObjectId( "<id>" )

<id> is a 24 character hexadecimal string. These representations require that data_oid values have an associated field named _id.

data_ref
Dbref( "<name>", "<id>" )

<name> is a string of valid JSON characters. <id> is a 24 character hexadecimal string.

See Also: MongoDB Extended JSON wiki page.
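As a rough illustration of the data_binary and data_date forms described above, the following shell sketch builds both representations by hand. The function names binary_json and date_json are hypothetical, the subtype value 00 is an assumption chosen only for the example, and the sketch relies on the commonly available base64 command-line utility.

```shell
#!/bin/sh
# binary_json: print the data_binary form for a string.
# <bindata> is the base64 encoding of the bytes; <t> is one byte in hex
# ("00" is an assumed subtype, used here only for illustration).
binary_json() {
    bindata=$(printf '%s' "$1" | base64)
    printf '{ "$binary" : "%s", "$type" : "%s" }\n' "$bindata" "00"
}

# date_json: print the data_date form for a time given in whole seconds
# since the epoch; <date> itself is expressed in milliseconds.
date_json() {
    printf 'Date( %s )\n' "$(( $1 * 1000 ))"
}
```

For example, binary_json hello wraps the base64 text aGVsbG8= in the data_binary form. Neither value is a plain JSON string or number once wrapped, which is why a tool that only understands plain JSON cannot round-trip these types.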

11.4.2 Using Database Imports and Exports for Backups


For resilient and non-disruptive backups, in most cases you'll want to use a file system or block-level disk snapshot function, such as the method described in the Backup and Restoration Strategies (page 156) document. The tools and operations discussed here provide functionality that's useful in the context of providing some kinds of backups.

By contrast, use import and export tools to back up a small subset of your data or to move data to or from a third-party system. These backups may capture a small crucial set of data or a frequently modified section of data, for extra insurance, or for ease of access.

No matter how you decide to import or export your data, consider the following guidelines:

• Label files so that you can identify what point in time the export or backup reflects.
• Labeling should describe the contents of the backup, and reflect the subset of the data corpus captured in the backup or export.
• Do not create or apply exports if the backup process itself will have an adverse effect on a production system.
• Make sure that exports reflect a consistent data state. Export or backup processes can impact data integrity (i.e. type fidelity) and consistency if updates continue during the backup process.
• Test backups and exports by restoring and importing to ensure that the backups are useful.
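One possible way to follow the labeling guideline is to encode the database, the collection, and a UTC timestamp in the file name. The sketch below is only an illustration: the function name export_label and the naming scheme are assumptions, not a convention defined by MongoDB.

```shell
#!/bin/sh
# export_label: build a file name that records the database, the
# collection, and the point in time of the export.
# Arguments: database, collection, and an optional UTC timestamp
# (defaults to the current time).
export_label() {
    stamp=${3:-$(date -u +%Y%m%dT%H%M%SZ)}
    printf '%s.%s.%s.json\n' "$1" "$2" "$stamp"
}
```

A call such as mongoexport --db sales --collection contacts --out "$(export_label sales contacts)" would then write to a file whose name states what it contains and when it was taken.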

11.4.3 Human Intelligible Import/Export Formats


This section describes a process to import/export your database, or a portion thereof, to a file in a JSON or CSV format.

See Also: The mongoimport Manual (page 511) and mongoexport Manual (page 514) documents contain complete documentation of these tools. If you have questions about the function and parameters of these tools not covered here, please refer to these documents.

If you want to simply copy a database or collection from one instance to another, consider using the copydb, clone, or cloneCollection commands, which may be more suited to this task. The mongo shell provides the db.copyDatabase() (page 465) method.

These tools may also be useful for importing data into a MongoDB database from third party applications.


Database Export with mongoexport

With the mongoexport utility you can create a backup file. In its simplest invocation, the command takes the following form:
mongoexport --collection collection --out collection.json

This will export all documents in the collection named collection into the file collection.json. Without the output specification (i.e. --out collection.json (page 515)), mongoexport writes output to standard output (i.e. stdout). You can further narrow the results by supplying a query filter using the --query (page 515) option, and limit results to a single database using the --db (page 515) option. For instance:
mongoexport --db sales --collection contacts --query '{"field": 1}'

This command returns all documents in the sales database's contacts collection that have a field named field with a value of 1. Enclose the query in single quotes (e.g. '{"field": 1}') to ensure that it does not interact with your shell environment. The resulting documents will return on standard output.

By default, mongoexport returns one JSON document per MongoDB document. Specify the --jsonArray argument to return the export as a single JSON array. Use the --csv (page 515) option to return the result in CSV (comma separated values) format.

If your mongod instance is not running, you can use the --dbpath (page 514) option to specify the location of your MongoDB instance's database files. See the following example:
mongoexport --db sales --collection contacts --dbpath /srv/MongoDB/

This reads the data files directly. This locks the data directory to prevent conflicting writes. The mongod process must not be running or attached to these data files when you run mongoexport in this configuration.

The --host (page 514) and --port (page 514) options allow you to specify a non-local host to connect to when capturing the export. Consider the following example:

mongoexport --host mongodb1.example.net --port 37017 --username user --password pass --collection con

On any mongoexport command you may, as above, specify username and password credentials.

Database Import with mongoimport

To restore a backup taken with mongoexport, use mongoimport. Most of the arguments to mongoexport also exist for mongoimport. Consider the following command:

mongoimport --collection collection --file collection.json

This imports the contents of the file collection.json into the collection named collection. If you do not specify a file with the --file (page 512) option, mongoimport accepts input over standard input (i.e. stdin).

If you specify the --upsert (page 513) option, all mongoimport operations will attempt to update existing documents in the database and insert other documents. This option will cause some performance impact depending on your configuration.

You can specify the database option --db to import these documents to a particular database. If your MongoDB instance is not running, use the --dbpath (page 512) option to specify the location of your MongoDB instance's database files. Consider using the --journal (page 512) option to ensure that mongoimport records its operations in the journal. The mongod process must not be running or attached to these data files when you run mongoimport in this configuration.

Use the --ignoreBlanks (page 512) option to ignore blank fields. For CSV and TSV imports, this option provides the desired functionality in most cases: it avoids inserting blank fields in MongoDB documents.
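The export and import options described in this section can be combined into a small round trip. The following is a minimal sketch, assuming hypothetical helper names (run, copy_subset) and a dry-run default so that nothing touches a database unless DRY_RUN=0 is set; it uses only options named above.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is 1 (the default in this sketch);
# execute it otherwise. Printing flattens the arguments, so the output
# is for inspection only and is not safe to paste back into a shell.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# copy_subset: export the documents matching a query from one database
# and import them into another, updating documents that already exist.
# Arguments: source db, target db, collection, query, file name.
copy_subset() {
    src_db=$1; dst_db=$2; coll=$3; query=$4; file=$5
    run mongoexport --db "$src_db" --collection "$coll" \
        --query "$query" --out "$file"
    run mongoimport --db "$dst_db" --collection "$coll" \
        --file "$file" --upsert
}
```

Passing the query as a single quoted argument, for example copy_subset sales archive contacts '{"field": 1}' contacts.json, keeps the shell from interpreting the braces and double quotes. The type fidelity caveats described earlier still apply to data moved this way.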


See Also: See the Backup and Restoration Strategies (page 156) document for more in-depth information about backing up MongoDB instances. Additionally, consider the following references for commands addressed in this document:

• mongoexport Manual (page 514)
• mongorestore Manual (page 508)

11.5 Backup and Restoration Strategies


This document provides an inventory of database backup strategies for use with MongoDB. Use the backup overview (page 156) and backup considerations (page 156) sections as you develop the most appropriate strategy for backing up your MongoDB environment. Then, use the examples from the block level backup methods (page 157) or the backups using mongodump (page 160) sections to implement the backup solution that is best suited to your deployment's needs.

A robust backup strategy and a well-tested corresponding restoration process are crucial for every production-grade deployment. Take the specific features of your deployment, your use patterns, and architecture into consideration as you develop your own backup system.

Replica sets and shard clusters require special considerations. Don't miss the backup considerations for shard clusters and replica sets (page 162).

11.5.1 Overview
If you are familiar with backup systems in the context of database systems, please skip ahead to backup considerations (page 156).

With MongoDB, there are two major approaches to backups: using system-level tools, like disk image snapshots, and using various capabilities present in the mongodump tool (page 160). The underlying goal of these strategies is to produce a full and consistent copy of the data that you can use to bring up a new or replacement database instance.

The methods described in this document operate by copying the data files at the disk level. If your system does not provide functionality for this kind of backup, see the section on using database dumps for backups (page 160) for more information.

Ensuring that the state captured by the backup is consistent and usable is the primary challenge of producing backups of database systems. Backups that you cannot produce reliably, or restore from feasibly, are worthless.

Because every environment is unique, it's important to regularly test the backups that you capture to ensure that your backup system is practically, and not just theoretically, functional.

11.5.2 Production Considerations


When evaluating a backup strategy for your node, consider the following factors:

• Geography. Ensure that you move some backups away from your primary database infrastructure. It's important to be able to restore your database if you lose access to a system or site.
• System errors. Ensure that your backups can survive situations where hardware failures or disk errors impact the integrity or availability of your backups.
• Production constraints. Backup operations themselves sometimes require substantial system resources. It's important to consider the time of the backup schedule relative to peak usage and maintenance windows.


• System capabilities. Some of the block-level snapshot tools require special support on the operating-system or infrastructure level.
• Database configuration. Replication and sharding can affect the process and impact of the backup implementation.
• Actual requirements. You may be able to save time, effort, and space by including only crucial data in the most frequent backups and backing up less crucial data less frequently.

With this information in hand you can begin to develop a backup plan for your database. Remember that all backup plans must be:

• Tested. If you cannot effectively restore your database from the backup, then your backups are useless. Test backup restoration regularly in practical situations to ensure that your backup system provides value.
• Automated. Database backups need to run regularly and automatically. Also automate tests of backup restoration.
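As one small piece of an automated check, a scheduled job can at least confirm that the most recent backup file exists, is not empty, and is recent. The sketch below is an assumption-laden illustration: the function name backup_state is hypothetical, and the -mmin test is an extension found in GNU and BSD find rather than in every implementation. It is a sanity check only and does not replace regular test restores.

```shell
#!/bin/sh
# backup_state: print "ok" when the file exists, is not empty, and was
# modified within the last N minutes; print "stale" otherwise.
# Arguments: path to the backup file, maximum age in minutes.
backup_state() {
    file=$1; max_age_min=$2
    if [ -s "$file" ] && [ -n "$(find "$file" -mmin "-$max_age_min")" ]; then
        echo ok
    else
        echo stale
    fi
}
```

A monitoring job could call backup_state on the newest archive after each scheduled backup and raise an alert whenever the result is stale.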

11.5.3 Block Level Methods


This section provides an overview of using disk/block level snapshots (i.e. LVM or storage appliance) to back up a MongoDB instance. These tools make a quick block-level backup of the device that holds MongoDB's data files. These methods complete quickly, work reliably, and typically provide the easiest backup method to implement.

Snapshots work by creating pointers between the live data and a special snapshot volume: these pointers are theoretically equivalent to hard links. Then, as the working data diverges from the snapshot, the snapshot process uses a copy-on-write strategy. As a result the snapshot only stores modified data.

After making the snapshot, you will mount the snapshot image on your file system and copy data from the snapshot. The resulting backup contains a full copy of all data.

Snapshots have the following limitations:

• The database must be in a consistent or recoverable state when the snapshot takes place. This means that all writes accepted by the database need to be fully written to disk: either to the journal or to data files. If all writes are not on disk when the backup occurs, the backup will not reflect these changes. If writes are in progress when the backup occurs, the data files will reflect an inconsistent state. With journaling, all data-file states resulting from in-progress writes are recoverable; without journaling, you must flush all pending writes to disk before running the backup operation and ensure that no writes occur during the entire backup procedure. If you do use journaling, the journal must reside on the same volume as the data.
• Snapshots create an image of an entire disk image. Unless you need to back up your entire system, consider isolating your MongoDB data files, journal (if applicable), and configuration on one logical disk that doesn't contain any other data. Alternately, store all MongoDB data files on a dedicated device so that you can make backups without duplicating extraneous data.
• Ensure that you copy data from snapshots onto other systems to ensure that data is safe from site failures.

With Journaling

If your system has snapshot capability and your mongod instance has journaling enabled, then you can use any kind of file system or volume/block level snapshot tool to create backups.


Warning: Changed in version 1.9.2. Journaling is only enabled by default on 64-bit builds of MongoDB. To enable journaling on all other builds, specify journal (page 498) = true in the configuration or use the --journal (page 487) run-time option for mongod.

Many service providers provide a block-level backup service based on disk image snapshots. If you manage your own infrastructure on a Linux-based system, configure your system with LVM to provide your disk packages and provide snapshot capability. You can also use LVM-based setups within a cloud/virtualized environment.

Note: Running LVM provides additional flexibility and enables the possibility of using snapshots to back up MongoDB. If you use Amazon's EBS service in a software RAID 10 (i.e. 1+0) configuration, use LVM to capture a consistent disk image. Also consider Amazon EBS in Software RAID 10 Configuration (page 160).

The following sections provide an overview of a simple backup process using LVM on a Linux system. While the tools, commands, and paths may be (slightly) different on your system, the following steps provide a high level overview of the backup operation.
Create Snapshot

To create a snapshot with LVM issue a command, as root, in the following format:
lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb

This command creates an LVM snapshot (with the --snapshot option) named mdb-snap01 of the mongodb volume in the vg0 volume group. The snapshot is located at /dev/vg0/mdb-snap01. The location and paths to your system's volume groups and devices may vary slightly depending on your operating system's LVM configuration.

The snapshot has a cap of 100 megabytes, because of the parameter --size 100M. This size does not reflect the total amount of the data on the disk, but rather the quantity of differences between the current state of /dev/vg0/mongodb and the state at the creation of the snapshot (i.e. /dev/vg0/mdb-snap01).

Warning: Ensure that you create snapshots with enough space to account for data growth, particularly for the period of time that it takes to copy data out of the system or to a temporary image. If your snapshot runs out of space, the snapshot image becomes unusable. Discard this logical volume and create another.

The snapshot will exist when the command returns. You can restore directly from the snapshot at any time, or by creating a new logical volume and restoring from this snapshot to the alternate image.

While snapshots are great for creating high quality backups very quickly, they are not ideal as a format for storing backup data. Snapshots typically depend and reside on the same storage infrastructure as the original disk images. Therefore, it's crucial that you archive these snapshots and store them elsewhere.
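The sizing warning above can be turned into a rough estimate: the snapshot needs room for every block that changes while you copy data out of it. The following sketch assumes you know an approximate write rate and copy duration; the function names and the idea of an integer safety factor are illustrative assumptions, not part of LVM.

```shell
#!/bin/sh
# snapshot_size_mb: estimate the copy-on-write space a snapshot needs.
# Arguments: expected write rate (MB per minute), the minutes needed to
# copy data out of the snapshot, and an integer safety factor.
snapshot_size_mb() {
    echo $(( $1 * $2 * $3 ))
}

# snapshot_command: print (not run) the matching lvcreate invocation.
# Arguments: the three values above, the snapshot name, the origin volume.
snapshot_command() {
    printf 'lvcreate --size %sM --snapshot --name %s %s\n' \
        "$(snapshot_size_mb "$1" "$2" "$3")" "$4" "$5"
}
```

With 5 MB of writes per minute, a 30 minute copy, and a safety factor of 2, the estimate is 300 MB, so snapshot_command 5 30 2 mdb-snap01 /dev/vg0/mongodb prints an lvcreate command with --size 300M. Measure the actual write rate on your system before relying on such a figure.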
Archive Snapshots

After creating a snapshot, mount the snapshot and move the data to separate storage. You may wish to compress the backup images as you move them offline. Consider the following procedure to fully archive the data from the snapshot:


umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | gzip > mdb-snap01.gz

This command sequence:

1. Ensures that the /dev/vg0/mdb-snap01 device is not mounted.
2. Does a block level copy of the entire snapshot image using the dd command, and compresses the result into a gzipped file in the current working directory.

Warning: This command will create a large gz file in your current working directory. Make sure that you run this command in a file system that has enough free space.

Restore Snapshot

To restore a backup created with the above method, use the following procedure:
lvcreate --size 1G --name mdb-new vg0
gzip -d -c mdb-snap01.gz | dd of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb

This sequence:

1. Creates a new logical volume named mdb-new, in the /dev/vg0 volume group. The path to the new device will be /dev/vg0/mdb-new.
   Warning: This volume will have a maximum size of 1 gigabyte. The original file system must have had a total size of 1 gigabyte or smaller, or else the restoration will fail. Change 1G to your desired volume size.
2. Uncompresses mdb-snap01.gz and writes the result into the mdb-new disk image.
3. Mounts the mdb-new disk image to the /srv/mongodb directory. Modify the mount point to correspond to your MongoDB data file location, or other location as needed.

Restore Directly from a Snapshot

To combine the above processes without writing to a compressed archive, use the following sequence:
umount /dev/vg0/mdb-snap01
lvcreate --size 1G --name mdb-new vg0
dd if=/dev/vg0/mdb-snap01 of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb

Remote Backup Storage

You can implement off-system backups using the combined process (page 159) and SSH. Consider the following procedure:
umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | ssh username@example.com 'gzip > /opt/backup/mdb-snap01.gz'
lvcreate --size 1G --name mdb-new vg0
ssh username@example.com gzip -d -c /opt/backup/mdb-snap01.gz | dd of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb


This sequence is identical to procedures explained above, except that it archives and compresses the backup on a remote system using SSH.

Without Journaling

If your mongod instance does not run with journaling enabled, or if your journal is on a separate volume, obtaining a functional backup of a consistent state is more complicated. Flush all writes to disk and lock the database to prevent writes during the backup process. If you have a replica set configuration, use a secondary that is not receiving reads (i.e. hidden member) for backup purposes.

You can flush writes to disk, and lock the database to prevent further writes, with the db.fsyncLock() (page 467) command in the mongo shell, as follows:
db.fsyncLock();

Perform the backup operation described above (page 158) at this point. To unlock the database after the snapshot has completed, use the following command in the mongo shell:
db.fsyncUnlock();

Note: Version 1.9.0 added db.fsyncLock() (page 467) and db.fsyncUnlock() (page 467) helpers to the mongo shell. Prior to this version, use the following commands:
db.runCommand( { fsync: 1, lock: true } );
db.$cmd.sys.unlock.findOne();

Note: The database cannot be locked with db.fsyncLock() (page 467) while profiling is enabled. You must disable profiling before locking the database with db.fsyncLock() (page 467). Disable profiling using db.setProfilingLevel() (page 471) as follows in the mongo shell:
db.setProfilingLevel(0)
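When scripting the lock, backup, and unlock steps, the main risk is leaving the database locked if the backup step fails. The sketch below shows one way to guard against that. It assumes the mongo shell's --eval option, hypothetical helper names (run, locked_backup), and a dry-run default; it does not disable profiling for you.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is 1 (the default in this sketch);
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# locked_backup: flush and lock the database, run the given backup
# command, and always unlock afterwards, even if the backup fails.
# Returns the exit status of the backup command.
locked_backup() {
    run mongo --eval 'db.fsyncLock()' || return 1
    "$@" && status=0 || status=$?
    run mongo --eval 'db.fsyncUnlock()'
    return "$status"
}
```

A call such as locked_backup lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb would take the snapshot between the lock and the unlock. The backup command itself is not subject to the dry-run switch, so test with a harmless command first.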

Amazon EBS in Software RAID 10 Configuration

If your deployment depends on Amazon's Elastic Block Storage (EBS) with RAID configured within your instance, it is impossible to get a consistent state across all disks using the platform's snapshot tool. As a result you may:

• Flush all writes to disk and create a write lock to ensure consistent state during the backup process. If you choose this option, see the section on Backup without Journaling (page 160).
• Configure LVM to run and hold your MongoDB data files on top of the RAID within your system. If you choose this option, see the section that outlines the LVM backup operation (page 158).

11.5.4 Binary Dump/Restore Formats


This section describes the process for writing the entire contents of your MongoDB instance to a file in a binary format. This command provides the best option for full system database backups if disk-level snapshots are not available.

See Also: The mongodump Manual (page 505) and mongorestore Manual (page 508) documents contain complete documentation of these tools. If you have questions about the function and parameters of these tools not covered here, please refer to these documents.

If your system has disk level snapshot capabilities, consider the backup methods described above (page 157).

Database Dump with mongodump

The mongodump utility performs a live backup of the data, or can work against an inactive set of database files. The mongodump utility can create a dump for an entire server, database, or collection (or part of a collection using a query), even when the database is running and active.

If you run mongodump without any arguments, the command will connect to the local database instance (e.g. 127.0.0.1 or localhost) and create a database backup in a directory named dump/ in the current directory.

Note: If you use the mongodump tool from the 2.2 distribution to create a dump of a database, you can restore that dump only to a 2.2 database.

You can specify database and collection as options to the mongodump command to limit the amount of data included in the database dump. For example:
mongodump --collection collection --db test

This command creates a dump in the dump/ directory of only the collection named collection in the database named test.

mongodump provides the --oplog (page 507) option that forces the dump operation to use the operation log to take a point-in-time snapshot of the database. With --oplog (page 507), mongodump copies all the data from the source database, as well as all of the oplog entries from the beginning of the backup procedure until the backup procedure completes. This backup procedure, in conjunction with mongorestore --oplogReplay (page 510), allows you to restore a backup that reflects a consistent and specific moment in time.

If your MongoDB instance is not running, you can use the --dbpath (page 506) option to specify the location of your MongoDB instance's database files. mongodump reads from the data files directly with this operation. This locks the data directory to prevent conflicting writes. The mongod process must not be running or attached to these data files when you run mongodump in this configuration. Consider the following example:
mongodump --dbpath /srv/mongodb

Additionally, the --host (page 506) and --port (page 506) options allow you to specify a non-local host to connect to when capturing the dump. Consider the following example:

mongodump --host mongodb1.example.net --port 3017 --username user --password pass /opt/backup/mongodu

On any mongodump command you may, as above, specify username and password credentials for database authentication.

Database Import with mongorestore

The mongorestore utility restores a binary backup created by mongodump. Consider the following example command:
mongorestore dump-2011-10-25/

Here, mongorestore imports the database backup located in the dump-2011-10-25 directory to the mongod instance running on the localhost interface. By default, mongorestore will look for a database dump in the dump/ directory and restore that. If you wish to restore to a non-default host, the --host and --port (page 485) options allow you to specify a non-local host to connect to for the restore. Consider the following example:

mongorestore --host mongodb1.example.net --port 3017 --username user --password pass /opt/backup/mong

On any mongorestore command you may, as above, specify username and password credentials. If you created your database dump using the --oplog (page 507) option to ensure a point-in-time snapshot, call mongorestore with the --oplogReplay option as in the following example:
mongorestore --oplogReplay

You may also consider using the mongorestore --objcheck (page 509) option to check the integrity of objects while inserting them into the database, or the mongorestore --drop (page 510) option to drop each collection from the database before restoring from backups. mongorestore also includes the ability to apply a filter to all input before inserting it into the new database. Consider the following example:
mongorestore --filter '{"field": 1}'

Here, mongorestore only adds documents to the database from the dump located in the dump/ folder if the documents have a field named field that holds a value of 1. Enclose the filter in single quotes (e.g. '{"field": 1}') to prevent the filter from interacting with your shell environment.
mongorestore --dbpath /srv/mongodb --journal

Here, mongorestore restores the database dump located in the dump/ folder into the data files located at /srv/mongodb, with the --dbpath (page 509) option. Additionally, the --journal (page 509) option ensures that mongorestore records all operations in the durability journal. The journal prevents data file corruption if anything (e.g. power failure, disk failure, etc.) interrupts the restore operation.

See Also: mongodump Manual (page 505) and mongorestore Manual (page 508).
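The point-in-time options described in this section work as a pair: a dump taken with --oplog is restored with --oplogReplay. The sketch below pairs them in two small functions. The helper names are hypothetical, the dump directory is passed with mongodump's --out option, and the default dry-run mode only prints the commands.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is 1 (the default in this sketch);
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# dump_point_in_time: dump the instance together with the oplog entries
# written while the dump runs, into the given directory.
dump_point_in_time() {
    run mongodump --oplog --out "$1"
}

# restore_point_in_time: restore the dump, replay the captured oplog,
# and check the integrity of each object on insert.
restore_point_in_time() {
    run mongorestore --oplogReplay --objcheck "$1"
}
```

For example, dump_point_in_time dump-2011-10-25 followed later by restore_point_in_time dump-2011-10-25 restores the data as of the moment the dump completed. Add --drop to the restore only if you intend to replace the existing collections.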

11.5.5 Shard Clusters and Replica Sets Considerations


The underlying architecture of shard clusters and replica sets presents several challenges for creating backups of data stored in MongoDB. This section provides a high-level overview of these concerns, and strategies for creating quality backups in environments with these configurations.

Creating useful backups for shard clusters is more complicated, because it's crucial that the backup captures a consistent state across all shards.

Shard Clusters
Using Database Dumps From a Cluster

If you have a small collection of data, the easiest way to create a backup is to connect to the mongos and take a dump of the database from the running copy. This will create a consistent copy of the data in your database. This is a viable option if your data corpus is small enough that:

• it's possible to store the entire backup on one system, or a single storage device. Consider both backups of entire instances, and incremental dumps of data.
• the state of the database at the beginning of the operation is not significantly different than the state of the database at the end of the backup. If the backup operation cannot capture a backup, this is not a viable option.
• the backup can run and complete without impacting the performance of the shard cluster.

Using Conventional Backups from All Database Instances

If there is no way to conduct a backup reasonably with a dump, then you'll need to either snapshot the database using the snapshot backup procedure (page 157) or create a binary dump of each database instance using binary dump methods (page 160).

These backups must not only capture the database in a consistent state, as described in the aforementioned procedures, but the shard cluster needs to be consistent in itself. Also, disable the balancer process that equalizes the distribution of data among the shards before taking the backup. You should also lock all cluster members at once so that your backups reflect your entire database system at a single point in time.

Warning: It is essential that you stop the balancer before creating backups. If the balancer remains active, your resulting backups could have duplicate data or miss some data, as chunks migrate while recording backups. Similarly, if you do not lock all shards at the same time, the backup can reflect an inconsistent state that is impossible to restore from.

To stop the balancer, connect to the mongos with the mongo shell and issue the following two commands:
use config
db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true );

After disabling the balancer, proceed with your backup in the following sequence:

1. Lock all shards, using a process to lock all shard instances in as short of an interval as possible.
2. Use mongodump to back up the config database. Issue this command against the config database itself or the mongos. The command would resemble the following:
mongodump --db config

Note: In this situation mongodump will read from secondary nodes. See: mongodump feature (page 507) for more information.

3. Record a backup of all shards.
4. Unlock all shards.
5. Restore the balancer. Use the following command sequence when connected to the mongos with the mongo shell:
use config
db.settings.update( { _id: "balancer" }, { $set : { stopped: false } } , true );
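The ordering of the steps above matters more than any single command, so it can help to see the whole sequence in one place. The following sketch is only an outline under several assumptions: the helper names and host addresses are hypothetical, it uses the mongo shell's --eval option and the host:port/database address form, it locks the shards one after another rather than in parallel, and it performs no error recovery. In its default dry-run mode it prints the commands instead of running them.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is 1 (the default in this sketch);
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# set_balancer: stop (true) or restart (false) the balancer through a
# mongos, using the same update as the commands above.
set_balancer() {
    run mongo "$1/config" --eval \
        "db.settings.update({ _id: \"balancer\" }, { \$set: { stopped: $2 } }, true)"
}

# cluster_backup: order the steps of a cluster-wide backup.
# $1 is the mongos address; the remaining arguments are shard addresses.
cluster_backup() {
    mongos=$1
    shift
    set_balancer "$mongos" true
    for shard in "$@"; do
        run mongo "$shard/admin" --eval 'db.fsyncLock()'
    done
    run mongodump --host "$mongos" --db config
    for shard in "$@"; do
        echo "# snapshot or dump $shard here"
    done
    for shard in "$@"; do
        run mongo "$shard/admin" --eval 'db.fsyncUnlock()'
    done
    set_balancer "$mongos" false
}
```

Running cluster_backup mongos.example.net:27017 shard0.example.net:27018 shard1.example.net:27018 in dry-run mode prints the nine steps in order, which is a convenient way to review the plan before executing anything. A production script would also need to unlock the shards and restart the balancer if any step fails.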

If you have an automated backup schedule, you can disable all balancing operations for a period of time. For instance, consider the following command:
use config
db.settings.update( { _id : "balancer" }, { $set : { activeWindow : { start : "6:00", stop : "23:00"

This operation configures the balancer to run between 6:00 am and 11:00 pm, server time. Schedule your backup operation to run and complete outside of this time. Ensure that the backup can complete outside the window when the balancer is running, and that the balancer can effectively balance the collection among the shards in the window allotted to each.
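Because the balancer must not be active while a backup runs, a scheduler can check whether a given time falls inside the balancer's window before starting. The sketch below shows that comparison for times written as H:MM or HH:MM. The function names are hypothetical, and the caller is assumed to supply the current server time.

```shell
#!/bin/sh
# to_minutes: convert H:MM or HH:MM into minutes after midnight.
to_minutes() {
    h=${1%%:*}; m=${1##*:}
    # Drop one leading zero so that 08 and 09 are not read as octal.
    case $h in 0?) h=${h#0} ;; esac
    case $m in 0?) m=${m#0} ;; esac
    echo $(( $h * 60 + $m ))
}

# balancer_window_state: print "active" when the time in $3 falls inside
# the window that starts at $1 and stops at $2, and "idle" otherwise.
# Windows that wrap past midnight (start later than stop) are handled.
balancer_window_state() {
    s=$(to_minutes "$1"); e=$(to_minutes "$2"); n=$(to_minutes "$3")
    if [ "$s" -le "$e" ]; then
        if [ "$n" -ge "$s" ] && [ "$n" -lt "$e" ]; then
            echo active
        else
            echo idle
        fi
    else
        if [ "$n" -ge "$s" ] || [ "$n" -lt "$e" ]; then
            echo active
        else
            echo idle
        fi
    fi
}
```

With the window from the example, balancer_window_state 6:00 23:00 02:30 reports idle, so a backup could start at 2:30 am, while 12:30 reports active. The check says nothing about how long the backup takes, so leave enough margin before the window opens.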


Replica Sets

In most cases, backing up data stored in a replica set is similar to backing up data stored in a single instance. It's possible to lock a single slave or secondary database and then create a backup from that instance. When you unlock the database, the slave will catch up with the master or primary node. You may also choose to deploy a dedicated hidden member for backup purposes.

If you have a shard cluster where each shard is itself a replica set, you can use this method to create a backup of the entire cluster without disrupting the operation of the node. In these situations you should still turn off the balancer when you create backups.

For any cluster, using a non-master/primary node to create backups is particularly advantageous, in that the backup operation does not affect the performance of the master or primary node. Replication itself provides some measure of redundancy. Nevertheless, keeping point-in-time backups of your cluster to provide for disaster recovery and as an additional layer of protection is crucial.

See Also:

• Replica Set Administration (page 38)
• Replication Architectures (page 47)
• Shard Cluster Administration (page 99)
• Shard Cluster Architectures (page 114)
• indexes (page 179)
• Indexing Operations (page 186)


CHAPTER

TWELVE

TUTORIALS
The following tutorials describe basic administrative procedures for MongoDB deployments:

12.1 Recover MongoDB Data following Unexpected Shutdown


If MongoDB does not shut down cleanly [1], the on-disk representation of the data files will likely reflect an inconsistent state, which could lead to data corruption.

To prevent data inconsistency and corruption, always shut down the database cleanly, and use durability journaling (page 498). The journal writes data to disk every 100 milliseconds by default, and ensures that MongoDB will be able to recover to a consistent state even in the case of an unclean shutdown due to power loss or other system failure.

If you are not running as part of a replica set and do not have journaling enabled, use the following procedure to recover data that may be in an inconsistent state. If you are running as part of a replica set, you should always restore from a backup or restart the mongod instance with an empty dbpath (page 497) and allow MongoDB to resync the data.

See Also: The Administration (page 137) documents and the documentation of the repair (page 499), repairpath (page 499), and journal (page 498) settings.

12.1.1 Process
Indications

When you are aware of a mongod instance running without journaling that stops unexpectedly and you're not running with replication, you should always run the repair operation before starting MongoDB again. If you're using replication, then restore from a backup and allow replication to synchronize your data.

If the mongod.lock file in the data directory specified by dbpath (page 497), /data/db by default, is not a zero-byte file, then mongod will refuse to start, and you will find a message that contains the following line in your MongoDB log or output:
Unclean shutdown detected.

This indicates that you need to remove the lock file and run repair. If you run repair when the mongod.lock file exists without the mongod --repairpath (page 489) option, you will see a message that contains the following line:
old lock file: /data/db/mongod.lock. probably means unclean shutdown

[1] To ensure a clean shutdown, use the mongod --shutdown (page 489) option, your control script, Control-C (when running mongod in interactive mode), or kill $(pidof mongod) or kill -2 $(pidof mongod).

You must remove the lock file and run the repair operation before starting the database normally using the following procedure:

Overview

Warning: Recovering a member of a replica set. Do not use this procedure to recover a member of a replica set. Instead you should either restore from a backup (page 156) or re-sync from an intact member of the set.

There are two processes to repair data files that result from an unexpected shutdown:

1. Use the --repair (page 488) option in conjunction with the --repairpath (page 489) option. mongod will read the existing data files, and write the existing data to new data files. This does not modify or alter the existing data files. You do not need to remove the mongod.lock file before using this procedure.

2. Use the --repair (page 488) option. mongod will read the existing data files, write the existing data to new files and replace the existing, possibly corrupt, files with new files. You must remove the mongod.lock file before using this procedure.

Procedures

To repair your data files using the --repairpath (page 489) option to preserve the original data files unmodified:

1. Start mongod using --repair (page 488) to read the existing data files.
mongod --dbpath /data/db --repair --repairpath /data/db0

When this completes, the new repaired data files will be in the /data/db0 directory.

2. Start mongod using the following invocation to point the dbpath (page 497) at /data/db0:
mongod --dbpath /data/db0

Once you confirm that the data files are operational you may delete or archive the data files in the /data/db directory.

To repair your data files without preserving the original files, do not use the --repairpath (page 489) option, as in the following procedure:

1. Remove the stale lock file:
rm /data/db/mongod.lock

Replace /data/db with your dbpath (page 497) where your MongoDB instance's data files reside.

Warning: After you remove the mongod.lock file you must run the --repair (page 488) process before using your database.

2. Start mongod using --repair (page 488) to read the existing data files.
mongod --dbpath /data/db --repair

When this completes, the repaired data files will replace the original data files in the /data/db directory.

3. Start mongod using the following invocation to point the dbpath (page 497) at /data/db:
mongod --dbpath /data/db

12.1.2 mongod.lock
In normal operation, you should never remove the mongod.lock file and start mongod. Instead use one of the above methods to recover the database and remove the lock files. In dire situations you can remove the lock file, start the database using the possibly corrupt files, and attempt to recover data from the database; however, it's impossible to predict the state of the database in these situations.

If you are not running with journaling, and your database shuts down unexpectedly for any reason, you should always proceed as if your database is in an inconsistent and likely corrupt state. If at all possible restore from backup (page 156) or, if running as a replica set, re-sync from an intact member of the set.
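The decision logic of this tutorial can be summarized in a few lines. The following is a minimal sketch in plain JavaScript, not part of MongoDB: the function name and the fields of the state object are hypothetical, and the journaling branch assumes that a journaled mongod recovers on its own at restart, as the introduction to this tutorial suggests.

```javascript
// Illustrative decision helper. The field names (replicaSetMember, journaling,
// lockFileBytes) are hypothetical and are not MongoDB settings.
function recoveryAction(state) {
  if (state.replicaSetMember) {
    // Replica set members: do not repair; restore or re-sync instead.
    return "restore from backup or re-sync from an intact member";
  }
  if (state.journaling) {
    // Assumption: with journaling, mongod recovers to a consistent state on restart.
    return "restart mongod and let the journal recover";
  }
  if (state.lockFileBytes > 0) {
    // A mongod.lock file that is not zero bytes indicates an unclean shutdown.
    return "run repair before starting mongod normally";
  }
  return "start mongod normally";
}

console.log(recoveryAction({ replicaSetMember: false, journaling: false, lockFileBytes: 5 }));
```

The replica set check comes first because the warning above applies regardless of the journaling or lock file state.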

12.2 Convert a Replica Set to a Replicated Shard Cluster


12.2.1 Overview
Following this tutorial, you will convert a single 3-member replica set to a shard cluster that consists of 2 shards. Each shard will consist of an independent 3-member replica set.

The tutorial uses a test environment running on a local UNIX-like system. You should feel encouraged to follow along at home. If you need to perform this process in a production environment, notes throughout the document indicate procedural differences.

The procedure, from a high level, is as follows:

1. Create or select a 3-member replica set and insert some data into a collection.
2. Start the config databases and create a shard cluster with a single shard.
3. Create a second replica set with three new mongod instances.
4. Add the second replica set to the shard cluster.
5. Enable sharding on the desired collection or collections.

12.2.2 Process
Install MongoDB according to the instructions in the MongoDB Installation Tutorial (page 9).

Deploy a Replica Set with Test Data

If you have an existing MongoDB replica set deployment, you can omit this step and continue from Deploy Sharding Infrastructure (page 169).

Use the following sequence of steps to configure and deploy a replica set and to insert test data.

1. Create the following directories for the first replica set instance, named firstset:
• /data/example/firstset1
• /data/example/firstset2
• /data/example/firstset3


To create directories, issue the following command:


mkdir -p /data/example/firstset1 /data/example/firstset2 /data/example/firstset3

2. In a separate terminal window or GNU Screen window, start three mongod instances by running each of the following commands:
mongod --dbpath /data/example/firstset1 --port 10001 --replSet firstset --oplogSize 700 --rest
mongod --dbpath /data/example/firstset2 --port 10002 --replSet firstset --oplogSize 700 --rest
mongod --dbpath /data/example/firstset3 --port 10003 --replSet firstset --oplogSize 700 --rest

Note: The --oplogSize 700 (page 490) option restricts the size of the operation log (i.e. oplog) for each mongod instance to 700MB. Without the --oplogSize (page 490) option, each mongod reserves approximately 5% of the free disk space on the volume. By limiting the size of the oplog, each instance starts more quickly. Omit this setting in production environments.

3. In a mongo shell session in a new terminal, connect to the mongod instance on port 10001 by running the following command. If you are in a production environment, first read the note below.
mongo localhost:10001/admin

Note: Above and hereafter, if you are running in a production environment or are testing this process with mongod instances on multiple systems, replace localhost with a resolvable domain, hostname, or the IP address of your system.

4. In the mongo shell, initialize the first replica set by issuing the following command:
db.runCommand({"replSetInitiate" :
                    {"_id" : "firstset", "members" : [{"_id" : 1, "host" : "localhost:10001"},
                                                      {"_id" : 2, "host" : "localhost:10002"},
                                                      {"_id" : 3, "host" : "localhost:10003"}
             ]}})
{
    "info" : "Config now saved locally. Should come online in about a minute.",
    "ok" : 1
}

5. In the mongo shell, create and populate a new collection by issuing the following sequence of JavaScript operations:

use test
switched to db test
people = ["Marc", "Bill", "George", "Eliot", "Matt", "Trey", "Tracy", "Greg", "Steve", "Kristina
for(var i=0; i<1000000; i++){
    name = people[Math.floor(Math.random()*people.length)];
    user_id = i;
    boolean = [true, false][Math.floor(Math.random()*2)];
    added_at = new Date();
    number = Math.floor(Math.random()*10001);
    db.test_collection.save({"name":name, "user_id":user_id, "boolean":
}

The above operations add one million documents to the collection test_collection. This can take several minutes, depending on your system. The script adds the documents in the following form:


{ "_id" : ObjectId("4ed5420b8fc1dd1df5886f70"), "name" : "Greg", "user_id" : 4, "boolean" : true, "ad
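Because the listing above is cut off at the page margin, the following minimal sketch shows the shape of one generated document in plain JavaScript, without a database connection. The makeTestDocument helper and its injectable random parameter are hypothetical names introduced here, and the people array contains only the names that remain visible in the truncated listing.

```javascript
// Names visible in the truncated tutorial listing; the original list may be longer.
const people = ["Marc", "Bill", "George", "Eliot", "Matt", "Trey", "Tracy", "Greg", "Steve", "Kristina"];

// Build one test document for position i.
// `random` is injectable (hypothetical parameter) so results can be reproduced.
function makeTestDocument(i, random = Math.random) {
  return {
    name: people[Math.floor(random() * people.length)],
    user_id: i,
    boolean: [true, false][Math.floor(random() * 2)],
    added_at: new Date(),
    number: Math.floor(random() * 10001), // integer in the range 0..10000
  };
}

console.log([0, 1, 2].map((i) => makeTestDocument(i)));
```

In the mongo shell, each such document would be passed to db.test_collection.save() inside the loop.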

Deploy Sharding Infrastructure

This procedure creates the three config databases that store the cluster's metadata.

Note: For development and testing environments, a single config database is sufficient. In production environments, use three config databases. Because config instances store only the metadata for the shard cluster, they have minimal resource requirements.

1. Create the following data directories for three config database instances:
• /data/example/config1
• /data/example/config2
• /data/example/config3

Issue the following command at the system prompt:
mkdir -p /data/example/config1 /data/example/config2 /data/example/config3

2. In a separate terminal window or GNU Screen window, start the cong databases by running the following commands:
mongod --configsvr --dbpath /data/example/config1 --port 20001
mongod --configsvr --dbpath /data/example/config2 --port 20002
mongod --configsvr --dbpath /data/example/config3 --port 20003

3. In a separate terminal window or GNU Screen window, start a mongos instance by running the following command:
mongos --configdb localhost:20001,localhost:20002,localhost:20003 --port 27017 --chunkSize 1

Note: If you are using the collection created earlier or are just experimenting with sharding, you can use a small --chunkSize (page 493) (1MB works well). The default chunkSize (page 503) of 64MB means that your cluster must have 64MB of data before MongoDB's automatic sharding begins working. In production environments, do not use a small chunk size.

The configdb (page 502) options specify the configuration databases (e.g. localhost:20001, localhost:20002, and localhost:20003). The mongos instance runs on the default MongoDB port (i.e. 27017), while the configuration databases themselves are running on ports in the 20001 series. In this example, you may omit the --port 27017 (page 492) option, as 27017 is the default port.

4. Add the first shard in mongos. In a new terminal window or GNU Screen session, add the first shard, according to the following procedure:

(a) Connect to the mongos with the following command:
mongo localhost:27017/admin

(b) Add the first shard to the cluster by issuing the addShard command:
db.runCommand( { addShard : "firstset/localhost:10001,localhost:10002,localhost:10003" } )

(c) Observe the following message, which denotes success:


{ "shardAdded" : "firstset", "ok" : 1 }
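The addShard argument used above follows the pattern replica set name, a slash, then a comma-separated list of host:port pairs. As a minimal sketch of that format, the following plain JavaScript splits such a string into its parts. The parseShardSeed function is a hypothetical helper, not a MongoDB API, and it assumes hostnames or IPv4 addresses only.

```javascript
// Split "setName/host:port,host:port,..." into its parts.
// Hypothetical helper for illustration; assumes no IPv6 literals.
function parseShardSeed(seed) {
  const slash = seed.indexOf("/");
  if (slash === -1) {
    throw new Error("expected <replicaSetName>/<host:port,...>");
  }
  const setName = seed.slice(0, slash);
  const members = seed
    .slice(slash + 1)
    .split(",")
    .map((entry) => {
      const [host, port] = entry.split(":");
      return { host, port: Number(port) };
    });
  return { setName, members };
}

console.log(parseShardSeed("firstset/localhost:10001,localhost:10002,localhost:10003"));
```

A check like this can catch a missing set name or a mistyped port before the string reaches the cluster.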

Deploy a Second Replica Set

This procedure deploys a second replica set. This closely mirrors the process used to establish the first replica set above, omitting the test data.

1. Create the following data directories for the members of the second replica set, named secondset:
• /data/example/secondset1
• /data/example/secondset2
• /data/example/secondset3

2. In three new terminal windows, start three instances of mongod with the following commands:

mongod --dbpath /data/example/secondset1 --port 10004 --replSet secondset --oplogSize 700 --rest
mongod --dbpath /data/example/secondset2 --port 10005 --replSet secondset --oplogSize 700 --rest
mongod --dbpath /data/example/secondset3 --port 10006 --replSet secondset --oplogSize 700 --rest

Note: As above, the second replica set uses the smaller oplogSize (page 501) configuration. Omit this setting in production environments.

3. In the mongo shell, connect to one mongod instance by issuing the following command:
mongo localhost:10004/admin

4. In the mongo shell, initialize the second replica set by issuing the following command:
db.runCommand({"replSetInitiate" :
                    {"_id" : "secondset",
                     "members" : [{"_id" : 1, "host" : "localhost:10004"},
                                  {"_id" : 2, "host" : "localhost:10005"},
                                  {"_id" : 3, "host" : "localhost:10006"}
             ]}})
{
    "info" : "Config now saved locally. Should come online in about a minute.",
    "ok" : 1
}

5. Add the second replica set to the shard cluster. Connect to the mongos instance created in the previous procedure and issue the following sequence of commands:
use admin
db.runCommand( { addShard : "secondset/localhost:10004,localhost:10005,localhost:10006" } )

This command returns the following success message:


{ "shardAdded" : "secondset", "ok" : 1 }

6. Verify that both shards are properly configured by running the listShards command. View this and example output below:
db.runCommand({listshards:1})
{
    "shards" : [
        {
            "_id" : "firstset",
            "host" : "firstset/localhost:10001,localhost:10003,localhost:10002"
        },
        {
            "_id" : "secondset",
            "host" : "secondset/localhost:10004,localhost:10006,localhost:10005"
        }
    ],
    "ok" : 1
}

Enable Sharding

MongoDB must have sharding enabled on both the database and collection levels.
Enabling Sharding on the Database Level

Issue the enableSharding command. The following example enables sharding on the test database:
db.runCommand( { enablesharding : "test" } )
{ "ok" : 1 }

Create an Index on the Shard Key

MongoDB uses the shard key to distribute documents between shards. Once selected, you cannot change the shard key. Good shard keys:
• have values that are evenly distributed among all documents,
• group documents that are often accessed at the same time into contiguous chunks, and
• allow for effective distribution of activity among shards.

Typically shard keys are compound, comprising some sort of hash and some sort of other primary key. Selecting a shard key depends on your data set, application architecture, and usage pattern, and is beyond the scope of this document. For the purposes of this example, we will shard on the number key. This typically would not be a good shard key for production deployments.

Create the index with the following procedure:
use test
db.test_collection.ensureIndex({number:1})

See Also: The Shard Key Overview (page 96) and Shard Key (page 116) sections.
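To make "evenly distributed" concrete, the following minimal sketch counts how many key values fall into each of several equal-width, contiguous ranges. This is a simplified model written for illustration in plain JavaScript: the countPerRange function is hypothetical, and fixed equal-width ranges are an assumption of the sketch, not a description of how MongoDB computes or splits chunks.

```javascript
// Split the key range [min, max] into `ranges` contiguous, equal-width ranges
// and count how many values land in each one. Simplified model only.
function countPerRange(values, min, max, ranges) {
  const counts = new Array(ranges).fill(0);
  const width = (max - min + 1) / ranges;
  for (const v of values) {
    const idx = Math.min(ranges - 1, Math.floor((v - min) / width));
    counts[idx] += 1;
  }
  return counts;
}

// Keys spread across 0..10000, like the "number" field in this tutorial.
const spreadKeys = Array.from({ length: 10001 }, (_, i) => i);
console.log(countPerRange(spreadKeys, 0, 10000, 4));

// Keys that all share one value end up in a single range.
console.log(countPerRange([5, 5, 5, 5], 0, 10000, 4));
```

Comparing the two outputs shows why a key with few distinct values concentrates documents in one range while a well-spread key does not.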
Shard the Collection

Issue the following command:


use admin
db.runCommand( { shardcollection : "test.test_collection", key : {"number":1} })
{ "collectionsharded" : "test.test_collection", "ok" : 1 }


The collection test_collection is now sharded! Over the next few minutes the Balancer begins to redistribute chunks of documents. You can confirm this activity by switching to the test database and running db.stats() (page 472) or db.printShardingStatus() (page 470). As clients insert additional documents into this collection, mongos distributes the documents evenly between the shards.

In the mongo shell, issue the following commands to return statistics for the cluster:
use test
db.stats()
db.printShardingStatus()

Example output of the db.stats() (page 472) command:


{
    "raw" : {
        "firstset/localhost:10001,localhost:10003,localhost:10002" : {
            "db" : "test",
            "collections" : 3,
            "objects" : 973887,
            "avgObjSize" : 100.33173458522396,
            "dataSize" : 97711772,
            "storageSize" : 141258752,
            "numExtents" : 15,
            "indexes" : 2,
            "indexSize" : 56978544,
            "fileSize" : 1006632960,
            "nsSizeMB" : 16,
            "ok" : 1
        },
        "secondset/localhost:10004,localhost:10006,localhost:10005" : {
            "db" : "test",
            "collections" : 3,
            "objects" : 26125,
            "avgObjSize" : 100.33286124401914,
            "dataSize" : 2621196,
            "storageSize" : 11194368,
            "numExtents" : 8,
            "indexes" : 2,
            "indexSize" : 2093056,
            "fileSize" : 201326592,
            "nsSizeMB" : 16,
            "ok" : 1
        }
    },
    "objects" : 1000012,
    "avgObjSize" : 100.33176401883178,
    "dataSize" : 100332968,
    "storageSize" : 152453120,
    "numExtents" : 23,
    "indexes" : 4,
    "indexSize" : 59071600,
    "fileSize" : 1207959552,
    "ok" : 1
}
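To read balancing progress out of this output, it can help to compare the per-shard "objects" counts in the "raw" section. The following minimal sketch does that in plain JavaScript, using the two counts from the example output; the objectShares function is a hypothetical helper, not a shell method.

```javascript
// Per-shard object counts copied from the "raw" section of the example output.
const rawStats = {
  "firstset/localhost:10001,localhost:10003,localhost:10002": { objects: 973887 },
  "secondset/localhost:10004,localhost:10006,localhost:10005": { objects: 26125 },
};

// Return the total object count and each shard's fraction of that total.
function objectShares(raw) {
  const total = Object.values(raw).reduce((sum, shard) => sum + shard.objects, 0);
  const shares = {};
  for (const [key, shard] of Object.entries(raw)) {
    const shardName = key.split("/")[0]; // text before the slash is the replica set name
    shares[shardName] = shard.objects / total;
  }
  return { total, shares };
}

const statsSummary = objectShares(rawStats);
console.log(statsSummary.total); // matches the top-level "objects" value, 1000012
console.log(statsSummary.shares);
```

Running the same calculation on later db.stats() output should show the secondset share growing as chunks migrate.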

Example output of the db.printShardingStatus() (page 470) command:


--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
        { "_id" : "firstset", "host" : "firstset/localhost:10001,localhost:10003,localhost:10002" }
        { "_id" : "secondset", "host" : "secondset/localhost:10004,localhost:10006,localhost:10005"
  databases:
        { "_id" : "admin", "partitioned" : false, "primary" : "config" }
        { "_id" : "test", "partitioned" : true, "primary" : "firstset" }
                test.test_collection chunks:
                        secondset 5
                        firstset 186
[...]

In a few moments you can run these commands for a second time to demonstrate that chunks are migrating from firstset to secondset.

When this procedure is complete, you will have converted a replica set into a shard cluster where each shard is itself a replica set.

Consider the following tutorials located in other sections of this Manual.
• Install MongoDB on Linux (page 17)
• Install MongoDB on RedHat Enterprise, CentOS, or Fedora Linux (page 9)
• Install MongoDB on OS X (page 19)
• tutorial/install-mongodb-on-debian-or-ubuntu-linux
• Install MongoDB on Windows (page 22)
• Change the Size of the Oplog (page 71)
• Deploy a Replica Set (page 61)
• Deploy a Geographically Distributed Replica Set (page 66)
• Add Members to a Replica Set (page 64)
• Change Hostnames in a Replica Set (page 79)
• Convert a Secondary to an Arbiter (page 83)
• Recover MongoDB Data following Unexpected Shutdown (page 165)
• Deploy a Shard Cluster (page 123)
• Convert a Replica Set to a Replicated Shard Cluster (page 167)
• Add Shards to an Existing Cluster (page 126)
• Remove Shards from an Existing Shard Cluster (page 127)
• Expire Data from Collections by Setting TTL (page 234)


Part VI

Indexes


Indexes provide high performance read operations for frequently used queries. Indexes are particularly useful where the total size of the documents exceeds the amount of available RAM. For basic concepts and options, see Indexing Overview (page 179). For procedures and operational concerns, see Indexing Operations (page 186). For information on how applications might use indexes, see Indexing Strategies (page 190).


CHAPTER THIRTEEN

DOCUMENTATION
The following is the outline of the main documentation:

13.1 Indexing Overview


13.1.1 Synopsis
An index is a data structure that allows you to quickly locate documents based on the values stored in certain specified fields. Fundamentally, indexes in MongoDB are similar to indexes in other database systems. MongoDB supports indexes on any field or sub-field contained in documents within a MongoDB collection. Consider the following core features of indexes:

• MongoDB defines indexes on a per-collection level.
• Indexes often dramatically increase the performance of queries; however, each index creates a slight overhead for every write operation.
• Every query, including update operations, uses one and only one index. The query optimizer selects the index empirically, by occasionally running alternate query plans, and selecting the plan with the best response time for each query type. You can override the query optimizer using the cursor.hint() (page 450) method.
• You can create indexes on a single field or on multiple fields using a compound index (page 181).
• When the index covers queries, the database returns results more quickly than queries that have to scan many individual documents. An index covers a query if the keys of the index store all the data that the query must return. See Use Covered Queries (page 190) for more information.
• Using queries with good index coverage will reduce the number of full documents that MongoDB needs to store in memory, thus maximizing database performance and throughput.

Continue reading for a complete overview of indexes in MongoDB, including the types of indexes (page 179), basic operations with indexes (page 183), and other MongoDB features (page 184) implemented using indexes.
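The covered query condition described above can be expressed as a simple containment check. The following is a minimal sketch in plain JavaScript, assuming that both the fields a query filters on and the fields it returns must be present in the index; the isCovered function and the example field names are hypothetical and do not reflect the query optimizer's actual implementation.

```javascript
// Returns true when every field the query filters on or returns is an index key.
// Simplified model for illustration only.
function isCovered(indexFields, queryFields, returnedFields) {
  const indexed = new Set(indexFields);
  return [...queryFields, ...returnedFields].every((field) => indexed.has(field));
}

// Index on { item: 1, stock: 1 } (hypothetical example fields).
console.log(isCovered(["item", "stock"], ["item"], ["item", "stock"])); // true
console.log(isCovered(["item", "stock"], ["item"], ["item", "category"])); // false
```

In the second call, the category value is not stored in the index, so the full document would have to be read.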

13.1.2 Index Types


All indexes in MongoDB are B-Tree indexes. In the mongo shell, the helper ensureIndex() (page 455) provides a method for creating indexes. This section provides an overview of the types of indexes available in MongoDB as well as an introduction to their use.


_id

The _id index is a unique index (page 182) [1] on the _id field, and MongoDB creates this index by default on all collections. [2] You cannot delete the index on _id. The _id field is the primary key for the collection, and every document must have a unique _id field. You may store any unique value in the _id field. The default value of _id is an ObjectId on every insert operation. An ObjectId is a 12-byte unique identifier suitable for use as the value of an _id field.

Note: In shard clusters, if you do not use the _id field as the shard key, then your application must ensure the uniqueness of the values in the _id field to prevent errors. This is most often done by using a standard auto-generated ObjectId.

Secondary Indexes

All indexes in MongoDB are secondary indexes. You can create indexes on any field within any document or sub-document. Additionally, you can create compound indexes with multiple fields, so that a single query can match multiple components using the index without needing to scan (as many) actual documents.

In general, you should have secondary indexes that support all of your primary, common, and user-facing queries and require MongoDB to scan the fewest number of documents possible.

To create a secondary index, use the ensureIndex() method. The argument to ensureIndex() (page 455) will resemble the following in the MongoDB shell:
{ "field": 1 }
{ "field0.field1": 1 }
{ "field0": 1, "field1": 1 }

For each field in the index you will specify either 1 for an ascending order or -1 for a descending order, which represents the order of the keys in the index. For indexes with more than one key (i.e. compound indexes), the sequence of fields is important.
Embedded Fields

You can create indexes on fields that exist in sub-documents within your collection. Consider the collection people that holds documents that resemble the following example document:
{
  "_id": ObjectId(...),
  "name": "John Doe",
  "address": {
    "street": "Main",
    "zipcode": 53511,
    "state": "WI"
  }
}

You can create an index on the address.zipcode field, using the following specification:
db.people.ensureIndex( { "address.zipcode": 1 } )

Introspecting sub-documents in this way is commonly called dot notation.
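As a minimal sketch of what dot notation means, the following plain JavaScript resolves a dotted path against a nested document. The getPath function is a hypothetical helper written for illustration, not a shell method, and it ignores arrays for simplicity.

```javascript
// Follow a dotted path such as "address.zipcode" through nested objects.
// Returns undefined when any step of the path is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value !== null && typeof value === "object" ? value[key] : undefined),
    doc
  );
}

const person = {
  name: "John Doe",
  address: { street: "Main", zipcode: 53511, state: "WI" },
};

console.log(getPath(person, "address.zipcode")); // 53511
console.log(getPath(person, "address.country")); // undefined
```

An index on address.zipcode stores the value found at that path for each document.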


[1] Although the index on _id is unique, the getIndexes() (page 458) method will not print unique: true in the mongo shell.
[2] Before version 2.2 capped collections did not have an _id field. In 2.2, all capped collections have an _id field, except those in the local database. See the release notes (page 589) for more information.


Compound Indexes

MongoDB supports compound indexes, where a single index structure holds references to multiple fields within a collection's documents. Consider the collection products that holds documents that resemble the following example document:
{
  "_id": ObjectId(...),
  "item": "Banana",
  "category": ["food", "produce", "grocery"],
  "stock": 4,
  "type": "cases",
  "arrival": Date(...)
}

Most queries probably select on the item field, but a significant number of queries will also check the stock field. You can specify a single compound index to support both of these queries:
db.products.ensureIndex( { "item": 1, "stock": 1 } )

MongoDB will be able to use this index to support queries that select the item field as well as those queries that select the item field and the stock field. However, this index will not be useful for queries that select only the stock field.

Note: The order of fields in a compound index is very important. In the previous example, the index will contain references to documents sorted by the values of the item field, and within each item, sorted by values of the stock field.
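The behavior described above amounts to a prefix rule: the index helps only as far as the query selects on its leading fields. The following minimal sketch models that rule in plain JavaScript. The usablePrefix function is hypothetical and is a simplification of the behavior described in this section, not the query optimizer's logic.

```javascript
// Return the leading index fields that the query selects on,
// stopping at the first index field the query does not use.
function usablePrefix(indexFields, queryFields) {
  const prefix = [];
  for (const field of indexFields) {
    if (!queryFields.includes(field)) break;
    prefix.push(field);
  }
  return prefix;
}

const productIndex = ["item", "stock"];
console.log(usablePrefix(productIndex, ["item"]));          // [ 'item' ]
console.log(usablePrefix(productIndex, ["item", "stock"])); // [ 'item', 'stock' ]
console.log(usablePrefix(productIndex, ["stock"]));         // []
```

An empty result corresponds to the case above where the index is not useful, because the query skips the leading item field.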

Ascending and Descending

Indexes store references to fields in either ascending or descending order. For single-field indexes, the order of keys doesn't matter, because MongoDB can traverse the index in either direction. However, for compound indexes, if you need to order results against two fields, sometimes you need the index fields running in opposite order relative to each other.

To specify an index with a descending order, use the following form:
db.products.ensureIndex( { "field": -1 } )

More typically in the context of a compound index (page 181), the specication would resemble the following prototype:
db.products.ensureIndex( { "field0": 1, "field1": -1 } )

Consider a collection of event data that includes both usernames and a timestamp. Suppose you want to return a list of events sorted by username and then with the most recent events first. To create an index that supports this ordering, use the following command:
db.events.ensureIndex( { "username" : 1, "timestamp" : -1 } )
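The ordering this index specification describes can be sketched with an ordinary comparator. The following plain JavaScript sorts a small in-memory array by username ascending and timestamp descending; the compareBySpec function and the sample events are hypothetical and only illustrate the ordering, not how MongoDB stores index keys.

```javascript
// Build a comparator from a spec such as { username: 1, timestamp: -1 },
// where 1 means ascending and -1 means descending.
function compareBySpec(spec) {
  const fields = Object.entries(spec); // [[field, direction], ...] in declaration order
  return (a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0;
  };
}

const events = [
  { username: "bob", timestamp: 10 },
  { username: "alice", timestamp: 5 },
  { username: "alice", timestamp: 20 },
];

events.sort(compareBySpec({ username: 1, timestamp: -1 }));
console.log(events.map((e) => `${e.username}:${e.timestamp}`).join(" "));
// prints "alice:20 alice:5 bob:10"
```

Within each username, the larger (more recent) timestamp sorts first, which is the order the compound index keeps its keys in.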

Multikey

If you index a field that contains an array, you will create a multikey index, which adds entries to the index for every item in the array. Consider a feedback collection with documents in the following form:


{
  "_id": ObjectId(...),
  "title": "Grocery Quality",
  "comments": [
    { author_id: ObjectId(..),
      date: Date(...),
      text: "Please expand the cheddar selection." },
    { author_id: ObjectId(..),
      date: Date(...),
      text: "Please expand the mustard selection." },
    { author_id: ObjectId(..),
      date: Date(...),
      text: "Please expand the olive selection." }
  ]
}

An index on the comments.text field would be a multikey index, and will add items to the index for all of the sub-documents in the array. As a result you will be able to run the following query, using only the index to locate the document:
db.feedback.find( { "comments.text": "Please expand the olive selection." } )
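To illustrate what "adds entries to the index for every item in the array" means, the following minimal sketch produces one index entry per array element for a single document. It is a plain JavaScript model: the multikeyEntries function, the entry shape, and the numeric _id placeholder are assumptions made for illustration.

```javascript
// Produce one { key, docId } entry per element of the array field.
// Hypothetical model of a multikey index on "<arrayField>.<subField>".
function multikeyEntries(doc, arrayField, subField) {
  const items = Array.isArray(doc[arrayField]) ? doc[arrayField] : [];
  return items.map((item) => ({ key: item[subField], docId: doc._id }));
}

const feedbackDoc = {
  _id: 1, // placeholder id for illustration
  title: "Grocery Quality",
  comments: [
    { text: "Please expand the cheddar selection." },
    { text: "Please expand the mustard selection." },
    { text: "Please expand the olive selection." },
  ],
};

const commentEntries = multikeyEntries(feedbackDoc, "comments", "text");
console.log(commentEntries.length); // 3
console.log(commentEntries.some((e) => e.key === "Please expand the olive selection.")); // true
```

Each of the three entries points back to the same document, which is why a match on any one comment text can locate it.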

Note: To build or rebuild indexes for a replica set see Building Indexes on Replica Sets (page 189).

Warning: MongoDB will refuse to insert documents into a collection with a compound index where more than one indexed field is an array (e.g. {a: [1, 2], b: [1, 2]}); however, MongoDB permits documents in collections with compound indexes where only one field per compound index is an array (e.g. {a: [1, 2], b: 1} and {a: 1, b: [1, 2]}).

Unique Index

A unique index causes MongoDB to reject all documents that contain a duplicate value for the indexed field. To create a unique index on the user_id field of the members collection, use the following operation in the mongo shell:
db.members.ensureIndex( { "user_id": 1 }, { unique: true } )

If you use the unique constraint on a compound index (page 181) then MongoDB will enforce uniqueness on the combination of values, rather than the individual value for any or all values of the key.

If a document does not have a value for the indexed field in a unique index, the index will store a null value for this document. Because of this unique constraint, MongoDB will only permit one document that lacks the indexed field in the collection. You can combine the unique constraint with the sparse index (page 182) option to filter these null values from the unique index.

Sparse Index

Sparse indexes only contain entries for documents that have the indexed field. By contrast, non-sparse indexes contain all documents in a collection, and store null values for documents that do not contain the indexed field. Create a sparse index on the xmpp_id field of the members collection, using the following operation in the mongo shell:
db.members.ensureIndex( { "xmpp_id": 1 }, { sparse: true } )

Warning: Using these indexes will sometimes result in incomplete results when filtering or sorting results, because sparse indexes are not complete for all documents in a collection.


Note: Sparse indexes in MongoDB are not to be confused with block-level indexes in other databases. Think of them as dense indexes with a specific filter.

You can combine the sparse index option with the unique indexes (page 182) option so that mongod will reject documents that have duplicate values for a field, but will ignore documents that do not have the key.
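The interaction between the unique and sparse options can be modeled with a small in-memory index. The following is a minimal sketch in plain JavaScript under the assumptions stated in this section (a missing field is stored as null unless the index is sparse); the buildIndex function, its option names, and the sample documents are hypothetical and are not MongoDB internals.

```javascript
// Build a Map from key value to the _ids of documents holding that value.
function buildIndex(docs, field, options = {}) {
  const index = new Map();
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && options.sparse) continue; // sparse: skip documents without the field
    const key = hasField ? doc[field] : null;  // non-sparse: a missing field is stored as null
    if (options.unique && index.has(key)) {
      throw new Error(`duplicate key: ${String(key)}`);
    }
    index.set(key, [...(index.get(key) || []), doc._id]);
  }
  return index;
}

const memberDocs = [
  { _id: 1, xmpp_id: "a@example.net" },
  { _id: 2 }, // no xmpp_id
  { _id: 3 }, // no xmpp_id
];

// Unique and sparse: the two documents without xmpp_id are simply left out.
console.log(buildIndex(memberDocs, "xmpp_id", { unique: true, sparse: true }).size); // 1

// Unique only: the second missing value collides with the first null entry.
try {
  buildIndex(memberDocs, "xmpp_id", { unique: true });
} catch (err) {
  console.log(err.message); // duplicate key: null
}
```

This mirrors the rule above: without sparse, only one document may lack the indexed field in a unique index.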

13.1.3 Index Creation Options


Most parameters [3] to the ensureIndex() (page 455) operation affect the kind of index that MongoDB creates. Two options, background construction (page 183) and duplicate dropping (page 184), affect how MongoDB builds the indexes.

Background Construction

By default, creating an index is a blocking operation. Building an index on a large collection of data can take a long time to complete. To resolve this issue, the background option allows you to continue to use your mongod instance during the index build. Create an index in the background on the zipcode field of the people collection using a command that resembles the following:
db.people.ensureIndex( { zipcode: 1}, {background: true} )

You can combine the background option with other options, as in the following:
db.people.ensureIndex( { zipcode: 1}, {background: true, sparse: true } )

Be aware of the following behaviors with background index construction:

• A single mongod can only build a single index at a time.
• The indexing operation runs in the background so that other database operations can run while creating the index. However, the mongo shell session or connection where you are creating the index will block until the index build is complete. Open another connection or mongo instance to continue using commands to the database.
• The background index operation uses an incremental approach that is slower than the normal foreground index builds. If the index is larger than the available RAM, then the incremental process can take much longer than the foreground build.

Building Indexes on Secondaries

Background index operations on a replica set primary become foreground indexing operations on secondary members of the set. All indexing operations on secondaries block replication.

To build large indexes on secondaries the best approach is to restart one secondary at a time in standalone mode and build the index. After building the index, restart as a member of the replica set, allow it to catch up with the other members of the set, and then build the index on the next secondary. When all the secondaries have the new index, step down the primary, restart it as a standalone, and build the index on the former primary.

Remember, the amount of time required to build the index on a secondary node must be within the window of the oplog, so that the secondary can catch up with the primary. See Building Indexes on Replica Sets (page 189) for more information on this process.
[3] Other functionality accessible by way of parameters includes sparse (page 182), unique (page 182), and TTL (page 184).

Indexes on secondary members in recovering mode are always built in the foreground to allow them to catch up as soon as possible.

Note: Administrative operations such as repairDatabase and compact will not run concurrently with a background index build. Queries will not use these indexes until the index build is complete.

Duplicate Dropping

MongoDB cannot create a unique index (page 182) on a field that has duplicate values. To force the creation of a unique index, you can specify the dropDups option, which will only index the first occurrence of a value for the key, and delete all subsequent documents with that value.

Warning: As in all unique indexes, if a document does not have the indexed field, MongoDB will include it in the index with a null value. If subsequent documents do not have the indexed field, and you have set {dropDups: true}, MongoDB will remove these documents from the collection when creating the index. If you combine dropDups with the sparse (page 182) option, this index will only include documents that have the value, and the documents without the field will remain in the database.

To create a unique index that drops duplicates on the username field of the accounts collection, use a command in the following form:
db.accounts.ensureIndex( { username: 1 }, { unique: true, dropDups: true } )

Warning: Specifying { dropDups: true } will delete data from your database. Use with extreme caution.
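As a rough illustration of the behavior described above, the following plain JavaScript sketch mimics which documents a unique index build with dropDups would keep and which it would delete. It runs outside MongoDB; simulateDropDups and the sample documents are hypothetical names invented for this illustration, not part of any MongoDB API.

```javascript
// Plain JavaScript sketch (not MongoDB code): mimic which documents a
// unique index build with dropDups keeps and which it deletes.
// simulateDropDups and the sample data are hypothetical.
function simulateDropDups(docs, field) {
  const seen = new Set();
  const kept = [];
  const dropped = [];
  for (const doc of docs) {
    // A missing field is indexed as null, so it can collide as well.
    const key = doc[field] === undefined ? null : doc[field];
    if (seen.has(key)) {
      dropped.push(doc);   // later occurrence: removed from the collection
    } else {
      seen.add(key);
      kept.push(doc);      // first occurrence: indexed and kept
    }
  }
  return { kept, dropped };
}

const accounts = [
  { _id: 1, username: "alice" },
  { _id: 2, username: "bob" },
  { _id: 3, username: "alice" },
  { _id: 4 },
  { _id: 5 }
];
const dropDupsResult = simulateDropDups(accounts, "username");
console.log(dropDupsResult.kept.map(d => d._id));    // [ 1, 2, 4 ]
console.log(dropDupsResult.dropped.map(d => d._id)); // [ 3, 5 ]
```

Note that document 5 is dropped only because it lacks the username field, which is the case the warning above describes.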

13.1.4 Index Features


TTL Indexes

TTL indexes are special indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time. This is ideal for some types of information, like machine-generated event data, logs, and session information, that only need to persist in a database for a limited amount of time.

These indexes have the following limitations:

- Compound indexes are not supported.
- The indexed field must be a date type.
- If the field holds an array, and there are multiple date-typed values in the index, the document will expire when the lowest (i.e. earliest) value matches the expiration threshold.

Note: TTL indexes expire data by removing documents in a background task that runs once a minute. As a result, the TTL index provides no guarantees that expired documents will not exist in the collection. Consider that:

- Documents may remain in a collection after they expire and before the background process runs.
- The duration of the removal operations depends on the workload of your mongod instance.
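The expiry rule above can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not MongoDB's implementation: isExpired, ttlPass, and the createdAt field are hypothetical names, and the threshold is passed as a number of seconds (named expireAfterSeconds here).

```javascript
// Plain JavaScript sketch of the expiry rule described above.
// Not MongoDB's implementation; all names here are hypothetical.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  const candidates = Array.isArray(value) ? value : [value];
  // Only date-typed values count; a document without one never expires.
  const dates = candidates.filter(v => v instanceof Date);
  if (dates.length === 0) return false;
  // With several dates in an array, the earliest one decides.
  const earliest = Math.min(...dates.map(d => d.getTime()));
  return earliest + expireAfterSeconds * 1000 <= now.getTime();
}

// One pass of a background removal task: returns the surviving documents.
// Between passes, expired documents are still present in the collection.
function ttlPass(docs, field, expireAfterSeconds, now) {
  return docs.filter(doc => !isExpired(doc, field, expireAfterSeconds, now));
}
```

Because removal only happens when a pass runs, a query issued between two passes can still see documents for which isExpired already returns true.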


In all other respects, TTL indexes are normal secondary indexes (page 180), and if appropriate, MongoDB can use these indexes to fulfill arbitrary queries.

See Also: Expire Data from Collections by Setting TTL (page 234)

Geospatial Indexes

MongoDB provides geospatial indexes to support location-based and other similar queries in a two-dimensional coordinate system. For example, use geospatial indexes when you need to take a collection of documents that have coordinates, and return a number of options that are near a given coordinate pair.

To create a geospatial index, your documents must have a coordinate pair. For maximum compatibility, these coordinate pairs should be in the form of a two-element array, such as [ x , y ]. Given the field loc, which holds a coordinate pair, in the collection places, you would create a geospatial index as follows:
db.places.ensureIndex( { loc : "2d" } )

MongoDB will reject documents that have values in the loc field beyond the minimum and maximum values.

Note: MongoDB permits only one geospatial index per collection. Although MongoDB will allow clients to create multiple geospatial indexes, a single query can use only one index.

See the $near operator and the database command geoNear for more information on accessing geospatial data.

Geohaystack Indexes

In addition to conventional geospatial indexes (page 185), MongoDB also provides a bucket-based geospatial index, called a geospatial haystack index. These indexes support high-performance queries for locations within a small area, when the query must filter along another dimension.

Example

If you need to return all documents that have coordinates within 25 miles of a given point and have a type field value of museum, a haystack index would provide the best support for these queries.

Haystack indexes allow you to tune your bucket size to the distribution of your data, so that in general you search only very small regions of 2d space for a particular kind of document. These indexes are not suited for finding the closest documents to a particular location, when the closest documents are far away compared to bucket size.

13.1.5 Index Limitations


Be aware of the following current limitations of MongoDB's indexes:

- A collection may have no more than 64 indexes (page 573).
- Index keys can be no larger than 1024 bytes (page 573). This includes the field value or values, the field name or names, and the namespace.
- The name of an index, including the namespace, must be shorter than 128 characters (page 573).
- Indexes have storage requirements, and impact insert/update speed to some degree.


Create indexes to support queries and other operations, but do not maintain indexes that your MongoDB instance cannot or will not use.

13.2 Indexing Operations


13.2.1 Synopsis
Indexes allow MongoDB to process and fulfill queries quickly, by creating a small and efficient representation of the documents in the collection. Fundamentally, indexes in MongoDB are operationally similar to indexes in other database systems.

Read the Indexing Overview (page 179) documentation for more information on the fundamentals of indexing in MongoDB, and the Indexing Strategies (page 190) documentation for practical strategies and examples for using indexes in your application.

This document provides operational guidelines and procedures related to indexing data in MongoDB collections.

13.2.2 Operations
Creation

Use the db.collection.ensureIndex() (page 455) method, or a similar method for your driver, to create an index. Consider the following prototype operation:
db.collection.ensureIndex( { a: 1 } )

The following example creates [4] an index on the phone-number field of the people collection:
db.people.ensureIndex( { "phone-number": 1 } )

To create a compound index (page 181), use an operation that resembles the following prototype:
db.collection.ensureIndex( { a: 1, b: 1, c: 1 } )

For example, the following operation will create an index on the item, category, and price fields of the products collection:
db.products.ensureIndex( { item: 1, category: 1, price: 1 } )

Some drivers may specify indexes using NumberLong(1) rather than 1 as the specification. This does not have any effect on the resulting index.

Note: To build or rebuild indexes for a replica set see Building Indexes on Replica Sets (page 189).

Special Creation Options

Note: TTL collections use a special expire index option. See Expire Data from Collections by Setting TTL (page 234) for more information.
[4] As the name suggests, ensureIndex() (page 455) only creates an index if an index of the same specification does not already exist.


Sparse Indexes

To create a sparse index (page 182) on a field, use an operation that resembles the following prototype:
db.collection.ensureIndex( { a: 1 }, { sparse: true } )

The following example creates a sparse index on the users collection that only indexes the twitter_name field if a document has this field. This index will not include documents in this collection without the twitter_name field.
db.users.ensureIndex( { twitter_name: 1 }, { sparse: true } )

Note: Sparse indexes can affect the results returned by the query, particularly with respect to sorts on fields not included in the index. See the sparse index (page 182) section for more information.

Unique Indexes

To create a unique index (page 182), consider the following prototype:


db.collection.ensureIndex( { a: 1 }, { unique: true } )

For example, you may want to create a unique index on the "tax-id" field of the accounts collection to prevent storing multiple account records for the same legal entity:
db.accounts.ensureIndex( { "tax-id": 1 }, { unique: true } )

The _id index (page 180) is a unique index. In some situations you may consider using the _id field itself for this kind of data rather than using a unique index on another field.

In many situations you will want to combine the unique constraint with the sparse option. When MongoDB indexes a field, if a document does not have a value for that field, the index entry for that document will be null. Since unique indexes cannot have duplicate values for a field, without the sparse option, MongoDB will reject the second document and all subsequent documents without the indexed field. Consider the following prototype:
db.collection.ensureIndex( { a: 1 }, { unique: true, sparse: true } )

You can also enforce a unique constraint on compound indexes (page 181), as in the following prototype:
db.collection.ensureIndex( { a: 1, b: 1 }, { unique: true } )

These indexes enforce uniqueness for the combination of index keys and not for either key individually.
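To make the interaction between unique, sparse, and compound keys concrete, here is a minimal plain JavaScript sketch that reports which documents a unique index would reject. It runs outside MongoDB and is not the server's logic; rejectedByUniqueIndex is a hypothetical helper, and for compound keys the sketch assumes that a sparse index skips a document only when it lacks every indexed field.

```javascript
// Plain JavaScript sketch (not MongoDB code): which documents would a
// unique index reject? rejectedByUniqueIndex is a hypothetical helper.
function rejectedByUniqueIndex(docs, fields, options) {
  const sparse = Boolean(options && options.sparse);
  const seen = new Set();
  const rejected = [];
  for (const doc of docs) {
    const hasAnyField = fields.some(f => doc[f] !== undefined);
    // Assumption: a sparse index skips documents lacking every indexed field.
    if (sparse && !hasAnyField) continue;
    // Missing fields are indexed as null; the key is the whole combination.
    const key = JSON.stringify(
      fields.map(f => (doc[f] === undefined ? null : doc[f]))
    );
    if (seen.has(key)) {
      rejected.push(doc);
    } else {
      seen.add(key);
    }
  }
  return rejected;
}
```

With a single field, the second document that lacks the field is rejected unless sparse is set. With fields a and b, two documents may share a value of a as long as their combination of a and b differs.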
Background

To create an index in the background you can specify background construction (page 183). Consider the following prototype invocation of db.collection.ensureIndex() (page 455):
db.collection.ensureIndex( { a: 1 }, { background: true } )

Consider the section on background index construction (page 183) for more information about these indexes and their implications.


Drop Duplicates

To force the creation of a unique index (page 182) on a collection with duplicate values in the field you are indexing, you can use the dropDups option. This will force MongoDB to create a unique index by deleting documents with duplicate values when building the index. Consider the following prototype invocation of db.collection.ensureIndex() (page 455):
db.collection.ensureIndex( { a: 1 }, { unique: true, dropDups: true } )

See the full documentation of duplicate dropping (page 184) for more information.

Warning: Specifying { dropDups: true } may delete data from your database. Use with extreme caution.

Refer to the ensureIndex() (page 455) documentation for additional index creation options.

Removal

To remove an index, use the db.collection.dropIndex() (page 455) method, as in the following example:
db.accounts.dropIndex( { "tax-id": 1 } )

This will remove the index on the "tax-id" field in the accounts collection. The shell provides the following document after completing the operation:
{ "nIndexesWas" : 3, "ok" : 1 }

Here, the value of nIndexesWas reflects the number of indexes before removing this index. You can also use the db.collection.dropIndexes() (page 455) method to remove all indexes, except for the _id index (page 180), from a collection.

These shell helpers provide wrappers around the dropIndexes database command. Your client library (page 225) may have a different or additional interface for these operations.

Rebuilding

If you need to rebuild indexes for a collection you can use the db.collection.reIndex() (page 462) method. This will drop all indexes, including the _id index (page 180), and then rebuild all indexes. The operation takes the following form:
db.accounts.reIndex()

MongoDB will return the following document when the operation completes:
{
  "nIndexesWas" : 2,
  "msg" : "indexes dropped for collection",
  "nIndexes" : 2,
  "indexes" : [
    {
      "key" : { "_id" : 1, "tax-id" : 1 },
      "ns" : "records.accounts",
      "name" : "_id_"
    }
  ],
  "ok" : 1
}

This shell helper provides a wrapper around the reIndex database command. Your client library (page 225) may have a different or additional interface for this operation.

Note: To build or rebuild indexes for a replica set see Building Indexes on Replica Sets (page 189).

13.2.3 Building Indexes on Replica Sets


Consideration

Background index creation operations (page 183) become foreground indexing operations on secondary members of replica sets. The foreground index building process blocks all replication and read operations on the secondaries while they build the index.

Secondaries will begin building indexes after the primary finishes building the index. In shard clusters, the mongos will send ensureIndex() (page 455) to the primary members of the replica set for each shard, which then replicate to the secondaries after the primary finishes building the index.

To minimize the impact of building an index on your replica set, use the following procedure to build indexes on secondaries:

Procedure

Note: If you need to build an index in a shard cluster, repeat the following procedure for each replica set that provides each shard.

1. Stop the mongod process on one secondary. Restart the mongod process without the --replSet (page 490) option and running on a different port. [5] This instance is now in standalone mode.
2. Create the new index or rebuild the index on this mongod instance.
3. Restart the mongod instance with the --replSet (page 490) option. Allow replication to catch up on this member.
4. Repeat this operation on all of the remaining secondaries.
5. Run rs.stepDown() (page 479) on the primary member of the set, and then repeat this procedure on the former primary.

Warning: Ensure that your oplog is large enough to permit the indexing or re-indexing operation to complete without falling too far behind to catch up. See the oplog sizing (page 36) documentation for additional information.

Note: This procedure does take one member out of the replica set at a time. However, this procedure will only affect one member of the set at a time rather than all secondaries at the same time.

[5] By running the mongod on a different port, you ensure that the other members of the replica set and all clients will not contact the member while you are building the index.
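The order of operations in this procedure can be sketched as a small plain JavaScript function that turns a list of replica set members into an ordered checklist. This is only a planning aid under stated assumptions: planRollingIndexBuild, the member names, and the wording of each step are hypothetical, and the function does not contact any mongod.

```javascript
// Plain JavaScript sketch: produce an ordered checklist for the procedure
// above. It only builds strings; member names and wording are hypothetical.
function planRollingIndexBuild(members) {
  const steps = [];
  const buildOn = name => {
    steps.push("restart " + name + " without --replSet on a different port");
    steps.push("build the index on " + name);
    steps.push("restart " + name + " with --replSet and wait for it to catch up");
  };
  // Secondaries first, one at a time.
  members.filter(m => m.role === "secondary").forEach(m => buildOn(m.name));
  // The primary goes last, after stepping down.
  const primary = members.find(m => m.role === "primary");
  if (primary) {
    steps.push("run rs.stepDown() on " + primary.name);
    buildOn(primary.name);
  }
  return steps;
}

const plan = planRollingIndexBuild([
  { name: "m1", role: "primary" },
  { name: "m2", role: "secondary" },
  { name: "m3", role: "secondary" }
]);
plan.forEach((step, i) => console.log((i + 1) + ". " + step));
```

For a shard cluster, the same checklist would be generated once per replica set, as the note above describes.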


13.2.4 Measuring Index Use


Query performance is a good general indicator of index use; however, for more precise insight into index use, MongoDB provides the following tools:

explain() (page 449)
Append the explain() (page 449) method to any cursor (e.g. query) to return a document with statistics about the query process, including the index used, and the number of documents scanned.

cursor.hint() (page 450)
Append the hint() (page 450) method to any cursor (e.g. query), with the index as the argument, to force MongoDB to use a specific index to fulfill the query. Consider the following example:

db.people.find( { name: "John Doe", zipcode: { $gt: 63000 } } ).hint( { zipcode: 1 } )

You can use hint() (page 450) and explain() (page 449) in conjunction with each other to compare the effectiveness of a specific index. Specify the $natural operator to the hint() (page 450) method to prevent MongoDB from using any index:

db.people.find( { name: "John Doe", zipcode: { $gt: 63000 } } ).hint( { $natural: 1 } )

indexCounters (page 543)
Use the indexCounters (page 543) data in the output of serverStatus for insight into database-wide index utilization.

13.2.5 Monitoring and Controlling Index Building


To see the status of the indexing processes, you can use the db.currentOp() (page 466) method in the mongo shell. The value of the query field and the msg field will indicate if the operation is an index build. The msg field also indicates the percent of the build that is complete.

If you need to terminate an ongoing index build, you can use the db.killOp() (page 469) method in the mongo shell.

13.3 Indexing Strategies


13.3.1 Synopsis
Indexes allow MongoDB to process and fulfill queries quickly, by creating a small and efficient representation of the documents in the collection.

Read the Indexing Overview (page 179) documentation for more information on the fundamentals of indexing in MongoDB, and the Indexing Operations (page 186) documentation for operational guidelines and examples for building and managing indexes.

This document provides an overview of approaches to indexing with MongoDB and a selection of strategies that you can use as you develop applications with MongoDB.

13.3.2 Strategies
Use Covered Queries

In some cases, MongoDB will be able to fulfill a query using only the index, without needing to scan actual documents from the database. To use a covered index you must:

- ensure that the index includes all of the fields in the result. This means that the projection must explicitly exclude the _id field from the result set, unless the index includes _id.
- ensure that none of the indexed fields in any of the documents in the collection includes an array. If an indexed field holds an array, the index becomes a multi-key index (page 181) and cannot support a covered query.

Use the explain() (page 449) method to test the query. If MongoDB was able to use a covered index, then the value of the indexOnly field will be true.

Covered queries are much faster than other queries, for two reasons: indexes are typically stored in RAM or located sequentially on disk, and indexes are smaller than the documents they catalog.

Sort Using Indexes

While the sort() (page 452) method supports in-memory sort operations without the use of an index, these operations:

1. Are significantly slower than sort operations that use an index.
2. Abort when the sort operation consumes 32 megabytes of memory.

For the best result, index the field you want to sort query results by. For example, if you have a { username: 1 } index, you can use this index to return documents sorted by the username field.

MongoDB can return sorted results in either ascending or descending order using an index in ascending or descending order, because MongoDB can traverse items in the index in both directions. For more information about index order see the section on Ascending and Descending Index Order (page 181).

In general, MongoDB can use a compound index to return sorted results if:

- the first sorted field is the first field in the index.
- the last field in the index before the first sorted field is an equality match in the query.

Consider the example presented below for an illustration of this concept.

Example

Given the following index:
{ a: 1, b: 1, c: 1, d: 1 }

The following query and sort operations will be able to use the index:
db.collection.find().sort( { a:1 } )
db.collection.find().sort( { a:1, b:1 } )
db.collection.find( { a:4 } ).sort( { a:1, b:1 } )
db.collection.find( { b:5 } ).sort( { a:1, b:1 } )
db.collection.find( { a:{ $gt:4 } } ).sort( { a:1, b:1 } )
db.collection.find( { a:5 } ).sort( { a:1, b:1 } )
db.collection.find( { a:5 } ).sort( { b:1, c:1 } )
db.collection.find( { a:5, c:4, b:3 } ).sort( { d:1 } )
db.collection.find( { a:5, b:3, d:{ $gt:4 } } ).sort( { c:1 } )
db.collection.find( { a:5, b:3, c:{ $lt:2 }, d:{ $gt:4 } } ).sort( { c:1 } )

However, the following query operations would not be able to sort the results using the index:
db.collection.find().sort( { b:1 } )
db.collection.find( { b:5 } ).sort( { b:1 } )
db.collection.find( { b:{ $gt:5 } } ).sort( { a:1, b:1 } )
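The two rules for sorting with a compound index can be approximated by the plain JavaScript sketch below. It is an illustration only and not the query optimizer: canSortWithIndex is a hypothetical helper, index and sort specifications are written as arrays of [field, direction] pairs so that field order is explicit, and the sketch takes the stricter reading that every index field before the first sorted field must be an equality match.

```javascript
// Plain JavaScript sketch of the sort rules above; not the query optimizer.
// index and sort are arrays of [field, direction] pairs in order;
// equalityFields lists the fields the query matches by equality.
function canSortWithIndex(index, equalityFields, sort) {
  if (sort.length === 0) return true;
  const start = index.findIndex(([field]) => field === sort[0][0]);
  if (start === -1) return false;
  // Every index field before the first sorted field must be an equality match.
  for (let i = 0; i < start; i++) {
    if (!equalityFields.includes(index[i][0])) return false;
  }
  // The sorted fields must follow the index order without gaps...
  if (start + sort.length > index.length) return false;
  // ...and run either all in index direction or all reversed.
  const ratio = sort[0][1] * index[start][1];
  return sort.every(([field, dir], i) =>
    index[start + i][0] === field && dir * index[start + i][1] === ratio);
}

const abcd = [["a", 1], ["b", 1], ["c", 1], ["d", 1]];
console.log(canSortWithIndex(abcd, ["a"], [["b", 1], ["c", 1]])); // true
console.log(canSortWithIndex(abcd, [], [["b", 1]]));              // false
```

Range conditions such as $gt are simply left out of equalityFields, since only equality matches allow the sort to begin at a later field of the index.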

Store Indexes in Memory

For best results, always ensure that your indexes fit entirely in RAM, so the system doesn't need to read the index from disk to fulfill a query. If your indexes approach or exceed the total size of available RAM, they may not fit in memory.

You can check the size of your indexes in the mongo shell, using the db.collection.totalIndexSize() (page 464) helper. You may also use collStats or db.collection.stats() (page 463) to return this and related information (page 552).

db.collection.totalIndexSize() (page 464) returns data in bytes. Consider the following invocation:
> db.collection.totalIndexSize()
4294976499

This reports a total index size of roughly 4 gigabytes. Consider this value in contrast to the total amount of available system RAM and the rest of the working set. Also remember:

- if you have and use multiple collections, consider the size of all indexes on all collections.
- there are some limited cases where indexes do not need to fit in RAM (page 194).
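The arithmetic behind this comparison is simple enough to sketch in plain JavaScript. The helper names and the RAM figures are hypothetical; in practice the byte counts would come from calling db.collection.totalIndexSize() on each collection.

```javascript
// Plain JavaScript sketch of the size comparison; helper names and the
// RAM figures used in the example call are hypothetical.
const GIB = Math.pow(1024, 3);

function bytesToGiB(bytes) {
  return bytes / GIB;
}

// indexSizesBytes: one totalIndexSize() value per collection.
function indexesFitInRam(indexSizesBytes, ramBytes) {
  const totalBytes = indexSizesBytes.reduce((sum, n) => sum + n, 0);
  return {
    totalBytes: totalBytes,
    totalGiB: bytesToGiB(totalBytes),
    fits: totalBytes < ramBytes
  };
}

console.log(bytesToGiB(4294976499).toFixed(2));                  // "4.00"
console.log(indexesFitInRam([4294976499, GIB], 8 * GIB).fits);   // true
console.log(indexesFitInRam([4294976499, GIB], 4 * GIB).fits);   // false
```

This check only compares indexes against RAM; the rest of the working set also needs room, so a result of true is a necessary condition rather than a guarantee.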

13.3.3 Considerations
Above all, when developing your indexing strategy you should have a deep understanding of:

- the application's queries.
- the relative frequency of each query in the application.
- the current indexes created for your collections.
- which indexes the most common queries use.

MongoDB can only use one index to support any given operation. However, each clause of an $or query can use its own index.

Selectivity

Selectivity describes the ability of a query to narrow the result set using the index. Effective indexes are more selective and allow MongoDB to use the index for a larger portion of the work associated with fulfilling the query. There are two aspects of selectivity:

1. Data need to have a high distribution of the values for the indexed key.
2. Queries need to limit the number of possible documents using the indexed field.


Example

First, consider an index, { a : 1 }, on a collection where a has three values evenly distributed across the collection:

{ _id: ObjectId(), a: 1, b: "ab" }
{ _id: ObjectId(), a: 1, b: "cd" }
{ _id: ObjectId(), a: 1, b: "ef" }
{ _id: ObjectId(), a: 2, b: "jk" }
{ _id: ObjectId(), a: 2, b: "lm" }
{ _id: ObjectId(), a: 2, b: "no" }
{ _id: ObjectId(), a: 3, b: "pq" }
{ _id: ObjectId(), a: 3, b: "rs" }
{ _id: ObjectId(), a: 3, b: "tv" }

If you do a query for { a: 2, b: "no" }, MongoDB will still need to scan 3 of the documents in the collection to fulfill the query. Similarly, a query for { a: { $gt: 1}, b: "tv" } would need to scan through 6 documents, although each query returns only a single document. Then, consider an index on a field that has many values evenly distributed across the collection:
{ _id: ObjectId(), a: 1, b: "ab" }
{ _id: ObjectId(), a: 2, b: "cd" }
{ _id: ObjectId(), a: 3, b: "ef" }
{ _id: ObjectId(), a: 4, b: "jk" }
{ _id: ObjectId(), a: 5, b: "lm" }
{ _id: ObjectId(), a: 6, b: "no" }
{ _id: ObjectId(), a: 7, b: "pq" }
{ _id: ObjectId(), a: 8, b: "rs" }
{ _id: ObjectId(), a: 9, b: "tv" }
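The scan counts in this example can be checked with a small plain JavaScript simulation. It is a sketch under simple assumptions, not how mongod evaluates queries: docsScanned is a hypothetical helper that counts the documents an index on one field would hand to the query, supporting only equality and $gt conditions on that field.

```javascript
// Plain JavaScript sketch: count the documents that an index on
// indexedField would select for a query. docsScanned is hypothetical and
// understands only equality and { $gt: n } on the indexed field.
function docsScanned(docs, indexedField, query) {
  const condition = query[indexedField];
  // Without a condition on the indexed field, every document is scanned.
  if (condition === undefined) return docs.length;
  return docs.filter(doc => {
    const value = doc[indexedField];
    if (condition !== null && typeof condition === "object" && "$gt" in condition) {
      return value > condition.$gt;
    }
    return value === condition;
  }).length;
}

const bValues = ["ab", "cd", "ef", "jk", "lm", "no", "pq", "rs", "tv"];
// First collection: a takes the values 1, 1, 1, 2, 2, 2, 3, 3, 3.
const fewValues = bValues.map((b, i) => ({ a: Math.floor(i / 3) + 1, b: b }));
// Second collection: a takes the values 1 through 9.
const manyValues = bValues.map((b, i) => ({ a: i + 1, b: b }));

console.log(docsScanned(fewValues, "a", { a: 2, b: "no" }));            // 3
console.log(docsScanned(fewValues, "a", { a: { $gt: 1 }, b: "tv" }));   // 6
console.log(docsScanned(manyValues, "a", { a: { $gt: 5 }, b: "tv" }));  // 4
console.log(docsScanned(manyValues, "a", { a: 2, b: "cd" }));           // 1
```

The condition on b never reduces the count, because the index in this example covers only a; the remaining documents still have to be examined to apply it.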

Although the index on a is more selective, in the sense that queries can use the index more effectively, a query such as { a: { $gt: 5 }, b: "tv" } would still need to scan 4 documents. By contrast, given a query like { a: 2, b: "cd" }, MongoDB would only need to scan one document to fulfill the rest of the query. The index and query are more selective because the values of a are evenly distributed and the query can select a specific document using the index.

To ensure optimal performance, use indexes that are maximally selective relative to your queries. At the same time, queries need to be appropriately selective relative to your indexed data. If overall selectivity is low enough, and MongoDB must read a number of documents to return results, then some queries may perform faster without indexes. See the Measuring Index Use (page 190) section for more information on testing index use.

Insert Throughput

MongoDB must update all indexes associated with a collection after every insert, update, or delete operation. Every index on a collection adds some amount of overhead to these operations. In almost every case, the performance gains that indexes realize for read operations are worth the insertion penalty; however:

- in some cases, an index to support an infrequent query may incur more insert-related costs than it saves in read time.
- in some situations, if you have many indexes on a collection with a high insert throughput and a number of very similar indexes, you may find better overall results by using a slightly less effective index on some queries if it means consolidating the total number of indexes.

If your indexes and queries are not very selective, the speed improvements for query operations may not offset the costs of maintaining an index. See the section on index selectivity (page 192) for more information.


In some cases, a single compound index on two or more fields may support all of the queries that a single-field index or a smaller compound index supports. In general, MongoDB can use a compound index to support the same queries as any of its prefixes. Consider the following example:

Example

Given the following index on a collection:
{ x: 1, y: 1, z: 1 }

This index can support a number of queries, as well as most of the queries that the following indexes support:
{ x: 1 }
{ x: 1, y: 1 }

There are some situations where the prefix indexes may offer better query performance, as is the case if z is a large array. Also, consider the following index on the same collection:
{ x: 1, z: 1 }

The { x: 1, y: 1, z: 1 } index can support many of the same queries as the above index; however, { x: 1, z: 1 } has an additional use. Given the following query:
db.collection.find( { x: 5 } ).sort( { z: 1} )

The { x: 1, z: 1 } index will support both the query and the sort operation, while the { x: 1, y: 1, z: 1 } index can only support the query. See the Sort Using Indexes (page 191) section for more information.
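The prefix relationship used in this example can be expressed as a short plain JavaScript sketch. The helpers prefixesOf and isPrefix are hypothetical, and index key patterns are written as arrays of field names, ignoring sort direction, so that field order is explicit.

```javascript
// Plain JavaScript sketch of index prefixes; both helpers are hypothetical.
// An index is written as an array of field names in index order.
function prefixesOf(indexFields) {
  return indexFields.map((_, i) => indexFields.slice(0, i + 1));
}

function isPrefix(candidate, indexFields) {
  return candidate.length <= indexFields.length &&
    candidate.every((field, i) => indexFields[i] === field);
}

console.log(prefixesOf(["x", "y", "z"]));
// [ [ 'x' ], [ 'x', 'y' ], [ 'x', 'y', 'z' ] ]
console.log(isPrefix(["x", "y"], ["x", "y", "z"])); // true
console.log(isPrefix(["x", "z"], ["x", "y", "z"])); // false
```

The last call shows why { x: 1, z: 1 } is not redundant: it is not a prefix of { x: 1, y: 1, z: 1 }, so keeping it can still be worthwhile for queries that sort on z after matching x.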

Index Size

Indexes require space, both on disk and in RAM. Indexes require less space in RAM than the full documents in the collection. In theory, if your queries only match a subset of the documents and can use the index to locate those documents, MongoDB can maintain a much smaller working set. Ensure that:

- the indexes and the working set can fit in RAM at the same time.
- all of your indexes use less space than all of the documents in the collection. This may not be an issue if all of your queries use covered queries (page 190) or if indexes do not need to fit into RAM, as in the following situation:

Indexes do not have to fit entirely into RAM in all cases. If the value of the indexed field grows with every insert, and most queries select recently added documents, then MongoDB only needs to keep the parts of the index that hold the most recent or right-most values in RAM. This allows for efficient index use for read and write operations and minimizes the amount of RAM required to support the index.


Part VII

Aggregation


Aggregation provides a natural method for aggregating data inside of MongoDB. For a description of MongoDB aggregation, see Aggregation Framework (page 199). For examples of aggregation, see Aggregation Framework Examples (page 205). For descriptions of aggregation operators, see Aggregation Framework Reference (page 209). The following is the outline of the aggregation documentation:


CHAPTER

FOURTEEN

AGGREGATION FRAMEWORK
New in version 2.1.

14.1 Overview
The MongoDB aggregation framework provides a means to calculate aggregated values without having to use map-reduce. While map-reduce is powerful, it is often more difficult than necessary for many simple aggregation tasks, such as totaling or averaging field values.

If you're familiar with SQL, the aggregation framework provides similar functionality to GROUP BY and related SQL operators as well as simple forms of self joins. Additionally, the aggregation framework provides projection capabilities to reshape the returned data. Using the projections in the aggregation framework, you can add computed fields, create new virtual sub-objects, and extract sub-fields into the top-level of results.

See Also: A presentation from MongoSV 2011: MongoDB's New Aggregation Framework. Additionally, consider Aggregation Framework Examples (page 205) and Aggregation Framework Reference (page 209) for additional documentation.

14.2 Framework Components


This section provides an introduction to the two concepts that underpin the aggregation framework: pipelines and expressions.

14.2.1 Pipelines
Conceptually, documents from a collection pass through an aggregation pipeline, which transforms these objects as they pass through. For those familiar with UNIX-like shells (e.g. bash), the concept is analogous to the pipe (i.e. |) used to string text filters together.

In a shell environment the pipe redirects a stream of characters from the output of one process to the input of the next. The MongoDB aggregation pipeline streams MongoDB documents from one pipeline operator (page 210) to the next to process the documents.

All pipeline operators process a stream of documents, and the pipeline behaves as if the operation scans a collection and passes all matching documents into the top of the pipeline. Each operator in the pipeline transforms each document as it passes through the pipeline.


Note: Pipeline operators need not produce one output document for every input document: operators may also generate new documents or filter out documents.

Warning: The pipeline cannot operate on values of the following types: Binary, Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope.

See Also: The Aggregation Framework Reference (page 209) includes documentation of the following pipeline operators:

- $project (page 404)
- $match (page 402)
- $limit (page 402)
- $skip (page 406)
- $unwind (page 408)
- $group (page 399)
- $sort (page 406)

14.2.2 Expressions
Expressions (page 216) produce output documents based on calculations performed on input documents. The aggregation framework defines expressions in a document format using prefixes.

Expressions are stateless and are only evaluated when seen by the aggregation process. All aggregation expressions can only operate on the current document in the pipeline, and cannot integrate data from other documents. Only the accumulator expressions used in the $group (page 399) operator maintain state (e.g. totals, maximums, minimums, and related data) as documents progress through the pipeline.

See Also: Aggregation expressions (page 216) for additional examples of the expressions provided by the aggregation framework.

14.3 Use
14.3.1 Invocation
Invoke an aggregation operation with the aggregate() (page 454) wrapper in the mongo shell or the aggregate database command. Always call aggregate() (page 454) on a collection object that determines the input documents of the aggregation pipeline. The arguments to the aggregate() (page 454) method specify a sequence of pipeline operators (page 210), where each operator may have a number of operands. First, consider a collection of documents named articles using the following format:
{
  title : "this is my title" ,
  author : "bob" ,
  posted : new Date () ,
  pageViews : 5 ,
  tags : [ "fun" , "good" , "fun" ] ,
  comments : [
    { author :"joe" , text : "this is cool" } ,
    { author :"sam" , text : "this is bad" }
  ],
  other : { foo : 5 }
}

The following example aggregation operation pivots data to create a set of author names grouped by tags applied to an article. Call the aggregation framework by issuing the following command:
db.articles.aggregate(
  { $project : {
      author : 1,
      tags : 1,
  } },
  { $unwind : "$tags" },
  { $group : {
      _id : { tags : "$tags" },
      authors : { $addToSet : "$author" }
  } }
);

The aggregation pipeline begins with the collection articles and selects the author and tags fields using the $project (page 404) aggregation operator. The $unwind (page 408) operator produces one output document per tag. Finally, the $group (page 399) operator pivots these fields.
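To show what each stage of this pipeline does to the documents, the following plain JavaScript sketch imitates the three stages on an in-memory array. It is an approximation for illustration, not the aggregation framework: project, unwind, and groupAddToSet are hypothetical helpers, the second sample article is invented, and details such as the default handling of _id and the order of results are ignored.

```javascript
// Plain JavaScript sketch that imitates $project, $unwind and $group with
// $addToSet on an in-memory array. Helper names are hypothetical.
function project(docs, fields) {
  return docs.map(doc => {
    const out = {};
    for (const f of fields) {
      if (f in doc) out[f] = doc[f];   // keep only the listed fields
    }
    return out;
  });
}

function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    for (const item of doc[field] || []) {
      // One output document per array element.
      out.push(Object.assign({}, doc, { [field]: item }));
    }
  }
  return out;
}

function groupAddToSet(docs, keyField, valueField, outField) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc[keyField])) groups.set(doc[keyField], new Set());
    groups.get(doc[keyField]).add(doc[valueField]);  // a Set drops repeats
  }
  return Array.from(groups, ([key, values]) => ({
    _id: { [keyField]: key },
    [outField]: Array.from(values)
  }));
}

const articles = [
  { title: "this is my title", author: "bob", pageViews: 5, tags: ["fun", "good", "fun"] },
  { title: "another title", author: "joe", pageViews: 7, tags: ["fun"] } // invented
];
const pivoted = groupAddToSet(
  unwind(project(articles, ["author", "tags"]), "tags"),
  "tags", "author", "authors"
);
console.log(JSON.stringify(pivoted));
// [{"_id":{"tags":"fun"},"authors":["bob","joe"]},{"_id":{"tags":"good"},"authors":["bob"]}]
```

Although bob's article lists the fun tag twice, bob appears only once under that tag, which is the deduplication that $addToSet provides.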

14.3.2 Result
The aggregation operation in the previous section returns a document with two fields:

- result, which holds an array of documents returned by the pipeline
- ok, which holds the value 1, indicating success, or another value if there was an error

As a document, the result is subject to the BSON Document size (page 573) limit, which is currently 16 megabytes.

14.4 Optimizing Performance


Because you will always call aggregate on a collection object, which logically inserts the entire collection into the aggregation pipeline, you may want to optimize the operation by avoiding scanning the entire collection whenever possible.

14.4.1 Pipeline Operators and Indexes


Depending on the order in which they appear in the pipeline, aggregation operators can take advantage of indexes. The following pipeline operators take advantage of an index when they occur at the beginning of the pipeline:

- $match (page 402)
- $sort (page 406)
- $limit (page 402)
- $skip (page 406)

The above operators can also use an index when placed before the following aggregation operators:

- $project (page 404)
- $unwind (page 408)
- $group (page 399)

14.4.2 Early Filtering


If your aggregation operation requires only a subset of the data in a collection, use the $match (page 402) operator to restrict which items go into the top of the pipeline, as in a query. When placed early in a pipeline, these $match (page 402) operations use suitable indexes to scan only the matching documents in a collection.

Placing a $match (page 402) pipeline stage followed by a $sort (page 406) stage at the start of the pipeline is logically equivalent to a single query with a sort, and can use an index.

In future versions there may be an optimization phase in the pipeline that reorders the operations to increase performance without affecting the result. However, at this time, place $match (page 402) operators at the beginning of the pipeline when possible.

14.4.3 Memory for Cumulative Operators


Certain pipeline operators require access to the entire input set before they can produce any output. For example, $sort (page 406) must receive all of the input from the preceding pipeline operator before it can produce its first output document. The current implementation of $sort (page 406) does not go to disk in these cases: in order to sort the contents of the pipeline, the entire input must fit in memory.

$group (page 399) has similar characteristics: before any $group (page 399) passes its output along the pipeline, it must receive the entirety of its input. For the $group (page 399) operator, this frequently does not require as much memory as $sort (page 406), because it only needs to retain one record for each unique key in the grouping specification.

The current implementation of the aggregation framework logs a warning if a cumulative operator consumes 5% or more of the physical memory on the host. Cumulative operators produce an error if they consume 10% or more of the physical memory on the host.

14.5 Sharded Operation


Note: Changed in version 2.1. Some aggregation operations using aggregate will cause mongos instances to require more CPU resources than in previous versions. This modified performance profile may dictate alternate architectural decisions if you use the aggregation framework extensively in a sharded environment.

The aggregation framework is compatible with sharded collections. When operating on a sharded collection, the aggregation pipeline is split into two parts. The aggregation framework pushes all of the operators up to and including the first $group (page 399) or $sort (page 406) to each shard (see footnote 1). Then, a second pipeline on the mongos runs. This pipeline consists of the first $group (page 399) or $sort (page 406) and any remaining pipeline operators, and runs on the results received from the shards.

The mongos pipeline merges $sort (page 406) operations from the shards. The $group (page 399) operator brings in any sub-totals from the shards and combines them: in some cases these may be structures. For example, the $avg (page 400) expression maintains a total and count for each shard; mongos combines these values and then divides.
1 If an early $match (page 402) can exclude shards through the use of the shard key in the predicate, then these operators are only pushed to the relevant shards.
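The $avg merge described above can be modeled in a few lines of plain JavaScript (Node.js). This is only a sketch of the arithmetic, not the mongos implementation: the function name mergeAverages and the per-shard partial results are made up for illustration.

```javascript
// Plain-JavaScript model of merging per-shard $avg partial results.
// The shard data below is made up for illustration.
const shardPartials = [
  { _id: "NY", total: 300, count: 3 },  // hypothetical partial from shard A
  { _id: "NY", total: 20, count: 1 },   // hypothetical partial from shard B
  { _id: "CA", total: 50, count: 2 }    // hypothetical partial from shard A
];

function mergeAverages(partials) {
  const merged = new Map();
  for (const { _id, total, count } of partials) {
    const acc = merged.get(_id) || { total: 0, count: 0 };
    acc.total += total;
    acc.count += count;
    merged.set(_id, acc);
  }
  // Divide only after all partial totals and counts are combined.
  return Array.from(merged, ([_id, acc]) => ({ _id, avg: acc.total / acc.count }));
}

const averages = mergeAverages(shardPartials);
console.log(averages);
```

Carrying a total and a count, rather than a per-shard average, matters here: the two NY shards average 100 and 20 on their own, but the combined average is 320 / 4 = 80, not 60.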


14.6 Limitations
Aggregation operations with the aggregate command have the following limitations:
- The pipeline cannot operate on values of the following types: Binary, Symbol, MinKey, MaxKey, DBRef, Code, CodeWScope.
- Output from the pipeline can only contain 16 megabytes. If your result set exceeds this limit, the aggregate command produces an error.
- If any single aggregation operation consumes more than 10 percent of system RAM the operation will produce an error.


CHAPTER

FIFTEEN

AGGREGATION FRAMEWORK EXAMPLES


MongoDB provides flexible data aggregation functionality with the aggregate command. These aggregation operations are flexible and provide an idiomatic way to combine and perform basic transformations on data inside of MongoDB. See the Aggregation Framework (page 199) document for a full overview of aggregation capabilities and Aggregation Framework Reference (page 209) for a complete documentation of all aggregation operators and expressions. This document provides a number of practical examples that display the capabilities of the aggregation framework. All examples use a publicly available data set of all zipcodes and populations in the United States.

15.1 Requirements
1. mongod and mongo, version 2.1 or later.
2. The zipcode data set. These data are available at: media.mongodb.org/zips.json. Use mongoimport to load this data set into your mongod instance.

15.2 Data Model


Each document in this collection has the following form:
{ "_id": "10280", "city": "NEW YORK", "state": "NY", "pop": 5574, "loc": [ -74.016323, 40.710537 ] }

In these documents:
- The _id field holds the zipcode as a string.
- The city field holds the city.
- The state field holds the two letter state abbreviation.
- The pop field holds the population.
- The loc field holds the location as a longitude latitude pair.

15.3 Examples
All of the following examples use the aggregate() (page 454) helper in the mongo shell. aggregate() (page 454) provides a wrapper around the aggregate database command. See the documentation for your driver (page 225) for a more idiomatic interface for data aggregation operations.

15.3.1 States with Populations Over 10 Million


To return all states with a population greater than 10 million, use the following aggregation operation:
db.zipcodes.aggregate(
    { $group : { _id : "$state", totalPop : { $sum : "$pop" } } },
    { $match : { totalPop : { $gte : 10*1000*1000 } } }
)

Aggregation operations using the aggregate() (page 454) helper process all documents in the zipcodes collection. aggregate() (page 454) connects a number of pipeline (page 199) operators that define the aggregation process. In the above example, the pipeline passes all documents in the zipcodes collection through the following steps:
- the $group (page 399) operator collects all documents and creates documents for each state. These new per-state documents have one field in addition to the _id field: totalPop, which is a generated field that uses the $sum (page 408) operation to calculate the total value of all pop fields in the source documents. After the $group (page 399) operation the documents in the pipeline resemble the following:
{ "_id" : "AK", "totalPop" : 550043 }

- the $match (page 402) operation filters these documents so that the only documents that remain are those where the value of totalPop is greater than or equal to 10 million. The $match (page 402) operation does not alter the documents, which have the same format as the documents output by $group (page 399).
The equivalent SQL for this operation is:
SELECT state, SUM(pop) AS pop FROM zips GROUP BY state HAVING pop >= (10*1000*1000)

15.3.2 Average City Population by State


To return the average populations for cities in each state, use the following aggregation operation:
db.zipcodes.aggregate(
    { $group : { _id : { state : "$state", city : "$city" }, pop : { $sum : "$pop" } } },
    { $group : { _id : "$_id.state", avgCityPop : { $avg : "$pop" } } }
)

Aggregation operations using the aggregate() (page 454) helper process all documents in the zipcodes collection. aggregate() (page 454) connects a number of pipeline (page 199) operators that define the aggregation process. In the above example, the pipeline passes all documents in the zipcodes collection through the following steps:
- the $group (page 399) operator collects all documents and creates new documents for every combination of the city and state fields in the source documents. After this stage in the pipeline, the documents resemble the following:
{ "_id" : { "state" : "CO", "city" : "EDGEWATER" }, "pop" : 13154 }

- the second $group (page 399) operator collects documents by the state field and uses the $avg (page 400) expression to compute a value for the avgCityPop field.
The final output of this aggregation operation resembles the following:
{ "_id" : "MN", "avgCityPop" : 5335 }
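The two $group stages above can be traced by hand with a small in-memory model. The following plain JavaScript (Node.js) is a sketch of the logic only, not of how the server executes the pipeline; the sample documents, states XX and YY, and city names are invented for the illustration.

```javascript
// Made-up sample documents in the shape of the zipcode data set.
const zips = [
  { _id: "00001", city: "ALPHA", state: "XX", pop: 100 },
  { _id: "00002", city: "ALPHA", state: "XX", pop: 300 },
  { _id: "00003", city: "BETA",  state: "XX", pop: 200 },
  { _id: "00004", city: "GAMMA", state: "YY", pop: 50 }
];

// Stage 1: sum pop per (state, city) pair, like the first $group with $sum.
const cityPop = new Map();
for (const z of zips) {
  const key = JSON.stringify([z.state, z.city]);
  cityPop.set(key, (cityPop.get(key) || 0) + z.pop);
}

// Stage 2: average the per-city totals within each state, like the second $group with $avg.
const perState = new Map();
for (const [key, pop] of cityPop) {
  const [state] = JSON.parse(key);
  const acc = perState.get(state) || { total: 0, cities: 0 };
  acc.total += pop;
  acc.cities += 1;
  perState.set(state, acc);
}

const avgCityPop = Array.from(perState, ([state, acc]) => ({
  _id: state,
  avgCityPop: acc.total / acc.cities
}));
console.log(avgCityPop);
```

In this sample, state XX has two cities with totals 400 and 200, so its average city population is 300. Averaging the three zipcode documents directly would give 200 instead, which is why the first grouping stage is needed.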

15.3.3 Largest and Smallest Cities by State


To return the smallest and largest cities by population for each state, use the following aggregation operation:
db.zipcodes.aggregate(
    { $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } },
    { $sort: { pop: 1 } },
    { $group: { _id : "$_id.state",
                biggestCity: { $last: "$_id.city" },
                biggestPop: { $last: "$pop" },
                smallestCity: { $first: "$_id.city" },
                smallestPop: { $first: "$pop" } } },
    // the following $project is optional, and
    // modifies the output format.
    { $project: { _id: 0,
                  state: "$_id",
                  biggestCity: { name: "$biggestCity", pop: "$biggestPop" },
                  smallestCity: { name: "$smallestCity", pop: "$smallestPop" } } }
)

Aggregation operations using the aggregate() (page 454) helper process all documents in the zipcodes collection. aggregate() (page 454) connects a number of pipeline (page 199) operators that define the aggregation process. In the above example, the pipeline passes all documents in the zipcodes collection through the following steps:
- the $group (page 399) operator collects all documents and creates new documents for every combination of the city and state fields in the source documents. By specifying the value of _id as a sub-document that contains both fields, the operation preserves the state field for use later in the pipeline. The documents produced by this stage of the pipeline have a second field, pop, which uses the $sum (page 408) operator to provide the total of the pop fields in the source documents. At this stage in the pipeline, the documents resemble the following:
{ "_id" : { "state" : "CO", "city" : "EDGEWATER" }, "pop" : 13154 }

- the $sort (page 406) operator orders the documents in the pipeline based on the value of the pop field from smallest to largest. This operation does not alter the documents.
- the second $group (page 399) operator collects the documents in the pipeline by the state field, which is a field inside the nested _id document. Within each per-state document this $group (page 399) operator specifies four fields: using the $last (page 401) expression, the $group (page 399) operator creates the biggestCity and biggestPop fields that store the city with the largest population and that population. Using the $first (page 400) expression, the $group (page 399) operator creates the smallestCity and smallestPop fields that store the city with the smallest population and that population. The documents, at this stage in the pipeline, resemble the following:
{ "_id" : "WA", "biggestCity" : "SEATTLE", "biggestPop" : 520096, "smallestCity" : "BENGE", "smallestPop" : 2 }

- The final operation is $project (page 404), which renames the _id field to state and moves the biggestCity, biggestPop, smallestCity, and smallestPop into biggestCity and smallestCity sub-documents.
The final output of this aggregation operation is:
{ "state" : "RI", "biggestCity" : { "name" : "CRANSTON", "pop" : 176404 }, "smallestCity" : { "name" : "CLAYVILLE", "pop" : 45 } }
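The interplay of $sort with $first and $last can be sketched in plain JavaScript (Node.js). The model below only mirrors the logic of the pipeline above; the per-city documents are made up, and the server's actual implementation is not shown here.

```javascript
// Made-up per-city totals, shaped like the output of the first $group stage.
const cities = [
  { _id: { state: "XX", city: "BETA" },  pop: 200 },
  { _id: { state: "XX", city: "ALPHA" }, pop: 400 },
  { _id: { state: "YY", city: "GAMMA" }, pop: 50 },
  { _id: { state: "XX", city: "DELTA" }, pop: 10 }
];

// Like { $sort: { pop: 1 } }: ascending by population.
const sorted = cities.slice().sort((a, b) => a.pop - b.pop);

// Like the second $group: the first document seen per state is the smallest
// city ($first) and the most recent one seen is the largest ($last).
const byState = new Map();
for (const doc of sorted) {
  const state = doc._id.state;
  if (!byState.has(state)) {
    byState.set(state, { _id: state, smallestCity: doc._id.city, smallestPop: doc.pop });
  }
  const entry = byState.get(state);
  entry.biggestCity = doc._id.city;
  entry.biggestPop = doc.pop;
}

const extremes = Array.from(byState.values());
console.log(extremes);
```

Because the input is sorted in ascending order, "first" and "last" per state coincide with "smallest" and "largest". Without the sort step the same loop would simply report whichever cities happened to arrive first and last.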


CHAPTER

SIXTEEN

AGGREGATION FRAMEWORK REFERENCE


New in version 2.1.0.

The aggregation framework provides the ability to project, process, and/or control the output of the query, without using map-reduce. Aggregation uses a syntax that resembles the syntax and form of regular MongoDB database queries. These aggregation operations are all accessible by way of the aggregate() method. While all examples in this document use this method, aggregate() is merely a wrapper around the database command aggregate. The following prototype aggregation operations are equivalent:
db.people.aggregate( <pipeline> )
db.people.aggregate( [<pipeline>] )
db.runCommand( { aggregate: "people", pipeline: [<pipeline>] } )

These operations perform aggregation routines on the collection named people. <pipeline> is a placeholder for the aggregation pipeline definition. aggregate() accepts the stages of the pipeline (i.e. <pipeline>) as an array, or as arguments to the method. This documentation provides an overview of all aggregation operators available for use in the aggregation pipeline as well as details regarding their use and behavior.

See Also: Aggregation Framework (page 199) overview, the Aggregation Framework Documentation Index (page 197), and the Aggregation Framework Examples (page 205) for more information on the aggregation functionality.

Aggregation Operators:
- Pipeline (page 210)
- Expressions (page 216)
  - Boolean Operators (page 216)
  - Comparison Operators (page 216)
  - Arithmetic Operators (page 217)
  - String Operators (page 218)
  - Date Operators (page 218)
  - Conditional Expressions (page 219)
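To make the equivalence of the prototype forms listed in this section concrete, the following plain JavaScript (Node.js) sketch shows one way a wrapper could turn either calling style into the same command document. The helper buildAggregateCommand is hypothetical and is not part of the mongo shell; the age field is likewise invented.

```javascript
// Hypothetical helper (not part of the mongo shell): builds the command
// document that corresponds to the aggregate() prototype forms.
function buildAggregateCommand(collectionName, ...stages) {
  // Accept either a single array of stages or the stages as separate arguments.
  const pipeline = stages.length === 1 && Array.isArray(stages[0]) ? stages[0] : stages;
  return { aggregate: collectionName, pipeline: pipeline };
}

// The "age" field is made up for this illustration.
const asArguments = buildAggregateCommand("people", { $match: { age: 30 } }, { $limit: 5 });
const asArray = buildAggregateCommand("people", [{ $match: { age: 30 } }, { $limit: 5 }]);

console.log(JSON.stringify(asArguments) === JSON.stringify(asArray));
```

Both calls yield a document of the shape { aggregate: "people", pipeline: [ ... ] }, which is the form passed to db.runCommand() in the third prototype.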


16.1 Pipeline
Warning: The pipeline cannot operate on values of the following types: Binary, Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope.

Pipeline operators appear in an array. Conceptually, documents pass through these operators in a sequence. All examples in this section assume that the aggregation pipeline begins with a collection named article that contains documents that resemble the following:
{ title : "this is my title" , author : "bob" , posted : new Date() , pageViews : 5 , tags : [ "fun" , "good" , "fun" ] , comments : [ { author :"joe" , text : "this is cool" } , { author :"sam" , text : "this is bad" } ], other : { foo : 5 } }

The current pipeline operators are:

$project
Reshapes a document stream by renaming, adding, or removing fields. Also use $project (page 404) to create computed values or sub-objects. Use $project (page 404) to:
- Include fields from the original document.
- Insert computed fields.
- Rename fields.
- Create and populate fields that hold sub-documents.
Use $project (page 404) to quickly select the fields that you want to include or exclude from the response. Consider the following aggregation framework operation.
db.article.aggregate( { $project : { title : 1 , author : 1 }} );

This operation includes the title field and the author field in the document that returns from the aggregation pipeline. Note: The _id field is always included by default. You may explicitly exclude _id as follows:
db.article.aggregate( { $project : { _id : 0 , title : 1 , author : 1 }} );

Here, the projection excludes the _id field but includes the title and author fields. Projections can also add computed fields to the document stream passing through the pipeline. A computed field can use any of the expression operators (page 216). Consider the following example:
db.article.aggregate( { $project : { title : 1, doctoredPageViews : { $add:["$pageViews", 10] } }} );

Here, the field doctoredPageViews represents the value of the pageViews field after adding 10 to the original field using the $add (page 398). Note: You must enclose the expression that defines the computed field in braces, so that the expression is a valid object. You may also use $project (page 404) to rename fields. Consider the following example:
db.article.aggregate( { $project : { title : 1 , page_views : "$pageViews" , bar : "$other.foo" }} );

This operation renames the pageViews field to page_views, and renames the foo field in the other subdocument as the top-level field bar. The field references used for renaming fields are direct expressions and do not use an operator or surrounding braces. All aggregation field references can use dotted paths to refer to fields in nested documents. Finally, you can use the $project (page 404) to create and populate new sub-documents. Consider the following example that creates a new object-valued field named stats that holds a number of values:
db.article.aggregate( { $project : { title : 1 , stats : { pv : "$pageViews", foo : "$other.foo", dpv : { $add:["$pageViews", 10] } } }} );

This projection includes the title field and places $project (page 404) into inclusive mode. Then, it creates the stats documents with the following fields:
- pv, which includes and renames the pageViews from the top level of the original documents.
- foo, which includes the value of other.foo from the original documents.
- dpv, which is a computed field that adds 10 to the value of the pageViews field in the original document using the $add (page 398) aggregation expression.

$match
Provides a query-like interface to filter documents out of the aggregation pipeline. The $match (page 402) drops documents that do not match the condition from the aggregation pipeline, and it passes documents that match along the pipeline unaltered. The syntax passed to the $match (page 402) is identical to the query syntax. Consider the following prototype form:
db.article.aggregate( { $match : <match-predicate> } );

The following example performs a simple field equality test:


db.article.aggregate( { $match : { author : "dave" } } );

This operation only returns documents where the author field holds the value dave. Consider the following example, which performs a range test:
db.article.aggregate( { $match : { score : { $gt : 50, $lte : 90 } } } );

Here, all documents return when the score field holds a value that is greater than 50 and less than or equal to 90.

Note: Place the $match (page 402) as early in the aggregation pipeline as possible. Because $match (page 402) limits the total number of documents in the aggregation pipeline, earlier $match (page 402) operations minimize the amount of later processing. If you place a $match (page 402) at the very beginning of a pipeline, the query can take advantage of indexes like any other db.collection.find() (page 457) or db.collection.findOne() (page 458).

Warning: You cannot use $where or geospatial operations in $match (page 402) queries as part of the aggregation pipeline.

$limit
Restricts the number of documents that pass through the $limit (page 402) in the pipeline. $limit (page 402) takes a single numeric (positive whole number) value as a parameter. Once the specified number of documents pass through the pipeline operator, no more will. Consider the following example:
db.article.aggregate( { $limit : 5 } );

This operation returns only the first 5 documents passed to it by the pipeline. $limit (page 402) has no effect on the content of the documents it passes.

$skip
Skips over the specified number of documents that pass through the $skip (page 406) in the pipeline before passing all of the remaining input. $skip (page 406) takes a single numeric (positive whole number) value as a parameter. Once the operation has skipped the specified number of documents, it passes all the remaining documents along the pipeline without alteration. Consider the following example:
db.article.aggregate( { $skip : 5 } );

This operation skips the first 5 documents passed to it by the pipeline. $skip (page 406) has no effect on the content of the documents it passes along the pipeline.

$unwind
Peels off the elements of an array individually, and returns a stream of documents. $unwind (page 408) returns one document for every member of the unwound array within every source document. Take the following aggregation command:
db.article.aggregate( { $project : { author : 1 , title : 1 , tags : 1 }}, { $unwind : "$tags" } );

Note: The dollar sign (i.e. $) must precede the field specification handed to the $unwind (page 408) operator. In the above aggregation $project (page 404) selects (inclusively) the author, title, and tags fields, as well as the _id field implicitly. Then the pipeline passes the results of the projection to the $unwind (page 408) operator, which will unwind the tags field. This operation may return a sequence of documents that resemble the following for a collection that contains one document holding a tags field with an array of 3 items.
{
    "result" : [
        { "_id" : ObjectId("4e6e4ef557b77501a49233f6"), "title" : "this is my title", "author" : "bob", "tags" : "fun" },
        { "_id" : ObjectId("4e6e4ef557b77501a49233f6"), "title" : "this is my title", "author" : "bob", "tags" : "good" },
        { "_id" : ObjectId("4e6e4ef557b77501a49233f6"), "title" : "this is my title", "author" : "bob", "tags" : "fun" }
    ],
    "ok" : 1
}

A single document becomes 3 documents: each document is identical except for the value of the tags field. Each value of tags is one of the values in the original tags array.

Note: $unwind (page 408) has the following behaviors:
- $unwind (page 408) is most useful in combination with $group (page 399).
- You may undo the effects of an unwind operation with the $group (page 399) pipeline operator.
- If you specify a target field for $unwind (page 408) that does not exist in an input document, the pipeline ignores the input document and generates no result documents.
- If you specify a target field for $unwind (page 408) that is not an array, aggregate() generates an error.
- If you specify a target field for $unwind (page 408) that holds an empty array ([]) in an input document, the pipeline ignores the input document and generates no result documents.

$group
Groups documents together for the purpose of calculating aggregate values based on a collection of documents. Practically, group often supports tasks such as average page views for each page in a website on a daily basis. The output of $group (page 399) depends on how you define groups. Begin by specifying an identifier (i.e. a _id field) for the group you're creating with this pipeline. You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields. With the exception of the _id field, $group (page 399) cannot output nested documents. Every group expression must specify an _id field. You may specify the _id field as a dotted field path reference, a document with multiple fields enclosed in braces (i.e. { and }), or a constant value.

Note: Use $project (page 404) as needed to rename the grouped field after a $group (page 399) operation.

Consider the following example:
db.article.aggregate( { $group : { _id : "$author", docsPerAuthor : { $sum : 1 }, viewsPerAuthor : { $sum : "$pageViews" } }} );

This groups by the author field and computes two fields. The first, docsPerAuthor, is a counter field that adds one for each document with a given author field using the $sum (page 408) function. The viewsPerAuthor field is the sum of all of the pageViews fields in the documents for each group. Each field defined for the $group (page 399) must use one of the group aggregation functions listed below to generate its composite value:

$addToSet
Returns an array of all the values found in the selected field among the documents in that group. Every unique value only appears once in the result set. There is no ordering guarantee for the output documents.

$first
Returns the first value it encounters for its group.
Note: Only use $first (page 400) when the $group (page 399) follows a $sort (page 406) operation. Otherwise, the result of this operation is unpredictable.

$last
Returns the last value it encounters for its group.
Note: Only use $last (page 401) when the $group (page 399) follows a $sort (page 406) operation. Otherwise, the result of this operation is unpredictable.

$max
Returns the highest value among all values of the field in all documents selected by this group.

$min
Returns the lowest value among all values of the field in all documents selected by this group.

$avg
Returns the average of all the values of the field in all documents selected by this group.

$push
Returns an array of all the values found in the selected field among the documents in that group. A value may appear more than once in the result set if more than one field in the grouped documents has that value.

$sum
Returns the sum of all the values for a specified field in the grouped documents, as in the second use above. Alternately, if you specify a value as an argument, $sum (page 408) will increment this field by the specified value for every document in the grouping. Typically, as in the first use above, specify a value of 1 in order to count members of the group.

Warning: The aggregation system currently stores $group (page 399) operations in memory, which may cause problems when processing a larger number of groups.

$sort
The $sort (page 406) pipeline operator sorts all input documents and returns them to the pipeline in sorted order. Consider the following prototype form:
db.<collection-name>.aggregate( { $sort : { <sort-key> } } );

This sorts the documents in the collection named <collection-name>, according to the key and specification in the { <sort-key> } document. Specify the sort in a document with a field or fields that you want to sort by and a value of 1 or -1 to specify an ascending or descending sort respectively, as in the following example:
db.users.aggregate( { $sort : { age : -1, posts: 1 } } );

This operation sorts the documents in the users collection, in descending order according to the age field and then in ascending order according to the value in the posts field.

Note: The $sort (page 406) cannot begin sorting documents until previous operators in the pipeline have returned all output.

The $sort (page 406) operator can take advantage of an index when placed at the beginning of the pipeline or placed before the following aggregation operators:
- $project (page 404)
- $unwind (page 408)
- $group (page 399)

Warning: Unless the $sort (page 406) operator can use an index, in the current release, the sort must fit within memory. This may cause problems when sorting large numbers of documents.
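The meaning of a multi-key sort specification such as { age : -1, posts : 1 } can be sketched with an ordinary comparator in plain JavaScript (Node.js). This models the ordering only: it handles simple numeric values, does not reproduce full BSON type ordering, and the helper name comparatorFor and the sample users are invented for the illustration.

```javascript
// Builds a comparator from a sort specification such as { age: -1, posts: 1 }.
// A plain-JavaScript model of the ordering only; it assumes comparable values
// of the same type and does not implement full BSON ordering.
function comparatorFor(sortSpec) {
  const keys = Object.keys(sortSpec);
  return (a, b) => {
    for (const key of keys) {
      if (a[key] < b[key]) return -sortSpec[key];  // 1 sorts a first, -1 sorts a last
      if (a[key] > b[key]) return sortSpec[key];
    }
    return 0;  // equal on every sort key
  };
}

// Made-up user documents.
const users = [
  { name: "a", age: 30, posts: 7 },
  { name: "b", age: 41, posts: 2 },
  { name: "c", age: 30, posts: 1 }
];

const sortedUsers = users.slice().sort(comparatorFor({ age: -1, posts: 1 }));
console.log(sortedUsers.map(u => u.name).join(","));
```

The second key only breaks ties on the first: users "a" and "c" share an age, so their relative order is decided by posts in ascending order.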

16.2 Expressions
These operators calculate values within the aggregation framework.

16.2.1 Boolean Operators


The three boolean operators accept Booleans as arguments and return Booleans as results.

Note: These operators convert non-booleans to Boolean values according to the BSON standards. Here, null, undefined, and 0 values become false, while non-zero numeric values, and all other types, such as strings, dates, and objects, become true.

$and
Takes an array of one or more values and returns true if all of the values in the array are true. Otherwise $and returns false.
Note: $and uses short-circuit logic: the operation stops evaluation after encountering the first false expression.

$or
Takes an array of one or more values and returns true if any of the values in the array are true. Otherwise $or returns false.
Note: $or uses short-circuit logic: the operation stops evaluation after encountering the first true expression.

$not
Returns the Boolean opposite of the value passed to it. When passed a true value, $not returns false; when passed a false value, $not returns true.
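The conversion rule and the short-circuit behavior described above can be modeled in plain JavaScript (Node.js). This is a sketch of the stated semantics, not the server's code; the function names toBool, and, or, and tracked are hypothetical. Note that the conversion differs from JavaScript's own truthiness, where an empty string is false.

```javascript
// Model of the conversion rule above: null, undefined, and 0 become false;
// other numbers and all other types (strings, dates, objects) become true.
function toBool(value) {
  if (typeof value === "boolean") return value;
  return !(value === null || value === undefined || value === 0);
}

// Each operand is a function so that evaluation can stop early (short-circuit).
function and(operands) {
  for (const operand of operands) {
    if (!toBool(operand())) return false;  // stop at the first false value
  }
  return true;
}

function or(operands) {
  for (const operand of operands) {
    if (toBool(operand())) return true;    // stop at the first true value
  }
  return false;
}

// Count how many operands actually get evaluated.
let evaluated = 0;
const tracked = value => () => { evaluated += 1; return value; };

const andResult = and([tracked("text"), tracked(0), tracked(1)]);
console.log(andResult, evaluated);
```

In this run the third operand is never evaluated, because the second one (0) already converts to false.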

16.2.2 Comparison Operators


These operators perform comparisons between two values and return a Boolean, in most cases, reflecting the result of that comparison. All comparison operators take an array with a pair of values. You may compare numbers, strings, and dates. Except for $cmp (page 398), all comparison operators return a Boolean value. $cmp (page 398) returns an integer.

$cmp
Takes two values in an array and returns an integer. The returned value is:
- A negative number if the first value is less than the second.
- A positive number if the first value is greater than the second.
- 0 if the two values are equal.

$eq
Takes two values in an array and returns a boolean. The returned value is:
- true when the values are equivalent.
- false when the values are not equivalent.

$gt
Takes two values in an array and returns a boolean. The returned value is:
- true when the first value is greater than the second value.
- false when the first value is less than or equal to the second value.

$gte
Takes two values in an array and returns a boolean. The returned value is:
- true when the first value is greater than or equal to the second value.
- false when the first value is less than the second value.

$lt
Takes two values in an array and returns a boolean. The returned value is:
- true when the first value is less than the second value.
- false when the first value is greater than or equal to the second value.

$lte
Takes two values in an array and returns a boolean. The returned value is:
- true when the first value is less than or equal to the second value.
- false when the first value is greater than the second value.

$ne
Takes two values in an array and returns a boolean. The returned value is:
- true when the values are not equivalent.
- false when the values are equivalent.
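The relationship between $cmp and the Boolean comparison operators can be sketched in plain JavaScript (Node.js). The functions cmp, gt, and lte below are hypothetical stand-ins that assume both values share one comparable type; the text only promises a negative or positive number from $cmp, while this sketch returns -1 and 1 specifically.

```javascript
// Plain-JavaScript model of $cmp and two of the Boolean comparison operators.
// Assumes both values are of the same comparable type (numbers, strings, or dates).
function cmp([first, second]) {
  if (first < second) return -1;  // the text only promises "a negative number"
  if (first > second) return 1;   // the text only promises "a positive number"
  return 0;
}

// The Boolean operators can be expressed in terms of the three-way result.
const gt = pair => cmp(pair) > 0;
const lte = pair => cmp(pair) <= 0;

console.log(cmp([3, 7]), cmp(["b", "a"]), cmp([5, 5]));
console.log(gt([new Date(2012, 8, 13), new Date(2012, 0, 1)]), lte([50, 90]));
```

Each operator takes its pair of values as a two-element array, mirroring the array argument that the aggregation operators expect.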

16.2.3 Arithmetic Operators


These operators only support numbers.

$add
Takes an array of one or more numbers and adds them together, returning the sum.

$divide
Takes an array that contains a pair of numbers and returns the value of the first number divided by the second number.

$mod
Takes an array that contains a pair of numbers and returns the remainder of the first number divided by the second number.
See Also: $mod

$multiply
Takes an array of one or more numbers and multiplies them, returning the resulting product.

$subtract
Takes an array that contains a pair of numbers and subtracts the second from the first, returning their difference.

16.2.4 String Operators


These operators manipulate strings within projection expressions.

$strcasecmp
Takes in two strings. Returns a number. $strcasecmp (page 407) is positive if the first string is greater than the second and negative if the first string is less than the second. $strcasecmp (page 407) returns 0 if the strings are identical.
Note: $strcasecmp (page 407) may not make sense when applied to glyphs outside the Roman alphabet. $strcasecmp (page 407) internally capitalizes strings before comparing them to provide a case-insensitive comparison. Use $cmp (page 398) for a case sensitive comparison.

$substr
$substr (page 407) takes a string and two numbers. The first number represents the number of bytes in the string to skip, and the second number specifies the number of bytes to return from the string.
Note: $substr (page 407) is not encoding aware and if used improperly may produce a result string containing an invalid utf-8 character sequence.

$toLower
Takes a single string and converts that string to lowercase, returning the result. All uppercase letters become lowercase.
Note: $toLower (page 408) may not make sense when applied to glyphs outside the Roman alphabet.

$toUpper
Takes a single string and converts that string to uppercase, returning the result. All lowercase letters become uppercase.
Note: $toUpper (page 408) may not make sense when applied to glyphs outside the Roman alphabet.
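Two of the notes above are easier to see with a small model in plain JavaScript (Node.js): comparing after capitalizing both strings, and slicing by bytes rather than by characters. The functions strcasecmp and substrBytes are hypothetical stand-ins written for this sketch, and the use of Node's Buffer to expose UTF-8 bytes is an assumption about the runtime, not something the manual describes.

```javascript
// Case-insensitive comparison by upper-casing both strings first, as the note describes.
function strcasecmp(a, b) {
  const x = a.toUpperCase();
  const y = b.toUpperCase();
  return x < y ? -1 : x > y ? 1 : 0;
}

// Byte-based substring: skip `skip` bytes of the UTF-8 encoding, then take `length` bytes.
// Returns the raw bytes so that a cut in the middle of a character stays visible.
function substrBytes(str, skip, length) {
  return Buffer.from(str, "utf8").subarray(skip, skip + length);
}

console.log(strcasecmp("Hello", "hELLO"));

// "h\u00e9llo" is "héllo"; the accented letter occupies two bytes (c3 a9) in UTF-8.
console.log(substrBytes("h\u00e9llo", 0, 2).toString("hex"));
console.log(substrBytes("h\u00e9llo", 0, 3).toString("utf8"));
```

Taking 2 bytes cuts the two-byte character in half and leaves an incomplete UTF-8 sequence, which is the hazard the $substr note warns about; taking 3 bytes ends on a character boundary.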

16.2.5 Date Operators


All date operators take a Date typed value as a single argument and return a number.

$dayOfYear
Takes a date and returns the day of the year as a number between 1 and 366.

$dayOfMonth
Takes a date and returns the day of the month as a number between 1 and 31.

$dayOfWeek
Takes a date and returns the day of the week as a number between 1 (Sunday) and 7 (Saturday).

$year
Takes a date and returns the full year.

$month
Takes a date and returns the month as a number between 1 and 12.

$week
Takes a date and returns the week of the year as a number between 0 and 53. Weeks begin on Sundays, and week 1 begins with the first Sunday of the year. Days preceding the first Sunday of the year are in week 0. This behavior is the same as the %U operator to the strftime standard library function.

$hour
Takes a date and returns the hour between 0 and 23.

$minute
Takes a date and returns the minute between 0 and 59.

$second
Takes a date and returns the second between 0 and 59, but can be 60 to account for leap seconds.
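The $week rule above (weeks start on Sunday, days before the first Sunday fall in week 0) matches the usual strftime %U arithmetic, which can be sketched in plain JavaScript (Node.js). The function weekOfYear is a hypothetical model; reading the date in UTC is an assumption of this sketch, and it uses JavaScript's 0-based day of week rather than the 1-based value that $dayOfWeek returns.

```javascript
// Week of the year with weeks starting on Sunday (strftime %U semantics).
// Uses UTC fields; that choice is an assumption of this sketch.
function weekOfYear(date) {
  const startOfYear = Date.UTC(date.getUTCFullYear(), 0, 1);
  const dayOfYear = Math.floor((date.getTime() - startOfYear) / 86400000); // 0-based
  const dayOfWeek = date.getUTCDay(); // 0 = Sunday in JavaScript
  return Math.floor((dayOfYear + 7 - dayOfWeek) / 7);
}

// 1 January 2012 was a Sunday, so it opens week 1.
console.log(weekOfYear(new Date(Date.UTC(2012, 0, 1))));
// 1 January 2011 was a Saturday, before the first Sunday, so it falls in week 0.
console.log(weekOfYear(new Date(Date.UTC(2011, 0, 1))));
```

The expression (dayOfYear + 7 - dayOfWeek) counts how many Sundays have occurred up to and including the given day, which is exactly the week number under this rule.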

16.2.6 Conditional Expressions


$cond

Example
{ $cond: [ <boolean-expression>, <true-case>, <false-case> ] }

Takes an array with three expressions, where the first expression evaluates to a Boolean value. If the first expression evaluates to true, $cond (page 398) returns the value of the second expression. If the first expression evaluates to false, $cond (page 398) evaluates and returns the third expression.

$ifNull

Example
{ $ifNull: [ <expression>, <replacement-if-null> ] }

Takes an array with two expressions. $ifNull (page 401) returns the first expression if it evaluates to a non-null value. Otherwise, $ifNull (page 401) returns the second expression's value.
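Both conditional expressions can be modeled with short functions in plain JavaScript (Node.js). The functions cond and ifNull are hypothetical stand-ins for this sketch; treating a missing (undefined) value like null is an assumption, and the subtitle field and the threshold of 100 page views are invented for the illustration.

```javascript
// Model of { $cond: [ <boolean-expression>, <true-case>, <false-case> ] }.
function cond([test, trueCase, falseCase]) {
  return test ? trueCase : falseCase;
}

// Model of { $ifNull: [ <expression>, <replacement-if-null> ] }.
// Treating undefined (a missing field) like null is an assumption of this sketch.
function ifNull([value, replacement]) {
  return value === null || value === undefined ? replacement : value;
}

// A document shaped like the article examples; "subtitle" is a made-up field.
const article = { title: "this is my title", pageViews: 5, subtitle: null };

const popularity = cond([article.pageViews > 100, "popular", "quiet"]);
const subtitle = ifNull([article.subtitle, "(none)"]);
const title = ifNull([article.title, "(untitled)"]);
console.log(popularity, subtitle, title);
```

As with the operators themselves, each function takes its operands as a single array argument.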


Part VIII

Application Development


MongoDB provides language-specific client libraries called drivers that let you develop applications to interact with your databases. This page lists the documents, tutorials, and reference pages that describe application development. For API-level documentation, see Drivers (page 225). For an overview of topics with which every MongoDB application developer will want familiarity, see the aggregation (page 197) and indexes (page 177) documents. For an introduction to basic MongoDB use, see the administration tutorials (page 165).

See Also: Developer Zone wiki pages and the FAQ: MongoDB for Application Developers (page 353) document. Developers should also be familiar with the shell, described in Using the MongoDB Shell (page 239), and the MongoDB query and update operators.


CHAPTER

SEVENTEEN

APPLICATION DEVELOPMENT
The following documents outline basic application development topics:

17.1 Drivers
Applications communicate with MongoDB by way of a client library or driver that handles all interaction with the database in a language-appropriate and sensible manner. See the MongoDB wiki drivers page, or the following pages for individual drivers:

- JavaScript (wiki, docs)
- Python (wiki, docs)
- Ruby (wiki, docs)
- PHP (wiki, docs)
- Perl (wiki, docs)
- Java (wiki, docs)
- Scala (wiki, docs)
- C# (wiki, docs)
- C (wiki, docs)
- C++ (wiki, docs)
- Haskell (wiki, docs)
- Erlang (wiki, docs)

17.2 Database References


MongoDB does not support joins. In MongoDB some data is denormalized, or stored with related data in documents to remove the need for joins. However, in some cases it makes sense to store related information in separate documents, typically in different collections or databases. MongoDB applications use one of two methods for relating documents:

1. Manual references (page 226), where you save the _id field of one document in another document as a reference. Then your application can run a second query to return the related data. These references are simple and sufficient for most use cases.


2. DBRefs (page 227), which are references from one document to another using the value of the first document's _id field, its collection name, and, optionally, its database name. To resolve DBRefs, your application must perform additional queries to return the referenced documents. Many drivers (page 225) have helper methods that form the query for the DBRef automatically. The drivers [1] do not automatically resolve DBRefs into documents.

Use a DBRef when you need to embed documents from multiple collections in documents from one collection. DBRefs also provide a common format and type to represent these relationships among documents. The DBRef format provides common semantics for representing links between documents if your database must interact with multiple frameworks and tools.

Unless you have a compelling reason for using a DBRef, use manual references.

17.2.1 Manual References


Background

Manual references refers to the practice of including one document's _id field in another document. The application can then issue a second query to resolve the referenced fields as needed.

Process

Consider the following operation to insert two documents, using the _id field of the first document as a reference in the second document:
original_id = ObjectId()

db.places.insert({
    "_id": original_id,
    "name": "Broadway Center",
    "url": "bc.example.net"
})

db.people.insert({
    "name": "Erin",
    "places_id": original_id,
    "url": "bc.example.net/Erin"
})

Then, when a query returns the document from the people collection you can, if needed, make a second query for the document referenced by the places_id field in the places collection.

Use

For nearly every case where you want to store a relationship between two documents, use manual references (page 226). The references are simple to create and your application can resolve references as needed. The only limitation of manual linking is that these references do not convey the database and collection name. If you have documents in a single collection that relate to documents in more than one collection, you may need to consider using DBRefs (page 227).
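The two-query pattern can be sketched without a running database. In the following Python illustration, plain dictionaries and lists stand in for the places and people collections, and a random hex string stands in for ObjectId(); these stand-ins are assumptions made for the sake of a self-contained example.

```python
import uuid

# Hypothetical in-memory stand-ins for the places and people collections.
places = {}
people = []

original_id = uuid.uuid4().hex  # stands in for ObjectId()
places[original_id] = {
    "_id": original_id,
    "name": "Broadway Center",
    "url": "bc.example.net",
}
people.append({
    "name": "Erin",
    "places_id": original_id,
    "url": "bc.example.net/Erin",
})

# First query: fetch the person.
erin = next(p for p in people if p["name"] == "Erin")
# Second query: follow the manual reference stored in places_id.
place = places.get(erin["places_id"])

print(place["name"])  # Broadway Center
```

Note that the application, not the reference itself, knows that places_id points into the places collection.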
[1] Some community supported drivers may have alternate behavior and may resolve a DBRef into a document automatically.


17.2.2 DBRefs
Background

DBRefs are a convention for representing a document, rather than a specific reference type. They include the name of the collection, and in some cases the database, in addition to the value from the _id field.

Format

DBRefs have the following fields:

$ref
    The $ref field holds the name of the collection where the referenced document resides.

$id
    The $id field contains the value of the _id field in the referenced document.

$db
    Optional. Contains the name of the database where the referenced document resides. Only some drivers support $db references.

Thus a DBRef document would resemble the following:
{ $ref : <value>, $id : <value>, $db : <value> }

Note: The order of fields in the DBRef matters, and you must use the above sequence when using a DBRef.
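As a rough illustration of the format, the following Python sketch builds a DBRef-shaped document with the fields in the required order and resolves it with an extra lookup. The helpers make_dbref and resolve, and the nested dictionaries standing in for databases and collections, are hypothetical; real drivers provide their own DBRef types, as listed below.

```python
def make_dbref(collection, doc_id, database=None):
    """Build a DBRef-shaped dict; insertion order is $ref, $id, then $db."""
    ref = {"$ref": collection, "$id": doc_id}
    if database is not None:
        ref["$db"] = database  # optional field, always last
    return ref


def resolve(dbref, databases, default_db):
    """Extra lookup the application performs to fetch the referenced document."""
    db = databases[dbref.get("$db", default_db)]
    return db[dbref["$ref"]].get(dbref["$id"])


# Hypothetical layout: database name -> collection name -> _id -> document.
databases = {"app": {"places": {42: {"_id": 42, "name": "Broadway Center"}}}}

ref = make_dbref("places", 42, "app")
print(list(ref))                                 # ['$ref', '$id', '$db']
print(resolve(ref, databases, "app")["name"])    # Broadway Center
```

Python dictionaries preserve insertion order, which is why the sketch can rely on a plain dict to keep $ref first.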

Support

C++
    The C++ driver contains no support for DBRefs. You can traverse references manually.

C#
    The C# driver provides access to DBRef objects with the MongoDBRef Class and supplies the FetchDBRef Method for accessing these objects.

Java
    The DBRef class provides support for DBRefs from Java.

JavaScript
    The mongo shell's JavaScript interface provides a DBRef.

Perl
    The Perl driver contains no support for DBRefs. You can traverse references manually or use the MongoDBx::AutoDeref CPAN module.

PHP
    The PHP driver does support DBRefs, including the optional $db reference, through the MongoDBRef class.

Python
    The Python driver provides the DBRef class, and the dereference method for interacting with DBRefs.

Ruby
    The Ruby driver supports DBRefs using the DBRef class and the dereference method.

Use

In most cases you should use the manual reference (page 226) method for connecting two or more related documents. However, if you need to reference documents from multiple collections, consider a DBRef.

See Also: Application Development with Replica Sets (page 50)


Indexing Strategies (page 190) Aggregation Framework (page 199)


CHAPTER

EIGHTEEN

PATTERNS
The following documents provide patterns for developing application features:

18.1 Perform Two Phase Commits


18.1.1 Synopsis
This document provides a pattern for doing multi-document updates or transactions using a two-phase commit approach for writing data to multiple documents. Additionally, you can extend this process to provide rollback-like (page 232) functionality.

18.1.2 Background
Operations on a single document are always atomic with MongoDB databases; however, operations that involve multiple documents, which are often referred to as transactions, are not atomic. Since documents can be fairly complex and contain multiple nested documents, single-document atomicity provides necessary support for many practical use cases.

Without precautions, however, success or failure of a multi-document operation cannot be all or nothing, and without support for multi-document transactions it's possible for such an operation to succeed on some documents and fail on others. When executing a transaction composed of several sequential operations the following issues arise:

- Atomicity: if one operation fails, the previous operations within the transaction must roll back to the previous state (i.e. the nothing, in all or nothing).
- Isolation: operations that run concurrently with the transaction operation set must see a consistent view of the data throughout the transaction process.
- Consistency: if a major failure (i.e. network, hardware) interrupts the transaction, the database must be able to recover a consistent state.

Despite the power of single-document atomic operations, there are cases that require multi-document transactions. For these situations, you can use a two-phase commit to provide support for these kinds of multi-document updates. Because documents can represent both pending data and states, you can use a two-phase commit to ensure that data is consistent, and that in the case of an error, the state that preceded the transaction is recoverable (page 232).


18.1.3 Pattern
Overview

The most common example of a transaction is to transfer funds from account A to B in a reliable way, and this pattern uses this operation as an example. In a relational database system, this operation would encapsulate subtracting funds from the source (A) account and adding them to the destination (B) within a single atomic transaction. For MongoDB, you can use a two-phase commit in these situations to achieve a comparable result.

All of the examples in this document use the mongo shell to interact with the database, and assume that you have two collections: first, a collection named accounts that will store data about accounts with one account per document, and second, a collection named transactions which will store the transactions themselves.

Begin by creating two accounts named A and B, with the following commands:
db.accounts.save({name: "A", balance: 1000, pendingTransactions: []})
db.accounts.save({name: "B", balance: 1000, pendingTransactions: []})

To verify that these operations succeeded, use find() (page 457):


db.accounts.find()

mongo will return two documents that resemble the following:

{ "_id" : ObjectId("4d7bc66cb8a04f512696151f"), "name" : "A", "balance" : 1000, "pendingTransactions"
{ "_id" : ObjectId("4d7bc67bb8a04f5126961520"), "name" : "B", "balance" : 1000, "pendingTransactions"

Transaction Description
Set Transaction State to Initial

Create the transactions collection by inserting the following document. The transaction document holds the source and destination, which refer to the name fields of the accounts collection, as well as the value field that represents the amount of change to the balance field. Finally, the state field reflects the current state of the transaction.
db.transactions.save({source: "A", destination: "B", value: 100, state: "initial"})

To verify that these operations succeeded, use find() (page 457):


db.transactions.find()

This will return a document similar to the following:

{ "_id" : ObjectId("4d7bc7a8b8a04f5126961522"), "source" : "A", "destination" : "B", "value" : 100, "

Switch Transaction State to Pending

Before modifying either record in the accounts collection, set the transaction state to pending from initial. Set the local variable t in your shell session to the transaction document using findOne() (page 458):
t = db.transactions.findOne({state: "initial"})

After you assign this variable t, the shell will return the value of t, and you will see the following output:


{ "_id" : ObjectId("4d7bc7a8b8a04f5126961522"), "source" : "A", "destination" : "B", "value" : 100, "state" : "initial" }

Use update() (page 464) to change the value of state to pending:


db.transactions.update({_id: t._id}, {$set: {state: "pending"}})
db.transactions.find()

The find() (page 457) operation will return the contents of the transactions collection, which should resemble the following:

{ "_id" : ObjectId("4d7bc7a8b8a04f5126961522"), "source" : "A", "destination" : "B", "value" : 100, "

Apply Transaction to Both Accounts

Continue by applying the transaction to both accounts. The update() (page 464) query only matches an account if the transaction is not already listed in that account's pendingTransactions field, which prevents you from applying the same transaction to an account twice. Use the following update() (page 464) operations:

db.accounts.update({name: t.source, pendingTransactions: {$ne: t._id}}, {$inc: {balance: -t.value}, $
db.accounts.update({name: t.destination, pendingTransactions: {$ne: t._id}}, {$inc: {balance: t.value
db.accounts.find()

The find() (page 457) operation will return the contents of the accounts collection, which should now resemble the following:

{ "_id" : ObjectId("4d7bc97fb8a04f5126961523"), "balance" : 900, "name" : "A", "pendingTransactions"
{ "_id" : ObjectId("4d7bc984b8a04f5126961524"), "balance" : 1100, "name" : "B", "pendingTransactions"

Set Transaction State to Committed

Use the following update() (page 464) operation to set the transaction's state to committed:
db.transactions.update({_id: t._id}, {$set: {state: "committed"}})
db.transactions.find()

The find() (page 457) operation will return the contents of the transactions collection, which should now resemble the following:

{ "_id" : ObjectId("4d7bc7a8b8a04f5126961522"), "destination" : "B", "source" : "A", "state" : "commi

Remove Pending Transaction

Use the following update() (page 464) operations to remove the pending transaction from the documents in the accounts collection:

db.accounts.update({name: t.source}, {$pull: {pendingTransactions: ObjectId("4d7bc7a8b8a04f5126961522
db.accounts.update({name: t.destination}, {$pull: {pendingTransactions: ObjectId("4d7bc7a8b8a04f51269
db.accounts.find()


The find() (page 457) operation will return the contents of the accounts collection, which should now resemble the following:

{ "_id" : ObjectId("4d7bc97fb8a04f5126961523"), "balance" : 900, "name" : "A", "pendingTransactions"
{ "_id" : ObjectId("4d7bc984b8a04f5126961524"), "balance" : 1100, "name" : "B", "pendingTransactions"

Set Transaction State to Done

Complete the transaction by setting the state of the transaction document to done:
db.transactions.update({_id: t._id}, {$set: {state: "done"}})
db.transactions.find()

The find() (page 457) operation will return the contents of the transactions collection, which should now resemble the following:

{ "_id" : ObjectId("4d7bc7a8b8a04f5126961522"), "destination" : "B", "source" : "A", "state" : "done"
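The whole sequence of state changes can be traced with a small in-memory model. The following Python sketch is not driver code: dictionaries stand in for the accounts and transactions collections, the helper names are hypothetical, and it assumes that applying the transaction also records its _id in the account's pendingTransactions list, which the guard condition and the later $pull operation suggest.

```python
# In-memory stand-ins for the accounts and transactions collections.
accounts = {
    "A": {"balance": 1000, "pendingTransactions": []},
    "B": {"balance": 1000, "pendingTransactions": []},
}
txn = {"_id": 1, "source": "A", "destination": "B", "value": 100, "state": "initial"}


def apply_to_account(name, txn_id, delta):
    """Guarded update: only touch an account that does not list txn_id yet."""
    account = accounts[name]
    if txn_id in account["pendingTransactions"]:   # mirrors the $ne condition
        return False
    account["balance"] += delta                    # mirrors $inc
    account["pendingTransactions"].append(txn_id)  # assumed: record txn as pending
    return True


def clear_pending(name, txn_id):
    """Mirror of $pull: drop txn_id from the account's pending list."""
    pending = accounts[name]["pendingTransactions"]
    if txn_id in pending:
        pending.remove(txn_id)


def run(txn):
    txn["state"] = "pending"
    apply_to_account(txn["source"], txn["_id"], -txn["value"])
    apply_to_account(txn["destination"], txn["_id"], txn["value"])
    txn["state"] = "committed"
    clear_pending(txn["source"], txn["_id"])
    clear_pending(txn["destination"], txn["_id"])
    txn["state"] = "done"


run(txn)
print(accounts["A"]["balance"], accounts["B"]["balance"], txn["state"])  # 900 1100 done
```

Because apply_to_account refuses to touch an account that already lists the transaction, re-running the apply step after a crash does not change a balance twice.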

Recovering from Failure Scenarios

The most important part of the transaction procedure is not the prototypical example above, but rather the possibility of recovering from the various failure scenarios when transactions do not complete as intended. This section provides an overview of possible failures and methods to recover from these kinds of events.

There are two classes of failures:

- all failures that occur after the first step (i.e. setting the transaction state to initial (page 230)) but before the third step (i.e. applying the transaction to both accounts (page 231)). To recover, applications should get a list of transactions in the pending state and resume from the second step (i.e. switching the transaction state to pending (page 230)).
- all failures that occur after the third step (i.e. applying the transaction to both accounts (page 231)) but before the fifth step (i.e. setting the transaction state to done (page 231)). To recover, applications should get a list of transactions in the committed state and resume from the fourth step (i.e. remove the pending transaction (page 231)).

Thus, the application will always be able to resume the transaction and eventually arrive at a consistent state. Run the following recovery operations every time the application starts to catch any unfinished transactions. You may also wish to run the recovery operation at regular intervals to ensure that your data remains consistent. The time required to reach a consistent state depends on how long the application needs to recover each transaction.
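The mapping from stored state to resume point can be written down directly. The following Python sketch only encodes the two failure classes described above; the names RESUME_FROM and recovery_plan are hypothetical, and a real recovery job would read the transactions from the transactions collection rather than from a list.

```python
# Stored state -> step to resume from, per the two failure classes above.
RESUME_FROM = {
    "pending": "switch transaction state to pending",
    "committed": "remove pending transaction",
}


def recovery_plan(transactions):
    """Return (transaction id, step to resume from) for unfinished transactions."""
    return [
        (t["_id"], RESUME_FROM[t["state"]])
        for t in transactions
        if t["state"] in RESUME_FROM
    ]


sample = [
    {"_id": 1, "state": "done"},
    {"_id": 2, "state": "pending"},
    {"_id": 3, "state": "committed"},
]
for txn_id, step in recovery_plan(sample):
    print(txn_id, "->", step)
```

Transactions in any other state, such as done, need no recovery and are skipped.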
Rollback

In some cases you may need to roll back or undo a transaction when the application needs to cancel the transaction, or because it can never recover, as in cases where one of the accounts doesn't exist or stops existing during the transaction. There are two possible rollback operations:

1. After you apply the transaction (page 231) (i.e. the third step), you have fully committed the transaction and you should not roll back the transaction. Instead, create a new transaction and switch the values in the source and destination fields.
2. After you create the transaction (page 230) (i.e. the first step), but before you apply the transaction (page 231) (i.e. the third step), use the following process:


Set Transaction State to Canceling

Begin by setting the transaction's state to canceling using the following update() (page 464) operation:

db.transactions.update({_id: t._id}, {$set: {state: "canceling"}})

Undo the Transaction

Use the following sequence of operations to undo the transaction operation from both accounts:

db.accounts.update({name: t.source, pendingTransactions: t._id}, {$inc: {balance: t.value}, $pull: {p
db.accounts.update({name: t.destination, pendingTransactions: t._id}, {$inc: {balance: -t.value}, $pu
db.accounts.find()

The find() (page 457) operation will return the contents of the accounts collection, which should resemble the following:

{ "_id" : ObjectId("4d7bc97fb8a04f5126961523"), "balance" : 1000, "name" : "A", "pendingTransactions"
{ "_id" : ObjectId("4d7bc984b8a04f5126961524"), "balance" : 1000, "name" : "B", "pendingTransactions"

Set Transaction State to Canceled

Finally, use the following update() (page 464) operation to set the transaction's state to canceled:


db.transactions.update({_id: t._id}, {$set: {state: "canceled"}})
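The rollback steps can be modeled in the same in-memory style. In this Python sketch the function rollback and the dictionary layout are hypothetical; it assumes, as the query conditions above suggest, that only accounts still listing the transaction in pendingTransactions are reversed.

```python
def rollback(txn, accounts):
    """Cancel a transaction that has not been committed yet."""
    if txn["state"] in ("committed", "done"):
        raise ValueError("already committed; create a reversing transaction instead")
    txn["state"] = "canceling"
    reversals = (
        (txn["source"], txn["value"]),        # give the funds back to the source
        (txn["destination"], -txn["value"]),  # take them away from the destination
    )
    for name, delta in reversals:
        account = accounts[name]
        if txn["_id"] in account["pendingTransactions"]:    # query on pendingTransactions
            account["balance"] += delta                     # mirrors $inc
            account["pendingTransactions"].remove(txn["_id"])  # mirrors $pull
    txn["state"] = "canceled"


# A transaction that failed after debiting A but before crediting B.
accounts = {
    "A": {"balance": 900, "pendingTransactions": [1]},
    "B": {"balance": 1000, "pendingTransactions": []},
}
txn = {"_id": 1, "source": "A", "destination": "B", "value": 100, "state": "pending"}

rollback(txn, accounts)
print(accounts["A"]["balance"], accounts["B"]["balance"], txn["state"])  # 1000 1000 canceled
```

Checking pendingTransactions before reversing each account keeps the rollback safe to repeat after a partial failure.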

Multiple Applications

Transactions exist, in part, so that several applications can create and run operations concurrently without causing data inconsistency or conflicts. As a result, it is crucial that only one application can handle a given transaction at any point in time.

Consider the following example, with a single transaction (i.e. T1) and two applications (i.e. A1 and A2). If both applications begin processing the transaction while it is still in the initial state (i.e. step 1 (page 230)), then:

- A1 can apply the whole transaction before A2 starts.
- A2 will then apply T1 for the second time, because the transaction does not appear as pending in the accounts documents.

To handle multiple applications, create a marker in the transaction document itself to identify the application that is handling the transaction. Use the findAndModify() (page 457) method to modify the transaction:
t = db.transactions.findAndModify({query: {state: "initial", application: {$exists: 0}}, update: {$set: {state: "pending", application: "A1"}}, new: true})

When you modify and reassign the local shell variable t, the mongo shell will return the t object, which should resemble the following:
{ "_id" : ObjectId("4d7be8af2c10315c0847fc85"), "application" : "A1", "destination" : "B", "source" : "A", "state" : "pending", "value" : 150 }


Amend the transaction operations to ensure that only the application whose identifier matches the value of the application field applies the transaction. If the application A1 fails during transaction execution, you can use the recovery procedures (page 232), but applications should ensure that they own the transaction before applying the transaction. For example, to resume pending jobs, use a query that resembles the following:
db.transactions.find({application: "A1", state: "pending"})

This will (or may) return a document from the transactions collection that resembles the following:

{ "_id" : ObjectId("4d7be8af2c10315c0847fc85"), "application" : "A1", "destination" : "B", "source" :
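The claim step works because the check and the modification happen as one atomic operation. The following Python sketch imitates that with a lock; the function claim and the in-memory list are hypothetical stand-ins, and the lock is only an analogy for the atomicity that findAndModify() provides on the server.

```python
import threading

_lock = threading.Lock()  # stands in for the atomicity of findAndModify()


def claim(transactions, app_name):
    """Atomically mark one unowned initial transaction as pending for app_name."""
    with _lock:
        for t in transactions:
            if t["state"] == "initial" and "application" not in t:
                t["state"] = "pending"
                t["application"] = app_name
                return t
    return None  # nothing left to claim


transactions = [
    {"_id": 1, "source": "A", "destination": "B", "value": 150, "state": "initial"},
]

first = claim(transactions, "A1")
second = claim(transactions, "A2")
print(first["application"], second)  # A1 None
```

The second application finds no transaction that is both initial and unowned, so it never applies T1 a second time.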

18.1.4 Using Two-Phase Commits in Production Applications


The example transaction above is intentionally simple. For example, it assumes that:

- it is always possible to roll back operations on an account.
- account balances can hold negative values.

Production implementations would likely be more complex. Typically accounts need information about current balance, pending credits, and pending debits. Then:

- when your application switches the transaction state to pending (page 230) (i.e. step 2) it would also make sure that the account has sufficient funds for the transaction. During this update operation, the application would also modify the values of the credits and debits as well as adding the transaction as pending.
- when your application removes the pending transaction (page 231) (i.e. step 4) the application would apply the transaction on balance, modify the credits and debits as well as removing the transaction from the pending field, all in one update.

Because all of the changes in the above two operations occur within a single update() (page 464) operation, these changes are all atomic.

Additionally, for most important transactions, ensure that:

- the database interface (i.e. client library or driver) has a reasonable write concern configured to ensure that operations return a response on the success or failure of a write operation.
- your mongod instance has journaling enabled to ensure that your data is always in a recoverable state, in the event of an unclean mongod shutdown.

18.2 Expire Data from Collections by Setting TTL


New in version 2.2. This document provides an introduction to MongoDB's time to live or TTL collection feature. Implemented as a special index type, TTL collections make it possible to store data in MongoDB and have the mongod automatically remove data after a specified period of time. This is ideal for some types of information like machine generated event data, logs, and session information that only need to persist in a database for a limited period of time.

18.2.1 Background
Collections expire by way of a special index that keeps track of insertion time in conjunction with a background mongod process that regularly removes expired documents from the collection. You can use this feature to expire data from replica sets and shard clusters.


Use the expireAfterSeconds option to the ensureIndex (page 455) method in conjunction with a TTL value in seconds to create an expiring collection. Additionally, TTL collections have the collection flag usePowerOf2Sizes set to support more efficient storage reuse.

18.2.2 Constraints
Consider the following limitations:

- the indexed field must be a date BSON type. If the field does not have a date type, the data will not expire.
- you cannot create this index on the _id field, or a field that already has an index.
- the TTL index may not have multiple fields.
- if the field holds an array, and there are multiple date-typed data in the index, the document will expire when the lowest (i.e. earliest) matches the expiration threshold.
- you cannot use a TTL index on a capped collection, because MongoDB cannot remove documents from a capped collection.

Note: TTL indexes expire data by removing documents in a background task that runs once a minute. As a result, the TTL index provides no guarantees that expired documents will not exist in the collection. Consider that:

- Documents may remain in a collection after they expire and before the background process runs.
- The duration of the removal operations depends on the workload of your mongod instance.

18.2.3 Create Collections with a TTL


To set a TTL on the collection log.events for one hour use the following command at the mongo shell:
db.log.events.ensureIndex( { "status": 1 }, { expireAfterSeconds: 3600 } )

MongoDB will automatically delete documents from this collection one hour after the date held in the status field, if status holds date-type information.
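The expiration rule, including the array and non-date cases listed under Constraints, can be expressed as a short predicate. The following Python sketch models the rule only; is_expired is a hypothetical helper, and it says nothing about when the once-a-minute background task actually removes a document.

```python
from datetime import datetime, timedelta


def is_expired(doc, field, expire_after_seconds, now):
    """True once the earliest date in `field` is at least the TTL in the past."""
    value = doc.get(field)
    candidates = value if isinstance(value, list) else [value]
    dates = [d for d in candidates if isinstance(d, datetime)]
    if not dates:
        return False  # non-date values never expire
    return min(dates) + timedelta(seconds=expire_after_seconds) <= now


now = datetime(2012, 9, 13, 13, 0, 0)
fresh = {"status": datetime(2012, 9, 13, 12, 30, 0)}
stale = {"status": datetime(2012, 9, 13, 12, 0, 0)}
mixed = {"status": [datetime(2012, 9, 13, 12, 45, 0), datetime(2012, 9, 13, 11, 0, 0)]}
text = {"status": "ok"}

for doc in (fresh, stale, mixed, text):
    print(is_expired(doc, "status", 3600, now))
# False, True, True, False
```

A document that this predicate reports as expired may still be present in the collection until the background task next runs.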


Part IX

Using the MongoDB Shell


See Also: The introductory Tutorial in the MongoDB wiki and the Mongo Shell wiki pages for more information on the mongo shell.


CHAPTER

NINETEEN

MONGO SHELL
mongo Manual (page 503)


CHAPTER

TWENTY

MONGODB SHELL INTERFACE


- reference/javascript
- reference/operators
- reference/commands
- Aggregation Framework Reference (page 209)
- reference/meta-query-operators


Part X

Use Cases


The use case documents provide introductions to the patterns, design, and operation used in application development with MongoDB. Each document provides more concrete examples and implementation details to support core MongoDB use cases (page 247). These documents highlight application design and data modeling strategies (i.e. schema design) for MongoDB with special attention to pragmatic considerations including indexing, performance, sharding, and scaling. Each document is distinct and can stand alone; however, each section builds on a set of common topics.

The operational intelligence case studies describe applications that collect machine generated data from logging systems, application output, and other systems. The product data management case studies address aspects of applications required for building product catalogs, and managing inventory in e-commerce systems. The content management case studies introduce basic patterns and techniques for building content management systems using MongoDB.

Finally, the introductory application development tutorials with Python and MongoDB (page 317) provide a complete and fully developed application that you can build using MongoDB and popular Python web development tool kits.


CHAPTER

TWENTYONE

OPERATIONAL INTELLIGENCE
As an introduction to the use of MongoDB for operational intelligence and real time analytics, the Storing Log Data (page 249) document describes several approaches to modeling and storing machine generated data with MongoDB. Then, Pre-Aggregated Reports (page 259) describes methods and strategies for processing data to generate aggregated reports from raw event data. Finally, Hierarchical Aggregation (page 268) presents a method for using MongoDB to process and store hierarchical reports (i.e. per-minute, per-hour, and per-day) from raw event data.

21.1 Storing Log Data


21.1.1 Overview
This document outlines the basic patterns and principles for using MongoDB as a persistent storage engine for log data from servers and other machine data.

Problem

Servers generate a large number of events (i.e. logging) that contain useful information about their operation including errors, warnings, and users' behavior. By default, most servers store these data in plain text log files on their local file systems. While plain-text logs are accessible and human-readable, they are difficult to use, reference, and analyze without holistic systems for aggregating and storing these data.

Solution

The solution described below assumes that each server that generates events also consumes event data and that each server can access the MongoDB instance. Furthermore, this design assumes that the query rate for this logging data is substantially lower than the insert rate, as is common for logging applications with a high-bandwidth event stream.

Note: This case assumes that you're using a standard uncapped collection for this event data, unless otherwise noted. See the section on capped collections (page 259).


Schema Design

The schema for storing log data in MongoDB depends on the format of the event data that you're storing. For a simple example, consider standard request logs in the combined format from the Apache HTTP Server. A line from these logs may resemble the following:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "[http://www.ex

The simplest approach to storing the log data would be putting the exact text of the log record into a document:
{
    _id: ObjectId("4f442120eb03305789000000"),
    line: '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "[http:
}

While this solution does capture all data in a format that MongoDB can use, the data is neither particularly useful nor efficient to query: if you need to find events for the same page, you would need to use a regular expression query, which would require a full scan of the collection.

The preferred approach is to extract the relevant information from the log data into individual fields in a MongoDB document. When you extract data from the log into fields, pay attention to the data types you use to render the log data into MongoDB. As you design this schema, be mindful that the data types you use to encode the data can have a significant impact on the performance and capability of the logging system. Consider the date field: in the above example, [10/Oct/2000:13:55:36 -0700] is 28 bytes long. If you store this with the UTC timestamp type, you can convey the same information in only 8 bytes.

Additionally, using proper types for your data also increases query flexibility: if you store date as a timestamp you can make date range queries, whereas it's very difficult to compare two strings that represent dates. The same issue holds for numeric fields; storing numbers as strings requires more space and is difficult to query.

Consider the following document that captures all data from the above log entry:
{
    _id: ObjectId("4f442120eb03305789000000"),
    host: "127.0.0.1",
    logname: null,
    user: "frank",
    time: ISODate("2000-10-10T20:55:36Z"),
    path: "/apache_pb.gif",
    request: "GET /apache_pb.gif HTTP/1.0",
    status: 200,
    response_size: 2326,
    referer: "http://www.example.com/start.html",
    user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}

When extracting data from logs and designing a schema, also consider what information you can omit from your log tracking system. In most cases there's no need to track all data from an event log, and you can omit other fields. To continue the above example, here the most crucial information may be the host, time, path, user agent, and referrer, as in the following example document:
{
    _id: ObjectId("4f442120eb03305789000000"),
    host: "127.0.0.1",
    time: ISODate("2000-10-10T20:55:36Z"),
    path: "/apache_pb.gif",
    referer: "http://www.example.com/start.html",
    user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}

You may also consider omitting explicit time fields, because the ObjectId embeds creation time:
{
    _id: ObjectId("4f442120eb03305789000000"),
    host: "127.0.0.1",
    path: "/apache_pb.gif",
    referer: "http://www.example.com/start.html",
    user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}
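Extracting typed fields from a combined-format line might look like the following Python sketch. It is an illustration under stated assumptions, not part of the original design: the regular expression LINE_RE and the function parse_line are hypothetical, the sample line is reassembled from the field values shown in the example documents above, and parsing the month abbreviation with %b assumes an English or C locale.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern for the Apache combined log format.
LINE_RE = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Turn one log line into a document with typed fields, or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    parts = m.group("request").split()
    when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "host": m.group("host"),
        "time": when.astimezone(timezone.utc),  # store a real date, in UTC
        "path": parts[1] if len(parts) > 1 else None,
        "request": m.group("request"),
        "status": int(m.group("status")),       # numbers stay numbers
        "response_size": None if m.group("size") == "-" else int(m.group("size")),
        "referer": m.group("referer"),
        "user_agent": m.group("user_agent"),
    }


line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
doc = parse_line(line)
print(doc["time"].isoformat(), doc["status"], doc["path"])
```

Converting the timestamp to UTC at parse time matches the ISODate value in the example documents and makes date range queries straightforward.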

System Architecture

The primary performance concerns for event logging systems are:

1. how many inserts per second the system can support, which limits the event throughput, and
2. how the system will manage the growth of event data, particularly concerning a growth in insert activity.

In most cases the best way to increase the capacity of the system is to use an architecture with some sort of partitioning or sharding that distributes writes among a cluster of systems.

21.1.2 Operations
Insertion speed is the primary performance concern for an event logging system. At the same time, the system must be able to support flexible queries so that you can return data from the system efficiently. This section describes procedures for both document insertion and basic analytics queries. The examples that follow use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose.

Inserting a Log Record
Write Concern

MongoDB has a configurable write concern. This capability allows you to balance the importance of guaranteeing that all writes are fully recorded in the database with the speed of the insert. For example, if you issue writes to MongoDB and do not require that the database issue any response, the write operations will return very fast (i.e. asynchronously), but you cannot be certain that all writes succeeded. Conversely, if you require that MongoDB acknowledge every write operation, the database will not return as quickly but you can be certain that every item will be present in the database. The proper write concern is often an application specific decision, and depends on the reporting requirements and uses of your analytics application.
Insert Performance

The following example contains the setup for a Python console session using PyMongo, with an event from the Apache Log:

>>> import bson
>>> import pymongo
>>> from datetime import datetime
>>> conn = pymongo.Connection()
>>> db = conn.event_db
>>> event = {
...     '_id': bson.ObjectId(),
...     'host': "127.0.0.1",
...     'time': datetime(2000,10,10,20,55,36),
...     'path': "/apache_pb.gif",
...     'referer': "http://www.example.com/start.html",
...     'user_agent': "Mozilla/4.08 [en] (Win98; I ;Nav)"
... }

The following command will insert the event object into the events collection.
>>> db.events.insert(event, safe=False)

By setting safe=False, you do not require that MongoDB acknowledge receipt of the insert. Although very fast, this is risky because the application cannot detect network and server failures. If you want to ensure that MongoDB acknowledges inserts, you can pass the safe=True argument as follows:
>>> db.events.insert(event, safe=True)

MongoDB also supports a more stringent level of write concern, if you have a lower tolerance for data loss. To ensure that MongoDB not only acknowledges receipt of the message but also commits the write operation to the on-disk journal before returning successfully to the application, you can use the following insert() operation:
>>> db.events.insert(event, j=True)

Note: j=True implies safe=True.

Finally, if you have extremely low tolerance for event data loss, you can require that MongoDB replicate the data to multiple secondary replica set members before returning:
>>> db.events.insert(event, w=2)

This will force your application to wait until MongoDB acknowledges that the data has replicated to 2 members of the replica set. You can combine options as well:
>>> db.events.insert(event, j=True, w=2)

In this case, your application will wait for a successful journal commit and a replication acknowledgment. This is the safest option presented in this section, but it is the slowest. There is always a trade-off between safety and speed.

Note: If possible, consider using bulk inserts to insert event data. All write concern options apply to bulk inserts, but you can pass multiple events to the insert() method at once. Batch inserts allow MongoDB to distribute the performance penalty incurred by more stringent write concern across a group of inserts.

See Also: Write Concern for Replica Sets (page 50) and getLastError.
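The batching idea in the note above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper name insert_in_batches and the default batch size are assumptions, and the code relies on the legacy insert() method used throughout this document accepting a list of documents (newer PyMongo releases provide insert_many() for the same purpose).

```python
def insert_in_batches(collection, events, batch_size=100, **write_concern):
    """Insert events in groups so each acknowledged round trip covers many documents.

    `collection` is assumed to expose the legacy PyMongo insert() method;
    `write_concern` carries options such as safe=True, j=True, or w=2.
    """
    batch = []
    inserted = 0
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            collection.insert(batch, **write_concern)
            inserted += len(batch)
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        collection.insert(batch, **write_concern)
        inserted += len(batch)
    return inserted
```

A call such as insert_in_batches(db.events, events, batch_size=500, w=2) would then pay the replication wait once per 500 events rather than once per event.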


Finding All Events for a Particular Page

The value in maintaining a collection of event data derives from being able to query that data to answer specific questions. You may have a number of simple queries that you use to analyze this data. As an example, you may want to return all of the events associated with a specific value of a field. Extending the Apache access log example from above, a common case would be to query for all events with a specific value in the path field. This section contains a pattern for returning data and optimizing this operation.

Query

Use a query that resembles the following to return all documents with the /apache_pb.gif value in the path field:
>>> q_events = db.events.find({'path': '/apache_pb.gif'})

Note: If you choose to shard the collection that stores this data, the shard key you choose can impact the performance of this query. See the sharding (page 257) section of the sharding document.

Index Support

Adding an index on the path field would significantly enhance the performance of this operation.
>>> db.events.ensure_index('path')

Because the values of the path likely have a random distribution, in order to operate efficiently, the entire index should be resident in RAM. In this case, the number of distinct paths is typically small in relation to the number of documents, which will limit the space that the index requires. If your system has a limited amount of RAM, or your data set has a wider distribution in values, you may need to reinvestigate your indexing support. In most cases, however, this index is entirely sufficient.

See Also: The db.collection.ensureIndex() (page 455) JavaScript method and the db.events.ensure_index() method in PyMongo.

Finding All the Events for a Particular Date

The next example describes the process for returning all the events for a particular date.

Query
To retrieve this data, use the following query:

>>> q_events = db.events.find({'time':
...     { '$gte': datetime(2000,10,10), '$lt': datetime(2000,10,11)}})


Index Support

In this case, an index on the time field would optimize performance:

>>> db.events.ensure_index('time')

Because your application is inserting events in order, the parts of the index that capture recent events will always be in active RAM. As a result, if you query primarily on recent data, MongoDB will be able to maintain a large index, quickly fulfill queries, and avoid using much system memory.

See Also: The db.events.ensureIndex() (page 455) JavaScript method and the db.events.ensure_index() method in PyMongo.

Finding All Events for a Particular Host/Date

The following example describes a more complex query for returning all events in the collection for a particular host on a particular date. This kind of analysis may be useful for investigating suspicious behavior by a specific user.
Query

Use a query that resembles the following:

>>> q_events = db.events.find({
...     'host': '127.0.0.1',
...     'time': {'$gte': datetime(2000,10,10), '$lt': datetime(2000,10,11)}
... })

This query selects documents from the events collection where the host field is 127.0.0.1 (i.e. local host), and the value of the time field represents a date that is on or after (i.e. $gte) 2000-10-10 but before (i.e. $lt) 2000-10-11.
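The half-open date range used by this query can be generated for any host and day with a small helper. The sketch below is illustrative only: the function name host_day_query is hypothetical, and the field names host and time are taken from the example documents above.

```python
from datetime import date, datetime, time, timedelta


def host_day_query(host, day):
    """Build a query document for all events from `host` on the calendar day `day`."""
    start = datetime.combine(day, time.min)   # midnight at the start of the day
    end = start + timedelta(days=1)           # midnight at the start of the next day
    return {"host": host, "time": {"$gte": start, "$lt": end}}


query = host_day_query("127.0.0.1", date(2000, 10, 10))
# The result can be passed to find(), for example: db.events.find(query)
```

Using $gte with $lt, rather than $lte on 23:59:59, avoids missing events recorded in the final second of the day.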
Index Support

The indexes you use may have significant implications for the performance of these kinds of queries. For instance, you can create a compound index on the time and host fields, using the following command:
>>> db.events.ensure_index([('time', 1), ('host', 1)])

To analyze the performance for the above query using this index, issue the q_events.explain() method in a Python console. This will return something that resembles:
{ ...
  u'cursor': u'BtreeCursor time_1_host_1',
  u'indexBounds': {u'host': [[u'127.0.0.1', u'127.0.0.1']],
                   u'time': [[datetime.datetime(2000, 10, 10, 0, 0),
                              datetime.datetime(2000, 10, 11, 0, 0)]]},
  ...
  u'millis': 4,
  u'n': 11,
  u'nscanned': 1296,
  u'nscannedObjects': 11,
  ... }


This query had to scan 1296 items from the index to return 11 objects in 4 milliseconds. Conversely, you can test a different compound index with the host field first, followed by the time field. Create this index using the following operation:
>>> db.events.ensure_index([('host', 1), ('time', 1)])

Use the q_events.explain() operation to test the performance:


{ ...
  u'cursor': u'BtreeCursor host_1_time_1',
  u'indexBounds': {u'host': [[u'127.0.0.1', u'127.0.0.1']],
                   u'time': [[datetime.datetime(2000, 10, 10, 0, 0),
                              datetime.datetime(2000, 10, 11, 0, 0)]]},
  ...
  u'millis': 0,
  u'n': 11,
  ...
  u'nscanned': 11,
  u'nscannedObjects': 11,
  ... }

Here, the query had to scan 11 items from the index before returning 11 objects in less than a millisecond. By placing the more selective element of your query first in a compound index you may be able to build more useful queries.

Note: Although the index order has an impact on query performance, remember that index scans are much faster than collection scans, and depending on your other queries and usage profile, it may make more sense to use the { time: 1, host: 1 } index.

See Also: The db.events.ensureIndex() (page 455) JavaScript method and the db.events.ensure_index() method in PyMongo.

Counting Requests by Day and Page

The following example describes the process for using the collection of Apache access events to determine the number of requests per resource (i.e. page) per day in the last month.
Aggregation

New in version 2.1.

The aggregation framework provides the capacity for queries that select, process, and aggregate results from large numbers of documents. The aggregate() method (and the aggregate command) offers greater flexibility and capacity, with less complexity, than the existing mapreduce and group aggregation. Consider the following aggregation pipeline: [1]

>>> result = db.command('aggregate', 'events', pipeline=[
...     { '$match': {
...         'time': {
...             '$gte': datetime(2000,10,1),
...             '$lt': datetime(2000,11,1) } } },
...     { '$project': {
...         'path': 1,
...         'date': {
...             'y': { '$year': '$time' },
...             'm': { '$month': '$time' },
...             'd': { '$dayOfMonth': '$time' } } } },
...     { '$group': {
...         '_id': {
...             'p': '$path',
...             'y': '$date.y',
...             'm': '$date.m',
...             'd': '$date.d' },
...         'hits': { '$sum': 1 } } },
...     ])

[1] To translate statements from the aggregation framework (page 199) to SQL, you can consider the $match (page 402) equivalent to WHERE, $project (page 404) to SELECT, and $group (page 399) to GROUP BY.

This command aggregates documents from the events collection with a pipeline that:

1. Uses the $match (page 402) to limit the documents that the aggregation framework must process. $match (page 402) is similar to a find() (page 457) query. This operation selects all documents where the value of the time field represents a date that is on or after (i.e. $gte) 2000-10-01 but before (i.e. $lt) 2000-11-01.

2. Uses the $project (page 404) to limit the data that continues through the pipeline. This operator:
   - Selects the path field.
   - Creates a y field to hold the year, computed from the time field in the original documents.
   - Creates a m field to hold the month, computed from the time field in the original documents.
   - Creates a d field to hold the day, computed from the time field in the original documents.

3. Uses the $group (page 399) to create new computed documents. This step will create a single new document for each unique path/date combination. The documents take the following form:
   - the _id field holds a sub-document with the contents of the path field from the original documents in the p field, with the date fields from the $project (page 404) as the remaining fields.
   - the hits field uses the $sum (page 408) statement to increment a counter for every document in the group. In the aggregation output, this field holds the total number of documents at the beginning of the aggregation pipeline with this unique date and path.

Note: In sharded environments, the performance of aggregation operations depends on the shard key. Ideally, all the items in a particular $group (page 399) operation will reside on the same server. While this distribution of documents would occur if you chose the time field as the shard key, a field like path also has this property and is a typical choice for sharding. Also see the sharding considerations (page 257) of this document for additional recommendations for using sharding.

See Also: Aggregation Framework (page 199)
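To make the three pipeline stages concrete, the following sketch reproduces the same logic over an in-memory list of event dictionaries. It is only an approximation for illustration: the function name count_hits_by_day_and_page and the sample events are assumptions, and the real aggregation runs inside MongoDB rather than in application code.

```python
from collections import Counter
from datetime import datetime


def count_hits_by_day_and_page(events, start, end):
    """Emulate the $match / $project / $group pipeline on a list of dicts."""
    hits = Counter()
    for event in events:
        t = event["time"]
        # $match: keep events with start <= time < end
        if not (start <= t < end):
            continue
        # $project + $group _id: path plus year, month, and day of the timestamp
        key = (event["path"], t.year, t.month, t.day)
        # $sum: 1 increments a counter per group
        hits[key] += 1
    return [
        {"_id": {"p": p, "y": y, "m": m, "d": d}, "hits": n}
        for (p, y, m, d), n in sorted(hits.items())
    ]


sample = [
    {"path": "/apache_pb.gif", "time": datetime(2000, 10, 10, 20, 55, 36)},
    {"path": "/apache_pb.gif", "time": datetime(2000, 10, 10, 21, 0, 0)},
    {"path": "/index.html", "time": datetime(2000, 10, 11, 8, 30, 0)},
    {"path": "/index.html", "time": datetime(2000, 11, 1, 0, 0, 0)},  # outside the range
]
result = count_hits_by_day_and_page(
    sample, datetime(2000, 10, 1), datetime(2000, 11, 1))
```

Each element of result has the same shape as the documents that the $group stage emits: a compound _id and a hits counter.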
Index Support

To optimize the aggregation operation, ensure that the initial $match (page 402) query has an index. Use the following command to create an index on the time field in the events collection:

>>> db.events.ensure_index('time')

Note: If you have already created a compound index on the time and host fields (i.e. { time: 1, host: 1 }), MongoDB will use this index for range queries on just the time field. Do not create an additional index in these situations.

21.1.3 Sharding
Eventually your system's events will exceed the capacity of a single event logging database instance. In these situations you will want to use a shard cluster, which takes advantage of MongoDB's sharding functionality. This section introduces the unique sharding concerns for this event logging case.

See Also: FAQ: Sharding with MongoDB (page 359) and the Sharding wiki page.

Limitations

In a sharded environment the limitations on the maximum insertion rate are:

- the number of shards in the cluster.
- the shard key you choose.

Because MongoDB distributes data using ranges (i.e. chunks) of keys, the choice of shard key can control how MongoDB distributes data and the resulting system's capacity for writes and queries. Ideally, your shard key should allow insertions to balance evenly among the shards [2] and allow most queries to access only a single shard. [3] Continue reading for an analysis of a collection of shard key choices.

Shard by Time

While using the timestamp, or the ObjectId in the _id field, [4] would distribute your data evenly among shards, these keys lead to two problems:

1. All inserts always flow to the same shard, which means that your shard cluster will have the same write throughput as a standalone instance.
2. Most reads will tend to cluster on the same shard, as analytics queries typically target recent data.

Shard by a Semi-Random Key

To distribute data more evenly among the shards, you may consider using a more random piece of data, such as a hash of the _id field (i.e. the ObjectId) as a shard key. While this introduces some additional complexity into your application, to generate the key, it will distribute writes among the shards. In these deployments having 5 shards will provide 5 times the write capacity of a single instance.

Using this shard key, or any hashed value as a key, presents the following downsides:

- the shard key, and the index on the key, will consume additional space in the database.
- queries, unless they include the shard key itself, [5] must run in parallel on all shards, which may lead to degraded performance.

This might be an acceptable trade-off in some situations. The workload of event logging systems tends to be heavily skewed toward writing, so read performance may not be as critical as more robust write performance.

Shard by an Evenly-Distributed Key in the Data Set

If a field in your documents has values that are evenly distributed among the documents, you may consider using this key as a shard key. Continuing the example from above, you may consider using the path field, which may have a couple of advantages:

1. writes will tend to balance evenly among shards.
2. reads will tend to be selective and local to a single shard if the query selects on the path field.

There are a few potential problems with these kinds of shard keys:

1. If a large number of documents have the same shard key, you run the risk of having a portion of your data collection that MongoDB cannot distribute throughout the cluster.
2. If there are a small number of possible values, there may be a limit to how much MongoDB will be able to distribute the data among the shards.

Note: Test using your existing data to ensure that the distribution is truly even, and that there is a sufficient quantity of distinct values for the shard key.

[2] For this reason, avoid shard keys based on the timestamp or the insertion time (i.e. the ObjectId) because all writes will end up on a single node.
[3] For this reason, avoid randomized shard keys (e.g. hash based shard keys) because any query will have to access all shards in the cluster.
[4] The ObjectId derives from the creation time, and is effectively a timestamp in this case.

Shard by Combining a Natural and Synthetic Key

MongoDB supports compound shard keys that combine the best aspects of sharding by an evenly distributed key in the set (page 258) and sharding by a random key (page 257). In these situations, the shard key would resemble { path: 1 , ssk: 1 } where path is an often used natural key, or value from your data, and ssk is a hash of the _id field. [6]

Using this type of shard key, data is largely distributed by the natural key, or path, which makes most queries that access the path field local to a single shard or group of shards. At the same time, if there is not sufficient distribution for specific values of path, the ssk makes it possible for MongoDB to create chunks and distribute data across the cluster. In most situations, these kinds of keys provide the ideal balance between distributing writes across the cluster and ensuring that most queries will only need to access a select number of shards.

Test with Your Own Data

Selecting shard keys is difficult because:

- there are no definitive best practices,
- the decision has a large impact on performance, and
- it is difficult or impossible to change the shard key after making the selection.

The sharding options (page 257) provide a good starting point for thinking about shard key selection. Nevertheless, the best way to select a shard key is to analyze the actual insertions and queries from your own application.
[5] Typically, it is difficult to use these kinds of shard keys in queries.
[6] You must still calculate the value of this synthetic key in your application when you insert documents into your collection.
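Because the application has to compute the synthetic key itself, a sketch of that step may be useful. The example below is an assumption-laden illustration: the source only says that ssk is a hash of the _id field, so the choice of MD5 and the helper names synthetic_shard_key and with_shard_key are hypothetical. Any hash with a reasonably uniform distribution should serve the same purpose.

```python
import hashlib


def synthetic_shard_key(object_id):
    """Return a hex digest derived from the string form of the document's _id."""
    return hashlib.md5(str(object_id).encode("utf-8")).hexdigest()


def with_shard_key(event):
    """Return a copy of `event` with the synthetic `ssk` field filled in."""
    doc = dict(event)
    doc["ssk"] = synthetic_shard_key(doc["_id"])
    return doc


doc = with_shard_key({
    "_id": "4f442120eb03305789000000",  # string form of an ObjectId, for illustration
    "path": "/apache_pb.gif",
})
```

The application would call with_shard_key() on each event before inserting it, so that every stored document carries both parts of the { path: 1, ssk: 1 } key.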


21.1.4 Managing Event Data Growth


Without some strategy for managing the size of your database, most event logging systems can grow infinitely. This is particularly important because MongoDB may not relinquish data to the file system in the way you might expect. Consider the following strategies for managing data growth:

Capped Collections

Depending on your data retention requirements as well as your reporting and analytics needs, you may consider using a capped collection to store your events. Capped collections have a fixed size, and drop old data when inserting new data after reaching the cap.

Note: In the current version, it is not possible to shard capped collections.

Multiple Collections, Single Database

Strategy: Periodically rename your event collection so that your data collection rotates in much the same way that you might rotate log files. When needed, you can drop the oldest collection from the database. This approach has several advantages over the single collection approach:

1. Collection renames are fast and atomic.
2. MongoDB does not bring any document into memory to drop a collection.
3. MongoDB can effectively reuse space freed by removing entire collections without leading to data fragmentation.

Nevertheless, this operation may add some complexity to queries, if any of your analyses depend on events that may reside in the current and previous collection. For most real-time data collection systems, this approach is ideal.

Multiple Databases

Strategy: Rotate databases rather than collections, as in the Multiple Collections, Single Database (page 259) example. While this significantly increases application complexity for insertions and queries, when you drop old databases, MongoDB will return disk space to the file system. This approach makes the most sense in scenarios where your event insertion rates and/or your data retention rates are extremely variable. For example, if you are performing a large backfill of event data and want to make sure that the entire set of event data for 90 days is available during the backfill, but during normal operations you only need 30 days of event data, you might consider using multiple databases.
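For the multiple collections strategy described above, the bookkeeping of which rotated collections to drop can be sketched in plain Python. Everything specific here is an assumption for illustration: the events_YYYYMM naming scheme, the monthly rotation period, and the helper names collection_name and expired_collections do not come from the source.

```python
from datetime import date


def collection_name(day, prefix="events"):
    """Name of the rotated collection holding events for the month of `day`."""
    return "%s_%04d%02d" % (prefix, day.year, day.month)


def expired_collections(existing_names, today, keep_months=3, prefix="events"):
    """Return rotated collection names that fall outside the retention window."""
    year, month = today.year, today.month
    keep = set()
    # Walk back month by month, collecting the names that must be retained.
    for _ in range(keep_months):
        keep.add(collection_name(date(year, month, 1), prefix))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return sorted(
        name for name in existing_names
        if name.startswith(prefix + "_") and name not in keep)


names = ["events", "events_200007", "events_200008",
         "events_200009", "events_200010"]
to_drop = expired_collections(names, date(2000, 10, 10), keep_months=3)
```

With PyMongo, the rotation itself could then use the rename() method of a collection and the drop_collection() method of a database for each name in to_drop.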

21.2 Pre-Aggregated Reports


21.2.1 Overview
This document outlines the basic patterns and principles for using MongoDB as an engine for collecting and processing events in real time for use in generating up-to-the-minute or up-to-the-second reports.


Problem

Servers and other systems can generate a large number of documents, and it can be difficult to access and analyze such large collections of data originating from multiple servers. This document makes the following assumptions about real-time analytics:

- There is no need to retain transactional event data in MongoDB, and how your application handles transactions is outside of the scope of this document.
- You require up-to-the-minute data, or up-to-the-second if possible.
- The queries for ranges of data (by time) must be as fast as possible.

See Also: Storing Log Data (page 249).

Solution

The solution described below assumes a simple scenario using data from web server access logs. With this data, you will want to return the number of hits to a collection of web sites at various levels of granularity based on time (i.e. by minute, hour, day, week, and month) as well as by the path of a resource. To achieve the required performance to support these tasks, upserts and increment operations will allow you to calculate statistics, produce simple range-based queries, and generate filters to support time-series charts of aggregated data.

21.2.2 Schema
Schemas for real-time analytics systems must support simple and fast query and update operations. In particular, attempt to avoid the following situations which can degrade performance:

- documents growing significantly after creation. Document growth forces MongoDB to move the document on disk, which can be time and resource consuming relative to other operations;
- queries requiring MongoDB to scan documents in the collection without using indexes; and
- deeply nested documents that make accessing particular fields slow.

Intuitively, you may consider keeping hit counts in individual documents with one document for every unit of time (i.e. minute, hour, day, etc.). However, queries must return multiple documents for all non-trivial time-range queries, which can slow overall query performance. Preferably, to maximize query performance, use more complex documents, and keep several aggregate values in each document.

The remainder of this section outlines several schema designs that you may consider for this real-time analytics system. While there is no single pattern for every problem, each pattern is better suited to specific classes of problems.

One Document Per Page Per Day

Consider the following example schema for a solution that stores all statistics for a single day and page in a single document:


{
  _id: "20001010/site-1/apache_pb.gif",
  metadata: {
    date: ISODate("2000-10-10T00:00:00Z"),
    site: "site-1",
    page: "/apache_pb.gif" },
  daily: 5468426,
  hourly: {
    "0": 227850,
    "1": 210231,
    ...
    "23": 20457 },
  minute: {
    "0": 3612,
    "1": 3241,
    ...
    "1439": 2819 }
}

This approach has a couple of advantages:

- For every request on the website, you only need to update one document.
- Reports for time periods within the day, for a single page, require fetching a single document.

There are, however, significant issues with this approach. The most significant issue is that, as you upsert data into the hourly and minute fields, the document grows. Although MongoDB will pad the space allocated to documents, it will still need to reallocate these documents multiple times throughout the day, which impacts performance.

Pre-allocate Documents
Simple Pre-Allocation

To mitigate the impact of repeated document migrations throughout the day, you can tweak the one document per page per day (page 260) approach by adding a process that pre-allocates documents with fields that hold 0 values throughout the previous day. Thus, at midnight, new documents will exist.

Note: To avoid situations where your application must pre-allocate large numbers of documents at midnight, it's best to create documents throughout the previous day by upserting randomly when you update a value in the current day's data. This requires some tuning, to balance two requirements:

1. your application should have pre-allocated all or nearly all of the documents by the end of the day.
2. your application should infrequently pre-allocate a document that already exists, to save time and resources on extraneous upserts.

As a starting point, consider the average number of hits a day (h), and then upsert a blank document upon update with a probability of 1/h.

Pre-allocating increases performance by initializing all documents with 0 values in all fields. After creation, documents will never grow. This means that:

1. there will be no need to migrate documents within the data store, which is a problem in the one document per page per day (page 260) approach.
2. MongoDB will not add padding to the records, which leads to a more compact data representation and better use of your memory.
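The tuning trade-off for probabilistic pre-allocation described above can be estimated with a short calculation. This is a back-of-the-envelope sketch under stated assumptions: every hit is treated as an independent trial with the same probability, and the function name chance_preallocated is hypothetical.

```python
def chance_preallocated(hits_per_day, probability):
    """Probability that at least one of `hits_per_day` updates triggers pre-allocation."""
    return 1.0 - (1.0 - probability) ** hits_per_day


h = 500000  # example: average hits per day for one page
for multiple in (1, 3, 5):
    p = multiple / float(h)
    print("p = %d/h -> chance of pre-allocation: %.3f"
          % (multiple, chance_preallocated(h, p)))
```

Under these assumptions a probability of exactly 1/h leaves roughly a one-in-three chance that tomorrow's document is not created in advance, which is consistent with treating 1/h as a starting point for tuning rather than a final value. Raising the probability improves coverage at the cost of more redundant upserts.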
Add Intra-Document Hierarchy

Note: MongoDB stores BSON documents as a sequence of fields and values, not as a hash table. As a result, writing to the field minute.0 is considerably faster than writing to minute.1439.

Figure 21.1: In order to update the value in minute #1349, MongoDB must skip over all 1349 entries before it.

To optimize update and insert operations you can introduce intra-document hierarchy. In particular, you can split the minute field up into 24 hourly fields:
{
  _id: "20001010/site-1/apache_pb.gif",
  metadata: {
    date: ISODate("2000-10-10T00:00:00Z"),
    site: "site-1",
    page: "/apache_pb.gif" },
  daily: 5468426,
  hourly: {
    "0": 227850,
    "1": 210231,
    ...
    "23": 20457 },
  minute: {
    "0": {
      "0": 3612,
      "1": 3241,
      ...
      "59": 2130 },
    "1": {
      "0": ...,
    },
    ...
    "23": {
      ...
      "59": 2819 }
  }
}
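The benefit of the nested layout can be approximated by counting how many sibling fields precede the target in each schema. The sketch below is a simplified model, assuming that the cost of an update is proportional to the number of fields skipped; the function names flat_skips and nested_skips are hypothetical, and real costs also depend on the size of each BSON element.

```python
def flat_skips(minute_of_day):
    """Fields skipped to reach a minute in the flat layout with keys 0..1439."""
    return minute_of_day


def nested_skips(minute_of_day):
    """Fields skipped in the nested layout: whole hours first, then minutes."""
    hour, minute = divmod(minute_of_day, 60)
    return hour + minute


for target in (0, 720, 1439):
    print("minute %4d: flat=%4d nested=%2d"
          % (target, flat_skips(target), nested_skips(target)))
```

In this model the worst case drops from 1439 skipped fields to 82 (23 hours plus 59 minutes), so late-day updates cost about the same as early ones.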

This hierarchy allows MongoDB to skip forward throughout the day when updating the minute data, which makes the update performance more uniform and faster later in the day.

Separate Documents by Granularity Level

Pre-allocating documents (page 261) is a reasonable design for storing intra-day data, but the model breaks down when displaying data over longer multi-day periods like months or quarters. In these cases, consider storing daily statistics in a single document as above, and then aggregate monthly data into a separate document.


Figure 21.2: To update the value in minute #1439, MongoDB first skips the first 23 hours and then skips 59 minutes for only 82 skips as opposed to 1439 skips in the previous schema.

This introduces a second set of upsert operations to the data collection and aggregation portion of your application, but the gains from the reduction in disk seeks on the queries should be worth the costs. Consider the following example schema:

1. Daily Statistics
{
  _id: "20001010/site-1/apache_pb.gif",
  metadata: {
    date: ISODate("2000-10-10T00:00:00Z"),
    site: "site-1",
    page: "/apache_pb.gif" },
  hourly: {
    "0": 227850,
    "1": 210231,
    ...
    "23": 20457 },
  minute: {
    "0": {
      "0": 3612,
      "1": 3241,
      ...
      "59": 2130 },
    "1": {
      "0": ...,
    },
    ...
    "23": {
      "59": 2819 }
  }
}

2. Monthly Statistics
{
  _id: "200010/site-1/apache_pb.gif",
  metadata: {
    date: ISODate("2000-10-01T00:00:00Z"),
    site: "site-1",
    page: "/apache_pb.gif" },
  daily: {
    "1": 5445326,
    "2": 5214121,
    ... }
}


21.2.3 Operations
This section outlines a number of common operations for building and interacting with a real-time analytics reporting system. The major challenge is in balancing read performance and write (i.e. upsert) performance. All examples in this document use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose.

Log an Event

Logging an event such as a page request (i.e. hit) is the main write activity for your system. To maximize performance, you'll be doing in-place updates with the upsert operation. Consider the following example:
from datetime import datetime, time

def log_hit(db, dt_utc, site, page):

    # Update daily stats doc
    id_daily = dt_utc.strftime('%Y%m%d/') + site + page
    hour = dt_utc.hour
    minute = dt_utc.minute

    # Get a datetime that only includes date info
    d = datetime.combine(dt_utc.date(), time.min)
    query = {
        '_id': id_daily,
        'metadata': { 'date': d, 'site': site, 'page': page } }
    update = { '$inc': {
        'hourly.%d' % (hour,): 1,
        'minute.%d.%d' % (hour, minute): 1 } }
    db.stats.daily.update(query, update, upsert=True)

    # Update monthly stats document
    id_monthly = dt_utc.strftime('%Y%m/') + site + page
    day_of_month = dt_utc.day
    query = {
        '_id': id_monthly,
        'metadata': {
            'date': d.replace(day=1),
            'site': site,
            'page': page } }
    update = { '$inc': {
        'daily.%d' % day_of_month: 1} }
    db.stats.monthly.update(query, update, upsert=True)

The upsert operation (i.e. upsert=True) performs an update if the document exists, and an insert if the document does not exist.

Note: This application requires upserts, because the pre-allocation (page 265) method only pre-allocates new documents with a high probability, not with complete certainty. Without pre-allocation, you end up with a dynamically growing document, which slows upserts as MongoDB moves documents to accommodate growth.
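The combined effect of an upsert with $inc on dotted field names can be modelled with a plain dictionary. This is a toy model for illustration only, not how MongoDB implements updates: the function name upsert_inc and the in-memory store are assumptions, and the model ignores the query fields (such as metadata) that a real upsert would copy into a newly inserted document.

```python
def upsert_inc(store, doc_id, increments):
    """Apply $inc-style increments to store[doc_id], creating the document if needed."""
    # Insert an empty document when none exists (the "insert" half of an upsert).
    doc = store.setdefault(doc_id, {"_id": doc_id})
    for dotted_key, amount in increments.items():
        parts = dotted_key.split(".")
        target = doc
        # Walk (and create) intermediate sub-documents such as minute -> "20".
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        # A missing counter starts at 0, as with $inc.
        target[parts[-1]] = target.get(parts[-1], 0) + amount
    return doc


store = {}
doc_id = "20001010/site-1/apache_pb.gif"
upsert_inc(store, doc_id, {"hourly.20": 1, "minute.20.55": 1})
upsert_inc(store, doc_id, {"hourly.20": 1, "minute.20.55": 1})
```

The first call creates the document and its nested counters, which is the growth that pre-allocation is meant to avoid; the second call only increments values that already exist.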


Pre-allocate

To prevent document growth, you can pre-allocate new documents before the system needs them. As you create new documents, set all values to 0 so that documents will not grow to accommodate updates. Consider the following preallocate() function:
from datetime import datetime, time

def preallocate(db, dt_utc, site, page):

    # Get id values
    id_daily = dt_utc.strftime('%Y%m%d/') + site + page
    id_monthly = dt_utc.strftime('%Y%m/') + site + page

    # Get daily metadata
    daily_metadata = {
        'date': datetime.combine(dt_utc.date(), time.min),
        'site': site,
        'page': page }
    # Get monthly metadata (the first day of the same month)
    monthly_metadata = {
        'date': daily_metadata['date'].replace(day=1),
        'site': site,
        'page': page }

    # Initial zeros for statistics
    hourly = dict((str(i), 0) for i in range(24))
    minute = dict(
        (str(i), dict((str(j), 0) for j in range(60)))
        for i in range(24))
    daily = dict((str(i), 0) for i in range(1, 32))

    # Perform upserts, setting metadata
    db.stats.daily.update(
        {
            '_id': id_daily,
            'hourly': hourly,
            'minute': minute},
        { '$set': { 'metadata': daily_metadata }},
        upsert=True)
    db.stats.monthly.update(
        {
            '_id': id_monthly,
            'daily': daily },
        { '$set': { 'metadata': monthly_metadata }},
        upsert=True)

The function pre-allocates both the monthly and daily documents at the same time. The performance benefits from separating these operations are negligible, so it's reasonable to keep both operations in the same function.

Ideally, your application should pre-allocate documents before needing to write data to maintain consistent update performance. Additionally, it's important to avoid causing a spike in activity and latency by creating documents all at once.

In the following example, document updates (i.e. log_hit()) will also pre-allocate a document probabilistically. However, by tuning probability, you can limit redundant preallocate() calls.
from random import random
from datetime import datetime, timedelta, time

# Example probability based on 500k hits per day per page
prob_preallocate = 1.0 / 500000

def log_hit(db, dt_utc, site, page):
    if random() < prob_preallocate:
        # Create tomorrow's documents ahead of time
        preallocate(db, dt_utc + timedelta(days=1), site, page)
    # Update daily stats doc
    ...

Using this method, there will be a high probability that each document will already exist before your application needs to issue update operations. You'll also be able to prevent a regular spike in activity for pre-allocation, and be able to eliminate document growth.

Retrieving Data for a Real-Time Chart

This example describes fetching the data from the above MongoDB system, for use in generating a chart that displays the number of hits to a particular resource over the last hour.
Querying

Use the following query in a find_one operation at the Python/PyMongo console to retrieve the number of hits to a specific resource (i.e. /index.html) with minute-level granularity:
>>> db.stats.daily.find_one(
...     {'metadata': {'date': dt, 'site': 'site-1', 'page': '/index.html'}},
...     { 'minute': 1 })

Use the following query to retrieve the number of hits to a resource over the last day, with hour-level granularity:
>>> db.stats.daily.find_one(
...     {'metadata': {'date': dt, 'site': 'site-1', 'page': '/foo.gif'}},
...     { 'hourly': 1 })

If you want a few days of hourly data, you can use a query in the following form:
>>> db.stats.daily.find(
...     {
...         'metadata.date': { '$gte': dt1, '$lte': dt2 },
...         'metadata.site': 'site-1',
...         'metadata.page': '/index.html'},
...     { 'metadata.date': 1, 'hourly': 1 },
...     sort=[('metadata.date', 1)])

Indexing

To support these query operations, create a compound index on the following daily statistics fields: metadata.site, metadata.page, and metadata.date (in that order). Use the following operation at the Python/PyMongo console:
>>> db.stats.daily.ensure_index([
...     ('metadata.site', 1),
...     ('metadata.page', 1),
...     ('metadata.date', 1)])

This index makes it possible to efficiently run the query for multiple days of hourly data. At the same time, any compound index on page and date will allow you to query efficiently for a single day's statistics.


Get Data for a Historical Chart


Querying

To retrieve daily data for a single month, use the following query:
>>> db.stats.monthly.find_one(
...     {'metadata':
...         {'date': dt,
...          'site': 'site-1',
...          'page': '/index.html'}},
...     {'daily': 1})

To retrieve several months of daily data, use a variation on the above query:
>>> db.stats.monthly.find(
...     {
...         'metadata.date': {'$gte': dt1, '$lte': dt2},
...         'metadata.site': 'site-1',
...         'metadata.page': '/index.html'},
...     {'metadata.date': 1, 'daily': 1},
...     sort=[('metadata.date', 1)])

Indexing

Create the following index to support these queries for monthly data on the metadata.site, metadata.page, and metadata.date fields:
>>> db.stats.monthly.ensure_index([
...     ('metadata.site', 1),
...     ('metadata.page', 1),
...     ('metadata.date', 1)])

This field order will efficiently support range queries for a single page over several months.

21.2.4 Sharding
The only potential limits on the performance of this system are the number of shards in your system, and the shard key that you use. An ideal shard key will distribute upserts between the shards while routing all queries to a single shard, or a small number of shards. While your choice of shard key may depend on the precise workload of your deployment, consider using { metadata.site: 1, metadata.page: 1 } as a shard key. The combination of site and page (or event) will lead to a well balanced cluster for most deployments. Enable sharding for the daily statistics collection with the following shardcollection command in the Python/PyMongo console:
>>> db.command('shardcollection', 'stats.daily',
...     key={'metadata.site': 1, 'metadata.page': 1})

Upon success, you will see the following response:


{ "collectionsharded" : "stats.daily", "ok" : 1 }


Enable sharding for the monthly statistics collection with the following shardcollection command in the Python/PyMongo console:
>>> db.command('shardcollection', 'stats.monthly',
...     key={'metadata.site': 1, 'metadata.page': 1})

Upon success, you will see the following response:


{ "collectionsharded" : "stats.monthly", "ok" : 1 }

One downside of the { metadata.site: 1, metadata.page: 1 } shard key is that if one page dominates all your traffic, all updates to that page will go to a single shard. This is basically unavoidable, since all updates for a single page go to a single document. You may wish to include the date in addition to the site and page fields so that MongoDB can split histories, letting you serve different historical ranges with different shards. Use the following shardcollection command to shard the daily statistics collection in the Python/PyMongo console:
>>> db.command('shardcollection', 'stats.daily',
...     key={'metadata.site': 1, 'metadata.page': 1, 'metadata.date': 1})
{ "collectionsharded" : "stats.daily", "ok" : 1 }

Enable sharding for the monthly statistics collection with the following shardcollection command in the Python/PyMongo console:
>>> db.command('shardcollection', 'stats.monthly',
...     key={'metadata.site': 1, 'metadata.page': 1, 'metadata.date': 1})
{ "collectionsharded" : "stats.monthly", "ok" : 1 }

Note: Determine your actual requirements and load before deciding to shard. In many situations a single MongoDB instance may be able to keep track of all events and pages.

21.3 Hierarchical Aggregation


21.3.1 Overview
Background

If you collect a large amount of data, but do not pre-aggregate (page 259), and you want to have access to aggregated information and reports, then you need a method to aggregate these data into a usable form. This document provides an overview of these aggregation patterns and processes. For clarity, this case study assumes that the incoming event data resides in a collection named events. For details on how you might get the event data into the events collection, please see the Storing Log Data (page 249) document. This document continues using this example.

Solution

The first step in the aggregation process is to aggregate event data into the finest required granularity. Then use this aggregation to generate the next least specific level of granularity, and repeat this process until you have generated all required views.


The solution uses several collections: the raw data (i.e. events) collection as well as collections for aggregated hourly, daily, weekly, monthly, and yearly statistics. All aggregations use the mapreduce command, in a hierarchical process. The following figure illustrates the input and output of each job:

Figure 21.3: Hierarchy of data aggregation.

Note: Aggregating raw events into an hourly collection is qualitatively different from the operation that aggregates hourly statistics into the daily collection.

See Also: Map-reduce and the MapReduce wiki page for more information on the Map-reduce data aggregation paradigm.

21.3.2 Schema
When designing the schema for event storage, it's important to track the events included in the aggregation and events that are not yet included.

Relational Approach

A simple tactic from relational databases uses an auto-incremented integer as the primary key. However, this introduces a significant performance penalty for the event logging process because the aggregation process must fetch new keys one at a time. If you can batch your inserts into the events collection, you can use an auto-increment primary key by using the find_and_modify command to generate the _id values, as in the following example:
>>> obj = db.my_sequence.find_and_modify(
...     query={'_id': 0},
...     update={'$inc': {'inc': 50}},
...     upsert=True,
...     new=True)
>>> batch_of_ids = range(obj['inc'] - 50, obj['inc'])


However, in most cases you can simply include a timestamp with each event that you can use to distinguish processed events from unprocessed events. This example assumes that you are calculating average session length for logged-in users on a website. The events will have the following form:
{
    "userid": "rick",
    "ts": ISODate("2010-10-10T14:17:22Z"),
    "length": 95
}

The operations described in the next section will calculate total and average session times for each user at the hour, day, week, month and year. For each aggregation you will want to store the number of sessions so that MongoDB can incrementally recompute the average session times. The aggregate document will resemble the following:
{
    _id: { u: "rick", d: ISODate("2010-10-10T14:00:00Z") },
    value: {
        ts: ISODate("2010-10-10T15:01:00Z"),
        total: 254,
        count: 10,
        mean: 25.4 }
}

Note: The timestamp value in the _id sub-document allows you to incrementally update documents at various levels of the hierarchy.
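To make the reason for storing count alongside total and mean concrete, here is a minimal sketch of combining two aggregate values in plain Python. The function name merge_stats and the sample numbers for the second aggregate are hypothetical; only the total, count and mean field names come from the aggregate document above.

```python
def merge_stats(a, b):
    """Combine two aggregate values of the form {'total', 'count', 'mean'}."""
    total = a['total'] + b['total']
    count = a['count'] + b['count']
    # Recompute the mean from the sums rather than averaging the two means.
    mean = total / float(count) if count else 0
    return {'total': total, 'count': count, 'mean': mean}

first_hour = {'total': 254, 'count': 10, 'mean': 25.4}
second_hour = {'total': 46, 'count': 2, 'mean': 23.0}   # made-up values
combined = merge_stats(first_hour, second_hour)

# Averaging the two means ignores how many sessions each one covers.
naive_mean = (first_hour['mean'] + second_hour['mean']) / 2
```

Because the two hours cover different numbers of sessions, the naive average of the means differs from the mean recomputed from the totals, which is why each level of the hierarchy carries total and count forward.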

21.3.3 Operations
This section assumes that all events exist in the events collection and have a timestamp. The operations, thus, are to aggregate from the events collection into the smallest aggregate (hourly totals) and then aggregate from the hourly totals into coarser granularity levels. In all cases, these operations will store aggregation time as a last_run variable.

Creating Hourly Views from Event Collections
Aggregation

Note: Although this solution uses Python and PyMongo to connect with MongoDB, you must pass JavaScript functions (i.e. mapf, reducef, and finalizef) to the mapreduce command.

Begin by creating a map function, as below:
mapf_hour = bson.Code('''function() {
    var key = {
        u: this.userid,
        d: new Date(
            this.ts.getFullYear(),
            this.ts.getMonth(),
            this.ts.getDate(),
            this.ts.getHours(),
            0, 0, 0)
    };
    emit(
        key,
        {
            total: this.length,
            count: 1,
            mean: 0,
            ts: new Date()
        });
}''')

In this case, it emits key-value pairs that contain the data you want to aggregate, as you'd expect. The function also emits a ts value that makes it possible to cascade aggregations to coarser grained aggregations (i.e. hour to day, etc.) Consider the following reduce function:
reducef = bson.Code('''function(key, values) {
    var r = { total: 0, count: 0, mean: 0, ts: null };
    values.forEach(function(v) {
        r.total += v.total;
        r.count += v.count;
    });
    return r;
}''')

The reduce function returns a document in the same format as the output of the map function. This pattern for map and reduce functions makes map-reduce processes easier to test and debug. While the reduce function ignores the mean and ts (timestamp) values, the finalize step, as follows, computes these data:
finalizef = bson.Code('''function(key, value) {
    if (value.count > 0) {
        value.mean = value.total / value.count;
    }
    value.ts = new Date();
    return value;
}''')

With the above functions, the map_reduce operation itself will resemble the following:
cutoff = datetime.utcnow() - timedelta(seconds=60)
query = {'ts': {'$gt': last_run, '$lt': cutoff}}
db.events.map_reduce(
    map=mapf_hour,
    reduce=reducef,
    finalize=finalizef,
    query=query,
    out={'reduce': 'stats.hourly'})
last_run = cutoff

The cutoff variable allows you to process all events that have occurred since the last run but before 1 minute ago. This allows for some delay in logging events. You can safely run this aggregation as often as you like, provided that you update the last_run variable each time.
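The data flow of the hourly job can be approximated in plain Python, which may help when reasoning about or testing the JavaScript functions. The sketch below is only an emulation under simplifying assumptions: it runs on an in-memory list, omits the ts value that the real functions emit, and uses hypothetical names (map_hour, reduce_stats, finalize_stats, aggregate_hourly) and sample events.

```python
from collections import defaultdict
from datetime import datetime

def map_hour(event):
    """Emit ((user, hour bucket), value), in the spirit of mapf_hour."""
    bucket = event['ts'].replace(minute=0, second=0, microsecond=0)
    return (event['userid'], bucket), {'total': event['length'], 'count': 1, 'mean': 0}

def reduce_stats(values):
    """Sum totals and counts, in the spirit of reducef."""
    result = {'total': 0, 'count': 0, 'mean': 0}
    for v in values:
        result['total'] += v['total']
        result['count'] += v['count']
    return result

def finalize_stats(value):
    """Compute the mean once per key, in the spirit of finalizef."""
    if value['count'] > 0:
        value['mean'] = value['total'] / float(value['count'])
    return value

def aggregate_hourly(events, last_run, cutoff):
    groups = defaultdict(list)
    for event in events:
        # Same window as the query: last_run < ts < cutoff.
        if last_run < event['ts'] < cutoff:
            key, value = map_hour(event)
            groups[key].append(value)
    return dict((k, finalize_stats(reduce_stats(vs))) for k, vs in groups.items())

events = [
    {'userid': 'rick', 'ts': datetime(2010, 10, 10, 14, 17, 22), 'length': 95},
    {'userid': 'rick', 'ts': datetime(2010, 10, 10, 14, 40, 0), 'length': 5},
    {'userid': 'rick', 'ts': datetime(2010, 10, 10, 15, 30, 0), 'length': 60},
]
hourly = aggregate_hourly(events,
                          last_run=datetime(2010, 10, 10, 0, 0),
                          cutoff=datetime(2010, 10, 10, 15, 0))
```

The third event falls after the cutoff, so it is left for a later run, which is the behaviour the last_run and cutoff window is meant to provide.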
Indexing

Create an index on the timestamp (i.e. the ts field) to support the query selection of the map_reduce operation. Use the following operation at the Python/PyMongo console:
>>> db.events.ensure_index('ts')

Deriving Day-Level Data


Aggregation

To calculate daily statistics, use the hourly statistics as input. Begin with the following map function:
mapf_day = bson.Code('''function() {
    var key = {
        u: this._id.u,
        d: new Date(
            this._id.d.getFullYear(),
            this._id.d.getMonth(),
            this._id.d.getDate(),
            0, 0, 0, 0)
    };
    emit(
        key,
        {
            total: this.value.total,
            count: this.value.count,
            mean: 0,
            ts: null
        });
}''')

The map function for deriving day-level data differs from the initial aggregation above in the following ways:
• the aggregation key is the (userid, date) rather than (userid, hour) to support daily aggregation.
• the keys and values emitted (i.e. emit()) are actually the total and count values from the hourly aggregates rather than properties from event documents.

This is the case for all the higher-level aggregation operations. Because the output of this map function is the same as the previous map function, you can use the same reduce and finalize functions. The actual code driving this level of aggregation is as follows:
cutoff = datetime.utcnow() - timedelta(seconds=60)
query = {'value.ts': {'$gt': last_run, '$lt': cutoff}}
db.stats.hourly.map_reduce(
    map=mapf_day,
    reduce=reducef,
    finalize=finalizef,
    query=query,
    out={'reduce': 'stats.daily'})
last_run = cutoff

There are a couple of things to note here. First of all, the query is not on ts now, but value.ts, the timestamp written during the finalization of the hourly aggregates. Also note that you are, in fact, aggregating from the stats.hourly collection into the stats.daily collection.


Indexing

Because you will regularly run the query option, which selects on the value.ts field, you may wish to create an index to support this. Use the following operation in the Python/PyMongo shell to create this index:
>>> db.stats.hourly.ensure_index('value.ts')

Weekly and Monthly Aggregation


Aggregation

You can use the aggregated day-level data to generate weekly and monthly statistics. A map function for generating weekly data follows:
mapf_week = bson.Code('''function() {
    var key = {
        u: this._id.u,
        d: new Date(
            this._id.d.valueOf()
            - this._id.d.getDay()*24*60*60*1000)
    };
    emit(
        key,
        {
            total: this.value.total,
            count: this.value.count,
            mean: 0,
            ts: null
        });
}''')

Here, to get the group key, the function takes the current date and subtracts days until it reaches the beginning of the week. In the monthly map function, you'll use the first day of the month as the group key, as follows:
mapf_month = bson.Code('''function() {
    var key = {
        u: this._id.u,
        d: new Date(
            this._id.d.getFullYear(),
            this._id.d.getMonth(),
            1, 0, 0, 0, 0)
    };
    emit(
        key,
        {
            total: this.value.total,
            count: this.value.count,
            mean: 0,
            ts: null
        });
}''')

These map functions are identical to each other except for the date calculation.
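The date calculations in the weekly and monthly map functions can be mirrored in Python, for example to compute the keys to look up when reading the aggregated collections. This is a sketch with hypothetical helper names; it assumes the input is a day-level bucket (midnight) and that weeks start on Sunday, as JavaScript's getDay() implies.

```python
from datetime import datetime, timedelta

def start_of_week(d):
    """Shift a date back to the preceding Sunday (or keep it if it is one)."""
    # Python's weekday() counts Monday as 0, JavaScript's getDay() counts Sunday as 0.
    days_since_sunday = (d.weekday() + 1) % 7
    return d - timedelta(days=days_since_sunday)

def start_of_month(d):
    """Truncate a date to midnight on the first day of its month."""
    return d.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

day = datetime(2010, 10, 13)          # a Wednesday
week_key = start_of_week(day)         # Sunday of the same week
month_key = start_of_month(day)       # first day of October 2010
```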
Indexing

Create additional indexes to support the weekly and monthly aggregation operations on the value.ts field. Use the following operations in the Python/PyMongo shell:
>>> db.stats.daily.ensure_index('value.ts')
>>> db.stats.monthly.ensure_index('value.ts')


Refactor Map Functions

Use Python's string interpolation to refactor the map function definitions as follows:
mapf_hierarchical = '''function() {
    var key = {
        u: this._id.u,
        d: %s
    };
    emit(
        key,
        {
            total: this.value.total,
            count: this.value.count,
            mean: 0,
            ts: null
        });
}'''

mapf_day = bson.Code(
    mapf_hierarchical % '''new Date(
        this._id.d.getFullYear(),
        this._id.d.getMonth(),
        this._id.d.getDate(),
        0, 0, 0, 0)''')

mapf_week = bson.Code(
    mapf_hierarchical % '''new Date(
        this._id.d.valueOf()
        - this._id.d.getDay()*24*60*60*1000)''')

mapf_month = bson.Code(
    mapf_hierarchical % '''new Date(
        this._id.d.getFullYear(),
        this._id.d.getMonth(),
        1, 0, 0, 0, 0)''')

# JavaScript months are zero-based, so 0 is January
mapf_year = bson.Code(
    mapf_hierarchical % '''new Date(
        this._id.d.getFullYear(),
        0, 1, 0, 0, 0, 0)''')

You can create a h_aggregate function to wrap the map_reduce operation, as below, to reduce code duplication:
def h_aggregate(icollection, ocollection, mapf, cutoff, last_run):
    query = {'value.ts': {'$gt': last_run, '$lt': cutoff}}
    icollection.map_reduce(
        map=mapf,
        reduce=reducef,
        finalize=finalizef,
        query=query,
        out={'reduce': ocollection.name})

With h_aggregate defined, you can perform all aggregation operations as follows:
cutoff = datetime.utcnow() - timedelta(seconds=60)
h_aggregate(db.events, db.stats.hourly, mapf_hour, cutoff, last_run)
h_aggregate(db.stats.hourly, db.stats.daily, mapf_day, cutoff, last_run)
h_aggregate(db.stats.daily, db.stats.weekly, mapf_week, cutoff, last_run)
h_aggregate(db.stats.daily, db.stats.monthly, mapf_month, cutoff, last_run)
h_aggregate(db.stats.monthly, db.stats.yearly, mapf_year, cutoff, last_run)
last_run = cutoff

As long as you save and restore the last_run variable between aggregations, you can run these aggregations as often as you like since each aggregation operation is incremental.
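One way to save and restore last_run between runs is to write it to a small state file. The following is only a sketch: the file name, the JSON layout and the helper names load_last_run and save_last_run are assumptions, and storing the value in a MongoDB collection would serve equally well.

```python
import json
import os
import tempfile
from datetime import datetime

EPOCH = datetime(1970, 1, 1)
FMT = '%Y-%m-%dT%H:%M:%S.%f'

def load_last_run(path):
    """Return the saved cutoff, or the epoch if no run has been recorded yet."""
    if not os.path.exists(path):
        return EPOCH
    with open(path) as f:
        return datetime.strptime(json.load(f)['last_run'], FMT)

def save_last_run(path, cutoff):
    """Record the cutoff of the run that just completed."""
    with open(path, 'w') as f:
        json.dump({'last_run': cutoff.strftime(FMT)}, f)

# Hypothetical location for the state file.
state_file = os.path.join(tempfile.mkdtemp(), 'last_run.json')
first = load_last_run(state_file)                # nothing saved yet
save_last_run(state_file, datetime(2012, 9, 13, 12, 0, 0))
restored = load_last_run(state_file)
```

Starting from the epoch on the first run means the initial aggregation covers every event already in the collection.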

21.3.4 Sharding
Ensure that you choose a shard key that is not the incoming timestamp, but rather something that varies significantly in the most recent documents. In the example above, consider using the userid as the most significant part of the shard key. To prevent a single, active user from creating a large chunk that MongoDB cannot split, use a compound shard key with (userid, ts) on the events collection. Consider the following:
>>> db.command('shardcollection', 'events',
...     key={'userid': 1, 'ts': 1})
{ "collectionsharded": "events", "ok" : 1 }

To shard the aggregated collections you must use the _id field, so you can issue the following group of shard operations in the Python/PyMongo shell:
db.command('shardcollection', 'stats.daily', key={'_id': 1})
db.command('shardcollection', 'stats.weekly', key={'_id': 1})
db.command('shardcollection', 'stats.monthly', key={'_id': 1})
db.command('shardcollection', 'stats.yearly', key={'_id': 1})

You should also update the h_aggregate map-reduce wrapper to support sharded output. Add 'sharded': True to the out argument. See the full sharded h_aggregate function:
def h_aggregate(icollection, ocollection, mapf, cutoff, last_run):
    query = {'value.ts': {'$gt': last_run, '$lt': cutoff}}
    icollection.map_reduce(
        map=mapf,
        reduce=reducef,
        finalize=finalizef,
        query=query,
        out={'reduce': ocollection.name, 'sharded': True})


CHAPTER

TWENTYTWO

PRODUCT DATA MANAGEMENT


MongoDB's flexible schema makes it particularly well suited to storing information for product data management and e-commerce websites and solutions. The Product Catalog (page 277) document describes methods and practices for modeling and managing a product catalog using MongoDB, while the Inventory Management (page 284) document introduces a pattern for handling interactions between inventory and users' shopping carts. Finally the Category Hierarchy (page 291) document describes methods for interacting with category hierarchies in MongoDB.

22.1 Product Catalog


22.1.1 Overview
This document describes the basic patterns and principles for designing an E-Commerce product catalog system using MongoDB as a storage engine.

Problem

Product catalogs must have the capacity to store many different types of objects with different sets of attributes. These kinds of data collections are quite compatible with MongoDB's data model, but many important considerations and design decisions remain.

Solution

For relational databases, there are several solutions that address this problem, each with a different performance profile. This section examines several of these options and then describes the preferred MongoDB solution.

SQL and Relational Data Models
Concrete Table Inheritance

One approach, in a relational model, is to create a table for each product category. Consider the following example SQL statement for creating database tables:
CREATE TABLE product_audio_album (
    sku char(8) NOT NULL,
    ...
    artist varchar(255) DEFAULT NULL,
    genre_0 varchar(255) DEFAULT NULL,
    genre_1 varchar(255) DEFAULT NULL,
    ...,
    PRIMARY KEY(sku))
...
CREATE TABLE product_film (
    sku char(8) NOT NULL,
    ...
    title varchar(255) DEFAULT NULL,
    rating char(8) DEFAULT NULL,
    ...,
    PRIMARY KEY(sku))
...

This approach has limited flexibility for two key reasons:
• You must create a new table for every new category of products.
• You must explicitly tailor all queries for the exact type of product.
Single Table Inheritance

Another relational data model uses a single table for all product categories and adds new columns anytime you need to store data regarding a new type of product. Consider the following SQL statement:
CREATE TABLE product (
    sku char(8) NOT NULL,
    ...
    artist varchar(255) DEFAULT NULL,
    genre_0 varchar(255) DEFAULT NULL,
    genre_1 varchar(255) DEFAULT NULL,
    ...
    title varchar(255) DEFAULT NULL,
    rating char(8) DEFAULT NULL,
    ...,
    PRIMARY KEY(sku))

This approach is more flexible than concrete table inheritance: it allows single queries to span different product types, but at the expense of space.
Multiple Table Inheritance

Also in the relational model, you may use a multiple table inheritance pattern to represent common attributes in a generic product table, with some variations in individual category product tables. Consider the following SQL statement:
CREATE TABLE product (
    sku char(8) NOT NULL,
    title varchar(255) DEFAULT NULL,
    description varchar(255) DEFAULT NULL,
    price, ...
    PRIMARY KEY(sku))

CREATE TABLE product_audio_album (
    sku char(8) NOT NULL,
    ...
    artist varchar(255) DEFAULT NULL,
    genre_0 varchar(255) DEFAULT NULL,
    genre_1 varchar(255) DEFAULT NULL,
    ...,
    PRIMARY KEY(sku),
    FOREIGN KEY(sku) REFERENCES product(sku))
...
CREATE TABLE product_film (
    sku char(8) NOT NULL,
    ...
    title varchar(255) DEFAULT NULL,
    rating char(8) DEFAULT NULL,
    ...,
    PRIMARY KEY(sku),
    FOREIGN KEY(sku) REFERENCES product(sku))
...

Multiple table inheritance is more space-efficient than single table inheritance (page 278) and somewhat more flexible than concrete table inheritance (page 278). However, this model does require an expensive JOIN operation to obtain all attributes relevant to a product.
Entity Attribute Values

The final substantive pattern from relational modeling is the entity-attribute-value schema where you would create a meta-model for product data. In this approach, you maintain a table with three columns, e.g. entity_id, attribute_id, value, and these triples describe each product. Consider the description of an audio recording. You may have a series of rows representing the following relationships:

Entity          Attribute    Value
sku_00e8da9b    type         Audio Album
sku_00e8da9b    title        A Love Supreme
sku_00e8da9b    ...          ...
sku_00e8da9b    artist       John Coltrane
sku_00e8da9b    genre        Jazz
sku_00e8da9b    genre        General
...             ...          ...

This schema is totally flexible: any entity can have any set of any attributes. New product categories do not require any changes to the data model in the database. The downside of these models is that all nontrivial queries require large numbers of JOIN operations, which result in large performance penalties.
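To illustrate what reading a single product back from an entity-attribute-value table involves, the following sketch folds the example triples into one record per entity in plain Python. The function name pivot_eav is hypothetical, and the elided rows from the table above are simply left out.

```python
def pivot_eav(rows):
    """Fold (entity, attribute, value) triples into one dict per entity.

    A repeated attribute (such as genre) becomes a list of values.
    """
    products = {}
    for entity, attribute, value in rows:
        doc = products.setdefault(entity, {})
        if attribute not in doc:
            doc[attribute] = value
        elif isinstance(doc[attribute], list):
            doc[attribute].append(value)
        else:
            doc[attribute] = [doc[attribute], value]
    return products

rows = [
    ('sku_00e8da9b', 'type', 'Audio Album'),
    ('sku_00e8da9b', 'title', 'A Love Supreme'),
    ('sku_00e8da9b', 'artist', 'John Coltrane'),
    ('sku_00e8da9b', 'genre', 'Jazz'),
    ('sku_00e8da9b', 'genre', 'General'),
]
album = pivot_eav(rows)['sku_00e8da9b']
```

Every attribute costs one row, so assembling or filtering on several attributes at once means touching many rows per product, which is where the JOIN overhead comes from.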
Avoid Modeling Product Data

Additionally, some e-commerce solutions with relational database systems avoid choosing one of the data models above, and serialize all of this data into a BLOB column. While simple, the details become difficult to access for search and sort.
Non-Relational Data Model

Because MongoDB is a non-relational database, the data model for your product catalog can benefit from this additional flexibility. The best models use a single MongoDB collection to store all the product data, which is similar to the single table inheritance (page 278) relational model. MongoDB's dynamic schema means that each document need not conform to the same schema. As a result, the document for each product only needs to contain attributes relevant to that product.

Schema

At the beginning of the document, the schema must contain general product information, to facilitate searches of the entire catalog. Then, a details sub-document contains fields that vary between product types. Consider the following example document for an album product.
{
  sku: "00e8da9b",
  type: "Audio Album",
  title: "A Love Supreme",
  description: "by John Coltrane",
  asin: "B0000A118M",

  shipping: {
    weight: 6,
    dimensions: {
      width: 10,
      height: 10,
      depth: 1
    },
  },

  pricing: {
    list: 1200,
    retail: 1100,
    savings: 100,
    pct_savings: 8
  },

  details: {
    title: "A Love Supreme [Original Recording Reissued]",
    artist: "John Coltrane",
    genre: [ "Jazz", "General" ],
    ...
    tracks: [
      "A Love Supreme Part I: Acknowledgement",
      "A Love Supreme Part II - Resolution",
      "A Love Supreme, Part III: Pursuance",
      "A Love Supreme, Part IV-Psalm"
    ],
  },
}

A movie item would have the same fields for general product information, shipping, and pricing, but have a different details sub-document. Consider the following:
{
  sku: "00e8da9d",
  type: "Film",
  ...,
  asin: "B000P0J0AQ",

  shipping: { ... },

  pricing: { ... },

  details: {
    title: "The Matrix",
    director: [ "Andy Wachowski", "Larry Wachowski" ],
    writer: [ "Andy Wachowski", "Larry Wachowski" ],
    ...,
    aspect_ratio: "1.66:1"
  },
}

Note: In MongoDB, you can have fields that hold multiple values (i.e. arrays) without any restrictions on the number of fields or values (as with genre_0 and genre_1) and also without the need for a JOIN operation.

22.1.2 Operations
For most deployments the primary use of the product catalog is to perform search operations. This section provides an overview of various types of queries that may be useful for supporting an e-commerce site. All examples in this document use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose.

Find Albums by Genre and Sort by Year Produced
Querying

This query returns the documents for the products of a specific genre, sorted in reverse chronological order:
query = db.products.find({'type': 'Audio Album',
                          'details.genre': 'Jazz'})
query = query.sort([('details.issue_date', -1)])

Indexing

To support this query, create a compound index on all the properties used in the filter and in the sort:
db.products.ensure_index([
    ('type', 1),
    ('details.genre', 1),
    ('details.issue_date', -1)])

Note: The final component of the index is the sort field. This allows MongoDB to traverse the index in the sorted order to preclude a slow in-memory sort.

Find Products Sorted by Percentage Discount Descending

While most searches will be for a particular type of product (e.g. album, movie, etc.), in some situations you may want to return all products in a certain price range, or discount percentage.

Querying

To return this data, use the pricing information that exists in all products to find the products with the highest percentage discount:
query = db.products.find({'pricing.pct_savings': {'$gt': 25}})
query = query.sort([('pricing.pct_savings', -1)])

Indexing

To support this type of query, you will want to create an index on the pricing.pct_savings field:
db.products.ensure_index('pricing.pct_savings')

Since MongoDB can read indexes in ascending or descending order, the order of the index does not matter.

Note: If you want to perform range queries (e.g. return all products over $25) and then sort by another property like pricing.retail, MongoDB cannot use the index as effectively in this situation. The field on which you select a range, or perform sort operations, must be the last field in a compound index in order to avoid scanning an entire collection. Using different properties within a single combined range query and sort operation requires some scanning, which will limit the speed of your query.

Find Movies Based on Starring Actor


Querying

Use the following query to select documents of a specified product type (i.e. Film) that contain a certain value in their details (i.e. a specific actor in the details.actor field), with the results sorted by date descending:
query = db.products.find({'type': 'Film',
                          'details.actor': 'Keanu Reeves'})
query = query.sort([('details.issue_date', -1)])

Indexing

To support this query, you may want to create the following index.
db.products.ensure_index([
    ('type', 1),
    ('details.actor', 1),
    ('details.issue_date', -1)])

This index begins with the type field and then narrows by the other search field; the final component of the index is the sort field, to maximize index efficiency.

Find Movies with a Particular Word in the Title

Regardless of database engine, in order to retrieve this information the system will need to scan some number of documents or records to satisfy this query.

Querying

MongoDB supports regular expressions within queries. In Python, you can use the re module to construct the query:
import re
re_hacker = re.compile(r'.*hacker.*', re.IGNORECASE)

query = db.products.find({'type': 'Film', 'title': re_hacker})
query = query.sort([('details.issue_date', -1)])

MongoDB provides a special syntax for regular expression queries without the need for the re module. Consider the following alternative which is equivalent to the above example:
query = db.products.find({
    'type': 'Film',
    'title': {'$regex': '.*hacker.*', '$options': 'i'}})
query = query.sort([('details.issue_date', -1)])

The $options operator specifies a case insensitive match.
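The behaviour of the pattern can be checked locally with Python's re module before sending it to the server. The sketch below uses a made-up list of titles; it also compares the pattern with a shorter one without the surrounding .*, since re.search already looks for a match anywhere in the string. MongoDB's regular expression matching is likewise unanchored unless the pattern says otherwise, so the shorter form should select the same documents there, although this sketch only exercises Python.

```python
import re

# Hypothetical titles standing in for the title field of product documents.
titles = ['Hackers', 'The Matrix', 'Ethical HACKER Handbook']

# Pattern from the example above, with the case-insensitive flag.
re_hacker = re.compile(r'.*hacker.*', re.IGNORECASE)

# Shorter pattern: a search matches anywhere in the string, so the
# leading and trailing .* are not needed for a substring match.
re_simple = re.compile(r'hacker', re.IGNORECASE)

matched = [t for t in titles if re_hacker.search(t)]
matched_simple = [t for t in titles if re_simple.search(t)]
```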


Indexing

The indexing strategy for these kinds of queries is different from the previous examples. Here, create an index on { type: 1, details.issue_date: -1, title: 1 } using the following command at the Python/PyMongo console:
db.products.ensure_index([
    ('type', 1),
    ('details.issue_date', -1),
    ('title', 1)])

This index makes it possible to avoid scanning whole documents by using the index for scanning the title rather than forcing MongoDB to scan whole documents for the title field. Additionally, placing the details.issue_date field before the title field supports the sort, ensuring that the result set is already ordered before MongoDB filters on the title field.

22.1.3 Scaling
Sharding

Database performance for these kinds of deployments is dependent on indexes. You may use sharding to enhance performance by allowing MongoDB to keep larger portions of those indexes in RAM. In sharded configurations, select a shard key that allows mongos to route queries directly to a single shard or small group of shards. Since most of the queries in this system include the type field, include this in the shard key. Beyond this, the remainder of the shard key is difficult to predict without information about your database's actual activity and distribution. Consider that:
• details.issue_date would be a poor addition to the shard key because, although it appears in a number of queries, no query was selective by this field.
• you should include one or more fields in the details document that you query frequently, and a field that has quasi-random features, to prevent large unsplittable chunks.

In the following example, assume that the details.genre field is the second-most queried field after type. Enable sharding using the following shardcollection operation at the Python/PyMongo console:
>>> db.command('shardcollection', 'product',
...     key={'type': 1, 'details.genre': 1, 'sku': 1})
{ "collectionsharded" : "product", "ok" : 1 }

Note: Even if you choose a poor shard key that requires mongos to broadcast all queries to all shards, you will still see some benefits from sharding, because:
1. Sharding makes a larger amount of memory available to store indexes, and
2. MongoDB will parallelize queries across shards, reducing latency.

Read Preference

While sharding is the best way to scale operations, some data sets make it impossible to partition data so that mongos can route queries to specific shards. In these situations mongos sends the query to all shards and then combines the results before returning to the client. In these situations, you can add additional read performance by allowing mongos to read from the secondary instances in a replica set by configuring read preference in your client. Read preference is configurable on a per-connection or per-operation basis. In PyMongo, set the read_preference argument. The SECONDARY property in the following example permits reads from a secondary (as well as a primary) for the entire connection.
conn = pymongo.Connection(read_preference=pymongo.SECONDARY)

Conversely, the SECONDARY_ONLY read preference means that the client will send read operations only to secondary members:
conn = pymongo.Connection(read_preference=pymongo.SECONDARY_ONLY)

You can also specify read_preference for specific queries, as follows:


results = db.product.find(..., read_preference=pymongo.SECONDARY)

or
results = db.product.find(..., read_preference=pymongo.SECONDARY_ONLY)

See Also: Replica Set Read Preference (page 51).

22.2 Inventory Management


22.2.1 Overview
This case study provides an overview of practices and patterns for designing and developing the inventory management portions of an E-commerce application.

See Also: Product Catalog (page 277).

Problem

Customers in e-commerce stores regularly add and remove items from their shopping cart, change quantities multiple times, abandon the cart at any point, and sometimes have problems during and after checkout that require a hold or canceled order. These activities make it difficult to maintain inventory systems and counts and ensure that customers cannot buy items that are unavailable while they shop in your store.

Solution

This solution keeps the traditional metaphor of the shopping cart, but the shopping cart will age. After a shopping cart has been inactive for a certain period of time, all items in the cart re-enter the available inventory and the cart is empty. The state transition diagram for a shopping cart is below:

Schema

Inventory collections must maintain counts of the current available inventory of each stock-keeping unit (SKU; or item) as well as a list of items in carts that may return to the available inventory if they are in a shopping cart that times out. In the following example, the _id field stores the SKU:
{
    _id: '00e8da9b',
    qty: 16,
    carted: [
        { qty: 1, cart_id: 42,
          timestamp: ISODate("2012-03-09T20:55:36Z") },
        { qty: 2, cart_id: 43,
          timestamp: ISODate("2012-03-09T21:55:36Z") }
    ]
}

Note: These examples use a simplified schema. In a production implementation, you may choose to merge this schema with the product catalog schema described in the Product Catalog (page 277) document.

The SKU above has 16 items in stock, 1 item in a cart, and 2 items in a second cart. This leaves a total of 19 unsold items of merchandise.

To model the shopping cart objects, you need to maintain sku and quantity fields embedded in a shopping cart document:


{
    _id: 42,
    last_modified: ISODate("2012-03-09T20:55:36Z"),
    status: 'active',
    items: [
        { sku: '00e8da9b', qty: 1, item_details: {...} },
        { sku: '0ab42f88', qty: 4, item_details: {...} }
    ]
}

Note: The item_details field in each line item allows your application to display the cart contents to the user without requiring a second query to fetch details from the catalog collection.
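The stock arithmetic described for the inventory schema (available quantity plus every carted quantity) can be checked with a small sketch. It works on a plain Python dict that copies the example inventory document, with timestamps omitted; the helper name unsold_quantity is an assumption made for illustration and is not part of the original system.

```python
def unsold_quantity(inventory_doc):
    """Return available stock plus everything currently held in carts."""
    carted_total = sum(entry["qty"] for entry in inventory_doc["carted"])
    return inventory_doc["qty"] + carted_total


# Plain-dict copy of the example inventory document (timestamps omitted).
inventory_doc = {
    "_id": "00e8da9b",
    "qty": 16,
    "carted": [
        {"qty": 1, "cart_id": 42},
        {"qty": 2, "cart_id": 43},
    ],
}

print(unsold_quantity(inventory_doc))  # 19
```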

22.2.2 Operations
This section introduces operations that you may use to support an e-commerce site. All examples in this document use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose.

Add an Item to a Shopping Cart

Moving an item from the available inventory to a cart is a fundamental requirement for a shopping cart system. The most important requirement is to ensure that your application will never move an unavailable item from the inventory to the cart. The following add_item_to_cart function updates the inventory only if there is sufficient inventory to satisfy the request:
from datetime import datetime

def add_item_to_cart(cart_id, sku, qty, details):
    now = datetime.utcnow()

    # Make sure the cart is still active and add the line item
    result = db.cart.update(
        {'_id': cart_id, 'status': 'active'},
        {'$set': {'last_modified': now},
         '$push': {
             'items': {'sku': sku, 'qty': qty, 'details': details}}},
        safe=True)
    if not result['updatedExisting']:
        raise CartInactive()

    # Update the inventory only if enough stock is available
    result = db.inventory.update(
        {'_id': sku, 'qty': {'$gte': qty}},
        {'$inc': {'qty': -qty},
         '$push': {
             'carted': {'qty': qty, 'cart_id': cart_id,
                        'timestamp': now}}},
        safe=True)
    if not result['updatedExisting']:
        # Roll back our cart update
        db.cart.update(
            {'_id': cart_id},
            {'$pull': {'items': {'sku': sku}}})
        raise InadequateInventory()

The system does not trust that the available inventory can satisfy a request. First, this operation checks that the cart is active before adding an item. Then, it verifies that the available inventory can satisfy the request before decrementing inventory. If there is not adequate inventory, the system rolls back the cart update. Specifying safe=True and checking the result allows the application to report an error if the cart is inactive or the available quantity is insufficient to satisfy the request.

Note: This operation requires no indexes beyond the default index on the _id field.
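The add-then-verify-then-roll-back sequence described above can be tried without a database. The following is only an in-memory sketch under stated assumptions: plain dicts stand in for the cart and inventory collections, the function name reserve is hypothetical, and the single-process check here does not provide the atomicity that the conditional MongoDB update gives you.

```python
class InadequateInventory(Exception):
    pass


def reserve(inventory, cart, sku, qty):
    """Move qty units of sku from inventory into cart, or undo the cart change."""
    # Step 1: optimistically add the line item to the cart.
    cart["items"].append({"sku": sku, "qty": qty})
    item = inventory.get(sku)
    # Step 2: conditional decrement, mirroring the {'qty': {'$gte': qty}} guard.
    if item is None or item["qty"] < qty:
        # Step 3: roll back the cart change, mirroring the $pull on items.
        cart["items"] = [i for i in cart["items"] if i["sku"] != sku]
        raise InadequateInventory(sku)
    item["qty"] -= qty
    item["carted"].append({"qty": qty, "cart_id": cart["_id"]})


inventory = {
    "00e8da9b": {"qty": 2, "carted": []},
    "0ab42f88": {"qty": 0, "carted": []},
}
cart = {"_id": 42, "status": "active", "items": []}

reserve(inventory, cart, "00e8da9b", 1)      # enough stock: succeeds
rolled_back = False
try:
    reserve(inventory, cart, "0ab42f88", 4)  # no stock: cart change is undone
except InadequateInventory:
    rolled_back = True
```

As in the original function, the rollback removes line items by SKU, so it assumes a cart holds at most one line item per SKU.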

Modifying the Quantity in the Cart

The following process underlies adjusting the quantity of items in a user's cart. The application must ensure that when a user increases the quantity of an item, in addition to updating the carted entry for the user's cart, the inventory exists to cover the modification.
def update_quantity(cart_id, sku, old_qty, new_qty):
    now = datetime.utcnow()
    delta_qty = new_qty - old_qty

    # Make sure the cart is still active and update the line item
    result = db.cart.update(
        {'_id': cart_id, 'status': 'active', 'items.sku': sku},
        {'$set': {
            'last_modified': now,
            'items.$.qty': new_qty}},
        safe=True)
    if not result['updatedExisting']:
        raise CartInactive()

    # Update the inventory; the positional $ targets the matched carted entry
    result = db.inventory.update(
        {'_id': sku,
         'carted.cart_id': cart_id,
         'qty': {'$gte': delta_qty}},
        {'$inc': {'qty': -delta_qty},
         '$set': {'carted.$.qty': new_qty,
                  'carted.$.timestamp': now}},
        safe=True)
    if not result['updatedExisting']:
        # Roll back our cart update
        db.cart.update(
            {'_id': cart_id, 'items.sku': sku},
            {'$set': {'items.$.qty': old_qty}})
        raise InadequateInventory()

Note: The positional operator $ updates the particular carted entry and item that matched the query. This allows the application to update the inventory and keep track of the data needed to roll back the cart in a single atomic operation. The code also ensures that the cart is active.


Note: This operation requires no indexes beyond the default index on the _id field.

Checking Out

The checkout operation must validate the method of payment and remove the carted items after the transaction succeeds. Consider the following procedure:
def checkout(cart_id):
    now = datetime.utcnow()

    # Make sure the cart is still active and set to 'pending'. Also
    # fetch the cart details so we can calculate the checkout price
    cart = db.cart.find_and_modify(
        {'_id': cart_id, 'status': 'active'},
        update={'$set': {'status': 'pending', 'last_modified': now}})
    if cart is None:
        raise CartInactive()

    # Validate payment details; collect payment
    try:
        collect_payment(cart)
        db.cart.update(
            {'_id': cart_id},
            {'$set': {'status': 'complete'}})
        # Remove this cart's entries from every inventory document
        db.inventory.update(
            {'carted.cart_id': cart_id},
            {'$pull': {'carted': {'cart_id': cart_id}}},
            multi=True)
    except:
        # Payment failed: unlock the cart and re-raise the error
        db.cart.update(
            {'_id': cart_id},
            {'$set': {'status': 'active'}})
        raise

Begin by locking the cart by setting its status to pending. The findAndModify command verifies that the cart is still active, updates it atomically, and returns its details so that the application can capture payment information. Then:

- If the payment is successful, the application removes the carted items from the inventory documents and sets the cart to complete.
- If the payment is unsuccessful, the application unlocks the cart by setting its status to active and reports a payment error.

Note: This operation requires no indexes beyond the default index on the _id field.

Returning Inventory from Timed-Out Carts


Process

Periodically, your application must expire inactive carts and return their items to available inventory. In the example that follows, the variable timeout controls the length of time before a cart expires:


from datetime import datetime, timedelta

def expire_carts(timeout):
    now = datetime.utcnow()
    threshold = now - timedelta(seconds=timeout)

    # Lock and find all the expiring carts
    db.cart.update(
        {'status': 'active', 'last_modified': {'$lt': threshold}},
        {'$set': {'status': 'expiring'}},
        multi=True)

    # Actually expire each cart
    for cart in db.cart.find({'status': 'expiring'}):

        # Return all line items to inventory
        for item in cart['items']:
            db.inventory.update(
                {'_id': item['sku'],
                 'carted.cart_id': cart['_id'],
                 'carted.qty': item['qty']},
                {'$inc': {'qty': item['qty']},
                 '$pull': {'carted': {'cart_id': cart['_id']}}})

        db.cart.update(
            {'_id': cart['_id']},
            {'$set': {'status': 'expired'}})

This procedure:

1. finds all carts that are older than the threshold and are due for expiration.
2. for each expiring cart, returns all items to the available inventory.
3. once the items return to the available inventory, sets the status field to expired.
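The cart status values used by the operations in this section (active, pending, complete, expiring, expired) imply a small set of allowed transitions. The following sketch lists only the transitions that the checkout and expire_carts code above actually performs; the table and the helper name can_transition are assumptions made for illustration, not a reproduction of the original diagram.

```python
# Transitions performed by checkout() and expire_carts() in this section.
TRANSITIONS = {
    "active": {"pending", "expiring"},
    "pending": {"complete", "active"},
    "expiring": {"expired"},
    "complete": set(),
    "expired": set(),
}


def can_transition(current, new):
    """Return True if the section's operations ever move a cart from current to new."""
    return new in TRANSITIONS.get(current, set())


print(can_transition("active", "pending"))   # True
print(can_transition("expired", "active"))   # False
```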
Indexing

To support returning inventory from timed-out carts, create an index to support queries on their status and last_modified fields. Use the following operation in the Python/PyMongo shell:
db.cart.ensure_index([('status', 1), ('last_modified', 1)])

Error Handling

The above operations do not account for one possible failure situation: an exception that occurs after updating the shopping cart but before updating the inventory collection. This would result in a shopping cart that may be absent or expired while its items have not returned to available inventory. To account for this case, your application will need a periodic cleanup operation that finds inventory items that have carted items, checks that they exist in a user's cart, and returns them to available inventory if they do not.
def cleanup_inventory(timeout):
    now = datetime.utcnow()
    threshold = now - timedelta(seconds=timeout)

    # Find all the expiring carted items
    for item in db.inventory.find(
            {'carted.timestamp': {'$lt': threshold}}):

        # Find all the carted items that matched
        carted = dict(
            (carted_item['cart_id'], carted_item)
            for carted_item in item['carted']
            if carted_item['timestamp'] < threshold)

        # First Pass: Find any carts that are active and refresh the
        # carted items
        for cart in db.cart.find(
                {'_id': {'$in': list(carted.keys())},
                 'status': 'active'}):
            db.inventory.update(
                {'_id': item['_id'],
                 'carted.cart_id': cart['_id']},
                {'$set': {'carted.$.timestamp': now}})
            del carted[cart['_id']]

        # Second Pass: All the carted items left in the dict need to
        # now be returned to inventory
        for cart_id, carted_item in carted.items():
            db.inventory.update(
                {'_id': item['_id'],
                 'carted.cart_id': cart_id,
                 'carted.qty': carted_item['qty']},
                {'$inc': {'qty': carted_item['qty']},
                 '$pull': {'carted': {'cart_id': cart_id}}})

To summarize: This operation finds all carted items that have time stamps older than the threshold. Then, the process makes two passes over these items:

1. Of the items with time stamps older than the threshold, if the cart is still active, it resets the time stamp to maintain the carts.
2. Of the stale items that remain in inactive carts, the operation returns these items to the inventory.

Note: The function above is safe to use because it checks that the cart is no longer active before returning items from the cart to inventory. However, it could be long-running and slow other updates and queries. Use judiciously.

22.2.3 Sharding
If you need to shard the data for this system, the _id field is an ideal shard key for both carts and products because most update operations use the _id field. This allows mongos to route all updates that select on _id to a single mongod process. There are two drawbacks to using _id as a shard key:

If the cart collection's _id is an incrementing value, all new carts end up on a single shard. You can mitigate this effect by choosing a random value upon the creation of a cart, such as a hash (e.g. MD5 or SHA-1) of an ObjectId, as the _id. The process for this operation would resemble the following:


import hashlib
import bson

cart_id = bson.ObjectId()
# Encode before hashing so the call works on both Python 2 and Python 3
cart_id_hash = hashlib.md5(str(cart_id).encode('utf-8')).hexdigest()

cart = { "_id": cart_id, "cart_hash": cart_id_hash }
db.cart.insert(cart)
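The text describes using the hash itself as the _id. A minimal sketch of that variant follows, runnable without a database: uuid.uuid4() stands in for bson.ObjectId(), and the helper name hashed_cart_id is hypothetical. Whether a hashed _id suits your deployment depends on your workload, so treat this as an illustration rather than a recommendation.

```python
import hashlib
import uuid


def hashed_cart_id(raw_id):
    """Return an MD5 hex digest of raw_id, usable as a non-monotonic key."""
    return hashlib.md5(str(raw_id).encode("utf-8")).hexdigest()


raw_id = uuid.uuid4()  # stand-in for bson.ObjectId()
cart = {"_id": hashed_cart_id(raw_id), "status": "active", "items": []}
```

Because consecutive raw identifiers hash to unrelated values, new carts would no longer cluster at one end of the key range.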

Cart expiration and inventory adjustment require update operations and queries to broadcast to all shards when using _id as a shard key. This may be less relevant as the expiration functions run relatively infrequently and you can queue them or artificially slow them down (as with judicious use of sleep()) to minimize server load.

Use the following commands in the Python/PyMongo console to shard the cart and inventory collections:
>>> db.command('shardcollection', 'inventory',
...     key={'_id': 1})
{ "collectionsharded" : "inventory", "ok" : 1 }
>>> db.command('shardcollection', 'cart',
...     key={'_id': 1})
{ "collectionsharded" : "cart", "ok" : 1 }

22.3 Category Hierarchy


22.3.1 Overview
This document provides the basic design for modeling a product hierarchy stored in MongoDB as well as a collection of common operations for interacting with this data that will help you begin to write an E-commerce product category hierarchy. See Also: Product Catalog (page 277)

Solution

To model a product category hierarchy, this solution keeps each category in its own document that also has a list of its ancestors or parents. This document uses music genres as the basis of its examples (see Figure 22.1). Because these kinds of categories change infrequently, this model focuses on the operations needed to keep the hierarchy up-to-date rather than the performance profile of update operations.

Schema

This schema has the following properties:

- A single document represents each category in the hierarchy.
- An ObjectId identifies each category document for internal cross-referencing.
- Each category document has a human-readable name and a URL-compatible slug field.
- The schema stores a list of ancestors for each category to facilitate displaying a category and its ancestors using only a single query.


Figure 22.1: Initial category hierarchy

Consider the following prototype:


{ "_id" : ObjectId("4f5ec858eb03303a11000002"),
  "name" : "Modal Jazz",
  "parent" : ObjectId("4f5ec858eb03303a11000001"),
  "slug" : "modal-jazz",
  "ancestors" : [
         { "_id" : ObjectId("4f5ec858eb03303a11000001"),
           "slug" : "bop",
           "name" : "Bop" },
         { "_id" : ObjectId("4f5ec858eb03303a11000000"),
           "slug" : "ragtime",
           "name" : "Ragtime" } ]
}

22.3.2 Operations
This section outlines the category hierarchy manipulations that you may need in an E-Commerce site. All examples in this document use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose.

Read and Display a Category
Querying

Use the following query to read and display a category hierarchy. This query uses the slug field to return the category information and a bread crumb trail from the current category to the top-level category.


category = db.categories.find(
    {'slug': slug},
    {'_id': 0, 'name': 1, 'ancestors.slug': 1, 'ancestors.name': 1})
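Once the query has returned a category document, the bread crumb trail can be assembled from the embedded ancestors array without further queries. The sketch below assumes a plain dict shaped like the prototype document (nearest parent first in ancestors); the function name breadcrumb and the " > " separator are illustrative choices.

```python
def breadcrumb(category):
    """Join ancestor names, top level first, and end with the category itself."""
    # The ancestors list is stored nearest-parent first, so reverse it.
    names = [a["name"] for a in reversed(category["ancestors"])]
    names.append(category["name"])
    return " > ".join(names)


modal_jazz = {
    "name": "Modal Jazz",
    "ancestors": [
        {"slug": "bop", "name": "Bop"},
        {"slug": "ragtime", "name": "Ragtime"},
    ],
}
print(breadcrumb(modal_jazz))  # Ragtime > Bop > Modal Jazz
```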

Indexing

Create a unique index on the slug field with the following operation on the Python/PyMongo console:
>>> db.categories.ensure_index('slug', unique=True)

Add a Category to the Hierarchy

To add a category you must first determine its ancestors. Take adding a new category Swing as a child of Ragtime, as below:

Figure 22.2: Adding a category

The insert operation would be trivial except for the ancestors. To define this array, consider the following helper function:
def build_ancestors(_id, parent_id):
    # Fetch the parent's display fields and its own ancestor list
    parent = db.categories.find_one(
        {'_id': parent_id},
        {'name': 1, 'slug': 1, 'ancestors': 1})
    parent_ancestors = parent.pop('ancestors')
    # The parent comes first, followed by the parent's ancestors
    ancestors = [parent] + parent_ancestors
    db.categories.update(
        {'_id': _id},
        {'$set': {'ancestors': ancestors}})


You only need to travel up one level in the hierarchy to get the ancestor list for Ragtime that you can use to build the ancestor list for Swing. Then create the document with the following set of operations:
doc = dict(name='Swing', slug='swing', parent=ragtime_id)
swing_id = db.categories.insert(doc)
build_ancestors(swing_id, ragtime_id)

Note: Since these queries and updates all select based on _id, you only need the default MongoDB-supplied index on _id to support this operation efficiently.
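The "one level up" rule for building an ancestor list can be illustrated with an in-memory dict standing in for the categories collection. This is a sketch under assumptions: string identifiers replace ObjectIds, and the helper name add_category is hypothetical.

```python
categories = {
    "ragtime": {"_id": "ragtime", "name": "Ragtime", "slug": "ragtime",
                "parent": None, "ancestors": []},
}


def add_category(categories, _id, name, slug, parent_id):
    """Insert a category and derive its ancestors from its parent's list."""
    parent = categories[parent_id]
    summary = {"_id": parent["_id"], "slug": parent["slug"],
               "name": parent["name"]}
    # One level up is enough: parent first, then the parent's ancestors.
    ancestors = [summary] + list(parent["ancestors"])
    categories[_id] = {"_id": _id, "name": name, "slug": slug,
                       "parent": parent_id, "ancestors": ancestors}


add_category(categories, "bop", "Bop", "bop", "ragtime")
add_category(categories, "modal-jazz", "Modal Jazz", "modal-jazz", "bop")
```

After these calls, the Modal Jazz entry carries the same ancestor order as the prototype document: Bop first, then Ragtime.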

Change the Ancestry of a Category

This section addresses the process for reorganizing the hierarchy by moving bop under swing as follows:

Figure 22.3: Change the parent of a category

Procedure

Update the bop document to reflect the change in ancestry with the following operation:


db.categories.update(
    {'_id': bop_id}, {'$set': {'parent': swing_id}})

The following helper function rebuilds the ancestor fields to ensure correctness.

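def build_ancestors_full(_id, parent_id):
    ancestors = []
    # Walk up the hierarchy one parent at a time until reaching the root
    while parent_id is not None:
        # Fetch only the fields stored in each ancestor entry, plus the
        # parent link needed to continue the walk
        parent = db.categories.find_one(
            {'_id': parent_id},
            {'parent': 1, 'name': 1, 'slug': 1})
        parent_id = parent.pop('parent')
        ancestors.append(parent)
    db.categories.update(
        {'_id': _id},
        {'$set': {'ancestors': ancestors}})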

You can use the following loop to reconstruct all the descendants of the bop category:
for cat in db.categories.find(
        {'ancestors._id': bop_id},
        {'parent': 1}):
    build_ancestors_full(cat['_id'], cat['parent'])

Indexing

Create an index on the ancestors._id field to support the update operation.


db.categories.ensure_index('ancestors._id')
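The full rebuild after moving a category can be sketched in memory. The example below assumes string identifiers and a dict in place of the categories collection; rebuild_ancestors and the cat fixture helper are hypothetical names. It walks parent links rather than trusting any stored ancestor list, which is the point of the full rebuild.

```python
def rebuild_ancestors(categories, _id):
    """Recompute one category's ancestors by walking parent links to the root."""
    ancestors = []
    parent_id = categories[_id]["parent"]
    while parent_id is not None:
        parent = categories[parent_id]
        ancestors.append({"_id": parent["_id"], "slug": parent["slug"],
                          "name": parent["name"]})
        parent_id = parent["parent"]
    categories[_id]["ancestors"] = ancestors


def cat(_id, name, parent, ancestors):
    # Fixture helper; ancestor entries are reduced to their _id here.
    return {"_id": _id, "slug": _id, "name": name, "parent": parent,
            "ancestors": [{"_id": a} for a in ancestors]}


categories = {
    "ragtime": cat("ragtime", "Ragtime", None, []),
    "swing": cat("swing", "Swing", "ragtime", ["ragtime"]),
    "bop": cat("bop", "Bop", "ragtime", ["ragtime"]),
    "modal-jazz": cat("modal-jazz", "Modal Jazz", "bop", ["bop", "ragtime"]),
}

# Move bop under swing, then rebuild bop and every descendant of bop.
categories["bop"]["parent"] = "swing"
rebuild_ancestors(categories, "bop")
descendants = [c["_id"] for c in categories.values()
               if any(a["_id"] == "bop" for a in c["ancestors"])]
for _id in descendants:
    rebuild_ancestors(categories, _id)
```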

Rename a Category

To rename a category you need to both update the category itself and also update all the descendants. Consider renaming Bop to BeBop as in the following figure:

First, you need to update the category name with the following operation:
db.categories.update(
    {'_id': bop_id}, {'$set': {'name': 'BeBop'}})

Next, you need to update each descendant's ancestors list:


db.categories.update(
    {'ancestors._id': bop_id},
    {'$set': {'ancestors.$.name': 'BeBop'}},
    multi=True)

This operation uses:

- the positional operator $ to match the exact ancestor entry that matches the query, and
- the multi option to update all documents that match this query.

Note: In this case, the index you have already defined on ancestors._id is sufficient to ensure good performance.
1 Your application cannot guarantee that the ancestor list of a parent category is correct, because MongoDB may process the categories out of order.


Figure 22.4: Rename a category


22.3.3 Sharding
For most deployments, sharding this collection has limited value because the collection will be very small. If you do need to shard, because most updates query the _id field, this field is a suitable shard key. Shard the collection with the following operation in the Python/PyMongo console:
>>> db.command('shardcollection', 'categories',
...     key={'_id': 1})
{ "collectionsharded" : "categories", "ok" : 1 }


CHAPTER TWENTY-THREE

CONTENT MANAGEMENT SYSTEMS


The content management use cases introduce fundamental MongoDB practices and approaches, using familiar problems and simple examples. The Metadata and Asset Management (page 299) document introduces a model that you may use when designing a web site content management system, while Storing Comments (page 306) introduces the method for modeling user comments on content, like blog posts, and media, in MongoDB.

23.1 Metadata and Asset Management


23.1.1 Overview
This document describes the design and pattern of a content management system using MongoDB modeled on the popular Drupal CMS.

Problem

You are designing a content management system (CMS) and you want to use MongoDB to store the content of your sites.

Solution

To build this system you will use MongoDB's flexible schema to store all content nodes in a single collection regardless of type. This guide will provide prototype schema and describe common operations for the following primary node types:

Basic Page: Basic pages are useful for displaying infrequently-changing text such as an about page. With a basic page, the salient information is the title and the content.

Blog entry: Blog entries record a stream of posts from users on the CMS and store title, author, content, and date as relevant information.

Photo: Photos participate in photo galleries, and store title, description, author, and date along with the actual photo binary data.

This solution does not describe schema or process for storing or using navigational and organizational information.

Schema

Although documents in the nodes collection contain content of different types, all documents have a similar structure and a set of common fields. Consider the following prototype document for a basic page node type:


{
    _id: ObjectId(...),
    nonce: ObjectId(...),
    metadata: {
        type: 'basic-page',
        section: 'my-photos',
        slug: 'about',
        title: 'About Us',
        created: ISODate(...),
        author: { _id: ObjectId(...), name: 'Rick' },
        tags: [ ... ],
        detail: { text: '# About Us\n...' }
    }
}

Most fields are descriptively titled. The section field identifies groupings of items, as in a photo gallery or a particular blog. The slug field holds a URL-friendly representation of the node, usually unique within its section, for generating URLs.

All documents also have a detail field that varies with the document type. For the basic page above, the detail field might hold the text of the page. For a blog entry, the detail field might hold a sub-document. Consider the following prototype:
{
    ...
    metadata: {
        ...
        type: 'blog-entry',
        section: 'my-blog',
        slug: '2012-03-noticed-the-news',
        ...
        detail: {
            publish_on: ISODate(...),
            text: 'I noticed the news from Washington today...'
        }
    }
}

Photos require a different approach. Because photos can be potentially larger than these documents, it's important to separate the binary photo storage from the node's metadata. GridFS provides the ability to store larger files in MongoDB. GridFS stores data in two collections, in this case cms.assets.files, which stores metadata, and cms.assets.chunks, which stores the data itself. Consider the following prototype document from the cms.assets.files collection:
{
    _id: ObjectId(...),
    length: 123...,
    chunkSize: 262144,
    uploadDate: ISODate(...),
    contentType: 'image/jpeg',
    md5: 'ba49a...',
    metadata: {
        nonce: ObjectId(...),
        slug: '2012-03-invisible-bicycle',
        type: 'photo',
        section: 'my-album',
        title: 'Kitteh',
        created: ISODate(...),
        author: { _id: ObjectId(...), name: 'Jared' },
        tags: [ ... ],
        detail: {
            filename: 'kitteh_invisible_bike.jpg',
            resolution: [ 1600, 1600 ], ... }
    }
}

Note: This document embeds the basic node document fields, which allows you to use the same code to manipulate nodes, regardless of type.

23.1.2 Operations
This section outlines a number of common operations for building and interacting with the metadata and asset layer of the CMS for all node types. All examples in this document use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose.

Create and Edit Content Nodes
Procedure

The most common operations inside of a CMS center on creating and editing content. Consider the following insert() operation:
db.cms.nodes.insert({
    'nonce': ObjectId(),
    'metadata': {
        'section': 'myblog',
        'slug': '2012-03-noticed-the-news',
        'type': 'blog-entry',
        'title': 'Noticed in the News',
        'created': datetime.utcnow(),
        'author': {'id': user_id, 'name': 'Rick'},
        'tags': ['news', 'musings'],
        'detail': {
            'publish_on': datetime.utcnow(),
            'text': 'I noticed the news from Washington today...'}
    }
})

Once inserted, your application must have some way of preventing multiple concurrent updates. The schema uses the special nonce field to help detect concurrent edits. By using the nonce field in the query portion of the update operation, the application will generate an error if there is an editing collision. Consider the following update:
def update_text(section, slug, nonce, text):
    # The update matches only if the nonce is unchanged since it was read
    result = db.cms.nodes.update(
        {'metadata.section': section,
         'metadata.slug': slug,
         'nonce': nonce},
        {'$set': {'metadata.detail.text': text,
                  'nonce': ObjectId()}},
        safe=True)
    if not result['updatedExisting']:
        raise ConflictError()
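The nonce check is a form of optimistic concurrency control. The following in-memory sketch shows the behavior with two editors who read the same version of a node; it assumes a plain dict for the node and uses uuid values in place of ObjectId. Unlike the MongoDB update, where the match and the write happen in one atomic operation, this single-process version is only an illustration of the logic.

```python
import uuid


class ConflictError(Exception):
    pass


def update_text(node, expected_nonce, text):
    """Apply an edit only if nobody else has changed the node since it was read."""
    if node["nonce"] != expected_nonce:
        raise ConflictError("node was modified by another editor")
    node["metadata"]["detail"]["text"] = text
    node["nonce"] = uuid.uuid4().hex  # fresh nonce, like ObjectId() above


node = {"nonce": "n0", "metadata": {"detail": {"text": "draft"}}}

seen_by_a = node["nonce"]
seen_by_b = node["nonce"]
update_text(node, seen_by_a, "edit from A")      # succeeds
try:
    update_text(node, seen_by_b, "edit from B")  # stale nonce
    conflict = False
except ConflictError:
    conflict = True
```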


You may also want to perform metadata edits to the item such as adding tags:
db.cms.nodes.update(
    {'metadata.section': section, 'metadata.slug': slug},
    {'$addToSet': {
        'metadata.tags': {'$each': ['interesting', 'funny']}}})

In this example, the $addToSet operator will only add values to the tags field if they do not already exist in the tags array, so there's no need to supply or update the nonce.
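The reason no nonce is needed is that adding to a set is idempotent: repeating or interleaving the same tag edits yields the same result. A minimal sketch of that behavior on a plain list follows; add_to_set is a hypothetical helper that imitates the effect of $addToSet with $each and is not a PyMongo API.

```python
def add_to_set(existing, new_values):
    """Append each new value that is not already present, preserving order."""
    result = list(existing)
    for value in new_values:
        if value not in result:
            result.append(value)
    return result


tags = add_to_set(["news", "funny"], ["interesting", "funny"])
print(tags)  # ['news', 'funny', 'interesting']
```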
Index Support

To support updates and queries on the metadata.section and metadata.slug fields, and to ensure that two editors don't create two documents with the same section name and slug, use the following operation at the Python/PyMongo console:
>>> db.cms.nodes.ensure_index([
...    ('metadata.section', 1), ('metadata.slug', 1)], unique=True)

The unique=True option prevents two documents from colliding. If you want an index to support queries on the above fields and the nonce field, create the following index:
>>> db.cms.nodes.ensure_index([
...    ('metadata.section', 1), ('metadata.slug', 1), ('nonce', 1)])

However, in most cases, the first index will be sufficient to support these operations.

Upload a Photo
Procedure

To upload a new photo, use the following operation, which builds upon the basic update procedure:
from datetime import datetime
from gridfs import GridFS

def upload_new_photo(
        input_file, section, slug, title, author, tags, details):
    fs = GridFS(db, 'cms.assets')
    # Write the lock timestamp into the metadata while uploading
    with fs.new_file(
            content_type='image/jpeg',
            metadata=dict(
                type='photo',
                locked=datetime.utcnow(),
                section=section,
                slug=slug,
                title=title,
                created=datetime.utcnow(),
                author=author,
                tags=tags,
                detail=details)) as upload_file:
        while True:
            chunk = input_file.read(upload_file.chunk_size)
            if not chunk:
                break
            upload_file.write(chunk)

    # unlock the file
    db.cms.assets.files.update(
        {'_id': upload_file._id},
        {'$set': {'metadata.locked': None}})


Because uploading the photo spans multiple documents and is a non-atomic operation, you must lock the file during upload by writing datetime.utcnow() in the record. This helps when there are multiple concurrent editors and lets the application detect stalled file uploads. This operation assumes that, for photo upload, the last update will succeed:
def update_photo_content(input_file, section, slug):
    fs = GridFS(db, 'cms.assets')

    # Delete the old version if it's unlocked or was locked more than 5
    # minutes ago
    file_obj = db.cms.assets.files.find_one(
        {'metadata.section': section,
         'metadata.slug': slug,
         'metadata.locked': None})
    if file_obj is None:
        threshold = datetime.utcnow() - timedelta(seconds=300)
        file_obj = db.cms.assets.files.find_one(
            {'metadata.section': section,
             'metadata.slug': slug,
             'metadata.locked': {'$lt': threshold}})
    if file_obj is None:
        raise FileDoesNotExist()
    fs.delete(file_obj['_id'])

    # update content, keep metadata unchanged
    file_obj['metadata']['locked'] = datetime.utcnow()
    with fs.new_file(**file_obj) as upload_file:
        while True:
            chunk = input_file.read(upload_file.chunk_size)
            if not chunk:
                break
            upload_file.write(chunk)

    # unlock the file
    db.cms.assets.files.update(
        {'_id': upload_file._id},
        {'$set': {'metadata.locked': None}})
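The lock rule used by the function above (a file may be replaced when it is unlocked, or when its lock is older than five minutes) can be isolated as a small predicate. This is a sketch only; the names LOCK_TIMEOUT and can_replace are assumptions made for illustration, and the fixed now value is chosen so the example is repeatable.

```python
from datetime import datetime, timedelta

LOCK_TIMEOUT = timedelta(seconds=300)  # the 5-minute window used above


def can_replace(locked, now):
    """A file may be replaced if it is unlocked or its lock has gone stale."""
    return locked is None or locked < now - LOCK_TIMEOUT


now = datetime(2012, 3, 9, 12, 0, 0)
print(can_replace(None, now))                         # True
print(can_replace(now - timedelta(minutes=1), now))   # False
print(can_replace(now - timedelta(minutes=10), now))  # True
```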

As with the basic operations, you can use a much simpler operation to edit the tags:
db.cms.assets.files.update(
    {'metadata.section': section, 'metadata.slug': slug},
    {'$addToSet': {
        'metadata.tags': {'$each': ['interesting', 'funny']}}})

Index Support

Create a unique index on { metadata.section: 1, metadata.slug: 1 } to support the above operations and prevent users from creating or updating the same file concurrently. Use the following operation in the Python/PyMongo console:
>>> db.cms.assets.files.ensure_index([
...    ('metadata.section', 1), ('metadata.slug', 1)], unique=True)

Locate and Render a Node

To locate a node based on the value of metadata.section and metadata.slug, use the following find_one operation:


node = db.cms.nodes.find_one(
    {'metadata.section': section, 'metadata.slug': slug})

Note: The index defined on (section, slug), created to support the update operation, is sufficient to support this operation as well.

Locate and Render a Photo

To locate an image based on the value of metadata.section and metadata.slug, use the following operation:
fs = GridFS(db, 'cms.assets')
# Pass the metadata fields as keyword filters; a dict unpacked with **
# is needed because the field names contain dots
with fs.get_version(
        **{'metadata.section': section,
           'metadata.slug': slug}) as img_fpo:
    # do something with the image file
    pass

Note: The index defined on (section, slug), created to support the update operation, is sufficient to support this operation as well.

Search for Nodes by Tag


Querying

To retrieve a list of nodes based on their tags, use the following query:
nodes = db.cms.nodes.find({'metadata.tags': tag})

Indexing

Create an index on the metadata.tags field in the cms.nodes collection to support this query:
>>> db.cms.nodes.ensure_index('metadata.tags')

Search for Images by Tag


Procedure

To retrieve a list of images based on their tags, use the following operation:
fs = GridFS(db, 'cms.assets')
# Find the matching file documents, then open each one through GridFS
for image_file_object in db.cms.assets.files.find(
        {'metadata.tags': tag}):
    image_file = fs.get(image_file_object['_id'])
    # do something with the image file

Indexing

Create an index on the metadata.tags field in the cms.assets.files collection to support this query:
>>> db.cms.assets.files.ensure_index('metadata.tags')

Generate a Feed of Recently Published Blog Articles


Querying

Use the following operation to generate a list of recent blog posts sorted in descending order by date, for use on the index page of your site, or in an .rss or .atom feed.
articles = db.cms.nodes.find({
    'metadata.section': 'my-blog',
    'metadata.published': {'$lt': datetime.utcnow()}})
# PyMongo's sort() takes a key and a direction rather than a dict
articles = articles.sort('metadata.published', -1)

Note: In many cases you will want to limit the number of nodes returned by this query.

Indexing

Create a compound index on the { metadata.section: 1, metadata.published: 1 } fields to support this query and sort operation.

>>> db.cms.nodes.ensure_index(
...     [('metadata.section', 1), ('metadata.published', -1)])

Note: For all sort or range queries, ensure that the field with the sort or range operation is the final field in the index.

23.1.3 Sharding
In a CMS, read performance is more critical than write performance. To achieve the best read performance in a shard cluster, ensure that the mongos can route queries to specific shards. Also remember that MongoDB cannot enforce unique indexes across shards. Using a compound shard key that consists of metadata.section and metadata.slug will provide the same semantics as described above.

Warning: Consider the actual use and workload of your cluster before configuring sharding for your cluster.

Use the following operation at the Python/PyMongo shell:
>>> db.command('shardcollection', 'cms.nodes',
...     key={'metadata.section': 1, 'metadata.slug': 1})
{ "collectionsharded": "cms.nodes", "ok": 1}
>>> db.command('shardcollection', 'cms.assets.files',
...     key={'metadata.section': 1, 'metadata.slug': 1})
{ "collectionsharded": "cms.assets.files", "ok": 1}

To shard the cms.assets.chunks collection, you must use the files_id field as the shard key. The following operation will shard the collection:

>>> db.command('shardcollection', 'cms.assets.chunks',
...     key={'files_id': 1})
{ "collectionsharded": "cms.assets.chunks", "ok": 1}

Sharding on the files_id field ensures routable queries because all reads from GridFS must first look up the document in cms.assets.files and then look up the chunks separately.
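To see why files_id keeps reads routable, it helps to look at how one file's data is split. The sketch below builds chunk-shaped dicts in memory, using the files_id, n, and data fields of GridFS chunk documents and the chunkSize value from the prototype files document; split_into_chunks and the "photo-1" identifier are hypothetical. Every chunk of a file carries the same files_id, so a shard key on that field keeps a file's chunks together.

```python
CHUNK_SIZE = 262144  # chunkSize from the prototype files document above


def split_into_chunks(files_id, data, chunk_size=CHUNK_SIZE):
    """Return chunk documents shaped like GridFS chunks: files_id, n, data."""
    return [
        {"files_id": files_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]


chunks = split_into_chunks("photo-1", b"x" * 600000)
print(len(chunks))                      # 3
print({c["files_id"] for c in chunks})  # {'photo-1'}
```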

23.2 Storing Comments


This document outlines the basic patterns for storing user-submitted comments in a content management system (CMS).

23.2.1 Overview
MongoDB provides a number of different approaches for storing data like users' comments on content from a CMS. There is no single correct implementation, but there are a number of common approaches and known considerations for each approach. This case study explores the implementation details and trade-offs of each option. The three basic patterns are:

1. Store each comment in its own document. This approach provides the greatest flexibility at the expense of some additional application-level complexity. These implementations make it possible to display comments in chronological or threaded order, and place no restrictions on the number of comments attached to a specific object.
2. Embed all comments in the parent document. This approach provides the greatest possible performance for displaying comments at the expense of flexibility: the structure of the comments in the document controls the display format. Note: Because of the limit on document size (page 573), documents, including the original content and all comments, cannot grow beyond 16 megabytes.
3. A hybrid design stores comments separately from the parent, but aggregates comments into a small number of documents, where each contains many comments.

Also consider that comments can be threaded, where comments are always replies to a parent item or to another comment, which carries certain architectural requirements discussed below.

23.2.2 One Document per Comment


Schema

If you store each comment in its own document, the documents in your comments collection would have the following structure:

{
    _id: ObjectId(...),
    discussion_id: ObjectId(...),
    slug: '34db',
    posted: ISODateTime(...),
    author: {
        id: ObjectId(...),
        name: 'Rick'
    },
    text: 'This is so bogus ... '
}

This form is only suitable for displaying comments in chronological order. Comments store: the discussion_id field that references the discussion parent, a URL-compatible slug identifier, a posted timestamp, an author sub-document that contains a reference to a user's profile in the id field and their name in the name field, and the full text of the comment.

To support threaded comments, you might use a slightly different structure like the following:
{
    _id: ObjectId(...),
    discussion_id: ObjectId(...),
    parent_id: ObjectId(...),
    slug: '34db/8bda',
    full_slug: '2012.02.08.12.21.08:34db/2012.02.09.22.19.16:8bda',
    posted: ISODateTime(...),
    author: {
        id: ObjectId(...),
        name: 'Rick'
    },
    text: 'This is so bogus ... '
}

This structure: adds a parent_id field that stores the contents of the _id field of the parent comment, modifies the slug field to hold a path composed of the slug of the parent (or parents) and this comment's unique slug, and adds a full_slug field that combines the slugs and time information to make it easier to sort documents in a threaded discussion by date.

Warning: MongoDB can only index keys of up to 1024 bytes (page 573). This includes all field data, the field name, and the namespace (i.e. database name and collection name). This may become an issue when you create an index of the full_slug field to support sorting.
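As a rough illustration of the warning above, the following sketch estimates how long a full_slug value becomes as replies nest more deeply. It assumes the format shown in the example document (a 19-character timestamp, a colon, and a 4-character slug per level, joined by slashes). The helper names full_slug_length and max_depth_under are hypothetical, and the estimate ignores the field name and namespace, which also count toward the limit.

```python
# Hypothetical helpers: estimate the size of a full_slug value at a given
# reply depth, assuming "YYYY.MM.DD.HH.MM.SS:slug" for every level.
TIMESTAMP_LEN = len("2012.02.08.12.21.08")  # 19 ASCII characters


def full_slug_length(depth, slug_len=4):
    """Length of a full_slug for a comment `depth` levels deep (1 = top level)."""
    per_level = TIMESTAMP_LEN + 1 + slug_len  # timestamp + ':' + slug
    separators = depth - 1                    # one '/' between adjacent levels
    return depth * per_level + separators


def max_depth_under(limit_bytes, slug_len=4):
    """Deepest nesting level whose full_slug still fits in limit_bytes."""
    depth = 0
    while full_slug_length(depth + 1, slug_len) <= limit_bytes:
        depth += 1
    return depth
```

Because the real limit also covers the field name and namespace, the usable depth is somewhat lower than this estimate suggests.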

Operations

This section contains an overview of common operations for interacting with comments represented using a schema where each comment is its own document. All examples in this document use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose. Issue the following commands at the interactive Python shell to load the required libraries:


>>> import bson
>>> import pymongo
>>> from datetime import datetime

Post a New Comment

To post a new comment in a chronologically ordered (i.e. without threading) system, use the following insert() operation:
slug = generate_pseudorandom_slug()
db.comments.insert({
    'discussion_id': discussion_id,
    'slug': slug,
    'posted': datetime.utcnow(),
    'author': author_info,
    'text': comment_text })
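The generate_pseudorandom_slug() helper used above is not defined in this document. A minimal sketch, assuming a slug is simply a short string of hexadecimal characters like the '34db' value in the example schema, might look like the following. The 4-character default length is an assumption.

```python
import random
import string


def generate_pseudorandom_slug(length=4):
    """Return a short pseudorandom slug such as '34db' (hypothetical helper)."""
    alphabet = string.digits + "abcdef"  # lowercase hexadecimal characters
    return "".join(random.choice(alphabet) for _ in range(length))
```

Short random slugs can collide within a discussion, so an application would likely need to detect duplicates and retry with a fresh slug.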

To insert a comment for a system with threaded comments, you must generate the slug path and full_slug at insert. See the following operation:
posted = datetime.utcnow()

# generate the unique portions of the slug and full_slug
slug_part = generate_pseudorandom_slug()
full_slug_part = posted.strftime('%Y.%m.%d.%H.%M.%S') + ':' + slug_part

# load the parent comment (if any)
if parent_slug:
    parent = db.comments.find_one(
        {'discussion_id': discussion_id, 'slug': parent_slug })
    slug = parent['slug'] + '/' + slug_part
    full_slug = parent['full_slug'] + '/' + full_slug_part
else:
    slug = slug_part
    full_slug = full_slug_part

# actually insert the comment
db.comments.insert({
    'discussion_id': discussion_id,
    'slug': slug,
    'full_slug': full_slug,
    'posted': posted,
    'author': author_info,
    'text': comment_text })
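To make the slug construction above easier to check without a database, the following sketch isolates it in a pure function. The name build_slugs and the in-memory parent dictionary are illustrative assumptions. The string format follows the operation above.

```python
from datetime import datetime


def build_slugs(posted, slug_part, parent=None):
    """Return (slug, full_slug) for a new comment, given its optional parent."""
    full_slug_part = posted.strftime("%Y.%m.%d.%H.%M.%S") + ":" + slug_part
    if parent is not None:
        # A reply extends both of the parent's paths with its own part.
        return (parent["slug"] + "/" + slug_part,
                parent["full_slug"] + "/" + full_slug_part)
    # A top-level comment starts new paths.
    return slug_part, full_slug_part


# Sample values taken from the threaded schema example.
parent = {"slug": "34db", "full_slug": "2012.02.08.12.21.08:34db"}
slug, full_slug = build_slugs(datetime(2012, 2, 9, 22, 19, 16), "8bda", parent)
```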

View Paginated Comments

To view comments that are not threaded, select all comments participating in a discussion and sort by the posted field. For example:
cursor = db.comments.find({'discussion_id': discussion_id})
cursor = cursor.sort('posted')
cursor = cursor.skip(page_num * page_size)
cursor = cursor.limit(page_size)

Because the full_slug field contains both hierarchical information (via the path) and chronological information, you can use a simple sort on the full_slug field to retrieve a threaded view:
cursor = db.comments.find({'discussion_id': discussion_id})
cursor = cursor.sort('full_slug')
cursor = cursor.skip(page_num * page_size)
cursor = cursor.limit(page_size)

See Also: cursor.limit (page 450), cursor.skip (page 452), and cursor.sort (page 452)
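The effect of sorting on full_slug can be illustrated without a database. The sketch below uses a small in-memory list of hypothetical comments and Python's sorted() as a stand-in for cursor.sort(). It derives the nesting depth from the number of path separators.

```python
# Hypothetical sample data: two top-level comments and one reply.
comments = [
    {"full_slug": "2012.02.08.12.21.08:34db/2012.02.09.22.19.16:8bda",
     "text": "reply to first"},
    {"full_slug": "2012.02.08.13.00.00:77aa", "text": "second top-level"},
    {"full_slug": "2012.02.08.12.21.08:34db", "text": "first top-level"},
]


def threaded_view(comments):
    """Return (depth, text) pairs in threaded display order."""
    ordered = sorted(comments, key=lambda c: c["full_slug"])
    # Each '/' in the path marks one additional level of nesting.
    return [(c["full_slug"].count("/"), c["text"]) for c in ordered]
```

A reply sorts directly after its parent because the parent's full_slug is a prefix of the reply's full_slug.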
Indexing

To support the above queries efficiently, maintain two compound indexes, on:

1. (discussion_id, posted) and

2. (discussion_id, full_slug)

Issue the following operation at the interactive Python shell.
>>> db.comments.ensure_index([
...     ('discussion_id', 1), ('posted', 1)])
>>> db.comments.ensure_index([
...     ('discussion_id', 1), ('full_slug', 1)])

Note: Ensure that you always sort by the final element in a compound index to maximize the performance of these queries.

Retrieve Comments via Direct Links


Queries

To directly retrieve a comment, without needing to page through all comments, you can select by the slug field:
comment = db.comments.find_one({
    'discussion_id': discussion_id,
    'slug': comment_slug})

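You can retrieve a "sub-discussion," or a comment and all of its descendants recursively, by performing a regular expression prefix query on the full_slug field:
import re

# find() returns a cursor that can be sorted; find_one() would return a single document.
# Because the query matches on full_slug, parent_slug must hold the parent's full_slug value.
subdiscussion = db.comments.find({
    'discussion_id': discussion_id,
    'full_slug': re.compile('^' + re.escape(parent_slug)) })
subdiscussion = subdiscussion.sort('full_slug')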
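The prefix match can be tried on plain strings. The sketch below applies the same anchored, escaped regular expression to a hypothetical list of full_slug values. It assumes that the prefix is the full_slug of the comment at the root of the sub-discussion, and the function name subdiscussion_slugs is illustrative.

```python
import re

# Hypothetical full_slug values for one discussion.
full_slugs = [
    "2012.02.08.13.00.00:77aa",
    "2012.02.08.12.21.08:34db/2012.02.09.22.19.16:8bda",
    "2012.02.08.12.21.08:34db",
]


def subdiscussion_slugs(full_slugs, parent_full_slug):
    """Return the parent and all of its descendants in threaded order."""
    # re.escape() keeps the dots in the timestamp from acting as wildcards.
    pattern = re.compile("^" + re.escape(parent_full_slug))
    return sorted(s for s in full_slugs if pattern.search(s))
```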

Indexing

Since you have already created indexes on { discussion_id: 1, full_slug: 1 } to support retrieving sub-discussions, you can add support for the above queries by adding an index on { discussion_id: 1, slug: 1 }. Use the following operation in the Python shell:


>>> db.comments.ensure_index([
...     ('discussion_id', 1), ('slug', 1)])

23.2.3 Embedding All Comments


This design embeds the entire discussion of a comment thread inside of the topic document. In this example, the "topic" document holds the total content for whatever content you're managing.

Schema

Consider the following prototype topic document:
{
    _id: ObjectId(...),
    ... lots of topic data ...
    comments: [
        { posted: ISODateTime(...),
          author: { id: ObjectId(...), name: 'Rick' },
          text: 'This is so bogus ... ' },
        ... ]
}

This structure is only suitable for a chronological display of all comments because it embeds comments in chronological order. Each document in the comments array contains the comment's date, author, and text.

Note: Since you're storing the comments in sorted order, there is no need to maintain per-comment slugs.

To support threading using this design, you would need to embed comments within comments, using a structure that resembles the following:
{
    _id: ObjectId(...),
    ... lots of topic data ...
    replies: [
        { posted: ISODateTime(...),
          author: { id: ObjectId(...), name: 'Rick' },
          text: 'This is so bogus ... ',
          replies: [
              { author: { ... }, ... },
              ... ] },
        ... ]
}

Here, the replies field in each comment holds the sub-comments, which can in turn hold sub-comments.

Note: In the embedded document design, you give up some flexibility regarding display format, because it is difficult to display comments except as you store them in MongoDB. If, in the future, you want to switch from chronological to threaded or from threaded to chronological, this design would make that migration quite expensive.


Warning: Remember that BSON documents have a 16 megabyte size limit (page 573). If popular discussions grow larger than 16 megabytes, additional document growth will fail. Additionally, when MongoDB documents grow significantly after creation you will experience greater storage fragmentation and degraded update performance while MongoDB migrates documents internally.

Operations

This section contains an overview of common operations for interacting with comments represented using a schema that embeds all comments in the document of the "parent" or topic content.

Note: For all operations below, there is no need for any new indexes since all the operations function within documents. Because you would retrieve these documents by the _id field, you can rely on the index that MongoDB creates automatically.

Post a New Comment

To post a new comment in a chronologically ordered (i.e. unthreaded) system, you need the following update():
db.discussion.update(
    { 'discussion_id': discussion_id },
    { '$push': { 'comments': {
        'posted': datetime.utcnow(),
        'author': author_info,
        'text': comment_text } } } )

The $push operator inserts comments into the comments array in correct chronological order. For threaded discussions, the update() operation is more complex. To reply to a comment, the following code assumes that it can retrieve the path as a list of positions, for the parent comment:
if path != []:
    str_path = '.'.join('replies.%d' % part for part in path)
    str_path += '.replies'
else:
    str_path = 'replies'
db.discussion.update(
    { 'discussion_id': discussion_id },
    { '$push': {
        str_path: {
            'posted': datetime.utcnow(),
            'author': author_info,
            'text': comment_text } } } )

This constructs a field name of the form replies.0.replies.2... as str_path and then uses this value with the $push operator to insert the new comment into the parent comment's replies array.
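The path construction can be checked in isolation. The following sketch wraps the same string logic in a function. The name replies_path is an illustrative assumption.

```python
def replies_path(path):
    """Build the dotted field name of the replies array to $push into.

    `path` is the list of array positions leading to the parent comment;
    an empty list means the new comment is a top-level reply.
    """
    if path:
        return ".".join("replies.%d" % part for part in path) + ".replies"
    return "replies"
```

For example, a reply to the third sub-comment of the first top-level comment has the path [0, 2], which yields replies.0.replies.2.replies.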
View Paginated Comments

To view the comments in a non-threaded design, you must use the $slice (page 397) operator:
discussion = db.discussion.find_one(
    {'discussion_id': discussion_id},
    { ... some fields relevant to your page from the root discussion ...,
      'comments': { '$slice': [ page_num * page_size, page_size ] }
    })

To return paginated comments for the threaded design, you must retrieve the whole document and paginate the comments within the application:
import itertools

discussion = db.discussion.find_one({'discussion_id': discussion_id})

def iter_comments(obj):
    # comments without a replies field are treated as leaves
    for reply in obj.get('replies', []):
        yield reply
        for subreply in iter_comments(reply):
            yield subreply

paginated_comments = itertools.islice(
    iter_comments(discussion),
    page_size * page_num,
    page_size * (page_num + 1))
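To see the traversal order this produces, the sketch below runs the same depth-first generator over a small in-memory document in place of the result of find_one(). The sample data and the paginate wrapper are hypothetical.

```python
import itertools

# Hypothetical discussion with two top-level comments and two nested replies.
discussion = {"replies": [
    {"text": "a", "replies": [
        {"text": "a1", "replies": []},
        {"text": "a2", "replies": []},
    ]},
    {"text": "b", "replies": []},
]}


def iter_comments(obj):
    """Yield every comment below obj in depth-first (threaded) order."""
    for reply in obj.get("replies", []):
        yield reply
        for subreply in iter_comments(reply):
            yield subreply


def paginate(discussion, page_num, page_size):
    """Return one page of comments from the flattened thread."""
    return list(itertools.islice(
        iter_comments(discussion),
        page_size * page_num,
        page_size * (page_num + 1)))
```

Note that islice() still walks past all earlier comments, so later pages cost more to produce than earlier ones.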

Retrieve a Comment via Direct Links

Instead of retrieving comments via slugs as above, the following example retrieves comments using their position in the comment list or tree. For chronological (i.e. non-threaded) comments, just use the $slice (page 397) operator to extract a comment, as follows:
discussion = db.discussion.find_one(
    {'discussion_id': discussion_id},
    # $slice takes [ skip, limit ]: skip to position and return one comment
    {'comments': { '$slice': [ position, 1 ] } })
comment = discussion['comments'][0]

For threaded comments, you must find the correct path through the tree in your application, as follows:
discussion = db.discussion.find_one({'discussion_id': discussion_id})
current = discussion
for part in path:
    current = current['replies'][part]
comment = current

Note: Since parent comments embed child replies, this operation actually retrieves the entire sub-discussion for the comment you queried for. See Also: find_one().

23.2.4 Hybrid Schema Design


Schema

In the "hybrid approach" you will store comments in "buckets" that hold about 100 comments. Consider the following example document:


{
    _id: ObjectId(...),
    discussion_id: ObjectId(...),
    page: 1,
    count: 42,
    comments: [
        { slug: '34db',
          posted: ISODateTime(...),
          author: { id: ObjectId(...), name: 'Rick' },
          text: 'This is so bogus ... ' },
        ... ]
}

Each document maintains page and count fields that contain metadata regarding the page (the page number and the comment count), in addition to the comments array that holds the comments themselves.

Note: Using a hybrid format makes storing threaded comments complex, and this specific configuration is not covered in this document. Also, 100 comments is a soft limit for the number of comments per page. This value is arbitrary: choose a value small enough to keep documents below the 16MB BSON document size limit (page 573), but large enough to ensure that most comment threads will fit in a single document. In some situations the number of comments per document can exceed 100, but this does not affect the correctness of the pattern.

Operations

This section contains a number of common operations that you may use when building a CMS using this hybrid storage model with documents that each hold a page of about 100 comments. All examples in this document use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose.
Post a New Comment

Updating

In order to post a new comment, you need to $push the comment onto the last page and $inc that page's comment count. Consider the following example that queries on the basis of a discussion_id field:
page = db.comment_pages.find_and_modify(
    { 'discussion_id': discussion['_id'],
      'page': discussion['num_pages'] },
    { '$inc': { 'count': 1 },
      '$push': {
          'comments': { 'slug': slug, ... } } },
    fields={'count': 1},
    upsert=True,
    new=True )

The find_and_modify() operation is an upsert: if MongoDB cannot find a document with the correct page number, the find_and_modify() will create it and initialize the new document with appropriate values for count and comments. To limit the number of comments per page to roughly 100, you will need to create new pages as they become necessary. Add the following logic to support this:


if page['count'] > 100:
    db.discussion.update(
        { 'discussion_id': discussion['_id'],
          'num_pages': discussion['num_pages'] },
        { '$inc': { 'num_pages': 1 } } )

This update() operation includes the last known number of pages in the query to prevent a race condition where the number of pages increments twice, which would result in a nearly or totally empty document. If another process increments the number of pages, then the update above does nothing.

Indexing

To support the find_and_modify() and update() operations, maintain a compound index on (discussion_id, page) in the comment_pages collection, by issuing the following operation at the Python/PyMongo console:
>>> db.comment_pages.ensure_index([
...     ('discussion_id', 1), ('page', 1)])
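The posting logic described above (an upsert onto the last page, followed by a conditional increment of the page counter) can be sketched with in-memory dictionaries standing in for the two collections. This is only a single-process illustration: the function name post_comment and the data layout are assumptions, and the sketch does not model the race condition that the guarded update() protects against.

```python
PAGE_SIZE = 100  # soft limit on comments per page, as in the text


def post_comment(discussion, pages, comment, page_size=PAGE_SIZE):
    """Append a comment to the last page, creating the page if needed."""
    page_num = discussion["num_pages"]
    # Stand-in for upsert=True: create the page document on first use.
    page = pages.setdefault(
        page_num, {"page": page_num, "count": 0, "comments": []})
    page["count"] += 1                # stand-in for $inc
    page["comments"].append(comment)  # stand-in for $push
    if page["count"] > page_size:
        # The page is over the soft limit, so later comments go to a new page.
        discussion["num_pages"] += 1
    return page


discussion = {"num_pages": 1}
pages = {}
for n in range(250):
    post_comment(discussion, pages, {"slug": "c%d" % n})
```

Because the check runs after the insert, a page is closed only once it holds one comment more than the soft limit, which matches the note that pages can exceed 100 comments.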

View Paginated Comments

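The following function defines how to paginate comments with a fixed page size (i.e. not with the roughly 100 comment documents in the above example), as an example:
def find_comments(discussion_id, skip, limit):
    result = []
    # fetch only the page numbers and counts, in page order
    page_query = db.comment_pages.find(
        { 'discussion_id': discussion_id },
        { 'page': 1, 'count': 1 })
    page_query = page_query.sort('page')
    for page in page_query:
        if limit <= 0: break
        if skip < page['count']:
            # read this page again with the current skip and limit, because
            # a projection cannot change once the first query has been sent
            sliced = db.comment_pages.find_one(
                { '_id': page['_id'] },
                { 'comments': { '$slice': [ skip, limit ] } })
            result += sliced['comments']
            limit -= len(sliced['comments'])
        skip = max(0, skip - page['count'])
    return result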

Here, the $slice (page 397) operator pulls out comments from each page, but only when this satisfies the skip requirement. For example: if you have 4 pages with 100, 102, 101, and 22 comments on each page, and you wish to retrieve comments where skip=300 and limit=50, use the following algorithm:

Skip   Limit   Discussion
300    50      {$slice: [ 300, 50 ] } matches nothing in page #1; subtract page #1's count from skip and continue.
200    50      {$slice: [ 200, 50 ] } matches nothing in page #2; subtract page #2's count from skip and continue.
98     50      {$slice: [ 98, 50 ] } matches 3 comments in page #3; subtract page #3's count from skip (saturating at 0), subtract 3 from limit, and continue.
0      47      {$slice: [ 0, 47 ] } matches all 22 comments in page #4; subtract 22 from limit and continue.
0      25      There are no more pages; terminate loop.

Note: Since you already have an index on (discussion_id, page) in your comment_pages collection, MongoDB can satisfy these queries efficiently.
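The skip and limit bookkeeping can be traced without a database. The sketch below is an in-memory stand-in, not the PyMongo implementation: Python list slicing plays the role of $slice, and the helper make_pages (a hypothetical name) numbers comments consecutively across pages so that the returned window is easy to inspect.

```python
def make_pages(counts):
    """Build in-memory page documents whose comments are numbered globally."""
    pages, next_id = [], 0
    for number, count in enumerate(counts, start=1):
        comments = list(range(next_id, next_id + count))
        pages.append({"page": number, "count": count, "comments": comments})
        next_id += count
    return pages


def find_comments(pages, skip, limit):
    """Collect up to `limit` comments after skipping `skip`, page by page."""
    result = []
    for page in pages:  # pages are assumed to be sorted by page number
        if limit <= 0:
            break
        # Same window as {$slice: [skip, limit]} applied to this page.
        sliced = page["comments"][skip:skip + limit]
        result += sliced
        skip = max(0, skip - page["count"])  # saturate at 0
        limit -= len(sliced)
    return result


pages = make_pages([100, 102, 101, 22])
result = find_comments(pages, skip=300, limit=50)
```

With comments numbered from 0, the first element of result is comment number 300, which is the first comment after the 300 skipped ones.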


Retrieve a Comment via Direct Links

Query

To retrieve a comment directly without paging through all preceding pages of commentary, use the slug to find the correct page, and then use application logic to find the correct comment:
page = db.comment_pages.find_one(
    { 'discussion_id': discussion_id,
      'comments.slug': comment_slug},
    { 'comments': 1 })
for comment in page['comments']:
    if comment['slug'] == comment_slug:
        break
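The application-side search can also be written so that a missing slug is reported explicitly instead of leaving the loop variable pointing at the last comment. The following is a minimal sketch over an in-memory page document. The function name find_comment_in_page and the sample data are assumptions.

```python
def find_comment_in_page(page, comment_slug):
    """Return the comment with the given slug, or None if the page lacks it."""
    return next(
        (c for c in page["comments"] if c["slug"] == comment_slug), None)


# Hypothetical page document shaped like the hybrid schema above.
page = {"page": 1, "count": 2, "comments": [
    {"slug": "34db", "text": "This is so bogus ..."},
    {"slug": "8bda", "text": "A second comment"},
]}
```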

Indexing

To perform this query efficiently you'll need a new index on the discussion_id and comments.slug fields (i.e. { discussion_id: 1, comments.slug: 1 }). Create this index using the following operation in the Python/PyMongo console:
>>> db.comment_pages.ensure_index([
...     ('discussion_id', 1), ('comments.slug', 1)])

23.2.5 Sharding
For all of the architectures discussed above, you will want the discussion_id field to participate in the shard key, if you need to shard your application. For applications that use the "one document per comment" approach, consider using the slug (or full_slug, in the case of threaded comments) field in the shard key to allow the mongos instances to route requests by slug. Issue the following operation at the Python/PyMongo console:
>>> db.command('shardcollection', 'comments',
...     key={'discussion_id': 1, 'full_slug': 1})

This will return the following response:


{ "collectionsharded" : "comments", "ok" : 1 }

In the case of comments that are fully embedded in parent content documents, the determination of the shard key is outside of the scope of this document. For hybrid documents, use the page number of the comment page in the shard key along with the discussion_id to allow MongoDB to split popular discussions between shards, while grouping discussions on the same shard. Issue the following operation at the Python/PyMongo console:
>>> db.command('shardcollection', 'comment_pages',
...     key={'discussion_id': 1, 'page': 1})
{ "collectionsharded" : "comment_pages", "ok" : 1 }


CHAPTER

TWENTYFOUR

PYTHON APPLICATION DEVELOPMENT


24.1 Write a Tumblelog Application with Django MongoDB Engine
24.1.1 Introduction
In this tutorial, you will learn how to create a basic tumblelog application using the popular Django Python web framework and the MongoDB database. The tumblelog will consist of two parts:

1. A public site that lets people view posts and comment on them.

2. An admin site that lets you add, change and delete posts and publish comments.

This tutorial assumes that you are already familiar with Django, have a basic familiarity with MongoDB operation, and have installed MongoDB (page 9).

Where to get help

If you're having trouble going through this tutorial, please post a message to mongodb-user or join the IRC chat in #mongodb on irc.freenode.net to chat with other MongoDB users who might be able to help.

Note: Django MongoDB Engine uses a forked version of Django 1.3 that adds non-relational support.

24.1.2 Installation
Begin by installing packages required by later steps in this tutorial.

Prerequisite

This tutorial uses pip to install packages and virtualenv to isolate Python environments. While these tools and this configuration are not required as such, they ensure a standard environment and are strongly recommended. Issue the following commands at the system prompt:
pip install virtualenv
virtualenv myproject


Respectively, these commands: install the virtualenv program (using pip) and create an isolated Python environment for this project (named myproject). To activate the myproject environment at the system prompt, use the following command:
source myproject/bin/activate

Installing Packages

Django MongoDB Engine directly depends on:

Django-nonrel, a fork of Django 1.3 that adds support for non-relational databases

djangotoolbox, a bunch of utilities for non-relational Django applications and backends

Install by issuing the following commands:
pip install https://bitbucket.org/wkornewald/django-nonrel/get/tip.tar.gz
pip install https://bitbucket.org/wkornewald/djangotoolbox/get/tip.tar.gz
pip install https://github.com/django-nonrel/mongodb-engine/tarball/master

Continue with the tutorial to begin building the tumblelog application.

24.1.3 Build a Blog to Get Started


In this tutorial you will build a basic blog as the foundation of this application and use this as the basis of your tumblelog application. You will add the first post using the shell and then later use the Django administrative interface. Call the startproject command, as with other Django projects, to get started and create the basic project skeleton:
django-admin.py startproject tumblelog

Configuring Django

Configure the database in the tumblelog/settings.py file:

DATABASES = {
    'default': {
        'ENGINE': 'django_mongodb_engine',
        'NAME': 'my_tumble_log'
    }
}

See Also: The Django MongoDB Engine Settings documentation for more configuration options.

Define the Schema

The first step in writing a tumblelog in Django is to define the "models" or, in MongoDB's terminology, documents. In this application, you will define posts and comments, so that each Post can contain a list of Comments. Edit the tumblelog/models.py file so it resembles the following:


from django.db import models
from django.core.urlresolvers import reverse

from djangotoolbox.fields import ListField, EmbeddedModelField


class Post(models.Model):
    created_at = models.DateTimeField(auto_now_add=True, db_index=True)
    title = models.CharField(max_length=255)
    slug = models.SlugField()
    body = models.TextField()
    # Comment is defined below, so refer to it by name
    comments = ListField(EmbeddedModelField('Comment'), editable=False)

    def get_absolute_url(self):
        return reverse('post', kwargs={"slug": self.slug})

    def __unicode__(self):
        return self.title

    class Meta:
        ordering = ["-created_at"]


class Comment(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    body = models.TextField(verbose_name="Comment")
    author = models.CharField(verbose_name="Name", max_length=255)

The Django "nonrel" code looks the same as vanilla Django; however, there is no built-in support for some of MongoDB's native data types like Lists and Embedded data. djangotoolbox handles these definitions.

See Also: The Django MongoDB Engine fields documentation for more.

The models declare one index on Post, for the created_at date, because the frontpage will order by date. There is no need to add db_index on SlugField because there is a default index on SlugField.

Add Data with the Shell

The manage.py provides a shell interface for the application that you can use to insert data into the tumblelog. Begin by issuing the following command to load the Python shell:
python manage.py shell

Create the first post using the following sequence of operations:

>>> from tumblelog.models import *
>>> post = Post(
... title="Hello World!",
... slug="hello-world",
... body = "Welcome to my new shiny Tumble log powered by MongoDB and Django-MongoDB!"
... )
>>> post.save()

Add comments using the following sequence of operations:


>>> post.comments
[]
>>> comment = Comment(
... author="Joe Bloggs",
... body="Great post! I'm looking forward to reading your blog")
>>> post.comments.append(comment)
>>> post.save()

Finally, inspect the post:


>>> post = Post.objects.get()
>>> post
<Post: Hello World!>
>>> post.comments
[<Comment: Comment object>]

Add the Views

Because django-mongodb provides tight integration with Django you can use generic views to display the frontpage and post pages for the tumblelog. Insert the following content into the urls.py file to add the views:
from django.conf.urls.defaults import patterns, include, url
from django.views.generic import ListView, DetailView
from tumblelog.models import Post

urlpatterns = patterns('',
    url(r'^$', ListView.as_view(
        queryset=Post.objects.all(),
        context_object_name="posts_list"),
        name="home"
    ),
    # use the generic DetailView for now
    url(r'^post/(?P<slug>[a-zA-Z0-9-]+)/$', DetailView.as_view(
        queryset=Post.objects.all(),
        context_object_name="post"),
        name="post"
    ),
)
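To see which request paths the post pattern accepts, the regular expression can be exercised on its own with Python's re module. This sketch assumes, as Django does, that the leading slash has already been removed from the path before matching. The helper name slug_from_path is illustrative.

```python
import re

# The same pattern that the post URL rule uses.
POST_URL = re.compile(r"^post/(?P<slug>[a-zA-Z0-9-]+)/$")


def slug_from_path(path):
    """Return the slug captured from a request path, or None if no match."""
    match = POST_URL.match(path)
    return match.group("slug") if match else None
```

The character class allows letters, digits, and hyphens only, so a slug containing an underscore would not resolve to the post view.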

Add Templates In the tumblelog directory add the following directories templates and templates/tumblelog for storing the tumblelog templates:
mkdir -p templates/tumblelog

Configure Django so it can find the templates by updating TEMPLATE_DIRS in the settings.py file to the following:
import os.path
TEMPLATE_DIRS = (
    # the templates directory that sits next to settings.py
    os.path.join(os.path.dirname(os.path.realpath(__file__)), 'templates'),
)
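The path arithmetic behind TEMPLATE_DIRS can be checked with a fixed example path. The sketch below uses posixpath so that the result does not depend on the platform, and the location /srv/tumblelog/settings.py is a made-up example. It compares two ways of reaching the templates directory next to the settings file: joining onto the file's directory, and joining '../templates' onto the file path and then normalizing.

```python
import posixpath


def template_dir(settings_file):
    """The 'templates' directory located next to the given settings file."""
    return posixpath.join(posixpath.dirname(settings_file), "templates")


def template_dir_via_parent(settings_file):
    """The '../templates' form, normalized so the file name is dropped."""
    return posixpath.normpath(posixpath.join(settings_file, "../templates"))


example = "/srv/tumblelog/settings.py"  # hypothetical location
```

Both forms name the same directory once normalized. The first avoids passing a path that runs through a file name.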

Then add a base template that all others can inherit from. Add the following to templates/base.html:
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>My Tumblelog</title>
    <link href="http://twitter.github.com/bootstrap/1.4.0/bootstrap.css" rel="stylesheet">
    <style>.content {padding-top: 80px;}</style>
  </head>
  <body>

    <div class="topbar">
      <div class="fill">
        <div class="container">
          <h1><a href="/" class="brand">My Tumblelog</a>! <small>Starring MongoDB and Django-MongoDB.</small></h1>
        </div>
      </div>
    </div>

    <div class="container">
      <div class="content">
        {% block page_header %}{% endblock %}
        {% block content %}{% endblock %}
      </div>
    </div>

  </body>
</html>

Create the frontpage for the blog, which should list all the posts. Add the following template to the templates/tumblelog/post_list.html:
{% extends "base.html" %}

{% block content %}
    {% for post in posts_list %}
      <h2><a href="{% url post slug=post.slug %}">{{ post.title }}</a></h2>
      <p>{{ post.body|truncatewords:20 }}</p>
      <p>
        {{ post.created_at }} |
        {% with total=post.comments|length %}
            {{ total }} comment{{ total|pluralize }}
        {% endwith %}
      </p>
    {% endfor %}
{% endblock %}

Finally, add templates/tumblelog/post_detail.html for the individual posts:


{% extends "base.html" %}

{% block page_header %}
  <div class="page-header">
    <h1>{{ post.title }}</h1>
  </div>
{% endblock %}

{% block content %}
  <p>{{ post.body }}</p>
  <p>{{ post.created_at }}</p>
  <hr>
  <h2>Comments</h2>
  {% if post.comments %}
    {% for comment in post.comments %}
       <p>{{ comment.body }}</p>
       <p><strong>{{ comment.author }}</strong> <small>on {{ comment.created_at }}</small></p>
    {% endfor %}
  {% endif %}
{% endblock %}

Run python manage.py runserver to see your new tumblelog! Go to http://localhost:8000/ and you should see:

24.1.4 Add Comments to the Blog


In the next step you will provide the facility for readers of the tumblelog to comment on posts. This requires a custom form and a view to handle the form and its data. You will also update the template to include the form.

Create the Comments Form

You must customize form handling to deal with embedded comments. By extending ModelForm, it is possible to append the comment to the post on save. Create and add the following to forms.py:
from django.forms import ModelForm
from tumblelog.models import Comment


class CommentForm(ModelForm):

    def __init__(self, object, *args, **kwargs):
        """Override the default to store the original document
        that comments are embedded in.
        """
        self.object = object
        return super(CommentForm, self).__init__(*args, **kwargs)

    def save(self, *args):
        """Append to the comments list and save the post"""
        self.object.comments.append(self.instance)
        self.object.save()
        return self.object

    class Meta:
        model = Comment
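The save() override is the core of this form: the comment is never saved on its own, only appended to its parent, which is then saved. The following plain-Python sketch mimics that flow with stand-in classes so that it runs without Django. FakePost and FakeCommentForm are hypothetical names and are not part of Django or djangotoolbox.

```python
class FakePost(object):
    """Stand-in for the Post model: keeps embedded comments in a list."""

    def __init__(self):
        self.comments = []
        self.save_calls = 0

    def save(self):
        # A real model would write the whole document, comments included.
        self.save_calls += 1


class FakeCommentForm(object):
    """Mimics CommentForm.save(): append to the parent, then save the parent."""

    def __init__(self, obj, instance):
        self.object = obj         # the document that embeds the comments
        self.instance = instance  # the new comment built from the form data

    def save(self):
        self.object.comments.append(self.instance)
        self.object.save()
        return self.object


post = FakePost()
saved = FakeCommentForm(post, {"author": "Joe Bloggs", "body": "Great post!"}).save()
```

Returning the parent rather than the comment lets the caller redirect to the post's own URL after a successful submission.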

Handle Comments in the View

You must extend the generic views to handle the form logic. Add the following to the views.py file:
from django.http import HttpResponseRedirect
from django.views.generic import DetailView
from tumblelog.forms import CommentForm


class PostDetailView(DetailView):
    methods = ['get', 'post']

    def get(self, request, *args, **kwargs):
        self.object = self.get_object()
        form = CommentForm(object=self.object)
        context = self.get_context_data(object=self.object, form=form)
        return self.render_to_response(context)

    def post(self, request, *args, **kwargs):
        self.object = self.get_object()
        form = CommentForm(object=self.object, data=request.POST)

        if form.is_valid():
            form.save()
            return HttpResponseRedirect(self.object.get_absolute_url())

        context = self.get_context_data(object=self.object, form=form)
        return self.render_to_response(context)

Note: PostDetailView extends the DetailView so that it can handle GET and POST requests. On POST, post() validates the comment: if valid, post() appends the comment to the post.

Don't forget to update the urls.py file and import the PostDetailView class to replace the DetailView class.

Add Comments to the Templates

Finally, you can add the form to the templates, so that readers can create comments. Splitting the template for the forms out into templates/_forms.html will allow maximum reuse of forms code:
<fieldset>
{% for field in form.visible_fields %}
<div class="clearfix {% if field.errors %}error{% endif %}">
  {{ field.label_tag }}
  <div class="input">
    {{ field }}
    {% if field.errors or field.help_text %}
      <span class="help-inline">
      {% if field.errors %}
        {{ field.errors|join:' ' }}
      {% else %}
        {{ field.help_text }}
      {% endif %}
      </span>
    {% endif %}
  </div>
</div>
{% endfor %}
{% csrf_token %}
<div style="display:none">{% for h in form.hidden_fields %} {{ h }}{% endfor %}</div>
</fieldset>

After the comments section in post_detail.html add the following code to generate the comments form:
<h2>Add a comment</h2>
<form action="." method="post">
  {% include "_forms.html" %}
  <div class="actions">
    <input type="submit" class="btn primary" value="comment">
  </div>
</form>

Your tumblelog's readers can now comment on your posts! Run python manage.py runserver and go to http://localhost:8000/post/hello-world/ to see the following:


24.1.5 Add Site Administration Interface


While you may always add posts using the shell interface as above, you can easily create an administrative interface for posts with Django. Enable the admin by adding the following apps to INSTALLED_APPS in settings.py:

django.contrib.admin
django_mongodb_engine
djangotoolbox
tumblelog

Warning: This application does not require the Sites framework. As a result, remove django.contrib.sites from INSTALLED_APPS. If you need it later please read the SITE_ID issues document.

Create an admin.py file and register the Post model with the admin app:
from django.contrib import admin
from tumblelog.models import Post

admin.site.register(Post)

Note: The above modifications deviate from the default django-nonrel and djangotoolbox mode of operation. Django's administration module will not work unless you exclude the comments field. By making the comments field non-editable in the "admin" model definition, you will allow the administrative interface to function. If you need an administrative interface for a ListField you must write your own Form / Widget.

See Also: The Django Admin documentation for additional information.

Update the urls.py to enable the administrative interface. Add the import and discovery mechanism to the top of the file and then add the admin import rule to the urlpatterns:
# Enable admin
from django.contrib import admin
admin.autodiscover()

urlpatterns = patterns('',

    # ...

    url(r'^admin/', include(admin.site.urls)),
)

Finally, add a superuser and set up the indexes by issuing the following command at the system prompt:
python manage.py syncdb

Once done, run the server and you can log in to the admin by going to http://localhost:8000/admin/.


24.1.6 Convert the Blog to a Tumblelog


Currently, the application only supports posts. In this section you will add special post types including: Video, Image and Quote to provide a more traditional tumblelog application. Adding this data requires no migration. In models.py update the Post class to add new fields for the new post types. Mark these fields with blank=True so that the fields can be empty. Update Post in the models.py file to resemble the following:
POST_CHOICES = (
    ('p', 'post'),
    ('v', 'video'),
    ('i', 'image'),
    ('q', 'quote'),
)

class Post(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    title = models.CharField(max_length=255)
    slug = models.SlugField()
    comments = ListField(EmbeddedModelField('Comment'), editable=False)

    post_type = models.CharField(max_length=1, choices=POST_CHOICES, default='p')
    body = models.TextField(blank=True, help_text="The body of the Post / Quote")
    embed_code = models.TextField(blank=True, help_text="The embed code for video")
    image_url = models.URLField(blank=True, help_text="Image src")
    author = models.CharField(blank=True, max_length=255, help_text="Author name")

    def get_absolute_url(self):
        return reverse('post', kwargs={"slug": self.slug})

    def __unicode__(self):
        return self.title
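As a hedged aside, the POST_CHOICES tuple above pairs each stored one-character code with a human-readable label. The sketch below uses only plain Python (no Django) to show that mapping; the helper name label_for is hypothetical and not part of the tutorial.

```python
# Plain-Python sketch of how the choices tuple maps stored codes to labels.
POST_CHOICES = (
    ('p', 'post'),
    ('v', 'video'),
    ('i', 'image'),
    ('q', 'quote'),
)


# Hypothetical helper (not part of the tutorial): look up the label for a
# stored code, falling back to the default 'post' label for unknown codes.
def label_for(code):
    return dict(POST_CHOICES).get(code, 'post')


print(label_for('v'))
print(label_for('x'))
```

Only the short code is stored in each document, so the labels can change later without touching existing data.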

Note: Django-Nonrel doesn't support multi-table inheritance. This means that you will have to manually create an administrative form to handle data validation for the different post types. The Abstract Inheritance facility means that the view logic would need to merge data from multiple collections.

The administrative interface should now handle adding multiple types of post. To conclude this process, you must update the frontend display to handle and output the different post types. In the post_list.html file, change the post output display to resemble the following:
{% if post.post_type == 'p' %}
  <p>{{ post.body|truncatewords:20 }}</p>
{% endif %}
{% if post.post_type == 'v' %}
  {{ post.embed_code|safe }}
{% endif %}
{% if post.post_type == 'i' %}
  <p><img src="{{ post.image_url }}" /></p>
{% endif %}
{% if post.post_type == 'q' %}
  <blockquote>{{ post.body|truncatewords:20 }}</blockquote>
  <p>{{ post.author }}</p>
{% endif %}

In the post_detail.html file, change the output for full posts:

{% if post.post_type == 'p' %}
  <p>{{ post.body }}</p>
{% endif %}
{% if post.post_type == 'v' %}
  {{ post.embed_code|safe }}
{% endif %}
{% if post.post_type == 'i' %}
  <p><img src="{{ post.image_url }}" /></p>
{% endif %}
{% if post.post_type == 'q' %}
  <blockquote>{{ post.body }}</blockquote>
  <p>{{ post.author }}</p>
{% endif %}

Now you have a fully fledged tumblelog using Django and MongoDB!


24.2 Write a Tumblelog Application with Flask and MongoEngine


24.2.1 Introduction
This tutorial describes the process for creating a basic tumblelog application using the popular Flask Python web framework in conjunction with the MongoDB database. The tumblelog will consist of two parts:

1. A public site that lets people view posts and comment on them.
2. An admin site that lets you add and change posts.

This tutorial assumes that you are already familiar with Flask, have a basic familiarity with MongoDB, and have installed MongoDB (page 9). This tutorial uses MongoEngine as the Object Document Mapper (ODM); this component may simplify the interaction between Flask and MongoDB.

Where to get help

If you're having trouble going through this tutorial, please post a message to mongodb-user or join the IRC chat in #mongodb on irc.freenode.net to chat with other MongoDB users who might be able to help.

24.2.2 Installation
Begin by installing packages required by later steps in this tutorial.

Prerequisite

This tutorial uses pip to install packages and virtualenv to isolate Python environments. While these tools and this configuration are not required as such, they ensure a standard environment and are strongly recommended. Issue the following commands at the system prompt:

pip install virtualenv
virtualenv myproject

Respectively, these commands install the virtualenv program (using pip) and create an isolated Python environment for this project (named myproject). To activate the myproject environment at the system prompt, use the following command:
source myproject/bin/activate

Install Packages

Flask is a microframework, because it provides a small core of functionality and is highly extensible. For the tumblelog project, this tutorial includes Flask and the following extensions:

WTForms provides easy form handling.
Flask-MongoEngine provides integration between MongoEngine, Flask, and WTForms.
Flask-Script provides an easy-to-use development server.

Install with the following commands:

pip install flask
pip install flask-script
pip install WTForms
pip install mongoengine
pip install flask_mongoengine

Continue with the tutorial to begin building the tumblelog application.


24.2.3 Build a Blog to Get Started


First, create a simple bare-bones application. Make a directory named tumblelog for the project and then add the following content to a file named __init__.py:

from flask import Flask
app = Flask(__name__)


if __name__ == '__main__':
    app.run()

Next, create the manage.py file. [1] Use this file to load additional Flask-scripts in the future. Flask-scripts provides a development server and shell:

# Set the path
import os, sys
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

from flask.ext.script import Manager, Server
from tumblelog import app

manager = Manager(app)

# Turn on debugger by default and reloader
manager.add_command("runserver", Server(
    use_debugger = True,
    use_reloader = True,
    host = '0.0.0.0')
)

if __name__ == "__main__":
    manager.run()

You can run this application with a test server, by issuing the following command at the system prompt:
python manage.py runserver

There should be no errors, and you can visit http://localhost:5000/ in a web browser to view a page with a 404 message.

Configure MongoEngine and Flask

Install the Flask extension and add the configuration. Update tumblelog/__init__.py so that it resembles the following:

from flask import Flask
from flask.ext.mongoengine import MongoEngine

app = Flask(__name__)
app.config["MONGODB_DB"] = "my_tumble_log"
app.config["SECRET_KEY"] = "KeepThisS3cr3t"

db = MongoEngine(app)

if __name__ == '__main__':
    app.run()
[1] This concept will be familiar to users of Django.


See Also: The MongoEngine Settings documentation for additional configuration options.

Define the Schema

The first step in writing a tumblelog in Flask is to define the models or, in MongoDB's terminology, documents. In this application, you will define posts and comments, so that each Post can contain a list of Comments. Edit the models.py file so that it resembles the following:

import datetime
from flask import url_for
from tumblelog import db


class Post(db.Document):
    created_at = db.DateTimeField(default=datetime.datetime.now, required=True)
    title = db.StringField(max_length=255, required=True)
    slug = db.StringField(max_length=255, required=True)
    body = db.StringField(required=True)
    comments = db.ListField(db.EmbeddedDocumentField('Comment'))

    def get_absolute_url(self):
        return url_for('post', kwargs={"slug": self.slug})

    def __unicode__(self):
        return self.title

    meta = {
        'allow_inheritance': True,
        'indexes': ['-created_at', 'slug'],
        'ordering': ['-created_at']
    }


class Comment(db.EmbeddedDocument):
    created_at = db.DateTimeField(default=datetime.datetime.now, required=True)
    body = db.StringField(verbose_name="Comment", required=True)
    author = db.StringField(verbose_name="Name", max_length=255, required=True)
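The meta options in the listing above use a leading minus sign (as in -created_at) to request descending order. As a rough illustration only, the following plain-Python sketch shows the resulting "newest first" ordering on stand-in dictionaries; the sample records are invented for the example and no MongoEngine call is involved.

```python
import datetime

# Stand-in records; in the tutorial these would be Post documents.
posts = [
    {"slug": "first", "created_at": datetime.datetime(2012, 1, 1)},
    {"slug": "third", "created_at": datetime.datetime(2012, 3, 1)},
    {"slug": "second", "created_at": datetime.datetime(2012, 2, 1)},
]

# '-created_at' means "sort by created_at, descending": newest posts first.
newest_first = sorted(posts, key=lambda p: p["created_at"], reverse=True)
slugs = [p["slug"] for p in newest_first]
print(slugs)
```

Because the front page lists posts in this order, an index on the same field lets the database return them without sorting the whole collection on every request.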

As above, MongoEngine syntax is simple and declarative. If you have a Django background, the syntax may look familiar. This example defines two indexes for Post: one for the created_at date, as our front page will order by date, and another for the individual post slug.

Add Data with the Shell

The manage.py file provides a shell interface for the application that you can use to insert data into the tumblelog. Before configuring the urls and views for this application, you can use this interface to interact with the tumblelog. Begin by issuing the following command to load the Python shell:
python manage.py shell

Create the first post using the following sequence of operations:

>>> from tumblelog.models import *
>>> post = Post(
... title="Hello World!",
... slug="hello-world",
... body="Welcome to my new shiny Tumble log powered by MongoDB, MongoEngine, and Flask"
... )
>>> post.save()

Add comments using the following sequence of operations:


>>> post.comments
[]
>>> comment = Comment(
... author="Joe Bloggs",
... body="Great post! I'm looking forward to reading your blog!"
... )
>>> post.comments.append(comment)
>>> post.save()

Finally, inspect the post:


>>> post = Post.objects.get()
>>> post
<Post: Hello World!>
>>> post.comments
[<Comment: Comment object>]

Add the Views

Using Flask's class-based views system allows you to produce List and Detail views for tumblelog posts. Add views.py and create a posts blueprint:

from flask import Blueprint, request, redirect, render_template, url_for
from flask.views import MethodView
from tumblelog.models import Post, Comment

posts = Blueprint('posts', __name__, template_folder='templates')


class ListView(MethodView):

    def get(self):
        posts = Post.objects.all()
        return render_template('posts/list.html', posts=posts)


class DetailView(MethodView):

    def get(self, slug):
        post = Post.objects.get_or_404(slug=slug)
        return render_template('posts/detail.html', post=post)


# Register the urls
posts.add_url_rule('/', view_func=ListView.as_view('list'))
posts.add_url_rule('/<slug>/', view_func=DetailView.as_view('detail'))

Now in __init__.py register the blueprint, avoiding a circular dependency by registering the blueprints in a method. Add the following code:

def register_blueprints(app):
    # Prevents circular imports
    from tumblelog.views import posts
    app.register_blueprint(posts)

register_blueprints(app)

Add Templates

In the tumblelog directory add the templates and templates/posts directories to store the tumblelog templates:

mkdir -p templates/posts

Create a base template. All other templates will inherit from this template, which should exist in the templates/base.html file:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>My Tumblelog</title>
    <link href="http://twitter.github.com/bootstrap/1.4.0/bootstrap.css" rel="stylesheet">
    <style>.content {padding-top: 80px;}</style>
  </head>

  <body>

    {%- block topbar -%}
    <div class="topbar">
      <div class="fill">
        <div class="container">
          <h2>
            <a href="/" class="brand">My Tumblelog</a> <small>Starring Flask, MongoDB and MongoEngine</small>
          </h2>
        </div>
      </div>
    </div>
    {%- endblock -%}

    <div class="container">
      <div class="content">
        {% block page_header %}{% endblock %}
        {% block content %}{% endblock %}
      </div>
    </div>
    {% block js_footer %}{% endblock %}
  </body>
</html>

Continue by creating a landing page for the blog that will list all posts. Add the following to the templates/posts/list.html file:

{% extends "base.html" %}

{% block content %}
  {% for post in posts %}
    <h2><a href="{{ url_for('posts.detail', slug=post.slug) }}">{{ post.title }}</a></h2>
    <p>{{ post.body|truncate(100) }}</p>
    <p>
      {{ post.created_at.strftime('%H:%M %Y-%m-%d') }} |
      {% with total=post.comments|length %}
        {{ total }} comment {%- if total > 1 %}s{%- endif -%}
      {% endwith %}
    </p>
  {% endfor %}
{% endblock %}

Finally, add the templates/posts/detail.html template for the individual posts:

{% extends "base.html" %}

{% block page_header %}
  <div class="page-header">
    <h1>{{ post.title }}</h1>
  </div>
{% endblock %}

{% block content %}
  <p>{{ post.body }}</p>
  <p>{{ post.created_at.strftime('%H:%M %Y-%m-%d') }}</p>
  <hr>
  <h2>Comments</h2>
  {% if post.comments %}
    {% for comment in post.comments %}
      <p>{{ comment.body }}</p>
      <p><strong>{{ comment.author }}</strong> <small>on {{ comment.created_at.strftime('%H:%M %Y-%m-%d') }}</small></p>
      {{ comment.text }}
    {% endfor %}
  {% endif %}
{% endblock %}

At this point, you can run the python manage.py runserver command again to see your new tumblelog! Go to http://localhost:5000 to view the list of posts.

24.2.4 Add Comments to the Blog


In the next step you will provide the facility for readers of the tumblelog to comment on posts. To provide commenting, you will create a form using WTForms, update the view to handle the form data, and update the template to include the form.


Handle Comments in the View

Begin by updating and refactoring the views.py file so that it can handle the form. Add the import statement and the DetailView class to this file:

from flask.ext.mongoengine.wtf import model_form

...

class DetailView(MethodView):

    form = model_form(Comment, exclude=['created_at'])

    def get_context(self, slug):
        post = Post.objects.get_or_404(slug=slug)
        form = self.form(request.form)

        context = {
            "post": post,
            "form": form
        }
        return context

    def get(self, slug):
        context = self.get_context(slug)
        return render_template('posts/detail.html', **context)

    def post(self, slug):
        context = self.get_context(slug)
        form = context.get('form')

        if form.validate():
            comment = Comment()
            form.populate_obj(comment)

            post = context.get('post')
            post.comments.append(comment)
            post.save()

            return redirect(url_for('posts.detail', slug=slug))

        return render_template('posts/detail.html', **context)

Note: DetailView extends the default Flask MethodView. This code remains DRY by defining a get_context method to get the default context for both GET and POST requests. On POST, post() validates the comment: if valid, post() appends the comment to the post.

Add Comments to the Templates

Finally, you can add the form to the templates, so that readers can create comments. Creating a macro for the forms in templates/_forms.html will allow you to reuse the form code:

{% macro render(form) -%}
<fieldset>
{% for field in form %}
{% if field.type in ['CSRFTokenField', 'HiddenField'] %}
  {{ field() }}
{% else %}
  <div class="clearfix {% if field.errors %}error{% endif %}">
    {{ field.label }}
    <div class="input">
      {% if field.name == "body" %}
        {{ field(rows=10, cols=40) }}
      {% else %}
        {{ field() }}
      {% endif %}
      {% if field.errors or field.help_text %}
        <span class="help-inline">
        {% if field.errors %}
          {{ field.errors|join(' ') }}
        {% else %}
          {{ field.help_text }}
        {% endif %}
        </span>
      {% endif %}
    </div>
  </div>
{% endif %}
{% endfor %}
</fieldset>
{% endmacro %}

Add the comments form to templates/posts/detail.html. Insert an import statement at the top of the page and then output the form after displaying comments:

{% import "_forms.html" as forms %}

...

<hr>
<h2>Add a comment</h2>
<form action="." method="post">
  {{ forms.render(form) }}
  <div class="actions">
    <input type="submit" class="btn primary" value="comment">
  </div>
</form>

Your tumblelog's readers can now comment on your posts! Run python manage.py runserver to see the changes.


24.2.5 Add a Site Administration Interface


While you may always add posts using the shell interface as above, in this step you will add an administrative interface for the tumblelog site. To add the administrative interface you will add authentication and an additional view. This tutorial only addresses adding and editing posts: a delete view and detection of slug collisions are beyond the scope of this tutorial.

Add Basic Authentication

For the purposes of this tutorial all we need is a very basic form of authentication. The following example borrows from an example Flask Auth snippet. Create the file auth.py with the following content:
from functools import wraps
from flask import request, Response


def check_auth(username, password):
    """This function is called to check if a username /
    password combination is valid.
    """
    return username == 'admin' and password == 'secret'


def authenticate():
    """Sends a 401 response that enables basic auth"""
    return Response(
    'Could not verify your access level for that URL.\n'
    'You have to login with proper credentials', 401,
    {'WWW-Authenticate': 'Basic realm="Login Required"'})


def requires_auth(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        auth = request.authorization
        if not auth or not check_auth(auth.username, auth.password):
            return authenticate()
        return f(*args, **kwargs)
    return decorated

Note: This creates a requires_auth decorator that provides basic authentication. Decorate any view that needs authentication with this decorator. The username is admin and the password is secret.
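To see how a decorator of this kind gates a view without running Flask, here is a minimal standalone sketch. It is an illustration under stated assumptions rather than the tutorial's code: the Credentials class and the current dictionary are hypothetical stand-ins for Flask's request.authorization, and the decorator takes a credential-lookup function so that it can run outside a request.

```python
from functools import wraps


# Hypothetical stand-in for the object Flask exposes as request.authorization.
class Credentials(object):
    def __init__(self, username, password):
        self.username = username
        self.password = password


def check_auth(username, password):
    return username == 'admin' and password == 'secret'


def requires_auth(get_credentials):
    """Build a decorator that asks get_credentials() for the caller's login."""
    def decorator(f):
        @wraps(f)  # keeps the wrapped view's name and docstring
        def decorated(*args, **kwargs):
            auth = get_credentials()
            if not auth or not check_auth(auth.username, auth.password):
                return (401, 'Login Required')
            return (200, f(*args, **kwargs))
        return decorated
    return decorator


current = {'auth': None}  # simulates the credentials on the current request


@requires_auth(lambda: current['auth'])
def admin_index():
    return 'post list'


anonymous = admin_index()  # no credentials supplied: rejected
current['auth'] = Credentials('admin', 'secret')
logged_in = admin_index()  # valid credentials: the view runs
print(anonymous, logged_in)
```

The wrapped view never runs unless the check passes, which is why decorating a view is enough to protect it.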

Write an Administrative View

Create the views and admin blueprint in admin.py. The following view is deliberately generic, to facilitate customization.

from flask import Blueprint, request, redirect, render_template, url_for
from flask.views import MethodView

from flask.ext.mongoengine.wtf import model_form

from tumblelog.auth import requires_auth
from tumblelog.models import Post, Comment

admin = Blueprint('admin', __name__, template_folder='templates')


class List(MethodView):
    decorators = [requires_auth]
    cls = Post

    def get(self):
        posts = self.cls.objects.all()
        return render_template('admin/list.html', posts=posts)


class Detail(MethodView):

    decorators = [requires_auth]

    def get_context(self, slug=None):
        form_cls = model_form(Post, exclude=('created_at', 'comments'))

        if slug:
            post = Post.objects.get_or_404(slug=slug)
            if request.method == 'POST':
                form = form_cls(request.form, inital=post._data)
            else:
                form = form_cls(obj=post)
        else:
            post = Post()
            form = form_cls(request.form)

        context = {
            "post": post,
            "form": form,
            "create": slug is None
        }
        return context

    def get(self, slug):
        context = self.get_context(slug)
        return render_template('admin/detail.html', **context)

    def post(self, slug):
        context = self.get_context(slug)
        form = context.get('form')

        if form.validate():
            post = context.get('post')
            form.populate_obj(post)
            post.save()

            return redirect(url_for('admin.index'))
        return render_template('admin/detail.html', **context)


# Register the urls
admin.add_url_rule('/admin/', view_func=List.as_view('index'))
admin.add_url_rule('/admin/create/', defaults={'slug': None}, view_func=Detail.as_view('create'))
admin.add_url_rule('/admin/<slug>/', view_func=Detail.as_view('edit'))

Note: Here, the List and Detail views are similar to the frontend of the site; however, requires_auth decorates both views. The Detail view is slightly more complex: to set the context, this view checks for a slug and if there is no slug, Detail uses the view for creating a new post. If a slug exists, Detail uses the view for editing an existing post.


In the __init__.py file update the register_blueprints() method to import the new admin blueprint.

def register_blueprints(app):
    # Prevents circular imports
    from tumblelog.views import posts
    from tumblelog.admin import admin
    app.register_blueprint(posts)
    app.register_blueprint(admin)

Create Administrative Templates

Similar to the user-facing portion of the site, the administrative section of the application requires three templates: a base template, a list view, and a detail view. Create an admin directory for the templates. Add a simple main index page for the admin in the templates/admin/base.html file:

{% extends "base.html" %}

{%- block topbar -%}
<div class="topbar" data-dropdown="dropdown">
  <div class="fill">
    <div class="container">
      <h2>
        <a href="{{ url_for('admin.index') }}" class="brand">My Tumblelog Admin</a>
      </h2>
      <ul class="nav secondary-nav">
        <li class="menu">
          <a href="{{ url_for("admin.create") }}" class="btn primary">Create new post</a>
        </li>
      </ul>
    </div>
  </div>
</div>
{%- endblock -%}

List all the posts in the templates/admin/list.html file:

{% extends "admin/base.html" %}

{% block content %}
  <table class="condensed-table zebra-striped">
    <thead>
      <th>Title</th>
      <th>Created</th>
      <th>Actions</th>
    </thead>
    <tbody>
    {% for post in posts %}
      <tr>
        <th><a href="{{ url_for('admin.edit', slug=post.slug) }}">{{ post.title }}</a></th>
        <td>{{ post.created_at.strftime('%Y-%m-%d') }}</td>
        <td><a href="{{ url_for("admin.edit", slug=post.slug) }}" class="btn primary">Edit</a></td>
      </tr>
    {% endfor %}
    </tbody>
  </table>
{% endblock %}

Add a template to create and edit posts in the templates/admin/detail.html file:

{% extends "admin/base.html" %}
{% import "_forms.html" as forms %}

{% block content %}
  <h2>
    {% if create %}
      Add new Post
    {% else %}
      Edit Post
    {% endif %}
  </h2>

  <form action="?{{ request.query_string }}" method="post">
    {{ forms.render(form) }}
    <div class="actions">
      <input type="submit" class="btn primary" value="save">
      <a href="{{ url_for("admin.index") }}" class="btn secondary">Cancel</a>
    </div>
  </form>
{% endblock %}

The administrative interface is ready for use. Restart the test server (i.e. runserver) so that you can log in to the administrative interface located at http://localhost:5000/admin/. (The username is admin and the password is secret.)

24.2.6 Converting the Blog to a Tumblelog


Currently, the application only supports posts. In this section you will add special post types, including Video, Image, and Quote, to provide a more traditional tumblelog application. Adding this data requires no migration because MongoEngine supports document inheritance.

Begin by refactoring the Post class to operate as a base class and create new classes for the new post types. Update the models.py file to include the code to replace the old Post class:

class Post(db.DynamicDocument):
    created_at = db.DateTimeField(default=datetime.datetime.now, required=True)
    title = db.StringField(max_length=255, required=True)
    slug = db.StringField(max_length=255, required=True)
    comments = db.ListField(db.EmbeddedDocumentField('Comment'))

    def get_absolute_url(self):
        return url_for('post', kwargs={"slug": self.slug})

    def __unicode__(self):
        return self.title

    @property
    def post_type(self):
        return self.__class__.__name__

    meta = {
        'allow_inheritance': True,
        'indexes': ['-created_at', 'slug'],
        'ordering': ['-created_at']
    }


class BlogPost(Post):
    body = db.StringField(required=True)


class Video(Post):
    embed_code = db.StringField(required=True)


class Image(Post):
    image_url = db.StringField(required=True, max_length=255)


class Quote(Post):
    body = db.StringField(required=True)
    author = db.StringField(verbose_name="Author Name", required=True, max_length=255)
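The post_type property in the listing above relies on ordinary Python inheritance. The following standalone sketch shows the same idea with plain classes and no MongoEngine; the constructor and sample titles are assumptions made only for this illustration.

```python
class Post(object):
    def __init__(self, title):
        self.title = title

    @property
    def post_type(self):
        # Same idea as the tutorial's helper: report the concrete class name.
        return self.__class__.__name__


class BlogPost(Post):
    pass


class Quote(Post):
    pass


posts = [BlogPost("Hello World!"), Quote("Wise words")]
types = [p.post_type for p in posts]
print(types)
```

Because the property is defined once on the base class, every subclass reports its own name, and a template can branch on that value without any type checks in the view.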

Note: In the Post class the post_type helper returns the class name, which will make it possible to render the different post types in the templates.

As MongoEngine handles returning the correct classes when fetching Post objects, you do not need to modify the interface view logic: only modify the templates. Update the templates/posts/list.html file and change the post output format as follows:

{% if post.body %}
  {% if post.post_type == 'Quote' %}
    <blockquote>{{ post.body|truncate(100) }}</blockquote>
    <p>{{ post.author }}</p>
  {% else %}
    <p>{{ post.body|truncate(100) }}</p>
  {% endif %}
{% endif %}
{% if post.embed_code %}
  {{ post.embed_code|safe() }}
{% endif %}
{% if post.image_url %}
  <p><img src="{{ post.image_url }}" /></p>
{% endif %}

In the templates/posts/detail.html file, change the output for full posts as follows:

{% if post.body %}
  {% if post.post_type == 'Quote' %}
    <blockquote>{{ post.body }}</blockquote>
    <p>{{ post.author }}</p>
  {% else %}
    <p>{{ post.body }}</p>
  {% endif %}
{% endif %}
{% if post.embed_code %}
  {{ post.embed_code|safe() }}
{% endif %}
{% if post.image_url %}
  <p><img src="{{ post.image_url }}" /></p>
{% endif %}

Updating the Administration

In this section you will update the administrative interface to support the new post types. Begin by updating the admin.py file to import the new document models, and then update get_context() in the Detail class to dynamically create the correct model form to use:

from tumblelog.models import Post, BlogPost, Video, Image, Quote, Comment

# ...

class Detail(MethodView):

    decorators = [requires_auth]
    # Map post types to models
    class_map = {
        'post': BlogPost,
        'video': Video,
        'image': Image,
        'quote': Quote,
    }

    def get_context(self, slug=None):

        if slug:
            post = Post.objects.get_or_404(slug=slug)
            # Handle old posts types as well
            cls = post.__class__ if post.__class__ != Post else BlogPost
            form_cls = model_form(cls, exclude=('created_at', 'comments'))
            if request.method == 'POST':
                form = form_cls(request.form, inital=post._data)
            else:
                form = form_cls(obj=post)
        else:
            # Determine which post type we need
            cls = self.class_map.get(request.args.get('type', 'post'))
            post = cls()
            form_cls = model_form(cls, exclude=('created_at', 'comments'))
            form = form_cls(request.form)
        context = {
            "post": post,
            "form": form,
            "create": slug is None
        }
        return context

    # ...
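The class_map lookup above is a plain dictionary dispatch keyed on the type query parameter. Note that class_map.get() returns None for an unrecognised type, in which case the subsequent cls() call would fail. The sketch below is a hedged, standalone illustration of the same dispatch with a fallback; model_for is a hypothetical helper, the empty classes stand in for the document models, and a plain dict stands in for request.args.

```python
# Empty stand-ins for the tutorial's document classes.
class BlogPost(object):
    pass


class Video(object):
    pass


class_map = {'post': BlogPost, 'video': Video}


def model_for(args):
    """Pick a model class; args is a plain dict standing in for request.args."""
    requested = args.get('type', 'post')
    # Fall back to BlogPost for unrecognised values instead of returning None.
    return class_map.get(requested, BlogPost)


default_cls = model_for({})
video_cls = model_for({'type': 'video'})
unknown_cls = model_for({'type': 'podcast'})
print(default_cls.__name__, video_cls.__name__, unknown_cls.__name__)
```

Whether to fall back silently or to answer with an error for an unknown type is a design choice; the tutorial code does neither, so treat the fallback here as one possible option.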

Update the templates/admin/base.html file to create a new post drop-down menu in the toolbar:

{% extends "base.html" %}

{%- block topbar -%}
<div class="topbar" data-dropdown="dropdown">
  <div class="fill">
    <div class="container">
      <h2>
        <a href="{{ url_for('admin.index') }}" class="brand">My Tumblelog Admin</a>
      </h2>
      <ul class="nav secondary-nav">
        <li class="menu">
          <a href="#" class="menu">Create new</a>
          <ul class="menu-dropdown">
            {% for type in ('post', 'video', 'image', 'quote') %}
              <li><a href="{{ url_for("admin.create", type=type) }}">{{ type|title }}</a></li>
            {% endfor %}
          </ul>
        </li>
      </ul>
    </div>
  </div>
</div>
{%- endblock -%}

{% block js_footer %}
  <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
  <script src="http://twitter.github.com/bootstrap/1.4.0/bootstrap-dropdown.js"></script>
{% endblock %}

Now you have a fully fledged tumblelog using Flask and MongoEngine!


Part XI

Frequently Asked Questions


CHAPTER

TWENTYFIVE

FAQ: MONGODB FUNDAMENTALS


This document answers basic questions about MongoDB. If you don't find the answer you're looking for, check the complete list of FAQs (page 349) or post your question to the MongoDB User Mailing List.

Frequently Asked Questions:
What kind of Database is MongoDB?
What languages can I use to work with MongoDB?
Does MongoDB support SQL?
What are typical uses for MongoDB?
Does MongoDB support transactions?
Does MongoDB require a lot of RAM?
How do I configure the cache size?
Are writes written to disk immediately, or lazily?
Does MongoDB handle caching?
What language is MongoDB written in?
What are the 32-bit limitations?

25.1 What kind of Database is MongoDB?


MongoDB is a document-oriented DBMS. Think of MySQL but with JSON-like objects comprising the data model, rather than RDBMS tables. Significantly, MongoDB supports neither joins nor transactions. However, it features secondary indexes, an expressive query language, atomic writes on a per-document level, and fully-consistent reads.

Operationally, MongoDB features master-slave replication with automated failover and built-in horizontal scaling via automated range-based partitioning.

Note: MongoDB uses BSON, a binary object format similar to, but more expressive than, JSON.
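One concrete way in which BSON is more expressive than JSON is that BSON has a native date type, whereas JSON does not. The following sketch uses only Python's standard json module to show the JSON side of that difference; the sample document is invented for the illustration and no MongoDB driver is involved.

```python
import datetime
import json

# Invented sample document with a date value.
doc = {
    "title": "Hello World!",
    "created_at": datetime.datetime(2012, 9, 13, 12, 0),
}

try:
    json.dumps(doc)
    json_ok = True
except TypeError:
    # The json module has no native encoding for datetime values.
    json_ok = False

# A common JSON workaround is to store the date as an ISO 8601 string.
encoded = json.dumps(
    {"title": doc["title"], "created_at": doc["created_at"].isoformat()},
    sort_keys=True,
)
print(json_ok, encoded)
```

With BSON the date can be stored as a date, so an application does not need to agree on a string convention before it can sort or compare by time.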

25.2 What languages can I use to work with MongoDB?


MongoDB client drivers exist for all of the most popular programming languages, and many of the less popular ones. See the latest list of drivers for details.

See Also: Drivers (page 225).

25.3 Does MongoDB support SQL?


No. However, MongoDB does support a rich, ad-hoc query language of its own.

See Also: The query /reference/operators document and the Query Overview and the Tour pages from the wiki.

25.4 What are typical uses for MongoDB?


MongoDB has a general-purpose design, making it appropriate for a large number of use cases. Examples include content management systems, mobile apps, gaming, e-commerce, analytics, archiving, and logging. Do not use MongoDB for systems that require SQL, joins, or multi-object transactions.

25.5 Does MongoDB support transactions?


MongoDB does not provide ACID transactions. However, MongoDB does provide some basic transactional capabilities. Atomic operations are possible within the scope of a single document: that is, we can debit a and credit b as a transaction if they are fields within the same document. Because documents can be rich, some documents contain thousands of fields, with support for testing fields in sub-documents.

Additionally, you can make writes in MongoDB durable (the "D" in ACID). To get durable writes, you must enable journaling, which is on by default in 64-bit builds. You must also issue writes with a write concern of {j: true} to ensure that the writes block until the journal has synced to disk.

Users have built successful e-commerce systems using MongoDB, but applications requiring multi-object commit with rollback generally aren't feasible.
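The single-document case described above (debit a, credit b) can be expressed as one update document using the $inc operator. The sketch below is a plain-Python simulation, not a driver call: apply_inc is a hypothetical helper that mimics how $inc changes fields, and the account document is invented for the example. In a real application the update document would be passed to your driver's update method.

```python
import copy

# Invented sample document holding both balances as fields.
account = {"_id": 1, "a": 100, "b": 20}

# One update document: both fields change in the same single-document write.
transfer = {"$inc": {"a": -50, "b": 50}}


def apply_inc(doc, update):
    """Simulate $inc on a copy, so the caller sees all changes or none."""
    result = copy.deepcopy(doc)
    for field, amount in update["$inc"].items():
        result[field] = result.get(field, 0) + amount
    return result


updated = apply_inc(account, transfer)
print(updated)
```

The point of the sketch is the shape of the update: because both fields live in one document and are named in one update, there is no intermediate state in which only the debit has been applied.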

25.6 Does MongoDB require a lot of RAM?


Not necessarily. It's certainly possible to run MongoDB on a machine with a small amount of free RAM. MongoDB automatically uses all free memory on the machine as its cache. System resource monitors show that MongoDB uses a lot of memory, but its usage is dynamic. If another process suddenly needs half the server's RAM, MongoDB will yield cached memory to the other process.

Technically, the operating system's virtual memory subsystem manages MongoDB's memory. This means that MongoDB will use as much free memory as it can, swapping to disk as needed. Deployments with enough memory to fit the application's working data set in RAM will achieve the best performance.


25.7 How do I configure the cache size?


MongoDB has no configurable cache. MongoDB uses all free memory on the system automatically by way of memory-mapped files. Operating systems use the same approach with their file system caches.

25.8 Are writes written to disk immediately, or lazily?


Writes are physically written to the journal within 100 milliseconds. At that point, the write is durable in the sense that after a pull-plug-from-wall event, the data will still be recoverable after a hard restart.

While the journal commit is nearly instant, MongoDB writes to the data files lazily. MongoDB may wait to write data to the data files for as much as one minute. This does not affect durability, as the journal has enough information to ensure crash recovery.

25.9 Does MongoDB handle caching?


Yes. MongoDB keeps all of the most recently used data in RAM. If you have created indexes for your queries and your working data set fits in RAM, MongoDB serves all queries from memory.

MongoDB does not implement a query cache: MongoDB serves all queries directly from the indexes and/or data files.

25.10 What language is MongoDB written in?


MongoDB is implemented in C++. Drivers and client libraries are typically written in their respective languages, although some drivers use C extensions for better performance.

25.11 What are the 32-bit limitations?


MongoDB uses memory-mapped files. When running a 32-bit build of MongoDB, the total storage size for the server, including data and indexes, is 2 gigabytes. For this reason, do not deploy MongoDB to production on 32-bit machines.

If you're running a 64-bit build of MongoDB, there's virtually no limit to storage size. For production deployments, 64-bit builds and operating systems are strongly recommended.

See Also: Blog Post: 32-bit Limitations

Note: 32-bit builds disable journaling by default because journaling further limits the maximum amount of data that the database can store.


CHAPTER

TWENTYSIX

FAQ: MONGODB FOR APPLICATION DEVELOPERS


This document answers common questions about application development using MongoDB. If you don't find the answer you're looking for, check the complete list of FAQs (page 349) or post your question to the MongoDB User Mailing List.

Frequently Asked Questions:
• What is a namespace?
• How do you copy all objects from one collection to another?
• If you remove a document, does MongoDB remove it from disk?
• When does MongoDB write updates to disk?
• How do I do transactions and locking in MongoDB?
• How do you aggregate data with MongoDB?
• Why does MongoDB log so many Connection Accepted events?
• Does MongoDB run on Amazon EBS?
• Why are MongoDB's data files so large?
• How does MongoDB address SQL or Query injection?
    • BSON
    • JavaScript
    • Dollar Sign Operator Escaping
    • Driver-Specific Issues
• How does MongoDB provide concurrency?
• What is the compare order for BSON types?

26.1 What is a namespace?


A namespace is the concatenation of the database name and the collection name with a period character in between. Collections are containers for documents that share one or more indexes. Databases are groups of collections stored on disk in a single collection of data files. For an example acme.users namespace, acme is the database name and users is the collection name. Period characters can occur in collection names, so acme.user.history is a valid namespace, with acme as the database name and user.history as the collection name.
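The splitting rule described above can be sketched in JavaScript. This is only an illustration: splitNamespace is a hypothetical helper, not part of MongoDB or the mongo shell, and it assumes that the first period always ends the database name.

```javascript
// Split a namespace string into its database and collection parts.
// Only the first period separates the two; any later periods belong
// to the collection name. splitNamespace is a hypothetical helper.
function splitNamespace(ns) {
  const dot = ns.indexOf(".");
  if (dot <= 0 || dot === ns.length - 1) {
    throw new Error("not a valid namespace: " + ns);
  }
  return { db: ns.slice(0, dot), collection: ns.slice(dot + 1) };
}

console.log(splitNamespace("acme.users"));
console.log(splitNamespace("acme.user.history"));
```

For acme.user.history the helper reports acme as the database and user.history as the collection, matching the example in the text.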


26.2 How do you copy all objects from one collection to another?
In the mongo shell, you can use the following operation to duplicate the entire collection:
db.people.find().forEach( function(x){db.user.insert(x)} );

Note: Because this process decodes BSON documents to JSON during the copy procedure, you may incur a loss of type fidelity. Consider using mongodump and mongorestore to maintain type fidelity. Also consider the cloneCollection command that may provide some of this functionality.

26.3 If you remove a document, does MongoDB remove it from disk?


Yes. When you use db.collection.remove() (page 462), the object will no longer exist in MongoDB's on-disk data storage.

26.4 When does MongoDB write updates to disk?


MongoDB flushes writes to disk on a regular interval. In the default configuration, MongoDB writes data to the main data files on disk every 60 seconds and commits the journal every 100 milliseconds. These values are configurable with the journalCommitInterval (page 498) and syncdelay (page 500) settings. These values represent the maximum amount of time between the completion of a write operation and the point when the write is durable in the journal, if enabled, and when MongoDB flushes data to the disk. In many cases MongoDB and the operating system flush data to disk more frequently, so that the above values represent a theoretical maximum. However, by default, MongoDB uses a lazy strategy to write to disk. This is advantageous in situations where the database receives a thousand increments to an object within one second: MongoDB only needs to flush this data to disk once. In addition to the aforementioned configuration options, you can also use fsync and getLastError to modify this strategy.

26.5 How do I do transactions and locking in MongoDB?


MongoDB does not have support for traditional locking or complex transactions with rollback. MongoDB aims to be lightweight, fast, and predictable in its performance. This is similar to the MySQL MyISAM autocommit model. By keeping transaction support extremely simple, MongoDB can provide greater performance especially for partitioned or replicated systems with a number of database server processes. MongoDB does have support for atomic operations within a single document. Given the possibilities provided by nested documents, this feature provides support for a large number of use-cases. See Also: The Atomic Operations wiki page.


26.6 How do you aggregate data with MongoDB?


In version 2.1 and later, you can use the new aggregation framework (page 199), with the aggregate command. MongoDB also supports map-reduce with the mapReduce command, as well as basic aggregation with the group, count, and distinct commands. See Also: The Aggregation wiki page.

26.7 Why does MongoDB log so many Connection Accepted events?


If you see a very large number of connection and re-connection messages in your MongoDB log, then clients are frequently connecting to and disconnecting from the MongoDB server. This is normal behavior for applications that do not use request pooling, such as CGI. Consider using FastCGI, an Apache Module, or some other kind of persistent application server to decrease the connection overhead. If these connections do not impact your performance, you can use the run-time quiet (page 495) option or the command-line option --quiet (page 485) to suppress these messages from the log.

26.8 Does MongoDB run on Amazon EBS?


Yes. MongoDB users of all sizes have had a great deal of success using MongoDB on the EC2 platform using EBS disks. See Also: The MongoDB on the Amazon Platform wiki page.

26.9 Why are MongoDB's data files so large?


MongoDB aggressively preallocates data files to reserve space and avoid file system fragmentation. You can use the smallfiles (page 500) flag to modify the file preallocation strategy. See Also: This wiki page that addresses MongoDB disk use.

26.10 How does MongoDB address SQL or Query injection?


26.10.1 BSON
As a client program assembles a query in MongoDB, it builds a BSON object, not a string. Thus traditional SQL injection attacks are not a problem. More details and some nuances are covered below. MongoDB represents queries as BSON objects. Typically client libraries (page 225) provide a convenient, injection-free process to build these objects. Consider the following C++ example:

BSONObj my_query = BSON( "name" << a_name );
auto_ptr<DBClientCursor> cursor = c.query("tutorial.persons", my_query);

Here, my_query will then have a value such as { name : "Joe" }. If my_query contained special characters, for example ,, :, and {, the query simply wouldn't match any documents. For example, users cannot hijack a query and convert it to a delete.
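The same property can be sketched in JavaScript, assuming a driver that accepts plain objects as query documents. buildNameQuery is a hypothetical helper used only for illustration. The point is that user input only ever fills a value position in the query structure.

```javascript
// Build the query as a data structure rather than as a string.
// buildNameQuery is a hypothetical helper. String() also keeps an
// object-valued input from being read as an operator document.
function buildNameQuery(userInput) {
  return { name: String(userInput) };
}

const hostile = 'Joe" }); db.persons.remove({ "x": "';
const query = buildNameQuery(hostile);

// The query still has exactly one key, and the hostile text is only
// the string that "name" is compared against.
console.log(Object.keys(query));      // [ 'name' ]
console.log(query.name === hostile);  // true
```

A query built this way would simply fail to match any document whose name is not that exact string.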

26.10.2 JavaScript
All of the following MongoDB operations permit you to run arbitrary JavaScript expressions directly on the server:
• $where
• db.eval() (page 466)
• mapReduce
• group

You must exercise care in these cases to prevent users from submitting malicious JavaScript. Fortunately, you can express most queries in MongoDB without JavaScript and for queries that require JavaScript, you can mix JavaScript and non-JavaScript in a single query. Place all the user-supplied fields directly in a BSON field and pass JavaScript code to the $where field. If you need to pass user-supplied values in a $where clause, you may escape these values with the CodeWScope mechanism. When you set user-submitted values as variables in the scope document, you can avoid evaluating them on the database server. If you need to use db.eval() (page 466) with user-supplied values, you can either use a CodeWScope or you can supply extra arguments to your function. For instance:
db.eval(function(userVal){...}, user_value);

This will ensure that your application sends user_value to the database server as data rather than code.

26.10.3 Dollar Sign Operator Escaping


Field names in MongoDB's query language have semantic meaning. The dollar sign (i.e. $) is a reserved character used to represent operators (e.g. $inc). Thus, you should ensure that your application's users cannot inject operators into their inputs. In some cases, you may wish to build a BSON object with a user-provided key. In these situations, keys will need to substitute the reserved $ and . characters. Any character is sufficient, but consider using the Unicode full width equivalents: U+FF04 (i.e. ＄) and U+FF0E (i.e. ．). Consider the following example:
BSONObj my_object = BSON( a_key << a_name );

The user may have supplied a $ value in the a_key value. At the same time, my_object might be { $where : "things" }. Consider the following cases:
• Insert. Inserting this into the database does no harm. The insert process does not evaluate the object as a query.
  Note: MongoDB client drivers, if properly implemented, check for reserved characters in keys on inserts.


• Update. The db.collection.update() (page 464) operation permits $ operators in the update argument but does not support the $where operator. Still, some users may be able to inject operators that can manipulate a single document only. Therefore your application should escape keys, as mentioned above, if reserved characters are possible.
• Query. Generally this is not a problem for queries that resemble { x : user_obj }: dollar signs are not top level and have no effect. Theoretically it may be possible for the user to build a query themselves. But checking the user-submitted content for $ characters in key names may help protect against this kind of injection.
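The key substitution described in this section can be sketched as follows. escapeKey is a hypothetical helper, and the choice of the full-width characters U+FF04 and U+FF0E follows the suggestion above. Any other replacement characters would work as long as the application applies them consistently.

```javascript
// Replace the reserved "$" and "." characters in a user-provided key
// with their Unicode full-width equivalents.
// escapeKey is a hypothetical helper, not a driver function.
function escapeKey(key) {
  return String(key)
    .replace(/\$/g, "\uFF04")   // "$" becomes U+FF04
    .replace(/\./g, "\uFF0E");  // "." becomes U+FF0E
}

const userKey = "$where";
const doc = {};
doc[escapeKey(userKey)] = "things";

// The stored key no longer starts with the reserved ASCII dollar sign.
console.log(Object.keys(doc)[0].charCodeAt(0).toString(16)); // ff04
```

An application that escapes keys this way would need the inverse substitution when it reads the keys back, which this sketch leaves out.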

26.10.4 Driver-Specific Issues


See the PHP MongoDB Driver Security Notes page in the PHP driver documentation for more information.

26.11 How does MongoDB provide concurrency?


MongoDB implements a server-wide reader-writer lock. This means that at any one time, only one client may be writing or any number of clients may be reading, but that reading and writing cannot occur simultaneously. In standalone and replica sets the lock's scope applies to a single mongod instance or primary instance. In a shard cluster, locks apply to each individual shard, not to the whole cluster. A more granular approach to locking will appear in MongoDB v2.2. For now, several yielding optimizations exist to mitigate the coarseness of the lock. These include:
• Yielding on long operations. Queries and updates that operate on multiple documents may yield to writers.
• Yielding on page faults. If an update or query is likely to trigger a page fault, then the operation will yield to keep from blocking other clients for the duration of the page fault.
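The admission rule of a reader-writer lock can be illustrated with a toy model in JavaScript. This is a sketch of the rule only (many readers or one writer, never both). It is not how mongod implements its lock, and it does not model yielding or waiting.

```javascript
// Toy model of a reader-writer lock: any number of readers, or exactly
// one writer, but never readers and a writer at the same time.
class ReaderWriterLock {
  constructor() {
    this.readers = 0;
    this.writer = false;
  }
  tryRead() {
    if (this.writer) return false;  // a writer excludes all readers
    this.readers += 1;
    return true;
  }
  tryWrite() {
    if (this.writer || this.readers > 0) return false;
    this.writer = true;
    return true;
  }
  releaseRead() {
    if (this.readers > 0) this.readers -= 1;
  }
  releaseWrite() {
    this.writer = false;
  }
}

const lock = new ReaderWriterLock();
console.log(lock.tryRead());   // true
console.log(lock.tryRead());   // true
console.log(lock.tryWrite());  // false
lock.releaseRead();
lock.releaseRead();
console.log(lock.tryWrite());  // true
console.log(lock.tryRead());   // false
```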

26.12 What is the compare order for BSON types?


MongoDB permits documents within a single collection to have fields with different BSON types. For instance, the following documents may exist within a single collection.
{ x: "string" }
{ x: 42 }

When comparing values of different BSON types, MongoDB uses the following compare order:
1. Null
2. Numbers (ints, longs, doubles)
3. Symbol, String
4. Object
5. Array
6. BinData
7. ObjectID
8. Boolean
9. Date, Timestamp
10. Regular Expression

Note: MongoDB treats some types as equivalent for comparison purposes. For instance, numeric types undergo conversion before comparison.

Consider the following mongo example:
db.test.insert( {x : 3} );
db.test.insert( {x : 2.9} );
db.test.insert( {x : new Date()} );
db.test.insert( {x : true } );
db.test.find().sort({x:1});
{ "_id" : ObjectId("4b03155dce8de6586fb002c7"), "x" : 2.9 }
{ "_id" : ObjectId("4b03154cce8de6586fb002c6"), "x" : 3 }
{ "_id" : ObjectId("4b031566ce8de6586fb002c9"), "x" : true }
{ "_id" : ObjectId("4b031563ce8de6586fb002c8"), "x" : "Tue Nov 17 2009 16:28:03 GMT-0500 (EST)" }
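The cross-type ordering can be approximated in plain JavaScript. The sketch below is an illustration under stated assumptions: typeRank and compareValues are hypothetical helpers, only types with a direct JavaScript counterpart are covered, and comparison within objects and arrays is not modeled.

```javascript
// Rank a JavaScript value by the compare order listed in this section.
// BinData, ObjectID, Symbol and Timestamp have no plain JavaScript
// counterpart here and are left out of this sketch.
function typeRank(v) {
  if (v === null) return 1;
  if (typeof v === "number") return 2;
  if (typeof v === "string") return 3;
  if (typeof v === "boolean") return 8;
  if (v instanceof RegExp) return 10;
  if (v instanceof Date) return 9;
  if (Array.isArray(v)) return 5;
  if (typeof v === "object") return 4;
  throw new Error("type not covered by this sketch");
}

function compareValues(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;          // different types: rank decides
  if (ra === 2 || ra === 9) return a - b; // numbers and dates
  if (ra === 3) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 8) return Number(a) - Number(b);
  return 0;                               // not modeled in this sketch
}

const values = [true, 3, new Date(0), 2.9, "string", null];
console.log(values.sort(compareValues));
```

Sorting the sample array places null first, then 2.9 and 3, then the string, the boolean, and the date, which agrees with the shell output above for the types both examples share.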

Warning: Mixing types for the same field is not encouraged.

The $type operator provides access to BSON type comparison in the MongoDB query syntax. See the documentation on BSON types and the $type operator for additional information.

Warning: Storing values of different types in the same field in a collection is strongly discouraged.

See Also: The Tailable Cursors wiki page for an example of a C++ use of MinKey. The jsobj.h source file for the definition of MinKey and MaxKey.


CHAPTER

TWENTYSEVEN

FAQ: SHARDING WITH MONGODB


This document answers common questions about horizontal scaling using MongoDB's sharding. If you don't find the answer you're looking for, check the sharding docs or post your question to the MongoDB User Mailing List.

Frequently Asked Questions:
• Is sharding appropriate for a new deployment?
• How does sharding work with replication?
• What happens to unsharded collections in sharded databases?
• How does MongoDB distribute data across shards?
• What happens if a client updates a document in a chunk during a migration?
• What happens to queries if a shard is inaccessible or slow?
• How does MongoDB distribute queries among shards?
• How does MongoDB sort queries in sharded environments?
• How does MongoDB ensure unique _id field values when using a shard key other than _id?
• I've enabled sharding and added a second shard, but all the data is still on one server. Why?
• Is it safe to remove old files in the moveChunk directory?
• How many connections does each mongos need?
• Why does mongos hold connections?
• Where does MongoDB report on connections used by mongos?
• What does writebacklisten in the log mean?
• How should administrators deal with failed migrations?
• What is the process for moving, renaming, or changing the number of config servers?
• When do the mongos servers detect config server changes?
• Is it possible to quickly update mongos servers after updating a replica set configuration?
• What does the maxConns setting on mongos do?
• How do indexes impact queries in sharded systems?
• Can shard keys be randomly generated?
• Can shard keys have a non-uniform distribution of values?
• Can you shard on the _id field?
• Can shard keys be in ascending order, like dates or timestamps?
• What do moveChunk commit failed errors mean?

27.1 Is sharding appropriate for a new deployment?


Sometimes.


If your data set fits on a single server, you should begin with an unsharded deployment. Converting an unsharded database to a shard cluster is easy and seamless, so there is little advantage in configuring sharding while your data set is small. Still, all production deployments should use replica sets to provide high availability and disaster recovery.

27.2 How does sharding work with replication?


To use replication with sharding, deploy each shard as a replica set.

27.3 What happens to unsharded collections in sharded databases?


In the current implementation, all databases in a shard cluster have a primary shard. All unsharded collections within that database will reside on the same shard.

27.4 How does MongoDB distribute data across shards?


Sharding must be specifically enabled on a collection. After enabling sharding on the collection, MongoDB will assign various ranges of collection data to the different shards in the cluster. The cluster automatically corrects imbalances between shards by migrating ranges of data from one shard to another.

27.5 What happens if a client updates a document in a chunk during a migration?


The mongos routes the operation to the old shard, where it will succeed immediately. Then the shard mongod instances will replicate the modification to the new shard before the shard cluster updates that chunk's ownership, which effectively finalizes the migration process.

27.6 What happens to queries if a shard is inaccessible or slow?


If a shard is inaccessible or unavailable, queries will return with an error. However, a client may set the partial query bit, which will then return results from all available shards, regardless of whether a given shard is unavailable. If a shard is responding slowly, mongos will merely wait for the shard to return results.

27.7 How does MongoDB distribute queries among shards?


Changed in version 2.0. The exact method for distributing queries among a shard cluster depends on the nature of the query and the configuration of the shard cluster. Consider a sharded collection, using the shard key user_id, that has last_login and email attributes:
• For a query that selects one or more values for the user_id key: mongos determines which shard or shards contain the relevant data, based on the cluster metadata, directs a query to the required shard or shards, and returns those results to the client.
• For a query that selects user_id and also performs a sort: mongos can make a straightforward translation of this operation into a number of queries against the relevant shards, ordered by user_id. When the sorted queries return from all shards, the mongos merges the sorted results and returns the complete result to the client.
• For queries that select on last_login: These queries must run on all shards: mongos must parallelize the query over the shards and perform a merge-sort on the email of the documents found.
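The routing decision in the cases above can be sketched in JavaScript. Everything here is an assumption made for illustration: the chunk ranges, the shard names, the simplified query shape (a single user_id value or an array of values), and the targetShards helper are hypothetical, and real cluster metadata is richer than this.

```javascript
// Hypothetical cluster metadata: each chunk covers [min, max) of user_id.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0000" },
  { min: 1000, max: 5000, shard: "shard0001" },
  { min: 5000, max: Infinity, shard: "shard0002" },
];

// Return the shards a query must visit. A query on the shard key is
// targeted at the owning shards; any other query goes to every shard.
function targetShards(query) {
  const allShards = [...new Set(chunks.map((c) => c.shard))];
  if (!("user_id" in query)) return allShards;
  const ids = Array.isArray(query.user_id) ? query.user_id : [query.user_id];
  const hit = new Set();
  for (const id of ids) {
    const chunk = chunks.find((c) => id >= c.min && id < c.max);
    if (chunk) hit.add(chunk.shard);
  }
  return [...hit];
}

console.log(targetShards({ user_id: 42 }));
console.log(targetShards({ user_id: [42, 7000] }));
console.log(targetShards({ last_login: "2012-09-01" }));
```

The first two calls visit only the shards that own the requested user_id values, while the last_login query has no shard key and is sent to all three shards.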

27.8 How does MongoDB sort queries in sharded environments?


If you call the cursor.sort() (page 452) method on a query in a sharded environment, the mongod for each shard will sort its results, and the mongos merges each shard's results before returning them to the client.
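The merge step can be sketched as a merge of lists that are already sorted. This is a minimal illustration, not the mongos implementation. mergeSorted and the sample documents are hypothetical.

```javascript
// Merge result lists that each shard has already sorted, by repeatedly
// taking the smallest head element among the lists.
function mergeSorted(shardResults, compare) {
  const positions = shardResults.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < shardResults.length; i++) {
      if (positions[i] >= shardResults[i].length) continue; // list exhausted
      if (best === -1 ||
          compare(shardResults[i][positions[i]],
                  shardResults[best][positions[best]]) < 0) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list is exhausted
    merged.push(shardResults[best][positions[best]]);
    positions[best] += 1;
  }
}

const byUserId = (a, b) => a.user_id - b.user_id;
const fromShards = [
  [{ user_id: 1 }, { user_id: 8 }],
  [{ user_id: 3 }, { user_id: 4 }],
  [{ user_id: 2 }],
];
console.log(mergeSorted(fromShards, byUserId).map((d) => d.user_id));
```

Because each input list is already ordered, the merge never needs to re-sort the combined result.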

27.9 How does MongoDB ensure unique _id field values when using a shard key other than _id?
If you do not use _id as the shard key, then your application/client layer must be responsible for keeping the _id field unique. It is problematic for collections to have duplicate _id values. If you're not sharding your collection by the _id field, then you should be sure to store a globally unique identifier in that field. The default BSON ObjectID works well in this case.

27.10 I've enabled sharding and added a second shard, but all the data is still on one server. Why?
First, ensure that you've declared a shard key for your collection. Until you have configured the shard key, MongoDB will not create chunks, and sharding will not occur. Next, keep in mind that the default chunk size is 64 MB. As a result, in most situations, the collection needs at least 64 MB before a migration will occur. Additionally, the system which balances chunks among the servers attempts to avoid superfluous migrations. Depending on the number of shards, your shard key, and the amount of data, systems often require at least 10 chunks of data to trigger migrations. You can run db.printShardingStatus() (page 470) to see all the chunks present in your cluster.

27.11 Is it safe to remove old files in the moveChunk directory?


Yes. mongod creates these files as backups during normal shard balancing operations. Once these migrations are complete, you may delete these files.


27.12 How many connections does each mongos need?


Typically, mongos uses one connection from each client, as well as one outgoing connection to each shard, or each member of the replica set that backs each shard. If you've enabled the slaveOk bit, then the mongos may create two or more connections per replica set.

27.13 Why does mongos hold connections?


mongos uses a set of connection pools to communicate with each shard. These pools do not shrink when the number of clients decreases. This can lead to an unused mongos with a large number of open connections. If the mongos is no longer in use, you can safely restart the process to close existing connections.

27.14 Where does MongoDB report on connections used by mongos?


Connect to the mongos with the mongo shell, and run the following command:
db._adminCommand("connPoolStats");

27.15 What does writebacklisten in the log mean?


The writeback listener is a process that opens a long poll to detect non-safe writes sent to a server and to send them back to the correct server if necessary. These messages are a key part of the sharding infrastructure and should not cause concern.

27.16 How should administrators deal with failed migrations?


Failed migrations require no administrative intervention. Chunk moves are consistent and deterministic. If a migration fails to complete for some reason, the shard cluster will retry. When the migration completes successfully, the data will reside only on the new shard.

27.17 What is the process for moving, renaming, or changing the number of config servers?
See Also: The wiki page that describes this process: Changing Configuration Servers.


27.18 When do the mongos servers detect config server changes?


mongos instances maintain a cache of the config database that holds the metadata for the shard cluster. This metadata includes the mapping of chunks to shards. mongos updates its cache lazily by issuing a request to a shard and discovering that its metadata is out of date. There is no way to control this behavior from the client, but you can run the flushRouterConfig command against any mongos to force it to refresh its cache.

27.19 Is it possible to quickly update mongos servers after updating a replica set configuration?
The mongos instances will detect these changes without intervention over time. However, if you want to force the mongos to reload its configuration, run the flushRouterConfig command against each mongos directly.

27.20 What does the maxConns setting on mongos do?


The maxConns (page 495) option limits the number of connections accepted by mongos. If your client driver or application creates a large number of connections but allows them to time out rather than closing them explicitly, then it might make sense to limit the number of connections at the mongos layer. Set maxConns (page 495) to a value slightly higher than the maximum number of connections that the client creates, or the maximum size of the connection pool. This setting prevents the mongos from causing connection spikes on the individual shards. Spikes like these may disrupt the operation and memory allocation of the shard cluster.

27.21 How do indexes impact queries in sharded systems?


If the query does not include the shard key, the mongos must send the query to all shards as a scatter/gather operation. Each shard will, in turn, use either the shard key index or another more efficient index to fulfill the query. If the query includes multiple sub-expressions that reference the fields indexed by the shard key and the secondary index, the mongos can route the queries to a specific shard and the shard will use the index that will allow it to fulfill the query most efficiently. See this document for more information.

27.22 Can shard keys be randomly generated?


Shard keys can be random. Random keys ensure optimal distribution of data across the cluster. Shard clusters attempt to route queries to specific shards when queries include the shard key as a parameter, because these directed queries are more efficient. In many cases, random keys can make it difficult to direct queries to specific shards.

27.23 Can shard keys have a non-uniform distribution of values?


Yes. There is no requirement that documents be evenly distributed by the shard key.


However, documents that have the same shard key must reside in the same chunk and therefore on the same server. If your sharded data set has too many documents with the exact same shard key, you will not be able to distribute those documents across your shard cluster.

27.24 Can you shard on the _id field?


You can use any field for the shard key. The _id field is a common shard key. Be aware that ObjectId() values, which are the default value of the _id field, increment as a timestamp. As a result, when used as a shard key, all new documents inserted into the collection will initially belong to the same chunk on a single shard. Although the system will eventually divide this chunk and migrate its contents to distribute data more evenly, at any moment the cluster can only direct insert operations at a single shard. This can limit the throughput of inserts. If most of your write operations are updates or read operations rather than inserts, this limitation should not impact your performance. However, if you have a high insert volume, this may be a limitation.

27.25 Can shard keys be in ascending order, like dates or timestamps?


If you insert documents with monotonically increasing shard keys, all inserts will initially belong to the same chunk on a single shard. Although the system will eventually divide this chunk and migrate its contents to distribute data more evenly, at any moment the cluster can only direct insert operations at a single shard. This can limit the throughput of inserts. If most of your write operations are updates or read operations rather than inserts, this limitation should not impact your performance. However, if you have a high insert volume, a monotonically increasing shard key may be a limitation. To address this issue, you can use a field with a value that stores the hash of a key with an ascending value. While you can compute a hashed value in your application and include this value in your documents for use as a shard key, the SERVER-2001 issue will implement this capability within MongoDB.
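Computing the hashed value in the application might look like the following sketch. It assumes a Node.js environment with the built-in crypto module. The hashedKey and withHashedKey helpers, the key_hash field name, and the choice of MD5 are illustrative assumptions, not something the text prescribes.

```javascript
const nodeCrypto = require("crypto");

// Derive a shard key value from an ascending key by hashing it in the
// application. hashedKey is a hypothetical helper; MD5 is used only
// because it spreads nearby inputs widely, not for security.
function hashedKey(value) {
  return nodeCrypto.createHash("md5").update(String(value)).digest("hex");
}

// Return a copy of the document with the hash stored in a separate
// field ("key_hash" is a hypothetical field name).
function withHashedKey(doc, field) {
  return Object.assign({}, doc, { key_hash: hashedKey(doc[field]) });
}

console.log(withHashedKey({ created: 1347494400 }, "created"));
console.log(withHashedKey({ created: 1347494401 }, "created"));
```

Consecutive timestamps produce unrelated hash values, so a collection sharded on key_hash would not direct every insert to the same chunk. The trade-off is that range queries on the original field can no longer be targeted through the shard key.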

27.26 What do moveChunk commit failed errors mean?


Consider the following error message:
ERROR: moveChunk commit failed: version is at <n>|<nn> instead of <N>|<NN>
ERROR: TERMINATING

mongod produces this message if, during a chunk migration (page 121), the shard could not connect to the config database to update chunk information at the end of the migration process. If the shard cannot update the config database after moveChunk, the shard cluster will have an inconsistent view of all chunks. In these situations, the primary member of the shard will terminate itself to prevent data inconsistency. If the secondary member can access the config database, the shard's data will be accessible after an election. Administrators will need to resolve the chunk migration failure independently. If you encounter this issue, contact the MongoDB User Group or 10gen support to address this issue.


CHAPTER

TWENTYEIGHT

FAQ: REPLICA SETS AND REPLICATION IN MONGODB


This document answers common questions about database replication in MongoDB. If you don't find the answer you're looking for, check the replication index (page 33) or post your question to the MongoDB User Mailing List.

Frequently Asked Questions:
• What kinds of replication does MongoDB support?
• What do the terms primary and master mean?
• What do the terms secondary and slave mean?
• How long does replica set failover take?
• Does replication work over the Internet and WAN connections?
• Can MongoDB replicate over a noisy connection?
• What is the preferred replication method: master/slave or replica sets?
• What is the preferred replication method: replica sets or replica pairs?
• Why use journaling if replication already provides data redundancy?
• Are write operations durable without getLastError?
• How many arbiters do replica sets need?
• What information do arbiters exchange with the rest of the replica set?
• Which members of a replica set vote in elections?
• Do hidden members vote in replica set elections?

28.1 What kinds of replication does MongoDB support?


MongoDB supports master-slave replication and a variation on master-slave replication known as replica sets. Replica sets are the recommended replication topology.

28.2 What do the terms primary and master mean?


Primary and master nodes are the nodes that can accept writes. MongoDB's replication is single-master: only one node can accept write operations at a time. In a replica set, if the current primary node fails or becomes inaccessible, the other members can autonomously elect one of the other members of the set to be the new primary.


By default, clients send all reads to the primary; however, read preference is configurable at the client level on a per-connection basis, which makes it possible to send reads to secondary nodes instead.

28.3 What do the terms secondary and slave mean?


Secondary and slave nodes are read-only nodes that replicate from the primary. Replication operates by way of an oplog, from which secondary/slave members apply new operations to themselves. This replication process is asynchronous, so secondary/slave nodes may not always reflect the latest writes to the primary. But usually, the gap between the primary and secondary nodes is just a few milliseconds on a local network connection.

28.4 How long does replica set failover take?


It varies, but a replica set will select a new primary within a minute. It may take 10-30 seconds for the members of a replica set to declare a primary inaccessible. This triggers an election. During the election, the cluster is unavailable for writes. The election itself may take another 10-30 seconds.

Note: Eventually consistent reads, like the ones that will return from a replica set, are only possible with a read preference that permits reads from secondary members.

28.5 Does replication work over the Internet and WAN connections?
Yes. For example, a deployment may maintain a primary and secondary in an East-coast data center along with a secondary member for disaster recovery in a West-coast data center. See Also: Deploy a Geographically Distributed Replica Set (page 66)

28.6 Can MongoDB replicate over a noisy connection?


Yes, but not without connection failures and the obvious latency. Members of the set will attempt to reconnect to the other members of the set in response to networking flaps. This does not require administrator intervention. However, if the network connections between the nodes in the replica set are very slow, it might not be possible for the members of the set to keep up with the replication. If the TCP connection between the secondaries and the primary instance breaks, the set will automatically elect one of the secondary members of the set as primary.


28.7 What is the preferred replication method: master/slave or replica sets?

New in version 1.8. Replica sets are the preferred replication mechanism in MongoDB. However, if your deployment requires more than 12 nodes, you must use master/slave replication.

28.8 What is the preferred replication method: replica sets or replica pairs?
Deprecated since version 1.6. Replica sets replaced replica pairs in version 1.6. Replica sets are the preferred replication mechanism in MongoDB.

28.9 Why use journaling if replication already provides data redundancy?


Journaling facilitates faster crash recovery. Prior to journaling, crashes often required database repairs or full data resync. Both were slow, and the first was unreliable. Journaling is particularly useful for protection against power failures, especially if your replica set resides in a single data center or power circuit. When a replica set runs with journaling, mongod instances can safely restart without any administrator intervention.

Note: Journaling requires some resource overhead for write operations. Journaling has no effect on read performance, however. Journaling is enabled by default on all 64-bit builds of MongoDB v2.0 and greater.

28.10 Are write operations durable without getLastError?


Yes. However, if you want confirmation that a given write has arrived safely at the server, you must also run the getLastError command after each write. If you enable your driver's write concern, or safe mode, the driver will automatically send this command. If you want to guarantee that a given write syncs to the journal, you must pass the {j: true} option to getLastError (or specify it as part of the write concern).

28.11 How many arbiters do replica sets need?


Some configurations do not require any arbiter instances. Arbiters vote in elections for primary but do not replicate the data like secondary members. Replica sets require a majority of the original nodes present to elect a primary. Arbiters allow you to construct this majority without the overhead of adding replicating nodes to the system. There are many possible replica set architectures (page 47). If you have a three node replica set, you don't need an arbiter. But a common configuration consists of two replicating nodes, one of which is primary and the other is secondary, as well as an arbiter for the third node. This configuration makes it possible for the set to elect a primary in the event of a failure without requiring three replicating nodes. You may also consider adding an arbiter to a set if it has an equal number of nodes in two facilities and network partitions between the facilities are possible. In these cases, the arbiter will break the tie between the two facilities and allow the set to elect a new primary. See Also: Replication Architectures (page 47)
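The majority rule behind this answer can be sketched in a few lines of plain JavaScript. This is only an illustrative model, not MongoDB's election code: the function name canElectPrimary and its parameters are hypothetical, and every member is assumed to hold exactly one vote.

```javascript
// Illustrative model only: canElectPrimary is a hypothetical helper,
// not part of MongoDB. It checks whether the reachable voting members
// form a strict majority of all voting members configured in the set.
function canElectPrimary(totalVoters, reachableVoters) {
  const majority = Math.floor(totalVoters / 2) + 1;
  return reachableVoters >= majority;
}

// Two replicating nodes, no arbiter: losing one node leaves 1 of 2 votes.
console.log(canElectPrimary(2, 1)); // false

// Two replicating nodes plus one arbiter: losing one node leaves 2 of 3 votes.
console.log(canElectPrimary(3, 2)); // true
```

The arbiter contributes a vote without holding data, which is why the second case can still elect a primary.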

28.12 What information do arbiters exchange with the rest of the replica set?
Arbiters never receive the contents of a collection but do exchange the following data with the rest of the replica set:
Credentials used to authenticate the arbiter with the replica set. All MongoDB processes within a replica set use keyfiles. These exchanges are encrypted.
Replica set configuration data and voting data. This information is not encrypted. Only credential exchanges are encrypted.
If your MongoDB deployment uses SSL, then all communications between arbiters and the other members of the replica set are secure. See the documentation for Using MongoDB with SSL Connections (page 143) for more information. Run all arbiters on secure networks, as with all MongoDB components. See Also: The overview of Arbiter Members of Replica Sets (page 41).

28.13 Which members of a replica set vote in elections?


All members of a replica set, unless the value of votes (page 563) is equal to 0, vote in elections. This includes all delayed (page 40), hidden (page 40) and secondary-only (page 39) members, as well as the arbiters (page 41). See Also: Elections (page 34)

28.14 Do hidden members vote in replica set elections?


Hidden members (page 40) of replica sets do vote in elections. To exclude a member from voting in an election, change the value of the member's votes (page 563) configuration to 0. See Also: Elections (page 34)


CHAPTER

TWENTYNINE

FAQ: MONGODB STORAGE


This document addresses common questions regarding MongoDB's storage system. If you don't find the answer you're looking for, check the complete list of FAQs (page 349) or post your question to the MongoDB User Mailing List. Frequently Asked Questions:
What are memory mapped files? (page 369)
How do memory mapped files work? (page 369)
How does MongoDB work with memory mapped files? (page 369)
What are page faults? (page 370)
What is the difference between soft and hard page faults? (page 370)
What tools can I use to investigate storage use in MongoDB? (page 370)
What is the working set? (page 370)

29.1 What are memory mapped files?


A memory-mapped file is a file with data that the operating system places in memory by way of the mmap() system call. mmap() thus maps the file to a region of virtual memory. Memory-mapped files are the critical piece of the storage engine in MongoDB. By using memory mapped files MongoDB can treat the content of its data files as if they were in memory. This provides MongoDB with an extremely fast and simple method for accessing and manipulating data.

29.2 How do memory mapped files work?


Memory mapping assigns files to a block of virtual memory with a direct byte-for-byte correlation. Once mapped, the relationship between file and memory allows MongoDB to interact with the data in the file as if it were memory.

29.3 How does MongoDB work with memory mapped files?


MongoDB uses memory mapped files for managing and interacting with all data. MongoDB memory maps data files to memory as it accesses documents. Data that isn't accessed is not mapped to memory.


29.4 What are page faults?


Page faults will occur if you're attempting to access part of a memory-mapped file that isn't in memory. If there is free memory, then the operating system can find the page on disk and load it to memory directly. However, if there is no free memory, the operating system must first find a page in memory that is stale or no longer needed and write that page to disk, and then read the requested page from disk and load it into memory. This process, particularly on an active system, can take a long time in comparison to reading a page that is already in memory.

29.5 What is the difference between soft and hard page faults?
Page faults occur when MongoDB needs access to data that isn't currently in active memory. A hard page fault refers to situations when MongoDB must access a disk to access the data. A soft page fault, by contrast, merely moves memory pages from one list to another, such as from an operating system file cache. In production, MongoDB will rarely encounter soft page faults.

29.6 What tools can I use to investigate storage use in MongoDB?


The db.stats() (page 472) method in the mongo shell returns the current state of the active database. The Database Statistics Reference (page 551) document outlines the meaning of the fields in the db.stats() (page 472) output.

29.7 What is the working set?


Working set represents the total body of data that the application uses in the course of normal operation. Often this is a subset of the total data size, but the specific size of the working set depends on actual moment-to-moment use of the database. If you run a query that requires MongoDB to scan every document in a collection, the working set will expand to include every document. Depending on physical memory size, this may cause documents in the working set to page out, that is, to be removed from physical memory by the operating system. The next time MongoDB needs to access these documents, MongoDB may incur a hard page fault. For best performance, the majority of your active set should fit in RAM. See Also: The Indexing FAQ wiki page.


Part XII

Reference


CHAPTER

THIRTY

MONGODB INTERFACE
See Also: The following interface overview pages: /reference/operators for an overview of all query, update, and projection operators; /reference/meta-query-operators for all special meta query operators; Aggregation Framework Reference (page 209) for all aggregation (page 197) operators; /reference/commands for an overview of all database commands; and /reference/javascript for all mongo shell methods and helpers.

30.1 Query, Update, Projection, and Aggregation Operators


Query and update operators:

30.1.1 $addToSet
$addToSet The $addToSet operator adds a value to an array only if the value is not in the array already. If the value is in the array, $addToSet returns without modifying the array. Otherwise, $addToSet behaves the same as $push. Consider the following example:
db.collection.update( { field: value }, { $addToSet: { field: value1 } } );

Here, $addToSet appends value1 to the array stored in field, only if value1 is not already a member of this array. Note: The $each operator is only used with the $addToSet operator; see the documentation of $addToSet (page 373) for more information. $each The $each operator is available within $addToSet and allows you to add multiple values to the field array in a single operation; each value is added only if it does not already exist in the array. Consider the following prototype:

db.collection.update( { field: value }, { $addToSet: { field: { $each : [ value1, value2, ... ] } } } );
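The effect of $addToSet combined with $each can be modeled in plain JavaScript. The sketch below is not MongoDB code: addToSetEach is a hypothetical name, and it assumes the array holds only primitive values, which it compares with Array.prototype.includes.

```javascript
// Plain JavaScript model of $addToSet with $each (hypothetical helper).
// Each candidate value is appended only if the array does not already
// contain it; the input array is left unmodified.
function addToSetEach(array, values) {
  const result = array.slice();
  for (const v of values) {
    if (!result.includes(v)) {
      result.push(v);
    }
  }
  return result;
}

console.log(addToSetEach(["a", "b"], ["b", "c", "c"])); // [ 'a', 'b', 'c' ]
```

Values already present, and repeats within the list of new values, are skipped, so the result never gains a duplicate.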

30.1.2 $all
$all Syntax: { field: { $all: [ <value> , <value1> ... ] } }

$all selects the documents where the field holds an array that contains all of the specified elements (e.g. <value>, <value1>, etc.).

Consider the following example:


db.inventory.find( { tags: { $all: [ "appliances", "school", "book" ] } } )

This query selects all documents in the inventory collection where the tags field contains an array with the elements appliances, school, and book. Therefore, the above query will match documents in the inventory collection that have a tags field that holds either of the following arrays:
[ "school", "book", "bag", "headphone", "appliances" ]
[ "appliances", "school", "book" ]

The $all operator exists to describe and specify arrays in MongoDB queries. However, you may use the $all operator to select against a non-array field, as in the following example:
db.inventory.find( { qty: { $all: [ 50 ] } } )

However, use the following form to express the same query:


db.inventory.find( { qty: 50 } )

Both queries will select all documents in the inventory collection where the value of the qty field equals 50. Note: In most cases, MongoDB does not treat arrays as sets. This operator provides a notable exception to this approach. See Also: find() (page 457), update() (page 464), and $set.
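The matching rule for $all, including the non-array case, can be modeled as follows. This is a plain JavaScript sketch, not MongoDB's matcher; matchesAll is a hypothetical name, and it assumes primitive values only.

```javascript
// Plain JavaScript model of $all (hypothetical helper). A non-array
// field value is treated as a one-element array, which mirrors the
// qty example above.
function matchesAll(fieldValue, required) {
  const values = Array.isArray(fieldValue) ? fieldValue : [fieldValue];
  return required.every((r) => values.includes(r));
}

const wanted = ["appliances", "school", "book"];
console.log(matchesAll(["school", "book", "bag", "headphone", "appliances"], wanted)); // true
console.log(matchesAll(["appliances", "school"], wanted)); // false
console.log(matchesAll(50, [50])); // true
```

Element order and extra elements do not matter; only the presence of every requested value does.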

30.1.3 $and
$and New in version 2.0. Syntax: { $and: [ { <expression1> }, { <expression2> } , ... , { <expressionN> } ] }

$and performs a logical AND operation on an array of two or more expressions (e.g. <expression1>, <expression2>, etc.) and selects the documents that satisfy all the expressions in the array. The $and operator uses short-circuit evaluation. If the first expression (e.g. <expression1>) evaluates to false, MongoDB will not evaluate the remaining expressions. Consider the following example:
db.inventory.find({ $and: [ { price: 1.99 }, { qty: { $lt: 20 } }, { sale: true } ] } )

This query will select all documents in the inventory collection where the price field value equals 1.99, the qty field value is less than 20, and the sale field value is equal to true. MongoDB provides an implicit AND operation when specifying a comma separated list of expressions. For example, you may write the above query as:
db.inventory.find( { price: 1.99, qty: { $lt: 20 } , sale: true } )


If, however, a query requires an AND operation on the same field, you must use the $and operator as in the following example:

db.inventory.update( { $and: [ { price: { $ne: 1.99 } }, { price: { $exists: true } } ] }, {

This update() (page 464) operation will set the value of the qty field in documents where the price field value does not equal 1.99 and the price field exists. See Also: find() (page 457), update() (page 464), $ne, $exists, $set.

30.1.4 $atomic
$atomic In multi-update mode, it's possible to specify an $atomic operator that allows you to isolate some updates from each other within this operation. Consider the following example:
db.foo.update( { field1 : 1 , $atomic : 1 }, { $inc : { field2 : 1 } } , false , true )

Without the $atomic operator, multi-updates will allow other operations to interleave with these updates. If these interleaved operations contain writes, the update operation may produce unexpected results. By specifying $atomic you can guarantee isolation for the entire multi-update. See Also: db.collection.update() (page 464) for more information about the db.collection.update() method.

30.1.5 $bit
$bit The $bit operator performs a bitwise update of a field. Only use this with integer fields. For example:
db.collection.update( { field: 1 }, { $bit: { field: { and: 5 } } } );

Here, the $bit operator updates the integer value of the field named field with a bitwise and operation. This operator only works with number types.
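For readers less familiar with bitwise arithmetic, the following plain JavaScript lines show what a bitwise and with 5 does to an integer. The starting value 13 is an arbitrary assumption chosen for illustration; it does not come from the example above.

```javascript
// Bitwise "and" keeps only the bits that are set in both operands.
const before = 13;            // binary 1101 (arbitrary example value)
const mask = 5;               // binary 0101
const after = before & mask;  // binary 0101

console.log(after); // 5
```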

30.1.6 $box
$box New in version 1.4. The $box operator specifies a rectangular shape for the $within operator in geospatial queries. To use the $box operator, you must specify the bottom left and top right corners of the rectangle in an array object. Consider the following example:
db.collection.find( { loc: { $within: { $box: [ [0,0], [100,100] ] } } } )

This will return all the documents that are within the box having points at: [0,0], [0,100], [100,0], and [100,100]. Note: A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.
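The containment test implied by $box can be sketched in plain JavaScript. This is a geometric illustration only, not how mongod evaluates the query; inBox is a hypothetical name, and treating points exactly on the edge as inside is an assumption.

```javascript
// Hypothetical helper: tests whether a point [x, y] lies inside the
// rectangle given as [ [bottomLeftX, bottomLeftY], [topRightX, topRightY] ].
// Edge points count as inside here (an assumption of this sketch).
function inBox(point, box) {
  const [[minX, minY], [maxX, maxY]] = box;
  const [x, y] = point;
  return x >= minX && x <= maxX && y >= minY && y <= maxY;
}

const box = [[0, 0], [100, 100]];
console.log(inBox([50, 50], box));  // true
console.log(inBox([150, 50], box)); // false
```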


30.1.7 $center
$center New in version 1.4. This specifies a circle shape for the $within operator in geospatial queries. To define the bounds of a query using $center, you must specify the center point and the radius. Consider the following example:
db.collection.find( { location: { $within: { $center: [ [0,0], 10 ] } } } );

The above command returns all the documents that fall within a 10 unit radius of the point [0,0]. Note: A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.

30.1.8 $centerSphere
$centerSphere New in version 1.8. The $centerSphere operator is the spherical equivalent of the $center operator. $centerSphere uses spherical geometry to calculate distances in a circle specified by a point and radius. Consider the following example:
db.collection.find( { loc: { $within: { $centerSphere: [ [0,0], 10 / 3959 ] } } } )

This query will return all documents within a 10 mile radius of [0,0] using a spherical geometry to calculate distances. Note: A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.
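The radius in the query above is expressed in radians: 10 miles divided by an approximate Earth radius of 3959 miles. The plain JavaScript sketch below shows that conversion together with a haversine distance check. It is an illustration, not MongoDB's implementation; all function names are hypothetical, and points are assumed to be [longitude, latitude] pairs in degrees.

```javascript
// Approximate Earth radius in miles, as used in the query above.
const EARTH_RADIUS_MILES = 3959;

// Converts a distance in miles to an angle in radians.
function milesToRadians(miles) {
  return miles / EARTH_RADIUS_MILES;
}

// Central angle in radians between two [longitude, latitude] points
// given in degrees, computed with the haversine formula.
function centralAngle([lng1, lat1], [lng2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Hypothetical model of the $centerSphere containment test.
function withinCenterSphere(point, center, radiusRadians) {
  return centralAngle(point, center) <= radiusRadians;
}

const radius = milesToRadians(10);
console.log(withinCenterSphere([0, 0.1], [0, 0], radius)); // true, about 6.9 miles away
console.log(withinCenterSphere([0, 1], [0, 0], radius));   // false, about 69 miles away
```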

30.1.9 $comment
$comment The $comment (page 376) makes it possible to attach a comment to a query. Because these comments propagate to the profile (page 499) log, adding $comment (page 376) modifiers can make your profile data much easier to interpret and trace. Consider the following example:
db.collection.find()._addSpecial( "$comment" , "[COMMENT]" )

Here, [COMMENT] represents the text of the comment.

30.1.10 $each
Note: The $each operator is only used with the $addToSet operator; see the documentation of $addToSet (page 373) for more information.


$each The $each operator is available within $addToSet and allows you to add multiple values to the field array in a single operation; each value is added only if it does not already exist in the array. Consider the following prototype:

db.collection.update( { field: value }, { $addToSet: { field: { $each : [ value1, value2, ... ] } } } );

30.1.11 $elemMatch (query)


See Also: $elemMatch (projection) (page 396) $elemMatch New in version 1.4. The $elemMatch (page 396) operator matches more than one component within an array element. For example,
db.collection.find( { array: { $elemMatch: { value1: 1, value2: { $gt: 1 } } } } );

returns all documents in collection where the array array contains at least one element that satisfies all of the conditions in the $elemMatch (page 396) expression; that is, an element where the value of value1 is 1 and the value of value2 is greater than 1. Matching arrays must have at least one element that matches all specified criteria. Therefore, the following document would not match the above query:
{ array: [ { value1:1, value2:0 }, { value1:2, value2:2 } ] }

while the following document would match this query:


{ array: [ { value1:1, value2:0 }, { value1:1, value2:2 } ] }
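The two documents above can be checked against a plain JavaScript model of the $elemMatch rule. This sketch is not MongoDB's matcher; elemMatch and condition are hypothetical names introduced for illustration.

```javascript
// Hypothetical model: a document matches when at least one array
// element satisfies every condition at the same time.
function elemMatch(array, predicate) {
  return Array.isArray(array) && array.some(predicate);
}

// value1 equals 1 AND value2 is greater than 1, on the same element.
const condition = (el) => el.value1 === 1 && el.value2 > 1;

const docA = { array: [{ value1: 1, value2: 0 }, { value1: 2, value2: 2 }] };
const docB = { array: [{ value1: 1, value2: 0 }, { value1: 1, value2: 2 }] };

console.log(elemMatch(docA.array, condition)); // false
console.log(elemMatch(docB.array, condition)); // true
```

In docA each condition is met by a different element, so no single element satisfies both.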

30.1.12 $exists
$exists Syntax: { field: { $exists: boolean } }

$exists selects the documents that contain the field when the boolean is true, and the documents that do not contain the field when the boolean is false. MongoDB $exists does not correspond to SQL operator exists. For SQL exists, refer to the $in operator. Consider the following example:
db.inventory.find( { $and: [ { qty: { $exists: true } }, { qty: { $nin: [ 5, 15 ] } } ] } )

This query will select all documents in the inventory collection where the qty field exists and its value equals neither 5 nor 15. The above query uses the $and operator because it performs an AND operation on the value of the same field; the $and operator is not specific to the $exists operator. See Also: find() (page 457), $and, $nin, $in.

30.1.13 $explain
$explain Use the $explain (page 377) operator to return a document that describes the process and indexes used to return the query. This may provide useful insight when attempting to optimize a query. Consider the following example:


db.collection.find()._addSpecial( "$explain", 1 )

The JavaScript function cursor.explain() (page 449) provides equivalent functionality in the mongo shell. See the following example, which is equivalent to the above:
db.collection.find().explain()

30.1.14 $gt
$gt Syntax: {field: {$gt: value} } $gt selects those documents where the value of the field is greater than (i.e. >) the specified value. Consider the following example:
db.inventory.find( { qty: { $gt: 20 } } )

This query will select all documents in the inventory collection where the qty field value is greater than 20. Consider the following example which uses the $gt operator with a field from an embedded document:
db.inventory.update( { "carrier.fee": { $gt: 2 } }, { $set: { price: 9.99 } } )

This update() (page 464) operation will set the value of the price field in the documents that contain the embedded document carrier whose fee field value is greater than 2. See Also: find() (page 457), update() (page 464), $set.

30.1.15 $gte
$gte Syntax: {field: {$gte: value} }

$gte selects the documents where the value of the field is greater than or equal to (i.e. >=) a specified value (e.g. value). Consider the following example:
db.inventory.find( { qty: { $gte: 20 } } )

This query would select all documents in inventory where the qty field value is greater than or equal to 20. Consider the following example which uses the $gte operator with a field from an embedded document:
db.inventory.update( { "carrier.fee": { $gte: 2 } }, { $set: { price: 9.99 } } )

This update() (page 464) operation will set the value of the price field in the documents that contain the embedded document carrier whose fee field value is greater than or equal to 2. See Also: find() (page 457), update() (page 464), $set.


30.1.16 $hint
$hint Use the $hint (page 379) operator to force the query optimizer to use a specific index to fulfill the query. Use $hint (page 379) for testing query performance and indexing strategies. Consider the following form:
db.collection.find()._addSpecial( "$hint", { _id : 1 } )

This operation returns all documents in the collection named collection using the index on the _id field. Use this operator to override MongoDB's default index selection process and pick indexes manually.

30.1.17 $in
$in Syntax: { field: { $in: [<value1>, <value2>, ... <valueN> ] } } $in selects the documents where the field value equals any value in the specified array (e.g. <value1>, <value2>, etc.). Consider the following example:
db.inventory.find( { qty: { $in: [ 5, 15 ] } } )

This query will select all documents in the inventory collection where the qty field value is either 5 or 15. Although you can express this query using the $or operator, choose the $in operator rather than the $or operator when performing equality checks on the same field. If the field holds an array, then the $in operator selects the documents whose field holds an array that contains at least one element that matches a value in the specified array (e.g. <value1>, <value2>, etc.). Consider the following example:
db.inventory.update( { tags: { $in: ["appliances", "school"] } }, { $set: { sale:true } } )

This update() (page 464) operation will set the sale field value in the documents in the inventory collection where the tags field holds an array with at least one element matching an element in the array ["appliances", "school"]. See Also: find() (page 457), update() (page 464), $or, $set.

30.1.18 $inc
$inc The $inc operator increments a value by a specified amount if the field is present in the document. If the field does not exist, $inc sets the field to the specified number value. For example:
db.collection.update( { field: value }, { $inc: { field1: amount } } );

In this example, for documents in collection where field has the value value, the value of field1 increments by the value of amount. The above operation only increments the first matching document unless you specify multi-update:
db.collection.update( { age: 20 }, { $inc: { age: 1 } } );
db.collection.update( { name: "John" }, { $inc: { age: 1 } } );


In the first example, the operation increases the age field by one in the first document that has an age field with the value of 20. In the second example, the operation increases the value of the age field by one in the first document where the name field has a value of John. $inc accepts positive and negative incremental amounts.
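The behavior of $inc on a single document, including the missing-field and negative-amount cases, can be modeled in plain JavaScript. This is a sketch rather than MongoDB code; the function name inc is hypothetical, and it returns a new object instead of updating in place.

```javascript
// Hypothetical model of $inc applied to one document. A missing field
// is treated as starting from zero, so it ends up set to the amount.
function inc(doc, field, amount) {
  const current = field in doc ? doc[field] : 0;
  return Object.assign({}, doc, { [field]: current + amount });
}

console.log(inc({ name: "John", age: 20 }, "age", 1).age);  // 21
console.log(inc({ name: "John" }, "age", 1).age);           // 1
console.log(inc({ name: "John", age: 20 }, "age", -5).age); // 15
```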

30.1.19 $lt
$lt Syntax: {field: {$lt: value} } $lt selects the documents where the value of the field is less than (i.e. <) the specified value. Consider the following example:
db.inventory.find( { qty: { $lt: 20 } } )

This query will select all documents in the inventory collection where the qty field value is less than 20. Consider the following example which uses the $lt operator with a field from an embedded document:
db.inventory.update( { "carrier.fee": { $lt: 20 } }, { $set: { price: 9.99 } } )

This update() (page 464) operation will set the price field value in the documents that contain the embedded document carrier whose fee field value is less than 20. See Also: find() (page 457), update() (page 464), $set.

30.1.20 $lte
$lte Syntax: { field: { $lte: value} }

$lte selects the documents where the value of the field is less than or equal to (i.e. <=) the specified value. Consider the following example:
db.inventory.find( { qty: { $lte: 20 } } )

This query will select all documents in the inventory collection where the qty field value is less than or equal to 20. Consider the following example which uses the $lte operator with a field from an embedded document:
db.inventory.update( { "carrier.fee": { $lte: 5 } }, { $set: { price: 9.99 } } )

This update() (page 464) operation will set the price field value in the documents that contain the embedded document carrier whose fee field value is less than or equal to 5. See Also: find() (page 457), update() (page 464), $set.


30.1.21 $max
$max Specify a $max (page 381) value to specify an upper boundary for the value of a field. mongod enforces this boundary with an index of that field.
db.collection.find( { [QUERY] } )._addSpecial( "$max" , { value : 100 } )

This operation limits the documents returned to those that match the query described by [QUERY] where the field value is less than 100. mongod infers the index based on the query unless specified by the cursor.hint() (page 450) function. Use this operation alone or in conjunction with $min (page 381) to limit results to a specific range. Note: In most cases, you should avoid this operator in favor of $lt.

30.1.22 $maxDistance
$maxDistance The $maxDistance operator specifies an upper bound to limit the results of a geolocation query. See below, where the $maxDistance operator narrows the results of the $near query:
db.collection.find( { location: { $near: [100,100], $maxDistance: 10 } } );

This query will return documents with location fields from collection that have values with a distance of 10 or fewer units from the point [100,100]. $near returns results ordered by their distance from [100,100]. This operation will return the first 100 results unless you modify the query with the cursor.limit() (page 450) method. Specify the value of the $maxDistance argument in the same units as the document coordinate system. Note: A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.

30.1.23 $maxScan
$maxScan Constrains the query to only scan the specified number of documents when fulfilling the query. Use the following form:
db.collection.find()._addSpecial( "$maxScan" , 50 )

Use this modifier to prevent potentially long running queries from disrupting performance by scanning through too much data.

30.1.24 $min
$min Specify a $min (page 381) value to specify a lower boundary for the value of a field. mongod enforces this boundary with an index of the field.


db.collection.find( { [QUERY] } )._addSpecial("$min" , { value : 20})

This operation limits the documents returned to those that match the query described by [QUERY] where the field value is at least 20. mongod infers the index based on the query unless specified by the cursor.hint() (page 450) function. Use this operation alone or in conjunction with $max (page 381) to limit results to a specific range. Note: In most cases, you should avoid this operator in favor of $gte.

30.1.25 $mod
$mod Syntax: { field: { $mod: [ divisor, remainder ]} }

$mod selects the documents where the field value divided by the divisor has the specified remainder. Consider the following example:
db.inventory.find( { qty: { $mod: [ 4, 0 ] } } )

This query will select all documents in the inventory collection where the qty field value modulo 4 equals 0, such as documents with qty value equal to 0 or 12. In some cases, you can query using the $mod operator rather than the more expensive $where operator. Consider the following example using the $mod operator:
db.inventory.find( { qty: { $mod: [ 4, 3 ] } } )

The above query is less expensive than the following query which uses the $where operator:
db.inventory.find( { $where: "this.qty % 4 == 3" } )
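The selection rule for $mod can be modeled in plain JavaScript over an in-memory array. This is an illustrative sketch, not a MongoDB API: selectByMod and the sample inventory documents are hypothetical.

```javascript
// Hypothetical sample data standing in for the inventory collection.
const inventory = [
  { item: "a", qty: 0 },
  { item: "b", qty: 12 },
  { item: "c", qty: 7 },
  { item: "d", qty: 3 }
];

// Hypothetical model of { field: { $mod: [ divisor, remainder ] } }.
function selectByMod(docs, field, [divisor, remainder]) {
  return docs.filter(
    (d) => typeof d[field] === "number" && d[field] % divisor === remainder
  );
}

console.log(selectByMod(inventory, "qty", [4, 0]).map((d) => d.item)); // [ 'a', 'b' ]
console.log(selectByMod(inventory, "qty", [4, 3]).map((d) => d.item)); // [ 'c', 'd' ]
```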

See Also: find() (page 457), update() (page 464), $set.

30.1.26 $ne
$ne Syntax: {field: {$ne: value} } $ne selects the documents where the value of the field is not equal (i.e. !=) to the specified value. This includes documents that do not contain the field. Consider the following example:
db.inventory.find( { qty: { $ne: 20 } } )

This query will select all documents in the inventory collection where the qty field value does not equal 20, including those documents that do not contain the qty field. Consider the following example which uses the $ne operator with a field from an embedded document:
db.inventory.update( { "carrier.state": { $ne: "NY" } }, { $set: { qty: 20 } } )


This update() (page 464) operation will set the qty field value in the documents that contain the embedded document carrier whose state field value does not equal NY, or where the state field or the carrier embedded document does not exist. See Also: find() (page 457), update() (page 464), $set.

30.1.27 $near
$near The $near operator takes an argument, coordinates in the form of [x, y], and returns a list of objects sorted by distance from those coordinates. See the following example:
db.collection.find( { location: { $near: [100,100] } } );

This query will return 100 ordered records with a location field in collection. Specify a different limit using the cursor.limit() (page 450) method, or use another geolocation operator or a non-geospatial operator to limit the results of the query. Note: A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.

30.1.28 $nearSphere
$nearSphere New in version 1.8. The $nearSphere operator is the spherical equivalent of the $near operator. $nearSphere returns all documents near a point, calculating distances using spherical geometry.
db.collection.find( { loc: { $nearSphere: [0,0] } } )

Note: A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.

30.1.29 $nin
$nin Syntax: { field: { $nin: [ <value1>, <value2> ... <valueN> ]} }

$nin selects the documents where: the field value is not in the specified array or the field does not exist. Consider the following query:
db.inventory.find( { qty: { $nin: [ 5, 15 ] } } )

This query will select all documents in the inventory collection where the qty field value equals neither 5 nor 15. The selected documents will include those documents that do not contain the qty field. If the field holds an array, then the $nin operator selects the documents whose field holds an array with no element equal to a value in the specified array (e.g. <value1>, <value2>, etc.).

Consider the following query:

db.inventory.update( { tags: { $nin: [ "appliances", "school" ] } }, { $set: { sale: false } } )

This update() (page 464) operation will set the sale field value in the documents in the inventory collection where the tags field holds an array with no elements matching an element in the array ["appliances", "school"] or where a document does not contain the tags field. See Also: find() (page 457), update() (page 464), $set.

30.1.30 $nor
$nor Syntax: { $nor: [ { <expression1> }, { <expression2> }, ... { <expressionN> } ] }

$nor performs a logical NOR operation on an array of two or more <expressions> and selects the documents that fail all the <expressions> in the array. Consider the following example:
db.inventory.find( { $nor: [ { price: 1.99 }, { qty: { $lt: 20 } }, { sale: true } ] } )

This query will select all documents in the inventory collection where: the price field value does not equal 1.99 and the qty field value is not less than 20 and the sale field value is not equal to true, including those documents that do not contain these field(s). The exception in returning documents that do not contain the field in the $nor expression is when the $nor operator is used with the $exists operator. Consider the following query which uses only the $nor operator:
db.inventory.find( { $nor: [ { price: 1.99 }, { sale: true } ] } )

This query will return all documents that:
contain the price field whose value is not equal to 1.99 and contain the sale field whose value is not equal to true, or
contain the price field whose value is not equal to 1.99 but do not contain the sale field, or
do not contain the price field but contain the sale field whose value is not equal to true, or
do not contain the price field and do not contain the sale field.
Compare that with the following query which uses the $nor operator with the $exists operator:
db.inventory.find( { $nor: [ { price: 1.99 }, { price: { $exists: false } }, { sale: true }, { sale: { $exists: false } } ] } )

This query will return all documents that: contain the price field whose value is not equal to 1.99 and contain the sale field whose value is not equal to true
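The difference between the two $nor queries above can be checked with a plain JavaScript model. This sketch is not MongoDB's matcher; the helper names nor, eq, and missing and the sample documents are hypothetical, and only simple equality and field-absence tests are modeled.

```javascript
// Hypothetical sample documents.
const docs = [
  { _id: 1, price: 1.99, sale: true },
  { _id: 2, price: 2.99, sale: false },
  { _id: 3, price: 2.99 },              // no sale field
  { _id: 4 }                            // neither field
];

// A document passes $nor when it fails every listed expression.
const nor = (predicates) => (doc) => predicates.every((p) => !p(doc));
const eq = (field, value) => (doc) => doc[field] === value;   // { field: value }
const missing = (field) => (doc) => !(field in doc);          // { field: { $exists: false } }

// $nor alone: documents lacking a field still pass.
const plain = docs
  .filter(nor([eq("price", 1.99), eq("sale", true)]))
  .map((d) => d._id);
console.log(plain); // [ 2, 3, 4 ]

// $nor combined with $exists: both fields must be present.
const strict = docs
  .filter(nor([eq("price", 1.99), missing("price"), eq("sale", true), missing("sale")]))
  .map((d) => d._id);
console.log(strict); // [ 2 ]
```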


See Also: find() (page 457), update() (page 464), $set, $exists.

30.1.31 $not
$not Syntax: { field: { $not: { <operator-expression> } } }

$not performs a logical NOT operation on the specied <operator-expression> and selects the documents that do not match the <operator-expression>. This includes documents that do not contain the field. Consider the following query:
db.inventory.find( { price: { $not: { $gt: 1.99 } } } )

This query will select all documents in the inventory collection where: the price eld value is less than or equal to 1.99 or the price eld does not exist { $not: { $gt: 1.99 } } is different from the $lte operator. { $lt: 1.99 } returns only the documents where price eld exists and its value is less than or equal to 1.99. Remember that the $not operator only affects other operators and cannot check elds and documents independently. So, use the $not operator for logical disjunctions and the $ne operator to test the contents of elds directly. Consider the following behaviors when using the $not operator: The operation of the $not operator is consistent with the behavior of other operators but may yield unexpected results with some data types like arrays. The $not operator does not support operations with the $regex operator. Instead use // or in your driver interfaces, use your langauges regular expression capability to create regular expression objects. Consider the following example whiche uses the pattern match expression //:
db.inventory.find( { item: { $not: /^p.*/ } } )

The query will select all documents in the inventory collection where the item field value does not start with the letter p. If using PyMongo's re.compile(), you can write the above query as:
import re
for noMatch in db.inventory.find( { "item": { "$not": re.compile("^p.*") } } ):
    print(noMatch)

See Also: find() (page 457), update() (page 464), $set, $gt, $regex, PyMongo, driver.
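The difference between $not with $gt and a plain $lte test can be sketched outside the database. The following is a minimal model in plain JavaScript, not MongoDB's implementation; the sample documents and the helper names matchesLte and matchesNotGt are hypothetical.

```javascript
// Hypothetical in-memory documents; the third has no price field.
const docs = [
  { item: "pen", price: 0.99 },
  { item: "ink", price: 2.49 },
  { item: "cap" }
];

// Models { price: { $lte: 1.99 } }: the field must exist and pass the test.
function matchesLte(doc) {
  return typeof doc.price === "number" && doc.price <= 1.99;
}

// Models { price: { $not: { $gt: 1.99 } } }: the $gt test is negated,
// so a document without a price field also matches.
function matchesNotGt(doc) {
  const greater = typeof doc.price === "number" && doc.price > 1.99;
  return !greater;
}

const lteItems = docs.filter(matchesLte).map(d => d.item);
const notGtItems = docs.filter(matchesNotGt).map(d => d.item);
console.log(lteItems);   // [ 'pen' ]
console.log(notGtItems); // [ 'pen', 'cap' ]
```

The document without a price field appears only in the second result, which is the behavior the text describes for $not.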


30.1.32 $or
$or New in version 1.6. Changed in version 2.0: You may nest $or operations; however, these expressions are not as efficiently optimized as top-level.
Syntax: { $or: [ { <expression1> }, { <expression2> }, ... , { <expressionN> } ] }
The $or operator performs a logical OR operation on an array of two or more <expressions> and selects the documents that satisfy at least one of the <expressions>. Consider the following query:
db.inventory.find( { price:1.99, $or: [ { qty: { $lt: 20 } }, { sale: true } ] } )

This query will select all documents in the inventory collection where:
- the price field value equals 1.99, and
- either the qty field value is less than 20 or the sale field value is true.

Consider the following example which uses the $or operator to select fields from embedded documents:

db.inventory.update( { $or: [ { price:10.99 }, { "carrier.state": "NY"} ] }, { $set: { sale:

This update() (page 464) operation will set the value of the sale field in the documents in the inventory collection where:
- the price field value equals 10.99, or
- the carrier embedded document contains a field state whose value equals NY.

When using $or with <expressions> that are equality checks for the value of the same field, choose the $in operator over the $or operator. Consider the query to select all documents in the inventory collection where:
- either the price field value equals 1.99 or the sale field value equals true, and
- either the qty field value equals 20 or the qty field value equals 50.

The most effective query would be:
db.inventory.find ( { $or: [ { price: 1.99 }, { sale: true } ], qty: { $in: [20, 50] } } )

Consider the following behaviors when using the $or operator: When using indexes with $or queries, remember that each clause of an $or query will execute in parallel. These clauses can each use their own index. Consider the following query:
db.inventory.find ( { $or: [ { price: 1.99 }, { sale: true } ] } )

For this query, you would create one index on price ( db.inventory.ensureIndex( { price: 1 } ) ) and another index on sale ( db.inventory.ensureIndex( { sale: 1 } ) ) rather than a compound index. Also, when using the $or operator with the sort() (page 452) method in a query, the query will not use the indexes on the $or fields. Consider the following query which adds a sort() (page 452) method to the above query:
db.inventory.find ( { $or: [ { price: 1.99 }, { sale: true } ] } ).sort({item:1})

This modified query will not use the index on price nor the index on sale.
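The advice to prefer $in over $or for equality checks on one field rests on the two forms selecting the same documents. A minimal sketch in plain JavaScript, assuming a small hypothetical inventory array and helper names (qtyWithOr, qtyWithIn, priceOrSale) that are not part of MongoDB:

```javascript
// Hypothetical in-memory inventory.
const inventory = [
  { item: "a", price: 1.99, sale: false, qty: 20 },
  { item: "b", price: 5.0,  sale: true,  qty: 50 },
  { item: "c", price: 1.99, sale: true,  qty: 10 },
  { item: "d", price: 5.0,  sale: false, qty: 20 }
];

// Models { $or: [ { qty: 20 }, { qty: 50 } ] }.
const qtyWithOr = doc => doc.qty === 20 || doc.qty === 50;
// Models { qty: { $in: [ 20, 50 ] } }.
const qtyWithIn = doc => [20, 50].includes(doc.qty);
// Models { $or: [ { price: 1.99 }, { sale: true } ] }.
const priceOrSale = doc => doc.price === 1.99 || doc.sale === true;

const viaOr = inventory.filter(d => priceOrSale(d) && qtyWithOr(d)).map(d => d.item);
const viaIn = inventory.filter(d => priceOrSale(d) && qtyWithIn(d)).map(d => d.item);
console.log(viaOr, viaIn); // both select items a and b
```

The model only shows that the results agree; it says nothing about how the server plans or optimizes either form.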


See Also: find() (page 457), update() (page 464), $set, $and, sort() (page 452).

30.1.33 $orderby
$orderby The $orderby (page 387) operator sorts the results of a query in ascending or descending order. Consider the following syntax:
db.collection.find()._addSpecial( "$orderby", { age : -1} )

This is equivalent to the following cursor.sort() (page 452) method that may be more familiar to you:
db.collection.find().sort( { age: -1 } )

Both of these examples return all documents in the collection named collection, sorted by age in descending order from greatest to smallest. Specify a value to $orderby (page 387) of negative one (e.g. -1, as above) to sort in descending order or a positive value (e.g. 1) to sort in ascending order. Unless you have an index for the specified key pattern, use $orderby (page 387) in conjunction with $maxScan (page 381) and/or cursor.limit() (page 450) to avoid requiring MongoDB to perform a large in-memory sort. cursor.limit() (page 450) increases the speed and reduces the amount of memory required to return this query by way of an optimized algorithm.

30.1.34 $polygon
$polygon New in version 1.9. Use $polygon to specify a polygon for a bounded query using the $within operator for geospatial queries. To define the polygon, you must specify an array of coordinate points, as in the following: [ [ x1,y1 ], [x2,y2], [x3,y3] ] The last point specified is always implicitly connected to the first. You can specify as many points, and therefore sides, as you like. Consider the following bounded query for documents with coordinates within a polygon:
db.collection.find( { loc: { $within: { $polygon: [ [0,0], [3,6], [6,0] ] } } } )

Note: A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.
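To make the idea of an implicitly closed polygon concrete, here is a sketch of a generic ray-casting test in plain JavaScript. This is a textbook technique swapped in for illustration and is not claimed to be how MongoDB evaluates $polygon; pointInPolygon is a hypothetical name, and points exactly on an edge may be classified differently than the server classifies them.

```javascript
// Generic ray-casting test: count how many polygon edges a horizontal ray
// from the point crosses. An odd count means the point is inside.
function pointInPolygon(point, polygon) {
  const [px, py] = point;
  let inside = false;
  // j trails i, so the last vertex is connected back to the first.
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const [xi, yi] = polygon[i];
    const [xj, yj] = polygon[j];
    const crosses =
      (yi > py) !== (yj > py) &&
      px < ((xj - xi) * (py - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// The triangle from the query above.
const triangle = [[0, 0], [3, 6], [6, 0]];
const insideResult = pointInPolygon([3, 2], triangle);
const outsideResult = pointInPolygon([0, 5], triangle);
console.log(insideResult, outsideResult); // true false
```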

30.1.35 $pop
$pop The $pop operator removes the first or last element of an array. Pass $pop a value of 1 to remove the last element in an array and a value of -1 to remove the first element of an array. Consider the following syntax:
db.collection.update( {field: value }, { $pop: { field: 1 } } );


This operation removes the last item of the array in field in the document that matches the query statement { field: value }. The following example removes the first item of the same array:
db.collection.update( {field: value }, { $pop: { field: -1 } } );

Be aware of the following $pop behaviors: The $pop operation fails if field is not an array. $pop will successfully remove the last item in an array. field will then hold an empty array. New in version 1.1.
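The $pop behaviors listed above can be modeled on a single in-memory document. This is a sketch in plain JavaScript under stated assumptions: applyPop is a hypothetical helper, and throwing a TypeError merely stands in for the failure the text describes.

```javascript
// Hypothetical helper that models $pop on one in-memory document.
function applyPop(doc, field, direction) {
  if (!Array.isArray(doc[field])) {
    throw new TypeError(field + " is not an array");
  }
  if (direction === 1) {
    doc[field].pop();    // 1 removes the last element
  } else if (direction === -1) {
    doc[field].shift();  // -1 removes the first element
  }
  return doc;
}

const lastRemoved = applyPop({ scores: [1, 2, 3] }, "scores", 1);
const firstRemoved = applyPop({ scores: [1, 2, 3] }, "scores", -1);
const emptied = applyPop({ scores: [7] }, "scores", 1);
console.log(lastRemoved.scores, firstRemoved.scores, emptied.scores);
// [ 1, 2 ] [ 2, 3 ] []
```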

30.1.36 $pull
$pull The $pull operator removes all instances of a value from an existing array. Consider the following example:
db.collection.update( { field: value }, { $pull: { field: value1 } } );

$pull removes the value value1 from the array in field, in the document that matches the query statement { field: value } in collection. If value1 existed multiple times in the field array, $pull would remove all instances of value1 in this array.

30.1.37 $pullAll
$pullAll The $pullAll operator removes multiple values from an existing array. $pullAll provides the inverse operation of the $pushAll operator. Consider the following example:

db.collection.update( { field: value }, { $pullAll: { field1: [ value1, value2, value3 ] } } );

Here, $pullAll removes [ value1, value2, value3 ] from the array in field1, in the document that matches the query statement { field: value } in collection.
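As a rough illustration of the difference between $pull and $pullAll, the following plain JavaScript sketch filters an in-memory array. The helper names pullValue and pullAllValues are hypothetical, and the model compares scalar values only.

```javascript
// Models $pull: remove every instance of one value.
const pullValue = (array, value) => array.filter(item => item !== value);

// Models $pullAll: remove every instance of each listed value.
const pullAllValues = (array, values) =>
  array.filter(item => !values.includes(item));

const afterPull = pullValue(["a", "b", "a", "c"], "a");
const afterPullAll = pullAllValues(["a", "b", "a", "c", "d"], ["a", "c"]);
console.log(afterPull);    // [ 'b', 'c' ]
console.log(afterPullAll); // [ 'b', 'd' ]
```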

30.1.38 $push
$push The $push operator appends a specied value to an array. For example:
db.collection.update( { field: value }, { $push: { field: value1 } } );

Here, $push appends value1 to the array in field, in the document that matches the query statement { field: value }. Be aware of the following behaviors:
- If the field specified in the $push statement (e.g. { $push: { field: value1 } }) does not exist in the matched document, the operation adds a new array with the specified field and value (e.g. value1) to the matched document. $push does not fail when pushing a value to a non-existent field.
- The operation will fail if the field specified in the $push statement exists but is not an array.
- If value1 is an array itself, $push appends the whole array as an element in the identified array. To add multiple items to an array, use $pushAll.


30.1.39 $pushAll
$pushAll The $pushAll operator is similar to the $push but adds the ability to append several values to an array at once.

db.collection.update( { field: value }, { $pushAll: { field1: [ value1, value2, value3 ] } } );

Here, $pushAll appends the values in [ value1, value2, value3 ] to the array in field1 in the document matched by the statement { field: value } in collection. If you specify a single value, $pushAll will behave as $push.
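The contrast between pushing an array with $push and appending its members with $pushAll is easy to see on an in-memory document. A minimal sketch in plain JavaScript; applyPush and applyPushAll are hypothetical helpers and not part of any MongoDB API.

```javascript
// Models $push: create the array if the field is missing, fail if the
// field exists but is not an array, otherwise append one element.
function applyPush(doc, field, value) {
  if (!(field in doc)) doc[field] = [];
  if (!Array.isArray(doc[field])) {
    throw new TypeError(field + " is not an array");
  }
  doc[field].push(value);
  return doc;
}

// Models $pushAll: append each listed value in turn.
function applyPushAll(doc, field, values) {
  for (const value of values) applyPush(doc, field, value);
  return doc;
}

const created = applyPush({ name: "x" }, "tags", "a");
const nested = applyPush({ tags: ["a"] }, "tags", ["b", "c"]);
const flat = applyPushAll({ tags: ["a"] }, "tags", ["b", "c"]);
console.log(created.tags); // [ 'a' ]
console.log(nested.tags);  // [ 'a', [ 'b', 'c' ] ]
console.log(flat.tags);    // [ 'a', 'b', 'c' ]
```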

30.1.40 $query
$query The $query (page 389) operator provides an interface to describe queries. Consider the following operation.
db.collection.find()._addSpecial( "$query", { value : 100 } )

This is equivalent to the following db.collection.find() (page 457) method that may be more familiar to you:
db.collection.find( { value : 100 } )

30.1.41 $regex
$regex The $regex operator provides regular expression capabilities in queries. MongoDB uses Perl compatible regular expressions (i.e. PCRE). The following examples are equivalent:
db.collection.find( { field: /acme.*corp/i } );
db.collection.find( { field: { $regex: 'acme.*corp', $options: 'i' } } );

These expressions match all documents in collection where the value of field matches the case-insensitive regular expression acme.*corp. $regex uses Perl Compatible Regular Expressions (PCRE) as the matching engine.

$options
$regex provides four option flags:
- i toggles case insensitivity, and allows all letters in the pattern to match upper and lower cases.
- m toggles multiline regular expression. Without this option, all regular expressions match within one line. If there are no newline characters (e.g. \n) or no start/end of line construct, the m option has no effect.
- x toggles an extended capability. When set, $regex ignores all white space characters unless escaped or included in a character class. Additionally, it ignores characters between an un-escaped # character and the next new line, so that you may include comments in complicated patterns. This only applies to data characters; white space characters may never appear within special character sequences in a pattern. The x option does not affect the handling of the VT character (i.e. code 11.) New in version 1.9.0.
- s allows the dot (e.g. .) character to match all characters including newline characters.

$regex only provides the i and m options in the short JavaScript syntax (i.e. /acme.*corp/i). To use x and s you must use the $regex operator with the $options syntax. To combine a regular expression match with other operators, you need to specify the $regex operator. For example:
db.collection.find( { field: { $regex: /acme.*corp/i, $nin: [ 'acmeblahcorp' ] } } );

This expression returns all documents in collection where field matches the case-insensitive regular expression acme.*corp and does not equal acmeblahcorp. $regex uses indexes only when the regular expression has an anchor for the beginning (i.e. ^) of a string. Additionally, while /^a/, /^a.*/, and /^a.*$/ are equivalent, they have different performance characteristics. All of these expressions use an index if an appropriate index exists; however, /^a.*/ and /^a.*$/ are slower. /^a/ can stop scanning after matching the prefix.
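The effect of the i, m, and s flags can be tried with JavaScript's own RegExp, which is a different engine from PCRE but treats these three flags similarly for simple patterns. The sketch below is illustrative only; JavaScript has no x flag, so that option is omitted, and the variable names are hypothetical.

```javascript
// Plain JavaScript RegExp checks against a two-line string.
const text = "Acme\nCorp";

const insensitive = /acme/i.test(text);              // i: letter case is ignored
const lineAnchored = /^corp/im.test(text);           // m: ^ also matches after a newline
const stringAnchored = /^corp/i.test(text);          // without m: ^ matches only at the start
const dotStopsAtNewline = /acme.*corp/i.test(text);  // without s: . does not match \n
const dotCrossesNewline = /acme.*corp/is.test(text); // s: . also matches \n

// On single-line input, the three anchored forms accept the same strings.
const names = ["apple", "a", "banana"];
const prefixForms = [/^a/, /^a.*/, /^a.*$/].map(re =>
  names.filter(name => re.test(name))
);
console.log(prefixForms); // three copies of [ 'apple', 'a' ]
```

The last check reflects only that the three patterns match the same values; the performance difference described above concerns how the server scans an index, which this model does not reproduce.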

30.1.42 $rename
$rename The $rename operator changes the name of a eld. Consider the following example:
db.collection.update( { field: value }, { $rename: { 'old_field': 'new_field' } } );

Here, the $rename operator changes the name of the old_field field to new_field, in the document that matches the query { field: value } in collection. The $rename operator will expand arrays and sub-documents to find a match for field names (e.g. old_field in the example above.) New in version 1.7.2.
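A renamed field keeps its value and loses only its old name. The following plain JavaScript sketch models that for a top-level field of one in-memory document; applyRename is a hypothetical helper, the misspelled field name nmae is sample data, and leaving a document unchanged when the old field is absent is an assumption of this model.

```javascript
// Models $rename for a top-level field of one in-memory document.
function applyRename(doc, oldName, newName) {
  if (Object.prototype.hasOwnProperty.call(doc, oldName)) {
    doc[newName] = doc[oldName]; // carry the value over
    delete doc[oldName];         // drop the old name
  }
  return doc;
}

const renamed = applyRename({ _id: 1, nmae: "pen" }, "nmae", "name");
const untouched = applyRename({ _id: 2 }, "nmae", "name");
console.log(renamed);   // { _id: 1, name: 'pen' }
console.log(untouched); // { _id: 2 }
```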

30.1.43 $returnKey
$returnKey Only return the index key (i.e. _id) or keys for the results of the query. Use the following form:
db.collection.find()._addSpecial("$returnKey" , true )

30.1.44 $set
$set Use the $set operator to set a particular value. The $set operator requires the following syntax:
db.collection.update( { field: value1 }, { $set: { field1: value2 } } );

This statement updates the document in collection where field matches value1 by replacing the value of the field field1 with value2. This operator will add the specified field or fields if they do not exist in this document or replace the existing value of the specified field(s) if they already exist.

30.1.45 $showDiskLoc
$showDiskLoc Use the following modifier to display the disk location:


db.collection.find()._addSpecial("$showDiskLoc" , true)

30.1.46 $size
$size The $size operator matches any array with the number of elements specified by the argument. For example:
db.collection.find( { field: { $size: 2 } } );

returns all documents in collection where field is an array with exactly 2 elements. For instance, the above expression will return { field: [ 'red', 'green' ] } and { field: [ 'apple', 'lime' ] } but not { field: 'fruit' } or { field: [ 'orange', 'lemon', 'grapefruit' ] }. To match fields with only one element within an array use $size with a value of 1, as follows:
db.collection.find( { field: { $size: 1 } } );

$size does not accept ranges of values. To select documents based on fields with different numbers of elements, create a counter field that you increment when you add elements to a field. Queries cannot use indexes for the $size portion of a query, although the other portions of a query can use indexes if applicable.
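The exact-length behavior of $size can be checked against the sample values mentioned above. This is a minimal sketch in plain JavaScript; sizeDocs and hasSize are hypothetical names used only for the model.

```javascript
// The sample documents from the text, held in memory.
const sizeDocs = [
  { field: ["red", "green"] },
  { field: ["apple", "lime"] },
  { field: "fruit" },
  { field: ["orange", "lemon", "grapefruit"] }
];

// Models { field: { $size: n } }: the value must be an array of exactly n elements.
const hasSize = (doc, n) => Array.isArray(doc.field) && doc.field.length === n;

const sizeTwo = sizeDocs.filter(doc => hasSize(doc, 2));
const sizeOne = sizeDocs.filter(doc => hasSize(doc, 1));
console.log(sizeTwo.length, sizeOne.length); // 2 0
```

Neither the string value nor the three-element array matches a size of 2, which is why a range such as "2 or more" needs a separate counter field.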

30.1.47 $snapshot
$snapshot The $snapshot (page 391) operator ensures that the results returned by a query:
- contain no duplicates,
- miss no objects, and
- include all matching objects that were present at the beginning and the end of the query.

Snapshot mode does not guarantee the inclusion (or omission) of an object present at the beginning of the query but not at the end (due to an update.) Use the following syntax:
db.foo.find()._addSpecial( "$snapshot", true )

The JavaScript function cursor.snapshot() (page 452) provides equivalent functionality in the mongo shell. See the following example, which is equivalent to the above:
db.foo.find().snapshot()

Do not use snapshot with $hint (page 379), or $orderby (page 387) (cursor.sort() (page 452).)

30.1.48 $type
$type Syntax: { field: { $type: <BSON type> } }

$type selects the documents where the value of the field is the specified BSON type. Consider the following example:
db.inventory.find( { price: { $type : 1 } } )


This query will select all documents in the inventory collection where the price field value is a Double. If the field holds an array, the $type operator performs the type check against the array elements and not the field. Consider the following example where the tags field holds an array:
db.inventory.find( { tags: { $type : 4 } } )

This query will select all documents in the inventory collection where the tags array contains an element that is itself an array. If instead you want to determine whether the tags field is an array type, use the $where operator:
db.inventory.find( { $where : "Array.isArray(this.tags)" } )
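The difference between the two array checks can be modeled on in-memory documents. The sketch below is plain JavaScript that follows the behavior described in this section; typeDocs, elementIsArray, and fieldIsArray are hypothetical names.

```javascript
// Hypothetical documents with different shapes of the tags field.
const typeDocs = [
  { _id: 1, tags: ["a", "b"] },
  { _id: 2, tags: [["a", "b"], "c"] },
  { _id: 3, tags: "a" }
];

// Models { tags: { $type: 4 } } as described above: when tags holds an
// array, the check applies to its elements rather than to the field.
const elementIsArray = doc =>
  Array.isArray(doc.tags) && doc.tags.some(Array.isArray);

// Models { $where: "Array.isArray(this.tags)" }: the check applies to the field.
const fieldIsArray = doc => Array.isArray(doc.tags);

const byType = typeDocs.filter(elementIsArray).map(d => d._id);
const byWhere = typeDocs.filter(fieldIsArray).map(d => d._id);
console.log(byType);  // [ 2 ]
console.log(byWhere); // [ 1, 2 ]
```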

See SERVER-1475 for more information about the array type. Refer to the following table for the available BSON types and their corresponding numbers.

Type                       Number
Double                     1
String                     2
Object                     3
Array                      4
Binary data                5
Object id                  7
Boolean                    8
Date                       9
Null                       10
Regular Expression         11
JavaScript                 13
Symbol                     14
JavaScript (with scope)    15
32-bit integer             16
Timestamp                  17
64-bit integer             18
Min key                    255
Max key                    127

MinKey and MaxKey compare less than and greater than all other possible BSON element values, respectively, and exist primarily for internal use.
Note: To query if a field value is a MinKey, you must use the $type with -1 as in the following example:
db.collection.find( { field: { $type: -1 } } )

Example Consider the following example operation sequence that demonstrates both type comparison and the special MinKey and MaxKey values:
db.test.insert( { x : 3 } );
db.test.insert( { x : 2.9 } );
db.test.insert( { x : new Date() } );
db.test.insert( { x : true } );
db.test.insert( { x : MaxKey } )
db.test.insert( { x : MinKey } )

db.test.find().sort( { x : 1 } )
{ "_id" : ObjectId("4b04094b7c65b846e2090112"), "x" : { $minKey : 1 } }
{ "_id" : ObjectId("4b03155dce8de6586fb002c7"), "x" : 2.9 }
{ "_id" : ObjectId("4b03154cce8de6586fb002c6"), "x" : 3 }
{ "_id" : ObjectId("4b031566ce8de6586fb002c9"), "x" : true }
{ "_id" : ObjectId("4b031563ce8de6586fb002c8"), "x" : "Tue Jul 25 2012 18:42:03 GMT-0500 (ES
{ "_id" : ObjectId("4b0409487c65b846e2090111"), "x" : { $maxKey : 1 } }

To query for the minimum value of a shard key of a shard cluster, use the following operation when connected to the mongos:
use config
db.chunks.find( { "min.shardKey": { $type: -1 } } )

Warning: Storing values of the different types in the same field in a collection is strongly discouraged.
See Also: find() (page 457), insert() (page 460), $where, BSON, shard key, shard cluster.

30.1.49 $uniqueDocs
$uniqueDocs New in version 2.0. For geospatial queries, MongoDB may return a single document more than once for a single query, because geospatial indexes may include multiple coordinate pairs in a single document, and therefore return the same document more than once. The $uniqueDocs operator inverts the default behavior of the $within operator. By default, the $within operator returns the document only once. If you specify a value of false for $uniqueDocs, MongoDB will return multiple instances of a single document. Example Given an addressBook collection with a document in the following form:
{ addresses: [ { name: "Home", loc: [55.5, 42.3] }, { name: "Work", loc: [32.3, 44.2] } ] }

The following query would return the same document multiple times:

db.addressBook.find( { "addresses.loc": { "$within": { "$box": [ [0,0], [100,100] ], $unique

The following query would return each matching document, only once:

db.addressBook.find( { "addresses.loc": { "$within": { "$box": [ [0,0], [100,100] ], $uniqueDo

You cannot specify $uniqueDocs with $near or haystack queries.
Note: A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.


30.1.50 $unset
$unset The $unset operator deletes a particular eld. Consider the following example:
db.collection.update( { field: value1 }, { $unset: { field1: "" } } );

The above example deletes field1 in collection from documents where field has a value of value1. The value specified for the field in the $unset statement (i.e. "" above) does not impact the operation. If documents match the initial query (e.g. { field: value1 } above) but do not have the field specified in the $unset operation (e.g. field1), then the statement has no effect on the document.

30.1.51 $where
$where Use the $where operator to pass a string containing a JavaScript expression to the query system to provide greater flexibility with queries. Consider the following:
db.collection.find( { $where: "this.a == this.b" } );

Warning: $where evaluates JavaScript and cannot take advantage of indexes. Therefore, query performance improves when you express your query using the standard MongoDB operators (e.g., $gt, $in). In general, you should use $where only when you cannot express your query using another operator. If you must use $where, try to include at least one other standard query operator to filter the result set. Using $where alone requires a table scan.
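Why $where alone implies a scan can be seen in a small model: the expression has to run once per document with this bound to that document. The following is a sketch in plain JavaScript, not the server's evaluator; whereFilter and the sample data are hypothetical.

```javascript
// Models a $where-only query: compile the expression once, then evaluate
// it against every document, counting how many documents were visited.
function whereFilter(docs, expression) {
  const predicate = new Function("return (" + expression + ");");
  let evaluated = 0;
  const matched = docs.filter(doc => {
    evaluated += 1;
    return Boolean(predicate.call(doc)); // `this` is the current document
  });
  return { matched, evaluated };
}

const whereResult = whereFilter(
  [{ a: 1, b: 1 }, { a: 1, b: 2 }, { a: 3, b: 3 }],
  "this.a == this.b"
);
console.log(whereResult.matched.length, whereResult.evaluated); // 2 3
```

Every document is visited even though only two match, which is the cost the warning above refers to.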

30.1.52 $within
$within The $within operator allows you to select items that exist within a shape on a coordinate system for geospatial queries. This operator uses the following syntax:
db.collection.find( { location: { $within: { shape } } } );

Replace { shape } with a document that describes a shape. The $within operator supports three shapes. These shapes and the relevant expressions follow:
Rectangles. Use the $box operator. Consider the following $within document:
db.collection.find( { location: { $within: { $box: [[100,0], [120,100]] } } } );

Here a box, [[100,0], [120,100]], describes the parameter for the query. As a minimum, you must specify the lower-left and upper-right corners of the box.
Circles. Use the $center operator. Specify circles in the following form:
db.collection.find( { location: { $within: { $center: [ center, radius ] } } } );

Polygons. Use the $polygon operator. Specify polygons with an array of points. See the following example:


db.collection.find( { location: { $within: { $polygon: [[100,120], [100,100], [120,100], [240

The last point of a polygon is implicitly connected to the first point. All shapes include the border of the shape as part of the shape, although this is subject to the imprecision of floating point numbers. Use $uniqueDocs to control whether documents with many location fields show up multiple times when more than one of its fields match the query.
Note: A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.

$box New in version 1.4. The $box operator specifies a rectangular shape for the $within operator in geospatial queries. To use the $box operator, you must specify the bottom left and top right corners of the rectangle in an array object. Consider the following example:
db.collection.find( { loc: { $within: { $box: [ [0,0], [100,100] ] } } } )

This will return all the documents that are within the box having points at: [0,0], [0,100], [100,0], and [100,100].
Note: A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.

$polygon New in version 1.9. Use $polygon to specify a polygon for a bounded query using the $within operator for geospatial queries. To define the polygon, you must specify an array of coordinate points, as in the following: [ [ x1,y1 ], [x2,y2], [x3,y3] ] The last point specified is always implicitly connected to the first. You can specify as many points, and therefore sides, as you like. Consider the following bounded query for documents with coordinates within a polygon:
db.collection.find( { loc: { $within: { $polygon: [ [0,0], [3,6], [6,0] ] } } } )

Note: A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.

$center New in version 1.4. This specifies a circle shape for the $within operator in geospatial queries. To define the bounds of a query using $center, you must specify the center point and the radius. Consider the following example:
db.collection.find( { location: { $within: { $center: [ [0,0], 10 ] } } } );

The above command returns all the documents that fall within a 10 unit radius of the point [0,0].
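The rectangle and circle shapes reduce to simple coordinate tests on a flat plane. The sketch below is plain JavaScript under that flat-plane assumption and is not the server's geospatial code; withinBox and withinCenter are hypothetical names, and the test points avoid the border, where floating point imprecision applies.

```javascript
// Models $box: the point lies between the lower-left and upper-right corners.
function withinBox(point, box) {
  const [[x1, y1], [x2, y2]] = box;
  const [x, y] = point;
  return x >= x1 && x <= x2 && y >= y1 && y <= y2;
}

// Models $center: the flat distance from the center is at most the radius.
function withinCenter(point, center, radius) {
  const distance = Math.hypot(point[0] - center[0], point[1] - center[1]);
  return distance <= radius;
}

const inBox = withinBox([50, 50], [[0, 0], [100, 100]]);
const outOfBox = withinBox([150, 50], [[0, 0], [100, 100]]);
const inCircle = withinCenter([3, 4], [0, 0], 10);
const outOfCircle = withinCenter([8, 8], [0, 0], 10);
console.log(inBox, outOfBox, inCircle, outOfCircle); // true false true false
```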


Note: A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.

$uniqueDocs New in version 2.0. For geospatial queries, MongoDB may return a single document more than once for a single query, because geospatial indexes may include multiple coordinate pairs in a single document, and therefore return the same document more than once. The $uniqueDocs operator inverts the default behavior of the $within operator. By default, the $within operator returns the document only once. If you specify a value of false for $uniqueDocs, MongoDB will return multiple instances of a single document.
Example Given an addressBook collection with a document in the following form:

{ addresses: [ { name: "Home", loc: [55.5, 42.3] }, { name: "Work", loc: [32.3, 44.2] } ] }

The following query would return the same document multiple times:

db.addressBook.find( { "addresses.loc": { "$within": { "$box": [ [0,0], [100,100] ], $uni

The following query would return each matching document, only once:

db.addressBook.find( { "addresses.loc": { "$within": { "$box": [ [0,0], [100,100] ], $uniqu

You cannot specify $uniqueDocs with $near or haystack queries.
Note: A geospatial index must exist on a field holding coordinates before using any of the geolocation query operators.

Projection operators:

30.1.53 $elemMatch (projection)


See Also: $elemMatch (query) (page 377)
$elemMatch New in version 2.2. Use the $elemMatch (page 396) projection operator to limit the response of a query to a single matching element of an array. Consider the following:
Example Given the following document fragment:
{
  _id: ObjectId(),
  zipcode: 63109,
  dependents: [
    { name: "john", school: 102, age: 10 },
    { name: "jess", school: 102, age: 11 },
    { name: "jeff", school: 108, age: 15 }
  ]
}

Consider the following find() (page 457) operation:


var projection = { _id: 0, dependents: { $elemMatch: { school: 102 }}};
db.students.find( { zipcode: 63109 }, projection);

The query would return all documents where the value of the zipcode field is 63109, while the projection excludes the _id field and only includes the first matching element of the dependents array where the school element has a value of 102. The documents would take the following form:
{ dependents: [ { name: "john", school: 102, age: 10 } ] }

Note: The $elemMatch (page 396) projection will only match one array element per source document.
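The "first matching element only" rule can be modeled with Array.prototype.find. This is a minimal sketch in plain JavaScript, assuming equality criteria only; projectFirstMatch is a hypothetical helper, and the _id field is left out of the sample document to mirror the _id: 0 projection.

```javascript
// The sample document from the text, without its _id.
const student = {
  zipcode: 63109,
  dependents: [
    { name: "john", school: 102, age: 10 },
    { name: "jess", school: 102, age: 11 },
    { name: "jeff", school: 108, age: 15 }
  ]
};

// Models the $elemMatch projection: keep only the first array element
// whose fields equal every value in criteria.
function projectFirstMatch(doc, arrayField, criteria) {
  const match = doc[arrayField].find(element =>
    Object.keys(criteria).every(key => element[key] === criteria[key])
  );
  return match === undefined ? {} : { [arrayField]: [match] };
}

const projected = projectFirstMatch(student, "dependents", { school: 102 });
console.log(JSON.stringify(projected));
// {"dependents":[{"name":"john","school":102,"age":10}]}
```

Although jess also attends school 102, only john is returned, because the projection stops at the first match.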

30.1.54 $slice
$slice The $slice (page 397) operator controls the number of items of an array that a query returns. Consider the following prototype query:
db.collection.find( { field: value }, { array: {$slice: count } } );

This operation selects the document in collection identified by a field named field that holds value and returns the number of elements specified by the value of count from the array stored in the array field. If count has a value greater than the number of elements in array the query returns all elements of the array. $slice (page 397) accepts arguments in a number of formats, including negative values and arrays. Consider the following examples:
db.posts.find( {}, { comments: { $slice: 5 } } )

Here, $slice (page 397) selects the first five items in an array in the comments field.
db.posts.find( {}, { comments: { $slice: -5 } } )

This operation returns the last five items in array. The following examples specify an array as an argument to $slice. Arrays take the form of [ skip , limit ], where the first value indicates the number of items in the array to skip and the second value indicates the number of items to return.
db.posts.find( {}, { comments: { $slice: [ 20, 10 ] } } )

Here, the query will only return 10 items, after skipping the first 20 items of that array.
db.posts.find( {}, { comments: { $slice: [ -20, 10 ] } } )

This operation returns 10 items as well, beginning with the item that is 20th from the last item of the array.

Aggregation operators:

30.1.55 $add
$add Takes an array of one or more numbers and adds them together, returning the sum.

30.1.56 $addToSet
$addToSet Returns an array of all the values found in the selected field among the documents in that group. Every unique value only appears once in the result set. There is no ordering guarantee for the output documents.

30.1.57 $and
$and Takes an array of one or more values and returns true if all of the values in the array are true. Otherwise $and returns false. Note: $and uses short-circuit logic: the operation stops evaluation after encountering the first false expression.

30.1.58 $avg
$avg Returns the average of all the values of the field in all documents selected by this group.

30.1.59 $cmp
$cmp Takes two values in an array and returns an integer. The returned value is: A negative number if the first value is less than the second. A positive number if the first value is greater than the second. 0 if the two values are equal.

30.1.60 $cond
$cond

Example
{ $cond: [ <boolean-expression>, <true-case>, <false-case> ] }

Takes an array with three expressions, where the first expression evaluates to a Boolean value. If the first expression evaluates to true, $cond (page 398) returns the value of the second expression. If the first expression evaluates to false, $cond (page 398) evaluates and returns the third expression.
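The three-expression form behaves like a conditional in which only the chosen branch is evaluated. The following plain JavaScript sketch models that with functions standing in for expressions; cond, the order document, and the labels are hypothetical, and evaluating only the selected branch is an assumption of this model.

```javascript
// Models { $cond: [ <boolean-expression>, <true-case>, <false-case> ] }.
// Each argument is a function so that only the selected case is evaluated.
function cond(test, trueCase, falseCase) {
  return test() ? trueCase() : falseCase();
}

const order = { qty: 250 };
const label = cond(
  () => order.qty >= 100,
  () => "bulk",
  () => "single"
);
const otherLabel = cond(
  () => order.qty >= 1000,
  () => "pallet",
  () => "carton"
);
console.log(label, otherLabel); // bulk carton
```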


30.1.61 $dayOfMonth
$dayOfMonth Takes a date and returns the day of the month as a number between 1 and 31.

30.1.62 $dayOfWeek
$dayOfWeek Takes a date and returns the day of the week as a number between 1 (Sunday) and 7 (Saturday.)

30.1.63 $dayOfYear
$dayOfYear Takes a date and returns the day of the year as a number between 1 and 366.

30.1.64 $divide
$divide Takes an array that contains a pair of numbers and returns the value of the first number divided by the second number.

30.1.65 $eq
$eq Takes two values in an array and returns a boolean. The returned value is: true when the values are equivalent. false when the values are not equivalent.

30.1.66 $first
$first Returns the first value it encounters for its group. Note: Only use $first (page 400) when the $group (page 399) follows an $sort (page 406) operation. Otherwise, the result of this operation is unpredictable.

30.1.67 $group
$group Groups documents together for the purpose of calculating aggregate values based on a collection of documents. Practically, $group often supports tasks such as average page views for each page in a website on a daily basis. The output of $group (page 399) depends on how you define groups. Begin by specifying an identifier (i.e. a _id field) for the group you're creating with this pipeline. You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields.


With the exception of the _id field, $group (page 399) cannot output nested documents. Every group expression must specify an _id field. You may specify the _id field as a dotted field path reference, a document with multiple fields enclosed in braces (i.e. { and }), or a constant value. Note: Use $project (page 404) as needed to rename the grouped field after a $group (page 399) operation. Consider the following example:
db.article.aggregate( { $group : { _id : "$author", docsPerAuthor : { $sum : 1 }, viewsPerAuthor : { $sum : "$pageViews" } }} );

This groups by the author field and computes two fields. The first, docsPerAuthor, is a counter field that adds one for each document with a given author field using the $sum (page 408) function. The viewsPerAuthor field is the sum of all of the pageViews fields in the documents for each group. Each field defined for the $group (page 399) must use one of the group aggregation functions listed below to generate its composite value:

$addToSet Returns an array of all the values found in the selected field among the documents in that group. Every unique value only appears once in the result set. There is no ordering guarantee for the output documents.

$first Returns the first value it encounters for its group. Note: Only use $first (page 400) when the $group (page 399) follows an $sort (page 406) operation. Otherwise, the result of this operation is unpredictable.

$last Returns the last value it encounters for its group. Note: Only use $last (page 401) when the $group (page 399) follows an $sort (page 406) operation. Otherwise, the result of this operation is unpredictable.

$max Returns the highest value among all values of the field in all documents selected by this group.

$min Returns the lowest value among all values of the field in all documents selected by this group.

$avg Returns the average of all the values of the field in all documents selected by this group.

$push Returns an array of all the values found in the selected field among the documents in that group. A value may appear more than once in the result set if more than one field in the grouped documents has that value.


$sum Returns the sum of all the values for a specified field in the grouped documents, as in the second use above. Alternately, if you specify a value as an argument, $sum (page 408) will increment this field by the specified value for every document in the grouping. Typically, as in the first use above, specify a value of 1 in order to count members of the group.
Warning: The aggregation system currently stores $group (page 399) operations in memory, which may cause problems when processing a larger number of groups.
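The two uses of $sum in the author example, counting with a constant 1 and adding up a field, can be modeled with a Map keyed by author. This is a sketch in plain JavaScript held entirely in memory, not the aggregation framework itself; groupByAuthor and the sample articles are hypothetical.

```javascript
// Models { $group: { _id: "$author", docsPerAuthor: { $sum: 1 },
//                    viewsPerAuthor: { $sum: "$pageViews" } } }.
function groupByAuthor(articles) {
  const groups = new Map();
  for (const article of articles) {
    const group = groups.get(article.author) ||
      { _id: article.author, docsPerAuthor: 0, viewsPerAuthor: 0 };
    group.docsPerAuthor += 1;                   // $sum with a value of 1
    group.viewsPerAuthor += article.pageViews;  // $sum over a field
    groups.set(article.author, group);
  }
  return [...groups.values()];
}

const grouped = groupByAuthor([
  { author: "bob", pageViews: 5 },
  { author: "dave", pageViews: 7 },
  { author: "bob", pageViews: 10 }
]);
console.log(grouped);
// bob: 2 documents and 15 views; dave: 1 document and 7 views
```

Like the warning above, the model keeps one accumulator per group in memory, so its footprint grows with the number of distinct authors.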

30.1.68 $gt
$gt Takes two values in an array and returns a boolean. The returned value is: true when the first value is greater than the second value. false when the first value is less than or equal to the second value.

30.1.69 $gte
$gte Takes two values in an array and returns a boolean. The returned value is: true when the first value is greater than or equal to the second value. false when the first value is less than the second value.

30.1.70 $hour
$hour Takes a date and returns the hour between 0 and 23.

30.1.71 $ifNull
$ifNull

Example
{ $ifNull: [ <expression>, <replacement-if-null> ] }

Takes an array with two expressions. $ifNull (page 401) returns the first expression if it evaluates to a non-null value. Otherwise, $ifNull (page 401) returns the second expression's value.
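As a rough illustration of this rule, the following plain-JavaScript helper (ifNull is a hypothetical name, not a shell function) returns the replacement only for null values. Treating undefined, that is, a missing field, the same as null is an assumption of this sketch.

```javascript
// Sketch of the $ifNull rule in plain JavaScript; not MongoDB code.
// Assumption: a missing field (undefined) is treated like null.
function ifNull(value, replacement) {
  return value === null || value === undefined ? replacement : value;
}

console.log(ifNull(null, "unknown"));
console.log(ifNull(0, "unknown"));
```

Note that non-null values such as 0 or an empty string pass through unchanged; only null triggers the replacement.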

30.1.72 $last
$last Returns the last value it encounters for its group.


Note: Only use $last (page 401) when the $group (page 399) follows a $sort (page 406) operation. Otherwise, the result of this operation is unpredictable.

30.1.73 $limit
$limit Restricts the number of documents that pass through the $limit (page 402) in the pipeline. $limit (page 402) takes a single numeric (positive whole number) value as a parameter. Once the specified number of documents pass through the pipeline operator, no more will. Consider the following example:
db.article.aggregate( { $limit : 5 } );

This operation returns only the first 5 documents passed to it by the pipeline. $limit (page 402) has no effect on the content of the documents it passes.

30.1.74 $lt
$lt Takes two values in an array and returns a boolean. The returned value is: true when the first value is less than the second value. false when the first value is greater than or equal to the second value.

30.1.75 $lte
$lte Takes two values in an array and returns a boolean. The returned value is: true when the first value is less than or equal to the second value. false when the first value is greater than the second value.

30.1.76 $match
$match Provides a query-like interface to filter documents out of the aggregation pipeline. The $match (page 402) drops documents that do not match the condition from the aggregation pipeline, and it passes documents that match along the pipeline unaltered. The syntax passed to the $match (page 402) is identical to the query syntax. Consider the following prototype form:
db.article.aggregate( { $match : <match-predicate> } );

The following example performs a simple field equality test:


db.article.aggregate( { $match : { author : "dave" } } );

This operation only returns documents where the author field holds the value dave. Consider the following example, which performs a range test:
db.article.aggregate( { $match : { score : { $gt : 50, $lte : 90 } } } );

Here, the operation returns all documents where the score field holds a value that is greater than 50 and less than or equal to 90.

Note: Place the $match (page 402) as early in the aggregation pipeline as possible. Because $match (page 402) limits the total number of documents in the aggregation pipeline, earlier $match (page 402) operations minimize the amount of later processing. If you place a $match (page 402) at the very beginning of a pipeline, the query can take advantage of indexes like any other db.collection.find() (page 457) or db.collection.findOne() (page 458).

Warning: You cannot use $where or geospatial operations in $match (page 402) queries as part of the aggregation pipeline.
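The boundary behavior of the range test can be checked with a small plain-JavaScript sketch. It emulates only the predicate { score : { $gt : 50, $lte : 90 } } with an ordinary array filter; the sample documents and the matchesRange name are made up for illustration.

```javascript
// Plain-JavaScript emulation of { score : { $gt : 50, $lte : 90 } }.
// The documents below are hypothetical sample data.
const scored = [{ score: 50 }, { score: 51 }, { score: 90 }, { score: 91 }];

// $gt excludes the lower bound; $lte includes the upper bound.
const matchesRange = (doc) => doc.score > 50 && doc.score <= 90;

// Matching documents pass through unaltered; the rest are dropped.
const matched = scored.filter(matchesRange);
console.log(matched);
```

Only the documents with scores 51 and 90 remain, and they are the same objects that entered the filter, which mirrors how $match passes matching documents along unaltered.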

30.1.77 $max
$max Returns the highest value among all values of the field in all documents selected by this group.

30.1.78 $min
$min Returns the lowest value among all values of the field in all documents selected by this group.

30.1.79 $minute
$minute Takes a date and returns the minute between 0 and 59.

30.1.80 $mod
$mod Takes an array that contains a pair of numbers and returns the remainder of the first number divided by the second number. See Also: $mod


30.1.81 $month
$month Takes a date and returns the month as a number between 1 and 12.

30.1.82 $multiply
$multiply Takes an array of one or more numbers and multiplies them, returning the resulting product.

30.1.83 $ne
$ne Takes two values in an array and returns a boolean. The returned value is: true when the values are not equivalent. false when the values are equivalent.

30.1.84 $not
$not Returns the boolean opposite of the value passed to it. When passed a true value, $not returns false; when passed a false value, $not returns true.

30.1.85 $or
$or Takes an array of one or more values and returns true if any of the values in the array are true. Otherwise $or returns false. Note: $or uses short-circuit logic: the operation stops evaluation after encountering the first true expression.

30.1.86 $project
$project Reshapes a document stream by renaming, adding, or removing fields. Also use $project (page 404) to create computed values or sub-objects. Use $project (page 404) to:

Include fields from the original document.
Insert computed fields.
Rename fields.
Create and populate fields that hold sub-documents.

Use $project (page 404) to quickly select the fields that you want to include or exclude from the response. Consider the following aggregation framework operation.


db.article.aggregate( { $project : { title : 1 , author : 1 , }} );

This operation includes the title field and the author field in the document that returns from the aggregation pipeline. Note: The _id field is always included by default. You may explicitly exclude _id as follows:
db.article.aggregate( { $project : { _id : 0 , title : 1 , author : 1 }} );

Here, the projection excludes the _id field but includes the title and author fields. Projections can also add computed fields to the document stream passing through the pipeline. A computed field can use any of the expression operators (page 216). Consider the following example:
db.article.aggregate( { $project : { title : 1, doctoredPageViews : { $add:["$pageViews", 10] } }} );

Here, the field doctoredPageViews represents the value of the pageViews field after adding 10 to the original field using the $add (page 398). Note: You must enclose the expression that defines the computed field in braces, so that the expression is a valid object. You may also use $project (page 404) to rename fields. Consider the following example:
db.article.aggregate( { $project : { title : 1 , page_views : "$pageViews" , bar : "$other.foo" }} );

This operation renames the pageViews field to page_views, and renames the foo field in the other sub-document as the top-level field bar. The field references used for renaming fields are direct expressions and do not use an operator or surrounding braces. All aggregation field references can use dotted paths to refer to fields in nested documents. Finally, you can use the $project (page 404) to create and populate new sub-documents. Consider the following example that creates a new object-valued field named stats that holds a number of values:


db.article.aggregate( { $project : { title : 1 , stats : { pv : "$pageViews", foo : "$other.foo", dpv : { $add:["$pageViews", 10] } } }} );

This projection includes the title field and places $project (page 404) into inclusive mode. Then, it creates the stats documents with the following fields:

pv, which includes and renames the pageViews from the top level of the original documents.
foo, which includes the value of other.foo from the original documents.
dpv, which is a computed field that adds 10 to the value of the pageViews field in the original document using the $add (page 398) aggregation expression.
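To make the reshaping concrete, here is a plain-JavaScript sketch of what this projection does to a single document. The getPath and projectStats helpers and the sample document are hypothetical; the sketch only imitates dotted-path field references and the $add computation, and it assumes _id is carried along by default as noted earlier.

```javascript
// Plain-JavaScript sketch of the stats projection; not MongoDB code.
// getPath resolves a dotted path such as "other.foo" in a nested document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

function projectStats(doc) {
  return {
    _id: doc._id,     // included by default
    title: doc.title, // title : 1
    stats: {
      pv: getPath(doc, "pageViews"),       // "$pageViews"
      foo: getPath(doc, "other.foo"),      // "$other.foo"
      dpv: getPath(doc, "pageViews") + 10, // { $add : ["$pageViews", 10] }
    },
  };
}

// Hypothetical input document.
const projected = projectStats({
  _id: 1,
  title: "this is my title",
  author: "bob",
  pageViews: 5,
  other: { foo: "x" },
});
console.log(projected);
```

Because the projection is inclusive, fields that are not named (author in this sample) do not appear in the result.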

30.1.87 $push
$push Returns an array of all the values found in the selected field among the documents in that group. A value may appear more than once in the result set if more than one document in the group holds that value.

30.1.88 $second
$second Takes a date and returns the second between 0 and 59, but can be 60 to account for leap seconds.

30.1.89 $skip
$skip Skips over the specified number of documents that pass through the $skip (page 406) in the pipeline before passing all of the remaining input. $skip (page 406) takes a single numeric (positive whole number) value as a parameter. Once the operation has skipped the specified number of documents, it passes all the remaining documents along the pipeline without alteration. Consider the following example:
db.article.aggregate( { $skip : 5 } );

This operation skips the first 5 documents passed to it by the pipeline. $skip (page 406) has no effect on the content of the documents it passes along the pipeline.
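$skip and $limit (page 402) behave like slicing a stream of documents. The following plain-JavaScript sketch uses arrays in place of the pipeline; the skip and limit helpers and the twelve sample documents are invented for illustration, and chaining the two stages is this example's own choice.

```javascript
// Plain-JavaScript sketch of $skip and $limit on an array of documents.
const skip = (docs, n) => docs.slice(n);     // drop the first n documents
const limit = (docs, n) => docs.slice(0, n); // keep at most n documents

// Twelve hypothetical documents with _id 1 through 12.
const stream = Array.from({ length: 12 }, (_, i) => ({ _id: i + 1 }));

// Comparable to the stages { $skip : 5 } followed by { $limit : 5 }.
const page = limit(skip(stream, 5), 5);
console.log(page.map((doc) => doc._id));
```

Neither helper changes the documents themselves; they only decide which documents continue down the pipeline.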

30.1.90 $sort
$sort The $sort (page 406) pipeline operator sorts all input documents and returns them to the pipeline in sorted order. Consider the following prototype form:


db.<collection-name>.aggregate( { $sort : { <sort-key> } } );

This sorts the documents in the collection named <collection-name>, according to the key and specification in the { <sort-key> } document. Specify the sort in a document with a field or fields that you want to sort by and a value of 1 or -1 to specify an ascending or descending sort respectively, as in the following example:
db.users.aggregate( { $sort : { age : -1, posts: 1 } } );

This operation sorts the documents in the users collection in descending order according to the age field, and then in ascending order according to the value in the posts field.

Note: The $sort (page 406) cannot begin sorting documents until previous operators in the pipeline have returned all output.

$skip (page 406)

The $sort (page 406) operator can take advantage of an index when placed at the beginning of the pipeline or placed before the following aggregation operators: $project (page 404), $unwind (page 408), and $group (page 399).

Warning: Unless the $sort (page 406) operator can use an index, in the current release, the sort must fit within memory. This may cause problems when sorting large numbers of documents.
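The compound ordering in { age : -1, posts : 1 } can be illustrated with an ordinary comparator. This plain-JavaScript sketch is not how MongoDB sorts internally; compareBySpec and the sample users are hypothetical, and the sketch compares only simple numeric fields.

```javascript
// Build a comparator from a sort document such as { age : -1, posts : 1 }.
// Keys are applied in the order they appear in the specification.
function compareBySpec(spec) {
  const keys = Object.keys(spec);
  return (a, b) => {
    for (const key of keys) {
      if (a[key] < b[key]) return -spec[key];
      if (a[key] > b[key]) return spec[key];
    }
    return 0; // equal on every sort key
  };
}

// Hypothetical sample data.
const users = [
  { name: "a", age: 30, posts: 5 },
  { name: "b", age: 40, posts: 1 },
  { name: "c", age: 30, posts: 2 },
];

// Copy before sorting: the whole input must be in memory to be ordered.
const sorted = [...users].sort(compareBySpec({ age: -1, posts: 1 }));
console.log(sorted.map((u) => u.name));
```

The user aged 40 comes first, and the two users aged 30 are then ordered by ascending posts.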

30.1.91 $strcasecmp
$strcasecmp Takes in two strings. Returns a number. $strcasecmp (page 407) is positive if the first string is greater than the second and negative if the first string is less than the second. $strcasecmp (page 407) returns 0 if the strings are identical. Note: $strcasecmp (page 407) may not make sense when applied to glyphs outside the Roman alphabet. $strcasecmp (page 407) internally capitalizes strings before comparing them to provide a case-insensitive comparison. Use $cmp (page 398) for a case sensitive comparison.
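A minimal plain-JavaScript sketch of this comparison follows. The strcasecmp function here is a stand-in written for illustration; returning exactly 1 and -1 is an assumption of the sketch, since the text above only promises a positive or negative number.

```javascript
// Case-insensitive comparison sketch: capitalize both strings, then compare.
function strcasecmp(first, second) {
  const a = first.toUpperCase();
  const b = second.toUpperCase();
  if (a > b) return 1;  // first string is greater
  if (a < b) return -1; // first string is less
  return 0;             // identical after capitalization
}

console.log(strcasecmp("Dave", "dave"));
console.log(strcasecmp("bob", "Alice"));
```

With this approach "Dave" and "dave" compare as equal, whereas a case sensitive comparison would distinguish them.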

30.1.92 $substr
$substr $substr (page 407) takes a string and two numbers. The first number represents the number of bytes in the string to skip, and the second number specifies the number of bytes to return from the string.


Note: $substr (page 407) is not encoding aware and if used improperly may produce a result string containing an invalid utf-8 character sequence.
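The following sketch shows how a byte-based substring can split a multi-byte character. It runs under Node.js, whose Buffer type is not part of the mongo shell, and substrBytes is a hypothetical helper that imitates only the skip-and-count rule described above.

```javascript
// Node.js sketch: take `count` bytes after skipping `skip` bytes of UTF-8.
function substrBytes(text, skip, count) {
  return Buffer.from(text, "utf8").subarray(skip, skip + count);
}

const sample = "h\u00e9llo"; // the second character occupies two bytes in UTF-8

// Three bytes cover "h" plus both bytes of the accented character.
const whole = substrBytes(sample, 0, 3).toString("utf8");

// Two bytes cut the accented character in half, so the bytes are not valid
// UTF-8; Node.js decodes the damaged part as the replacement character U+FFFD.
const broken = substrBytes(sample, 0, 2).toString("utf8");

console.log(whole, broken);
```

Choosing offsets that fall on character boundaries avoids the invalid sequence.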

30.1.93 $subtract
$subtract Takes an array that contains a pair of numbers and subtracts the second from the first, returning their difference.

30.1.94 $sum
$sum Returns the sum of all the values for a specified field in the grouped documents, as in the second use above. Alternately, if you specify a value as an argument, $sum (page 408) will increment this field by the specified value for every document in the grouping. Typically, as in the first use above, specify a value of 1 in order to count members of the group.

30.1.95 $toLower
$toLower Takes a single string and converts that string to lowercase, returning the result. All uppercase letters become lowercase. Note: $toLower (page 408) may not make sense when applied to glyphs outside the Roman alphabet.

30.1.96 $toUpper
$toUpper Takes a single string and converts that string to uppercase, returning the result. All lowercase letters become uppercase. Note: $toUpper (page 408) may not make sense when applied to glyphs outside the Roman alphabet.

30.1.97 $unwind
$unwind Peels off the elements of an array individually, and returns a stream of documents. $unwind (page 408) returns one document for every member of the unwound array within every source document. Take the following aggregation command:
db.article.aggregate(
    { $project : { author : 1 , title : 1 , tags : 1 }},
    { $unwind : "$tags" }
);

Note: The dollar sign (i.e. $) must precede the field specification handed to the $unwind (page 408) operator.

In the above aggregation $project (page 404) selects (inclusively) the author, title, and tags fields, as well as the _id field implicitly. Then the pipeline passes the results of the projection to the $unwind (page 408) operator, which will unwind the tags field. This operation may return a sequence of documents that resemble the following for a collection that contains one document holding a tags field with an array of 3 items.
{
    "result" : [
        {
            "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
            "title" : "this is my title",
            "author" : "bob",
            "tags" : "fun"
        },
        {
            "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
            "title" : "this is my title",
            "author" : "bob",
            "tags" : "good"
        },
        {
            "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
            "title" : "this is my title",
            "author" : "bob",
            "tags" : "fun"
        }
    ],
    "OK" : 1
}

A single document becomes 3 documents: each document is identical except for the value of the tags field. Each value of tags is one of the values in the original tags array.

Note: $unwind (page 408) has the following behaviors:

$unwind (page 408) is most useful in combination with $group (page 399).

You may undo the effects of an unwind operation with the $group (page 399) pipeline operator.

If you specify a target field for $unwind (page 408) that does not exist in an input document, the pipeline ignores the input document, and will generate no result documents.

If you specify a target field for $unwind (page 408) that is not an array, aggregate() generates an error.

If you specify a target field for $unwind (page 408) that holds an empty array ([]) in an input document, the pipeline ignores the input document, and will generate no result documents.
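The behaviors listed above can be summarized in a short plain-JavaScript sketch. The unwind function and the input documents are hypothetical and only imitate the described semantics for a top-level field; they are not the server's implementation.

```javascript
// Plain-JavaScript sketch of the $unwind behaviors described above.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    if (!(field in doc)) continue; // missing field: document is ignored
    const values = doc[field];
    if (!Array.isArray(values)) {
      throw new Error("field '" + field + "' is not an array"); // error case
    }
    for (const value of values) {
      // One output document per array element; an empty array yields none.
      out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

// Hypothetical input: one document with three tags, one empty, one missing.
const unwound = unwind(
  [
    { _id: 1, author: "bob", tags: ["fun", "good", "fun"] },
    { _id: 2, author: "dave", tags: [] },
    { _id: 3, author: "ann" },
  ],
  "tags"
);
console.log(unwound);
```

Only the first input document contributes output, producing three documents that differ in the tags value alone.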


30.1.98 $week
$week Takes a date and returns the week of the year as a number between 0 and 53. Weeks begin on Sundays, and week 1 begins with the first Sunday of the year. Days preceding the first Sunday of the year are in week 0. This behavior is the same as the %U operator to the strftime standard library function.
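The week numbering rule can be written as a small calculation. The plain-JavaScript sketch below follows the %U convention described above; weekOfYear is a hypothetical helper, and working in UTC is an assumption made here to keep the example independent of the local time zone.

```javascript
// Week of the year with Sunday as the first day of the week (%U style).
// Assumption: the date is interpreted in UTC.
function weekOfYear(date) {
  const year = date.getUTCFullYear();
  const startOfYear = Date.UTC(year, 0, 1);
  const startOfDay = Date.UTC(year, date.getUTCMonth(), date.getUTCDate());
  const dayOfYear = Math.round((startOfDay - startOfYear) / 86400000); // 0-based
  const dayOfWeek = date.getUTCDay(); // 0 means Sunday
  return Math.floor((dayOfYear + 7 - dayOfWeek) / 7);
}

console.log(weekOfYear(new Date(Date.UTC(2012, 0, 1)))); // a Sunday
console.log(weekOfYear(new Date(Date.UTC(2011, 0, 1)))); // a Saturday
```

January 1, 2012 fell on a Sunday, so it starts week 1, while January 1, 2011 fell on a Saturday and therefore belongs to week 0.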

30.1.99 $year
$year Takes a date and returns the full year.

30.2 Database Commands


30.2.1 addShard
addShard

Options
name Optional. Unless specified, a name will be automatically provided to uniquely identify the shard.
maxSize Optional. Unless specified, shards will consume the total amount of available space on their machines if necessary. Use the maxSize value to limit the amount of space the database can use. Specify this value in megabytes.

The addShard command registers a new shard with a sharded cluster. You must run this command against a mongos instance. The command takes the following form:
{ addShard: "<hostname>:<port>" }

Replace <hostname>:<port> with the hostname and port of the database instance you want to add as a shard. Because the mongos instances do not have state and distribute configuration in the config database, send this command to only one mongos instance.

Note: Specify a maxSize when you have machines with different disk capacities, or if you want to limit the amount of data on some shards. The maxSize constraint prevents the balancer from migrating chunks to the shard when the value of mem.mapped (page 542) exceeds the value of maxSize.

30.2.2 aggregate
aggregate New in version 2.1.0. aggregate implements the aggregation framework. Consider the following prototype form:
{ aggregate: "[collection]", pipeline: [pipeline] }


Where [collection] specifies the name of the collection that contains the data that you wish to aggregate. The pipeline argument holds an array that contains the specification for the aggregation operation. Consider the following example from the aggregation documentation (page 199).
db.runCommand( { aggregate : "article", pipeline : [ { $project : { author : 1, tags : 1, } }, { $unwind : "$tags" }, { $group : { _id : { tags : 1 }, authors : { $addToSet : "$author" } } } ] } );

More typically this operation would use the aggregate helper in the mongo shell, and would resemble the following:
db.article.aggregate( { $project : { author : 1, tags : 1, } }, { $unwind : "$tags" }, { $group : { _id : { tags : 1 }, authors : { $addToSet : "$author" } } } );

For more aggregation documentation, please see: Aggregation Framework (page 199), Aggregation Framework Reference (page 209), and Aggregation Framework Examples (page 205).

30.2.3 applyOps (internal)


applyOps

Arguments
operations (array) An array of operations to perform.
preCondition (array) Optional. Defines one or more conditions that the destination must meet when applying the entries from the <operations> array. Use ns to specify a namespace, q to specify a query, and res to specify the result that the query should match. You may specify zero, one, or many preCondition documents.

applyOps provides a way to apply entries from an oplog created by replica set members and master instances in a master/slave deployment. applyOps is primarily an internal command to support sharding functionality, and has the following prototype form:

db.runCommand( { applyOps: [ <operations> ], preCondition: [ { ns: <namespace>, q: <query>, res:


applyOps applies oplog entries from the <operations> array to the mongod instance. The preCondition array provides the ability to specify conditions that must be true in order to apply the oplog entry. You can specify as many preCondition sets as needed. If you specify the ns option, applyOps will only apply oplog entries for the collection described by that namespace. You may also specify a query in the q field with a corresponding expected result in the res field that must match in order to apply the oplog entry.

30.2.4 authenticate
authenticate Clients use authenticate to authenticate a connection. When using the shell, use the command helper as follows:
db.authenticate( "username", "password" )

30.2.5 availableQueryOptions (internal)


availableQueryOptions availableQueryOptions is an internal command that is only available on mongos instances.

30.2.6 checkShardingIndex (internal)


checkShardingIndex checkShardingIndex is an internal command that supports the sharding functionality.

30.2.7 clean (internal)


clean clean is an internal command.

30.2.8 clone
clone The clone command clones a database from a remote MongoDB instance to the current host. clone copies the database on the remote instance with the same name as the current database. The command takes the following form:
{ clone: "db1.example.net:27017" }

Replace db1.example.net:27017 above with the resolvable hostname for the MongoDB instance you wish to copy from. Note the following behaviors:

clone can run against a slave or a non-primary member of a replica set.

clone does not snapshot the database. If the copied database is updated at any point during the clone operation, the resulting database may be inconsistent.

You must run clone on the destination server.

The destination server is not locked for the duration of the clone operation. This means that clone will occasionally yield to allow other operations to complete.

See copydb for similar functionality.


30.2.9 cloneCollection
cloneCollection The cloneCollection command copies a collection from a remote server to the server where you run the command.

Options
from Specify a resolvable hostname, and optional port number, of the remote server where the specified collection resides.
query Optional. A query document that filters the documents in the remote collection that cloneCollection will copy to the current database. See db.collection.find() (page 457).
copyIndexes (Boolean) Optional. true by default. When set to false the indexes on the originating server are not copied with the documents in the collection.

Consider the following example:
{ cloneCollection: "users", from: "db.example.net:27017", query: { active: true }, copyIndexes:

This operation copies the users collection from the current database on the server at db.example.net. The operation only copies documents that satisfy the query { active: true } and does not copy indexes. cloneCollection copies indexes by default, but you can disable this behavior by setting { copyIndexes: false }. The query and copyIndexes arguments are optional. cloneCollection creates a collection on the current database with the same name as the origin collection. If, in the above example, the users collection already exists, then MongoDB appends documents in the remote collection to the destination collection.

30.2.10 closeAllDatabases (internal)


closeAllDatabases closeAllDatabases is an internal command that invalidates all cursors and closes the open database files. The next operation that uses the database will reopen the file.

30.2.11 collMod
collMod New in version 2.2. collMod makes it possible to add flags to a collection to modify the behavior of MongoDB. In the current release the only available flag is usePowerOf2Sizes. The command takes the following prototype form:
db.runCommand( {"collMod" : [collection] , "[flag]" : [value] } )

In this command substitute [collection] with the name of the collection, and [flag] and [value] with the flag and value you want to set.

usePowerOf2Sizes The usePowerOf2Sizes flag changes the method that MongoDB uses to allocate space on disk for documents in this collection. By setting usePowerOf2Sizes, you ensure that MongoDB will allocate space for documents in sizes that are powers of 2 (e.g. 4, 8, 16, 32, 64, 128, 256, 512...). With this option MongoDB will be able to more effectively reuse space. usePowerOf2Sizes is useful for collections where you will be inserting and deleting large numbers of documents to ensure that MongoDB will effectively use space on disk.
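The rounding idea behind this flag can be sketched in plain JavaScript. The powerOf2Size helper below is purely illustrative: it shows how a requested size maps to the next power of 2, and it does not claim to reproduce the server's actual allocation rules, such as any minimum or maximum record size.

```javascript
// Round a requested size in bytes up to the next power of 2.
// Illustrative only; MongoDB's real allocator may apply further rules.
function powerOf2Size(bytes) {
  let size = 1;
  while (size < bytes) {
    size *= 2;
  }
  return size;
}

console.log(powerOf2Size(100));
console.log(powerOf2Size(129));
```

Because every allocation in such a scheme has one of a small number of sizes, the space freed by a deleted document is more likely to fit a later insert.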


30.2.12 collStats
collStats The collStats command returns a variety of storage statistics for a given collection. Use the following syntax:
{ collStats: "database.collection" , scale : 1024 }

Specify a namespace database.collection and use the scale argument to scale the output. The above example will display values in kilobytes. Examine the following example output, which uses the db.collection.stats() (page 463) helper in the mongo shell.
> db.users.stats()
{
    "ns" : "app.users",          // namespace
    "count" : 9,                 // number of documents
    "size" : 432,                // collection size in bytes
    "avgObjSize" : 48,           // average object size in bytes
    "storageSize" : 3840,        // (pre)allocated space for the collection
    "numExtents" : 1,            // number of extents (contiguously allocated chunks of d
    "nindexes" : 2,              // number of indexes
    "lastExtentSize" : 3840,     // size of the most recently created extent
    "paddingFactor" : 1,         // padding can speed up updates if documents grow
    "flags" : 1,
    "totalIndexSize" : 16384,    // total index size in bytes
    "indexSizes" : {             // size of specific indexes in bytes
        "_id_" : 8192,
        "username" : 8192
    },
    "ok" : 1
}

Note: The scale factor rounds values to whole numbers. This can produce unpredictable and unexpected results in some situations. See Also: Collection Statistics Reference (page 552).

30.2.13 compact
compact New in version 2.0. The compact command rewrites and defragments a single collection. Additionally, the command drops all indexes at the beginning of compaction and rebuilds the indexes at the end. compact is conceptually similar to repairDatabase, but works on a single collection rather than an entire database. The command has the following syntax:
{ compact: <collection name> }

You may also specify one of the following options:

force: true

To run on the primary node in a replica set. Otherwise, the compact command returns an error when invoked on a replica set primary because the command blocks all other activity. Changed in version 2.2: compact blocks activities only for its database.

paddingFactor: <factor>

To specify a padding factor ranging from 1.0 to 4.0 for the compacted documents. The default factor is 1.0, which specifies no padding; the maximum padding factor is 4.0. If you do updates that increase the size of the documents, you will want some padding, especially if you have several indexes for the collection. New in version 2.2.

paddingBytes: <bytes>

To specify padding as an absolute number of bytes. Specifying paddingBytes can be useful if your documents start small but then increase in size significantly. For example, if your documents are initially 40 bytes long and you grow them by 1 KB, using paddingBytes: 1024 might be reasonable since using paddingFactor: 4.0 would only add 40 * (4.0 - 1) = 120 bytes of padding. New in version 2.2.

In production deployments, collections should always have at least 100 bytes of padding, and generally have a padding factor that is 10% of the average document size.
db.runCommand( { compact: "<collection name>", paddingBytes: 100, paddingFactor: 1.1 } )
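The padding arithmetic above is easy to check with a few lines of plain JavaScript. The paddingFromFactor helper is a hypothetical name introduced here; it simply restates the formula size * (factor - 1) from the paddingBytes description.

```javascript
// Padding added by a padding factor: docSize * (factor - 1) bytes.
function paddingFromFactor(docSizeBytes, factor) {
  return docSizeBytes * (factor - 1);
}

// A 40-byte document with the maximum factor of 4.0 gains only 120 bytes,
// which is far less than the 1024 bytes that paddingBytes: 1024 would add.
const fromFactor = paddingFromFactor(40, 4.0);
const fromBytes = 1024;
console.log(fromFactor, fromBytes);
```

For small documents that grow by a large fixed amount, an absolute paddingBytes value therefore reserves more room than even the largest paddingFactor.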

Warning: Always have an up-to-date backup before performing server maintenance such as the compact operation.

Note the following behaviors:

compact blocks all other activity (in v2.2, it blocks activities only for its database). You may view the intermediate progress either by viewing the mongod log file, or by running db.currentOp() (page 466) in another shell instance.

compact removes any padding factor in the collection when issued without either the paddingFactor option or the paddingBytes option. This may impact performance if the documents grow regularly. However, compact retains existing paddingFactor statistics for the collection that MongoDB will use to calculate the padding factor for future inserts.

compact generally uses less disk space than repairDatabase and is faster. However, the compact command is still slow and does block database activities, so you should run the command during scheduled maintenance.

If you kill the operation by running db.killOp(opid) (page 469) or restart the server before it has finished:
    If you have journaling enabled, your data will be safe. However, you may have to manually rebuild the indexes.
    If you do not have journaling enabled, the compact command is much less safe, and there are no guarantees made about the safety of your data in the event of a shutdown or a kill.
    In either case, much of the existing free space in the collection may become un-reusable. In this scenario, you should rerun the compaction to completion to restore the use of this free space.

compact may increase the total size and number of your data files, especially when run for the first time. However, this will not increase the total collection storage space since storage size is the amount of data allocated within the database files, and not the size/number of the files on the file system.

compact requires a small amount of additional disk space while running but unlike repairDatabase it does not free space on the file system.


You may also wish to run the collStats command before and after compaction to see how the storage space changes for the collection.

compact commands do not replicate. When running compact on a replica set:
    Compact each member separately.
    Ideally, compaction runs on a secondary. (See option force: true above for information regarding compacting the primary.)
    If compact runs on a secondary, the secondary will enter a recovering state to prevent clients from directing reads to it during compaction. Once the compaction finishes the secondary will automatically return to secondary state.
You may refer to the partial script for automating step down and compaction for an example.

compact is a command issued to a mongod. In a sharded environment, run compact on each shard separately as a maintenance operation. (This will likely change in the future with other enhancements.)

It is not possible to compact capped collections because they don't have padding, and documents cannot grow in these collections. However, the documents of a capped collection are not subject to fragmentation.

See Also: repairDatabase

30.2.14 connPoolStats
connPoolStats

Note: connPoolStats only returns meaningful results for mongos instances and for mongod instances in shard clusters. The command connPoolStats returns information regarding the number of open connections to the current database instance, including client connections and server-to-server connections for replication and clustering. The command takes the following form:
{ connPoolStats: 1 }

The value of the argument (i.e. 1 ) does not affect the output of the command. See Connection Pool Statistics Reference (page 557) for full documentation of the connPoolStats output.

30.2.15 connPoolSync (internal)


connPoolSync connPoolSync is an internal command.

30.2.16 convertToCapped
convertToCapped The convertToCapped command converts an existing, non-capped collection to a capped collection. Use the following syntax:
{convertToCapped: "collection", size: 100 * 1024 }


This command converts collection, an existing collection, to a capped collection, with a maximum size of 100 KB. This command accepts the size and max options. See the create command for additional details.

Warning: The convertToCapped command will not recreate indexes from the original collection on the new collection. If you need indexes on this collection you will need to create these indexes after the conversion is complete.

30.2.17 copydb
copydb The copydb command copies a database from a remote host to the current host. The command has the following syntax:
{ copydb: 1, fromhost: <hostname>, fromdb: <db>, todb: <db>, slaveOk: <bool>, username: <username>, password: <password>, nonce: <nonce>, key: <key> }

All of the following arguments are optional: slaveOk, username, password, nonce, key.

You can omit the fromhost argument to copy one database to another database within a single MongoDB instance. You must run this command on the destination, or the todb server.

Be aware of the following behaviors:

copydb can run against a slave or a non-primary member of a replica set. In this case, you must set the slaveOk option to true.

copydb does not snapshot the database. If the state of the database changes at any point during the operation, the resulting database may be inconsistent.

You must run copydb on the destination server.

The destination server is not locked for the duration of the copydb operation. This means that copydb will occasionally yield to allow other operations to complete.

If the remote server has authentication enabled, then you must include a username and password. You must also include a nonce and a key. The nonce is a one-time password that you request from the remote server using the copydbgetnonce command. The key is a hash generated as follows:
hex_md5(nonce + username + hex_md5(username + ":mongo:" + pass))
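For readers computing the key outside the mongo shell, the formula can be sketched in Node.js as follows. This is an illustration under stated assumptions: hexMd5 and copydbKey are hypothetical helper names, the built-in crypto module stands in for the shell's hex_md5 function, and the nonce shown is a made-up placeholder rather than a value returned by copydbgetnonce.

```javascript
// Node.js sketch of the key formula; hexMd5 stands in for the shell's hex_md5.
const crypto = require("crypto");

function hexMd5(text) {
  return crypto.createHash("md5").update(text, "utf8").digest("hex");
}

function copydbKey(nonce, username, password) {
  // Inner hash: the password digest for this user.
  const passwordDigest = hexMd5(username + ":mongo:" + password);
  // Outer hash: binds that digest to the one-time nonce.
  return hexMd5(nonce + username + passwordDigest);
}

// Placeholder values for illustration only.
const exampleKey = copydbKey("2375531c32080ae8", "username", "password");
console.log(exampleKey);
```

Since the nonce is good for one use only, a fresh key has to be computed for every copydb attempt.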

If you need to copy a database and authenticate, it's easiest to use the shell helper:

db.copyDatabase(<remote_db_name>, <local_db_name>, <from_host_name>, <username>, <password>)

30.2.18 copydbgetnonce (internal)


copydbgetnonce Client libraries use copydbgetnonce to get a one-time password for use with the copydb command.

30.2.19 count
count The count command counts the number of documents in a collection. For example:
> db.runCommand( { count: "collection" } );
{ "n" : 10 , "ok" : 1 }

In the mongo shell, this returns the number of documents in the collection (e.g. collection). You may also use the count() (page 449) method on any cursor object to return a count of the number of documents in that cursor. The following operation returns the same count in the mongo shell:
> db.collection.count()
10

The collection in this example has 10 documents.

30.2.20 create
create The create command explicitly creates a collection. The command uses the following syntax:
{ create: <collection_name> }

To create a capped collection limited to 40 KB, issue the command in the following form:
{ create: "collection", capped: true, size: 40 * 1024 }

The options for creating capped collections are:
Options
capped - Specify true to create a capped collection.
size - The maximum size, in bytes, for the capped collection. Once a capped collection reaches its maximum size, MongoDB will drop old documents from the database to make way for the new documents. You must specify a size argument for all capped collections.
max - The maximum number of documents to preserve in the capped collection. This limit is subject to the overall size of the capped collection. If a capped collection reaches its maximum size before it contains the maximum number of documents, the database will remove old documents. Thus, if you use this option, ensure that the total size for the capped collection is sufficient to contain the max.
The db.createCollection() (page 465) method provides a wrapper around this functionality.
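The interaction between size and max can be sketched with a small helper that derives size from the number of documents to keep. This is an illustration only: cappedCreateCommand and the average document size are hypothetical, and the estimate assumes no per-document storage overhead.

```javascript
// Sketch: build a create command for a capped collection whose size is
// large enough to hold `maxDocs` documents of an assumed average size.
function cappedCreateCommand(name, maxDocs, avgDocBytes) {
  if (!(maxDocs > 0) || !(avgDocBytes > 0)) {
    throw new Error("maxDocs and avgDocBytes must be positive");
  }
  return {
    create: name,
    capped: true,
    // size is mandatory for capped collections; this estimate ignores
    // any per-document storage overhead, so treat it as a lower bound.
    size: maxDocs * avgDocBytes,
    max: maxDocs
  };
}

const createCmd = cappedCreateCommand("events", 1000, 512);
console.log(JSON.stringify(createCmd));
// {"create":"events","capped":true,"size":512000,"max":1000}
```

If size is smaller than this estimate, the collection reaches its size limit first and max never takes effect.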

418

Chapter 30. MongoDB Interface

MongoDB Documentation, Release 2.0.6

30.2.21 cursorInfo
cursorInfo The cursorInfo command returns information about current cursor allotment and use. Use the following form:
{ cursorInfo: 1 }

The value (e.g. 1 above) does not affect the output of the command. cursorInfo returns the total number of open cursors (totalOpen), the size of client cursors in current use (clientCursors_size), and the number of timed out cursors since the last server restart (timedOut).

30.2.22 dataSize
dataSize For internal use. The dataSize (page 552) command returns the data size for a set of data within a certain range:

{ dataSize: "database.collection", keyPattern: { field: 1 }, min: { field: 10 }, max: { field: 1

This will return a document that contains the size of all matching documents. Replace the database.collection value with the database and collection from your deployment. The keyPattern, min, and max parameters are optional. The amount of time required to return dataSize (page 552) depends on the amount of data in the collection.

30.2.23 dbHash (internal)


dbHash dbHash is an internal command.

30.2.24 dbStats
dbStats The dbStats command returns storage statistics for a given database. The command takes the following syntax:
{ dbStats: 1, scale: 1 }

The value of the argument (e.g. 1 above) to dbStats does not affect the output of the command. The scale option allows you to specify how to scale byte values. For example, a scale value of 1024 will display the results in kilobytes rather than in bytes.
The time required to run the command depends on the total size of the database. Because the command has to touch all data files, the command may take several seconds to run.
In the mongo shell, the db.stats() (page 472) function provides a wrapper around this functionality. See the Database Statistics Reference (page 551) document for an overview of this output.
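As a small illustration of the scale option, the following sketch maps a unit name to the corresponding scale value. The helper dbStatsCommand and the SCALE_FACTORS table are hypothetical conveniences, not part of MongoDB.

```javascript
// Sketch: build a dbStats command that scales byte values to a named unit.
const SCALE_FACTORS = {
  B: 1,
  KB: 1024,
  MB: 1024 * 1024,
  GB: 1024 * 1024 * 1024
};

function dbStatsCommand(unit) {
  const scale = SCALE_FACTORS[unit];
  if (scale === undefined) {
    throw new Error("unknown unit: " + unit);
  }
  return { dbStats: 1, scale: scale };
}

console.log(JSON.stringify(dbStatsCommand("KB"))); // {"dbStats":1,"scale":1024}
```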

30.2.25 diagLogging (internal)


diagLogging diagLogging is an internal command.

30.2.26 distinct
distinct The distinct command returns an array of distinct values for a given field across a single collection. The command takes the following form:
{ distinct: "collection", key: "age", query: { field: { $exists: true } } }

This operation returns all distinct values of the field (or key) age in documents that match the query { field: { $exists: true } }.
Note: The query portion of the distinct command is optional.
The shell and many drivers provide a helper method that provides this functionality. You may prefer the following equivalent syntax:
db.collection.distinct("age", { field: { $exists: true } } );

When possible, the distinct command uses an index to locate and return data.

30.2.27 driverOIDTest (internal)


driverOIDTest driverOIDTest is an internal command.

30.2.28 drop
drop The drop command removes an entire collection from a database. The command has the following syntax:
{ drop: <collection_name> }

The mongo shell provides the equivalent helper method:


db.collection.drop();

Note that this command also removes any indexes associated with the dropped collection.

30.2.29 dropDatabase
dropDatabase The dropDatabase command drops a database, deleting the associated data files. dropDatabase operates on the current database. In the shell, issue the use <database> command, replacing <database> with the name of the database you wish to delete. Then use the following command form:
{ dropDatabase: 1 }

The mongo shell also provides the following equivalent helper method:
db.dropDatabase();

30.2.30 dropIndexes
dropIndexes The dropIndexes command drops one or all indexes from the current collection. To drop all indexes, issue the command like so:
{ dropIndexes: "collection", index: "*" }

To drop a single index, issue the command by specifying the name of the index you want to drop. For example, to drop the index named age_1, use the following command:
{ dropIndexes: "collection", index: "age_1" }

The shell provides a useful command helper. Here's the equivalent command:
db.collection.dropIndex("age_1");

30.2.31 emptycapped
emptycapped The emptycapped command removes all documents from a capped collection. Use the following syntax:
{ emptycapped: "events" }

This command removes all records from the capped collection named events.

30.2.32 enableSharding
enableSharding The enableSharding command enables sharding on a per-database level. Use the following command form, replacing <database> with the name of the database:
{ enableSharding: <database> }

Once you've enabled sharding in a database, you can use the shardCollection command to begin the process of distributing data among the shards.

30.2.33 eval
eval The eval command evaluates JavaScript functions on the database server. Consider the following (trivial) example:
{ eval: function() { return 3+3 } }

The shell also provides a helper method, so you can express the above as follows:
db.eval( function() { return 3+3 } );

The shell's JavaScript interpreter evaluates functions entered directly into the shell. If you want to use the server's interpreter, you must run eval. Be aware of the following behaviors and limitations:
eval does not work in sharded environments.

The eval operation takes a write lock by default. This means that writes to the database aren't permitted while it's running. You can, however, disable the lock by setting the nolock flag to true. For example:
{ eval: function() { return 3+3 }, nolock: true }

Warning: Do not disable the write lock if the operation may modify the contents of the database in any way. Disabling the write lock may be useful in some circumstances where eval() implements a strictly read-only operation that need not block other operations. Use this functionality with extreme caution.

30.2.34 features (internal)


features features is an internal command that returns the build-level feature settings.

30.2.35 filemd5
filemd5 The filemd5 command returns the md5 hash for a single file stored using the GridFS specification. Client libraries use this command to verify that files are correctly written to MongoDB. The command takes the files_id of the file in question and the name of the GridFS root collection as arguments. For example:
{ filemd5: ObjectId("4f1f10e37671b50e4ecd2776"), root: "fs" }

30.2.36 findAndModify
findAndModify The findAndModify command atomically modifies and returns a single document. The shell and many drivers provide a findAndModify() helper method. The command has the following prototype form:
{ findAndModify: "collection", <options> }

Replace collection with the name of the collection containing the document that you want to modify, and specify <options> as fields of the command document that set the following:
Fields
query - A query object. This statement might resemble the document passed to db.collection.find() (page 457), and should return one document from the database.
sort - Optional. If the query selects multiple documents, the first document given by this sort clause will be the one modified.
remove - When true, findAndModify removes the selected document.
update - An update operator to modify the selected document.
new - When true, returns the modified document rather than the original. findAndModify ignores the new option for remove operations.
fields - A subset of fields to return. See projection operators for more information.

upsert - When true, creates a new document if the specified query returns no documents. The default is false. The return value for these operations is null in version 2.2.
Changed in version 2.2: Previously, upsert operations returned an empty document (e.g. { }). See the 2.2 release notes (page 588) for more information.
For example:
{ findAndModify: "people", query: { name: "Tom", state: "active", rating: { $gt: 10 } }, sort: { rating: 1 }, update: { $inc: { score: 1 } } }

This operation finds a document in the people collection where the name field has the value Tom, the state field has the value active, and the rating field has a value greater than 10. If there is more than one result for this query, MongoDB sorts the results in ascending order by rating (the sort value is 1) and increments the value of the score field by 1 in the first document. Using the shell helper, this same operation can take the following form:
db.people.findAndModify( { query: { name: "Tom", state: "active", rating: { $gt: 10 } }, sort: { rating: 1 }, update: { $inc: { score: 1 } } } );

Warning: When using findAndModify in a sharded environment, the query must contain the shard key for all operations against the shard cluster. findAndModify operations issued against mongos instances for non-sharded collections function normally.
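The order of the steps (match, sort, take the first document, update, return the original or the new version) can be sketched with a simplified in-memory model. This is not MongoDB's implementation: findAndModifyModel is hypothetical, a predicate and a comparator stand in for the query and sort documents, and the sample data is invented.

```javascript
// Simplified in-memory model of findAndModify; not MongoDB's implementation.
// `matches` stands in for the query document, `compare` for the sort
// clause, and `update` mutates the chosen document.
function findAndModifyModel(docs, { matches, compare, update, returnNew = false }) {
  const candidates = docs.filter(matches).sort(compare);
  if (candidates.length === 0) {
    return null; // nothing matched; this model has no upsert
  }
  const target = candidates[0];   // first document in sort order
  const original = { ...target }; // shallow snapshot before the change
  update(target);                 // modify the stored document in place
  return returnNew ? { ...target } : original;
}

const people = [
  { name: "Tom", state: "active", rating: 25, score: 5 },
  { name: "Tom", state: "active", rating: 15, score: 1 },
  { name: "Tom", state: "inactive", rating: 50, score: 9 }
];

const before = findAndModifyModel(people, {
  matches: (d) => d.name === "Tom" && d.state === "active" && d.rating > 10,
  compare: (a, b) => a.rating - b.rating, // like sort: { rating: 1 }
  update: (d) => { d.score += 1; }        // like update: { $inc: { score: 1 } }
});
console.log(before.score, people[1].score); // 1 2
```

The call returns the original document (score 1) because returnNew, the model's counterpart of the new option, defaults to false, while the stored document now has a score of 2.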

30.2.37 flushRouterConfig
flushRouterConfig flushRouterConfig clears the current cluster information cached by a mongos instance and reloads all shard cluster metadata from the config database. This forces an update when the configuration database holds data that is newer than the data cached in the mongos process.
Warning: Do not modify the config data, except as explicitly documented. A config database cannot typically tolerate manual manipulation.
flushRouterConfig is an administrative command that is only available for mongos instances. New in version 1.8.2.

30.2.38 forceerror (internal)


forceerror The forceerror command is for testing purposes only. Use forceerror to force a user assertion exception. This command always returns an ok value of 0.

30.2.39 fsync
fsync The fsync command forces the mongod process to flush all pending writes to the storage layer. mongod is always writing data to the storage layer as applications write more data to the database. MongoDB guarantees that it will write all data to disk within the syncdelay (page 500) interval, which is 60 seconds by default.
{ fsync: 1 }

The fsync operation is synchronous by default. To run fsync asynchronously, use the following form:
{ fsync: 1, async: true }

The connection will return immediately. You can check the output of db.currentOp() (page 466) for the status of the fsync operation.
The primary use of fsync is to lock the database during backup operations. This will flush all data to the data storage layer and block all write operations until you unlock the database. Consider the following command form:
{ fsync: 1, lock: true }

Note: You may continue to perform read operations on a database that has an fsync lock. However, following the first write operation, all subsequent read operations wait until you unlock the database.
To check on the current state of the fsync lock, use db.currentOp() (page 466). Use the following JavaScript function in the shell to test if the database is currently locked:
serverIsLocked = function () {
    var co = db.currentOp();
    if (co && co.fsyncLock) {
        return true;
    }
    return false;
}

After loading this function into your mongo shell session you can call it as follows:
serverIsLocked()

This function will return true if the database is currently locked and false if the database is not locked. To unlock the database, make a request for an unlock using the following command:
db.getSiblingDB("admin").$cmd.sys.unlock.findOne();

New in version 1.9.0: The db.fsyncLock() (page 467) and db.fsyncUnlock() (page 467) helpers in the shell.
In the mongo shell, you may use the db.fsyncLock() (page 467) and db.fsyncUnlock() (page 467) wrappers for the fsync lock and unlock process:
db.fsyncLock();
db.fsyncUnlock();
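One way to make sure the lock is always released is to wrap the backup step in try/finally. The following is a minimal sketch under that assumption: withFsyncLock and runBackup are hypothetical names, and the fakeDb object only stands in for the shell's db so that the call order is visible.

```javascript
// Sketch: hold the fsync lock only for the duration of a backup step.
// `db` is any object exposing fsyncLock() and fsyncUnlock(), such as the
// mongo shell's db; `runBackup` is a hypothetical caller-supplied function.
function withFsyncLock(db, runBackup) {
  db.fsyncLock();       // flush pending writes and block new ones
  try {
    return runBackup(); // e.g. copy the data files or take a snapshot
  } finally {
    db.fsyncUnlock();   // always release the lock, even on failure
  }
}

// Stand-in for the shell's db object, used here only to show the call order.
const calls = [];
const fakeDb = {
  fsyncLock: () => calls.push("lock"),
  fsyncUnlock: () => calls.push("unlock")
};
withFsyncLock(fakeDb, () => calls.push("backup"));
console.log(calls.join(" -> ")); // lock -> backup -> unlock
```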

Note: fsync lock is only possible on individual shards of a shard cluster, not on the entire shard cluster. To back up an entire shard cluster, please read considerations for backing up shard clusters (page 162). If your mongod has journaling enabled, consider using another method (page 157) to back up your database.

Note: The database cannot be locked with db.fsyncLock() (page 467) while profiling is enabled. You must disable profiling before locking the database with db.fsyncLock() (page 467). Disable profiling using db.setProfilingLevel() (page 471) as follows in the mongo shell:
db.setProfilingLevel(0)

30.2.40 geoNear
geoNear The geoNear command provides an alternative to the $near operator. In addition to the functionality of $near, geoNear returns the distance of each item from the specified point along with additional diagnostic information. For example:
{ geoNear : "places" , near : [50,50], num : 10 }

Here, geoNear returns the 10 items nearest to the coordinates [50,50] in the collection named places. geoNear provides the following options (specify all distances in the same units as the document coordinate system):
Fields
near - Takes the coordinates (e.g. [ x, y ]) to use as the center of a geospatial query.
num - Specifies the maximum number of documents to return.
maxDistance - Optional. Limits the results to those falling within a given distance of the center coordinate.
query - Optional. Further narrows the results using any standard MongoDB query operator or selection. See db.collection.find() (page 457) and /reference/operators for more information.
spherical - Optional. Default: false. When true, MongoDB evaluates the query as if the coordinate system references points on a sphere rather than on a plane.
distanceMultiplier - Optional. Specifies a factor by which to multiply all distances returned by geoNear. For example, use distanceMultiplier to convert distances from spherical queries, which are returned in radians, to linear units (i.e. miles or kilometers) by multiplying by the radius of the Earth.
includeLocs - Optional. Default: false. When true, the query returns the location of the matching documents in the result.
uniqueDocs - Optional. Default: true. The default setting returns a matching document only once, even if more than one of its location fields matches the query. When false, the query returns documents with multiple matching location fields more than once. See $uniqueDocs for more information on this option.
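The distanceMultiplier conversion can be sketched as follows. The helper names geoNearInKm and radiansToKm are hypothetical, and the radius is an assumption (Earth's mean radius of roughly 6371 km); use a radius in miles to get miles instead.

```javascript
// Sketch: a spherical geoNear command that reports distances in
// kilometers instead of radians. The radius is an assumed constant.
const EARTH_RADIUS_KM = 6371;

function geoNearInKm(collection, near, num) {
  return {
    geoNear: collection,
    near: near,
    num: num,
    spherical: true,
    // Each returned distance (in radians) is multiplied by this factor.
    distanceMultiplier: EARTH_RADIUS_KM
  };
}

// The same conversion applied by hand to a distance given in radians.
function radiansToKm(radians) {
  return radians * EARTH_RADIUS_KM;
}

const geoCmd = geoNearInKm("places", [50, 50], 10);
console.log(geoCmd.distanceMultiplier, radiansToKm(0.5)); // 6371 3185.5
```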

30.2.41 geoSearch
geoSearch The geoSearch command provides an interface to MongoDB's haystack index functionality. These indexes are useful for returning results based on location coordinates after collecting results based on some other query (i.e. a haystack). Consider the following example:

{ geoSearch : "places", near : [33, 33], maxDistance : 6, search : { type : "restaurant" }, limi

The above command returns all documents with a type of restaurant within a maximum distance of 6 units from the coordinates [33,33] in the collection places, up to a maximum of 30 results. Unless specified otherwise, the geoSearch command limits results to 50 documents.

30.2.42 geoWalk
geoWalk geoWalk is an internal command.

30.2.43 getCmdLineOpts
getCmdLineOpts The getCmdLineOpts command returns a document containing command line options used to start the given mongod:
{ getCmdLineOpts: 1 }

This command returns a document with two fields, argv and parsed. The argv field contains an array with each item from the command string used to invoke mongod. The document in the parsed field includes all runtime options, including those parsed from the command line and those specified in the configuration file (if specified).

30.2.44 getLastError
getLastError The getLastError command returns the error status of the last operation on the current connection. By default MongoDB does not provide a response to confirm the success or failure of a write operation; clients typically use getLastError in combination with write operations to ensure that the write succeeds. Consider the following prototype form:
{ getLastError: 1 }

The following options are available:
Options
j (boolean) - If true, wait for the next journal commit before returning, rather than a full disk flush. If mongod does not have journaling enabled, this option has no effect.
w - When running with replication, this is the number of servers to replicate to before returning. A w value of 1 indicates the primary only. A w value of 2 includes the primary and at least one secondary, etc. In place of a number, you may also set w to majority to indicate that the command should wait until the latest write propagates to a majority of replica set members. If using w, you should also use wtimeout. Specifying a value for w without also providing a wtimeout may cause getLastError to block indefinitely.
fsync (boolean) - If true, wait for mongod to write this data to disk before returning. Defaults to false. In most cases, use the j option to ensure durability and consistency of the data set.
wtimeout (integer) - Optional. Specify a value in milliseconds to control how long to wait for write propagation to complete. If replication does not complete in the given timeframe, the getLastError command will return with an error status.
See Also: Replica Set Write Concern (page 50) and db.getLastError() (page 467).
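The advice to pair w with wtimeout can be enforced when the command document is built. The following is a minimal sketch; getLastErrorCommand is a hypothetical helper and not part of the shell or any driver.

```javascript
// Sketch: build a getLastError command document. Refuses a `w` value
// without `wtimeout`, since that combination may block indefinitely.
function getLastErrorCommand(options = {}) {
  const { w, wtimeout, j, fsync } = options;
  if (w !== undefined && wtimeout === undefined) {
    throw new Error("specify wtimeout (milliseconds) whenever w is set");
  }
  const cmd = { getLastError: 1 };
  if (w !== undefined) cmd.w = w;                      // a number or "majority"
  if (wtimeout !== undefined) cmd.wtimeout = wtimeout; // milliseconds
  if (j !== undefined) cmd.j = j;                      // wait for journal commit
  if (fsync !== undefined) cmd.fsync = fsync;          // wait for flush to disk
  return cmd;
}

console.log(JSON.stringify(getLastErrorCommand({ w: "majority", wtimeout: 5000 })));
// {"getLastError":1,"w":"majority","wtimeout":5000}
```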

30.2.45 getLog
getLog The getLog command returns a document with a log array that contains recent messages from the mongod process log. The getLog command has the following syntax:
{ getLog: <log> }

Replace <log> with one of the following values:
global - returns the combined output of all recent log entries.
rs - if the mongod is part of a replica set, getLog will return recent notices related to replica set activity.
startupWarnings - will return logs that may contain errors or warnings from MongoDB's log from when the current process started. If mongod started without warnings, this filter may return an empty array.
You may also specify an asterisk (e.g. *) as the <log> value to return a list of available log filters. The following interaction comes from a mongo shell connected to a replica set:
db.adminCommand({getLog: "*" })
{ "names" : [ "global", "rs", "startupWarnings" ], "ok" : 1 }

getLog returns events from a RAM cache of the mongod events and does not read log data from the log file.

30.2.46 getParameter
getParameter getParameter is an administrative command for retrieving the value of options normally set on the command line. Issue commands against the admin database as follows:
{ getParameter: 1, <option>: 1 }

The values specified for getParameter and <option> do not affect the output. The command works with the following options:
quiet
notablescan
logLevel
syncdelay
See Also: setParameter for more about these parameters.

30.2.47 getPrevError
getPrevError The getPrevError command returns the errors since the last resetError command. See Also: db.getPrevError() (page 468)

30.2.48 getShardMap (internal)


getShardMap getShardMap is an internal command that supports the sharding functionality.

30.2.49 getShardVersion (internal)


getShardVersion getShardVersion (page 473) is an internal command that supports sharding functionality.

30.2.50 getnonce (internal)


getnonce Client libraries use getnonce to generate a one-time password for authentication.

30.2.51 getoptime (internal)


getoptime getoptime is an internal command.

30.2.52 godinsert (internal)


godinsert godinsert is an internal command for testing purposes only.

30.2.53 group
group The group command returns an array of grouped items. group provides functionality analogous to the GROUP BY statement in SQL. Consider the following example:
db.users.runCommand( { group: { key: { school_id: true }, cond: { active: 1 }, reduce: function(obj, prev) { prev.total += 1; }, initial: { total: 0 } } } );

More typically, in the mongo shell, you will call the group command using the group() method. Consider the following form:
db.users.group( { key: { school_id: true }, cond: { active: 1 }, reduce: function(obj, prev) { prev.total += 1; }, initial: { total: 0 } } );

In these examples group runs against the collection users and counts the total number of active users from each school. Fields allowed by the group command include:
Fields
key (document) - Specify one or more fields to group by. Use the form of a document.
reduce - Specify a reduce function that operates over all the iterated objects. Typically these aggregator functions perform some sort of summing or counting. The reduce function takes two arguments: the current document and an aggregation counter object.
initial - The starting value of the aggregation counter object.
keyf - Optional. A function that returns a key object for use as the grouping key. Use keyf instead of key to specify a key that is not a single existing field or a set of existing fields. For example, use keyf to group by day or week in place of a fixed key.
cond - Optional. A statement that must evaluate to true for db.collection.group() (page 459) to process this document. Essentially this argument specifies a query document (as for db.collection.find() (page 457)). Unless specified, db.collection.group() (page 459) runs the reduce function against all documents in the collection.
finalize - Optional. A function that runs on each item in the result set before db.collection.group() (page 459) returns the final value. This function can either modify the document, for example by computing and adding an average field, or compute and return a new document.
Note: The result set of db.collection.group() (page 459) must fit within the maximum BSON document size (page 573). Furthermore, you must ensure that there are fewer than 10,000 unique keys. If you have more than this, use mapReduce.
Warning: group() does not work in sharded environments. Use the aggregation framework or map-reduce (i.e. mapReduce) in sharded environments.
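How key, cond, reduce, initial, and finalize fit together can be sketched with a simplified in-memory model. This is not MongoDB's implementation: groupModel is hypothetical, key is given as an array of field names, cond is a predicate standing in for the query document, and the sample documents are invented.

```javascript
// Simplified in-memory model of the group command; not MongoDB's code.
// reduce(doc, acc) receives the current document and the per-group
// aggregation counter object, and updates the counter in place.
function groupModel(docs, { key, cond = () => true, reduce, initial, finalize }) {
  const groups = new Map();
  for (const doc of docs) {
    if (!cond(doc)) continue;
    const keyValues = key.map((field) => doc[field]);
    const id = JSON.stringify(keyValues);
    if (!groups.has(id)) {
      // Each group starts with its key fields plus a shallow copy of `initial`.
      const acc = {};
      key.forEach((field, i) => { acc[field] = keyValues[i]; });
      groups.set(id, Object.assign(acc, initial));
    }
    reduce(doc, groups.get(id));
  }
  const results = Array.from(groups.values());
  // finalize may modify the item in place or return a replacement.
  return finalize ? results.map((acc) => finalize(acc) || acc) : results;
}

const users = [
  { school_id: 1, active: 1 },
  { school_id: 1, active: 0 },
  { school_id: 2, active: 1 },
  { school_id: 1, active: 1 }
];
const totals = groupModel(users, {
  key: ["school_id"],
  cond: (doc) => doc.active === 1,
  reduce: (doc, acc) => { acc.total += 1; },
  initial: { total: 0 }
});
console.log(JSON.stringify(totals));
// [{"school_id":1,"total":2},{"school_id":2,"total":1}]
```

The reduce callback increments the counter object, not the current document, which is why each group's total accumulates across documents.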

30.2.54 handshake (internal)


handshake handshake is an internal command.

30.2.55 isMaster
isMaster The isMaster command provides a basic overview of the current replication configuration. MongoDB drivers and clients use this command to determine what kind of member they're connected to and to discover additional members of a replica set. The db.isMaster() (page 469) method provides a wrapper around this database command. The command takes the following form:
{ isMaster: 1 }

This command returns a document containing the following fields:
isMaster.setName - The name of the current replica set, if applicable.
isMaster.ismaster - A boolean value that reports when this node is writable. If true, then the current node is either a primary node in a replica set, a master node in a master-slave configuration, or a standalone mongod.
isMaster.secondary - A boolean value that, when true, indicates that the current node is a secondary member of a replica set.
isMaster.hosts - An array of strings in the format of [hostname]:[port] listing all nodes in the replica set that are not hidden.
isMaster.primary - The [hostname]:[port] for the current replica set primary, if applicable.
isMaster.me - The [hostname]:[port] of the node responding to this command.
isMaster.maxBsonObjectSize - The maximum permitted size of a BSON object in bytes for this mongod process. If not provided, clients should assume a max size of 4 * 1024 * 1024.
isMaster.localTime - New in version 2.1.1. Returns the local server time in UTC. This value is an ISODate. You can use the toString() JavaScript method to convert this value to a local date string, as in the following example:
db.isMaster().localTime.toString();

30.2.56 isSelf (internal)


_isSelf _isSelf is an internal command.

30.2.57 isdbGrid
isdbGrid Use this command to determine if the process is a mongos or a mongod. Consider the following command prototype:
{ isdbGrid: 1 }

If connected to a mongos, the response document resembles the following:


{ "isdbgrid" : 1, "hostname" : "app.example.net", "ok" : 1 }

You can also use the isMaster command, which when connected to a mongos, contains the string isdbgrid in the msg field of its output document.
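A client can combine these fields to decide what kind of process it is talking to. The following is a minimal sketch: describeMember is a hypothetical helper, and it assumes that the reply reports the replica set name in a field called setName and that a mongos reports isdbgrid in msg, as described above.

```javascript
// Sketch: classify a server from an isMaster response document.
const DEFAULT_MAX_BSON_BYTES = 4 * 1024 * 1024; // fallback when not reported

function describeMember(reply) {
  let role;
  if (reply.msg === "isdbgrid") {
    role = "mongos";
  } else if (reply.secondary) {
    role = "secondary";
  } else if (reply.ismaster) {
    // Writable: a replica set primary, a master, or a standalone mongod.
    role = reply.setName ? "primary" : "master or standalone";
  } else {
    role = "other";
  }
  return {
    role: role,
    maxBsonObjectSize: reply.maxBsonObjectSize || DEFAULT_MAX_BSON_BYTES
  };
}

console.log(describeMember({ ismaster: true, setName: "rs0", ok: 1 }).role); // primary
```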

30.2.58 journalLatencyTest
journalLatencyTest journalLatencyTest (page 430) is an admin command that tests the length of time required to write and perform a file system sync (e.g. fsync) for a file in the journal directory. The command syntax is:
{ journalLatencyTest: 1 }

The value (i.e. 1 above), does not affect the operation of the command.

30.2.59 listCommands
listCommands The listCommands command generates a list of all database commands implemented for the current mongod instance.

30.2.60 listDatabases
listDatabases The listDatabases command provides a list of existing databases along with basic statistics about them:
{ listDatabases: 1 }

The value (e.g. 1) does not affect the output of the command. listDatabases returns a document for each database. Each document contains a name field with the database name, a sizeOnDisk field with the total size of the database file on disk in bytes, and an empty field specifying whether the database has any data.

30.2.61 listShards
listShards Use the listShards command to return a list of configured shards. The command takes the following form:
{ listShards: 1 }

30.2.62 logRotate
logRotate logRotate is an admin only command that allows you to rotate the MongoDB logs to prevent a single logfile from consuming too much disk space. Use the following syntax:
{ logRotate: 1 }

Note: Your mongod instance needs to be running with the --logpath [file] (page 486) option.
You may also rotate the logs by sending a SIGUSR1 signal to the mongod process. If your mongod has a process ID of 2200, here's how to send the signal on Linux:
kill -SIGUSR1 2200

The rotated files will have a timestamp appended to the filename.
Note: The logRotate command is not available to mongod instances running on Windows systems.

30.2.63 logout
logout The logout command terminates the current authenticated session:
{ logout: 1 }

Note: If you're not logged in and using authentication, this command will have no effect.

30.2.64 mapReduce
mapReduce The mapReduce command allows you to run map-reduce-style aggregations over a collection.
Options
map - A JavaScript function that performs the map step of the map-reduce operation. This function references the current input document and calls the emit(key,value) method that supplies values to the reduce function. Map functions may call emit() once, more than once, or not at all, depending on the type of aggregation.
reduce - A JavaScript function that performs the reduce step of the map-reduce operation. The reduce function receives an array of emitted values from the map function, and returns a single value. Because it's possible to invoke the reduce function more than once for the same key, the structure of the object returned by the reduce function must be identical to the structure of the emitted value.
out - Specifies the location of the output of the reduce stage of the operation. Specify a string to write the output of the map-reduce job to a collection with that name. The map-reduce operation will replace the content of the specified collection in the current database by default. See below for additional options.
query - Optional. A query object, like the query used by the db.collection.find() (page 457) method. Use this to filter the documents that enter the map phase of the aggregation.
sort - Optional. Sorts the input objects using this key. This option is useful for optimizing the job. Common uses include sorting by the emit key so that there are fewer reduces.
limit - Optional. Specifies a maximum number of objects to return from the collection.
finalize - Optional. Specifies a finalize function to run on a result, following the reduce stage, to modify or control the output of the mapReduce operation.
scope - Optional. Place a document as the contents of this field to place fields into the global JavaScript scope.
jsMode (Boolean) - Optional. The jsMode option defaults to false.
verbose (Boolean) - Optional. The verbose option provides statistics on job execution times.
mapReduce requires only the map and reduce options; all other fields are optional. You must write all map and reduce functions in JavaScript.
The out field of mapReduce provides a number of additional configuration options that you may use to control how MongoDB returns data from the map-reduce job. Consider the following 4 output possibilities.
Arguments
replace - Optional. Specify a collection name (e.g. { out: { replace: collectionName } }) where the output of the map-reduce overwrites the contents of the collection specified (i.e. collectionName) if there is any data in that collection. This is the default behavior if you only specify a collection name in the out field.
merge - Optional. Specify a collection name (e.g. { out: { merge: collectionName } }) where the map-reduce operation writes output to an existing collection (i.e. collectionName), and only overwrites existing documents when a new document has the same key as an old document in this collection.
reduce - Optional. This operation behaves as the merge option above, except that when an existing document has the same key as a new document, the reduce function from the map-reduce job runs on both values and MongoDB writes the result of this function to the output collection. The specification takes the form of { out: { reduce: collectionName } }, where collectionName is the name of the results collection.
inline - Optional. Indicate the inline option (i.e. { out: { inline: 1 } }) to perform the map-reduce job in RAM and return the results at the end of the function. This option is only possible when the entire result set will fit within the maximum size of a BSON document (page 573). When performing map-reduce jobs on secondary members of replica sets, this is the only available option.
db - Optional. The name of the database to which you want the map-reduce operation to write its output. By default this will be the same database as the input collection.
sharded - Optional. If true, and the output mode writes to a collection, and the output database has sharding enabled, the map-reduce operation will shard the results collection according to the _id field.
See Also: mapReduce() and map-reduce. Also, the MapReduce page provides a greater overview of MongoDB's map-reduce functionality. Consider the Simple application support for basic aggregation operations and Aggregation Framework (page 199) for a more flexible approach to data aggregation in MongoDB.
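The requirement that reduce output has the same structure as an emitted value is easier to see in a simplified in-memory model. This is not MongoDB's implementation: mapReduceModel is hypothetical, emit is passed to map as an argument instead of being a global, and the field names cust and amount are invented. The model folds values in pairwise, so every reduce result is fed back into reduce as an input.

```javascript
// Simplified in-memory model of map-reduce; not MongoDB's implementation.
// map is called with the document as `this` and an emit(key, value) callback.
function mapReduceModel(docs, map, reduce, finalize) {
  const emitted = new Map();
  const emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!emitted.has(id)) emitted.set(id, { key: key, values: [] });
    emitted.get(id).values.push(value);
  };
  docs.forEach((doc) => map.call(doc, emit));

  return Array.from(emitted.values()).map(({ key, values }) => {
    // Re-reduce on purpose: each partial result goes back into reduce,
    // which only works if reduce returns the same shape it receives.
    let value = values[0];
    for (let i = 1; i < values.length; i++) {
      value = reduce(key, [value, values[i]]);
    }
    if (finalize) value = finalize(key, value);
    return { _id: key, value: value };
  });
}

const orders = [
  { cust: "a", amount: 5 },
  { cust: "b", amount: 7 },
  { cust: "a", amount: 10 }
];
const out = mapReduceModel(
  orders,
  function (emit) { emit(this.cust, this.amount); },
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
console.log(JSON.stringify(out));
// [{"_id":"a","value":15},{"_id":"b","value":7}]
```

Key b has a single emitted value, so the model never calls reduce for it.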

30.2.65 mapreduce.shardedfinish (internal)


mapreduce.shardedfinish Provides internal functionality to support map-reduce in sharded environments. See Also: mapReduce

30.2.66 medianKey (internal)


medianKey medianKey is an internal command.

30.2.67 migrateClone (internal)


_migrateClone _migrateClone is an internal command. Do not call directly.


30.2.68 moveChunk
moveChunk moveChunk is an internal administrative command that moves chunks between shards. The command has the following prototype form:
db.runCommand( { moveChunk : <namespace> , find : <query> , to : <destination>, <options> } )

Arguments
moveChunk (command) The name of the collection in which the chunk exists. Specify the collection's full namespace, including the database name.
find A query expression that will select a document within the chunk you wish to move. The query need not specify the shard key.
to The identifier of the shard that you want to migrate the chunk to.
_secondaryThrottle Optional. Set to false by default. Provides write concern (page 50) support for chunk migrations. If you set _secondaryThrottle to true, during chunk migrations when a shard is hosted by a replica set, the mongod will wait until the secondary members replicate the migration operations before continuing to migrate chunk data. You may also configure _secondaryThrottle in the balancer configuration.

Use the sh.moveChunk() (page 482) helper in the mongo shell to migrate chunks manually. The chunk migration (page 121) section describes how chunks move between shards on MongoDB. moveChunk will return the following if another cursor is using the chunk you are moving:
errmsg: "The collection's metadata lock is already taken."

These errors usually occur when there are too many open cursors accessing the chunk you are migrating. You can either wait until the cursors complete their operation or close the cursors manually.

Note: Only use moveChunk in special circumstances, such as preparing your shard cluster for an initial ingestion of data or a large bulk import operation. See Create Chunks (Pre-Splitting) (page 105) for more information.
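As a sketch of the prototype form above, the hypothetical helper below assembles a moveChunk command document. The helper name buildMoveChunk, the records.people namespace, the zipcode field, and the shard name shard0001 are illustrative assumptions. In the mongo shell, the resulting document could be passed to db.runCommand() as in the prototype.

```javascript
// Hypothetical helper (not part of the mongo shell): builds the command
// document described by the moveChunk prototype form.
function buildMoveChunk(namespace, query, toShard, secondaryThrottle) {
  // The namespace must be the full "<database>.<collection>" name.
  if (typeof namespace !== "string" || namespace.indexOf(".") === -1) {
    throw new Error("namespace must include the database name");
  }
  var cmd = { moveChunk: namespace, find: query, to: toShard };
  // _secondaryThrottle is optional, so include it only when the caller sets it.
  if (secondaryThrottle !== undefined) {
    cmd._secondaryThrottle = secondaryThrottle;
  }
  return cmd;
}

// Illustrative values only; substitute your own namespace, query, and shard.
var moveCmd = buildMoveChunk("records.people", { zipcode: 53187 }, "shard0001", true);
```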

30.2.69 movePrimary
movePrimary In a shard cluster, this command reassigns the database's primary shard. The primary shard for a database holds all un-sharded collections in the database. movePrimary is an administrative command that is only available for mongos instances. Only use movePrimary when removing a shard from a shard cluster. movePrimary changes the primary shard for this database in the cluster metadata, and migrates all un-sharded collections to the specified shard. Use the command with the following form:
{ moveprimary : "test", to : "shard0001" }

When the command returns, the database's primary location will shift to the designated shard. To fully decommission a shard, use the removeshard command.


Warning: Before running movePrimary you must ensure that no sharded data exists on this shard. You must drain this shard before running this command because it will move all data in this database from this shard. Use the removeshard command to migrate sharded data from this shard. If you do not remove all sharded data collections before running movePrimary your shard cluster may have orphaned and unreachable data. See Also: Remove Shards from an Existing Shard Cluster (page 127)

30.2.70 netstat (internal)


netstat netstat is an internal command that is only available on mongos instances.

30.2.71 ping
ping The ping command is a no-op used to test whether a server is responding to commands. This command will return immediately even if the server is write-locked:
{ ping: 1 }

The value (e.g. 1 above) does not impact the behavior of the command.

30.2.72 printShardingStatus
printShardingStatus Returns data regarding the status of a shard cluster and includes information regarding the distribution of chunks. printShardingStatus is only available when connected to a shard cluster via a mongos. Typically, you will use the sh.status() (page 484) mongo shell wrapper to access this data.

30.2.73 profile
profile Use the profile (page 499) command to enable, disable, or change the query profiling level. This allows administrators to capture data regarding performance. The database profiling system can impact performance and can allow the server to write the contents of queries to the log, which might have information security implications for your deployment. Consider the following prototype syntax:
{ profile: <level> }

The following profiling levels are available:

Level   Setting
0       Off. No profiling.
1       On. Only includes slow operations.
2       On. Includes all operations.

You may optionally set a threshold in milliseconds for profiling using the slowms option, as follows:
{ profile: 1, slowms: 200 }


mongod writes the output of the database profiler to the system.profile collection. mongod records queries that take longer than the slowms (page 500) threshold to the log even when the database profiler is not active.

See Also: Additional documentation regarding database profiling: Database Profiling (page 150).

See Also: db.getProfilingStatus() (page 468) and db.setProfilingLevel() (page 471) provide wrappers around this functionality in the mongo shell.

Note: The database cannot be locked with db.fsyncLock() (page 467) while profiling is enabled. You must disable profiling before locking the database with db.fsyncLock() (page 467). Disable profiling using db.setProfilingLevel() (page 471) as follows in the mongo shell:
db.setProfilingLevel(0)
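The interaction between the profiling level and the slowms threshold can be sketched with two small functions. Both buildProfileCommand and wouldProfile are hypothetical helpers written for illustration. The model assumes that "slow" means strictly longer than slowms, as the description of the log behavior suggests.

```javascript
// Hypothetical helper: builds a profile command document for a given level.
function buildProfileCommand(level, slowms) {
  if (level !== 0 && level !== 1 && level !== 2) {
    throw new Error("profiling level must be 0, 1, or 2");
  }
  var cmd = { profile: level };
  // slowms is optional; add it only when the caller supplies a threshold.
  if (slowms !== undefined) {
    cmd.slowms = slowms;
  }
  return cmd;
}

// Hypothetical model of the level table: would an operation that took
// `opMillis` milliseconds be written to system.profile?
function wouldProfile(level, slowms, opMillis) {
  if (level === 2) { return true; }               // all operations
  if (level === 1) { return opMillis > slowms; }  // slow operations only
  return false;                                   // level 0: profiling off
}

var profileCmd = buildProfileCommand(1, 200);
var fastOp = wouldProfile(1, 200, 150);
var slowOp = wouldProfile(1, 200, 350);
```

In the mongo shell, a document such as profileCmd could be passed to db.runCommand().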

30.2.74 reIndex
reIndex The reIndex command rebuilds all indexes for a specified collection. Use the following syntax:
{ reIndex: "collection" }

Normally, MongoDB compacts indexes during routine updates. For most users, the reIndex command is unnecessary. However, it may be worth running if the collection size has changed significantly or if the indexes are consuming a disproportionate amount of disk space. Note that the reIndex command will block the server against writes and may take a long time for large collections. Call reIndex using the following form:
db.collection.reIndex();

30.2.75 recvChunkAbort (internal)


_recvChunkAbort _recvChunkAbort is an internal command. Do not call directly.

30.2.76 recvChunkCommit (internal)


_recvChunkCommit _recvChunkCommit is an internal command. Do not call directly.

30.2.77 recvChunkStart (internal)


_recvChunkStart _recvChunkStart is an internal command. Do not call directly.


30.2.78 recvChunkStatus (internal)


_recvChunkStatus _recvChunkStatus is an internal command. Do not call directly.

30.2.79 removeShard
removeShard Starts the process of removing a shard from a shard cluster. This is a multi-stage process. Begin by issuing the following command:
{ removeShard : "[shardName]" }

The balancer will then begin migrating chunks from the shard specified by [shardName]. This process happens slowly to avoid placing undue load on the overall cluster. The command returns immediately, with the following message:
{ msg : "draining started successfully" , state: "started" , shard: "shardName" , ok : 1 }

If you run the command again, you'll see the following progress output:
{ msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 23 , dbs: 1 }, ok: 1 }

The remaining document specifies how many chunks and databases remain on the shard. Use printShardingStatus to list the databases that you must move from the shard.

Each database in a shard cluster has a primary shard. If the shard you want to remove is also the primary of one of the cluster's databases, then you must manually move the database to a new shard. This can be done only after the shard is empty. See the movePrimary command for details. After removing all chunks and databases from the shard, you may issue the command again, to return:
{ msg: "remove shard completed successfully", stage: "completed", host: "shardName", ok : 1 }
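Because removeShard is issued repeatedly until draining completes, the process lends itself to a small polling loop. The sketch below is an assumption-laden illustration: drainShard is a hypothetical helper, and fakeRunCommand is a stub that replays the three sample replies shown above so the example runs without a cluster. In the mongo shell, the runCommand argument could instead wrap db.runCommand(), and a real loop would pause between attempts.

```javascript
// Hypothetical helper: issues removeShard until the reply reports completion
// or maxAttempts is reached. `runCommand` is any function that takes a command
// document and returns the server's reply document.
function drainShard(runCommand, shardName, maxAttempts) {
  var messages = [];
  for (var i = 0; i < maxAttempts; i++) {
    var reply = runCommand({ removeShard: shardName });
    messages.push(reply.msg);
    // The sample replies use `state` while draining and `stage` on completion,
    // so this sketch checks both keys.
    if ((reply.state || reply.stage) === "completed") {
      return { done: true, messages: messages };
    }
  }
  return { done: false, messages: messages };
}

// Stub standing in for a real cluster: replays the sample replies in order.
var cannedReplies = [
  { msg: "draining started successfully", state: "started", shard: "shardName", ok: 1 },
  { msg: "draining ongoing", state: "ongoing", remaining: { chunks: 23, dbs: 1 }, ok: 1 },
  { msg: "remove shard completed successfully", stage: "completed", host: "shardName", ok: 1 }
];
var callCount = 0;
function fakeRunCommand(cmd) {
  var reply = cannedReplies[Math.min(callCount, cannedReplies.length - 1)];
  callCount += 1;
  return reply;
}

var outcome = drainShard(fakeRunCommand, "shardName", 5);
```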

30.2.80 renameCollection
renameCollection The renameCollection command changes the name of an existing collection. Use the following form to rename the collection named things to events:
{ renameCollection: "store.things", to: "store.events" }

You must run this command against the admin database, and you must therefore specify the complete namespace (i.e., database name and collection name).

Note: renameCollection will not succeed if you have existing data, or indexes, in the new collection.

The shell helper renameCollection() provides a simpler interface for this functionality. The following is equivalent to the previous example:
db.things.renameCollection( "events" )

renameCollection operates by changing the metadata associated with a given collection. The duration of the command is constant and not related to the size of the collection or the indexes; however, the operation will invalidate open cursors which will interrupt queries that are currently returning data.


You may safely use renameCollection in production environments. Warning: You cannot use renameCollection with sharded collections.

30.2.81 repairDatabase
repairDatabase Warning: In general, if you have an intact copy of your data, such as would exist on a very recent backup or an intact member of a replica set, do not use repairDatabase or related options like db.repairDatabase() (page 471) in the mongo shell or mongod --repair (page 488). Restore from an intact copy of your data.

Note: When using journaling, there is almost never any need to run repairDatabase. In the event of an unclean shutdown, the server will be able to restore the data files to a pristine state automatically.

The repairDatabase command checks and repairs errors and inconsistencies with the data storage. The command is analogous to a fsck command for file systems. If your mongod instance is not running with journaling, the system experiences an unexpected system restart or crash, and you have no other intact replica set members with this data, you should run the repairDatabase command to ensure that there are no errors in the data storage. As a side effect, the repairDatabase command will compact the database, as the compact command does, and also reduces the total size of the data files on disk. The repairDatabase command will also recreate all indexes in the database. Use the following syntax:
{ repairDatabase: 1 }

Be aware that this command can take a long time to run if your database is large. In addition, it requires a quantity of free disk space equal to the size of your database. If you lack sufficient free space on the same volume, you can mount a separate volume and use that for the repair. In this case, you must run the command line and use the --repairpath (page 489) switch to specify the folder in which to store the temporary repair files.

This command is accessible via a number of different avenues. You may:
- Use the shell to run the above command, as above.
- Use the db.repairDatabase() (page 471) in the mongo shell.
- Run mongod directly from your system's shell. Make sure that mongod isn't already running, and that you issue this command as a user that has access to MongoDB's data files. Run as:
$ mongod --repair

To add a repair path:


$ mongod --repair --repairpath /opt/vol2/data

Note: This command will fail if your database is not a master or primary. In most cases, you should recover a corrupt secondary using the data from an existing intact node. If you must repair a secondary or slave node, first restart the node as a standalone mongod by omitting the --replSet (page 490) or --slave (page 490) options, as necessary.


30.2.82 replSetElect (internal)


replSetElect replSetElect is an internal command that supports replica set functionality.

30.2.83 replSetFreeze
replSetFreeze The replSetFreeze command prevents a replica set member from seeking election for the specified number of seconds. Use this command in conjunction with the replSetStepDown (page 441) command to make a different node in the replica set a primary. The replSetFreeze command uses the following syntax:
{ replSetFreeze: <seconds> }

If you want to unfreeze a replica set member before the specified number of seconds has elapsed, you can issue the command with a seconds value of 0:
{ replSetFreeze: 0 }

Restarting the mongod process also unfreezes a replica set member. replSetFreeze is an administrative command, and you must issue it against the admin database.

30.2.84 replSetFresh (internal)


replSetFresh replSetFresh is an internal command that supports replica set functionality.

30.2.85 replSetGetRBID (internal)


replSetGetRBID replSetGetRBID is an internal command that supports replica set functionality.

30.2.86 replSetGetStatus
replSetGetStatus The replSetGetStatus command returns the status of the replica set from the point of view of the current server. You must run the command against the admin database. The command has the following prototype format:
{ replSetGetStatus: 1 }

However, you can also run this command from the shell like so:
rs.status()

See Also: Replica Status Reference (page 558) and Replication Fundamentals (page 33)


30.2.87 replSetHeartbeat (internal)


replSetHeartbeat replSetHeartbeat is an internal command that supports replica set functionality.

30.2.88 replSetInitiate
replSetInitiate The replSetInitiate command initializes a new replica set. Use the following syntax:
{ replSetInitiate : <config_document> }

The <config_document> is a document that specifies the replica set's configuration. For instance, here's a config document for creating a simple 3-member replica set:
{ _id : <setname>, members : [ {_id : 0, host : <host0>}, {_id : 1, host : <host1>}, {_id : 2, host : <host2>}, ] }

A typical way of running this command is to assign the config document to a variable and then to pass the document to the rs.initiate() (page 477) helper:
config = { _id : "my_replica_set", members : [ {_id : 0, host : "rs1.example.net:27017"}, {_id : 1, host : "rs2.example.net:27017"}, {_id : 2, host : "rs3.example.net", arbiterOnly: true}, ] }
rs.initiate(config)

Notice that omitting the port causes the host to use the default port of 27017. Notice also that you can specify other options in the config documents such as the arbiterOnly setting in this example.
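The default-port behavior can be made explicit before a configuration document is submitted. The following is a minimal sketch: withDefaultPorts is a hypothetical helper, not a shell method, and it assumes plain hostnames that contain no colon other than the port separator.

```javascript
// Hypothetical helper: returns a copy of a replica set config document in
// which every member host carries an explicit port. Hosts without a port get
// the default port of 27017. Assumes hostnames contain no other colons.
function withDefaultPorts(config) {
  var members = [];
  for (var i = 0; i < config.members.length; i++) {
    var source = config.members[i];
    var copy = {};
    // Copy every option (e.g. arbiterOnly) so nothing is lost.
    for (var key in source) {
      if (Object.prototype.hasOwnProperty.call(source, key)) {
        copy[key] = source[key];
      }
    }
    if (copy.host.indexOf(":") === -1) {
      copy.host = copy.host + ":27017";
    }
    members.push(copy);
  }
  return { _id: config._id, members: members };
}

var rsConfig = {
  _id: "my_replica_set",
  members: [
    { _id: 0, host: "rs1.example.net:27017" },
    { _id: 1, host: "rs2.example.net:27017" },
    { _id: 2, host: "rs3.example.net", arbiterOnly: true }
  ]
};
var explicitConfig = withDefaultPorts(rsConfig);
```

In the mongo shell, a document such as explicitConfig could then be passed to rs.initiate().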

See Also: Replica Set Conguration (page 561), Replica Set Administration (page 38), and Replica Set Reconguration (page 564).

30.2.89 replSetReconfig
replSetReconfig The replSetReconfig command modifies the configuration of an existing replica set. You can use this command to add and remove members, and to alter the options set on existing members. Use the following syntax:
{ replSetReconfig: <new_config_document>, force: false }


You may also run the command using the shell's rs.reconfig() (page 478) method. Be aware of the following replSetReconfig behaviors:
- You must issue this command against the admin database of the current primary member of the replica set.
- You can optionally force the replica set to accept the new configuration by specifying force: true. Use this option if the current member is not primary or if a majority of the members of the set are not accessible.
  Warning: Forcing the replSetReconfig command can lead to a rollback situation. Use with caution.
  Use the force option to restore a replica set to new servers with different hostnames. This works even if the set members already have a copy of the data.
- A majority of the set's members must be operational for the changes to propagate properly.
- This command can cause downtime as the set renegotiates primary status. Typically this is 10-20 seconds, but could be as long as a minute or more. Therefore, you should attempt to reconfigure only during scheduled maintenance periods.
- In some cases, replSetReconfig forces the current primary to step down, initiating an election for primary among the members of the replica set. When this happens, the set will drop all current connections.

30.2.90 replSetStepDown
replSetStepDown
Options
force (boolean) Forces the primary to step down even if there aren't any secondary members within 10 seconds of the primary's latest optime. This option is not available in versions of mongod before 2.0.

The replSetStepDown (page 441) command forces the primary of the replica set to relinquish its status as primary. This initiates an election for primary (page 58). You may specify a number of seconds for the node to avoid election to primary:
{ replSetStepDown: <seconds> }

If you do not specify a value for <seconds>, replSetStepDown will attempt to avoid reelection to primary for 60 seconds.

Warning: This will force all clients currently connected to the database to disconnect. This helps to ensure that clients maintain an accurate view of the replica set.

New in version 2.0: If there is no secondary within 10 seconds of the primary, replSetStepDown (page 441) will not succeed, in order to prevent long-running elections.

30.2.91 replSetSyncFrom
replSetSyncFrom New in version 2.2.
Options
host Specifies the name and port number of the set member that you want this member to sync from. Use the [hostname]:[port] form.


replSetSyncFrom allows you to explicitly configure which host the current mongod will pull data from. This operation may be useful for testing different patterns and in situations where a set member is not syncing from the host you want. You may not use this command to force a member to sync from:
- itself.
- an arbiter.
- a member that does not build indexes.
- an unreachable member.
- a mongod instance that is not a member of the same replica set.

If you attempt to sync from a member that is more than 10 seconds behind the current member, mongod will return and log a warning, but will sync from such members. The command has the following prototype form:
{ replSetSyncFrom: "[hostname]:[port]" }

To run the command in the mongo shell, use the following invocation:
db.adminCommand( { replSetSyncFrom: "[hostname]:[port]" } )

You may also use the rs.syncFrom() (page 479) helper in the mongo shell, in an operation with the following form:
rs.syncFrom("[hostname]:[port]")

30.2.92 replSetTest (internal)


replSetTest replSetTest is an internal diagnostic command used for regression tests that supports replica set functionality.

30.2.93 resetError
resetError The resetError command resets the last error status. See Also: db.resetError() (page 471)

30.2.94 resync
resync The resync command forces an out-of-date slave mongod instance to re-synchronize itself. Note that this command is relevant to master-slave replication only. It does not apply to replica sets.

30.2.95 serverStatus
serverStatus The serverStatus command returns a document that provides an overview of the database process's state. Most monitoring applications run this command at a regular interval to collect statistics about the instance:


{ serverStatus: 1 }

The value (i.e. 1 above) does not affect the operation of the command. See Also: db.serverStatus() (page 471) and Server Status Reference (page 537)

30.2.96 setParameter
setParameter setParameter is an administrative command for modifying options normally set on the command line. You must issue the setParameter command against the admin database in the form:
{ setParameter: 1, <option>: <value> }

Replace the <option> with one of the following options supported by this command:

Options
journalCommitInterval (integer) Specify an integer between 1 and 500 specifying the number of milliseconds (ms) between journal commits.
logLevel (integer) Specify an integer between 0 and 5 signifying the verbosity of the logging, where larger is more verbose.
notablescan (boolean) If true, queries that do not use an index will fail.
traceExceptions (boolean) If true, mongod will log full stack traces on assertions or errors. This parameter is only available in version 2.1 and later.
quiet (boolean) Enables a quiet logging mode when true. Use false to disable. Quiet logging removes the following messages from the log: connection events; the drop, dropIndexes, diagLogging, validate, and clean; and replication synchronization activity.
syncdelay (integer) Specify the interval, in seconds, between fsyncs (i.e., flushes of memory to disk). By default, mongod will flush memory to disk every 60 seconds. Do not change this value unless you see a background flush average greater than 60 seconds.
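The value constraints listed above can be checked before a command is sent. This is a sketch under stated assumptions: buildSetParameter and isInteger are hypothetical helpers, and the requirement that syncdelay be a non-negative integer is an assumption, since the text only calls it an interval in seconds.

```javascript
// Hypothetical helper: true for finite whole numbers.
function isInteger(value) {
  return typeof value === "number" && isFinite(value) && Math.floor(value) === value;
}

// Hypothetical helper: validates an option against the documented constraints
// and returns the setParameter command document.
function buildSetParameter(option, value) {
  var ranges = { journalCommitInterval: [1, 500], logLevel: [0, 5] };
  var booleans = ["notablescan", "traceExceptions", "quiet"];
  if (Object.prototype.hasOwnProperty.call(ranges, option)) {
    var range = ranges[option];
    if (!isInteger(value) || value < range[0] || value > range[1]) {
      throw new Error(option + " must be an integer between " + range[0] + " and " + range[1]);
    }
  } else if (booleans.indexOf(option) !== -1) {
    if (typeof value !== "boolean") {
      throw new Error(option + " must be true or false");
    }
  } else if (option === "syncdelay") {
    // Assumption: a non-negative whole number of seconds.
    if (!isInteger(value) || value < 0) {
      throw new Error("syncdelay must be a non-negative integer");
    }
  } else {
    throw new Error("unsupported option: " + option);
  }
  var cmd = { setParameter: 1 };
  cmd[option] = value;
  return cmd;
}

var logCmd = buildSetParameter("logLevel", 2);
var scanCmd = buildSetParameter("notablescan", true);
```

In the mongo shell, the resulting document would be issued against the admin database, as the text requires.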

30.2.97 setShardVersion
setShardVersion setShardVersion is an internal command that supports sharding functionality.

30.2.98 shardCollection
shardCollection The shardCollection command marks a collection for sharding and will allow data to begin distributing among shards. You must run enableSharding on a database before running the shardCollection command.
{ shardcollection: "<db>.<collection>", key: "<shardkey>" }

This enables sharding for the collection specified by <collection> in the database named <db>, using the key <shardkey> to distribute documents among the shards.


Choosing the right shard key to effectively distribute load among your shards requires some planning.

See Also: Sharding (page 91) for more information related to sharding. Also consider the section on Shard Keys (page 96) for documentation regarding shard keys.

Warning: There's no easy way to disable sharding once you've enabled it. In addition, shard keys are immutable. If you must revert a shard cluster to a single node or replica set, you'll have to make a single backup of the entire cluster and then restore the backup to the standalone mongod.

30.2.99 shardingState
shardingState The shardingState command returns true if the mongod instance is a member of a shard cluster. Run the command using the following syntax:
{ shardingState: 1 }

30.2.100 shutdown
shutdown The shutdown command cleans up all database resources and then terminates the process. The command has the following form:
{ shutdown: 1 }

Note: Run the shutdown against the admin database. When using shutdown, the connection must originate from localhost or use an authenticated connection.

If the node you're trying to shut down is a replica set (page 33) primary, then the command will succeed only if there exists a secondary node whose oplog data is within 10 seconds of the primary. You can override this protection using the force option:
{ shutdown: 1, force: true }

Alternatively, the shutdown command also supports a timeoutSecs argument which allows you to specify a number of seconds to wait for other members of the replica set to catch up:
{ shutdown: 1, timeoutSecs: 60 }

The equivalent mongo shell helper syntax looks like this:


db.shutdownServer({timeoutSecs: 60});

30.2.101 skewClockCommand (internal)


_skewClockCommand _skewClockCommand is an internal command. Do not call directly.


30.2.102 sleep (internal)


sleep sleep is an internal command for testing purposes. The sleep command forces the database to block all operations. It takes the following options:
{ sleep: { w: true, secs: <seconds> } }

The above command places the mongod instance in a write-lock state for a specified (i.e. <seconds>) number of seconds. Without arguments, sleep causes a read lock for 100 seconds.

30.2.103 split
split The split command creates new chunks in a sharded environment. While splitting is typically managed automatically by the mongos instances, this command makes it possible for administrators to manually create splits. Note: In normal operation there is no need to manually split chunks. Consider the following example:
db.runCommand( { split : "test.people" , find : { _id : 99 } } )

This command inserts a new split in the collection named people in the test database. This will split the chunk that contains the document that matches the query { _id : 99 } in half. If the document specified by the query does not (yet) exist, the split will divide the chunk where that document would exist. The split divides the chunk in half, and does not split the chunk using the identified document as the middle. To define an arbitrary split point, use the following form:
db.runCommand( { split : "test.people" , middle : { _id : 99 } } )

This form is typically used when pre-splitting data in a collection. split is an administrative command that is only available for mongos instances.
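The difference between the find and middle forms can be modelled on a sorted list of shard key values. This is a simplified illustration only: splitByFind and splitAtMiddle are hypothetical functions, the key values are made up, and the way mongod actually chooses a median is not modelled here.

```javascript
// Hypothetical model: `keys` is the sorted list of shard key values held by
// the chunk that contains (or would contain) the queried document.

// find form: the query only selects the chunk; the chunk is then halved.
function splitByFind(keys) {
  var half = Math.floor(keys.length / 2);
  return [keys.slice(0, half), keys.slice(half)];
}

// middle form: the given value itself becomes the split point, i.e. the
// lower bound of the second chunk.
function splitAtMiddle(keys, middle) {
  var lower = keys.filter(function (k) { return k < middle; });
  var upper = keys.filter(function (k) { return k >= middle; });
  return [lower, upper];
}

// Made-up _id values for one chunk of test.people.
var chunkKeys = [1, 5, 20, 42, 60, 99, 100, 250];
var halves = splitByFind(chunkKeys);
var atPoint = splitAtMiddle(chunkKeys, 99);
```

With these values, the find form separates the keys between 42 and 60, while the middle form places 99 at the start of the second chunk.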

30.2.104 splitChunk
splitChunk splitChunk is an internal command. Use the sh.splitFind() (page 483) and sh.splitAt() (page 483) functions in the mongo shell to access this functionality.

30.2.105 testDistLockWithSkew (internal)


_testDistLockWithSkew _testDistLockWithSkew is an internal command. Do not call directly.

30.2.106 testDistLockWithSyncCluster (internal)


_testDistLockWithSyncCluster _testDistLockWithSyncCluster is an internal command. Do not call directly.


30.2.107 top
top The top command returns raw usage of each database, and provides the amount of time, in microseconds, used and a count of operations for the following event types:
- total
- readLock
- writeLock
- queries
- getmore
- insert
- update
- remove
- commands

The command takes the following form:
{ top: 1 }

buildInfo The buildInfo command returns a build summary for the current mongod:
{ buildInfo: 1 }

The information provided includes the following:
- The version of MongoDB currently running.
- The information about the system that built the mongod binary, including a timestamp for the build.
- The architecture of the binary (i.e. 64 or 32 bits).
- The maximum allowable BSON object size in bytes (in the field maxBsonObjectSize).

You must issue the buildInfo command against the admin database.

30.2.108 touch
touch New in version 2.2. The touch command loads data from the data storage layer into memory. touch can load the data (i.e. documents), indexes, or both documents and indexes. Use this command to ensure that a collection, and/or its indexes, are in memory before another operation. By loading the collection or indexes into memory, mongod will ideally be able to perform subsequent operations more efficiently. The touch command has the following prototypical form:
{ touch: [collection], data: [boolean], index: [boolean] }

By default, data and index are false, and touch will perform no operation. For example, to load both the data and the index for a collection named records, you would use the following command in the mongo shell:


db.runCommand({ touch: "records", data: true, index: true })

touch will not block read and write operations on a mongod, and can run on secondary members of replica sets. Note: Using touch to control or tweak what a mongod stores in memory may displace other records data in memory and hinder performance. Use with caution in production systems.

30.2.109 transferMods (internal)


_transferMods _transferMods is an internal command. Do not call directly.

30.2.110 unsetSharding (internal)


unsetSharding unsetSharding is an internal command that supports sharding functionality.

30.2.111 validate
validate The validate command checks the contents of a namespace by scanning a collection's data and indexes for correctness. The command can be slow, particularly on larger data sets:
{ validate: "users" }

This command will validate the contents of the collection named users. You may also specify one of the following options:
full: true provides a more thorough scan of the data.
scandata: false skips the scan of the base collection without skipping the scan of the index.

The mongo shell also provides a wrapper:


db.collection.validate();

Use one of the following forms to perform the full collection validation:
db.collection.validate(true) db.runCommand( { validate: "collection", full: true } )

Warning: This command is resource intensive and may have an impact on the performance of your MongoDB instance.

30.2.112 whatsmyuri (internal)


whatsmyuri whatsmyuri is an internal command.


30.2.113 writebacklisten (internal)


writebacklisten writebacklisten is an internal command.

30.2.114 writeBacksQueued (internal)


writeBacksQueued writeBacksQueued (page 549) is an internal command that returns true if there are operations in the write back queue for the given mongos. This command applies to shard clusters only.

30.3 JavaScript Methods


30.3.1 Date()
Date() Returns Current date, as a string.

30.3.2 cat()
cat(filename)
Arguments
filename (string) Specify a path and file name on the local file system.

Returns the contents of the specified file. This function returns with output relative to the current shell session, and does not impact the server.

30.3.3 cd()
cd(path)
Arguments
file (string) Specify a path on the local file system.

Changes the current working directory to the specified path. This function returns with output relative to the current shell session, and does not impact the server.

Note: This feature is not yet implemented.

30.3.4 clearRawMongoProgramOutput()
clearRawMongoProgramOutput() For internal use.


30.3.5 copyDbpath()
copyDbpath() For internal use.

30.3.6 cursor.count()
cursor.count() Arguments override (boolean) Override the effects of the cursor.skip() (page 452) and cursor.limit() (page 450) methods on the cursor. Append the count() (page 449) method on a find() (page 457) query to return the number of matching objects for any query. In normal operation, cursor.count() (page 449) ignores the effects of the cursor.skip() (page 452) and cursor.limit() (page 450). To consider these effects specify count(true) (page 449). See Also: cursor.size() (page 451).
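The effect of the override argument can be modelled with simple arithmetic. The function below is a hypothetical illustration, not shell code. It takes the number of documents that match the query and treats a limit of 0 as "no limit", following the description of cursor.limit() in this reference.

```javascript
// Hypothetical model of cursor.count(): `matching` is the number of documents
// that match the query; `skip` and `limit` are the values set on the cursor.
function modelCount(matching, skip, limit, applySkipLimit) {
  if (!applySkipLimit) {
    // count(): skip and limit are ignored.
    return matching;
  }
  // count(true): documents left after skipping, capped by the limit.
  var remaining = Math.max(matching - skip, 0);
  return limit > 0 ? Math.min(remaining, limit) : remaining;
}

// Models db.users.find().skip(10).limit(5) on 25 matching documents.
var plainCount = modelCount(25, 10, 5, false);   // as count()
var overrideCount = modelCount(25, 10, 5, true); // as count(true)
```

Under these assumptions, plainCount reflects every matching document while overrideCount reflects only the documents the cursor would actually return.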

30.3.7 cursor.explain()
cursor.explain() Returns A document that describes the process used to return the query. This method may provide useful insight when attempting to optimize a query. When you call the explain (page 449) on a query, the query system reevaluates the query plan. As a result, these operations only provide a realistic account of how MongoDB would perform the query, and not how long the query would take. See Also: $explain (page 377) for related functionality and the Optimization wiki page for information regarding optimization strategies.

30.3.8 cursor.forEach()
cursor.forEach(function) Arguments function function to apply to each document visited by the cursor. Provides the ability to loop or iterate over the cursor returned by a db.collection.find() (page 457) query and returns each result on the shell. Specify a JavaScript function as the argument for the cursor.forEach() (page 449) function. Consider the following example:
db.users.find().forEach( function(u) { print("user: " + u.name); } );

See Also: cursor.map() (page 450) for similar functionality.


30.3.9 cursor.hasNext()
cursor.hasNext() Returns Boolean. cursor.hasNext() (page 450) returns true if the cursor returned by the db.collection.find() (page 457) query can iterate further to return more documents.

30.3.10 cursor.hint()
cursor.hint(index)
Arguments
index The specification for the index to hint or force MongoDB to use when performing the query.

Call this method on a query to override MongoDB's default index selection and query optimization process. The argument is an index specification, like the argument to ensureIndex() (page 455). Use db.collection.getIndexes() (page 458) to return the list of current indexes on a collection.

See Also: $hint (page 379)

30.3.11 cursor.limit()
cursor.limit() Use the cursor.limit() (page 450) method on a cursor to specify the maximum number of documents the cursor will return. cursor.limit() (page 450) is analogous to the LIMIT statement in a SQL database. Note: You must apply cursor.limit() (page 450) to the cursor before retrieving any documents from the database. Use cursor.limit() (page 450) to maximize performance and prevent MongoDB from returning more results than required for processing. A cursor.limit() (page 450) value of 0 (e.g. .limit(0) (page 450)) is equivalent to setting no limit.

30.3.12 cursor.map()
cursor.map(function) Arguments function function to apply to each document visited by the cursor. Apply function to each document visited by the cursor, and collect the return values from successive application into an array. Consider the following example:
db.users.find().map( function(u) { return u.name; } );

See Also: cursor.forEach() (page 449) for similar functionality.


30.3.13 cursor.next()
cursor.next() Returns The next document in the cursor returned by the db.collection.find() (page 457) method. See cursor.hasNext() (page 450) for related functionality.

30.3.14 cursor.readPref()
cursor.readPref() Arguments mode (string) Read preference mode. tagSet (array) Optional. Array of tag set objects. Append readPref() (page 451) to a cursor to control how the client routes the query to members of the replica set. The mode string should be one of: primary (page 52), primaryPreferred (page 52), secondary (page 53), secondaryPreferred (page 53), or nearest (page 53). The tagSet parameter, if given, should consist of an array of tag set objects for filtering secondary read operations. For example, a secondary member tagged { dc: ny, rack: 2, size: large } will match the tag set { dc: ny, rack: 2 }. Clients try tag sets in the order they appear in the read preference specification and use the first one that matches. You may specify an empty tag set {} as the last element to default to any available secondary. See the tag sets documentation for more information. Note: You must apply cursor.readPref() (page 451) to the cursor before retrieving any documents from the database.
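The tag set matching rule can be sketched in plain JavaScript. This is an in-memory illustration only, not driver or shell code: the helpers matchesTagSet and selectMembers, the host names, and the member documents are all hypothetical.

```javascript
// Hypothetical in-memory model of replica set members and their tags.
const members = [
  { host: "a.example.net", tags: { dc: "ny", rack: 2, size: "large" } },
  { host: "b.example.net", tags: { dc: "sf", rack: 1 } },
];

// A member matches a tag set when it carries every key/value pair in the set.
// The empty tag set {} therefore matches every member.
function matchesTagSet(memberTags, tagSet) {
  return Object.keys(tagSet).every((key) => memberTags[key] === tagSet[key]);
}

// Try the tag sets in order and return the members that match the first
// tag set that matches anything at all.
function selectMembers(candidates, tagSets) {
  for (const tagSet of tagSets) {
    const matched = candidates.filter((m) => matchesTagSet(m.tags, tagSet));
    if (matched.length > 0) return matched;
  }
  return [];
}

const preferred = selectMembers(members, [{ dc: "ny", rack: 2 }, {}]);
const fallback = selectMembers(members, [{ dc: "tokyo" }, {}]);
console.log(preferred.map((m) => m.host));
console.log(fallback.map((m) => m.host));
```

In this sketch the first call stops at the { dc: "ny", rack: 2 } tag set, while the second falls through to the trailing empty tag set and accepts any member.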

30.3.15 cursor.showDiskLoc()
cursor.showDiskLoc() Returns A modified cursor object that contains documents with appended information that describes the on-disk location of the document. See Also: $showDiskLoc (page 390) for related functionality.

30.3.16 cursor.size()
cursor.size()


Returns A count of the number of documents that match the db.collection.find() (page 457) query after applying any cursor.skip() (page 452) and cursor.limit() (page 450) methods.
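The difference between cursor.count() and cursor.size() comes down to simple arithmetic. The following is a minimal sketch, assuming a hypothetical helper named cursorSize that takes the number of matching documents together with the skip and limit values; it does not query a database.

```javascript
// Number of documents a cursor would return after skip and limit are applied.
// A limit of 0 is treated as "no limit".
function cursorSize(matching, skip, limit) {
  const afterSkip = Math.max(matching - skip, 0);
  return limit === 0 ? afterSkip : Math.min(afterSkip, limit);
}

const matching = 100; // what count() would report without the override
console.log(cursorSize(matching, 10, 20));
console.log(cursorSize(matching, 90, 20));
console.log(cursorSize(matching, 10, 0));
```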

30.3.17 cursor.skip()
cursor.skip() Call the cursor.skip() (page 452) method on a cursor to control where MongoDB begins returning results. This approach may be useful in implementing paged results. Note: You must apply cursor.skip() (page 452) to the cursor before retrieving any documents from the database. Consider the following JavaScript function as an example of the skip function:

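function printStudents(pageNumber, nPerPage) {
    print("Page: " + pageNumber);
    // Skip the documents on earlier pages, then visit one page of results.
    db.students.find().skip((pageNumber-1)*nPerPage).limit(nPerPage).forEach( function(student) {
        printjson(student);  // assumed body: print each document
    } );
}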

The cursor.skip() (page 452) method is often expensive because it requires the server to walk from the beginning of the collection or index to reach the offset or skip position before beginning to return results. As the offset (e.g. pageNumber above) increases, cursor.skip() (page 452) will become slower and more CPU intensive. With larger collections, cursor.skip() (page 452) may become IO bound. Consider using range-based pagination for these kinds of tasks. That is, query for a range of objects, using logic within the application to determine the pagination rather than the database itself. This approach makes better use of indexes, provided you do not need to jump directly to a specific page.
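Range-based pagination can be sketched as follows. This is a minimal in-memory illustration, assuming documents ordered by a numeric _id; the students array and the nextPage helper are hypothetical, and the shell query in the comment is only an approximation of how an application might phrase the same request.

```javascript
// Hypothetical in-memory stand-in for a collection ordered by _id.
const students = [];
for (let i = 1; i <= 10; i++) students.push({ _id: i, name: "student" + i });

// Return the page that follows lastId instead of skipping an offset.
// In the mongo shell the equivalent query would look roughly like
//   db.students.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(nPerPage)
function nextPage(docs, lastId, nPerPage) {
  const page = [];
  for (const doc of docs) {
    if (doc._id > lastId) page.push(doc);
    if (page.length === nPerPage) break;
  }
  return page;
}

let lastId = 0; // assumes every _id is greater than 0
const pages = [];
for (;;) {
  const page = nextPage(students, lastId, 4);
  if (page.length === 0) break;
  pages.push(page.map((d) => d._id));
  lastId = page[page.length - 1]._id; // remember where this page ended
}
console.log(pages);
```

The application keeps only the last _id it has seen, so each request can start from an indexed position rather than walking past every earlier document.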

30.3.18 cursor.snapshot()
cursor.snapshot() Append the cursor.snapshot() (page 452) method to a cursor to toggle the snapshot mode. This ensures that the query will not miss any documents and return no duplicates, even if other operations modify objects while the query runs. Note: You must apply cursor.snapshot() (page 452) to the cursor before retrieving any documents from the database. Queries with results of less than 1 megabyte are effectively implicitly snapshotted.

30.3.19 cursor.sort()
cursor.sort(sort) Arguments sort A document whose fields specify the attributes on which to sort the result set. Append the sort() (page 452) method to a cursor to control the order in which the query returns matching documents. For each field in the sort document, if the field's corresponding value is positive, then sort() (page 452) returns query results in ascending order for that attribute; if the field's corresponding value is negative, then sort() (page 452) returns query results in descending order.


Note: You must apply cursor.sort() (page 452) to the cursor before retrieving any documents from the database. Consider the following example:
db.collection.find().sort( { age: -1 } );

Here, the query returns all documents in collection sorted by the age field in descending order. Specify a value of negative one (e.g. -1), as above, to sort in descending order or a positive value (e.g. 1) to sort in ascending order. Unless you have an index for the specified key pattern, use cursor.sort() (page 452) in conjunction with cursor.limit() (page 450) to avoid requiring MongoDB to perform a large, in-memory sort. cursor.limit() (page 450) increases the speed and reduces the amount of memory required to return this query by way of an optimized algorithm. Warning: The sort function requires that the entire sort be able to complete within 32 megabytes. When the sort operation consumes more than 32 megabytes, MongoDB will return an error. Use cursor.limit() (page 450), or create an index on the field that you're sorting to avoid this error.
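To see why a limit can reduce the memory a sort needs, consider the following sketch of a bounded sort. It is only one possible approach, written against an in-memory array with a hypothetical helper named topK; the source does not describe the algorithm MongoDB itself uses.

```javascript
// Keep only the `limit` best documents while scanning, so memory use is
// bounded by the limit rather than by the size of the result set.
// Assumes the sort field holds numbers; direction is 1 or -1.
function topK(docs, field, direction, limit) {
  const compare = (a, b) => direction * (a[field] - b[field]);
  const best = [];
  for (const doc of docs) {
    best.push(doc);
    best.sort(compare);
    if (best.length > limit) best.pop(); // discard the current worst
  }
  return best;
}

const people = [{ age: 31 }, { age: 54 }, { age: 18 }, { age: 47 }, { age: 25 }];
const oldest = topK(people, "age", -1, 2);
const youngest = topK(people, "age", 1, 2);
console.log(oldest, youngest);
```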

30.3.20 db.addUser()
db.addUser(username, password[, readOnly ]) Arguments username (string) Specifies a new username. password (string) Specifies the corresponding password. readOnly (boolean) Optional. Restrict a user to read-privileges only. Defaults to false. Use this function to create new database users, by specifying a username and password as arguments to the command. If you want to restrict the user to have only read-only privileges, supply a true third argument; however, this defaults to false.

30.3.21 db.auth()
db.auth(username, password) Arguments username (string) Specifies an existing username with access privileges for this database. password (string) Specifies the corresponding password. Allows a user to authenticate to the database from within the shell. Alternatively use mongo --username (page 504) and --password (page 504) to specify authentication credentials.

30.3.22 db.cloneDatabase()
db.cloneDatabase(hostname) Arguments hostname (string) Specifies the hostname of the instance to copy from. Use this function to copy a database from a remote host to the current database. The command assumes that the remote database has the same name as the current database. For example, to clone a database named importdb on a host named hostname, do
use importdb
db.cloneDatabase("hostname");

New databases are implicitly created, so the current host does not need to have a database named importdb for this command to succeed. This function provides a wrapper around the MongoDB database command clone. The copydb database command provides related functionality.

30.3.23 db.collection.aggregate()
db.collection.aggregate(pipeline) New in version 2.1.0. Always call the db.collection.aggregate() (page 454) method on a collection object. Arguments pipeline Specifies a sequence of data aggregation processes. See the aggregation reference (page 209) for documentation of these operators. Consider the following example from the aggregation documentation (page 199).
db.article.aggregate(
    { $project : { author : 1, tags : 1 } },
    { $unwind : "$tags" },
    { $group : { _id : { tags : 1 }, authors : { $addToSet : "$author" } } }
);

See Also: aggregate, Aggregation Framework (page 199), and Aggregation Framework Reference (page 209).
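The effect of the $unwind and $group stages in the example pipeline can be approximated in plain JavaScript. This is a rough in-memory sketch with hypothetical sample documents, intended only to show what each stage contributes; it is not how the server evaluates a pipeline.

```javascript
// Hypothetical sample documents standing in for the article collection.
const articles = [
  { author: "bob", tags: ["fun", "good"] },
  { author: "ann", tags: ["fun"] },
  { author: "bob", tags: ["fun"] },
];

// $project and $unwind: keep author and tags, then emit one document per
// element of the tags array.
const unwound = [];
for (const { author, tags } of articles) {
  for (const tag of tags) unwound.push({ author, tags: tag });
}

// $group with $addToSet: collect the distinct authors for each tag.
const groups = new Map();
for (const doc of unwound) {
  if (!groups.has(doc.tags)) groups.set(doc.tags, new Set());
  groups.get(doc.tags).add(doc.author);
}

const result = [...groups].map(([tag, authors]) => ({
  _id: { tags: tag },
  authors: [...authors],
}));
console.log(JSON.stringify(result));
```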

30.3.24 db.collection.dataSize()
db.collection.dataSize() Returns The size of the collection. This method provides a wrapper around the size (page 553) output of the collStats (i.e. db.collection.stats() (page 463)) command.

30.3.25 db.collection.distinct()
db.collection.distinct(field) Arguments field (string) A field that exists in a document or documents within the collection.


Returns an array that contains a list of the distinct values for the specified field. Note: The db.collection.distinct() (page 454) method provides a wrapper around the distinct database command. Results must not be larger than the maximum BSON size (page 573).

30.3.26 db.collection.drop()
db.collection.drop() Call the db.collection.drop() (page 455) method on a collection to drop it from the database. db.collection.drop() (page 455) takes no arguments and will produce an error if called with any arguments.

30.3.27 db.collection.dropIndex()
db.collection.dropIndex(name) Arguments name The name of the index to drop. Drops or removes the specified index. This method provides a wrapper around the dropIndexes database command. Use db.collection.getIndexes() (page 458) to get a list of the indexes on the current collection, and only call db.collection.dropIndex() (page 455) as a method on a collection object.

30.3.28 db.collection.dropIndexes()
db.collection.dropIndexes() Drops all indexes other than the required index on the _id field. Only call dropIndexes() as a method on a collection object.

30.3.29 db.collection.ensureIndex()
db.collection.ensureIndex(keys, options) Arguments keys (document) A document that contains pairs with the name of the field or fields to index and the order of the index. A 1 specifies ascending and a -1 specifies descending. options (document) A document that controls the creation of the index. This argument is optional. Warning: Index names, including their full namespace (i.e. database.collection), can be no longer than 128 characters. See the name field in the output of db.collection.getIndexes() (page 458) for the names of existing indexes. Creates an index on the field specified, if that index does not already exist. If the keys document specifies more than one field, then db.collection.ensureIndex() (page 455) creates a compound index. For example:


db.collection.ensureIndex({ [key]: 1})

This command creates an index, in ascending order, on the field [key]. To specify a compound index use the following form:
db.collection.ensureIndex({ [key]: 1, [key1]: -1 })

This command creates a compound index on the key field (in ascending order) and the key1 field (in descending order). Note: Typically the order of an index is only important when doing cursor.sort() (page 452) operations on the indexed fields. The available options, possible values, and the default settings are as follows:

Option               Value           Default
background           true or false   false
unique               true or false   false
dropDups             true or false   false
sparse               true or false   false
expireAfterSeconds   integer         none
v                    index version   1 [1]

Options
background (Boolean) Specify true to build the index in the background so that building an index will not block other database activities.
unique (Boolean) Specify true to create a unique index so that the collection will not accept insertion of documents where the index key or keys match an existing value in the index.
dropDups (Boolean) Specify true when creating a unique index on a field that may have duplicates, to index only the first occurrence of a key and ignore subsequent occurrences of that key.
sparse (Boolean) If true, the index only references documents with the specified field. These indexes use less space, but behave differently in some situations (particularly sorts).
expireAfterSeconds (integer) Specify a value, in seconds, as a TTL to control how long MongoDB will retain documents in this collection. See Expire Data from Collections by Setting TTL (page 234) for more information on this functionality.
v Only specify a different index version in unusual situations. The latest index version (version 1) provides a smaller and faster index format.

[1] The default index version depends on the version of mongod running when creating the index. Before version 2.0, this value was 0; versions 2.0 and later use version 1.


30.3.30 db.collection.find()
db.collection.find(query, projection) Arguments query (document) A document that specifies the query using the JSON-like syntax and query operators. projection (document) Optional. A document that controls the projection, or the contents of the data returned. Returns A cursor whose iteration visits all of the documents that match the query document. Queries for documents matching query. The argument to find() (page 457) takes the form of a document. See the operator reference for an overview of the available operators for specifying and narrowing the query.

30.3.31 db.collection.findAndModify()
db.collection.findAndModify() The db.collection.findAndModify() (page 457) method atomically modifies and returns a single document. Always call db.collection.findAndModify() (page 457) on a collection object, using the following form:
db.collection.findAndModify();

Replace collection with the name of the collection containing the document that you want to modify, and specify options as a sub-document that specifies the following:
Fields
query (document) A query object. This statement might resemble the document passed to db.collection.find() (page 457), and should return one document from the database.
sort Optional. If the query selects multiple documents, the first document given by this sort clause will be the one modified.
remove Optional. When true, findAndModify removes the selected document. The default is false.
update An update operator to modify the selected document.
new Optional. When true, returns the modified document rather than the original. findAndModify ignores the new option for remove operations. The default is false.
fields Optional. A subset of fields to return. See projection operators for more information.
upsert Optional. When true, findAndModify creates a new document if the specified query returns no documents. The default is false.
For example:
db.people.findAndModify( { query: { name: "Tom", state: "active", rating: { $gt: 10 } }, sort: { rating: 1 }, update: { $inc: { score: 1 } } } );


This operation finds a document in the people collection where the name field has the value Tom, the state field has the value active, and the rating field has a value greater than 10, and increments the value of the score field by 1. If there is more than one result for this query, MongoDB sorts the results of the query in ascending order by rating, and updates and returns the first matching document found. Warning: When using findAndModify in a sharded environment, the query must contain the shard key for all operations against the shard cluster. findAndModify operations issued against mongos instances for non-sharded collections function normally.
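The find, sort, and modify sequence of the example operation above can be walked through in plain JavaScript. This is a sketch against a hypothetical in-memory array, with a hypothetical helper named findAndIncrement whose returnNew parameter plays the role of the new option; it does not model atomicity, upserts, or removal.

```javascript
// Hypothetical in-memory stand-in for the people collection.
const people = [
  { name: "Tom", state: "active", rating: 25, score: 5 },
  { name: "Tom", state: "active", rating: 15, score: 1 },
  { name: "Tom", state: "inactive", rating: 50, score: 9 },
];

// Match, sort ascending by rating, increment the score of the first match,
// and return either the original or the modified document.
function findAndIncrement(docs, returnNew) {
  const matches = docs
    .filter((d) => d.name === "Tom" && d.state === "active" && d.rating > 10)
    .sort((a, b) => a.rating - b.rating);
  if (matches.length === 0) return null;
  const target = matches[0];
  const original = { ...target }; // copy taken before the update
  target.score += 1; // the $inc: { score: 1 } step
  return returnNew ? { ...target } : original;
}

const returned = findAndIncrement(people, false);
console.log(returned.score, people[1].score);
```

With returnNew set to false, the caller sees the document as it was before the increment, while the stored document already carries the new score.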

30.3.32 db.collection.findOne()
db.collection.findOne(query) Arguments query (document) Optional. A document that specifies the query using the JSON-like syntax and query operators. Returns One document that satisfies the query specified as the argument to this method. Returns only one document that satisfies the specified query. If multiple documents satisfy the query, this method returns the first document according to the natural order, which reflects the order of documents on disk. In capped collections, natural order is the same as insertion order.

30.3.33 db.collection.getIndexes()
db.collection.getIndexes() Returns an array that holds a list of documents that identify and describe the existing indexes on the collection. You must call the db.collection.getIndexes() (page 458) on a collection. For example:
db.collection.getIndexes()

Change collection to the name of the collection whose indexes you want to learn. The db.collection.getIndexes() (page 458) items consist of the following fields:
getIndexes.v Holds the version of the index. The index version depends on the version of mongod that created the index. Before version 2.0 of MongoDB, this value was 0; versions 2.0 and later use version 1.
getIndexes.key Contains a document holding the keys held in the index, and the order of the index. Indexes may be in either descending or ascending order. A value of negative one (e.g. -1) indicates an index sorted in descending order while a positive value (e.g. 1) indicates an index sorted in ascending order.
getIndexes.ns The namespace context for the index.
getIndexes.name A unique name for the index composed of the field names and orders of all keys.


30.3.34 db.collection.group()
db.collection.group({key, reduce, initial, [keyf,] [cond,] [finalize]}) The db.collection.group() (page 459) method accepts a single document that contains the following:
Fields
key Specifies one or more fields to group by.
reduce Specifies a reduce function that operates over all the iterated objects. Typically these aggregator functions perform some sort of summing or counting. The reduce function takes two arguments: the current document and an aggregation counter object.
initial The starting value of the aggregation counter object.
keyf Optional. A function that returns a key object for use as the grouping key. Use keyf instead of key to specify a key that is not a single existing field or a set of existing fields. For example, use keyf to group by day or week in place of a fixed key.
cond Optional. A statement that must evaluate to true for db.collection.group() (page 459) to process this document. Simply, this argument specifies a query document (as for db.collection.find() (page 457)). Unless specified, db.collection.group() (page 459) runs the reduce function against all documents in the collection.
finalize Optional. A function that runs on each item in the result set before db.collection.group() (page 459) returns the final value. This function can either modify the document by computing and adding an average field, or compute and return a new document.
Warning: db.collection.group() (page 459) does not work in sharded environments. Use the aggregation framework or map-reduce in sharded environments.

Note: The result set of the db.collection.group() (page 459) must fit within a single BSON object. Furthermore, you must ensure that there are fewer than 10,000 unique keys. If you have more than this, use mapReduce. db.collection.group() (page 459) provides a simple aggregation capability similar to the function of GROUP BY in SQL statements. Use db.collection.group() (page 459) to return counts and averages from collections of MongoDB documents. Consider the following example db.collection.group() (page 459) command:
db.collection.group( {key: { a:true, b:true }, cond: { active: 1 }, reduce: function(obj,prev) { prev.csum += obj.c; }, initial: { csum: 0 } });

This command in the mongo shell groups the documents in the collection where the field active equals 1 into sets for all combinations of values of the a and b fields. Then, within each group, the reduce function adds up each document's c field into the csum field in the aggregation counter document. This is equivalent to the following SQL statement.
SELECT a,b,sum(c) csum FROM collection WHERE active=1 GROUP BY a,b
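The grouping logic of the example can also be traced in plain JavaScript. This is a minimal in-memory sketch: the group helper and the sample documents are hypothetical, and cond is written as a predicate function here rather than as a query document.

```javascript
// Hypothetical sample documents.
const docs = [
  { a: 1, b: 1, c: 10, active: 1 },
  { a: 1, b: 1, c: 5, active: 1 },
  { a: 1, b: 2, c: 7, active: 1 },
  { a: 1, b: 2, c: 100, active: 0 },
];

// Build one aggregation counter per distinct key and run reduce on it
// for every document that passes cond.
function group({ key, cond, reduce, initial }, input) {
  const groups = new Map();
  for (const obj of input) {
    if (!cond(obj)) continue;
    const keyDoc = {};
    for (const field of Object.keys(key)) keyDoc[field] = obj[field];
    const id = JSON.stringify(keyDoc);
    if (!groups.has(id)) groups.set(id, { ...keyDoc, ...initial });
    reduce(obj, groups.get(id));
  }
  return [...groups.values()];
}

const result = group(
  {
    key: { a: true, b: true },
    cond: (obj) => obj.active === 1,
    reduce: function (obj, prev) { prev.csum += obj.c; },
    initial: { csum: 0 },
  },
  docs
);
console.log(result);
```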

See Also:


The Aggregation wiki page and Aggregation Framework (page 199).

30.3.35 db.collection.insert()
db.collection.insert(document) Arguments document Specify a document to save to the collection. documents (array) Optional alternate. After version 2.2, if you pass an array to insert() (page 460), mongo will perform a bulk insert operation and insert all documents into the collection. Inserts the document, or documents, into a collection. If you do not specify a value for the _id field, then MongoDB will create a new ObjectId for this document before inserting.

30.3.36 db.collection.mapReduce()
db.collection.mapReduce(map, reduce, out[, query ][, sort ][, limit ][, finalize ][, scope ][, jsMode ][, verbose ]) The db.collection.mapReduce() (page 460) method provides a wrapper around the mapReduce database command. Always call the db.collection.mapReduce() (page 460) method on a collection. The following argument list specifies a document with 3 required and 7 optional fields:
Arguments
map A JavaScript function that performs the map step of the MapReduce operation. This function references the current input document and calls the emit(key,value) method to supply the value argument to the reduce function, grouped by the key argument. Map functions may call emit() once, more than once, or not at all depending on the type of aggregation.
reduce A JavaScript function that performs the reduce step of the MapReduce operation. The reduce function receives a key value and an array of emitted values from the map function, and returns a single value. Because it's possible to invoke the reduce function more than once for the same key, the structure of the object returned by the reduce function must be identical to the structure of the emitted value.
out Specifies the location of the output of the reduce stage of the operation. Specify a string to write the output of the map-reduce job to a collection with that name. The map-reduce operation will replace the content of the specified collection in the current database by default. See below for additional options.
query (document) Optional. A query object, like the query used by the db.collection.find() (page 457) method. Use this to specify which documents should enter the map phase of the aggregation.
sort Optional. Sorts the input objects using this key. This option is useful for optimizing the job. Common uses include sorting by the emit key so that there are fewer reduces.
limit Optional. Specifies a maximum number of objects to return from the collection.
finalize Optional. Specifies an optional finalize function to run on a result, following the reduce stage, to modify or control the output of the db.collection.mapReduce() (page 460) operation.
scope Optional. Place a document as the contents of this field, to place fields into the global JavaScript scope for the execution of the map-reduce command.
jsMode (Boolean) Optional. Specifies whether to convert intermediate data into BSON format between the mapping and reducing steps. If false, map-reduce execution internally converts the values emitted during the map function from JavaScript objects into BSON objects, and so must convert those BSON objects into JavaScript objects when calling the reduce function. When this argument is false, db.collection.mapReduce() (page 460) places the BSON objects used for intermediate values in temporary, on-disk storage, allowing the map-reduce job to execute over arbitrarily large data sets. If true, map-reduce execution retains the values emitted by the map function and returned as JavaScript objects, and so does not need to do extra conversion work to call the reduce function. When this argument is true, the map-reduce job can execute faster, but can only work for result sets with fewer than 500K distinct key arguments to the mapper's emit function. The jsMode option defaults to false.
verbose (Boolean) Optional. The verbose option provides statistics on job execution times.
The out field of db.collection.mapReduce() (page 460) provides a number of additional configuration options that you may use to control how MongoDB returns data from the map-reduce job. Consider the following 4 output possibilities.
Arguments
replace Optional. Specify a collection name (e.g. { out: { replace: collectionName } }) where the output of the map-reduce overwrites the contents of the collection specified (i.e. collectionName) if there is any data in that collection. This is the default behavior if you only specify a collection name in the out field.
merge Optional. Specify a collection name (e.g. { out: { merge: collectionName } }) where the map-reduce operation writes output to an existing collection (i.e. collectionName), and only overwrites existing documents in the collection when a new document has the same key as a document that existed before the map-reduce operation began.
reduce Optional. This operation behaves like the merge option above, except that when an existing document has the same key as a new document, the reduce function from the map-reduce job will run on both values and MongoDB will write the result of this function to the new collection. The specification takes the form of { out: { reduce: collectionName } }, where collectionName is the name of the results collection.
inline Optional. Indicate the inline option (i.e. { out: { inline: 1 } }) to perform the map-reduce job in memory and return the results at the end of the function. This option is only possible when the entire result set will fit within the maximum size of a BSON document (page 573). When performing map-reduce jobs on secondary members of replica sets, this is the only available out option.
db Optional. The name of the database to which you want the map-reduce operation to write its output. By default this will be the same database as the input collection.
sharded Optional. If true, and the output mode writes to a collection, and the output database has sharding enabled, the map-reduce operation will shard the results collection according to the _id field.
See Also: map-reduce, which provides a broader overview of MongoDB's map-reduce functionality.


Also consider Aggregation Framework (page 199) for a more flexible approach to data aggregation in MongoDB, and the Aggregation wiki page for an overview of aggregation in MongoDB.
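The requirement that a reduce function return a value shaped like the emitted values is easier to see with a small simulation. The following is an in-memory sketch only: the orders data, the batchSize parameter, and the mapReduce helper are hypothetical, and emit is passed to map explicitly instead of being a global as it is in the shell.

```javascript
// Hypothetical input: order amounts to total per customer.
const orders = [
  { cust: "a", amount: 5 },
  { cust: "b", amount: 7 },
  { cust: "a", amount: 3 },
  { cust: "a", amount: 2 },
];

// The map step calls emit(key, value) once per input document.
const map = (doc, emit) => emit(doc.cust, { total: doc.amount });

// reduce returns a value shaped like the emitted values, so its own output
// can be fed back in when it runs more than once for the same key.
const reduce = (key, values) => ({
  total: values.reduce((sum, v) => sum + v.total, 0),
});

function mapReduce(docs, batchSize) {
  const pending = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      const values = pending.get(key) || [];
      values.push(value);
      // Reduce early once a batch fills up, so later calls to reduce
      // receive an earlier reduce result among their inputs.
      pending.set(key, values.length >= batchSize ? [reduce(key, values)] : values);
    });
  }
  const out = {};
  for (const [key, values] of pending) {
    out[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return out;
}

const batched = mapReduce(orders, 2);
const single = mapReduce(orders, 100);
console.log(batched, single);
```

Because the reduce output has the same structure as an emitted value, the totals come out the same whether reduce runs once per key or several times on partial results.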

30.3.37 db.collection.reIndex()
db.collection.reIndex() This method drops all indexes and recreates them. This operation may be expensive for collections that have a large amount of data and/or a large number of indexes. Call this method, which takes no arguments, on a collection object. For example:
db.collection.reIndex()

Change collection to the name of the collection whose indexes you want to rebuild.

30.3.38 db.collection.remove()
db.collection.remove(query, justOne) Call the db.collection.remove() (page 462) method on a collection object, to remove documents from a collection. Use the following form:
db.collection.remove()

Where collection is the name of the collection from which you want to remove documents. Without arguments, this method removes all documents in the collection. To control the operation of db.collection.remove() (page 462): Arguments query Optional. Specify a query object to limit or filter the documents to remove. See db.collection.find() (page 457) and the operator reference for more information. justOne (Boolean) Optional. Specify true to only delete the first result. Equivalent to the operation of db.collection.findOne() (page 458). Consider the following example:
db.records.remove({expired: 1, archived: 1}, false)

This is functionally equivalent to:


db.records.remove({expired: 1, archived: 1})

These operations remove documents with expired and archived fields holding a value of 1 from the collection named records.

30.3.39 db.collection.renameCollection()
db.collection.renameCollection() Arguments name (string) Specifies the new name of the collection. Enclose the string in quotes. Call the db.collection.renameCollection() (page 462) method on a collection object, to rename a collection. Specify the new name of the collection as an argument. For example:


db.rrecord.renameCollection("record")

This method renames a collection named rrecord to record. If the target name (i.e. record) is the name of an existing collection, then the operation will fail. db.collection.renameCollection() (page 462) provides a wrapper around the database command renameCollection. Warning: You cannot use renameCollection() (page 462) with sharded collections.

30.3.40 db.collection.save()
db.collection.save(document) Arguments document Specify a document to save to the collection. If document has an _id field, then save() performs a db.collection.update() (page 464) with no update-operators as an upsert. Otherwise, it inserts a new document with fields from document and a newly generated ObjectId() for the _id field.

30.3.41 db.collection.stats()
db.collection.stats(scale) Arguments scale Optional. Specifies the scale to deliver results. Unless specified, this command returns all sizes in bytes. Returns A document containing statistics that reflect the state of the specified collection. This function provides a wrapper around the database command collStats. The scale option allows you to configure how the mongo shell scales the sizes in the output. For example, specify a scale value of 1024 to display kilobytes rather than bytes. Call the db.collection.stats() (page 463) method on a collection object, to return statistics regarding that collection. For example, the following operation returns stats on the people collection:
db.people.stats()

See Also: Collection Statistics Reference (page 552) for an overview of the output of this command.

30.3.42 db.collection.storageSize()
db.collection.storageSize() Returns The amount of storage space, calculated using the number of extents, used by the collection. This method provides a wrapper around the storageSize (page 552) output of the collStats (i.e. db.collection.stats() (page 463)) command.


30.3.43 db.collection.totalIndexSize()
db.collection.totalIndexSize() Returns The total size of all indexes for the collection. This method provides a wrapper around the totalIndexSize output of the collStats (i.e. db.collection.stats() (page 463)) command.

30.3.44 db.collection.update()
db.collection.update(query, update, [upsert,] [multi]) The db.collection.update() (page 464) method takes the following four arguments.
Arguments
query A query object that selects one or more records to update. Use the query selectors as you would in a db.collection.find() (page 457) operation.
update A document. If the update document's fields include any update operators, then all the fields must be update operators, and update() (page 464) applies those operators to values in the matching document. If none of the update document's fields are update operators, then update() (page 464) replaces all of the matching document's fields except the _id with the fields in the update document.
upsert (boolean) Optional. Defaults to false. When true, this operation will update a document if one matches the query portion and insert a new document if no documents match the query portion. The new document will consist of the union of fields and values from the query document and update document.
multi (boolean) Optional. Defaults to false. When true, the operation updates all documents that match the query. When false, the operation updates only the first document that matches the query.
Provides the ability to update an existing document in the current database and collection. The second argument to db.collection.update() (page 464) takes the form of a document. See update-operators for a reference of all operators that affect updates. Note: An upsert operation only affects one document, and cannot update multiple documents.
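The two behaviors of the update argument, applying operators versus replacing the document, can be sketched in plain JavaScript. This is an in-memory illustration with a hypothetical helper named applyUpdate; it models only the $set and $inc operators and does not cover query matching, upsert, or multi.

```javascript
// Apply an update document to one matched document. Only $set and $inc are
// modeled here; the server supports many more update operators.
function applyUpdate(doc, update) {
  const keys = Object.keys(update);
  const isOperator = (k) => k.startsWith("$");
  if (keys.some(isOperator)) {
    if (!keys.every(isOperator)) {
      throw new Error("cannot mix update operators and plain fields");
    }
    const result = { ...doc };
    for (const [field, value] of Object.entries(update.$set || {})) {
      result[field] = value;
    }
    for (const [field, amount] of Object.entries(update.$inc || {})) {
      result[field] = (result[field] || 0) + amount;
    }
    return result;
  }
  // No operators: replace every field except _id.
  return { _id: doc._id, ...update };
}

const doc = { _id: 1, name: "Tom", score: 4 };
const modified = applyUpdate(doc, { $inc: { score: 1 } });
const replaced = applyUpdate(doc, { status: "new" });
console.log(modified, replaced);
```

Note how the operator form keeps the name field while the replacement form drops it, which is the distinction the update argument description draws.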

30.3.45 db.collection.validate()
db.collection.validate() Arguments full (Boolean) Optional. Specify true to enable a full validation. MongoDB disables full validation by default because it is a potentially resource intensive operation. Provides a wrapper around the validate database command. Call the db.collection.validate() (page 464) method on a collection object, to validate the collection itself. Specify the full option to return full statistics. The validation operation scans all of the data structures for correctness and returns a single document that describes the relationship between the logical collection and the physical representation of that data. The output can provide a more in depth view of how the collection uses storage. Be aware that this command is potentially resource intensive, and may impact the performance of your MongoDB instance.


See Also: Collection Validation Data (page 554)

30.3.46 db.commandHelp()
db.commandHelp(command) Arguments command Specifies a database command name. Returns Help text for the specified database command. See the database command reference for full documentation of these commands.

30.3.47 db.copyDatabase()
db.copyDatabase(origin, destination, hostname)

Arguments
  origin (database)  Specifies the name of the database on the origin system.
  destination (database)  Specifies the name of the database that you wish to copy the origin database into.
  hostname (origin)  Indicate the hostname of the origin database host. Omit the hostname to copy from one name to another on the same server.

Use this function to copy a specific database, named origin, running on the system accessible via hostname into the local database named destination. The command creates destination databases implicitly when they do not exist. If you omit the hostname, MongoDB will copy data from one database into another on the same instance. This function provides a wrapper around the MongoDB database command copydb. The clone database command provides related functionality.

30.3.48 db.createCollection()
db.createCollection(name[, {capped: <boolean>, size: <value>, max: <number>} ])

Arguments
  name (string)  Specifies the name of a collection to create.
  capped (boolean)  Optional. If this document is present, this command creates a capped collection. The capped argument is a document that contains the following three fields:
    capped  Enables a collection cap. False by default. If enabled, you must specify a size parameter.
    size (bytes)  If capped is true, size specifies a maximum size in bytes as a cap for the collection. When capped is false, you may use size to pre-allocate space for the collection.
    max (int)  Optional. Specifies a maximum cap, in number of documents, for capped collections. You must also specify size when specifying max.


Explicitly creates a new collection. Because MongoDB creates collections implicitly when referenced, this command is primarily used for creating new capped collections. In some circumstances, you may use this command to pre-allocate space for an ordinary collection. Capped collections have maximum size or document counts that prevent them from growing beyond maximum thresholds. All capped collections must specify a maximum size, but may also specify a maximum document count. The collection will remove older documents if it reaches the maximum size limit before it reaches the maximum document count. Consider the following example:
db.createCollection("log", { capped : true, size : 536870912, max : 5000 } )

This command creates a collection named log with a maximum size of 512 megabytes (536870912 bytes) or a maximum of 5000 documents. The following command simply pre-allocates a 2 gigabyte, uncapped collection named people:
db.createCollection("people", { size: 2147483648 })

This command provides a wrapper around the database command create. See the Capped Collections wiki page for more information about capped collections.
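Because size is given in bytes, it can help to derive the values rather than type them by hand. The following is a minimal sketch of that arithmetic; the KB, MB, and GB variable names are illustrative assumptions, while the collection names and limits are the ones used in the examples above.

```javascript
// Byte multipliers (illustrative names, not shell built-ins).
var KB = 1024, MB = 1024 * KB, GB = 1024 * MB;

// 512 * MB evaluates to 536870912 bytes; 2 * GB evaluates to 2147483648 bytes.
var logOptions = { capped: true, size: 512 * MB, max: 5000 };
var peopleOptions = { size: 2 * GB };

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
    db.createCollection("log", logOptions);       // capped by size and count
    db.createCollection("people", peopleOptions); // uncapped, pre-allocated
}
```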

30.3.49 db.currentOp()
db.currentOp() Returns A document that contains an array named inprog. The inprog array reports the current operation in progress for the database instance. db.currentOp() (page 466) is only available for users with administrative privileges.

30.3.50 db.dropDatabase()
db.dropDatabase() Removes the current database. Does not change the current database, so the insertion of any documents in this database will allocate a fresh set of data files.

30.3.51 db.eval()
db.eval(function, arguments)

Arguments
  function (JavaScript)  A JavaScript function.
  arguments  A list of arguments to pass to the JavaScript function.

Provides the ability to run JavaScript code using the JavaScript engine embedded in the MongoDB instance. In this environment the value of the db variable on the server is the name of the current database. Unless you use db.eval() (page 466), the mongo shell itself evaluates all JavaScript entered into it.

Warning: Do not use db.eval() (page 466) for long running operations, as db.eval() (page 466) blocks all other operations. Consider using map-reduce for similar functionality in these situations. The db.eval() (page 466) method cannot operate on sharded data. However, you may use db.eval() (page 466) with non-sharded collections and databases stored in a shard cluster.


30.3.52 db.fsyncLock()
db.fsyncLock() Forces the database to flush all write operations to the disk and locks the database to prevent additional writes until the user releases the lock with the db.fsyncUnlock() (page 467) command. db.fsyncLock() (page 467) is an administrative command. This command provides a simple wrapper around the fsync database command with the following syntax:
{ fsync: 1, lock: true }

This function locks the database and creates a window for backup operations (page 156). Note: The database cannot be locked with db.fsyncLock() (page 467) while profiling is enabled. You must disable profiling before locking the database with db.fsyncLock() (page 467). Disable profiling using db.setProfilingLevel() (page 471) as follows in the mongo shell:
db.setProfilingLevel(0)

30.3.53 db.fsyncUnlock()
db.fsyncUnlock() Unlocks a database server to allow writes and reverses the operation of a db.fsyncLock() (page 467) operation. Typically you will use db.fsyncUnlock() (page 467) following a database backup operation (page 156). db.fsyncUnlock() (page 467) is an administrative command.
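The lock, back up, unlock sequence can be sketched as a small wrapper that always releases the lock, even if the backup step throws. The withFsyncLock() helper is hypothetical (not a shell built-in); it only calls db.setProfilingLevel(), db.fsyncLock(), and db.fsyncUnlock() as described in this section, and the callback stands in for whatever backup step you run while writes are blocked.

```javascript
// Hypothetical helper: disables profiling, locks the database for the
// duration of backupFn, and unlocks it afterwards in all cases.
function withFsyncLock(database, backupFn) {
    database.setProfilingLevel(0); // the lock cannot be taken while profiling
    database.fsyncLock();          // flush pending writes and block new ones
    try {
        return backupFn();
    } finally {
        database.fsyncUnlock();    // release the lock even if backupFn throws
    }
}

// Possible use in the mongo shell (left as a comment so nothing is locked
// by accident):
// withFsyncLock(db, function () { print("copy the data files now"); });
```

Passing the database object as a parameter, rather than using the global db directly, is a design choice that keeps the helper easy to exercise against a stand-in object.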

30.3.54 db.getCollection()
db.getCollection(name) Arguments name The name of a collection. Returns A collection. Use this command to obtain a handle on a collection whose name might interact with the shell itself, including collections with names that begin with _ or mirror the database commands.

30.3.55 db.getCollectionNames()
db.getCollectionNames() Returns An array containing the names of all collections in the current database.

30.3.56 db.getLastError()
db.getLastError() Returns The last error message string.


In many situations MongoDB drivers and users will follow a write operation with this command in order to ensure that the write succeeded. Use safe mode for most write operations. See Also: Replica Set Write Concern (page 50) and getLastError.

30.3.57 db.getLastErrorObj()
db.getLastErrorObj() Returns A full document with status information.

30.3.58 db.getMongo()
db.getMongo() Returns The current database connection. db.getMongo() (page 468) runs when the shell initiates. Use this command to test that the mongo shell has a connection to the proper database instance.

30.3.59 db.getName()
db.getName() Returns the current database name.

30.3.60 db.getPrevError()
db.getPrevError() Returns A status document, containing the errors. Deprecated since version 1.6. This output reports all errors since the last time the database received a resetError (also db.resetError() (page 471)) command. This method provides a wrapper around the getPrevError command.

30.3.61 db.getProfilingLevel()
db.getProfilingLevel() This method provides a wrapper around the database command profile (page 499) and returns the current profiling level. Deprecated since version 1.8.4: Use db.getProfilingStatus() (page 468) for related functionality.

30.3.62 db.getProfilingStatus()
db.getProfilingStatus() Returns The current profile (page 499) level and slowms (page 500) setting.


30.3.63 db.getReplicationInfo()
db.getReplicationInfo() Returns A status document. The output reports statistics related to replication. See Also: Replication Info Reference (page 566) for full documentation of this output.

30.3.64 db.getSiblingDB()
db.getSiblingDB() Used to return another database without modifying the db variable in the shell environment.

30.3.65 db.isMaster()
db.isMaster() Returns a status document with fields that include the ismaster field, which reports if the current node is the primary node, as well as a report of a subset of the current replica set configuration. This function provides a wrapper around the database command isMaster.

30.3.66 db.killOp()
db.killOp(opid) Arguments opid Specify an operation ID. Terminates the specified operation. Use db.currentOp() (page 466) to find operations and their corresponding ids.

30.3.67 db.listCommands()
db.listCommands() Provides a list of all database commands. See the /reference/commands document for a more extensive index of these options.

30.3.68 db.loadServerScripts()
db.loadServerScripts() db.loadServerScripts() (page 469) loads all scripts in the system.js collection for the current database into the mongo shell session. Documents in the system.js collection have the following prototype form:
{ _id : "<name>" , value : <function> }

The documents in the system.js collection provide functions that your applications can use in any JavaScript context with MongoDB in this database. These contexts include $where clauses and mapReduce operations.
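As an illustration of the prototype form, the sketch below builds one system.js document and, inside the mongo shell only, saves it and loads it into the session. The function name add and its body are assumptions chosen for the example.

```javascript
// A system.js document: _id is the function name, value is the function.
var addFunctionDoc = {
    _id: "add",
    value: function (a, b) { return a + b; }
};

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
    db.system.js.save(addFunctionDoc); // store the function on the server
    db.loadServerScripts();            // make stored functions available here
    print(add(2, 3));                  // call the loaded function by its _id
}
```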


30.3.69 db.logout()
db.logout() Ends the current authentication session. This function has no effect if the current session is not authenticated. This function provides a wrapper around the database command logout.

30.3.70 db.printCollectionStats()
db.printCollectionStats() Provides a wrapper around the db.collection.stats() (page 463) method. Returns statistics from every collection separated by three hyphen characters. See Also: Collection Statistics Reference (page 552)

30.3.71 db.printReplicationInfo()
db.printReplicationInfo() Provides a formatted report of the status of a replica set from the perspective of the primary set member. See the Replica Status Reference (page 558) for more information regarding the contents of this output. This function will return db.printSlaveReplicationInfo() (page 470) if issued against a secondary set member.

30.3.72 db.printShardingStatus()
db.printShardingStatus() Provides a formatted report of the sharding configuration and the information regarding existing chunks in a shard cluster. See Also: sh.status() (page 484)

30.3.73 db.printSlaveReplicationInfo()
db.printSlaveReplicationInfo() Provides a formatted report of the status of a replica set from the perspective of the secondary set member. See the Replica Status Reference (page 558) for more information regarding the contents of this output.

30.3.74 db.removeUser()
db.removeUser(username) Arguments username Specify a database username. Removes the specified username from the database.


30.3.75 db.repairDatabase()
db.repairDatabase() Warning: In general, if you have an intact copy of your data, such as would exist on a very recent backup or an intact member of a replica set, do not use repairDatabase or related options like db.repairDatabase() (page 471) in the mongo shell or mongod --repair (page 488). Restore from an intact copy of your data.

Note: When using journaling, there is almost never any need to run repairDatabase. In the event of an unclean shutdown, the server will be able to restore the data files to a pristine state automatically. db.repairDatabase() (page 471) provides a wrapper around the database command repairDatabase, and has the same effect as the run-time option mongod --repair (page 488), limited to only the current database. See repairDatabase for full documentation.

30.3.76 db.resetError()
db.resetError() Deprecated since version 1.6. Resets the error message returned by db.getPrevError (page 468) or getPrevError. Provides a wrapper around the resetError command.

30.3.77 db.runCommand()
db.runCommand(command) Arguments command (document or string) Specifies a database command, either in the form of a document or as a string. When specifying a command as a string, db.runCommand() (page 471) transforms the command into the form { command: 1 }. Provides a helper to run specified database commands. This is the preferred method to issue database commands, as it provides a consistent interface between the shell and drivers.
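The string-to-document transformation can be pictured with a few lines of JavaScript. The toCommandDocument() helper is hypothetical and only imitates the behavior described above; it is not how the shell is necessarily implemented.

```javascript
// Hypothetical helper: turns "name" into { name: 1 } and leaves
// documents untouched, mirroring the transformation described above.
function toCommandDocument(command) {
    if (typeof command === "string") {
        var doc = {};
        doc[command] = 1;
        return doc;
    }
    return command;
}

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
    // Under the rule above, both calls issue the same command.
    printjson(db.runCommand("isMaster"));
    printjson(db.runCommand(toCommandDocument("isMaster")));
}
```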

30.3.78 db.serverStatus()
db.serverStatus() Returns a document that provides an overview of the database process's state. This command provides a wrapper around the database command serverStatus. See Also: Server Status Reference (page 537) for complete documentation of the output of this function.

30.3.79 db.setProfilingLevel()
db.setProfilingLevel(level[, slowms ])

Arguments
  level  Specifies a profiling level, see list of possible values below.
  slowms  Optionally modify the threshold for the profiler to consider a query or operation slow.

Modifies the current database profiler level. This allows administrators to capture data regarding performance. The database profiling system can impact performance and can allow the server to write the contents of queries to the log, which might have information security implications for your deployment. The following profiling levels are available:

Level  Setting
0      Off. No profiling.
1      On. Only includes slow operations.
2      On. Includes all operations.

Also configure the slowms (page 500) option to set the threshold for the profiler to consider a query slow. Specify this value in milliseconds to override the default. This command provides a wrapper around the database command profile (page 499). mongod writes the output of the database profiler to the system.profile collection. mongod prints information about queries that take longer than the slowms (page 500) to the log even when the database profiler is not active. Note: The database cannot be locked with db.fsyncLock() (page 467) while profiling is enabled. You must disable profiling before locking the database with db.fsyncLock() (page 467). Disable profiling using db.setProfilingLevel() (page 471) as follows in the mongo shell:
db.setProfilingLevel(0)

30.3.80 db.shutdownServer()
db.shutdownServer() Shuts down the current mongod or mongos process cleanly and safely. This operation fails when the current database is not the admin database. This command provides a wrapper around the shutdown database command.

30.3.81 db.stats()
db.stats(scale) Arguments scale Optional. Specifies the scale to deliver results. Unless specified, this command returns all data in bytes. Returns A document that contains statistics reflecting the database system's state. This function provides a wrapper around the database command dbStats. The scale option allows you to configure how the mongo shell scales the sizes of things in the output. For example, specify a scale value of 1024 to display kilobytes rather than bytes. See the Database Statistics Reference (page 551) document for an overview of this output. Note: The scale factor rounds values to whole numbers. This can produce unpredictable and unexpected results in some situations.


30.3.82 db.version()
db.version() Returns The version of the mongod instance.

30.3.83 fuzzFile()
fuzzFile(filename) Arguments filename (string) Specify a filename or path to a local file. Returns null For internal use.

30.3.84 db.getDB()
getDB() Returns the name of the current database as a string.

30.3.85 getHostName()
getHostName() Returns The hostname of the system running the mongo shell process.

30.3.86 getMemInfo()
getMemInfo() Returns a document with two fields that report the amount of memory used by the JavaScript shell process. The fields returned are resident and virtual.

30.3.87 getShardDistribution()
getShardDistribution() See SERVER-4902 for more information.

30.3.88 getShardVersion()
getShardVersion() This method returns information regarding the state of data in a shard cluster that is useful when diagnosing underlying issues with a shard cluster. For internal and diagnostic use only.


30.3.89 hostname()
hostname() Returns The hostname of the system running the mongo shell process.

30.3.90 isWindows()
_isWindows() Returns boolean. Returns true if the server is running on a system that is Windows, or false if the server is running on a Unix or Linux system.

30.3.91 listFiles()
listFiles() Returns an array, containing one document per object in the directory. This function operates in the context of the mongo process. The included fields are:
  name  Returns a string which contains the name of the object.
  isDirectory  Returns true or false if the object is a directory.
  size  Returns the size of the object in bytes. This field is only present for files.

30.3.92 load()
load(file)

Arguments
  file (string)  Specify a path and file name containing JavaScript.

This native function loads and runs a JavaScript file in the current shell environment. To run JavaScript with the mongo shell, you can either:
  use the --eval (page 504) option when invoking the shell to evaluate a small amount of JavaScript code, or
  specify a file name with mongo (page 504). mongo will execute the script and then exit. Add the --shell (page 503) option to return to the shell after running the command.

Specify files loaded with the load() function in relative terms to the current directory of the mongo shell session. Check the current directory using the pwd() function.

30.3.93 ls()
ls() Returns a list of the files in the current directory. This function returns with output relative to the current shell session, and does not impact the server.


30.3.94 md5sumFile()
md5sumFile(filename) Arguments filename (string) a file name. Returns The md5 hash of the specified file. Note: The specified filename must refer to a file located on the system running the mongo shell.

30.3.95 mkdir()
mkdir(path) Arguments path (string) A path on the local filesystem. Creates a directory at the specified path. This command will create the entire path specified, if the enclosing directory or directories do not already exist. Equivalent to mkdir -p with BSD or GNU utilities.

30.3.96 mongo.setSlaveOk()
mongo.setSlaveOk() For the current session, this command permits read operations from non-master (i.e. slave or secondary) instances. Practically, use this method in the following form:
db.getMongo().setSlaveOk()

Indicates that eventually consistent read operations are acceptable for the current application. This function provides the same functionality as rs.slaveOk() (page 479).

30.3.97 pwd()
pwd() Returns the current directory. This function returns with output relative to the current shell session, and does not impact the server.

30.3.98 quit()
quit() Exits the current shell session.

30.3.99 rand()
_rand() Returns A random number between 0 and 1.


This function provides functionality similar to the Math.random() function from the standard library.

30.3.100 rawMongoProgramOutput()
rawMongoProgramOutput() For internal use.

30.3.101 removeFile()
removeFile(filename) Arguments filename (string) Specify a filename or path to a local file. Returns boolean. Removes the specified file from the local file system.

30.3.102 resetDbPath()
resetDbpath() For internal use.

30.3.103 rs.add()
rs.add(hostspec, arbiterOnly)

Specify one of the following forms:

Arguments
  hostspec (string or document)  Either a string or a document. If a string, specifies a host (and optionally port-number) for a new host member for the replica set; MongoDB will add this host with the default configuration. If a document, specifies any attributes about a member of a replica set.
  arbiterOnly  Optional. If true, this host is an arbiter.

Provides a simple method to add a member to an existing replica set. You can specify new hosts in one of two ways: as a hostname with an optional port number to use the default configuration, or as a configuration document. This function will disconnect the shell briefly and force a reconnection as the replica set renegotiates which node will be primary. As a result, the shell will display an error even if this command succeeds. rs.add() (page 476) provides a wrapper around some of the functionality of the replSetReconfig database command.

30.3.104 rs.addArb()
rs.addArb(hostname) Arguments hostname (string) Specifies a host (and optionally port-number) for an arbiter member for the replica set. Adds a new arbiter to an existing replica set. This function will disconnect the shell briefly and force a reconnection as the replica set renegotiates which node will be primary. As a result, the shell will display an error even if this command succeeds.

30.3.105 rs.conf()
rs.conf() Returns a document that contains the current replica set configuration object.

30.3.106 rs.freeze()
rs.freeze(seconds) Arguments seconds (int) Specify the duration of this operation. Forces the current node to become ineligible to become primary for the period specified. rs.freeze() (page 477) provides a wrapper around the database command replSetFreeze.

30.3.107 rs.help()
rs.help() Returns a basic help text for all of the replication (page 33) related shell functions.

30.3.108 rs.initiate()
rs.initiate(configuration) Arguments configuration Optional. A document that specifies the configuration of a replica set. If not specified, MongoDB will use a default configuration. Initiates a replica set. Optionally takes a configuration argument in the form of a document that holds the configuration of a replica set. Consider the following model of the most basic configuration for a 3-member replica set:

{
  _id : <setname>,
  members : [
    {_id : 0, host : <host0>},
    {_id : 1, host : <host1>},
    {_id : 2, host : <host2>}
  ]
}

This function provides a wrapper around the replSetInitiate database command.
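A configuration document of the shape modeled above can be generated from a list of hosts. The buildReplSetConfig() helper, the set name rs0, and the example.net hostnames are all assumptions made for illustration; only the document layout comes from the text.

```javascript
// Hypothetical helper: builds the basic configuration document, numbering
// members from 0 in the order the hosts are listed.
function buildReplSetConfig(setName, hosts) {
    var members = [];
    for (var i = 0; i < hosts.length; i++) {
        members.push({ _id: i, host: hosts[i] });
    }
    return { _id: setName, members: members };
}

var config = buildReplSetConfig("rs0", [
    "db0.example.net:27017",
    "db1.example.net:27017",
    "db2.example.net:27017"
]);

// In the mongo shell, connected to one of the listed members:
// rs.initiate(config)
```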


30.3.109 rs.reconfig()
rs.reconfig(configuration[, force ])

Arguments
  configuration  A document that specifies the configuration of a replica set.
  force  Optional. Specify { force: true } as the force parameter to force the replica set to accept the new configuration even if a majority of the members are not accessible. Use with caution, as this can lead to rollback situations.

Initializes a new replica set configuration. This function will disconnect the shell briefly and force a reconnection as the replica set renegotiates which node will be primary. As a result, the shell will display an error even if this command succeeds. rs.reconfig() (page 478) provides a wrapper around the replSetReconfig database command. rs.reconfig() (page 478) overwrites the existing replica set configuration. Retrieve the current configuration object with rs.conf() (page 477), modify the configuration as needed and then use rs.reconfig() (page 478) to submit the modified configuration object. To reconfigure a replica set, use the following sequence of operations:
conf = rs.conf()
// modify conf to change configuration
rs.reconfig(conf)

If you want to force the reconfiguration if a majority of the set isn't connected to the current member, or you're issuing the command against a secondary, use the following form:
conf = rs.conf()
// modify conf to change configuration
rs.reconfig(conf, { force: true } )

Warning: Forcing a rs.reconfig() (page 478) can lead to rollback situations and other difficult to recover from situations. Exercise caution when using this option. See Also: Replica Set Configuration (page 561) and Replica Set Administration (page 38).
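The "modify conf" step in the sequences above might look like the following sketch, which changes one member's priority in a configuration object. The setMemberPriority() helper and the hostname are hypothetical; priority is used here on the assumption that it is one of the member settings described in the Replica Set Configuration reference.

```javascript
// Hypothetical helper: sets the priority of the member whose host matches.
// Returns true if a member was changed, false if no member matched.
function setMemberPriority(conf, host, priority) {
    for (var i = 0; i < conf.members.length; i++) {
        if (conf.members[i].host === host) {
            conf.members[i].priority = priority;
            return true;
        }
    }
    return false;
}

// Possible use in the mongo shell, connected to the primary:
// var conf = rs.conf();
// if (setMemberPriority(conf, "db2.example.net:27017", 0)) {
//     rs.reconfig(conf);
// }
```

Checking the return value before calling rs.reconfig() avoids resubmitting an unchanged configuration when the hostname is misspelled.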

30.3.110 rs.remove()
rs.remove(hostname) Arguments hostname Specify one of the existing hosts to remove from the current replica set. Removes the node described by the hostname parameter from the current replica set. This function will disconnect the shell briefly and force a reconnection as the replica set renegotiates which node will be primary. As a result, the shell will display an error even if this command succeeds.


Note: Before running the rs.remove() (page 478) operation, you must shut down the replica set member that you're removing. Changed in version 2.2: This procedure is no longer required when using rs.remove() (page 478), but it remains good practice.

30.3.111 rs.slaveOk()
rs.slaveOk() Provides a shorthand for the following operation:
db.getMongo().setSlaveOk()

This allows the current connection to run read operations on secondary nodes.

30.3.112 rs.status()
rs.status() Returns A document with status information. This output reflects the current status of the replica set, using data derived from the heartbeat packets sent by the other members of the replica set. This method provides a wrapper around the replSetGetStatus database command. See Also: Replica Status Reference (page 558) for documentation of this output.

30.3.113 rs.stepDown()
rs.stepDown(seconds) Arguments seconds (int) Specify the duration of this operation. If not specified the command uses the default value of 60 seconds. Returns disconnects shell. Forces the current replica set member to step down as primary and then attempt to avoid election as primary for the designated number of seconds. Produces an error if the current node is not primary. This function will disconnect the shell briefly and force a reconnection as the replica set renegotiates which node will be primary. As a result, the shell will display an error even if this command succeeds. rs.stepDown() (page 479) provides a wrapper around the database command replSetStepDown (page 441).

30.3.114 rs.syncFrom()
rs.syncFrom() New in version 2.2. Provides a wrapper around the replSetSyncFrom command, which allows administrators to configure the member of a replica set that the current member will pull data from. Specify the name of the member you want to sync from in the form of [hostname]:[port]. See replSetSyncFrom for more details.


30.3.115 run()
run() For internal use.

30.3.116 runMongoProgram()
runMongoProgram() For internal use.

30.3.117 runProgram()
runProgram() For internal use.

30.3.118 sh.addShard()
sh.addShard(host)

Arguments
  host  Specify the hostname of a new shard server.
  name (optional)  Specify the name of the new shard. If you do not specify a name for this shard, MongoDB will assign one by default.

Use this to add shard instances to the present shard cluster. The host parameter can be in any of the following forms:

[hostname]
[hostname]:[port]
[set]/[hostname]
[set]/[hostname],[hostname]:port

You can specify shards using the hostname, or a hostname and port combination if the shard is running on a nonstandard port. A replica set can also function as a shard member. In these cases supply addShard (page 480) with the set name, followed by at least one existing member of the set as a seed in a comma separated list, as in the final two examples. This function provides a wrapper around the administrative command addShard.
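The host forms listed above can be assembled from their parts. The shardHostString() helper and the example.net hostnames below are hypothetical and serve only to show how the set name and seed list combine.

```javascript
// Hypothetical helper: joins one or more "hostname" or "hostname:port"
// entries, prefixing the replica set name when one is given.
function shardHostString(hosts, setName) {
    var list = hosts.join(",");
    return setName ? setName + "/" + list : list;
}

var standalone = shardHostString(["shard0.example.net:27018"]);
var replicaSet = shardHostString(
    ["db0.example.net", "db1.example.net:27018"], "rs0");

// In the mongo shell, connected to a mongos instance:
// sh.addShard(standalone)
// sh.addShard(replicaSet)
```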

30.3.119 sh.addShardTag()
sh.addShardTag(shard, tag) New in version 2.2. Arguments shard Specifies the name of the shard that you want to give a specific tag. tag Specifies the name of the tag that you want to add to the shard. sh.addShardTag() (page 480) associates a shard with a tag or identifier. MongoDB can use these identifiers to home or attach (i.e. with sh.addTagRange() (page 481)) specific data to a specific shard.


Always issue sh.addShardTag() (page 480) when connected to a mongos instance. The following example adds three tags, LGA, EWR, and JFK, to three shards:
sh.addShardTag("shard0000", "LGA")
sh.addShardTag("shard0001", "EWR")
sh.addShardTag("shard0002", "JFK")

30.3.120 sh.addTagRange()
sh.addTagRange(namespace, minimum, maximum, tag) New in version 2.2.

Arguments
  namespace  Specifies the namespace, in the form of <database>.<collection>, of the sharded collection that you would like to tag.
  minimum  Specifies the minimum value of the shard key range to include in the tag. Specify the minimum value in the form of <fieldname>:<value>.
  maximum  Specifies the maximum value of the shard key range to include in the tag. Specify the maximum value in the form of <fieldname>:<value>.
  tag  Specifies the name of the tag to attach the range specified by the minimum and maximum arguments to.

sh.addTagRange() (page 481) attaches a range of values of the shard key to a shard tag created using the sh.addShardTag() (page 480) helper. Use this operation to ensure that the documents that exist within the specified range exist on shards that have a matching tag. Always issue sh.addTagRange() (page 481) when connected to a mongos instance.

30.3.121 sh.enableSharding()
sh.enableSharding(database) Arguments database (name) Specify a database name to shard. Enables sharding on the specified database. This does not automatically shard any collections, but makes it possible to begin sharding collections using sh.shardCollection() (page 483).

30.3.122 sh.getBalancerState()

30.3.123 sh.help()
sh.help() Returns a basic help text for all sharding related shell functions.

30.3.124 sh.isBalancerRunning()
sh.isBalancerRunning() Returns boolean.


Returns true if the balancer process is currently running and migrating chunks and false if the balancer process is not running. Use sh.getBalancerState() (page 482) to determine if the balancer is enabled or disabled.

30.3.125 sh.moveChunk()
sh.moveChunk(collection, query, destination)

Arguments
  collection (string)  Specify the sharded collection containing the chunk to migrate.
  query  Specify a query to identify documents in a specific chunk. Typically specify the shard key for a document as the query.
  destination (string)  Specify the name of the shard that you wish to move the designated chunk to.

Moves the chunk containing the documents specified by the query to the shard described by destination. This function provides a wrapper around the moveChunk database command. In most circumstances, allow the balancer to automatically migrate chunks, and avoid calling sh.moveChunk() (page 482) directly. See Also: moveChunk and Sharding (page 91) for more information.

30.3.126 sh.removeShardTag()
sh.removeShardTag(shard, tag) New in version 2.2. Arguments shard Specifies the name of the shard that you want to remove a tag from. tag Specifies the name of the tag that you want to remove from the shard. Removes the association between a tag and a shard. Always issue sh.removeShardTag() (page 482) when connected to a mongos instance.

30.3.127 sh.setBalancerState()
sh.setBalancerState(state)

Arguments
  state (boolean)  true enables the balancer if disabled, and false disables the balancer.

Enables or disables the balancer. Use sh.getBalancerState() (page 482) to determine if the balancer is currently enabled or disabled and sh.isBalancerRunning() (page 481) to check its current state.

sh.getBalancerState() Returns boolean. sh.getBalancerState() (page 482) returns true when the balancer is enabled and false when the balancer is disabled. This does not reflect the current state of balancing operations: use sh.isBalancerRunning() (page 481) to check the balancer's current state.


30.3.128 sh.shardCollection()
sh.shardCollection(collection, key, unique)

Arguments
collection (name): The name of the collection to shard.
key (document): A document containing the shard key that the sharding system uses to partition and distribute objects among the shards.
unique (boolean): When true, the unique option ensures that the underlying index enforces uniqueness so long as the unique index is a prefix of the shard key.

Shards the named collection, according to the specified shard key. Specify shard keys in the form of a document. Shard keys may refer to a single document field, or more typically several document fields to form a compound shard key.

30.3.129 sh.splitAt()
sh.splitAt(collection, query)

Arguments
collection (string): Specify the sharded collection containing the chunk to split.
query (document): Specify a query to identify a document in a specific chunk. Typically specify the shard key for a document as the query.

Splits the chunk containing the document specified by the query as if that document were at the middle of the collection, even if the specified document is not the actual median of the collection. Use this command to manually split chunks unevenly. Use the sh.splitFind() (page 483) function to split a chunk at the actual median.

In most circumstances, you should leave chunk splitting to the automated processes within MongoDB. However, when initially deploying a shard cluster it is necessary to perform some measure of pre-splitting using manual methods including sh.splitAt() (page 483).

30.3.130 sh.splitFind()
sh.splitFind(collection, query)

Arguments
collection (string): Specify the sharded collection containing the chunk to split.
query: Specify a query to identify a document in a specific chunk. Typically specify the shard key for a document as the query.

Splits the chunk containing the document specified by the query at its median point, creating two roughly equal chunks. Use sh.splitAt() (page 483) to split a chunk at a specific point.

In most circumstances, you should leave chunk splitting to the automated processes. However, when initially deploying a shard cluster it is necessary to perform some measure of pre-splitting using manual methods including sh.splitFind() (page 483).
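The difference between sh.splitAt() and sh.splitFind() can be illustrated with a simplified model in which a chunk is just a sorted array of shard key values. This is not MongoDB's implementation; the function names are hypothetical, and the sketch assumes that the split key becomes the lower bound of the second chunk.

```javascript
// Simplified model: a chunk is a sorted array of shard key values.

// Like sh.splitAt(): split at a caller-chosen key, which may be far from
// the median. Assumes the split key starts the second chunk.
function splitAtKey(chunk, key) {
  var i = 0;
  while (i < chunk.length && chunk[i] < key) {
    i++;
  }
  return [chunk.slice(0, i), chunk.slice(i)];
}

// Like sh.splitFind(): split at the median, giving two roughly equal halves.
function splitAtMedian(chunk) {
  var mid = Math.floor(chunk.length / 2);
  return [chunk.slice(0, mid), chunk.slice(mid)];
}

var chunk = [10, 20, 30, 40, 50, 60, 70, 80];
var uneven = splitAtKey(chunk, 30); // [[10, 20], [30, 40, 50, 60, 70, 80]]
var even = splitAtMedian(chunk);    // [[10, 20, 30, 40], [50, 60, 70, 80]]
```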


30.3.131 sh.status()
sh.status() Returns a formatted report of the status of the shard cluster, including data regarding the distribution of chunks.

30.3.132 srand()
_srand() For internal use.

30.3.133 startMongoProgram()
_startMongoProgram() For internal use.

30.3.134 stopMongoProgram()
stopMongoProgram() For internal use.

30.3.135 stopMongoProgramByPid()
stopMongoProgramByPid() For internal use.

30.3.136 stopMongod()
stopMongod() For internal use.

30.3.137 waitMongoProgramOnPort()
waitMongoProgramOnPort() For internal use.

30.3.138 waitProgram()
waitProgram() For internal use.


CHAPTER THIRTY-ONE

MANUAL PAGES
31.1 mongod Manual
31.1.1 Synopsis
mongod is the primary daemon process for the MongoDB system. It handles data requests, manages data format, and performs background management operations.

This document provides a complete overview of all command line options for mongod. These options are primarily useful for testing purposes. In common operation, use the configuration file options (page 494) to control the behavior of your database, which is fully capable of all operations described below.

31.1.2 Options
mongod

--help, -h
Returns a basic help and usage text.

--version
Returns the version of the mongod daemon.

--config <filename>, -f <filename>
Specifies a configuration file that you can use to specify runtime configurations. While the options are equivalent and accessible via the other command line arguments, the configuration file is the preferred method for runtime configuration of mongod. See the Configuration File Options (page 494) document for more information about these options.

--verbose, -v
Increases the amount of internal reporting returned on standard output or in the log file specified by --logpath (page 486). Use the -v form to control the level of verbosity by including the option multiple times (e.g. -vvvvv).

--quiet
Runs the mongod instance in a quiet mode that attempts to limit the amount of output.

--port <port>
Specifies a TCP port for the mongod to listen for client connections. By default mongod listens for connections on port 27017. UNIX-like systems require root privileges to use ports with numbers lower than 1024.


--bind_ip <ip address>
The IP address that the mongod process will bind to and listen for connections. By default mongod listens for connections on the localhost (i.e. 127.0.0.1 address.) You may attach mongod to any interface; however, if you attach mongod to a publicly accessible interface ensure that you have implemented proper authentication and/or firewall restrictions to protect the integrity of your database.

--maxConns <number>
Specifies the maximum number of simultaneous connections that mongod will accept. This setting will have no effect if it is higher than your operating system's configured maximum connection tracking threshold.

--objcheck
Forces the mongod to validate all requests from clients upon receipt to ensure that invalid objects are never inserted into the database. This option can produce a significant performance impact, and is not enabled by default.

--logpath <path>
Specify a path for the log file that will hold all diagnostic logging information. Unless specified, mongod will output all log information to the standard output. Additionally, unless you also specify --logappend (page 486), the logfile will be overwritten when the process restarts.

Note: The behavior of the logging system may change in the near future in response to the SERVER-4499 case.

--logappend
When specified, this option ensures that mongod appends new entries to the end of the logfile rather than overwriting the content of the log when the process restarts.

--syslog
Sends all logging output to the host's syslog system rather than to standard output or a log file as with --logpath (page 486).

Warning: You cannot use --syslog (page 486) with --logpath (page 486).

--pidfilepath <path>
Specify a file location to hold the PID or process ID of the mongod process. Useful for tracking the mongod process in combination with the mongod --fork (page 486) option. If this option is not set, mongod will create no PID file.

--keyFile <file>
Specify the path to a key file to store authentication information. This option is only useful for the connection between replica set members.

See Also: Replica Set Security (page 44) and Replica Set Administration (page 38).

--nounixsocket
Disables listening on the UNIX socket. Unless set to false, mongod and mongos provide a UNIX socket.

--unixSocketPrefix <path>
Specifies a path for the UNIX socket. Unless this option has a value, mongod and mongos create a socket with /tmp as a prefix.

--fork
Enables a daemon mode for mongod that runs the process in the background. This is the normal mode of operation in production and production-like environments, but may not be desirable for testing.


--auth
Enables database authentication for users connecting from remote hosts. Configure users via the mongo shell (page 503). If no users exist, the localhost interface will continue to have access to the database until you create the first user. See the Security and Authentication wiki page for more information regarding this functionality.

--cpu
Forces mongod to report the percentage of CPU time in write lock. mongod generates output every four seconds. MongoDB writes this data to standard output or the logfile if using the logpath (page 496) option.

--dbpath <path>
Specify a directory for the mongod instance to store its data. Typical locations include: /srv/mongodb, /var/lib/mongodb or /opt/mongodb.

Unless specified, mongod will look for data files in the default /data/db directory. (Windows systems use the \data\db directory.) If you installed using a package management system, check the /etc/mongodb.conf file provided by your packages to see the configuration of the dbpath (page 497).

--diaglog <value>
Creates a very verbose, diagnostic log for troubleshooting and recording various errors. MongoDB writes these log files in the dbpath (page 497) in a series of files that begin with the string diaglog. The specified value configures the level of verbosity. Possible values and their impact are as follows.

Value  Setting
0      Off. No logging.
1      Log write operations.
2      Log read operations.
3      Log both read and write operations.
7      Log write and some read operations.

You can use the mongosniff tool to replay this output for investigation. Given a typical diaglog file, located at /data/db/diaglog.4f76a58c, you might use a command in the following form to read these files:
mongosniff --source DIAGLOG /data/db/diaglog.4f76a58c

--diaglog (page 487) is for internal use and not intended for most users.

--directoryperdb
Alters the storage pattern of the data directory to store each database's files in a distinct folder. This option will create directories within the --dbpath (page 487) named for each database. Use this option in conjunction with your file system and device configuration so that MongoDB will store data on a number of distinct disk devices to increase write throughput or disk capacity.

--journal
Enables operation journaling to ensure write durability and data consistency. MongoDB enables journaling by default on 64-bit builds of versions after 2.0.

--journalOptions <arguments>
Provides functionality for testing. Not for general use, and may affect database integrity.

--journalCommitInterval <value>
Specifies the maximum amount of time for mongod to allow between journal operations. The default value is 100 milliseconds, while possible values range from 2 to 300 milliseconds. Lower values increase the durability of the journal, at the expense of disk performance.

--ipv6
Specify this option to enable IPv6 support. This will allow clients to connect to mongod using IPv6 networks. MongoDB disables IPv6 support by default in mongod and all utilities.


--jsonp
Permits JSONP access via an HTTP interface. Consider the security implications of allowing this activity before enabling this option.

--noauth
Disables authentication. Currently the default. Exists for future compatibility and clarity.

--nohttpinterface
Disables the HTTP interface.

--nojournal
Disables the durability journaling. By default, mongod enables journaling in 64-bit versions after v2.0.

--noprealloc
Disables the preallocation of data files. This will shorten the start up time in some cases, but can cause significant performance penalties during normal operations.

--noscripting
Disables the scripting engine.

--notablescan
Forbids operations that require a table scan.

--nssize <value>
Specifies the default size for namespace files (i.e. .ns). This option has no impact on the size of existing namespace files. The default value is 16 megabytes, which provides for effectively 12,000 possible namespaces. The maximum size is 2 gigabytes.

--profile <level>
Changes the level of database profiling, which inserts information about operation performance into output of mongod or the log file. The following levels are available:

Level  Setting
0      Off. No profiling.
1      On. Only includes slow operations.
2      On. Includes all operations.

Profiling is off by default. Database profiling can impact database performance. Enable this option only after careful consideration.

--quota
Enables a maximum limit for the number of data files each database can have. When running with --quota (page 488), there is a maximum of 8 data files per database. Adjust the quota with the --quotaFiles (page 488) option.

--quotaFiles <number>
Modifies the limit on the number of data files per database. This option requires the --quota (page 488) setting. The default value for --quotaFiles (page 488) is 8.

--rest
Enables the simple REST API.

--repair
Runs a repair routine on all databases. This is equivalent to shutting down and running the repairDatabase database command on all databases.


Warning: In general, if you have an intact copy of your data, such as would exist on a very recent backup or an intact member of a replica set, do not use repairDatabase or related options like db.repairDatabase() (page 471) in the mongo shell or mongod --repair (page 488). Restore from an intact copy of your data.

Note: When using journaling, there is almost never any need to run repairDatabase. In the event of an unclean shutdown, the server will be able to restore the data files to a pristine state automatically.

Changed in version 2.1.2. If you run the repair option and have data in a journal file, mongod will refuse to start. In these cases you should start mongod without the --repair (page 488) option to allow mongod to recover data from the journal. This will complete more quickly and will result in a more consistent and complete data set. To continue the repair operation despite the journal files, shut down mongod cleanly and restart with the --repair (page 488) option.

--repairpath <path>
Specifies the root directory containing MongoDB data files to use for the --repair (page 488) operation. Defaults to the value specified by --dbpath (page 487).

--slowms <value>
Defines the value of "slow" for the --profile (page 488) option. The database logs all slow queries to the log, even when the profiler is not turned on. When the database profiler is on, mongod writes to the system.profile collection. See profile (page 499) for more information on the database profiler.

--smallfiles
Enables a mode where MongoDB uses a smaller default file size. Specifically, --smallfiles (page 489) reduces the initial size for data files and limits them to 512 megabytes. --smallfiles (page 489) also reduces the size of each journal file from 1 gigabyte to 128 megabytes.

Use --smallfiles (page 489) if you have a large number of databases that each holds a small quantity of data. --smallfiles (page 489) can lead your mongod to create a large number of files, which may affect performance for larger databases.

--shutdown
Used in control scripts, the --shutdown (page 489) option will cleanly and safely terminate the mongod process. When invoking mongod with this option you must set the --dbpath (page 487) option either directly or by way of the configuration file (page 494) and the --config (page 485) option.

--syncdelay <value>
This setting controls the maximum number of seconds between disk syncs. While mongod is always writing data to disk, this setting controls the maximum guaranteed interval between a successful write operation and the next time the database flushes data to disk.

In many cases, the actual interval between write operations and disk flushes is much shorter than the value. If set to 0, mongod flushes all operations to disk immediately, which may have a significant performance impact. If --journal (page 487) is true, all writes will be durable, by way of the journal, within the time specified by --journalCommitInterval (page 487).

--sysinfo
Returns diagnostic system information and then exits. The information provides the page size, the number of physical pages, and the number of available physical pages.

--upgrade
Upgrades the on-disk data format of the files specified by the --dbpath (page 487) to the latest version, if needed.


This option only affects the operation of mongod if the data files are in an old format.

Note: In most cases you should not set this value, so you can exercise the most control over your upgrade process. See the MongoDB release notes (on the download page) for more information about the upgrade process.

--traceExceptions
For internal diagnostic use only.

Replication Options

--replSet <setname>
Use this option to configure replication with replica sets. Specify a setname as an argument to this option. All hosts must have the same set name.

See Also: Replication (page 31), Replica Set Administration (page 38), and Replica Set Configuration (page 561)

--oplogSize <value>
Specifies a maximum size in megabytes for the replication operation log (i.e. oplog). By default, mongod creates an oplog based on the maximum amount of space available. For 64-bit systems, the oplog is typically 5% of available disk space.

Once the mongod has created the oplog for the first time, changing --oplogSize (page 490) will not affect the size of the oplog.

--fastsync
In the context of replica set replication, set this option if you have seeded this replica with a snapshot of the dbpath of another member of the set. Otherwise the mongod will attempt to perform a full sync.

Warning: If the data is not perfectly synchronized and mongod starts with fastsync (page 501), then the secondary or slave will be permanently out of sync with the primary, which may cause significant consistency problems.

--replIndexPrefetch
New in version 2.2.

You must use --replIndexPrefetch (page 490) in conjunction with replSet (page 501). The default value is all and the available options are:
none
all
_id_only

By default, secondary members of a replica set will load all indexes related to an operation into memory before applying operations from the oplog. You can modify this behavior so that the secondaries will only load the _id index. Specify _id_only to load only the _id index, or none to prevent the mongod from loading any index into memory.

Master/Slave Replication

These options provide access to conventional master-slave database replication. While this functionality remains accessible in MongoDB, replica sets are the preferred configuration for database replication.

--master
Configures mongod to run as a replication master.


--slave
Configures mongod to run as a replication slave.

--source <host>:<port>
For use with the --slave (page 490) option, the --source option designates the server that this instance will replicate.

--only <arg>
For use with the --slave (page 490) option, the --only option specifies only a single database to replicate.

--slavedelay <value>
For use with the --slave (page 490) option, the --slavedelay option configures a delay, in seconds, for this slave to wait to apply operations from the master node.

--autoresync
For use with the --slave (page 490) option, the --autoresync (page 491) option allows this slave to automatically resync if the local data is more than 10 seconds behind the master. This option may be problematic if the oplog is too small (controlled by the --oplogSize (page 490) option.) If the oplog is not large enough to store the difference in changes between the master's current state and the state of the slave, this node will forcibly resync itself unnecessarily. When you set the --autoresync (page 491) option, the slave will not attempt an automatic resync more than once in a ten minute period.

Sharding Cluster Options

--configsvr
Declares that this mongod instance serves as the config database of a shard cluster. The default port for mongod with this option is 27019 and mongod writes all data files to the /configdb sub-directory of the --dbpath (page 487) directory.

--shardsvr
Configures this mongod instance as a shard in a partitioned cluster. The default port for these instances is 27018.

--noMoveParanoia
Disables a "paranoid mode" for data writes for the moveChunk command.

31.1.3 Usage
In common usage, the invocation of mongod will resemble the following in the context of an initialization or control script:
mongod --config /etc/mongodb.conf

See the Configuration File Options (page 494) for more information on how to configure mongod using the configuration file.

31.2 mongos Manual


31.2.1 Synopsis
mongos, for "MongoDB Shard," is a routing service for MongoDB shard configurations that processes queries from the application layer and determines the location of this data in the shard cluster in order to complete these operations. From the perspective of the application, a mongos instance behaves identically to any other MongoDB instance.

See Also: The Sharding wiki page, for more information regarding MongoDB's sharding functionality.

Note: Changed in version 2.1. Some aggregation operations using the aggregate command will cause mongos instances to require more CPU resources than in previous versions. This modified performance profile may dictate alternate architecture decisions if you use the aggregation framework extensively in a sharded environment.

31.2.2 Options
mongos

--help, -h
Returns a basic help and usage text.

--version
Returns the version of the mongos daemon.

--config <filename>, -f <filename>
Specifies a configuration file that you can use to specify runtime configurations. While the options are equivalent and accessible via the other command line arguments, the configuration file is the preferred method for runtime configuration of mongos. See the Configuration File Options (page 494) document for more information about these options.

Not all configuration options for mongod make sense in the context of mongos.

--verbose, -v
Increases the amount of internal reporting returned on standard output or in the log file specified by --logpath (page 492). Use the -v form to control the level of verbosity by including the option multiple times (e.g. -vvvvv).

--quiet
Runs the mongos instance in a quiet mode that attempts to limit the amount of output.

--port <port>
Specifies a TCP port for the mongos to listen for client connections. By default mongos listens for connections on port 27017. UNIX-like systems require root access to use ports with numbers lower than 1024.

--bind_ip <ip address>
The IP address that the mongos process will bind to and listen for connections. By default mongos listens for connections on the localhost (i.e. 127.0.0.1 address.) You may attach mongos to any interface; however, if you attach mongos to a publicly accessible interface you must implement proper authentication or firewall restrictions to protect the integrity of your database.

--maxConns <number>
Specifies the maximum number of simultaneous connections that mongos will accept. This setting will have no effect if the value of this setting is higher than your operating system's configured maximum connection tracking threshold.

This is particularly useful for mongos if you have a client that creates a number of connections but allows them to time out rather than closing them. When you set maxConns (page 495), ensure the value is slightly higher than the size of the connection pool or the total number of connections to prevent erroneous connection spikes from propagating to the members of a shard cluster.

--objcheck
Forces the mongos to validate all requests from clients upon receipt to ensure that invalid objects are never inserted into the database. This option can produce a significant performance impact, and is not enabled by default.


--logpath <path>
Specify a path for the log file that will hold all diagnostic logging information. Unless specified, mongos will output all log information to the standard output. Additionally, unless you also specify --logappend (page 493), the logfile will be overwritten when the process restarts.

--logappend
Specify to ensure that mongos appends additional logging data to the end of the logfile rather than overwriting the content of the log when the process restarts.

--pidfilepath <path>
Specify a file location to hold the PID or process ID of the mongos process. Useful for tracking the mongos process in combination with the mongos --fork (page 493) option. Without this option, mongos will create no PID file.

--keyFile <file>
Specify the path to a key file to store authentication information. This option is only useful for the connection between mongos instances and components of the shard cluster.

See Also: Replica Set Security (page 44) and Replica Set Administration (page 38).

--nounixsocket
Disables listening on the UNIX socket. Without this option mongos creates a UNIX socket.

--unixSocketPrefix <path>
Specifies a path for the UNIX socket. Unless specified, mongos creates a socket in the /tmp path.

--fork
Enables a daemon mode for mongos which forces the process to the background. This is the normal mode of operation in production and production-like environments, but may not be desirable for testing.

--configdb <config1>,<config2><:port>,<config3>
Set this option to specify a configuration database (i.e. config database) for the shard cluster. You must specify either 1 configuration server or 3 configuration servers, in a comma separated list.

Note: Each mongos reads from the first config server in the list provided. If your configuration databases reside in more than one data center, you should specify the closest config servers as the first servers in the list.

--test
This option is for internal testing use only, and runs unit tests without starting a mongos instance.

--upgrade
This option updates the metadata format used by the config database.

--chunkSize <value>
The value of the --chunkSize (page 493) determines the size of each chunk of data distributed around the shard cluster. The default value is 64 megabytes, which is the ideal size for chunks in most deployments: larger chunk size can lead to uneven data distribution, smaller chunk size often leads to inefficient movement of chunks between nodes. However, in some circumstances it may be necessary to set a different chunk size.

This option only sets the chunk size when initializing the cluster for the first time. If you modify the run-time option later, the new value will have no effect. See the Modifying Chunk Size (page 106) procedure if you need to change the chunk size on an existing shard cluster.

--ipv6
Enables IPv6 support to allow clients to connect to mongos using IPv6 networks. MongoDB disables IPv6 support by default in mongod and all utilities.
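The --configdb rule described above (a comma separated list of either 1 or 3 config servers, with the first entry being the one mongos reads from) could be checked before launching mongos with a small helper such as the following. This is only a sketch: the function name and the host names are hypothetical, and mongos performs its own validation.

```javascript
// Sketch: check a --configdb value against the "1 or 3 servers" rule.
// The host names in the usage example are hypothetical.
function parseConfigdb(value) {
  var servers = value
    .split(",")
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; });
  if (servers.length !== 1 && servers.length !== 3) {
    throw new Error("--configdb expects 1 or 3 config servers, got " + servers.length);
  }
  return servers;
}

var servers = parseConfigdb(
  "cfg1.example.net:27019,cfg2.example.net:27019,cfg3.example.net:27019"
);
// servers[0] is the config server that mongos reads from,
// so list the closest server first.
```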


--jsonp
Permits JSONP access via an HTTP interface. Consider the security implications of allowing this activity before enabling this option.

--noscripting
Disables the scripting engine.

--nohttpinterface
New in version 2.1.2.

Disables the HTTP interface.

--localThreshold
New in version 2.2.

--localThreshold (page 494) affects the logic that mongos uses when selecting replica set members to pass read operations to from clients. Specify a value to --localThreshold (page 494) in milliseconds. The default value is 15, which corresponds to the default value in all of the client drivers (page 225).

When mongos receives a request that permits reads to secondary members, the mongos will:
1. Find the nearest suitable member of the set, in terms of ping time.
2. Construct a list of replica set members that are within a ping time of 15 milliseconds of the nearest suitable member of the set.

If you specify a value for --localThreshold (page 494), mongos will construct the list of replica members that are within the latency allowed by this value.

The mongos will select a member to read from at random from this list. See the Member Selection (page 55) section of the read preference (page 51) documentation for more information.
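The selection steps described for --localThreshold can be sketched as follows. This is an illustration of the described logic, not the mongos source code: the function names, the { host, pingMs } member shape, and the host names are hypothetical, and treating the threshold as inclusive is an assumption.

```javascript
// Sketch of the member selection steps; not the mongos implementation.
// `members` is a hypothetical array of { host, pingMs } objects describing
// members that are otherwise suitable for the read.
function candidatesWithinThreshold(members, localThresholdMs) {
  if (members.length === 0) {
    return [];
  }
  // Step 1: find the ping time of the nearest suitable member.
  var nearest = Math.min.apply(null, members.map(function (m) { return m.pingMs; }));
  // Step 2: keep members within the threshold of the nearest member
  // (inclusive bound is an assumption).
  return members.filter(function (m) {
    return m.pingMs <= nearest + localThresholdMs;
  });
}

// Pick one candidate at random. `random` is injectable to allow testing.
function selectMember(members, localThresholdMs, random) {
  var candidates = candidatesWithinThreshold(members, localThresholdMs);
  if (candidates.length === 0) {
    return null;
  }
  var pick = Math.floor((random || Math.random)() * candidates.length);
  return candidates[pick];
}

var members = [
  { host: "a.example.net", pingMs: 5 },
  { host: "b.example.net", pingMs: 18 },
  { host: "c.example.net", pingMs: 40 }
];
// With the default threshold of 15 ms, a and b qualify (18 <= 5 + 15); c does not.
var eligible = candidatesWithinThreshold(members, 15);
```

A smaller threshold narrows reads toward the nearest member, while a larger one spreads reads across more of the set.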

31.3 Configuration File Options


31.3.1 Synopsis
Administrators and users can control mongod or mongos instances at runtime either directly from mongod's command line arguments (page 485) or using a configuration file. While both methods are functionally equivalent and all settings are similar, the configuration file method is preferable. If you installed from a package and have started MongoDB using your system's control script, you're already using a configuration file.

To start mongod or mongos using a config file, use one of the following forms:
mongod --config /etc/mongodb.conf
mongod -f /etc/mongodb.conf
mongos --config /srv/mongodb/mongos.conf
mongos -f /srv/mongodb/mongos.conf

Declare all settings in this le using the following form:


<setting> = <value>
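To make the <setting> = <value> form concrete, the following sketch reads such lines into a plain object. It assumes that blank lines and lines beginning with # are ignored and keeps every value as a string; mongod's own parser may differ in details. The function name, the sample file contents, and the log path are hypothetical, while the setting names are ones documented in this section.

```javascript
// Sketch: read "<setting> = <value>" lines into a plain object.
// Assumptions: blank lines and "#" comment lines are skipped, and
// values are kept as strings.
function parseConfig(text) {
  var settings = {};
  text.split("\n").forEach(function (line) {
    var trimmed = line.trim();
    if (trimmed === "" || trimmed.charAt(0) === "#") {
      return;
    }
    var eq = trimmed.indexOf("=");
    if (eq === -1) {
      return; // not in "<setting> = <value>" form
    }
    var name = trimmed.slice(0, eq).trim();
    var value = trimmed.slice(eq + 1).trim();
    settings[name] = value;
  });
  return settings;
}

// Hypothetical file contents using settings described in this section.
var example = [
  "# hypothetical mongodb.conf",
  "dbpath = /srv/mongodb",
  "logpath = /var/log/mongodb/mongod.log",
  "logappend = true",
  "port = 27017",
  "fork = true"
].join("\n");

var parsed = parseConfig(example);
```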

New in version 2.0: Before version 2.0, Boolean (i.e. true|false) or "flag" parameters register as true if they appear in the configuration file, regardless of their value.


31.3.2 Settings
verbose
Default: false
Increases the amount of internal reporting returned on standard output or in the log file generated by logpath (page 496).

v
Default: false
Alternate form of verbose (page 495).

vv
Default: false
Additional increase in verbosity of output and logging.

vvv
Default: false
Additional increase in verbosity of output and logging.

vvvv
Default: false
Additional increase in verbosity of output and logging.

vvvvv
Default: false
Additional increase in verbosity of output and logging.

quiet
Default: false
Runs the mongod or mongos instance in a quiet mode that attempts to limit the amount of output.

port
Default: 27017
Specifies a TCP port for the mongod or mongos instance to listen for client connections. UNIX-like systems require root access for ports with numbers lower than 1024.

bind_ip
Default: All interfaces.
Set this option to configure the mongod or mongos process to bind to and listen for connections from applications on this address. You may attach mongod or mongos instances to any interface; however, if you attach the process to a publicly accessible interface, implement proper authentication or firewall restrictions to protect the integrity of your database.

You may set this value multiple times to bind mongod to multiple IP addresses.

maxConns
Default: depends on system (i.e. ulimit and file descriptor) limits. Unless set, MongoDB will not limit its own connections.
Specifies a value to set the maximum number of simultaneous connections that mongod or mongos will accept. This setting has no effect if it is higher than your operating system's configured maximum connection tracking threshold.

This is particularly useful for mongos if you have a client that creates a number of connections but allows them to time out rather than closing them. When you set maxConns (page 495), ensure the value is slightly higher than the size of the connection pool or the total number of connections to prevent erroneous connection spikes from propagating to the members of a shard cluster.

objcheck
Default: false
Set to true to force mongod to validate all requests from clients upon receipt to ensure that invalid BSON objects are never inserted into the database. mongod does not enable this by default because of the required overhead.

logpath
Default: None. (i.e. /dev/stdout)
Specify the path to a file name for the log file that will hold all diagnostic logging information. Unless specified, mongod will output all log information to the standard output. Unless logappend (page 496) is true, the logfile will be overwritten when the process restarts.

Note: Currently, MongoDB will overwrite the contents of the log le if the logappend (page 496) is not used. This behavior may change in the future depending on the outcome of SERVER-4499. logappend Default: false Set to true to add new entries to the end of the logle rather than overwriting the content of the log when the process restarts. If this setting is not specied, then MongoDB will overwrite the existing logle upon start up. Note: The behavior of the logging system may change in the near future in response to the SERVER-4499 case. syslog Sends all logging output to the hosts syslog system rather than to standard output or a log le as with logpath (page 496). Warning: You cannot use syslog (page 496) with logpath (page 496). pidfilepath Default: None. Specify a le location to hold the PID or process ID of the mongod process. Useful for tracking the mongod process in combination with the fork (page 497) setting. Without this option, mongod creates no PID le. keyFile Default: None. Specify the path to a key le to store authentication information. This option is only useful for the connection between replica set members. See Also: Replica Set Security (page 44) and Replica Set Administration (page 38). nounixsocket Default: false


Set to true to disable listening on the UNIX socket. Unless set to true, mongod and mongos provide a UNIX socket.

unixSocketPrefix
Default: /tmp
Specifies a path for the UNIX socket. Unless this option has a value, mongod and mongos create a socket with /tmp as a prefix.

fork
Default: false
Set to true to enable a daemon mode for mongod that runs the process in the background.

auth
Default: false
Set to true to enable database authentication for users connecting from remote hosts. Configure users via the mongo shell (page 503). If no users exist, the localhost interface will continue to have access to the database until you create the first user.

cpu
Default: false
Set to true to force mongod to report, every four seconds, CPU utilization and the amount of time that the processor waits for I/O operations to complete (i.e. I/O wait.) MongoDB writes this data to standard output, or the logfile if using the logpath (page 496) option.

dbpath
Default: /data/db/
Set this value to designate a directory for the mongod instance to store its data. Typical locations include: /srv/mongodb, /var/lib/mongodb or /opt/mongodb.
Unless specified, mongod will look for data files in the default /data/db directory. (Windows systems use the \data\db directory.) If you installed using a package management system, check the /etc/mongodb.conf file provided by your packages to see the configuration of the dbpath (page 497).

diaglog
Default: 0
Creates a very verbose, diagnostic log for troubleshooting and recording various errors. MongoDB writes these log files in the dbpath (page 497) in a series of files that begin with the string diaglog. The value of this setting configures the level of verbosity. Possible values and their impact are as follows.

Value   Setting
0       Off. No logging.
1       Log write operations.
2       Log read operations.
3       Log both read and write operations.
7       Log write and some read operations.

You can use the mongosniff tool to replay this output for investigation. Given a typical diaglog file, located at /data/db/diaglog.4f76a58c, you might use a command in the following form to read these files:
mongosniff --source DIAGLOG /data/db/diaglog.4f76a58c

diaglog (page 497) is for internal use and not intended for most users.

directoryperdb
Default: false


Set to true to modify the storage pattern of the data directory to store each database's files in a distinct folder. This option will create directories within the dbpath (page 497) named for each database. Use this option in conjunction with your file system and device configuration so that MongoDB will store data on a number of distinct disk devices to increase write throughput or disk capacity.

journal
Default: (on 64-bit systems) true
Default: (on 32-bit systems) false
Set to true to enable operation journaling to ensure write durability and data consistency. Set to false to prevent the overhead of journaling in situations where durability is not required. To reduce the impact of the journaling on disk usage, you can leave journal (page 498) enabled, and set smallfiles (page 500) to true to reduce the size of the data and journal files.

journalCommitInterval
Default: 100
Set this value to specify the maximum amount of time for mongod to allow between journal operations. The default value is 100 milliseconds. Lower values increase the durability of the journal, at the possible expense of disk performance. This option accepts values between 2 and 300 milliseconds.

ipv6
Default: false
Set to true to enable IPv6 support to allow clients to connect to mongod using IPv6 networks. MongoDB disables IPv6 support by default in mongod and all utilities.

jsonp
Default: false
Set to true to permit JSONP access via an HTTP interface. Consider the security implications of allowing this activity before setting this option.

noauth
Default: true
Disable authentication. Currently the default. Exists for future compatibility and clarity. For consistency use the auth (page 497) option.

nohttpinterface
Default: false
Set to true to disable the HTTP interface. This option will override the rest (page 499) setting and disable the HTTP interface if you specify both.
Changed in version 2.1.2: The nohttpinterface (page 498) option is not available for mongos instances before 2.1.2.

nojournal
Default: (on 64-bit systems) false
Default: (on 32-bit systems) true
Set nojournal = true to disable durability journaling. By default, mongod enables journaling in 64-bit versions after v2.0.

noprealloc
Default: false
Set noprealloc = true to disable the preallocation of data files. This will shorten the start up time in some cases, but can cause significant performance penalties during normal operations.


noscripting
Default: false
Set noscripting = true to disable the scripting engine.

notablescan
Default: false
Set notablescan = true to forbid operations that require a table scan.

nssize
Default: 16
Specify this value in megabytes. Use this setting to control the default size for all newly created namespace files (i.e. .ns). This option has no impact on the size of existing namespace files. The default value is 16 megabytes, which provides for effectively 12,000 possible namespaces. The maximum size is 2 gigabytes.

profile
Default: 0
Modify this value to change the level of database profiling, which inserts information about operation performance into the output of mongod or the log file if specified by logpath (page 496). The following levels are available:

Level   Setting
0       Off. No profiling.
1       On. Only includes slow operations.
2       On. Includes all operations.

By default, mongod disables profiling. Database profiling can impact database performance because the profiler must record and process all database operations. Enable this option only after careful consideration.

quota
Default: false
Set to true to enable a maximum limit for the number of data files each database can have. The default quota is 8 data files when quota is true. Adjust the quota size with the quotaFiles (page 499) setting.

quotaFiles
Default: 8
Modify the limit on the number of data files per database. This option requires the quota (page 499) setting.

rest
Default: false
Set to true to enable a simple REST interface.

repair
Default: false
Set to true to run a repair routine on all databases following start up. In general you should set this option on the command line and not in the configuration file (page 139) or in a control script. Use the mongod --repair (page 488) option to access this functionality.
Note: Because mongod rewrites all of the database files during the repair routine, if you do not run repair (page 499) under the same user account as mongod usually runs, you will need to run chown on your database files to correct the permissions before starting mongod again.
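As a rough illustration of how the logging, profiling, and quota settings described above might sit together in one configuration file, the following sketch writes a sample file to a temporary location and prints it. The log path and every value are assumptions chosen for the example, not recommendations; the `name = value` form follows the style this section uses (for example `nojournal = true`).

```shell
# Write a sample configuration file to a temporary location.
# The path and all values below are illustrative assumptions.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
# Logging: append to the log file instead of overwriting it on restart
logpath = /var/log/mongodb/mongod.log
logappend = true

# Profiling: level 1 records only operations slower than slowms
profile = 1
slowms = 200

# Quota: limit each database to four data files
quota = true
quotaFiles = 4
EOF

# Show the resulting file.
cat "$conf"
```

A file like this would typically be passed to mongod with the --config option once the values have been adapted to the deployment.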


repairpath
Default: dbpath (page 497)
Specify the path to the directory containing MongoDB data files, to use in conjunction with the repair (page 499) setting or mongod --repair (page 488) operation. Defaults to the value specified by dbpath (page 497).

slowms
Default: 100
Specify values in milliseconds. Sets the threshold for mongod to consider a query slow for the database profiler. The database logs all slow queries to the log, even when the profiler is not turned on. When the database profiler is on, the profiler writes to the system.profile collection.
See Also: profile (page 499)

smallfiles
Default: false
Set to true to modify MongoDB to use a smaller default data file size. Specifically, smallfiles (page 500) reduces the initial size for data files and limits them to 512 megabytes. The smallfiles (page 500) setting also reduces the size of each journal file from 1 gigabyte to 128 megabytes. Use the smallfiles (page 500) setting if you have a large number of databases that each hold a small quantity of data. The smallfiles (page 500) setting can lead mongod to create many files, which may affect performance for larger databases.

syncdelay
Default: 60
This setting controls the maximum number of seconds between flushes of pending writes to disk. While mongod is always writing data to disk, this setting controls the maximum guaranteed interval between a successful write operation and the next time the database flushes data to disk. In many cases, the actual interval between write operations and disk flushes is much shorter than the value. If set to 0, mongod flushes all operations to disk immediately, which may have a significant performance impact. If journal (page 498) is true, all writes will be durable, by way of the journal, within the time specified by journalCommitInterval (page 498).

sysinfo
Default: false
When set to true, mongod returns diagnostic system information regarding the page size, the number of physical pages, and the number of available physical pages to standard output. More typically, run this operation by way of the mongod --sysinfo (page 489) command. When running with sysinfo (page 500), mongod only outputs the page information and no database process will start.

upgrade
Default: false
When set to true this option upgrades the on-disk data format of the files specified by the dbpath (page 497) to the latest version, if needed. This option only affects the operation of mongod if the data files are in an old format. When specified for a mongos instance, this option updates the meta data format used by the config database.


Note: In most cases you should not set this value, so you can exercise the most control over your upgrade process. See the MongoDB release notes (on the download page) for more information about the upgrade process.

traceExceptions
Default: false
For internal diagnostic use only.

Replication Options

replSet
Default: <none>
Form: <setname>
Use this setting to configure replication with replica sets. Specify a replica set name as an argument to this setting. All hosts must have the same set name.
See Also: Replication (page 31), Replica Set Administration (page 38), and Replica Set Configuration (page 561)

oplogSize
Specifies a maximum size in megabytes for the replication operation log (i.e. the oplog.) mongod creates an oplog based on the maximum amount of space available. For 64-bit systems, the oplog is typically 5% of available disk space. Once the mongod has created the oplog for the first time, changing oplogSize (page 501) will not affect the size of the oplog.

fastsync
Default: false
In the context of replica set replication, set this option to true if you have seeded this replica with a snapshot of the dbpath of another member of the set. Otherwise the mongod will attempt to perform a full sync.
Warning: If the data is not perfectly synchronized and mongod starts with fastsync (page 501), then the secondary or slave will be permanently out of sync with the primary, which may cause significant consistency problems.

replIndexPrefetch
New in version 2.2.
Default: all
Values: all, none, and _id_only
You must use replIndexPrefetch (page 501) in conjunction with replSet (page 501). By default secondary members of a replica set will load all indexes related to an operation into memory before applying operations from the oplog. You can modify this behavior so that the secondaries will only load the _id index. Specify _id_only or none to prevent the mongod from loading any index into memory.

Master/Slave Replication

master
Default: false
Set to true to configure the current instance to act as master instance in a replication configuration.


slave
Default: false
Set to true to configure the current instance to act as slave instance in a replication configuration.

source
Default: <>
Form: <host>:<port>
Used with the slave (page 501) setting to specify the master instance from which this slave instance will replicate.

only
Default: <>
Used with the slave (page 501) option, the only setting specifies only a single database to replicate.

slavedelay
Default: 0
Used with the slave (page 501) setting, the slavedelay setting configures a delay in seconds, for this slave to wait to apply operations from the master instance.

autoresync
Default: false
Used with the slave (page 501) setting, set autoresync to true to force the slave to automatically resync if the slave is more than 10 seconds behind the master. This setting may be problematic if the oplog is too small (controlled by the --oplogSize (page 490) option.) If the oplog is not large enough to store the difference in changes between the master's current state and the state of the slave, this instance will forcibly resync itself unnecessarily. When you set the autoresync (page 502) option, the slave will not attempt an automatic resync more than once in a ten minute period.

Sharding Cluster Options

configsvr
Default: false
Set this value to true to configure this mongod instance to operate as the config database of a shard cluster. The default port for mongod with this option is 27019 and mongod writes all data files to the /configdb sub-directory of the dbpath (page 497) directory.

shardsvr
Default: false
Set this value to true to configure this mongod instance as a shard in a partitioned cluster. The default port for these instances is 27018.

noMoveParanoia
Default: false
Disables a paranoid mode for data writes for the moveChunk command when set to true.

configdb
Default: None.
Format: <config1>,<config2><:port>,<config3>
Set this option to specify a configuration database (i.e. config database) for the shard cluster.
You must specify either 1 configuration server or 3 configuration servers, in a comma separated list. This setting only affects mongos processes.
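The "1 or 3 servers, comma separated" rule for configdb can be checked before a value is placed in a mongos configuration file. The sketch below is only an illustration: the host names are hypothetical, and the check counts entries rather than validating that each host is reachable.

```shell
# Hypothetical config server addresses; replace with your own.
configdb="cfg0.example.net:27019,cfg1.example.net:27019,cfg2.example.net:27019"

# Count the comma-separated entries in the list.
count=$(printf '%s\n' "$configdb" | tr ',' '\n' | wc -l | tr -d ' ')

# The text above requires exactly 1 or 3 configuration servers.
case "$count" in
  1|3) valid=yes ;;
  *)   valid=no ;;
esac

if [ "$valid" = yes ]; then
  # Print the line as it would appear in a mongos configuration file.
  echo "configdb = $configdb"
else
  echo "expected 1 or 3 config servers, got $count" >&2
fi
```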


test
Default: false
Only runs unit tests and does not start a mongos instance. This setting only affects mongos processes and is for internal testing use only.

chunkSize
Default: 64
The value of this option determines the size of each chunk of data distributed around the shard cluster. The default value is 64 megabytes. Larger chunks may lead to an uneven distribution of data, while smaller chunks may lead to frequent and unnecessary migrations. However, in some circumstances it may be necessary to set a different chunk size. This setting only affects mongos processes. Furthermore, chunkSize (page 503) only sets the chunk size when initializing the cluster for the first time. If you modify the run-time option later, the new value will have no effect. See the Modifying Chunk Size (page 106) procedure if you need to change the chunk size on an existing shard cluster.

localThreshold
New in version 2.2.
localThreshold (page 503) affects the logic that mongos uses when selecting the replica set members to which it passes read operations from clients. Specify a value to localThreshold (page 503) in milliseconds. The default value is 15, which corresponds to the default value in all of the client drivers (page 225). This setting only affects mongos processes.
When mongos receives a request that permits reads to secondary members, the mongos will:
1. Find the nearest suitable member of the set, in terms of ping time.
2. Construct a list of replica set members that are within a ping time of 15 milliseconds of the nearest suitable member of the set. If you specify a value for localThreshold (page 503), mongos will construct the list of replica members that are within the latency allowed by this value.
3. Select a member to read from at random from this list.
See the Member Selection (page 55) section of the read preference (page 51) documentation for more information.
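The member selection steps described for localThreshold can be sketched outside of mongos to make the arithmetic concrete. This is not how mongos is implemented; it is a minimal model under stated assumptions: the host names and ping times are invented, and "within" the threshold is treated as less than or equal to it.

```shell
# Hypothetical members and ping times in milliseconds, one "host ping" pair per line.
members="mongo0.example.net 12
mongo1.example.net 20
mongo2.example.net 45"

# The default localThreshold value described above, in milliseconds.
threshold=15

# Find the lowest ping time among the members.
nearest=$(printf '%s\n' "$members" | awk 'NR == 1 || $2 < min { min = $2 } END { print min }')

# Keep only members whose ping time is within the threshold of the nearest member.
eligible=$(printf '%s\n' "$members" | awk -v n="$nearest" -v t="$threshold" '$2 - n <= t { print $1 }')

# mongos would pick one of these at random; here the list is simply printed.
printf '%s\n' "$eligible"
```

With these sample values the member at 45 milliseconds is excluded because it is 33 milliseconds behind the nearest member, which exceeds the 15 millisecond threshold.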

31.4 mongo Manual


31.4.1 Synopsis
mongo is an interactive JavaScript shell interface to MongoDB. mongo provides a powerful administrative interface for systems administrators as well as a way to test queries and operations directly with the database. To increase the flexibility of mongo, the shell provides a fully functional JavaScript environment. This manual page addresses the basic invocation of the mongo shell and provides an overview of its usage.

31.4.2 Options
mongo


--shell
If you invoke mongo and specify a JavaScript file as an argument, or use mongo --eval (page 504), the --shell (page 503) option provides the user with a shell prompt after the file finishes executing.

--nodb
Use this option to prevent the shell from connecting to any database instance.

--norc
By default mongo runs the ~/.mongorc.js file when it starts. Use this option to prevent the shell from sourcing this file on start up.

--quiet
Silences output from the shell during the connection process.

--port <PORT>
Specify the port where the mongod or mongos instance is listening. Unless specified, mongo connects to mongod instances on port 27017, which is the default mongod port.

--host <HOSTNAME>
Specify the host where the mongod or mongos is running to connect to as <HOSTNAME>. By default mongo will attempt to connect to a MongoDB process running on the localhost.

--eval <JAVASCRIPT>
Evaluates a JavaScript expression specified as an argument to this option. mongo does not load its own environment when evaluating code: as a result many conveniences of the shell environment are not available.

--username <USERNAME>, -u <USERNAME>
Specify a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the mongo --password (page 504) option to supply a password.

--password <password>, -p <password>
Specify a password to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the mongo --username (page 504) option to supply a username. If you specify a --username (page 504) without the --password (page 504) option, mongo will prompt for a password interactively.

--help, -h
Returns a basic help and usage text.

--version
Returns the version of the shell.

--verbose
Increases the verbosity of the output of the shell during the connection process.

--ipv6
Enables IPv6 support that allows mongo to connect to the MongoDB instance using an IPv6 network.
All MongoDB programs and processes, including mongo, disable IPv6 support by default.

<db address>
Specify the database address of the database to connect to. For example:
mongo admin

The above command will connect the mongo shell to the administrative database on the local machine. You may specify a remote database instance, with the resolvable hostname or IP address. Separate the database name from the hostname using a / character. See the following examples:
mongo mongodb1.example.net
mongo mongodb1/admin
mongo 10.8.8.10/test


<file.js>
Optionally, specify a JavaScript file as the final argument to the shell. The shell will run the file and then exit. Use the mongo --shell (page 503) option to return to a shell after the file finishes running. This should be the last argument.

31.4.3 Usage
Typically users invoke the shell with the mongo command at the system prompt. Consider the following examples for other scenarios. To connect to a database on a remote host using authentication and a non-standard port, use the following form:
mongo --username <user> --password <pass> --host <host> --port 28015

Alternatively, consider the following short form:


mongo -u <user> -p <pass> --host <host> --port 28015

Replace <user>, <pass>, and <host> with the appropriate values for your situation and substitute or omit the --port (page 504) option as needed.

To execute a JavaScript file without evaluating the ~/.mongorc.js file before starting a shell session, use the following form:
mongo --shell --norc alternate-environment.js

To return the results of a query as JSON from the system prompt using the --eval (page 504) option, use the following form:
mongo --eval 'db.collection.find().forEach(printjson)'

Note the use of single quotes (i.e. ') to enclose the JavaScript, as well as the additional JavaScript required to generate this output.
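The reason for the single quotes is that the system shell would otherwise try to interpret the parentheses in the JavaScript itself. The following sketch stores the JavaScript in a variable and prints the resulting command rather than running it, so it works without a MongoDB instance. The database name test and the collection name collection are placeholders.

```shell
# The JavaScript to evaluate. Single quotes stop the system shell from
# interpreting the parentheses. Database and collection names are placeholders.
js='db.collection.find().forEach(printjson)'

# Print the command instead of running it, so no server is required.
printf 'mongo test --eval %s\n' "'$js'"
```

To run the command for real, pass the variable in double quotes instead: mongo test --eval "$js".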

31.5 mongodump Manual


31.5.1 Synopsis
mongodump is a utility for creating a binary export of the contents of a database. Consider using this utility as part of an effective backup strategy (page 156). Use in conjunction with mongorestore to provide restore functionality.

Note: If you use the mongodump tool from the 2.2 distribution to create a dump of a database, you can restore that dump only to a 2.2 database.

See Also: mongorestore and Backup and Restoration Strategies (page 156).

31.5.2 Options
mongodump

--help
Returns a basic help and usage text.


--verbose, -v
Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times, (e.g. -vvvvv.)

--version
Returns the version of the mongodump utility and exits.

--host <hostname><:port>
Specifies a resolvable hostname for the mongod that you wish to use to create the database dump. By default mongodump will attempt to connect to a MongoDB process running on the localhost port number 27017. Optionally, specify a port number to connect to a MongoDB instance running on a port other than 27017.
To connect to a replica set, use the --host (page 506) argument with a setname, followed by a slash and a comma separated list of host names and port numbers. The mongodump utility will, given the seed of at least one connected set member, connect to the primary member of that set. This option would resemble:

mongodump --host repl0/mongo0.example.net,mongo0.example.net:27018,mongo1.example.net,mongo2.exa

You can always connect directly to a single MongoDB instance by specifying the host and port number directly.

--port <port>
Specifies the port number, if the MongoDB instance is not running on the standard port. (i.e. 27017) You may also specify a port number using the --host (page 506) option.

--ipv6
Enables IPv6 support that allows mongodump to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongodump, disable IPv6 support by default.

--username <username>, -u <username>
Specifies a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the --password (page 506) option to supply a password.

--password <password>
Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the --username (page 506) option to supply a username. If you specify a --username (page 506) without the --password (page 506) option, mongodump will prompt for a password interactively.

--dbpath <path>
Specifies the directory of the MongoDB data files. If used, the --dbpath (page 506) option enables mongodump to attach directly to local data files and copy the data without the mongod. To run with --dbpath (page 506), mongodump needs to restrict access to the data directory: as a result, no mongod can access the same path while the process runs.

--directoryperdb
Use the --directoryperdb (page 506) in conjunction with the corresponding option to mongod. This option allows mongodump to read data files organized with each database located in a distinct directory. This option is only relevant when specifying the --dbpath (page 506) option.

--journal
Allows mongodump operations to use the durability journal to ensure that the export is in a consistent state. This option is only relevant when specifying the --dbpath (page 506) option.

--db <db>, -d <db>
Use the --db (page 506) option to specify a database for mongodump to back up.
If you do not specify a DB, mongodump copies all databases in this instance into the dump files. Use this option to back up or copy a smaller subset of your data.

--collection <collection>, -c <collection>
Use the --collection (page 506) option to specify a collection for mongodump to back up. If you do not


specify a collection, mongodump copies all collections in the specified database or instance to the dump files. Use this option to back up or copy a smaller subset of your data.

--out <path>, -o <path>
Specifies a path where mongodump will store the output of the database dump. If you want to output the database dump to standard output, specify a - rather than a path.

--query <json>, -q <json>
Provides a query to limit (optionally) the documents included in the output of mongodump.

--oplog
Use this option to ensure that mongodump creates a dump of the database that includes an oplog, to create a point-in-time snapshot of the state of a mongod instance. To restore to a specific point-in-time backup, use the output created with this option in conjunction with mongorestore --oplogReplay (page 510).
Without --oplog (page 507), if there are write operations during the dump operation, the dump will not reflect a single moment in time. Changes made to the database during the dump process can affect the output of the backup.
--oplog (page 507) has no effect when running mongodump against a mongos instance to dump the entire contents of a shard cluster. However, you can use --oplog (page 507) to dump individual shards.
Note: --oplog (page 507) only works against nodes that maintain an oplog. This includes all members of a replica set, as well as master nodes in master/slave replication deployments.

--repair
Use this option to run a repair option in addition to dumping the database. The repair option attempts to repair a database that may be in an inconsistent state as a result of an improper shutdown or mongod crash.

--forceTableScan
Forces mongodump to scan the data store directly: typically, mongodump saves entries as they appear in the index of the _id field. Use --forceTableScan (page 507) to skip the index and scan the data directly.
Typically there are two cases where this behavior is preferable to the default:
1. If you have key sizes over 800 bytes that would not be present in the _id index.
2. Your database uses a custom _id field.
When you run with --forceTableScan (page 507), mongodump does not use $snapshot (page 391). As a result, the dump produced by mongodump can reflect the state of the database at many different points in time.
Warning: Use --forceTableScan (page 507) with extreme caution and consideration.
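The pairing of mongodump --oplog with mongorestore --oplogReplay described above can be sketched as two commands. The host name and dump directory here are hypothetical, and the commands are printed rather than executed so the sketch runs without a MongoDB installation.

```shell
# Hypothetical replica set member and dump directory.
host="mongodb1.example.net"
dump_dir="/opt/backup/oplog-dump"

# Dump the instance and capture oplog entries written during the dump.
dump_cmd="mongodump --host $host --oplog --out $dump_dir"

# Restore the dump, then replay the captured oplog entries.
restore_cmd="mongorestore --host $host --oplogReplay $dump_dir"

# Print both commands for review instead of executing them.
echo "$dump_cmd"
echo "$restore_cmd"
```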

31.5.3 Behavior
When running mongodump against a mongos instance where the shard cluster consists of replica sets, the read preference of the operation will prefer reads from secondary members of the set.

31.5.4 Usage
See the backup guide section on database dumps (page 160) for a larger overview of mongodump usage. Also see the mongorestore Manual (page 508) document for an overview of mongorestore, which provides the related inverse functionality.


The following command creates a dump file that contains only the collection named collection in the database named test. In this case the database is running on the local interface on port 27017:
mongodump --collection collection --db test

In the next example, mongodump creates a backup of the database instance stored in the /srv/mongodb directory on the local machine. This requires that no mongod instance is using the /srv/mongodb directory.
mongodump --dbpath /srv/mongodb

In the final example, mongodump creates a database dump located at /opt/backup/mongodump-2011-10-24, from a database running on port 37017 on the host mongodb1.example.net and authenticating using the username user and the password pass, as follows:

mongodump --host mongodb1.example.net --port 37017 --username user --password pass /opt/backup/mongod
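
A dated dump directory such as the one named above is usually generated rather than typed by hand. The sketch below builds such a path from the current date and passes it with the --out option. The backup root directory is an assumption, and the command is printed rather than executed.

```shell
# Hypothetical root directory for backups.
backup_root="/opt/backup"

# Build a directory name that ends in today's date, e.g. mongodump-YYYY-MM-DD.
stamp="$(date +%Y-%m-%d)"
out_dir="$backup_root/mongodump-$stamp"

# Print the command for review; remove "echo" to run it against a real instance.
echo mongodump --host mongodb1.example.net --port 37017 \
  --username user --password pass --out "$out_dir"
```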

31.6 mongorestore Manual


31.6.1 Synopsis
The mongorestore tool imports content from a binary database dump into a specific database. mongorestore can import content to an existing database or create a new one. Specifically, mongorestore takes the output from mongodump and restores it. mongorestore creates indexes on a restore. mongorestore performs only inserts. If existing data with the same _id already exists on the database, mongorestore will not replace it.

The behavior of mongorestore has the following properties:
1. All operations are inserts, not updates.
2. All inserts are "fire and forget": mongorestore does not wait for a response from a mongod to ensure that the MongoDB process has received or recorded the operation.
The mongod will record any errors that occur during a restore operation to its log, but mongorestore will not receive errors.

Note: If you use the mongodump tool from the 2.2 distribution to create a dump of a database, you can restore that dump only to a 2.2 database.

31.6.2 Options
mongorestore

--help
Returns a basic help and usage text.

--verbose, -v
Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times, (e.g. -vvvvv.)

--version
Returns the version of the mongorestore tool.


--host <hostname><:port>
Specifies a resolvable hostname for the mongod to which you want to restore the database. By default mongorestore will attempt to connect to a MongoDB process running on the localhost port number 27017. Optionally, specify a port number to connect to a MongoDB instance running on a port other than 27017.

--port <port>
Specifies the port number, if the MongoDB instance is not running on the standard port. (i.e. 27017) You may also specify a port number using the --host (page 508) option.

--ipv6
Enables IPv6 support that allows mongorestore to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongorestore, disable IPv6 support by default.

--username <username>, -u <username>
Specifies a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the --password (page 509) option to supply a password.

--password <password>
Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the mongorestore --username (page 509) option to supply a username. If you specify a --username (page 509) without the --password (page 509) option, mongorestore will prompt for a password interactively.

--dbpath <path>
Specifies the directory of the MongoDB data files. If used, the --dbpath (page 509) option enables mongorestore to attach directly to local data files and insert the data without the mongod. To run with --dbpath (page 509), mongorestore needs to lock access to the data directory: as a result, no mongod can access the same path while the process runs.

--directoryperdb
Use the --directoryperdb (page 509) in conjunction with the corresponding option to mongod, which allows mongorestore to import data into MongoDB instances that have every database's files saved in discrete directories on the disk. This option is only relevant when specifying the --dbpath (page 509) option.

--journal
Allows mongorestore to write to the durability journal to ensure that the data files will remain in a consistent state during the write process. This option is only relevant when specifying the --dbpath (page 509) option.

--db <db>, -d <db>
Use the --db (page 509) option to specify a database for mongorestore to restore data into. If the database doesn't exist, mongorestore will create the specified database. If you do not specify a <db>, mongorestore creates new databases that correspond to the databases where data originated and data may be overwritten. Use this option to restore data into a MongoDB instance that already has data.
--db (page 509) does not control which BSON files mongorestore restores. You must use the mongorestore path option (page 510) to limit that restored data.

--collection <collection>, -c <collection>
Use the --collection (page 509) option to specify a collection for mongorestore to restore. If you do not specify a <collection>, mongorestore imports all collections created. Existing data may be overwritten. Use this option to restore data into a MongoDB instance that already has data, or to restore only some data in the specified imported data set.

--objcheck
Forces mongorestore to validate every object before inserting it in the target database.

--filter <JSON>
Limits the documents that mongorestore imports to only those documents that match the JSON document

31.6. mongorestore Manual

509

MongoDB Documentation, Release 2.0.6

specied as <JSON>. Be sure to include the document in single quotes to avoid interaction with your systems shell environment. --drop Modies the restoration procedure to drop every collection from the target database before restoring the collection from the dumped backup. --oplogReplay Replays the oplog after restoring the dump to ensure that the current state of the database reects the point-intime backup captured with the mongodump --oplog (page 507) command. --keepIndexVersion Prevents mongorestore from upgrading the index to the latest version during the restoration process. --w <number of replicas per write> New in version 2.2. Species the write concern (page 50) for each write operation that mongorestore writes to the target database. By default, mongorestore waits for the write operation to return on 1 member of the set (i.e. the primary.) --noOptionsRestore New in version 2.2. Prevents mongorestore from setting the collection options, such as those specied by the collMod database command, on restored collections. --noIndexRestore New in version 2.2. Prevents mongorestore from restoring and building indexes as specied in the corresponding mongodump output. --oplogLimit <timestamp> New in version 2.2. Prevents mongorestore from applying oplog entries newer than the <timestamp>. Specify <timestamp> values in the form of <time_t>:<ordinal>, where <time_t> is the seconds since the UNIX epoch, and <ordinal> represents a counter of operations in the oplog that occurred in the specied second. You must use --oplogLimit (page 510) in conjunction with the --oplogReplay (page 510) option. <path> The nal argument of the mongorestore command is a directory path. This argument species the location of the database dump from which to restore.
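The <time_t>:<ordinal> form of the --oplogLimit value can be assembled in the shell before it is passed to mongorestore. The following is a minimal sketch, not part of mongorestore itself: the helper name oplog_limit and the sample values are hypothetical, and the final line only prints the command rather than running it.

```shell
# Hypothetical helper: format an --oplogLimit value as <time_t>:<ordinal>.
oplog_limit() {
  seconds="$1"   # seconds since the UNIX epoch
  ordinal="$2"   # counter of the operation within that second
  printf '%s:%s\n' "$seconds" "$ordinal"
}

# Sample values chosen for illustration only.
limit="$(oplog_limit 1344872740 1)"

# Print (do not run) the resulting command line.
echo "mongorestore --oplogReplay --oplogLimit $limit dump/"
```

Printing the command first makes it easy to check the timestamp before replaying any oplog entries against a real instance.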

31.6.3 Usage
See the backup guide section on database dumps (page 160) for a larger overview of mongorestore usage. Also see the mongodump Manual (page 505) document for an overview of mongodump, which provides the related inverse functionality. Consider the following example:
mongorestore --collection people --db accounts dump/accounts/

Here, mongorestore reads the database dump in the dump/ sub-directory of the current directory, and restores only the documents in the collection named people from the database named accounts. mongorestore restores data to the instance running on the localhost interface on port 27017.

In the next example, mongorestore restores a backup of the database instance located in dump to a database instance stored in /srv/mongodb on the local machine. This requires that there are no active mongod instances attached to the /srv/mongodb data directory.
mongorestore --dbpath /srv/mongodb

In the final example, mongorestore restores a database dump located at /opt/backup/mongodump-2011-10-24 to a database running on port 37017 on the host mongodb1.example.net. mongorestore authenticates to this MongoDB instance using the username user and the password pass, as follows:

mongorestore --host mongodb1.example.net --port 37017 --username user --password pass /opt/backup/mon

31.7 mongoimport Manual


31.7.1 Synopsis
The mongoimport utility provides a route to import content from a JSON, CSV, or TSV export created by mongoexport, or potentially, another third-party export tool. See the Importing and Exporting MongoDB Data (page 153) document for a more in-depth usage overview, and the mongoexport Manual (page 514) document for more information regarding the mongoexport utility, which provides the inverse exporting capability.

Note: Do not use mongoimport and mongoexport for full-scale backups because they may not reliably capture data type information. Use mongodump and mongorestore as described in Backup and Restoration Strategies (page 156) for this kind of functionality.

31.7.2 Options
mongoimport

--help
    Returns a basic help and usage text.

--verbose, -v
    Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times (e.g. -vvvvv).

--version
    Returns the version of the mongoimport utility.

--host <hostname><:port>, -h
    Specifies a resolvable hostname for the mongod to which you want to import data. By default mongoimport attempts to connect to a MongoDB process running on the localhost port numbered 27017. Optionally, specify a port number to connect to a MongoDB instance running on a port other than 27017.

    To connect to a replica set, use the --host (page 511) argument with a set name, followed by a slash and a comma-separated list of host and port names. Given the seed of at least one connected set member, the utility connects to the primary node of that set. This option would resemble:

        --host repl0/mongo0.example.net,mongo0.example.net:27018,mongo1.example.net,mongo2.example.net

    You can always connect directly to a single MongoDB instance by specifying the host and port number directly.

--port <port>
    Specifies the port number, if the MongoDB instance is not running on the standard port (i.e. 27017). You may also specify a port number using the mongoimport --host (page 511) option.

--ipv6
    Enables IPv6 support that allows mongoimport to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongoimport, disable IPv6 support by default.

--username <username>, -u <username>
    Specifies a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the mongoimport --password (page 512) option to supply a password.

--password <password>
    Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the mongoimport --username (page 512) option to supply a username. If you specify a --username (page 512) without the --password (page 512) option, mongoimport will prompt for a password interactively.

--dbpath <path>
    Specifies the directory of the MongoDB data files. If used, the --dbpath (page 512) option enables mongoimport to attach directly to local data files and insert the data without the mongod. To run with --dbpath, mongoimport needs to lock access to the data directory: as a result, no mongod can access the same path while the process runs.

--directoryperdb
    Use the --directoryperdb (page 512) option in conjunction with the corresponding option to mongod. It allows mongoimport to import data into MongoDB instances that have every database's files saved in discrete directories on the disk. This option is only relevant when specifying the --dbpath (page 512) option.

--journal
    Allows mongoimport to write to the durability journal to ensure that the data files will remain in a consistent state during the write process. This option is only relevant when specifying the --dbpath (page 512) option.

--db <db>, -d <db>
    Use the --db (page 512) option to specify a database for mongoimport to import data into. If you do not specify a <db>, mongoimport creates new databases that correspond to the databases where data originated, and data may be overwritten. Use this option to restore data into a MongoDB instance that already has data, or to restore only some data in the specified backup.

--collection <collection>, -c <collection>
    Use the --collection (page 512) option to specify a collection for mongoimport to import into. If you do not specify a <collection>, mongoimport imports all collections created. Existing data may be overwritten. Use this option to restore data into a MongoDB instance that already has data, or to restore only some data in the specified imported data set.

--fields <field1[,field2]>, -f <field1[,field2]>
    Specify a field or a number of fields to import from the specified file. All other fields present in the export will be excluded during importation. Comma separate a list of fields to limit the fields imported.

--fieldFile <filename>
    As an alternative to mongoimport --fields (page 512), the --fieldFile (page 512) option allows you to specify a file (e.g. <file>) that holds the list of field names to include in the import. All other fields will be excluded from the import. Place one field per line.

--ignoreBlanks
    In csv and tsv exports, ignore empty fields. If not specified, mongoimport creates fields without values in imported documents.

--type <json|csv|tsv>
    Declare the type of export format to import. The default format is JSON, but it's possible to import csv and tsv files.

--file <filename>
    Specify the location of a file containing the data to import. mongoimport will read data from standard input (i.e. stdin) if you do not specify a file.

--drop
    Modifies the importation procedure so that the target instance drops every collection before importing the collection from the supplied data.

--headerline
    If using --type csv (page 512) or --type tsv (page 512), use the first line as field names. Otherwise, mongoimport will import the first line as a distinct document.

--upsert
    Modifies the import process to update existing objects in the database if they match an imported object, while inserting all other objects. If you do not specify a field or fields using the --upsertFields (page 513) option, mongoimport will upsert on the basis of the _id field.

--upsertFields <field1[,field2]>
    Specifies a list of fields for the query portion of the upsert. Use this option if the _id fields in the existing documents don't match the field in the document, but another field or field combination can uniquely identify documents as a basis for performing upsert operations. To ensure adequate performance, indexes should exist for this field or fields.

--stopOnError
    New in version 2.2. Forces mongoimport to halt the import operation at the first error rather than continuing the operation despite errors.

--jsonArray
    Changed in version 2.2: The limit on document size increased from 4MB to 16MB. Accepts import of data expressed as multiple MongoDB documents within a single JSON array. Use in conjunction with mongoexport --jsonArray (page 515) to import data written as a single JSON array. Limited to imports of 16 MB or smaller.
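The replica set form of the --host argument described above (a set name, a slash, then a comma-separated list of seed hosts) can be built from separate values in a script. This is a minimal sketch under stated assumptions: the function name replset_host is hypothetical, and the host names are the example names used in this manual.

```shell
# Hypothetical helper: join a replica set name and seed hosts into
# the "setname/host1,host2:port,..." form accepted by --host.
replset_host() {
  set_name="$1"
  shift
  seeds=""
  for member in "$@"; do
    # Append a comma only when seeds already holds an earlier member.
    seeds="${seeds:+$seeds,}$member"
  done
  printf '%s/%s\n' "$set_name" "$seeds"
}

host_arg="$(replset_host repl0 mongo0.example.net mongo0.example.net:27018 mongo1.example.net)"
echo "--host $host_arg"
```

Each seed may carry its own port after a colon; members without a port use the default.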

31.7.3 Usage
In this example, mongoimport imports the csv formatted data in the file /opt/backups/contacts.csv into the collection contacts in the users database on the MongoDB instance running on the localhost port numbered 27017.
mongoimport --db users --collection contacts --type csv --file /opt/backups/contacts.csv
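When the first line of a CSV file holds the field names, the --headerline option avoids importing that line as a document. The sketch below creates a small sample file to show the expected layout; the file contents and the field names name and email are hypothetical, and the import itself only runs if mongoimport is installed and a mongod is listening on the default port.

```shell
# Create a small, hypothetical CSV file whose first line holds the field names.
csv_file="$(mktemp)"
printf '%s\n' 'name,email' 'Ada,ada@example.net' 'Grace,grace@example.net' > "$csv_file"

# The first line is what --headerline treats as field names.
header="$(head -n 1 "$csv_file")"
echo "field names: $header"

# Run the import only when the mongoimport binary is available.
if command -v mongoimport >/dev/null 2>&1; then
  mongoimport --db users --collection contacts --type csv --headerline --file "$csv_file" \
    || echo "mongoimport reported an error" >&2
fi
```

Without --headerline, the line name,email would be imported as a distinct document.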

In the following example, mongoimport imports the data in the JSON formatted file contacts.json into the collection contacts on the MongoDB instance running on the localhost port number 27017. Journaling is explicitly enabled.
mongoimport --collection contacts --file contacts.json --journal

In the next example, mongoimport takes data passed to it on standard input (i.e. with a | pipe) and imports it into the collection contacts in the sales database in the MongoDB data files located at /srv/mongodb/. If the import process encounters an error, mongoimport will halt because of the --stopOnError (page 513) option.
mongoimport --db sales --collection contacts --stopOnError --dbpath /srv/mongodb/

In the final example, mongoimport imports data from the file /opt/backups/mdb1-examplenet.json into the collection contacts within the database marketing on a remote MongoDB database. This mongoimport accesses the mongod instance running on the host mongodb1.example.net over port 37017, which requires the username user and the password pass.

mongoimport --host mongodb1.example.net --port 37017 --username user --password pass --collection con

31.8 mongoexport Manual


31.8.1 Synopsis
mongoexport is a utility that produces a JSON or CSV export of data stored in a MongoDB instance. See the Importing and Exporting MongoDB Data (page 153) document for a more in-depth usage overview, and the mongoimport Manual (page 511) document for more information regarding the mongoimport utility, which provides the inverse importing capability.

Note: Do not use mongoimport and mongoexport for full-scale backups because they may not reliably capture data type information. Use mongodump and mongorestore as described in Backup and Restoration Strategies (page 156) for this kind of functionality.

31.8.2 Options
mongoexport

--help
    Returns a basic help and usage text.

--verbose, -v
    Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times (e.g. -vvvvv).

--version
    Returns the version of the mongoexport utility.

--host <hostname><:port>
    Specifies a resolvable hostname for the mongod from which you want to export data. By default mongoexport attempts to connect to a MongoDB process running on the localhost port number 27017. Optionally, specify a port number to connect to a MongoDB instance running on a port other than 27017.

--port <port>
    Specifies the port number, if the MongoDB instance is not running on the standard port (i.e. 27017). You may also specify a port number using the mongoexport --host (page 514) option.

--ipv6
    Enables IPv6 support that allows mongoexport to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongoexport, disable IPv6 support by default.

--username <username>, -u <username>
    Specifies a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the mongoexport --password (page 514) option to supply a password.

--password <password>
    Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the --username (page 514) option to supply a username. If you specify a --username (page 514) without the --password (page 514) option, mongoexport will prompt for a password interactively.

--dbpath <path>
    Specifies the directory of the MongoDB data files. If used, the --dbpath option enables mongoexport to attach directly to local data files and export the data without the mongod. To run with --dbpath, mongoexport needs to lock access to the data directory: as a result, no mongod can access the same path while the process runs.

--directoryperdb
    Use the --directoryperdb (page 515) option in conjunction with the corresponding option to mongod. It allows mongoexport to export data from MongoDB instances that have every database's files saved in discrete directories on the disk. This option is only relevant when specifying the --dbpath (page 514) option.

--journal
    Allows mongoexport operations to access the durability journal to ensure that the export is in a consistent state. This option is only relevant when specifying the --dbpath (page 514) option.

--db <db>, -d <db>
    Use the --db (page 515) option to specify the name of the database that contains the collection you want to export.

--collection <collection>, -c <collection>
    Use the --collection (page 515) option to specify the collection that you want mongoexport to export.

--fields <field1[,field2]>, -f <field1[,field2]>
    Specify a field or a number of fields to include in the export. All other fields will be excluded from the export. Comma separate a list of fields to limit the fields exported.

--fieldFile <file>
    As an alternative to --fields (page 515), the --fieldFile (page 515) option allows you to specify a file (e.g. <file>) that holds the list of field names to include in the export. All other fields will be excluded from the export. Place one field per line.

--query <JSON>
    Provides a JSON document as a query that optionally limits the documents returned in the export.

--csv
    Changes the export format to a comma separated values (CSV) format. By default mongoexport writes data using one JSON document for every MongoDB document.

--jsonArray
    Modifies the output of mongoexport to write the entire contents of the export as a single JSON array. By default mongoexport writes data using one JSON document for every MongoDB document.

--slaveOk, -k
    Allows mongoexport to read data from secondary or slave nodes when using mongoexport with a replica set. This option is only available if connected to a mongod or mongos and is not available when used with the mongoexport --dbpath (page 514) option. This is the default behavior.

--out <file>, -o <file>
    Specify a file to write the export to. If you do not specify a file name, mongoexport writes data to standard output (i.e. stdout).
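The --fieldFile option described above expects a plain text file with one field name per line. The following sketch writes such a file and passes it to mongoexport; the field names name and email and the output file name are hypothetical, and the export only runs if mongoexport is installed and can reach a mongod on the default port.

```shell
# Write a field file with one (hypothetical) field name per line.
field_file="$(mktemp)"
printf '%s\n' 'name' 'email' > "$field_file"

# Show the fields that the export would include.
echo "fields to export:"
cat "$field_file"

# Run the export only when the mongoexport binary is available.
if command -v mongoexport >/dev/null 2>&1; then
  mongoexport --db users --collection contacts --csv \
    --fieldFile "$field_file" --out contacts.csv \
    || echo "mongoexport reported an error" >&2
fi
```

Keeping the field list in a file is convenient when the same set of fields is reused across several exports.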

31.8.3 Usage
In the following example, mongoexport exports the collection contacts from the users database from the mongod instance running on the localhost port number 27017. This command writes the export data in CSV format into a file located at /opt/backups/contacts.csv.

mongoexport --db users --collection contacts --csv --out /opt/backups/contacts.csv

The next example creates an export of the collection contacts from the MongoDB instance running on the localhost port number 27017, with journaling explicitly enabled. This writes the export to the contacts.json file in JSON format.
mongoexport --db sales --collection contacts --out contacts.json --journal

The following example exports the collection contacts from the sales database located in the MongoDB data files at /srv/mongodb/. This operation writes the export to standard output in JSON format.
mongoexport --db sales --collection contacts --dbpath /srv/mongodb/

Warning: The above example will only succeed if there is no mongod connected to the data files located in the /srv/mongodb/ directory.

The final example exports the collection contacts from the database marketing. This data resides on the MongoDB instance located on the host mongodb1.example.net running on port 37017, which requires the username user and the password pass.

mongoexport --host mongodb1.example.net --port 37017 --username user --password pass --collection con

31.9 mongostat Manual


31.9.1 Synopsis
The mongostat utility provides a quick overview of the status of a currently running mongod instance. mongostat is functionally similar to the UNIX/Linux file system utility vmstat, but provides data regarding mongod instances.

See Also: For more information about monitoring MongoDB, see Monitoring Database Systems (page 146). For more background on various other MongoDB status outputs see:

    - Server Status Reference (page 537)
    - Replica Status Reference (page 558)
    - Database Statistics Reference (page 551)
    - Collection Statistics Reference (page 552)

For an additional utility that provides MongoDB metrics see mongotop (page 520).

By default, mongostat connects to the mongod process running on the local host interface on TCP port 27017; however, mongostat can connect to any accessible remote MongoDB process.

31.9.2 Options
mongostat

--help
    Returns a basic help and usage text.

--verbose, -v
    Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times (e.g. -vvvvv).

--version
    Returns the version of the mongostat utility.

--host <hostname><:port>
    Specifies a resolvable hostname for the mongod that you want to monitor. By default mongostat attempts to connect to a MongoDB process running on the localhost port number 27017. Optionally, specify a port number to connect to a MongoDB instance running on a port other than 27017.

--port <port>
    Specifies the port number, if the MongoDB instance is not running on the standard port (i.e. 27017). You may also specify a port number using the mongostat --host (page 517) option.

--ipv6
    Enables IPv6 support that allows mongostat to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongostat, disable IPv6 support by default.

--username <username>, -u <username>
    Specifies a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the mongostat --password (page 517) option to supply a password.

--password <password>
    Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the mongostat --username (page 517) option to supply a username. If you specify a --username (page 517) without the --password (page 517) option, mongostat will prompt for a password interactively.

--noheaders
    Disables the output of column or field names.

--rowcount <number>, -n <number>
    Controls the number of rows to output. Use in conjunction with mongostat <sleeptime> to control the duration of a mongostat operation. Unless specified, mongostat will return an infinite number of rows (i.e. a value of 0).

--http
    Configures mongostat to collect data using the HTTP interface rather than a raw database connection.

--discover
    With this option mongostat discovers and reports on statistics from all members of a replica set or shard cluster. When connected to any member of a replica set, --discover (page 517) reports on all non-hidden members of the replica set. When connected to a mongos, mongostat will return data from all shards in the cluster. If a replica set provides a shard in the shard cluster, mongostat will report on non-hidden members of that replica set. The mongostat --host (page 517) option is not required but potentially useful in this case.

--all
    Configures mongostat to return all optional fields (page 518).

<sleeptime>
    The final argument is the length of time, in seconds, that mongostat waits in between calls. By default mongostat returns one call every second. mongostat returns values that reflect the operations over a 1 second period. For values of <sleeptime> greater than 1, mongostat averages data to reflect average operations per second.

31.9.3 Fields
mongostat returns values that reflect the operations over a 1 second period. When mongostat <sleeptime> has a value greater than 1, mongostat averages the statistics to reflect average operations per second. mongostat outputs the following fields:

inserts
    The number of objects inserted into the database per second. If followed by an asterisk (i.e. *), the datum refers to a replicated operation.

query
    The number of query operations per second.

update
    The number of update operations per second.

delete
    The number of delete operations per second.

getmore
    The number of get more (i.e. cursor batch) operations per second.

command
    The number of commands per second. On slave and secondary systems, mongostat presents two values separated by a pipe character (i.e. |), in the form of local|replicated commands.

flushes
    The number of fsync operations per second.

mapped
    The total amount of data mapped in megabytes. This is the total data size at the time of the last mongostat call.

size
    The amount of (virtual) memory used by the process at the time of the last mongostat call.

res
    The amount of (resident) memory used by the process at the time of the last mongostat call.

faults
    Changed in version 2.1. The number of page faults per second. Before version 2.1 this value was only provided for MongoDB instances running on Linux hosts.

locked
    The percent of time in a global write lock. Changed in version 2.2: The locked db field replaces the locked % field to provide more appropriate data regarding the database-specific locks in version 2.2.

locked db
    New in version 2.2. The percent of time in the per-database context-specific lock. mongostat will report the database that has spent the most time since the last mongostat call with a write lock. This value represents the amount of time the database had a database-specific lock and the time that the mongod spent in the global lock. Because of this, and the sampling method, you may see some values greater than 100%.

idx miss
    The percent of index access attempts that required a page fault to load a btree node. This is a sampled value.

qr
    The length of the queue of clients waiting to read data from the MongoDB instance.

qw
    The length of the queue of clients waiting to write data to the MongoDB instance.

ar
    The number of active clients performing read operations.

aw
    The number of active clients performing write operations.

netIn
    The amount of network traffic, in bits, received by the MongoDB instance. This includes traffic from mongostat itself.

netOut
    The amount of network traffic, in bits, sent by the MongoDB instance. This includes traffic from mongostat itself.

conn
    The total number of open connections.

set
    The name, if applicable, of the replica set.

repl
    The replication status of the node.

    Value    Replication Type
    M        master
    SEC      secondary
    REC      recovering
    UNK      unknown
    SLV      slave
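When post-processing mongostat output in a script, the abbreviations in the repl column can be expanded with a small lookup. The sketch below mirrors the table above; the function name repl_status_name and the fallback value for unlisted codes are hypothetical.

```shell
# Hypothetical helper: expand a mongostat "repl" value into its replication type.
repl_status_name() {
  case "$1" in
    M)   echo "master" ;;
    SEC) echo "secondary" ;;
    REC) echo "recovering" ;;
    UNK) echo "unknown" ;;
    SLV) echo "slave" ;;
    *)   echo "unrecognized" ;;  # fallback for values not listed in the table
  esac
}

for code in M SEC REC UNK SLV; do
  printf '%-4s %s\n' "$code" "$(repl_status_name "$code")"
done
```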

31.9.4 Usage
In the first example, mongostat will return data every second for 20 seconds. mongostat collects data from the mongod instance running on the localhost interface on port 27017. All of the following invocations produce identical behavior:
mongostat --rowcount 20 1
mongostat --rowcount 20
mongostat -n 20 1
mongostat -n 20

In the next example, mongostat returns data every 5 minutes (or 300 seconds) for as long as the program runs. mongostat collects data from the mongod instance running on the localhost interface on port 27017. All of the following invocations produce identical behavior.
mongostat --rowcount 0 300
mongostat -n 0 300
mongostat 300

In the following example, mongostat returns data every 5 minutes for an hour (12 times). mongostat collects data from the mongod instance running on the localhost interface on port 27017. Both of the following invocations produce identical behavior.
mongostat --rowcount 12 300
mongostat -n 12 300
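The row count in the examples above is the total duration divided by the sleep time: one hour (3600 seconds) at 300 seconds per row gives 12 rows. A minimal sketch of that arithmetic follows; the function name rowcount_for is hypothetical, and the last line only prints the resulting command.

```shell
# Hypothetical helper: number of rows needed to cover a duration,
# given the sleep time between rows (both in seconds).
rowcount_for() {
  duration="$1"
  sleeptime="$2"
  echo $(( duration / sleeptime ))
}

rows="$(rowcount_for 3600 300)"

# Print (do not run) the matching mongostat invocation.
echo "mongostat --rowcount $rows 300"
```

The division is integer division, so a duration that is not a multiple of the sleep time is rounded down.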

In many cases, using the --discover (page 517) option will help provide a more complete snapshot of the state of an entire group of machines. If a mongos process connected to a shard cluster is running on port 27017 of the local machine, you can use the following form to return statistics from all members of the cluster:
mongostat --discover

31.10 mongotop Manual


31.10.1 Synopsis
mongotop provides a method to track the amount of time a MongoDB instance spends reading and writing data. mongotop provides statistics on a per-collection level. By default, mongotop returns values every second.

See Also: For more information about monitoring MongoDB, see Monitoring Database Systems (page 146). For additional background on various other MongoDB status outputs see:

    - Server Status Reference (page 537)
    - Replica Status Reference (page 558)
    - Database Statistics Reference (page 551)
    - Collection Statistics Reference (page 552)

For an additional utility that provides MongoDB metrics see mongostat (page 516).

31.10.2 Options
mongotop

--help
    Returns a basic help and usage text.

--verbose, -v
    Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times (e.g. -vvvvv).

--version
    Print the version of the mongotop utility and exit.

--host <hostname><:port>
    Specifies a resolvable hostname for the mongod that you want to monitor. By default mongotop attempts to connect to a MongoDB process running on the localhost port number 27017. Optionally, specify a port number to connect to a MongoDB instance running on a port other than 27017.

--port <port>
    Specifies the port number, if the MongoDB instance is not running on the standard port (i.e. 27017). You may also specify a port number using the mongotop --host (page 520) option.

--ipv6
    Enables IPv6 support that allows mongotop to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongotop, disable IPv6 support by default.

--username <username>, -u <username>
    Specifies a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the mongotop --password (page 521) option to supply a password.

--password <password>
    Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the --username (page 520) option to supply a username. If you specify a --username (page 520) without the --password (page 521) option, mongotop will prompt for a password interactively.

--locks
    New in version 2.2. Toggles the mode of mongotop to report on use of per-database locks (page 538). These data are useful for measuring concurrent operations and lock percentage.

<sleeptime>
    The final argument is the length of time, in seconds, that mongotop waits in between calls. By default mongotop returns data every second.

31.10.3 Fields
mongotop returns time values specified in milliseconds (ms). mongotop only reports active namespaces or databases, depending on the --locks (page 521) option. If you don't see a database or collection, it has received no recent activity. You can issue a simple operation in the mongo shell to generate activity to affect the output of mongotop.

ns
    Contains the database namespace, which combines the database name and collection. Changed in version 2.2: If you use the --locks (page 521) option, the ns (page 521) field does not appear in the mongotop output.

db
    New in version 2.2. Contains the name of the database. The database named . refers to the global lock, rather than a specific database. This field does not appear unless you have invoked mongotop with the --locks (page 521) option.

total
    Provides the total amount of time that this mongod spent operating on this namespace.

read
    Provides the amount of time that this mongod spent performing read operations on this namespace.

write
    Provides the amount of time that this mongod spent performing write operations on this namespace.

<timestamp>
    Provides a time stamp for the returned data.

31.10.4 Use
By default mongotop connects to the MongoDB instance running on the localhost port 27017. However, mongotop can optionally connect to remote mongod instances. See the mongotop options (page 520) for more information. To force mongotop to return less frequently, specify a number, in seconds, at the end of the command. In this example, mongotop will return every 15 seconds.

mongotop 15

This command produces the following output:


connected to: 127.0.0.1

ns                        total    read    write    2012-08-13T15:45:40
test.system.namespaces    0ms      0ms     0ms
local.system.replset      0ms      0ms     0ms
local.system.indexes      0ms      0ms     0ms
admin.system.indexes      0ms      0ms     0ms
admin.                    0ms      0ms     0ms

ns                        total    read    write    2012-08-13T15:45:55
test.system.namespaces    0ms      0ms     0ms
local.system.replset      0ms      0ms     0ms
local.system.indexes      0ms      0ms     0ms
admin.system.indexes      0ms      0ms     0ms
admin.                    0ms      0ms     0ms

To return a mongotop report every 5 minutes, use the following command:
mongotop 300

To report the use of per-database locks, use mongotop --locks (page 521), which produces the following output:
$ mongotop --locks
connected to: 127.0.0.1

db       total    read    write    2012-08-13T16:33:34
local    0ms      0ms     0ms
admin    0ms      0ms     0ms
.        0ms      0ms     0ms
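Because mongotop reports times as values such as 0ms, saved output can be filtered with standard text tools. The following is a rough sketch, assuming the whitespace-separated column layout shown in the output above: the function name active_namespaces is hypothetical, and it prints the first column of every row whose total value is not 0ms.

```shell
# Hypothetical helper: read mongotop output on standard input and print the
# namespaces (or databases) whose "total" column is something other than 0ms.
active_namespaces() {
  # Rows of interest have a second column of the form <digits>ms;
  # header and "connected to" lines do not match and are skipped.
  awk '$2 ~ /^[0-9]+ms$/ && $2 != "0ms" { print $1 }'
}

# Sample input for illustration only; the non-zero row is made up.
printf '%s\n' \
  'ns total read write' \
  'users.contacts 12ms 10ms 2ms' \
  'admin. 0ms 0ms 0ms' | active_namespaces
```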

31.11 mongooplog Manual


New in version 2.1.1.

31.11.1 Synopsis
mongooplog is a simple tool that polls operations from the replication oplog of a remote server, and applies them to the local server. This capability supports certain classes of real-time migrations that require that the source server remain online and in operation throughout the migration process. Typically this command will take the following form:
mongooplog --from mongodb0.example.net --host mongodb1.example.net

This command copies oplog entries from the mongod instance running on the host mongodb0.example.net and duplicates operations to the host mongodb1.example.net. If you do not need to keep the --from (page 524) host running during the migration, consider using mongodump and mongorestore or another backup (page 156) operation, which may be better suited to your operation.

Note: If the mongod instance specified by the --from (page 524) argument is running with authentication (page 497), then mongooplog will not be able to copy oplog entries.

See Also: mongodump, mongorestore, Backup and Restoration Strategies (page 156), Oplog Internals Overview (page 57), and Replica Set Oplog Sizing (page 36).

31.11.2 Options
mongooplog

--help
Returns a basic help and usage text.

--verbose, -v
Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times (e.g. -vvvvv).

--version
Returns the version of the mongooplog utility.

--host <hostname><:port>, -h
Specifies a resolvable hostname for the mongod instance to which mongooplog will apply oplog operations retrieved from the server specified by the --from (page 524) option. mongooplog assumes that all target mongod instances are accessible by way of port 27017. You may, optionally, declare an alternate port number as part of the hostname argument. You can always connect directly to a single mongod instance by specifying the host and port number directly.

--port
Specifies the port number of the mongod instance where mongooplog will apply oplog entries. Only specify this option if the MongoDB instance that you wish to connect to is not running on the standard port (i.e. 27017). You may also specify a port number using the --host (page 523) option.

--ipv6
Enables IPv6 support that allows mongooplog to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongooplog, disable IPv6 support by default.

--username <username>, -u <username>
Specifies a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the --password (page 523) option to supply a password.

--password <password>, -p <password>
Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the --username (page 523) option to supply a username. If you specify a --username (page 523) without the --password (page 523) option, mongooplog will prompt for a password interactively.

--dbpath <path>
Specifies a directory, containing MongoDB data files, to which mongooplog will apply operations from the oplog of the database specified with the --from (page 524) option. When used, the --dbpath (page 523) option enables mongooplog to attach directly to local data files and write data without a running mongod instance. To run with --dbpath (page 523), mongooplog needs to restrict access to the data directory: as a result, no mongod can access the same path while the process runs.

--directoryperdb
Use the --directoryperdb (page 523) in conjunction with the corresponding option to mongod. This option allows mongooplog to write to data files organized with each database located in a distinct directory. This option is only relevant when specifying the --dbpath (page 523) option.

--journal
Allows mongooplog operations to use the durability journal to ensure that the data files will remain in a consistent state during the writing process. This option is only relevant when specifying the --dbpath (page 523) option.

--db <db>, -d <db>
Use the --db (page 524) option to specify a database for mongooplog to write data to. If you do not specify a database, mongooplog will apply operations that apply to all databases that appear in the oplog. Use this option to migrate a smaller subset of your data.

--collection <collection>, -c <c>
Use the --collection (page 524) option to specify a collection for mongooplog to write data to. If you do not specify a collection, mongooplog will apply operations that apply to all collections that appear in the oplog in the specified database. Use this option to migrate a smaller subset of your data.

--fields [field1[,field2]], -f [field1[,field2]]
Specify a field or a number of fields to constrain which data mongooplog will migrate. All other fields will be excluded from the migration. Comma separate a list of fields to limit the applied fields.

--fieldFile <file>
As an alternative to --fields (page 524), the --fieldFile (page 524) option allows you to specify a file (e.g. <file>) that holds a list of field names to include in the migration. All other fields will be excluded from the migration. Place one field per line.

--seconds <number>, -s <number>
Specify a number of seconds of operations for mongooplog to pull from the remote host (page 524). Unless specified, the default value is 86400 seconds, or 24 hours.

--from <host[:port]>
Specify the host for mongooplog to retrieve oplog operations from. mongooplog requires this option. Unless you specify the --host (page 523) option, mongooplog will apply the operations collected with this option to the oplog of the mongod instance running on the localhost interface connected to port 27017.

--oplogns <namespace>
Specify a namespace in the --from (page 524) host where the oplog resides. The default value is local.oplog.rs, which is where replica set members store their operation log. However, if you've copied oplog entries into another database or collection, use this option to copy oplog entries stored in another location. Namespaces take the form of [database].[collection].

Usage
Consider the following prototype mongooplog command:
mongooplog --from mongodb0.example.net --host mongodb1.example.net

Here, mongooplog copies entries from the oplog of the mongod running on port 27017 on mongodb0.example.net and applies them to mongodb1.example.net. This only pulls entries from the last 24 hours. In the next command, the parameters limit this operation to only apply operations to the database people in the collection usage on the target host (i.e. mongodb1.example.net):
mongooplog --from mongodb0.example.net --host mongodb1.example.net --database people --collection us

This operation only applies oplog entries from the last 24 hours. Use the --seconds (page 524) argument to capture a greater or smaller amount of time. Consider the following example:
mongooplog --from mongodb0.example.net --seconds 172800

In this operation, mongooplog captures 2 full days of operations. To migrate 12 hours of oplog entries, use the following form:
mongooplog --from mongodb0.example.net --seconds 43200

For the previous two examples, mongooplog migrates entries to the mongod process running on the localhost interface connected to the 27017 port. mongooplog can also operate directly on MongoDB's data files if no mongod is running on the target host. Consider the following example:
mongooplog --from mongodb0.example.net --dbpath /srv/mongodb --journal

Here, mongooplog imports oplog operations from the mongod host connected to port 27017. This migrates operations to the MongoDB data files stored in the /srv/mongodb directory. Additionally, mongooplog will use the durability journal to ensure that the data files remain in a consistent state.
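The --seconds values used in this section (86400, 172800, and 43200) are plain second counts for 24 hours, 2 days, and 12 hours. The following sketch shows one way to derive such values and assemble the argument list in a script. The function names `oplog_seconds` and `mongooplog_args` are hypothetical helpers, and only options documented above are used.

```python
from datetime import timedelta


def oplog_seconds(**duration):
    """Convert timedelta keywords (days=, hours=, minutes=, ...) to whole seconds."""
    return int(timedelta(**duration).total_seconds())


def mongooplog_args(source, seconds=None, target=None):
    """Build an argument list for mongooplog using --from, --host and --seconds."""
    args = ["mongooplog", "--from", source]
    if target is not None:
        # Without --host, mongooplog applies operations to localhost:27017.
        args += ["--host", target]
    if seconds is not None:
        # Without --seconds, mongooplog defaults to 86400 seconds (24 hours).
        args += ["--seconds", str(seconds)]
    return args


if __name__ == "__main__":
    print(oplog_seconds(days=2))
    print(oplog_seconds(hours=12))
    print(" ".join(mongooplog_args("mongodb0.example.net", oplog_seconds(days=2))))
```

Computing the value from a named duration avoids arithmetic slips when the window is something less familiar than a whole day.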

31.12 mongosniff Manual


31.12.1 Synopsis
mongosniff provides a low-level operation tracing/sniffing view into database activity in real time. Think of mongosniff as a MongoDB-specific analogue of tcpdump for TCP/IP network traffic. mongosniff is most frequently used in driver development.

Note: mongosniff requires libpcap and is only available for Unix-like systems. The Wireshark network sniffing tool is capable of inspecting and parsing the MongoDB wire protocol.

31.12.2 Options
mongosniff

--help
Returns a basic help and usage text.

--forward <host>:<port>
Forwards all parsed requests that mongosniff intercepts to another mongod instance and issues those operations on that database instance. Specify the target host name and port in the <host>:<port> format.

--source <NET [interface]>, <FILE [filename]>, <DIAGLOG [filename]>
Specifies source material to inspect. Use --source NET [interface] to inspect traffic from a network interface (e.g. eth0 or lo). Use --source FILE [filename] to read captured packets in pcap format. You may use the --source DIAGLOG [filename] option to read the output files produced by the --diaglog (page 487) option.

--objcheck
Modifies the behavior to only display invalid BSON objects and nothing else. Use this option for troubleshooting driver development.

<port>
Specifies alternate ports to sniff for traffic. By default, mongosniff watches for MongoDB traffic on port 27017. Append multiple port numbers to the end of mongosniff to monitor traffic on multiple ports.

31.12.3 Usage
Use the following command to connect to a mongod or mongos running on port 27017 and 27018 on the localhost interface:
mongosniff --source NET lo 27017 27018

Use the following command to only log invalid BSON objects for the mongod or mongos running on the localhost interface and port 27018, for driver development and troubleshooting:
mongosniff --objcheck --source NET lo 27018

31.13 mongofiles Manual


31.13.1 Synopsis
The mongofiles utility makes it possible to manipulate files stored in your MongoDB instance in GridFS objects from the command line. It is particularly useful as it provides an interface between objects stored in your file system and GridFS.

All mongofiles commands take arguments in three groups:
1. Options (page 527). You may use one or more of these options to control the behavior of mongofiles.
2. Commands (page 526). Use one of these commands to determine the action of mongofiles.
3. A filename representing either the name of a file on your system's file system, or a GridFS object.

Like mongodump, mongoexport, mongoimport, and mongorestore, mongofiles can access data stored in a MongoDB data directory without requiring a running mongod instance, if no other mongod is running.

Note: For replica sets, mongofiles can only read from the set's primary.

31.13.2 Commands
mongofiles

list <prefix>
Lists the files in the GridFS store. The characters specified after list (e.g. <prefix>) optionally limit the list of returned items to files that begin with that string of characters.

search <string>
Lists the files in the GridFS store with names that match any portion of <string>.

put <filename>
Copy the specified file from the local file system into GridFS storage. Here, <filename> refers to the name the object will have in GridFS, and mongofiles assumes that this reflects the name the file has on the local file system. If the local filename is different, use the mongofiles --local (page 528) option.

get <filename>
Copy the specified file from GridFS storage to the local file system. Here, <filename> refers to the name the object has in GridFS, and mongofiles assumes that this reflects the name the file has on the local file system. If the local filename is different, use the mongofiles --local (page 528) option.

delete <filename>
Delete the specified file from GridFS storage.

31.13.3 Options
--help
Returns a basic help and usage text.

--verbose, -v
Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times (e.g. -vvvvv).

--version
Returns the version of the mongofiles utility.

--host <hostname><:port>
Specifies a resolvable hostname for the mongod that holds your GridFS system. By default mongofiles attempts to connect to a MongoDB process running on the localhost port number 27017. Optionally, specify a port number to connect to a MongoDB instance running on a port other than 27017.

--port <port>
Specifies the port number, if the MongoDB instance is not running on the standard port (i.e. 27017). You may also specify a port number using the mongofiles --host (page 527) option.

--ipv6
Enables IPv6 support that allows mongofiles to connect to the MongoDB instance using an IPv6 network. All MongoDB programs and processes, including mongofiles, disable IPv6 support by default.

--username <username>, -u <username>
Specifies a username to authenticate to the MongoDB instance, if your database requires authentication. Use in conjunction with the mongofiles --password (page 527) option to supply a password.

--password <password>
Specifies a password to authenticate to the MongoDB instance. Use in conjunction with the mongofiles --username (page 527) option to supply a username. If you specify a --username (page 527) without the --password (page 527) option, mongofiles will prompt for a password interactively.

--dbpath <path>
Specifies the directory of the MongoDB data files. If used, the --dbpath (page 527) option enables mongofiles to attach directly to local data files and interact with the GridFS data without the mongod. To run with --dbpath (page 527), mongofiles needs to lock access to the data directory: as a result, no mongod can access the same path while the process runs.

--directoryperdb
Use the --directoryperdb (page 527) in conjunction with the corresponding option to mongod. This option allows mongofiles to operate when running with the --dbpath (page 527) option and MongoDB uses an on-disk format where every database has a distinct directory. This option is only relevant when specifying the --dbpath (page 527) option.

--journal
Allows mongofiles operations to use the durability journal when running with --dbpath (page 527) to ensure that the database maintains a recoverable state. This forces mongofiles to record all data on disk regularly.

--db <db>, -d <db>
Use the --db (page 528) option to specify the MongoDB database that stores or will store the GridFS files.

--collection <collection>, -c <collection>
This option has no use in this context and a future release may remove it. See SERVER-4931 for more information.

--local <filename>, -l <filename>
Specifies the local filesystem name of a file for get and put operations. In the mongofiles put and mongofiles get commands the required <filename> modifier refers to the name the object will have in GridFS. mongofiles assumes that this reflects the file's name on the local file system. This setting overrides this default.

--type <MIME>, -t <MIME>
Provides the ability to specify a MIME type to describe the file inserted into GridFS storage. mongofiles omits this option in the default operation. Use only with mongofiles put operations.

--replace, -r
Alters the behavior of mongofiles put to replace existing GridFS objects with the specified local file, rather than adding an additional object with the same name. In the default operation, files will not be overwritten by a mongofiles put operation.
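The interaction between the put command and the --local, --type, and --replace options can be illustrated with a small argument builder. This is a sketch under stated assumptions: the function name `mongofiles_put_args` and the example database and file names are hypothetical, and the order (options, then command, then filename) follows the three argument groups listed in the synopsis.

```python
def mongofiles_put_args(gridfs_name, db, local_path=None, mime_type=None, replace=False):
    """Build an argument list for a `mongofiles put` operation.

    gridfs_name is the name the object will have in GridFS. local_path is only
    needed when the file on the local file system has a different name.
    """
    args = ["mongofiles", "--db", db]
    if local_path is not None:
        args += ["--local", local_path]   # overrides the default local filename
    if mime_type is not None:
        args += ["--type", mime_type]     # only meaningful for put operations
    if replace:
        args.append("--replace")          # replace rather than add a same-named object
    args += ["put", gridfs_name]
    return args


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(" ".join(mongofiles_put_args(
        "report.pdf", "records", local_path="/tmp/draft.pdf",
        mime_type="application/pdf", replace=True)))
```

Passing the resulting list to `subprocess.run` avoids shell quoting problems for filenames that contain spaces.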

31.14 bsondump Manual


31.14.1 Synopsis
The bsondump utility converts BSON files into human-readable formats, including JSON. For example, bsondump is useful for reading the output files generated by mongodump.

31.14.2 Options
bsondump

--help
Returns a basic help and usage text.

--verbose, -v
Increases the amount of internal reporting returned on the command line. Increase the verbosity with the -v form by including the option multiple times (e.g. -vvvvv).

--version
Returns the version of the bsondump utility.

--objcheck
Validates each BSON object before outputting it in JSON format. Use this option to filter corrupt objects from the output.

--filter <JSON>
Limits the documents that bsondump exports to only those documents that match the JSON document specified as <JSON>. Be sure to include the document in single quotes to avoid interaction with your system's shell environment.

--type <=json|=debug>
Changes the operation of bsondump from outputting JSON (the default) to a debugging format.

<bsonfilename>
The final argument to bsondump is a document containing BSON. This data is typically generated by mongodump or by MongoDB in a rollback operation.

31.14.3 Usage
By default, bsondump outputs data to standard output. To create corresponding JSON files, you will need to use the shell redirect. See the following command:
bsondump collection.bson > collection.json

Use the following command (at the system shell) to produce debugging output for a BSON file:
bsondump --type=debug collection.bson
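When bsondump is driven from a script rather than an interactive shell, the shell redirect can be replaced by handing the child process an open file as its standard output. The following is a minimal sketch that assumes bsondump is on the PATH. The helper names `bsondump_args` and `bson_to_json` are hypothetical.

```python
import subprocess


def bsondump_args(bson_path, debug=False):
    """Build the bsondump argument list; debug=True selects --type=debug."""
    args = ["bsondump"]
    if debug:
        args.append("--type=debug")
    args.append(bson_path)  # the BSON file is the final argument
    return args


def bson_to_json(bson_path, json_path):
    """Equivalent of `bsondump <bson_path> > <json_path>` without a shell."""
    with open(json_path, "wb") as out:
        # check=True raises CalledProcessError if bsondump exits with an error.
        subprocess.run(bsondump_args(bson_path), stdout=out, check=True)


if __name__ == "__main__":
    print(" ".join(bsondump_args("collection.bson", debug=True)))
```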

31.15 mongod.exe Manual


31.15.1 Synopsis
mongod.exe is the build of the MongoDB daemon (i.e. mongod) for the Windows platform. mongod.exe has all of the features of mongod on Unix-like platforms and is completely compatible with the other builds of mongod. In addition, mongod.exe provides several options for interacting with the Windows platform itself. This document only references options that are unique to mongod.exe. All mongod options are available. See the mongod Manual (page 485) and the Configuration File Options (page 494) documents for more information regarding mongod.exe. To install and use mongod.exe, read the Install MongoDB on Windows (page 22) document.

31.15.2 Options
--install
Installs mongod.exe as a Windows Service and exits.

--remove
Removes the mongod.exe Windows Service. If mongod.exe is running, this operation will stop and then remove the service.
Note: --remove (page 529) requires the --serviceName (page 529) if you configured a non-default --serviceName (page 529) during the --install (page 529) operation.

--reinstall
Removes mongod.exe and reinstalls mongod.exe as a Windows Service.

--serviceName <name>
Default: MongoDB
Set the service name of mongod.exe when running as a Windows Service. Use this name with the net start <name> and net stop <name> operations. You must use --serviceName (page 529) in conjunction with either the --install (page 529) or --remove (page 529) install option.

--serviceDisplayName <name>
Default: Mongo DB
Sets the name listed for MongoDB on the Services administrative application.

--serviceDescription <description>
Default: MongoDB Server
Sets the mongod.exe service description. You must use --serviceDescription (page 530) in conjunction with the --install (page 529) option.
Note: For descriptions that contain spaces, you must enclose the description in quotes.

--serviceUser <user>
Runs the mongod.exe service in the context of a certain user. This user must have "Log on as a service" privileges. You must use --serviceUser (page 530) in conjunction with the --install (page 529) option.

--servicePassword <password>
Sets the password for <user> for mongod.exe when running with the --serviceUser (page 530) option. You must use --servicePassword (page 530) in conjunction with the --install (page 529) option.
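The service options carry several "must be used in conjunction with" rules, which are easy to get wrong in deployment scripts. The sketch below encodes only the rules stated above and builds the corresponding argument list. The function name `service_args` and the example service name are hypothetical, and a real installation would normally pass further mongod options that this section does not cover.

```python
import subprocess

KNOWN = {"serviceName", "serviceDisplayName", "serviceDescription",
         "serviceUser", "servicePassword"}
# Options the text says must accompany --install.
INSTALL_ONLY = {"serviceDescription", "serviceUser", "servicePassword"}


def service_args(action, **options):
    """Validate option combinations and return a mongod.exe argument list."""
    if action not in ("install", "remove", "reinstall"):
        raise ValueError("unknown action: " + action)
    for name in options:
        if name not in KNOWN:
            raise ValueError("unknown option: --" + name)
        if name in INSTALL_ONLY and action != "install":
            raise ValueError("--" + name + " requires --install")
    if "serviceName" in options and action not in ("install", "remove"):
        raise ValueError("--serviceName requires --install or --remove")
    if "servicePassword" in options and "serviceUser" not in options:
        raise ValueError("--servicePassword requires --serviceUser")
    args = ["mongod.exe", "--" + action]
    for name, value in options.items():
        args += ["--" + name, value]
    return args


if __name__ == "__main__":
    args = service_args("install", serviceName="MongoDB2",
                        serviceDescription="Second MongoDB Server")
    # list2cmdline applies Windows quoting, so the description is quoted.
    print(subprocess.list2cmdline(args))
```

Building a list and letting `subprocess` handle quoting satisfies the note about descriptions that contain spaces without manual quote handling.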

31.16 mongos.exe Manual


31.16.1 Synopsis
mongos.exe is the build of the MongoDB Shard (i.e. mongos) for the Windows platform. mongos.exe has all of the features of mongos on Unix-like platforms and is completely compatible with the other builds of mongos. In addition, mongos.exe provides several options for interacting with the Windows platform itself. This document only references options that are unique to mongos.exe. All mongos options are available. See the mongos Manual (page 491) and the Configuration File Options (page 494) documents for more information regarding mongos.exe. To install and use mongos.exe, read the Install MongoDB on Windows (page 22) document.

31.16.2 Options
--install
Installs mongos.exe as a Windows Service and exits.

--remove
Removes the mongos.exe Windows Service. If mongos.exe is running, this operation will stop and then remove the service.
Note: --remove (page 530) requires the --serviceName (page 530) if you configured a non-default --serviceName (page 530) during the --install (page 530) operation.

--reinstall
Removes mongos.exe and reinstalls mongos.exe as a Windows Service.

--serviceName <name>
Default: MongoS
Set the service name of mongos.exe when running as a Windows Service. Use this name with the net start <name> and net stop <name> operations. You must use --serviceName (page 530) in conjunction with either the --install (page 530) or --remove (page 530) install option.

--serviceDisplayName <name>
Default: Mongo DB Router
Sets the name listed for MongoDB on the Services administrative application.

--serviceDescription <description>
Default: Mongo DB Sharding Router
Sets the mongos.exe service description. You must use --serviceDescription (page 531) in conjunction with the --install (page 530) option.
Note: For descriptions that contain spaces, you must enclose the description in quotes.

--serviceUser <user>
Runs the mongos.exe service in the context of a certain user. This user must have "Log on as a service" privileges. You must use --serviceUser (page 531) in conjunction with the --install (page 530) option.

--servicePassword <password>
Sets the password for <user> for mongos.exe when running with the --serviceUser (page 531) option. You must use --servicePassword (page 531) in conjunction with the --install (page 530) option.

CHAPTER

THIRTYTWO

STATUS AND REPORTING


32.1 Server Status Output Index
This document provides a quick overview and example of the serverStatus command. The helper db.serverStatus() (page 471) in the mongo shell provides access to this output. For full documentation of the content of this output, see Server Status Reference (page 537).

Note: The fields included in this output vary slightly depending on the version of MongoDB, underlying operating system platform, and the kind of node, including mongos, mongod or replica set member.

The Instance Information (page 537) section displays information regarding the specific mongod and mongos and its state.
{
    "host" : "<hostname>",
    "version" : "<version>",
    "process" : "<mongod|mongos>",
    "pid" : <num>,
    "uptime" : <num>,
    "uptimeMillis" : <num>,
    "uptimeEstimate" : <num>,
    "localTime" : ISODate(""),

The locks (page 538) section reports data that reflect the state and use of both global (i.e. .) and database specific locks:
    "locks" : {
        "." : {
            "timeLockedMicros" : { "R" : <num>, "W" : <num> },
            "timeAcquiringMicros" : { "R" : <num>, "W" : <num> }
        },
        "admin" : {
            "timeLockedMicros" : { "r" : <num>, "w" : <num> },
            "timeAcquiringMicros" : { "r" : <num>, "w" : <num> }
        },
        "local" : {
            "timeLockedMicros" : { "r" : <num>, "w" : <num> },
            "timeAcquiringMicros" : { "r" : <num>, "w" : <num> }
        },
        "<database>" : {
            "timeLockedMicros" : { "r" : <num>, "w" : <num> },
            "timeAcquiringMicros" : { "r" : <num>, "w" : <num> }
        }
    },

The globalLock (page 540) field reports on MongoDB's global system lock. In most cases the locks (page 538) document provides more fine grained data that reflects lock use:
    "globalLock" : {
        "totalTime" : <num>,
        "lockTime" : <num>,
        "currentQueue" : {
            "total" : <num>,
            "readers" : <num>,
            "writers" : <num>
        },
        "activeClients" : {
            "total" : <num>,
            "readers" : <num>,
            "writers" : <num>
        }
    },

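The reference section for globalLock describes totalTime and lockTime as microsecond counts since the database last started, and says they should be read together. The following sketch computes the fraction of time the global lock was held from a serverStatus-shaped dictionary. The function name `global_lock_ratio` and the sample numbers are illustrative assumptions, and the division is a plain reading of "lock time relative to total time" rather than a statement of how the server derives its own ratio.

```python
def global_lock_ratio(server_status):
    """Return lockTime / totalTime from a serverStatus-like dict (0.0 if no time elapsed)."""
    lock = server_status["globalLock"]
    total = lock["totalTime"]
    if total == 0:
        return 0.0
    return lock["lockTime"] / total


# Made-up sample: one hour of uptime, 90 seconds holding the global lock.
SAMPLE_STATUS = {
    "globalLock": {
        "totalTime": 3600000000,   # microseconds
        "lockTime": 90000000,      # microseconds
        "currentQueue": {"total": 0, "readers": 0, "writers": 0},
        "activeClients": {"total": 1, "readers": 1, "writers": 0},
    }
}

if __name__ == "__main__":
    print("lock ratio: {:.3%}".format(global_lock_ratio(SAMPLE_STATUS)))
```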
The mem (page 542) field reports on MongoDB's current memory use:
    "mem" : {
        "bits" : <num>,
        "resident" : <num>,
        "virtual" : <num>,
        "supported" : <boolean>,
        "mapped" : <num>,
        "mappedWithJournal" : <num>
    },

The connections (page 542) field reports on the current and available incoming connections to the MongoDB process:
    "connections" : {
        "current" : <num>,
        "available" : <num>
    },

The fields in the extra_info (page 543) document provide platform specific information. The following example block is from a Linux-based system:
    "extra_info" : {
        "note" : "fields vary by platform",
        "heap_usage_bytes" : <num>,
        "page_faults" : <num>
    },

The indexCounters (page 543) document reports on index use:
    "indexCounters" : {
        "btree" : {
            "accesses" : <num>,
            "hits" : <num>,
            "misses" : <num>,
            "resets" : <num>,
            "missRatio" : <num>
        }
    },

The backgroundFlushing (page 544) document reports on the process MongoDB uses to write data to disk:
    "backgroundFlushing" : {
        "flushes" : <num>,
        "total_ms" : <num>,
        "average_ms" : <num>,
        "last_ms" : <num>,
        "last_finished" : ISODate("")
    },

The cursors (page 545) document reports on current cursor use and state:
    "cursors" : {
        "totalOpen" : <num>,
        "clientCursors_size" : <num>,
        "timedOut" : <num>
    },

The network (page 545) document reports on network use and state:
    "network" : {
        "bytesIn" : <num>,
        "bytesOut" : <num>,
        "numRequests" : <num>
    },

The repl (page 546) document reports on the state of replication and the replica set. This document only appears for replica sets.
    "repl" : {
        "setName" : "<string>",
        "ismaster" : <boolean>,
        "secondary" : <boolean>,
        "hosts" : [
            <hostname>,
            <hostname>,
            <hostname>
        ],
        "primary" : <hostname>,
        "me" : <hostname>
    },

The opcountersRepl (page 546) document reports the number of replicated operations:
    "opcountersRepl" : {
        "insert" : <num>,
        "query" : <num>,
        "update" : <num>,
        "delete" : <num>,
        "getmore" : <num>,
        "command" : <num>
    },

The replNetworkQueue (page 547) document holds information regarding the queue that secondaries use to poll data from other members of their set:
    "replNetworkQueue" : {
        "waitTimeMs" : <num>,
        "numElems" : <num>,
        "numBytes" : <num>
    },

The opcounters (page 547) document reports the number of operations this MongoDB instance has processed:
    "opcounters" : {
        "insert" : <num>,
        "query" : <num>,
        "update" : <num>,
        "delete" : <num>,
        "getmore" : <num>,
        "command" : <num>
    },

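Because opcounters reports totals rather than rates, a common way to use it is to take two serverStatus samples and divide the difference by the time between them. The sketch below assumes the counters only increase between the two samples, so it does not handle a restart or counter rollover. The function name `op_rates` and the sample values are hypothetical.

```python
OPS = ("insert", "query", "update", "delete", "getmore", "command")


def op_rates(earlier, later, interval_seconds):
    """Return operations per second for each opcounters field between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        op: (later["opcounters"][op] - earlier["opcounters"][op]) / interval_seconds
        for op in OPS
    }


# Two made-up samples taken 60 seconds apart.
FIRST = {"opcounters": {"insert": 100, "query": 2000, "update": 50,
                        "delete": 5, "getmore": 0, "command": 300}}
SECOND = {"opcounters": {"insert": 160, "query": 2600, "update": 80,
                         "delete": 5, "getmore": 0, "command": 420}}

if __name__ == "__main__":
    for op, rate in op_rates(FIRST, SECOND, 60).items():
        print("{:8s} {:6.2f} ops/s".format(op, rate))
```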
The asserts (page 548) document reports the number of assertions or errors produced by the server:
    "asserts" : {
        "regular" : <num>,
        "warning" : <num>,
        "msg" : <num>,
        "user" : <num>,
        "rollovers" : <num>
    },

The writeBacksQueued (page 549) document reports the number of writebacks:


"writeBacksQueued" : <num>,

The dur (page 549) document reports data that reflect this mongod's journaling-related operations and performance:
    "dur" : {
        "commits" : <num>,
        "journaledMB" : <num>,
        "writeToDataFilesMB" : <num>,
        "compression" : <num>,
        "commitsInWriteLock" : <num>,
        "earlyCommits" : <num>,
        "timeMs" : {
            "dt" : <num>,
            "prepLogBuffer" : <num>,
            "writeToJournal" : <num>,
            "writeToDataFiles" : <num>,
            "remapPrivateView" : <num>
        }
    },

The recordStats (page 550) document reports data on MongoDB's ability to predict page faults and yield write operations when required data isn't in memory:
    "recordStats" : {
        "accessesNotInMemory" : <num>,
        "pageFaultExceptionsThrown" : <num>,
        "local" : {
            "accessesNotInMemory" : <num>,
            "pageFaultExceptionsThrown" : <num>
        },
        "<database>" : {
            "accessesNotInMemory" : <num>,
            "pageFaultExceptionsThrown" : <num>
        }
    },

The final ok field holds the return status for the serverStatus command:
    "ok" : 1
}

32.2 Server Status Reference


The serverStatus command returns a collection of information that reflects the database's status. These data are useful for diagnosing and assessing the performance of your MongoDB instance. This reference catalogs each datum included in the output of this command and provides context for using this data to more effectively administer your database.

See Also: Much of the output of serverStatus is also displayed dynamically by mongostat. See the mongostat Manual (page 516) for more information. For examples of the serverStatus output, see Server Status Output Index (page 533).

32.2.1 Instance Information


Example output of the instance information fields (page 533).

host
The host (page 537) field contains the system's hostname. On Unix/Linux systems, this should be the same as the output of the hostname command.

version
The version (page 537) field contains the version of MongoDB running on the current mongod or mongos instance.

process
The process (page 538) field identifies which kind of MongoDB instance is running. Possible values are:
mongos
mongod

uptime
The value of the uptime (page 538) field corresponds to the number of seconds that the mongos or mongod process has been active.

uptimeEstimate
uptimeEstimate (page 538) provides the uptime as calculated from MongoDB's internal coarse-grained time keeping system.

localTime
The localTime (page 538) value is the current time, according to the server, in UTC specified in an ISODate format.

32.2.2 locks
New in version 2.1.2: All locks (page 569) statuses rst appeared in the 2.1.2 development release for the 2.2 series. Example output of the locks elds (page 533). locks The locks (page 569) document contains sub-documents that provides a granular report on MongoDB database-level lock use. All values are of the NumberLong() type. Generally, elds named: R refer to the global read lock, W refer to the global write lock, r refer to the database specic read lock, and w refer to the database specic write lock. If a document does not have any elds, it means that no locks have existed with this context since the last time the mongod started. locks.. A eld named . holds the rst document in locks (page 569) that contains information about the global lock as well as aggregated data regarding lock use in all databases. locks...timeLockedMicros The locks...timeLockedMicros (page 538) document reports the amount of time in microseconds that a lock has existed in all databases in this mongod instance. locks...timeLockedMicros.R The R eld reports the amount of time in microseconds that any database has held the global read lock.

538

Chapter 32. Status and Reporting

MongoDB Documentation, Release 2.0.6

locks...timeLockedMicros.W The W eld reports the amount of time in microseconds that any database databases has held the global write lock. locks...timeLockedMicros.r The r eld reports the amount of time in microseconds that any database databases has held the local read lock. locks...timeLockedMicros.w The w eld reports the amount of time in microseconds that any database databases has held the local write lock. locks...timeAcquiringMicros The locks...timeAcquiringMicros (page 539) document reports the amount of time in microseconds that operations have spent waiting to acquire a lock in all database in this mongod instance. locks...timeAcquiringMicros.R The R eld reports the amount of time in microseconds that any database databases has spent waiting for the global read lock. locks...timeAcquiringMicros.W The W eld reports the amount of time in microseconds that any database databases has spent waiting for the global write lock. locks.admin The locks.admin (page 539) document contains two sub-documents that reports data regarding lock use in the admin database. locks.admin.timeLockedMicros The locks.admin.timeLockedMicros (page 539) document reports the amount of time in microseconds that locks have existed in the context of the admin database. locks.admin.timeLockedMicros.r The r eld reports the amount of time in microseconds that the admin database has held the read lock. locks.admin.timeLockedMicros.w The w eld reports the amount of time in microseconds that the admin database has held the write lock. locks.admin.timeAcquiringMicros The locks.admin.timeAcquiringMicros (page 539) document reports on the amount of eld time in microseconds that operations have spent waiting to acquire a lock for the admin database. locks.admin.timeAcquiringMicros.r The r eld reports the amount of time in microseconds that operations have spent waiting a read lock on the admin database. 
locks.admin.timeAcquiringMicros.w The w eld reports the amount of time in microseconds that operations have spent waiting a write lock on the admin database. locks.local The locks.local (page 539) document contains two sub-documents that reports data regarding lock use in the local database. The local database contains a number of instance specic data, including the oplog for replication. locks.local.timeLockedMicros The locks.local.timeLockedMicros (page 539) document reports on the amount of time in microseconds that locks have existed in the context of the local database. locks.local.timeLockedMicros.r The r eld reports the amount of time in microseconds that the local database has held the read lock. locks.local.timeLockedMicros.w The w eld reports the amount of time in microseconds that the local database has held the write lock.

locks.local.timeAcquiringMicros
The locks.local.timeAcquiringMicros (page 539) document reports the amount of time, in microseconds, that operations have spent waiting to acquire a lock for the local database.

locks.local.timeAcquiringMicros.r
The r field reports the amount of time, in microseconds, that operations have spent waiting for a read lock on the local database.

locks.local.timeAcquiringMicros.w
The w field reports the amount of time, in microseconds, that operations have spent waiting for a write lock on the local database.

locks.<database>
For each additional database, locks (page 569) includes a document that reports on the lock use for this database. The names of these documents reflect the database name itself.

locks.<database>.timeLockedMicros
The locks.<database>.timeLockedMicros (page 540) document reports the amount of time, in microseconds, that locks have existed in the context of the <database> database.

locks.<database>.timeLockedMicros.r
The r field reports the amount of time, in microseconds, that the <database> database has held the read lock.

locks.<database>.timeLockedMicros.w
The w field reports the amount of time, in microseconds, that the <database> database has held the write lock.

locks.<database>.timeAcquiringMicros
The locks.<database>.timeAcquiringMicros (page 540) document reports the amount of time, in microseconds, that operations have spent waiting to acquire a lock for the <database> database.

locks.<database>.timeAcquiringMicros.r
The r field reports the amount of time, in microseconds, that operations have spent waiting for a read lock on the <database> database.

locks.<database>.timeAcquiringMicros.w
The w field reports the amount of time, in microseconds, that operations have spent waiting for a write lock on the <database> database.
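One way to read the locks counters is to total, for each database, the time spent holding locks and the time spent waiting for them. The following is a minimal sketch, not part of MongoDB: sumCounters and lockWaitSummary are hypothetical helper names, the sample values are made up, and the sketch assumes the counters arrive as plain numbers. In the mongo shell you could pass db.serverStatus().locks in place of the sample object.

```javascript
// Hypothetical helper (not part of MongoDB): add up the R, W, r and w
// counters of one sub-document. Missing counters are treated as 0.
function sumCounters(doc) {
  var total = 0;
  var keys = ["R", "W", "r", "w"];
  for (var i = 0; i < keys.length; i++) {
    total += (doc && doc[keys[i]]) || 0;
  }
  return total;
}

// Hypothetical helper: per-database totals of time held and time waited.
function lockWaitSummary(locks) {
  var summary = {};
  for (var name in locks) {
    if (!Object.prototype.hasOwnProperty.call(locks, name)) continue;
    summary[name] = {
      heldMicros: sumCounters(locks[name].timeLockedMicros),
      waitedMicros: sumCounters(locks[name].timeAcquiringMicros)
    };
  }
  return summary;
}

// Made-up sample values, for illustration only.
var sampleLocks = {
  admin: { timeLockedMicros: { r: 100, w: 0 }, timeAcquiringMicros: { r: 10, w: 0 } },
  local: { timeLockedMicros: { r: 500, w: 300 }, timeAcquiringMicros: { r: 40, w: 20 } }
};
var lockSummary = lockWaitSummary(sampleLocks);
// lockSummary.local is { heldMicros: 800, waitedMicros: 60 }
```

A database whose waitedMicros is large relative to its heldMicros may deserve a closer look, although what counts as large depends on the deployment.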

32.2.3 globalLock
Example output of the globalLock fields (page 534).

globalLock
The globalLock (page 540) data structure contains information regarding the database's current lock state, historical lock status, current operation queue, and the number of active clients.

globalLock.totalTime
The value of globalLock.totalTime (page 540) represents the time, in microseconds, since the database last started and created the globalLock (page 540). This is roughly equivalent to total server uptime.

globalLock.lockTime
The value of globalLock.lockTime (page 540) represents the time, in microseconds, since the database last started, that the globalLock (page 540) has been held. Consider this value in combination with the value of globalLock.totalTime (page 540). MongoDB aggregates these values in the globalLock.ratio (page 541) value. If the globalLock.ratio (page 541) value is small but globalLock.totalTime (page 540) is high, the globalLock (page 540) has typically been held frequently for shorter periods of time, which may be indicative of a more normal use pattern. If the globalLock.lockTime (page 540) is higher and the globalLock.totalTime (page 540) is smaller (relatively), then fewer operations are responsible for a greater portion of the server's use (relatively).

globalLock.ratio
Changed in version 2.2: globalLock.ratio (page 541) was removed. See locks (page 569).
The value of globalLock.ratio (page 541) displays the relationship between globalLock.lockTime (page 540) and globalLock.totalTime (page 540). Low values indicate that operations have held the globalLock (page 540) frequently for shorter periods of time. High values indicate that operations have held globalLock (page 540) infrequently for longer periods of time.

globalLock.currentQueue
The globalLock.currentQueue (page 541) data structure provides more granular information concerning the number of operations queued because of a lock.

globalLock.currentQueue.total
The value of globalLock.currentQueue.total (page 541) provides a combined total of operations queued waiting for the lock. A consistently small queue, particularly of shorter operations, should cause no concern. Also, consider this value in light of the size of the queue waiting for the read lock (e.g. globalLock.currentQueue.readers (page 541)) and the write lock (e.g. globalLock.currentQueue.writers (page 541)) individually.

globalLock.currentQueue.readers
The value of globalLock.currentQueue.readers (page 541) is the number of operations that are currently queued and waiting for the read lock. A consistently small read queue, particularly of shorter operations, should cause no concern.

globalLock.currentQueue.writers
The value of globalLock.currentQueue.writers (page 541) is the number of operations that are currently queued and waiting for the write lock. A consistently small write queue, particularly of shorter operations, is no cause for concern.

globalLock.activeClients
The globalLock.activeClients (page 541) data structure provides more granular information about the number of connected clients and the operation types (e.g. read or write) performed by these clients. Use this data to provide context for the currentQueue (page 541) data.

globalLock.activeClients.total
The value of globalLock.activeClients.total (page 541) is the total number of active client connections to the database. This combines clients that are performing read operations (e.g. globalLock.activeClients.readers (page 541)) and clients that are performing write operations (e.g. globalLock.activeClients.writers (page 541)).

globalLock.activeClients.readers
The value of globalLock.activeClients.readers (page 541) contains a count of the active client connections performing read operations.

globalLock.activeClients.writers
The value of globalLock.activeClients.writers (page 541) contains a count of active client connections performing write operations.
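The relationship between globalLock.lockTime and globalLock.totalTime can be computed directly, which is useful where globalLock.ratio is not reported. This is a sketch under stated assumptions: globalLockRatio and queueDepth are hypothetical helper names, and the sample document holds made-up values. In the mongo shell you could pass db.serverStatus().globalLock instead.

```javascript
// Hypothetical helper: fraction of uptime during which the global lock
// was held. Returns 0 when totalTime is missing or zero.
function globalLockRatio(globalLock) {
  if (!globalLock.totalTime) return 0;
  return globalLock.lockTime / globalLock.totalTime;
}

// Hypothetical helper: queued readers plus queued writers.
function queueDepth(globalLock) {
  var queue = globalLock.currentQueue || {};
  return (queue.readers || 0) + (queue.writers || 0);
}

// Made-up sample values, for illustration only.
var sampleGlobalLock = {
  totalTime: 1000000,
  lockTime: 25000,
  currentQueue: { total: 3, readers: 1, writers: 2 }
};
var lockRatio = globalLockRatio(sampleGlobalLock); // 25000 / 1000000
var queued = queueDepth(sampleGlobalLock);         // 1 + 2
```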

32.2.4 mem
Example output of the memory fields (page 534).

mem
The mem data structure holds information regarding the target system architecture of mongod and current memory use.

mem.bits
The value of mem.bits (page 542) is either 64 or 32, depending on the target architecture specified during the mongod compilation process. In most instances this is 64, and this value does not change over time.

mem.resident
The value of mem.resident (page 542) is roughly equivalent to the amount of RAM, in megabytes, currently used by the database process. In normal use this value tends to grow. On dedicated database servers this number tends to approach the total amount of system memory.

mem.virtual
mem.virtual (page 542) displays the quantity, in megabytes, of virtual memory used by the mongod process. In typical deployments this value is slightly larger than mem.mapped (page 542). If this value is significantly (i.e. gigabytes) larger than mem.mapped (page 542), this could indicate a memory leak. With journaling enabled, the value of mem.virtual (page 542) is twice the value of mem.mapped (page 542).

mem.supported
mem.supported (page 542) is true when the underlying system supports extended memory information. If this value is false and the system does not support extended memory information, then other mem values may not be accessible to the database server.

mem.mapped
The value of mem.mapped (page 542) provides the amount of memory, in megabytes, mapped by the database. Because MongoDB uses memory-mapped files, this value is likely to be roughly equivalent to the total size of your database or databases.
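The comparison between mem.virtual and mem.mapped described above can be sketched as a small check. The helper name virtualExcess is hypothetical, the sample values are made up, and the result is in whatever unit the two fields share. In the mongo shell you could pass db.serverStatus().mem.

```javascript
// Hypothetical helper: how far mem.virtual exceeds the value the text
// above leads one to expect (about mem.mapped, or about twice mem.mapped
// when journaling is enabled). The result uses the same unit as the inputs.
function virtualExcess(mem, journalingEnabled) {
  var expected = journalingEnabled ? 2 * mem.mapped : mem.mapped;
  return mem.virtual - expected;
}

// Made-up sample values, for illustration only.
var sampleMem = { bits: 64, resident: 900, virtual: 4300, mapped: 2000, supported: true };
var excessWithJournal = virtualExcess(sampleMem, true);     // 4300 - 4000
var excessWithoutJournal = virtualExcess(sampleMem, false); // 4300 - 2000
```

A small excess is expected; a large and growing one is the pattern the mem.virtual entry describes as a possible memory leak.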

32.2.5 connections
Example output of the connections fields (page 534).

connections
The connections sub-document contains data regarding the current connection status and availability of the database server. Use these values to assess the current load and capacity requirements of the server.

connections.current
The value of connections.current (page 542) corresponds to the number of connections to the database server from clients. This number includes the current shell session. Consider the value of connections.available (page 543) to add more context to this datum. This figure will include the current shell connection as well as any inter-node connections to support a replica set or shard cluster.

connections.available
connections.available (page 543) provides a count of the number of unused available connections that the database can provide. Consider this value in combination with the value of connections.current (page 542) to understand the connection load on the database.
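Reading connections.current together with connections.available amounts to computing how much of the connection capacity is in use. A minimal sketch, assuming that capacity is the sum of the two fields; connectionUtilization is a hypothetical helper name and the sample values are made up. In the mongo shell you could pass db.serverStatus().connections.

```javascript
// Hypothetical helper: share of connection capacity currently in use,
// assuming capacity = current + available.
function connectionUtilization(connections) {
  var capacity = connections.current + connections.available;
  return capacity > 0 ? connections.current / capacity : 0;
}

// Made-up sample values, for illustration only.
var sampleConnections = { current: 20, available: 780 };
var utilization = connectionUtilization(sampleConnections); // 20 / 800
```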

32.2.6 extra_info
Example output of the extra_info fields (page 535).

extra_info
The extra_info (page 543) data structure holds data collected by the mongod instance about the underlying system. Your system may only report a subset of these fields.

extra_info.note
The extra_info.note (page 543) field reports that the data in this structure depend on the underlying platform, and has the text: "fields vary by platform."

extra_info.heap_usage_bytes
The extra_info.heap_usage_bytes (page 543) field is only available on Unix/Linux systems, and reports the total size, in bytes, of heap space used by the database process.

extra_info.page_faults
The extra_info.page_faults (page 543) field is only available on Unix/Linux systems, and reports the total number of page faults that require disk operations. Page faults refer to operations that require the database server to access data which isn't available in active memory. The page_faults (page 543) counter may increase dramatically during moments of poor performance and may correlate with limited-memory environments and larger data sets. Limited and sporadic page faults do not in and of themselves indicate an issue.

32.2.7 indexCounters
Example output of the indexCounters fields (page 535).

indexCounters
Changed in version 2.2: Previously, data in the indexCounters (page 543) document reported sampled data, and were only useful in relative comparison to each other, because they could not reflect absolute index use. In 2.2 and later, these data reflect actual index use.
The indexCounters (page 543) data structure reports information regarding the state and use of indexes in MongoDB.

indexCounters.btree
The indexCounters.btree (page 543) data structure contains data regarding MongoDB's btree indexes.

indexCounters.btree.accesses
indexCounters.btree.accesses (page 543) reports the number of times that operations have accessed indexes. This value is the combination of the indexCounters.btree.hits (page 544) and indexCounters.btree.misses (page 544). Higher values indicate that your database has indexes and that queries are taking advantage of these indexes. If this number does not grow over time, this might indicate that your indexes do not effectively support your use.

indexCounters.btree.hits
The indexCounters.btree.hits (page 544) value reflects the number of times that an index has been accessed and mongod is able to return the index from memory. A higher value indicates effective index use. indexCounters.btree.hits (page 544) values that represent a greater proportion of the indexCounters.btree.accesses (page 543) value tend to indicate more effective index configuration.

indexCounters.btree.misses
The indexCounters.btree.misses (page 544) value represents the number of times that an operation attempted to access an index that was not in memory. These misses do not indicate a failed query or operation, but rather an inefficient use of the index. Lower values in this field indicate better index use and likely overall performance as well.

indexCounters.btree.resets
The indexCounters.btree.resets value reflects the number of times that the index counters have been reset since the database last restarted. Typically this value is 0, but use this value to provide context for the data specified by other indexCounters (page 543) values.

indexCounters.btree.missRatio
The indexCounters.btree.missRatio (page 544) value is the ratio of indexCounters.btree.hits (page 544) to indexCounters.btree.misses (page 544). This value is typically 0 or approaching 0.
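The proportion of index accesses served from memory, which the indexCounters.btree.hits entry treats as a sign of effective index configuration, can be sketched as follows. indexHitProportion is a hypothetical helper name and the sample values are made up. In the mongo shell you could pass db.serverStatus().indexCounters.btree.

```javascript
// Hypothetical helper: hits as a proportion of all index accesses.
// Returns 0 when no accesses have been recorded.
function indexHitProportion(btree) {
  return btree.accesses > 0 ? btree.hits / btree.accesses : 0;
}

// Made-up sample values, for illustration only (accesses = hits + misses).
var sampleBtree = { accesses: 1000, hits: 990, misses: 10, resets: 0 };
var hitProportion = indexHitProportion(sampleBtree); // 990 / 1000
```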

32.2.8 backgroundFlushing
Example output of the backgroundFlushing fields (page 535).

backgroundFlushing
mongod periodically flushes writes to disk. In the default configuration, this happens every 60 seconds. The backgroundFlushing (page 544) data structure contains data regarding these operations. Consider these values if you have concerns about write performance and journaling (page 549).

backgroundFlushing.flushes
backgroundFlushing.flushes (page 544) is a counter that collects the number of times the database has flushed all writes to disk. This value will grow as the database runs for longer periods of time.

backgroundFlushing.total_ms
The backgroundFlushing.total_ms (page 544) value provides the total number of milliseconds (ms) that the mongod process has spent writing (i.e. flushing) data to disk. Because this is an absolute value, consider the value of backgroundFlushing.flushes (page 544) and backgroundFlushing.average_ms (page 544) to provide better context for this datum.

backgroundFlushing.average_ms
The backgroundFlushing.average_ms (page 544) value describes the relationship between the number of flushes and the total amount of time that the database has spent writing data to disk. The larger backgroundFlushing.flushes (page 544) is, the more likely this value is to represent a normal time; however, abnormal data can skew this value. Use backgroundFlushing.last_ms (page 544) to ensure that a high average is not skewed by a transient historical issue or a random write distribution.

backgroundFlushing.last_ms
The value of the backgroundFlushing.last_ms (page 544) field is the amount of time, in milliseconds, that the last flush operation took to complete. Use this value to verify that the current performance of the server is in line with the historical data provided by backgroundFlushing.average_ms (page 544) and backgroundFlushing.total_ms (page 544).

backgroundFlushing.last_finished
The backgroundFlushing.last_finished (page 545) field provides a timestamp of the last completed flush operation in the ISODate format. If this value is more than a few minutes old relative to your server's current time, accounting for differences in time zone, restarting the database may result in some data loss. Also consider ongoing operations that might skew this value by routinely blocking write operations.
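Comparing the most recent flush with the historical average can be sketched as a ratio. The helper name flushSlowdown is hypothetical, the sample values are made up, and the average is recomputed here from total_ms and flushes, following the description of average_ms above. In the mongo shell you could pass db.serverStatus().backgroundFlushing.

```javascript
// Hypothetical helper: how many times longer the last flush took than
// the historical average (total_ms / flushes). Returns 0 when there is
// no history to compare against.
function flushSlowdown(flushing) {
  if (!flushing.flushes) return 0;
  var average = flushing.total_ms / flushing.flushes;
  return average > 0 ? flushing.last_ms / average : 0;
}

// Made-up sample values, for illustration only.
var sampleFlushing = { flushes: 100, total_ms: 2500, average_ms: 25, last_ms: 120 };
var slowdown = flushSlowdown(sampleFlushing); // 120 / (2500 / 100)
```

A value near 1 suggests the last flush was typical; a much larger value suggests the current flush time is out of line with the history.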

32.2.9 cursors
Example output of the cursors (page 535) fields.

cursors
The cursors data structure contains data regarding cursor state and use.

cursors.totalOpen
cursors.totalOpen (page 545) provides the number of cursors that MongoDB is maintaining for clients. Because MongoDB exhausts unused cursors, this value is typically small or zero. However, if there is a queue, stale tailable cursors, or a large number of operations, this value may rise.

cursors.clientCursors_size
Deprecated since version 1.x: See cursors.totalOpen (page 545) for this datum.

cursors.timedOut
cursors.timedOut (page 545) provides a counter of the total number of cursors that have timed out since the server process started. If this number is large or growing at a regular rate, this may indicate an application error.

32.2.10 network
Example output of the network fields (page 535).

network
The network data structure contains data regarding MongoDB's network use.

network.bytesIn
The value of the network.bytesIn (page 545) field reflects the amount of network traffic, in bytes, received by this database. Use this value to ensure that network traffic sent to the mongod process is consistent with expectations and overall inter-application traffic.

network.bytesOut
The value of the network.bytesOut (page 545) field reflects the amount of network traffic, in bytes, sent from this database. Use this value to ensure that network traffic sent by the mongod process is consistent with expectations and overall inter-application traffic.

network.numRequests
The network.numRequests (page 545) field is a counter of the total number of distinct requests that the server has received. Use this value to provide context for the network.bytesIn (page 545) and network.bytesOut (page 545) values to ensure that MongoDB's network utilization is consistent with expectations and application use.
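Using network.numRequests as context for the byte counters can be as simple as dividing one by the other. The following sketch uses a hypothetical helper name, bytesPerRequest, and made-up sample values. In the mongo shell you could pass db.serverStatus().network.

```javascript
// Hypothetical helper: average bytes received and sent per request.
function bytesPerRequest(network) {
  if (!network.numRequests) return { inbound: 0, outbound: 0 };
  return {
    inbound: network.bytesIn / network.numRequests,
    outbound: network.bytesOut / network.numRequests
  };
}

// Made-up sample values, for illustration only.
var sampleNetwork = { bytesIn: 4000, bytesOut: 12000, numRequests: 100 };
var perRequest = bytesPerRequest(sampleNetwork);
// perRequest is { inbound: 40, outbound: 120 }
```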

32.2.11 repl
Example output of the repl fields (page 535).

repl
The repl data structure contains status information for MongoDB's replication (i.e. replica set) configuration. These values only appear when the current host has replication enabled. See Replication Fundamentals (page 33) for more information on replication.

repl.setName
The repl.setName (page 546) field contains a string with the name of the current replica set. This value reflects the --replSet (page 490) command line argument, or the replSet (page 501) value in the configuration file. See Replication Fundamentals (page 33) for more information on replication.

repl.ismaster
The value of the repl.ismaster (page 546) field is either true or false and reflects whether the current node is the master or primary node in the replica set. See Replication Fundamentals (page 33) for more information on replication.

repl.secondary
The value of the repl.secondary (page 546) field is either true or false and reflects whether the current node is a secondary node in the replica set. See Replication Fundamentals (page 33) for more information on replication.

repl.hosts
repl.hosts (page 546) is an array that lists the other nodes in the current replica set. Each member of the replica set appears in the form of hostname:port. See Replication Fundamentals (page 33) for more information on replication.

32.2.12 opcountersRepl
Example output of the opcountersRepl fields (page 536).

opcountersRepl
The opcountersRepl (page 546) data structure, similar to the opcounters data structure, provides an overview of database replication operations by type and makes it possible to analyze the load on the replica in a more granular manner. These values only appear when the current host has replication enabled. These values will differ from the opcounters values because of how MongoDB serializes operations during replication. See Replication Fundamentals (page 33) for more information on replication.

These numbers will grow over time and in response to database use. Analyze these values over time to track database utilization.

opcountersRepl.insert
opcountersRepl.insert (page 547) provides a counter of the total number of replicated insert operations since the mongod instance last started.

opcountersRepl.query
opcountersRepl.query (page 547) provides a counter of the total number of replicated queries since the mongod instance last started.

opcountersRepl.update
opcountersRepl.update (page 547) provides a counter of the total number of replicated update operations since the mongod instance last started.

opcountersRepl.delete
opcountersRepl.delete (page 547) provides a counter of the total number of replicated delete operations since the mongod instance last started.

opcountersRepl.getmore
opcountersRepl.getmore (page 547) provides a counter of the total number of getmore operations since the mongod instance last started. This counter can be high even if the query count is low. Secondary nodes send getMore operations as part of the replication process.

opcountersRepl.command
opcountersRepl.command (page 547) provides a counter of the total number of replicated commands issued to the database since the mongod instance last started.

32.2.13 replNetworkQueue
New in version 2.1.2.

Example output of the replNetworkQueue fields (page 536).

replNetworkQueue
The replNetworkQueue (page 547) document reports on the network replication buffer, which permits replication operations to happen in the background. This feature is internal. This document only appears on secondary members of replica sets.

replNetworkQueue.waitTimeMs
replNetworkQueue.waitTimeMs (page 547) reports the amount of time that a secondary waits to add operations to the network queue. This value is cumulative.

replNetworkQueue.numElems
replNetworkQueue.numElems (page 547) reports the number of operations stored in the queue.

replNetworkQueue.numBytes
replNetworkQueue.numBytes (page 547) reports the total size of the network replication queue.

32.2.14 opcounters
Example output of the opcounters fields (page 536).

opcounters
The opcounters data structure provides an overview of database operations by type and makes it possible to analyze the load on the database in a more granular manner. These numbers will grow over time and in response to database use. Analyze these values over time to track database utilization.

opcounters.insert
opcounters.insert (page 548) provides a counter of the total number of insert operations since the mongod instance last started.

opcounters.query
opcounters.query (page 548) provides a counter of the total number of queries since the mongod instance last started.

opcounters.update
opcounters.update (page 548) provides a counter of the total number of update operations since the mongod instance last started.

opcounters.delete
opcounters.delete (page 548) provides a counter of the total number of delete operations since the mongod instance last started.

opcounters.getmore
opcounters.getmore (page 548) provides a counter of the total number of getmore operations since the mongod instance last started. This counter can be high even if the query count is low. Secondary nodes send getMore operations as part of the replication process.

opcounters.command
opcounters.command (page 548) provides a counter of the total number of commands issued to the database since the mongod instance last started.
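Because these counters only grow, tracking utilization over time means comparing two samples. The sketch below turns two opcounters documents into per-second rates; opRates is a hypothetical helper name, the sample values are made up, and the 60-second spacing is an arbitrary choice. In the mongo shell the two samples could come from two calls to db.serverStatus().opcounters. The sketch ignores the case where the server restarted between samples and the counters started again from zero.

```javascript
// Hypothetical helper: per-second rate of each operation type between
// two opcounters samples taken elapsedSeconds apart.
function opRates(earlier, later, elapsedSeconds) {
  var rates = {};
  for (var op in later) {
    if (!Object.prototype.hasOwnProperty.call(later, op)) continue;
    rates[op] = (later[op] - (earlier[op] || 0)) / elapsedSeconds;
  }
  return rates;
}

// Made-up sample values, for illustration only.
var firstSample = { insert: 100, query: 2000, update: 50, "delete": 5, getmore: 10, command: 300 };
var secondSample = { insert: 160, query: 2600, update: 80, "delete": 5, getmore: 10, command: 360 };
var rates = opRates(firstSample, secondSample, 60);
// rates.insert is 1, rates.query is 10, rates.update is 0.5
```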

32.2.15 asserts
Example output of the asserts fields (page 536).

asserts
The asserts data structure provides an account of the number of asserts on the database. While assert errors are typically uncommon, if there are non-zero values for the asserts, you should check the log file for the mongod process for more information. In many cases these errors are trivial, but are worth investigating.

asserts.regular
The asserts.regular (page 548) counter tracks the number of regular assertions raised since the server process started. Check the log file for more information about these messages.

asserts.warning
The asserts.warning (page 548) counter tracks the number of warnings raised since the server process started. Check the log file for more information about these warnings.

asserts.msg
The asserts.msg (page 548) counter tracks the number of message assertions raised since the server process started. Check the log file for more information about these messages.

asserts.user
The asserts.user counter reports the number of user asserts that have occurred since the last time the server process started. These are errors that a user may generate, such as out of disk space or duplicate key. You can prevent these assertions by fixing a problem with your application or deployment. Check the MongoDB log for more information.

asserts.rollovers
The asserts.rollovers (page 549) counter displays the number of times that the counters have rolled over since the last time the server process started. The counters will roll over to zero after 2^30 assertions. Use this value to provide context to the other values in the asserts data structure.

32.2.16 writeBacksQueued
Example output of the writeBacksQueued fields (page 536).

writeBacksQueued
The value of writeBacksQueued (page 549) is true when there are operations from a mongos instance queued for retrying. Typically this value is false.

See Also: writeBacks

32.2.17 dur
New in version 1.8.

Journaling

Example output of the journaling fields (page 536).

dur
The dur (for durability) data structure contains data regarding MongoDB's journaling. mongod must be running with journaling for these data to appear in the output of serverStatus. See the Journaling wiki page for more information about journaling operations.

dur.commits
The dur.commits (page 549) value provides the number of commits to the journal in the last commit interval. MongoDB groups commits to the journal to improve performance. By default the interval is 100 milliseconds (ms), but the interval is configurable as a run-time option and can range from 2ms to 300ms.

dur.journaledMB
The dur.journaledMB (page 549) value provides the amount of data, in megabytes (MB), written to the journal in the last commit interval. MongoDB groups commits to the journal to improve performance. By default the commit interval is 100 milliseconds (ms), but the interval is configurable as a run-time option and can range from 2ms to 300ms.

dur.writeToDataFilesMB
The dur.writeToDataFilesMB (page 549) value provides the amount of data, in megabytes (MB), written from the journal to the data files in the last commit interval. MongoDB groups commits to the journal to improve performance. By default the commit interval is 100 milliseconds (ms), but the interval is configurable as a run-time option and can range from 2ms to 300ms.

dur.compression
New in version 2.0.
The dur.compression (page 550) value represents the compression ratio of the journal.

dur.commitsInWriteLock
The value of the dur.commitsInWriteLock (page 550) field provides a count of the commits that occurred while a write lock was held. Commits in a write lock are undesirable and may indicate a capacity limitation for the database.

dur.earlyCommits
The dur.earlyCommits (page 550) value reflects the number of times MongoDB requested a commit before the scheduled commit interval. Use this value to ensure that your journal commit interval is not too long for your deployment.

dur.timeMS
The dur.timeMS (page 550) data structure provides information about the performance of the mongod instance for journaling operations.

dur.timeMS.dt
The dur.timeMS.dt (page 550) value provides, in milliseconds, the length of time over which MongoDB collected the dur.timeMS (page 550) data. Use this field to provide context to the adjacent values.

dur.timeMS.prepLogBuffer
The dur.timeMS.prepLogBuffer (page 550) value provides, in milliseconds, the amount of time spent preparing to write to the journal. Smaller values indicate better journal performance.

dur.timeMS.writeToJournal
The dur.timeMS.writeToJournal (page 550) value provides, in milliseconds, the amount of time spent actually writing to the journal. File system speeds and device interfaces can affect performance.

dur.timeMS.writeToDataFiles
The dur.timeMS.writeToDataFiles (page 550) value provides, in milliseconds, the amount of time spent writing to data files after journaling. File system speeds and device interfaces can affect performance.

dur.timeMS.remapPrivateView
The dur.timeMS.remapPrivateView (page 550) value provides, in milliseconds, the amount of time spent remapping copy-on-write memory mapped views. Smaller values indicate better journal performance.
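Using dur.timeMS.dt as context for the adjacent values can be sketched as a share of the collection window. This is illustrative only: journalingShare is a hypothetical helper name, the sample values are made up, and adding the four phases together assumes they do not overlap, which the reference above does not state. In the mongo shell you could pass db.serverStatus().dur.timeMS.

```javascript
// Hypothetical helper: share of the collection window (dt) accounted for
// by the four journaling phases. Assumes the phases do not overlap.
function journalingShare(timeMS) {
  if (!timeMS.dt) return 0;
  var busy = timeMS.prepLogBuffer + timeMS.writeToJournal +
             timeMS.writeToDataFiles + timeMS.remapPrivateView;
  return busy / timeMS.dt;
}

// Made-up sample values, for illustration only.
var sampleTimeMS = {
  dt: 3000,
  prepLogBuffer: 1,
  writeToJournal: 20,
  writeToDataFiles: 30,
  remapPrivateView: 9
};
var share = journalingShare(sampleTimeMS); // (1 + 20 + 30 + 9) / 3000
```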

32.2.18 recordStats
Example output of the recordStats (page 537) fields.

recordStats
The recordStats (page 550) document provides fine-grained reporting on page faults on a per-database level.

recordStats.accessesNotInMemory
recordStats.accessesNotInMemory (page 550) reflects the number of times mongod needed to access a memory page that was not resident in memory for all databases managed by this mongod instance.

recordStats.pageFaultExceptionsThrown
recordStats.pageFaultExceptionsThrown (page 550) reflects the number of page fault exceptions thrown by mongod when accessing data for all databases managed by this mongod instance.

recordStats.local.accessNotInMemory
recordStats.local.accessNotInMemory (page 550) reflects the number of times mongod needed to access a memory page that was not resident in memory for the local database.

recordStats.local.pageFaultExceptionsThrown
recordStats.local.pageFaultExceptionsThrown (page 551) reflects the number of page fault exceptions thrown by mongod when accessing data for the local database.

recordStats.admin.accessNotInMemory
recordStats.admin.accessNotInMemory (page 551) reflects the number of times mongod needed to access a memory page that was not resident in memory for the admin database.

recordStats.admin.pageFaultExceptionsThrown
recordStats.admin.pageFaultExceptionsThrown (page 551) reflects the number of page fault exceptions thrown by mongod when accessing data for the admin database.

recordStats.<database>.accessNotInMemory
recordStats.<database>.accessNotInMemory (page 551) reflects the number of times mongod needed to access a memory page that was not resident in memory for the <database> database.

recordStats.<database>.pageFaultExceptionsThrown
recordStats.<database>.pageFaultExceptionsThrown (page 551) reflects the number of page fault exceptions thrown by mongod when accessing data for the <database> database.

32.3 Database Statistics Reference


32.3.1 Synopsis
MongoDB can report data that reflects the current state of the active database. In this context, database refers to a single MongoDB database. To run dbStats, issue this command in the shell:
db.runCommand( { dbStats: 1 } )

The mongo shell provides the helper function db.stats() (page 472). Use the following form:
db.stats()

The above commands are equivalent. Without any arguments, db.stats() (page 472) returns values in bytes. To convert the returned values to kilobytes, use the scale argument:
db.stats(1024)

Or:
db.runCommand( { dbStats: 1, scale: 1024 } )

The above commands are equivalent.

Note: Because scaling rounds values to whole numbers, scaling may return unlikely or unexpected results.

See the dbStats database command and the db.stats() (page 472) helper for the mongo shell for additional information.
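The effect of scaling on small values can be illustrated without a server. The sketch below assumes that scaled values are truncated to whole numbers; the text only says that scaling rounds, so treat the use of Math.floor as an assumption. scaleValue is a hypothetical helper name and the byte counts are made up.

```javascript
// Hypothetical helper: apply a scale factor the way the note describes.
// Assumption: the result is truncated to a whole number (Math.floor).
function scaleValue(bytes, scale) {
  return Math.floor(bytes / scale);
}

// Made-up byte counts, for illustration only.
var scaledData = scaleValue(1500, 1024); // 1500 bytes reported in kilobytes
var scaledIndex = scaleValue(400, 1024); // 400 bytes reported in kilobytes
```

Here 1500 bytes is reported as 1 and 400 bytes as 0, which is the kind of unexpected result the note warns about. Values smaller than the scale factor lose all precision.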

32.3.2 Fields
db
Contains the name of the database.

collections
Contains a count of the number of collections in that database.

objects
Contains a count of the number of objects (i.e. documents) in the database across all collections.

avgObjSize
The average size of each object. The scale argument affects this value. This is the dataSize (page 552) divided by the number of objects.

dataSize
The total size of the data held in this database, including the padding factor. The scale argument affects this value. The dataSize (page 552) will not decrease when documents shrink, but will decrease when you remove documents.

storageSize
The total amount of space allocated to collections in this database for document storage. The scale argument affects this value. The storageSize (page 552) does not decrease as you remove or shrink documents.

numExtents
Contains a count of the number of extents in the database across all collections.

indexes
Contains a count of the total number of indexes across all collections in the database.

indexSize
The total size of all indexes created on this database. The scale argument affects this value.

fileSize
The total size of the data files that hold the database. This value includes preallocated space and the padding factor. The value of fileSize (page 552) only reflects the size of the data files for the database and not the namespace file. The scale argument affects this value.

nsSizeMB
The total size of the namespace files (i.e. that end with .ns) for this database. You cannot change the size of the namespace file after creating a database, but you can change the default size for all new namespace files with the nssize (page 499) runtime option.

See Also: The nssize (page 499) option, and Maximum Namespace File Size (page 573)
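The relationship between these fields can be sketched with plain arithmetic. summarizeDbStats is a hypothetical helper name and the sample values are made up; the unusedStorage figure is simply storageSize minus dataSize, offered here as a rough reading of allocated space not currently holding data rather than a value MongoDB reports. In the mongo shell you could pass the result of db.stats().

```javascript
// Hypothetical helper: derive two figures from a dbStats document.
// avgObjSize follows the definition above (dataSize / objects);
// unusedStorage is an illustrative difference, not a reported field.
function summarizeDbStats(stats) {
  return {
    avgObjSize: stats.objects > 0 ? stats.dataSize / stats.objects : 0,
    unusedStorage: stats.storageSize - stats.dataSize
  };
}

// Made-up sample values in bytes, for illustration only.
var sampleStats = { db: "test", collections: 3, objects: 200, dataSize: 51200, storageSize: 81920 };
var dbSummary = summarizeDbStats(sampleStats);
// dbSummary is { avgObjSize: 256, unusedStorage: 30720 }
```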

32.4 Collection Statistics Reference


32.4.1 Synopsis
To fetch collection statistics, call the db.collection.stats() (page 463) method on a collection object in the mongo shell:
db.collection.stats()

You may also use the literal command format:


db.runCommand( { collStats: "collection" } )

Replace collection in both examples with the name of the collection you want statistics for. By default, the return values will appear in terms of bytes. You can, however, enter a scale argument. For example, you can convert the return values to kilobytes like so:
db.collection.stats(1024)

Or:
db.runCommand( { collStats: "collection", scale: 1024 } )

Note: The scale argument rounds values to whole numbers. This can produce unpredictable and unexpected results in some situations.

See Also: The documentation of the collStats command and the db.collection.stats() (page 463) method in the mongo shell.
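To see why whole-number scaling can surprise, consider this small sketch. It assumes that scaled values are truncated; the manual only states that values are rounded to whole numbers, so the rounding mode here is an assumption, and scaleBytes is a hypothetical helper rather than part of the shell.

```javascript
// Assumption: the server truncates scaled values to whole numbers.
function scaleBytes(bytes, scale) {
  return Math.floor(bytes / scale);
}

console.log(scaleBytes(1023, 1024));    // a collection just under 1 KB
console.log(scaleBytes(1536, 1024));    // 1.5 KB
console.log(scaleBytes(1048576, 1024)); // exactly 1 MB
```

Under this assumption, any value smaller than the scale factor is reported as 0, which is why small collections can appear empty when you request kilobytes or megabytes.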

32.4.2 Fields
ns The namespace of the current collection, which follows the format [database].[collection].
count The number of objects or documents in this collection.
size The size of the collection. The scale argument affects this value.
avgObjSize The average size of an object in the collection. The scale argument affects this value.
storageSize The total amount of storage allocated to this collection for document storage. The scale argument affects this value. The storageSize (page 552) does not decrease as you remove or shrink documents.
numExtents The total number of contiguously allocated data file regions.
nindexes The number of indexes on the collection. On standard, non-capped collections, there is always at least one index on the primary key (i.e. _id).
lastExtentSize The size of the last extent allocated. The scale argument affects this value.
paddingFactor The amount of space added to the end of each document at insert time. The document padding provides a small amount of extra space on disk to allow a document to grow slightly without needing to move the document. mongod automatically calculates this padding factor.
flags Changed in version 2.2: Removed in version 2.2 and replaced with the userFlags (page 553) and systemFlags (page 553) fields. Indicates the number of flags on the current collection. In version 2.0, the only flag notes the existence of an index on the _id field.
systemFlags New in version 2.2. Reports the flags on this collection that reflect internal server options. Typically this value is 1 and reflects the existence of an index on the _id field.
userFlags New in version 2.2. Reports the flags on this collection set by the user. In version 2.2 the only user flag is usePowerOf2Sizes. See collMod for more information on setting user flags and usePowerOf2Sizes.
totalIndexSize The total size of all indexes. The scale argument affects this value.
indexSizes This field specifies the key and size of every existing index on the collection. The scale argument affects this value.
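As an illustration of how totalIndexSize and indexSizes relate, the sketch below sums the per-index sizes from a copy of the collection statistics and finds the largest index. The sample document, its index names, and the helpers sumIndexSizes and largestIndex are all hypothetical.

```javascript
// Hypothetical db.collection.stats() output; values are invented.
const collStatsSample = {
  ns: "test.people",
  count: 1000,
  size: 112000,
  nindexes: 2,
  totalIndexSize: 73584,
  indexSizes: { _id_: 40880, name_1: 32704 }
};

// Add up the size reported for every index on the collection.
function sumIndexSizes(stats) {
  return Object.values(stats.indexSizes).reduce((sum, n) => sum + n, 0);
}

// Return [indexName, size] for the biggest index, or null if there are none.
function largestIndex(stats) {
  const entries = Object.entries(stats.indexSizes);
  if (entries.length === 0) return null;
  return entries.reduce((best, entry) => (entry[1] > best[1] ? entry : best));
}

console.log(sumIndexSizes(collStatsSample) === collStatsSample.totalIndexSize);
console.log(largestIndex(collStatsSample));
```

When a scale argument is in effect, the rounded per-index values may not add up exactly to the rounded total.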

32.5 Collection Validation Data


32.5.1 Synopsis
The collection validation command checks all of the structures within a namespace for correctness and returns a document containing information regarding the on-disk representation of the collection.

Warning: The validate process may consume significant system resources and impede application performance because it must scan all data in the collection.

Run the validation command in the mongo shell using the following form to validate a collection named people:
db.people.validate()

Alternatively you can use the command prototype and the db.runCommand() (page 471) shell helper in the following form:
db.runCommand( { validate: "people", full: true } )
db.people.validate(true)

See Also: validate and validate().

32.5.2 Values
ns The full namespace name of the collection. Namespaces include the database name and the collection name in the form database.collection.
firstExtent The disk location of the first extent in the collection. The value of this field also includes the namespace.
lastExtent The disk location of the last extent in the collection. The value of this field also includes the namespace.
extentCount The number of extents in the collection.
extents validate returns one instance of this document for every extent in the collection. This sub-document is only returned when you specify the full option to the command.
extents.loc The disk location for the beginning of this extent.
extents.xnext The disk location for the extent following this one. null if this is the end of the linked list of extents.
extents.xprev The disk location for the extent preceding this one. null if this is the head of the linked list of extents.
extents.nsdiag The namespace this extent belongs to (should be the same as the namespace shown at the beginning of the validate listing).
extents.size The number of bytes in this extent.
extents.firstRecord The disk location of the first record in this extent.
extents.lastRecord The disk location of the last record in this extent.
datasize The number of bytes in all data records. This value does not include deleted records, extent headers, record headers, or space in a file unallocated to any extent. datasize (page 555) includes record padding.
nrecords The number of documents in the collection.
lastExtentSize The size of the last new extent created in this collection. This value determines the size of the next extent created.
padding A floating point value between 1 and 2. When MongoDB creates a new record it uses the padding factor to determine how much additional space to add to the record. mongod automatically adjusts the padding factor when it notices that update operations are triggering record moves.
firstExtentDetails The size of the first extent created in this collection. This data is similar to the data provided by the extents (page 554) sub-document; however, the data reflects only the first extent in the collection and is always returned.
firstExtentDetails.loc The disk location for the beginning of this extent.
firstExtentDetails.xnext The disk location for the extent following this one. null if this is the end of the linked list of extents, which should only be the case if there is only one extent.
firstExtentDetails.xprev The disk location for the extent preceding this one. This should always be null.
firstExtentDetails.nsdiag The namespace this extent belongs to (should be the same as the namespace shown at the beginning of the validate listing).
firstExtentDetails.size The number of bytes in this extent.
firstExtentDetails.firstRecord The disk location of the first record in this extent.
firstExtentDetails.lastRecord The disk location of the last record in this extent.
objectsFound The number of records actually encountered in a scan of the collection. This field should have the same value as the nrecords (page 555) field.
invalidObjects The number of records containing BSON documents that do not pass a validation check. Note: This field is only included in the validation output when you specify the full option.
bytesWithHeaders This is similar to datasize, except that bytesWithHeaders (page 556) includes the record headers. In version 2.0, record headers are 16 bytes per document. Note: This field is only included in the validation output when you specify the full option.
bytesWithoutHeaders bytesWithoutHeaders (page 556) returns data collected from a scan of all records. The value should be the same as datasize (page 555). Note: This field is only included in the validation output when you specify the full option.
deletedCount The number of deleted or free records in the collection.
deletedSize The size of all deleted or free records in the collection.
nIndexes The number of indexes on the data in the collection.
keysPerIndex A document containing a field for each index, named after the index's name, that contains the number of keys, or documents referenced, included in the index.
valid Boolean. true, unless validate determines that an aspect of the collection is not valid. When false, see the errors (page 556) field for more information.
errors Typically empty; however, if the collection is not valid (i.e. valid (page 556) is false), this field will contain a message describing the validation error.
ok Set to 1 when the command succeeds. If the command fails the ok (page 556) field has a value of 0.
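Several of these values are expected to agree with one another, so a script can cross-check them. The sketch below runs on a plain copy of the validate output. The helper checkValidateOutput and the sample documents are hypothetical, and the header arithmetic (datasize plus 16 bytes per record) is an inference from the version 2.0 description above rather than a documented guarantee.

```javascript
const RECORD_HEADER_BYTES = 16; // per-document header size stated for version 2.0

// Return a list of human-readable inconsistencies found in a validate result.
function checkValidateOutput(result) {
  const problems = [];
  if (result.ok !== 1) problems.push("command did not succeed");
  if (result.valid !== true) problems.push("collection reported as not valid");
  if (result.objectsFound !== result.nrecords) {
    problems.push("objectsFound differs from nrecords");
  }
  // The next two fields only appear when validate runs with the full option.
  if (result.bytesWithoutHeaders !== undefined &&
      result.bytesWithoutHeaders !== result.datasize) {
    problems.push("bytesWithoutHeaders differs from datasize");
  }
  if (result.bytesWithHeaders !== undefined) {
    const expected = result.datasize + RECORD_HEADER_BYTES * result.nrecords;
    if (result.bytesWithHeaders !== expected) {
      problems.push("bytesWithHeaders does not match datasize plus headers");
    }
  }
  return problems;
}

// Hypothetical output of a full validation; the numbers are invented.
const validateSample = {
  ns: "test.people", nrecords: 100, objectsFound: 100, datasize: 4000,
  bytesWithHeaders: 5600, bytesWithoutHeaders: 4000, valid: true, ok: 1
};
console.log(checkValidateOutput(validateSample));
```

An empty list means the cross-checks passed; it does not replace reading the errors field when valid is false.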

32.6 Connection Pool Statistics Reference


32.6.1 Synopsis
mongos instances maintain a pool of connections for interacting with constituent members of the shard cluster. Additionally, mongod instances maintain connections with other shards in the cluster for migrations. The connPoolStats command returns statistics regarding these connections between the mongos and mongod instances or between the mongod instances in a shard cluster.

Note: connPoolStats only returns meaningful results for mongos instances and for mongod instances in shard clusters.

32.6.2 Output
hosts The sub-documents of the hosts (page 557) document report connections between the mongos or mongod instance and each component mongod of the shard cluster.
hosts.[host].available Reports the total number of connections that the mongos or mongod could use to connect to this mongod.
hosts.[host].created Reports the number of connections that this mongos or mongod has ever created for this host.
replicaSets A document that contains replica set information for the shard cluster.
replicaSets.shard Reports on each shard within the shard cluster.
replicaSets.[shard].host Holds an array of documents that report on each host within the shard in the replica set. These values derive from the replica set status (page 558) values.
replicaSets.[shard].host[n].addr Reports the address for the host in the shard cluster in the format [hostname]:[port].
replicaSets.[shard].host[n].ok Reports false when the mongos or mongod cannot connect to the instance, or when the mongos or mongod received a connection exception or error. This field is for internal use.
replicaSets.[shard].host[n].ismaster Reports true if this replicaSets.[shard].host (page 557) is the primary member of the replica set.
replicaSets.[shard].host[n].hidden Reports true if this replicaSets.[shard].host (page 557) is a hidden member of the replica set.
replicaSets.[shard].host[n].secondary Reports true if this replicaSets.[shard].host (page 557) is a secondary member of the replica set.
replicaSets.[shard].host[n].pingTimeMillis Reports the ping time in milliseconds from the mongos or mongod to this replicaSets.[shard].host (page 557).
replicaSets.[shard].host[n].tags New in version 2.2. Reports the members[n].tags (page 562), if this member of the set has tags configured.
replicaSets.[shard].master Reports the ordinal identifier of the host in the replicaSets.[shard].host (page 557) array that is the primary of the replica set.
replicaSets.[shard].nextSlave Deprecated since version 2.2. Reports the secondary member that the mongos will use to service the next request for this replica set.
createdByType A document that reports the number of each type of connection that mongos or mongod has created in all connection pools. mongos connects to mongod instances using one of three types of connections. The following sub-document reports the total number of connections by type.
createdByType.master Reports the total number of connections to the primary member in each shard cluster.
createdByType.set Reports the total number of connections to a replica set member.
createdByType.sync Reports the total number of config database connections.
totalAvailable Reports the running total of connections from the mongos or mongod to all mongod instances in the shard cluster available for use. This value does not reflect those connections that
totalCreated Reports the total number of connections ever created from the mongos or mongod to all mongod instances in the shard cluster.
numDBClientConnection Reports the total number of connections from the mongos or mongod to all of the mongod instances in the shard cluster.
numAScopedConnection Reports the number of exception safe connections created from mongos or mongod to all mongod in the shard cluster. The mongos or mongod releases these connections after receiving a socket exception from the mongod.
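The nesting of this output is easier to follow with a concrete traversal. The sketch below walks a plain-object copy of the connPoolStats output to total the per-host counters and to resolve the primary of a shard through the master ordinal. The sample document, host names, and the helpers sumHostCounters and primaryAddress are hypothetical, and real output contains more fields than shown here.

```javascript
// Hypothetical, trimmed connPoolStats output; names and numbers are invented.
const poolStatsSample = {
  hosts: {
    "shard0a.example.net:27017": { available: 2, created: 10 },
    "shard0b.example.net:27017": { available: 1, created: 4 }
  },
  replicaSets: {
    shard0: {
      host: [
        { addr: "shard0a.example.net:27017", ok: true, ismaster: false, secondary: true },
        { addr: "shard0b.example.net:27017", ok: true, ismaster: true, secondary: false }
      ],
      master: 1
    }
  }
};

// Total the available and created counters across every entry in hosts.
function sumHostCounters(stats) {
  const totals = { available: 0, created: 0 };
  for (const counters of Object.values(stats.hosts)) {
    totals.available += counters.available;
    totals.created += counters.created;
  }
  return totals;
}

// Use the master ordinal to look up the primary's address in the host array.
function primaryAddress(stats, shardName) {
  const shard = stats.replicaSets[shardName];
  return shard.host[shard.master].addr;
}

console.log(sumHostCounters(poolStatsSample));
console.log(primaryAddress(poolStatsSample, "shard0"));
```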

32.7 Replica Status Reference


The replSetGetStatus command provides an overview of the current status of a replica set. Issue the following command against the admin database, in the mongo shell:

db.runCommand( { replSetGetStatus: 1 } )

The value specified (e.g. 1 above) does not impact the output of the command. Data provided by this command derives from data included in heartbeats sent to the current instance by other members of the replica set. Because of the frequency of heartbeats, these data can be several seconds out of date.

Note: The mongod that you issue the replSetGetStatus command to needs to have replication enabled, and be a member of a replica set for this command to return successfully.

See Also: The rs.status() (page 479) function in the mongo shell provides a wrapper around the replSetGetStatus command. Also consider the Replication (page 31) documentation index for more information on replication.

32.7.1 Statuses
rs.status.set The set value is the name of the replica set, configured in the replSet (page 501) setting. This is the same value as _id (page 561) in rs.conf() (page 477).
rs.status.date The value of the date field is an ISODate of the current time, according to the current server. Compare this to the value of the members.lastHeartbeat (page 560) to find the operational lag between the current host and the other hosts in the set.
rs.status.myState The value of myState (page 559) reflects the state of the current replica set member. An integer between 0 and 10 represents the state of the member. These integers map to states, as described in the following table:

Number  State
0       Starting up, phase 1 (parsing configuration)
1       Primary
2       Secondary
3       Recovering (initial syncing, post-rollback, stale members)
4       Fatal error
5       Starting up, phase 2 (forking threads)
6       Unknown state (the set has never connected to the member)
7       Arbiter
8       Down
9       Rollback
10      Removed

rs.status.members The members field holds an array that contains a document for every member in the replica set. See the Member Statuses (page 559) for an overview of the values included in these documents.
rs.status.syncingTo The syncingTo field is only present on the output of rs.status() (page 479) on secondary and recovering members, and holds the hostname of the member from which this instance is syncing.
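The state table above translates directly into a lookup. The following sketch maps a myState or members.state integer to its description; describeState is a hypothetical helper, and the strings are copied from the table.

```javascript
// Index in this array corresponds to the state number in the table.
const REPLICA_STATES = [
  "Starting up, phase 1 (parsing configuration)",
  "Primary",
  "Secondary",
  "Recovering (initial syncing, post-rollback, stale members)",
  "Fatal error",
  "Starting up, phase 2 (forking threads)",
  "Unknown state (the set has never connected to the member)",
  "Arbiter",
  "Down",
  "Rollback",
  "Removed"
];

// Translate a numeric state into its description, tolerating bad input.
function describeState(state) {
  if (Number.isInteger(state) && state >= 0 && state < REPLICA_STATES.length) {
    return REPLICA_STATES[state];
  }
  return "Unrecognized state " + state;
}

console.log(describeState(1));
console.log(describeState(7));
```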

32.7.2 Member Statuses


members.name The name field holds the name of the server.
members.self The self field is only included in the document for the current mongod instance in the members array. Its value is true.
members.errmsg This field contains the most recent error or status message received from the member. This field may be empty (e.g. "") in some cases.
members.health The health value is only present for the other members of the replica set (i.e. not the member that returns rs.status() (page 479)). This field conveys if the member is up (i.e. 1) or down (i.e. 0).
members.state The value of the members.state (page 560) reflects the state of this replica set member. An integer between 0 and 10 represents the state of the member. These integers map to states, as described in the following table:

Number  State
0       Starting up, phase 1 (parsing configuration)
1       Primary
2       Secondary
3       Recovering (initial syncing, post-rollback, stale members)
4       Fatal error
5       Starting up, phase 2 (forking threads)
6       Unknown state (the set has never connected to the member)
7       Arbiter
8       Down
9       Rollback
10      Removed

members.stateStr A string that describes members.state (page 560).
members.uptime The members.uptime (page 560) field holds a value that reflects the number of seconds that this member has been online. This value does not appear for the member that returns the rs.status() (page 479) data.
members.optime A document that contains information regarding the last operation from the operation log that this member has applied.
members.optime.t A 32-bit timestamp of the last operation applied to this member of the replica set from the oplog.
members.optime.i An incremented field, which reflects the number of operations since the last time stamp. This value only increases if there is more than one operation per second.
members.optimeDate An ISODate formatted date string that reflects the last entry from the oplog that this member applied. If this differs significantly from members.lastHeartbeat (page 560) this member is either experiencing replication lag or there have not been any new operations since the last update. Compare members.optimeDate between all of the members of the set.
members.lastHeartbeat The lastHeartbeat value provides an ISODate formatted date of the last heartbeat received from this member. Compare this value to the value of the date (page 559) field to track latency between these members. This value does not appear for the member that returns the rs.status() (page 479) data.
members.pingMS The pingMS value represents the number of milliseconds (ms) that a round-trip packet takes to travel between the remote member and the local instance. This value does not appear for the member that returns the rs.status() (page 479) data.
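Comparing members.optimeDate across members, as suggested above, can be sketched as follows. The code works on a plain copy of the status document; the sample members, dates, and the helper secondsBehindPrimary are hypothetical, and it treats state 1 as primary and state 2 as secondary per the state table.

```javascript
// Hypothetical, trimmed rs.status() output; names and dates are invented.
const statusSample = {
  set: "rs0",
  members: [
    { name: "mongodb0.example.net:27017", state: 1,
      optimeDate: new Date("2012-09-13T10:00:30Z") },
    { name: "mongodb1.example.net:27017", state: 2,
      optimeDate: new Date("2012-09-13T10:00:28Z") },
    { name: "mongodb2.example.net:27017", state: 2,
      optimeDate: new Date("2012-09-13T10:00:30Z") }
  ]
};

// For each secondary, report how many seconds its last applied oplog entry
// trails the primary's. Returns null when no primary is visible.
function secondsBehindPrimary(status) {
  const primary = status.members.find((m) => m.state === 1);
  if (!primary) return null;
  const lag = {};
  for (const member of status.members) {
    if (member.state !== 2) continue; // only secondaries replicate from the oplog
    lag[member.name] = (primary.optimeDate - member.optimeDate) / 1000;
  }
  return lag;
}

console.log(secondsBehindPrimary(statusSample));
```

Because the underlying data arrives by heartbeat, small differences may reflect stale status information rather than real replication lag.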

32.8 Replica Set Configuration


32.8.1 Synopsis
This reference provides an overview of all possible replica set configuration options and settings. Use rs.conf() (page 477) in the mongo shell to retrieve this configuration. Note that default values are not explicitly displayed.

32.8.2 Configuration Variables


rs.conf._id Type: string. Value: <setname>. An _id field holding the name of the replica set. This reflects the set name configured with replSet (page 501) or mongod --replSet (page 490).
rs.conf.members Type: array. Contains an array holding an embedded document for each member of the replica set. The members document contains a number of fields that describe the configuration of each member of the replica set.
members[n]._id Type: ordinal. Provides a zero-indexed identifier of every member in the replica set.
members[n].host Type: <hostname>:<port>. Identifies the host name of the set member with a hostname and port number. This name must be resolvable for every host in the replica set. Warning: members[n].host (page 561) cannot hold a value that resolves to localhost or the local interface unless all members of the set are on hosts that resolve to localhost.
members[n].arbiterOnly Optional. Type: boolean. Default: false. Identifies an arbiter. For arbiters, this value is true, and is automatically configured by rs.addArb() (page 476).
members[n].buildIndexes Optional. Type: boolean. Default: true. Determines whether the mongod builds indexes on this member. Do not set to false if this member can become primary, or if any clients ever issue queries against this instance. Omitting index creation, and thus using this setting, may be useful if you are only using this instance to perform backups using mongodump, this instance will receive no queries, and index creation and maintenance overburdens the host system. If set to false, secondaries configured with this option do build indexes on the _id field, to facilitate operations required for replication.
members[n].hidden Optional. Type: boolean. Default: false. When this value is true, the replica set hides this instance, and does not include the member in the output of db.isMaster() (page 469) or isMaster. This prevents read operations (i.e. queries) from ever reaching this host by way of secondary read preference. See Also: Hidden Replica Set Members (page 40)
members[n].priority Optional. Type: Number, between 0 and 1000 including decimals. Default: 1. Specify higher values to make a node more eligible to become primary, and lower values to make the node less eligible to become primary. Priorities are only used in comparison to each other. Members of the set will veto elections from nodes when another eligible node has a higher absolute priority value. A members[n].priority (page 562) of 0 makes it impossible for a node to become primary. See Also: Replica Set Node Priority (page 34) and Replica Set Elections (page 34).
members[n].tags Optional. Type: MongoDB Document. Default: none. Used to represent arbitrary values for describing or tagging nodes for the purposes of extending write concern (page 50) to allow configurable data center awareness. Use in conjunction with settings.getLastErrorModes (page 563) and settings.getLastErrorDefaults (page 563) and db.getLastError() (page 467) (i.e. getLastError).
members[n].slaveDelay Optional. Type: Integer (seconds). Default: 0. Describes the number of seconds behind the master that this replica set member should lag. Use this option to create delayed nodes (page 40), which maintain a copy of the data that reflects the state of the data set some amount of time in the past (specified in seconds). Typically these nodes help protect against human error, and provide some measure of insurance against the unforeseen consequences of changes and updates.
members[n].votes Optional. Type: Integer. Default: 1. Controls the number of votes a server has in a replica set election (page 34). The number of votes each member has can be any non-negative integer, but it is highly recommended each member has 1 or 0 votes. If you need more than 7 members, use this setting to add additional non-voting members with a members[n].votes (page 563) value of 0. For most deployments and most members, use the default value, 1, for members[n].votes (page 563).
settings Optional. Type: MongoDB Document. The settings document holds two optional fields, which affect the available write concern options and default configurations.
settings.getLastErrorDefaults Optional. Type: MongoDB Document. Specify arguments to getLastError that members of this replica set will use when getLastError has no arguments. If you specify any arguments, getLastError ignores these defaults.
settings.getLastErrorModes Optional. Type: MongoDB Document. Defines the names and combination of tags (page 562) for use by the application layer to guarantee write concern to database using the getLastError command to provide data-center awareness.
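Some of the constraints described above can be checked before calling rs.reconfig(). The sketch below is a hypothetical checker, not part of the shell: checkReplicaSetConfig and its messages are invented, the defaults of 1 for priority and votes come from this reference, and the limit of 7 voting members is read from the votes description rather than stated as a hard rule.

```javascript
// Return a list of problems found in a replica set configuration object.
function checkReplicaSetConfig(cfg) {
  const problems = [];
  let votingMembers = 0;
  for (const member of cfg.members) {
    // Unset fields take the documented defaults.
    const priority = member.priority === undefined ? 1 : member.priority;
    const votes = member.votes === undefined ? 1 : member.votes;
    if (typeof priority !== "number" || priority < 0 || priority > 1000) {
      problems.push("member " + member._id + ": priority must be between 0 and 1000");
    }
    if (!Number.isInteger(votes) || votes < 0) {
      problems.push("member " + member._id + ": votes must be a non-negative integer");
    }
    // A member that skips index builds should never become primary.
    if (member.buildIndexes === false && priority !== 0) {
      problems.push("member " + member._id + ": buildIndexes is false but priority is not 0");
    }
    if (votes > 0) votingMembers += 1;
  }
  if (votingMembers > 7) problems.push("more than 7 voting members");
  return problems;
}

// Hypothetical configuration with one hidden backup member.
const sampleConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" },
    { _id: 1, host: "mongodb1.example.net:27017", priority: 2 },
    { _id: 2, host: "mongodb2.example.net:27017", priority: 0,
      buildIndexes: false, hidden: true }
  ]
};
console.log(checkReplicaSetConfig(sampleConfig));
```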

32.8.3 Example Document


The following document is a prototypical representation of a MongoDB replica set configuration document. Square brackets (e.g. [ and ]) enclose all optional fields.
{
  _id : <setname>,
  members: [
    {
      _id : <ordinal>,
      host : <hostname[:port]>
      [, arbiterOnly : true]
      [, buildIndexes : <bool>]
      [, hidden : true]
      [, priority: <priority>]
      [, tags: {loc1 : desc1, loc2 : desc2, ..., locn : descn}]
      [, slaveDelay : <n>]
      [, votes : <n>]
    }
    , ...
  ],
  [settings: {
    [getLastErrorDefaults: <lasterrdefaults>]
    [, getLastErrorModes : <modes>]
  }]
}

32.8.4 Use
Most modifications of replica set configuration use the mongo shell. Consider the following reconfiguration operation:

Example

Given the following replica set configuration:
{
    "_id" : "rs0",
    "version" : 1,
    "members" : [
        {
            "_id" : 0,
            "host" : "mongodb0.example.net:27017"
        },
        {
            "_id" : 1,
            "host" : "mongodb1.example.net:27017"
        },
        {
            "_id" : 2,
            "host" : "mongodb2.example.net:27017"
        }
    ]
}

And the following reconfiguration operation:

cfg = rs.conf()
cfg.members[0].priority = 0.5
cfg.members[1].priority = 2
cfg.members[2].priority = 2
rs.reconfig(cfg)

This operation begins by saving the current replica set configuration to the local variable cfg using the rs.conf() (page 477) method. Then it adds priority values to the document where the members[n]._id (page 561) field has a value of 0, 1, or 2. Finally, it calls the rs.reconfig() (page 478) method with the argument of cfg to initialize this new configuration. The replica set configuration after this operation will resemble the following:
{
    "_id" : "rs0",
    "version" : 2,
    "members" : [
        {
            "_id" : 0,
            "host" : "mongodb0.example.net:27017",
            "priority" : 0.5
        },
        {
            "_id" : 1,
            "host" : "mongodb1.example.net:27017",
            "priority" : 2
        },
        {
            "_id" : 2,
            "host" : "mongodb2.example.net:27017",
            "priority" : 2
        }
    ]
}

Using the dot notation demonstrated in the above example, you can modify any existing setting or specify any of the optional replica set configuration variables (page 561). Until you run rs.reconfig(cfg) at the shell, no changes will take effect. You can issue cfg = rs.conf() at any time before using rs.reconfig() (page 478) to undo your changes and start from the current configuration. If you issue cfg as an operation at any point, the mongo shell will output the complete document with modifications for your review.

The rs.reconfig() (page 478) operation has a force option, to make it possible to reconfigure a replica set if a majority of the replica set is not visible, and there is no primary member of the set. Use the following form:
rs.reconfig(cfg, { force: true } )

Warning: Forcing a rs.reconfig() (page 478) can lead to rollback situations and other difficult to recover from situations. Exercise caution when using this option.

Note: The rs.reconfig() (page 478) shell command can force the current primary to step down and cause an election in some situations. When the primary node steps down, all clients will disconnect. This is by design. While this typically takes 10-20 seconds, attempt to make these changes during scheduled maintenance periods.

32.8.5 Tag Sets


Tag sets provide custom and configurable write concern (page 50) and read preferences (page 51). This section outlines the process for specifying tags for a replica set. For more information, see the full documentation of the behavior of tag sets for write concern (page 50) and tag sets for read preference (page 53).

Configure tag sets by adding fields and values to the document stored in the members[n].tags (page 562) field. Consider the following example:

Example

Given the following replica set configuration:
{
    "_id" : "rs0",
    "version" : 1,
    "members" : [
        {
            "_id" : 0,
            "host" : "mongodb0.example.net:27017"
        },
        {
            "_id" : 1,
            "host" : "mongodb1.example.net:27017"
        },
        {
            "_id" : 2,
            "host" : "mongodb2.example.net:27017"
        }
    ]
}

And the following tag set:


{ "dc": "east", "use": "reporting" }

You could add the tag set to the members[1] member of the replica set with the following command sequence in the mongo shell:
conf = rs.conf()
conf.members[1].tags = { "dc": "east", "use": "reporting" }
rs.reconfig(conf)

After this operation the output of rs.conf() (page 477) would resemble the following:
{
    "_id" : "rs0",
    "version" : 2,
    "members" : [
        {
            "_id" : 0,
            "host" : "mongodb0.example.net:27017"
        },
        {
            "_id" : 1,
            "host" : "mongodb1.example.net:27017",
            "tags" : {
                "dc": "east",
                "use": "reporting"
            }
        },
        {
            "_id" : 2,
            "host" : "mongodb2.example.net:27017"
        }
    ]
}
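To illustrate how a tag set selects members, the sketch below filters a configuration object for members whose tags contain every key and value of a requested tag set. This is only an offline illustration of the matching idea: membersWithTags is a hypothetical helper, and the server's actual handling of tags for write concern and read preference is covered in the sections referenced above.

```javascript
// Configuration mirroring the example above, after adding tags to members[1].
const taggedConfig = {
  _id: "rs0",
  version: 2,
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" },
    { _id: 1, host: "mongodb1.example.net:27017",
      tags: { dc: "east", use: "reporting" } },
    { _id: 2, host: "mongodb2.example.net:27017" }
  ]
};

// Return the hosts whose tags include every key/value pair in wanted.
function membersWithTags(cfg, wanted) {
  return cfg.members
    .filter((member) => {
      const tags = member.tags || {}; // members without tags match nothing
      return Object.keys(wanted).every((key) => tags[key] === wanted[key]);
    })
    .map((member) => member.host);
}

console.log(membersWithTags(taggedConfig, { dc: "east" }));
console.log(membersWithTags(taggedConfig, { dc: "west" }));
```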

32.9 Replication Info Reference


The db.getReplicationInfo() (page 469) method reports the current replication status, using data polled from the oplog. Consider the values of this output when diagnosing issues with replication.

See Also: Replication Fundamentals (page 33) for more information on replication.

32.9.1 All Nodes


The following fields are present in the output of db.getReplicationInfo() (page 469) for both primary and secondary nodes.
logSizeMB Returns the total size of the oplog in megabytes. This refers to the total amount of space allocated to the oplog rather than the current size of operations stored in the oplog.
usedMB Returns the total amount of space used by the oplog in megabytes. This refers to the total amount of space currently used by operations stored in the oplog rather than the total amount of space allocated.

32.9.2 Primary Nodes


The following fields appear in the output of db.getReplicationInfo() (page 469) for primary nodes.
errmsg Returns the last error status.
oplogMainRowCount Returns a counter of the number of items or rows (i.e. documents) in the oplog.

32.9.3 Secondary Nodes


The following fields appear in the output of db.getReplicationInfo() (page 469) for secondary nodes.
timeDiff Returns the difference between the first and last operation in the oplog, represented in seconds.
timeDiffHours Returns the difference between the first and last operation in the oplog, rounded and represented in hours.
tFirst Returns a time stamp for the first (i.e. earliest) operation in the oplog. Compare this value to the last write operation issued against the server.
tLast Returns a time stamp for the last (i.e. latest) operation in the oplog. Compare this value to the last write operation issued against the server.
now Returns a time stamp reflecting the current time. The shell process generates this value, and the datum may differ slightly from the server time if you're connecting from a remote host as a result. Equivalent to Date() (page 448).
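The relationship between tFirst, tLast, timeDiff, and timeDiffHours can be worked through with ordinary dates. The sketch below is an illustration only: oplogWindow is a hypothetical helper, the sample dates are invented, and rounding the hours to two decimal places is an assumption, since the reference only says the value is rounded.

```javascript
// Compute the oplog window from the first and last operation times.
function oplogWindow(tFirst, tLast) {
  const timeDiff = (tLast.getTime() - tFirst.getTime()) / 1000; // seconds
  // Assumption: hours are rounded to two decimal places.
  const timeDiffHours = Math.round(timeDiff / 36) / 100;
  return { timeDiff: timeDiff, timeDiffHours: timeDiffHours };
}

const windowSample = oplogWindow(
  new Date("2012-09-12T00:00:00Z"),
  new Date("2012-09-13T06:30:00Z")
);
console.log(windowSample);
```

A short window means a secondary that falls behind by more than that span can no longer catch up from the oplog alone.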

32.10 Current Operation Reporting


Changed in version 2.2.

The db.currentOp() (page 466) helper in the mongo shell reports on the current operations running on the mongod instance. The command returns the inprog array, which contains a document for each in progress operation. Consider the following example output:

{
  "inprog": [
    {
      "opid" : 3434473,
      "active" : <boolean>,
      "secs_running" : 0,
      "op" : "<operation>",
      "ns" : "<database>.<collection>",
      "query" : { },
      "client" : "<host>:<outgoing>",
      "desc" : "conn57683",
      "threadId" : "0x7f04a637b700",
      "connectionId" : 57683,
      "locks" : {
        "^" : "w",
        "^local" : "W",
        "^<database>" : "W"
      },
      "waitingForLock" : false,
      "msg": "<string>",
      "numYields" : 0,
      "progress" : {
        "done" : <number>,
        "total" : <number>
      },
      "lockStats" : {
        "timeLockedMicros" : {
          "R" : NumberLong(),
          "W" : NumberLong(),
          "r" : NumberLong(),
          "w" : NumberLong()
        },
        "timeAcquiringMicros" : {
          "R" : NumberLong(),
          "W" : NumberLong(),
          "r" : NumberLong(),
          "w" : NumberLong()
        }
      }
    }
  ]
}

Optional: You may specify the true argument to db.currentOp() (page 466) to return more verbose output, including idle connections and system operations. For example:
db.currentOp(true)

Furthermore, active operations (i.e. where active (page 569) is true) will return additional elds.
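A common use of this output is to pick out operations worth investigating. The sketch below filters a plain copy of the inprog array for active operations that have run for at least a given number of seconds, and lists the operations waiting for a lock. The sample document and the helpers slowOperations and waitingOperations are hypothetical.

```javascript
// Hypothetical, trimmed db.currentOp() output; values are invented.
const currentOpSample = {
  inprog: [
    { opid: 101, active: true, secs_running: 42, op: "query",
      ns: "test.people", waitingForLock: false },
    { opid: 102, active: true, secs_running: 1, op: "update",
      ns: "test.people", waitingForLock: false },
    { opid: 103, active: false, secs_running: 0, op: "insert",
      ns: "test.orders", waitingForLock: true }
  ]
};

// Active operations that have been running for at least minSeconds.
function slowOperations(currentOp, minSeconds) {
  return currentOp.inprog
    .filter((op) => op.active && op.secs_running >= minSeconds)
    .map((op) => ({ opid: op.opid, op: op.op, ns: op.ns, secs: op.secs_running }));
}

// The opid of every operation that is still waiting for a lock.
function waitingOperations(currentOp) {
  return currentOp.inprog.filter((op) => op.waitingForLock).map((op) => op.opid);
}

console.log(slowOperations(currentOpSample, 30));
console.log(waitingOperations(currentOpSample));
```

The opid values collected this way are the identifiers you would pass to db.killOp() if an operation needs to be terminated.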

32.10.1 Output Reference


opid
    Holds an identifier for the operation. You can pass this value to db.killOp() in the mongo shell to terminate the operation.
active
    A boolean value that is true if the operation is currently running or false if the operation is queued and waiting for a lock to run.
secs_running
    The duration of the operation in seconds. MongoDB calculates this value by subtracting the start time of the operation from the current time.
op
    A string that identifies the type of operation. The possible values are: insert, query, update, remove, getmore, command.
ns
    The namespace the operation targets. MongoDB forms namespaces using the name of the database and the name of the collection.
query
    A document containing the current operation's query. The document is empty for operations that do not have queries: getmore, insert, and command.
client
    The IP address (or hostname) and the ephemeral port of the client connection where the operation originates. If your inprog array has operations from many different clients, use this string to relate operations to clients. For some commands, including findAndModify and db.eval() (page 466), the client will be 0.0.0.0:0, rather than an actual client.
desc
    A description of the client. This string includes the connectionId (page 569).
threadId
    An identifier for the thread that services the operation and its connection.
connectionId
    An identifier for the connection where the operation originated.
locks
    The locks (page 569) document reports on the kinds of locks the operation currently holds. The following kinds of locks are possible:
locks.^
    locks.^ (page 569) reports on the global lock state for the mongod instance. The operation must hold this for some global phases of an operation.
locks.^local
    locks.^local (page 569) reports on the lock for the local database. MongoDB uses the local database


    for a number of operations, but the most frequent use of the local database is for the oplog used in replication.
locks.^<database>
    locks.^<database> reports on the lock state for the database that this operation targets.
waitingForLock
    Returns a boolean value. waitingForLock (page 570) is true if the operation is waiting for a lock and false if the operation has the required lock.
msg
    The msg (page 570) field provides a message that describes the status and progress of the operation. In the case of indexing operations, the field reports the completion percentage.
numYields
    numYields (page 570) is a counter that reports the number of times the operation has yielded to allow other operations to complete. Typically, operations yield when they need access to data that MongoDB has not yet fully read into memory. This allows other operations that have data in memory to complete quickly while MongoDB reads in data for the yielding operation.
lockStats
    The lockStats (page 570) document reflects the amount of time the operation has spent both acquiring and holding locks. lockStats (page 570) reports data per lock type, with the following possible lock types: R represents the global read lock, W represents the global write lock, r represents the database specific read lock, and w represents the database specific write lock.
timeLockedMicros
    The timeLockedMicros (page 570) document reports the amount of time the operation has spent holding a specific lock.
timeLockedMicros.R
    Reports the amount of time in microseconds the operation has held the global read lock.
timeLockedMicros.W
    Reports the amount of time in microseconds the operation has held the global write lock.
timeLockedMicros.r
    Reports the amount of time in microseconds the operation has held the database specific read lock.
timeLockedMicros.w
    Reports the amount of time in microseconds the operation has held the database specific write lock.
timeAcquiringMicros
    The timeAcquiringMicros (page 570) document reports the amount of time the operation has spent waiting to acquire a specific lock.
timeAcquiringMicros.R
    Reports the amount of time in microseconds the operation has waited for the global read lock.
timeAcquiringMicros.W
    Reports the amount of time in microseconds the operation has waited for the global write lock.


timeAcquiringMicros.r
    Reports the amount of time in microseconds the operation has waited for the database specific read lock.
timeAcquiringMicros.w
    Reports the amount of time in microseconds the operation has waited for the database specific write lock.
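One way to read the lockStats document is to total the time spent holding locks and the time spent waiting for them. The sketch below does this for a hand-written sample. The helper names sumLockMicros and lockSummary are hypothetical, the sample numbers are invented, and plain JavaScript numbers stand in for the NumberLong values that the server reports.

```javascript
// Hypothetical helper: add up the microsecond counters for the four
// lock types (R, W, r, w) in one lockStats sub-document.
function sumLockMicros(byType) {
  var total = 0;
  ["R", "W", "r", "w"].forEach(function (type) {
    total += Number(byType[type] || 0); // a missing lock type counts as 0
  });
  return total;
}

// Hypothetical helper: summarize time holding locks versus time waiting.
function lockSummary(lockStats) {
  return {
    lockedMicros: sumLockMicros(lockStats.timeLockedMicros),
    acquiringMicros: sumLockMicros(lockStats.timeAcquiringMicros)
  };
}

// Sample lockStats document; the values are invented for illustration.
var sampleLockStats = {
  timeLockedMicros:    { R: 0, W: 0, r: 1200, w: 300 },
  timeAcquiringMicros: { R: 0, W: 0, r: 50,   w: 450 }
};

var lockTotals = lockSummary(sampleLockStats);
```

An operation whose acquiring total is large relative to its locked total has spent most of its time waiting on other operations rather than doing work.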


CHAPTER

THIRTY-THREE

GENERAL REFERENCE
33.1 MongoDB Limits and Thresholds
33.1.1 Synopsis
This document provides a collection of hard and soft limitations of the MongoDB system.

33.1.2 Limits
BSON Object Size
    The maximum BSON object size is 16 megabytes.
Index Size
    Indexed items, including their namespace/database, can be no larger than 1024 bytes. This value is the indexed content (i.e. the field value).
Number of Indexes per Collection
    A single collection can have no more than 64 indexes.
Namespace Length
    Each namespace, including database and collection name, must be shorter than 628 bytes.
Index Name Length
    The names of indexes, including their namespace (i.e. database and collection name), cannot be longer than 128 characters.
Number of Namespaces
    The limitation on the number of namespaces is a function of the size of the namespace file. By default namespace files are 16 megabytes; however, with the nssize (page 499) setting, namespace files can be as large as 2 gigabytes. A 16 megabyte namespace file can support 24,000 namespaces.
Size of Namespace File
    Namespace files can be no larger than 2 gigabytes. By default namespace files are 16 megabytes. You can configure the size using the nssize (page 499) setting.
Sorted Documents
    MongoDB will only return sorted results on fields without an index if the sort operation uses less than 32 megabytes of memory.


Nested Depth for BSON Objects
    Changed in version 2.2. MongoDB only supports 100 levels of nesting for BSON documents.
Operations Unavailable in Sharded Environments
    The group does not work with sharding. Use mapreduce or aggregate instead.
    db.eval() (page 466) is incompatible with sharded collections. You may use db.eval() (page 466) with un-sharded collections in a shard cluster.
    $where does not permit references to the db object from the $where function. This is uncommon in unsharded collections.
Unique Indexes in Sharded Collections
    MongoDB does not support unique indexes across shards, except when the unique index contains the full shard key as a prefix of the index. In these situations MongoDB will enforce uniqueness across the full key, not a single field.
    See Also: Enforce Unique Keys for Sharded Collections (page 129) for an alternate approach.
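Returning to the nesting limit described above, an application could estimate how deeply a document is nested before inserting it. The following is a minimal sketch in plain JavaScript. The helper name nestingDepth is hypothetical, and the counting rule (a flat document has depth 1, and each embedded object or array adds one level) is an assumption that may not match exactly how the server counts levels.

```javascript
// Hypothetical helper: count the levels of nesting in a value.
// Assumed rule: scalars contribute 0; every object or array contributes
// one level plus the depth of its most deeply nested member.
function nestingDepth(value) {
  if (value === null || typeof value !== "object" || value instanceof Date) {
    return 0;
  }
  var deepest = 0;
  Object.keys(value).forEach(function (key) {
    deepest = Math.max(deepest, nestingDepth(value[key]));
  });
  return 1 + deepest;
}

// Sample documents, invented for illustration.
var flatDoc = { name: "MongoDB", type: "database" };
var nestedDoc = { a: { b: { c: [ { d: 1 } ] } } };

var flatDepth = nestingDepth(flatDoc);
var nestedDepth = nestingDepth(nestedDoc);
```

A check such as nestingDepth(doc) > 100 could then flag documents that are likely to exceed the limit, subject to the counting assumption stated above.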

33.2 Glossary
$cmd
    A virtual collection that exposes MongoDB's database commands.
_id
    A field containing a unique ID, typically a BSON ObjectId. If not specified, this value is automatically assigned upon the creation of a new document. You can think of the _id as the document's primary key.
accumulator
    An expression in the aggregation framework that maintains state between documents in the aggregation pipeline. See: $group (page 399) for a list of accumulator operations.
admin database
    A privileged database named admin. Users must have access to this database to run certain administrative commands. See commands for more information on these commands.
aggregation
    Any of a variety of operations that reduce and summarize large sets of data. SQL's GROUP and MongoDB's map-reduce are two examples of aggregation functions.
aggregation framework
    The MongoDB aggregation framework provides a means to calculate aggregate values without having to use map-reduce.
    See Also: Aggregation Framework (page 199).
arbiter
    A member of a replica set that exists solely to vote in elections. Arbiter nodes do not replicate data.
    See Also: Delayed Nodes (page 40)
balancer
    An internal MongoDB process that runs in the context of a shard cluster and manages the splitting and migration of chunks. Administrators must disable the balancer for all maintenance operations on a shard cluster.
box
    MongoDB's geospatial indexes and querying system allow you to build queries around rectangles on two-dimensional coordinate systems. These queries use the $box operator to define a shape using the lower-left and the upper-right coordinates.
BSON
    A serialization format used to store documents and make remote procedure calls in MongoDB. "BSON" is a portmanteau of the words "binary" and "JSON". Think of BSON as a binary representation of JSON (JavaScript Object Notation) documents. For a detailed spec, see bsonspec.org.
    See Also:


    The Data Type Fidelity (page 153) section.
BSON types
    The set of types supported by the BSON serialization format. The following types are available:

    Type                       Number
    Double                     1
    String                     2
    Object                     3
    Array                      4
    Binary data                5
    Object id                  7
    Boolean                    8
    Date                       9
    Null                       10
    Regular Expression         11
    JavaScript                 13
    Symbol                     14
    JavaScript (with scope)    15
    32-bit integer             16
    Timestamp                  17
    64-bit integer             18
    Min key                    255
    Max key                    127

btree
    A data structure used by most database management systems to store indexes. MongoDB uses b-trees for its indexes.
CAP Theorem
    Given three properties of computing systems, consistency, availability, and partition tolerance, a distributed computing system can provide any two of these features, but never all three.
capped collection
    A fixed-sized collection. Once they reach their fixed size, capped collections automatically overwrite their oldest entries. MongoDB's oplog replication mechanism depends on capped collections. Developers may also use capped collections in their applications.
    See Also: The Capped Collections wiki page.
checksum
    A calculated value used to ensure data integrity. The md5 algorithm is sometimes used as a checksum.
chunk
    In the context of a shard cluster, a chunk is a contiguous range of shard key values assigned to a particular shard. By default, chunks are 64 megabytes or less. When they grow beyond the configured chunk size, a mongos splits the chunk into two chunks.
circle
    MongoDB's geospatial indexes and querying system allow you to build queries around circles on two-dimensional coordinate systems. These queries use the $circle operator to define a circle using the center and the radius of the circle.
client
    The application layer that uses a database for data persistence and storage. Drivers provide the interface level between the application layer and the database server.
cluster
    A set of mongod instances running in conjunction to increase database availability and performance. See sharding and replication for more information on the two different approaches to clustering with MongoDB.
collection
    A namespace within a database for containing documents. Collections do not enforce a schema, but they are otherwise mostly analogous to RDBMS tables.
compound index
    An index consisting of two or more keys. See Indexing Overview (page 179) for more information.
config database
    One of three mongod instances that store all of the metadata associated with a shard cluster.


control script
    A script, typically a simple shell script, used by a UNIX-like operating system's initialization process to start, restart, and stop a daemon process. On most systems, you can find these scripts in the /etc/init.d/ or /etc/rc.d/ directories.
CRUD
    Create, read, update, and delete. The fundamental operations of any database.
CSV
    A text-based data format consisting of comma-separated values. This format is commonly used to exchange data between relational databases, since the format is well-suited to tabular data. You can import CSV files using mongoimport.
cursor
    In MongoDB, a cursor is a pointer to the result set of a query that can be iterated through to retrieve results. By default, cursors will time out after 10 minutes of inactivity.
daemon
    The conventional name for a background, non-interactive process.
data-center awareness
    A property that allows clients to address nodes in a system based upon their location. Replica sets implement data-center awareness using tagging.
    See Also: members[n].tags (page 562) and data center awareness (page 87).
database
    A physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically serves multiple databases.
database command
    Any MongoDB operation other than an insert, update, remove, or query. MongoDB exposes commands as queries against the special $cmd collection. For example, the implementation of count for MongoDB is a command.
    See Also: /reference/commands for a full list of database commands in MongoDB.
database profiler
    A tool that, when enabled, keeps a record of all long-running operations in a database's system.profile collection. The profiler is most often used to diagnose slow queries.
    See Also: Monitoring Database Systems (page 150).
dbpath
    Refers to the location of MongoDB's data file storage. The default dbpath (page 497) is /data/db. Other common data paths include /srv/mongodb and /var/lib/mongodb.
    See Also: dbpath (page 497) or --dbpath (page 487).
delayed member
    A member of a replica set that cannot become primary and applies operations at a specified delay. This delay is useful for protecting data from human error (e.g. unintentionally deleted databases) or updates that have unforeseen effects on the production database.
    See Also: Delayed Members (page 40)
document
    A record in a MongoDB collection, and the basic unit of data in MongoDB. Documents are analogous to JSON objects, but exist in the database in a more type-rich format known as BSON.
draining
    The process of removing or shedding chunks from one shard to another. Administrators must drain shards before removing them from the cluster.
    See Also: removeshard, sharding.


driver
    A client implementing the communication protocol required for talking to a server. The MongoDB drivers provide language-idiomatic methods for interfacing with MongoDB.
    See Also: Drivers (page 225)
election
    In the context of replica sets, an election is the process by which members of a replica set select primary nodes on startup and in the event of failures.
    See Also: Replica Set Elections (page 34) and priority.
eventual consistency
    A property of a distributed system allowing changes to the system to propagate gradually. In a database system, this means that readable nodes are not required to reflect the latest writes at all times. In MongoDB, reads to a primary have strict consistency; reads to secondary nodes have eventual consistency.
expression
    In the context of the aggregation framework, expressions are the stateless transformations that operate on the data that passes through the pipeline.
    See Also: Aggregation Framework (page 199).
failover
    The process that allows one of the secondary nodes in a replica set to become primary in the event of a failure.
    See Also: Replica Set Failover (page 34).
field
    A name-value pair in a document. Documents have zero or more fields. Fields are analogous to columns in relational databases.
firewall
    A system level networking filter that restricts access based on, among other things, IP address. Firewalls form part of an effective network security strategy.
fsync
    A system call that flushes all dirty, in-memory pages to disk. MongoDB calls fsync() every 60 seconds.
Geohash
    A binary representation of a location on a coordinate grid.
geospatial
    Data that relates to geographical location. In MongoDB, you may index or store geospatial data according to geographical parameters and reference specific coordinates in queries.
GridFS
    A convention for storing large files in a MongoDB database. All of the official MongoDB drivers support this convention, as does the mongofiles program.
    See Also: mongofiles Manual (page 526).
haystack index
    In the context of geospatial queries, haystack indexes enhance searches by creating buckets of objects grouped by a second criterion. For example, you might want all geographical searches to also include the type of location being searched for. In this case, you can create a haystack index that includes a document's position and type:
db.places.ensureIndex( { position: "geoHaystack", type: 1 } )

You can then query on position and type:


db.places.find( { position: [34.2, 33.3], type: "restaurant" } )

hidden member
    A member of a replica set that cannot become primary and is not advertised as part of the set in the database command isMaster, which prevents it from receiving read-only queries depending on read preference.
    See Also: Hidden Member (page 40), isMaster, db.isMaster (page 469), and members[n].hidden (page 562).
idempotent
    When calling an idempotent operation on a value or state, the operation only affects the value once. Thus, the operation can safely run multiple times without unwanted side effects. In the context of MongoDB, oplog entries must be idempotent to support initial synchronization and recovery from certain failure situations. Thus, MongoDB can safely apply oplog entries more than once without any ill effects.
index
    A data structure that optimizes queries. See Indexing Overview (page 179) for more information.
IPv6
    A revision to the IP (Internet Protocol) standard that provides a significantly larger address space to more effectively support the number of hosts on the contemporary Internet.
ISODate
    The international date format used by mongo to display dates, e.g. YYYY-MM-DD HH:MM.SS.millis.
JavaScript
    A popular scripting language originally designed for web browsers. The MongoDB shell and certain server-side functions use a JavaScript interpreter.
journal
    A sequential, binary transaction log used to bring the database into a consistent state in the event of a hard shutdown. MongoDB enables journaling by default for 64-bit builds of MongoDB version 2.0 and newer. Journal files are pre-allocated and will exist as three 1GB files in the data directory. To make journal files smaller, use smallfiles (page 500). When enabled, MongoDB writes data first to the journal and then to the core data files. MongoDB commits to the journal every 100ms; this interval is configurable using the journalCommitInterval (page 498) runtime option.
    See Also: The Journaling wiki page.
JSON
    JavaScript Object Notation. A human-readable, plain text format for expressing structured data with support in many programming languages.
JSON document
    A JSON document is a collection of fields and values in a structured format. The following is a sample JSON document with two fields:
{ name: "MongoDB", type: "database" }

JSONP
    JSON with Padding. Refers to a method of injecting JSON into applications. Presents potential security concerns.
LVM
    Logical volume manager. LVM is a program that abstracts disk images from physical devices, and provides a number of raw disk manipulation and snapshot capabilities useful for system management.
map-reduce
    A data processing and aggregation paradigm consisting of a map phase that selects data, and a reduce phase that transforms the data. In MongoDB, you can run arbitrary aggregations over data using map-reduce.
    See Also: The Map Reduce wiki page for more information regarding MongoDB's map-reduce implementation, and Aggregation Framework (page 199) for another approach to data aggregation in MongoDB.
master
    In conventional master/slave replication, the master database receives all writes. The slave instances replicate from the master instance in real time.
md5
    md5 is a hashing algorithm used to efficiently provide reproducible unique strings to identify and checksum data. MongoDB uses md5 to identify chunks of data for GridFS.
MIME
    Multipurpose Internet Mail Extensions. A standard set of type and encoding definitions used to declare the encoding and type of data in multiple data storage, transmission, and email contexts.


mongo
    The MongoDB Shell. mongo connects to mongod and mongos instances, allowing administration, management, and testing. mongo has a JavaScript interface.
    See Also: mongo Manual (page 503) and /reference/javascript.
mongod
    The program implementing the MongoDB database server. This server typically runs as a daemon.
    See Also: mongod Manual (page 485).
MongoDB
    The document-based database server described in this manual.
mongos
    The routing and load balancing process that acts as an interface between an application and a MongoDB shard cluster.
    See Also: mongos Manual (page 491).
multi-master replication
    A replication method where multiple database instances can accept write operations to the same data set at any time. Multi-master replication exchanges increased concurrency and availability for a relaxed consistency semantic. MongoDB ensures consistency and, therefore, does not provide multi-master replication.
namespace
    A canonical name for a collection or index in MongoDB. Namespaces consist of a concatenation of the database and collection or index name, like so: [database-name].[collection-or-index-name]. All documents belong to a namespace.
natural order
    The order in which a database stores documents on disk. Typically this order is the same as the insertion order. Capped collections, among other things, guarantee that insertion order and natural order are identical.
ObjectId
    A special 12-byte BSON type that has a high probability of being unique when generated. The most significant digits in an ObjectId represent the time when the ObjectId was created. MongoDB uses ObjectId values as the default values for _id fields.
operator
    A keyword beginning with a $ used to express a complex query, update, or data transformation. For example, $gt is the query language's "greater than" operator. See the /reference/operators for more information about the available operators.
    See Also: /reference/operators.
oplog
    A capped collection that stores an ordered history of logical writes to a MongoDB database. The oplog is the basic mechanism enabling replication in MongoDB.
    See Also: Oplog Sizes (page 36) and Change the Size of the Oplog (page 71).
padding
    The extra space allocated to a document on the disk to prevent moving a document when it grows as the result of update operations.
padding factor
    An automatically-calibrated constant used to determine how much extra space MongoDB should allocate per document container on disk. A padding factor of 1 means that MongoDB will allocate only the amount of space needed for the document. A padding factor of 2 means that MongoDB will allocate twice the amount of space required by the document.
page fault
    The event that occurs when a process requests stored data (i.e. a page) from memory that the operating system has moved to disk.


    See Also: Storage FAQ: What are page faults? (page 370)
partition
    A distributed system architecture that splits data into ranges. Sharding is a kind of partitioning.
pcap
    A packet capture format used by mongosniff to record packets captured from network interfaces and display them as human-readable MongoDB operations.
PID
    A process identifier. On UNIX-like systems, a unique integer PID is assigned to each running process. You can use a PID to inspect a running process and send signals to it.
pipe
    A communication channel in UNIX-like systems allowing independent processes to send and receive data. In the UNIX shell, piped operations allow users to direct the output of one command into the input of another.
pipeline
    The series of operations in the aggregation process.
    See Also: Aggregation Framework (page 199).
polygon
    MongoDB's geospatial indexes and querying system allow you to build queries around multi-sided polygons on two-dimensional coordinate systems. These queries use the $within operator and a sequence of points that define the corners of the polygon.
powerOf2Sizes
    New in version 2.2. A per-collection setting that changes and normalizes the way that MongoDB allocates space for each document in an effort to maximize storage reuse and reduce fragmentation. This is the default for TTL Collections (page 234). See collMod and usePowerOf2Sizes for more information.
pre-splitting
    An operation, performed before inserting data, that divides the range of possible shard key values into chunks to facilitate easy insertion and high write throughput. When deploying a shard cluster, in some cases pre-splitting will expedite the initial distribution of documents among shards by manually dividing the collection into chunks rather than waiting for the MongoDB balancer to create chunks during the course of normal operation.
primary
    In a replica set, the primary member is the current master instance, which receives all write operations.
primary key
    A record's unique, immutable identifier. In an RDBMS, the primary key is typically an integer stored in each row's id field. In MongoDB, the _id field holds a document's primary key, which is usually a BSON ObjectId.
priority
    In the context of replica sets, priority is a configurable value that helps determine which nodes in a replica set are most likely to become primary.
    See Also: Replica Set Node Priority (page 34)
projection
    A document given to a query that specifies which fields MongoDB will return from the documents in the result set.
query
    A read request. MongoDB queries use a JSON-like query language that includes a variety of query operators with names that begin with a $ character. In the mongo shell, you can issue queries using the db.collection.find() (page 457) and db.collection.findOne() (page 458) methods.
RDBMS
    Relational Database Management System. A database management system based on the relational model, typically using SQL as the query language.
read preference
    A setting on the MongoDB drivers (page 225) that determines how the clients direct read operations. Read preference affects all replica sets including shards. By default, drivers direct all reads to primary nodes for strict consistency. However, you may also direct reads to secondary nodes for eventually consistent reads.
    See Also: Read Preference (page 51)


read-lock
    In the context of a reader-writer lock, a lock that while held allows concurrent readers, but no writers.
recovering
    A replica set member status indicating that a member is synchronizing or re-synchronizing its data from the primary node. Recovering nodes are unavailable for reads.
replica pairs
    The precursor to the MongoDB replica sets. Deprecated since version 1.6.
replica set
    A cluster of MongoDB servers that implements master-slave replication and automated failover. MongoDB's recommended replication strategy.
    See Also: Replication (page 31) and Replication Fundamentals (page 33).
replication
    A feature allowing multiple database servers to share the same data, thereby ensuring redundancy and facilitating load balancing. MongoDB supports two flavors of replication: master-slave replication and replica sets.
    See Also: replica set, sharding, Replication (page 31), and Replication Fundamentals (page 33).
replication lag
    The length of time between the last operation in the primary's oplog and the last operation applied to a particular secondary or slave node. In general, you want to keep replication lag as small as possible.
    See Also: Replication Lag (page 45)
resident memory
    The subset of an application's memory currently stored in physical RAM. Resident memory is a subset of virtual memory, which includes memory mapped to physical RAM and to disk.
REST
    An API design pattern centered around the idea of resources and the CRUD operations that apply to them. Typically implemented over HTTP. MongoDB provides a simple HTTP REST interface that allows HTTP clients to run commands against the server.
rollback
    A process that, in certain replica set situations, reverts write operations to ensure the consistency of all replica set members.
secondary
    In a replica set, the secondary members are the current slave instances that replicate the contents of the master database. Secondary members may handle read requests, but only the primary members can handle write operations.
secondary index
    A database index that improves query performance by minimizing the amount of work that the query engine must perform to fulfill a query.
set name
    In the context of a replica set, the set name refers to an arbitrary name given to a replica set when it's first configured. All members of a replica set must have the same name specified with the replSet (page 501) setting (or --replSet (page 490) option for mongod).
    See Also: replication, Replication (page 31) and Replication Fundamentals (page 33).
shard
    A single replica set that stores some portion of a shard cluster's total data set. See sharding.
    See Also: The Sharding wiki page.
shard cluster
    The set of nodes comprising a sharded MongoDB deployment. A shard cluster consists of three config processes, one or more replica sets, and one or more mongos routing processes.
    See Also: The Sharding wiki page.


shard key
    In a sharded collection, a shard key is the field that MongoDB uses to distribute documents among members of the shard cluster.
sharding
    A database architecture that enables horizontal scaling by splitting data into key ranges among two or more replica sets. This architecture is also known as range-based partitioning. See shard.
    See Also: The Sharding wiki page.
shell helper
    A number of database commands have helper methods in the mongo shell that provide a more concise syntax and improve the general interactive experience.
    See Also: mongo Manual (page 503) and /reference/javascript.
single-master replication
    A replication topology where only a single database instance accepts writes. Single-master replication ensures consistency and is the replication topology employed by MongoDB.
slave
    In conventional master/slave replication, slaves are read-only instances that replicate operations from the master database. Data read from slave instances may not be completely consistent with the master. Therefore, applications requiring consistent reads must read from the master database instance.
split
    The division between chunks in a shard cluster.
SQL
    Structured Query Language (SQL) is a common special-purpose programming language used for interaction with a relational database including access control as well as inserting, updating, querying, and deleting data. There are some similar elements in the basic SQL syntax supported by different database vendors, but most implementations have their own dialects, data types, and interpretations of proposed SQL standards. Complex SQL is generally not directly portable between major RDBMS products. SQL is often used as a metonym for relational databases.
SSD
    Solid State Disk. A high-performance disk drive that uses solid state electronics for persistence, as opposed to the rotating platters and movable read/write heads used by traditional mechanical hard drives.
strict consistency
    A property of a distributed system requiring that all nodes always reflect the latest changes to the system. In a database system, this means that any system that can provide data must reflect the latest writes at all times. In MongoDB, reads to a primary have strict consistency; reads to secondary nodes have eventual consistency.
syslog
    On UNIX-like systems, a logging process that provides a uniform standard for servers and processes to submit logging information.
tag
    One or more labels applied to a given replica set member that clients may use to issue data-center aware operations.
TSV
    A text-based data format consisting of tab-separated values. This format is commonly used to exchange data between relational databases, since the format is well-suited to tabular data. You can import TSV files using mongoimport.
TTL
    Stands for "time to live," and represents an expiration time or period for a given piece of information to remain in a cache or other temporary storage system before the system deletes it or ages it out.
unique index
    An index that enforces uniqueness for a particular field across a single collection.
upsert
    A kind of update that either updates the first document matched in the provided query selector or, if no document matches, inserts a new document having the fields implied by the query selector and the update operation.
virtual memory
    An application's working memory, typically residing both on disk and in physical RAM.
working set
    The collection of data that MongoDB uses regularly. This data is typically (or preferably) held in RAM.


write concern
    A setting on writes to MongoDB that allows the user to specify how the database will handle a write operation before returning. This often determines how many replica set members should propagate a write before returning.
    See Also: Write Concern for Replica Sets (page 50).

write-lock
    A lock on the database for a given writer. When a process writes to the database, it takes an exclusive write-lock to prevent other processes from writing or reading.

writeBacks
    The process within the sharding system that ensures that writes issued to a shard that isn't responsible for the relevant chunk get applied to the proper shard.

See Also: The genindex may provide useful insight into the reference material in this manual.


CHAPTER

THIRTYFOUR

RELEASE NOTES
Always install the latest, stable version of MongoDB. See the following release notes for an account of the changes in major versions. Release notes also include instructions for upgrade. Current stable release (v2.2-series):

34.1 Release Notes for MongoDB 2.2


See the full index of this page for a complete list of changes included in 2.2.
Upgrading (page 585)
Changes (page 586)
Licensing Changes (page 593)
Resources (page 593)

34.1.1 Upgrading
MongoDB 2.2 is a standard, incremental production release and works as a drop-in replacement for MongoDB 2.0.

Preparation

If your MongoDB deployment uses authentication, you must first upgrade all drivers to 2.2-compatible releases and all mongos instances to the 2.2 version, and only then upgrade the mongod instances. Read through all release notes before upgrading, and ensure that no changes will affect your deployment.
If you are not running with authentication, 2.2 processes can inter-operate with 2.0 and 1.8 tools and processes in replica sets and shard clusters. As a result, you can safely upgrade the mongod and mongos components of your deployment in any order.

Upgrading a Standalone mongod

1. Download the v2.2 binaries from the MongoDB Download Page.
2. Shut down your mongod instance. Replace the existing binary with the 2.2 mongod binary and restart MongoDB.


Upgrading a Replica Set

If your replica set runs with authentication (i.e. with --keyFile (page 486)), you must upgrade all members of the set at once. When using authentication, 2.2 members cannot interact with 2.0 members.
If your replica set does not use authentication, you may upgrade the members in any order. To minimize downtime, use the following procedure:
1. Upgrade the secondary members of the set one at a time by shutting down the mongod and replacing the 2.0 binary with the 2.2 binary.
2. Use rs.stepDown() (page 479) to step down the primary and allow the normal failover (page 34) procedure. rs.stepDown() (page 479) and replSetStepDown (page 441) provide for shorter and more consistent failover procedures than simply shutting down the primary directly. When the primary has stepped down, shut down its instance and upgrade by replacing the mongod binary with the 2.2 binary.

Upgrading a Shard Cluster

We recommend the following upgrade procedure for a sharded cluster:
Disable the balancer (page 109).
Upgrade all mongos instances first, in any order.
Upgrade all of the mongod config server instances one at a time, using the standalone (page 585) procedure. When you have fewer than three config servers active, the cluster metadata will be read-only, which will prevent (and abort) all chunk migrations and chunk splits.
Upgrade all remaining cluster components, using the upgrade procedure for replica sets (page 586) for each of the shards.
Re-enable the balancer.
If your cluster uses authentication, you must upgrade the components in the order above. If your cluster does not use authentication, you can upgrade the mongos instances, config servers, and shards in any order.
Note: Balancing does not currently work in mixed 2.0.x and 2.2.0 deployments. See SERVER-6902 for more information.
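The replica set procedure in this section upgrades the secondaries one at a time and the primary last. The ordering can be sketched in plain JavaScript as follows. This is an illustration only: the member list is invented sample data, shaped like the name and stateStr fields that rs.status() reports, and rollingUpgradeOrder is a hypothetical helper, not part of MongoDB.

```javascript
// Sketch only: order replica set members for a rolling upgrade.
// The member list is hypothetical sample data, not output from a real set.
function rollingUpgradeOrder(members) {
  const secondaries = members.filter((m) => m.stateStr !== "PRIMARY");
  const primaries = members.filter((m) => m.stateStr === "PRIMARY");
  // Secondaries first; the primary is stepped down and upgraded last.
  return secondaries.concat(primaries).map((m) => m.name);
}

const upgradeOrder = rollingUpgradeOrder([
  { name: "db1.example.net:27017", stateStr: "PRIMARY" },
  { name: "db2.example.net:27017", stateStr: "SECONDARY" },
  { name: "db3.example.net:27017", stateStr: "SECONDARY" },
]);
console.log(upgradeOrder.join(" -> "));
```

The primary appears last in the resulting list, which matches step 2 of the procedure.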

34.1.2 Changes
Major Features
Aggregation Framework

The aggregation framework makes it possible to do aggregation operations without needing to use map-reduce. The aggregate command exposes the aggregation framework, and the db.collection.aggregate() (page 454) helper in the mongo shell provides an interface to these operations. Consider the following resources for background on the aggregation framework and its use:
Documentation: Aggregation Framework (page 199)
Reference: Aggregation Framework Reference (page 209)
Examples: Aggregation Framework Examples (page 205)
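As a rough illustration of what a pipeline looks like, the sketch below builds an aggregate command document and then reproduces the same computation in plain JavaScript over a few sample documents. The collection name orders, its fields, and the emulate helper are assumptions made for this example; the plain JavaScript function only mimics the $match, $group, and $sort stages and is not how the server evaluates a pipeline.

```javascript
// Sketch only: a pipeline for a hypothetical "orders" collection.
const pipeline = [
  { $match: { status: "A" } },
  { $group: { _id: "$cust", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
];
// The aggregate command wraps the pipeline in a command document.
const command = { aggregate: "orders", pipeline: pipeline };

// Hypothetical sample documents.
const orders = [
  { cust: "a", status: "A", amount: 50 },
  { cust: "b", status: "A", amount: 20 },
  { cust: "a", status: "A", amount: 25 },
  { cust: "b", status: "D", amount: 99 },
];

// Plain JavaScript equivalent of the three stages, for illustration.
function emulate(docs) {
  const totals = {};
  for (const d of docs) {
    if (d.status !== "A") continue; // $match
    totals[d.cust] = (totals[d.cust] || 0) + d.amount; // $group with $sum
  }
  return Object.keys(totals)
    .map((k) => ({ _id: k, total: totals[k] }))
    .sort((x, y) => y.total - x.total); // $sort, descending
}

const result = emulate(orders);
console.log(JSON.stringify(command), JSON.stringify(result));
```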


TTL Collections

TTL collections remove expired data from a collection, using a special index and a background thread that deletes expired documents every minute. TTL collections are a useful alternative to capped collections in some cases, such as data warehousing and caching, including machine-generated event data, logs, and session information that needs to persist in a database for only a limited period of time. For more information, see the Expire Data from Collections by Setting TTL (page 234) tutorial.
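The expiry rule can be sketched in plain JavaScript: a document becomes eligible for deletion once the value of its indexed date field is older than the configured number of seconds. The collection name events, the field name createdAt, and the one-hour lifetime are all hypothetical, and isExpired is an illustrative helper rather than anything the server exposes.

```javascript
// Sketch only: mirrors the TTL rule in plain JavaScript.
// In the mongo shell the index itself would be created with something like
//   db.events.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })
// where "events" and "createdAt" are hypothetical names.
const ttlSeconds = 3600;

function isExpired(doc, now) {
  // Only documents holding a date in the indexed field can expire.
  if (!(doc.createdAt instanceof Date)) return false;
  return doc.createdAt.getTime() + ttlSeconds * 1000 < now.getTime();
}

const now = new Date("2012-09-13T12:00:00Z");
const fresh = { createdAt: new Date("2012-09-13T11:30:00Z") };
const stale = { createdAt: new Date("2012-09-13T10:00:00Z") };
console.log(isExpired(fresh, now), isExpired(stale, now));
```

Because the background thread runs once a minute, an expired document may remain in the collection for a short time after it becomes eligible.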
Concurrency Improvements

MongoDB 2.2 increases the server's capacity for concurrent operations with the following improvements:
1. DB Level Locking
2. Improved Yielding on Page Faults
3. Improved Page Fault Detection on Windows
To reflect these changes, MongoDB now provides changed and improved reporting for concurrency and use. See locks (page 538) and recordStats (page 550) in server status (page 537), and see current operation output (page 567), db.currentOp() (page 466), mongotop (page 520), and mongostat (page 516).
Improved Data Center Awareness with Tag Aware Sharding

MongoDB 2.2 adds support for geographic distribution or other custom partitioning for sharded collections in shard clusters. By using this tag aware sharding, you can automatically ensure that data in a sharded database system is always on specific shards. For example, with tag aware sharding, you can ensure that data is closest to the application servers that use that data most frequently.
Shard tagging controls data location, and is complementary to but separate from replica set tagging, which controls read preference (page 51) and write concern (page 50). For example, shard tagging can pin all USA data to one or more logical shards, while replica set tagging can control which mongod instances (e.g. production or reporting) the application uses to service requests.
See the documentation for the following helpers in the mongo shell that support tagged sharding configuration:
sh.addShardTag() (page 480)
sh.addTagRange() (page 481)
sh.removeShardTag() (page 482)
Also, see the wiki page for tag aware sharding.
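The relationship between shard tags and tag ranges can be sketched as a lookup from a shard key value to the shards allowed to hold it. The sketch below is plain JavaScript, not shell code: the shard names, tags, and zip code ranges are invented, and shardsForKey is a hypothetical helper that only mimics the placement rule (a range includes its lower bound and excludes its upper bound).

```javascript
// Sketch only: emulates how tag ranges map shard key values to shards.
// In the mongo shell the equivalent setup would use sh.addShardTag() and
// sh.addTagRange(); every name and range below is hypothetical.
const shardTags = { shard0000: ["NYC"], shard0001: ["SFO"] };
const tagRanges = [
  { min: 10001, max: 10282, tag: "NYC" },
  { min: 94102, max: 94136, tag: "SFO" },
];

function shardsForKey(zipcode) {
  // A range covers its lower bound and excludes its upper bound.
  const range = tagRanges.find((r) => zipcode >= r.min && zipcode < r.max);
  if (!range) return Object.keys(shardTags); // untagged data may live anywhere
  return Object.keys(shardTags).filter((s) => shardTags[s].includes(range.tag));
}

console.log(shardsForKey(10011), shardsForKey(94110), shardsForKey(60601));
```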
Fully Supported Read Preference Semantics

All MongoDB clients and drivers now support full read preferences (page 51), including consistent support for a full range of read preference modes (page 52) and tag sets (page 53). This support extends to the mongos and applies identically to single replica sets and to the replica sets for each shard in a shard cluster. Additional read preference support now exists in the mongo shell using the readPref() (page 451) cursor method.
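Tag sets are tried in order, and a member matches a tag set when it carries every tag in that set. The following plain JavaScript sketch mimics that matching step only; the member names and tags are invented, matchTagSets is a hypothetical helper, and real drivers also apply the read preference mode and latency rules, which are left out here.

```javascript
// Sketch only: emulates tag set matching for read preferences.
// Member names and tags are hypothetical.
const members = [
  { name: "a", tags: { dc: "east", use: "production" } },
  { name: "b", tags: { dc: "west", use: "reporting" } },
];

function matchTagSets(candidates, tagSets) {
  for (const tagSet of tagSets) {
    const matched = candidates.filter((m) =>
      Object.keys(tagSet).every((k) => m.tags[k] === tagSet[k])
    );
    if (matched.length > 0) return matched; // first tag set with a match wins
  }
  return [];
}

const reporting = matchTagSets(members, [{ dc: "north" }, { use: "reporting" }]);
console.log(reporting.map((m) => m.name));
```

In the mongo shell, the corresponding request would be expressed with the readPref() cursor method, passing a mode and a list of tag sets.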


Compatibility Changes
Authentication Changes

MongoDB 2.2 provides more reliable and robust support for authentication clients, including drivers and mongos instances. If your mongod instances or cluster runs with authentication:
In sharded environments, 2.0 version mongos instances are not compatible with 2.2 shard clusters running with authentication. To prevent rendering your cluster non-operational, you must use the upgrade procedure for shard clusters (page 586) and upgrade all mongos instances before upgrading the shards.
For all drivers, use the latest release of your driver and check its release notes.
Drivers and mongos instances that connect to mongod instances that do not have authentication enabled are not affected by this issue.
findAndModify Returns Null Value for Upserts

In version 2.2, for upsert operations, findAndModify commands will now return the following output:
{ok: 1.0, value: null}

In the mongo shell, findAndModify operations running as upserts will only output a null value. Previously, in version 2.0 these operations would return an empty document, e.g. { }. See: SERVER-6226 for more information.
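Application code that has to work against both 2.0 and 2.2 servers can treat the two upsert results the same way. The helper below is a hypothetical sketch in plain JavaScript; previousDocument is not a driver or shell function, and the response objects are hand-written examples of the two shapes described above.

```javascript
// Sketch only: normalize the "no previous document" result of an upsert.
function previousDocument(response) {
  const value = response.value;
  if (value === null || value === undefined) return null; // 2.2 behavior
  if (typeof value === "object" && Object.keys(value).length === 0) {
    return null; // 2.0 returned an empty document instead
  }
  return value;
}

const from22 = previousDocument({ ok: 1.0, value: null });
const from20 = previousDocument({ ok: 1.0, value: {} });
const existing = previousDocument({ ok: 1.0, value: { _id: 1, n: 5 } });
console.log(from22, from20, existing);
```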
mongodump Output can only Restore to 2.2 MongoDB Instances

If you use the mongodump tool from the 2.2 distribution to create a dump of a database, you may only restore that dump to a 2.2 database. See: SERVER-6961 for more information.

Behavioral Changes
Restrictions on Database Names for Windows

The names of databases running on Windows can no longer contain the following characters:
/\. "*<>:|?

The names of the data files include the database name. If you attempt to upgrade a database instance that has a database whose name contains one or more of these characters, mongod will refuse to start. Change the name of these databases before upgrading. See SERVER-4584 and SERVER-6729 for more information.
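Before upgrading, it may help to check existing database names against the restricted characters. The following plain JavaScript sketch does that check; invalidCharacters is a hypothetical helper and the database names are invented, while the character list is copied from the list above (including the space).

```javascript
// Sketch only: find restricted characters in a database name.
// The list is copied from the release note and includes the space character.
const FORBIDDEN = '/\\. "*<>:|?';

function invalidCharacters(name) {
  const found = Array.from(name).filter((ch) => FORBIDDEN.includes(ch));
  return Array.from(new Set(found)); // report each character once
}

console.log(invalidCharacters("inventory"));
console.log(invalidCharacters("sales.2012"));
console.log(invalidCharacters("q3 report?"));
```

A name is safe to keep when the returned list is empty.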


_id Fields and Indexes on Capped Collections

All capped collections now have an _id field by default, if they exist outside of the local database, and now have indexes on the _id field. This change only affects capped collections created with 2.2 instances and does not affect existing capped collections. See: SERVER-5516 for more information.
New $elemMatch Projection Operator

The $elemMatch (page 396) operator allows applications to narrow the data returned from queries so that the query operation will only return the first matching element in an array. See the $elemMatch (projection) (page 396) documentation and the SERVER-2238 and SERVER-828 issues for more information.

Windows Specific Changes
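The effect of the $elemMatch projection can be sketched in plain JavaScript: of all the array elements that satisfy the criteria, only the first is returned. The function below is an illustration restricted to simple equality criteria; elemMatchProjection and the sample students array are hypothetical, and the real operator accepts richer query conditions.

```javascript
// Sketch only: emulate an $elemMatch projection with equality criteria.
function elemMatchProjection(array, criteria) {
  const first = array.find((el) =>
    Object.keys(criteria).every((k) => el[k] === criteria[k])
  );
  // With no matching element, the array field is left out of the result.
  return first === undefined ? undefined : [first];
}

// Hypothetical sample data.
const students = [
  { name: "john", school: 102, age: 10 },
  { name: "jess", school: 102, age: 11 },
  { name: "jeff", school: 108, age: 15 },
];

const projected = elemMatchProjection(students, { school: 102 });
console.log(JSON.stringify(projected));
```

Although two students attend school 102, only the first one appears in the projected array.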
Windows XP is Not Supported

As of 2.2, MongoDB does not support Windows XP. Please upgrade to a more recent version of Windows to use the latest releases of MongoDB. See SERVER-5648 for more information.
Service Support for mongos.exe

You may now run mongos.exe instances as a Windows Service. See the mongos.exe Manual (page 530) reference and MongoDB as a Windows Service (page 24) and SERVER-1589 for more information.
Log Rotate Command Support

MongoDB for Windows now supports log rotation by way of the logRotate database command. See SERVER-2612 for more information.
New Build Using SlimReadWrite Locks for Windows Concurrency

Labeled 2008+ on the Downloads Page, this build for 64-bit versions of Windows Server 2008 R2 and for Windows 7 or newer offers increased performance over the standard 64-bit Windows build of MongoDB. See SERVER-3844 for more information.

Tool Improvements
Index Definitions Handled by mongodump and mongorestore

When you specify the --collection (page 506) option to mongodump, mongodump will now back up the definitions for all indexes that exist on the source database. When you attempt to restore this backup with mongorestore, the target mongod will rebuild all indexes. See SERVER-808 for more information. mongorestore now includes the --noIndexRestore (page 510) option to provide the preceding behavior. Use --noIndexRestore (page 510) to prevent mongorestore from building previous indexes.


mongooplog for Replaying Oplogs

The mongooplog tool makes it possible to pull oplog entries from one mongod instance and apply them to another mongod instance. You can use mongooplog to achieve point-in-time backup of a MongoDB data set. See the SERVER-3873 case and the mongooplog Manual (page 522) documentation.
Authentication Support for mongotop and mongostat

mongotop and mongostat now contain support for username/password authentication. See SERVER-3875 and SERVER-3871 for more information regarding this change. Also consider the documentation of the following options for additional information:
mongotop --username (page 520)
mongotop --password (page 521)
mongostat --username (page 517)
mongostat --password (page 517)
Write Concern Support for mongoimport and mongorestore

mongoimport now provides an option to halt the import if the operation encounters an error, such as a network interruption, a duplicate key exception, or a write error. The --stopOnError (page 513) option will produce an error rather than silently continue importing data. See SERVER-3937 for more information. In mongorestore, the --w (page 510) option provides support for configurable write concern.
mongodump Support for Reading from Secondaries

You can now run mongodump when connected to a secondary member of a replica set. See SERVER-3854 for more information.
mongoimport Support for full 16MB Documents

Previously, mongoimport would only import documents that were less than 4 megabytes in size. This issue is now corrected, and you may use mongoimport to import documents of up to 16 megabytes in size. See SERVER-4593 for more information.
Timestamp() Extended JSON format

MongoDB extended JSON now includes a new Timestamp() type to represent the Timestamp type that MongoDB uses for timestamps in the oplog among other contexts. This permits tools like mongooplog and mongodump to query for specific timestamps. Consider the following mongodump operation:

mongodump --db local --collection oplog.rs --query {"ts":{"$gt":{"$timestamp" : {"t": 1344969612000,

See SERVER-3483 for more information.
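The mongodump command shown above is cut off in this copy of the manual, so the sketch below does not try to complete it. It only shows, in plain JavaScript, how a query document using the extended JSON $timestamp form can be serialized into a string suitable for a --query argument. The oplogQuery helper is hypothetical, the t value is the one visible in the command above, and the i (increment) value of 1 is a placeholder.

```javascript
// Sketch only: serialize an oplog query that uses extended JSON $timestamp.
// "t" and "i" are the two members of the $timestamp form; the values passed
// in below are placeholders.
function oplogQuery(t, i) {
  return JSON.stringify({ ts: { $gt: { $timestamp: { t: t, i: i } } } });
}

const query = oplogQuery(1344969612000, 1);
console.log(query);
```

When passing such a string on a command line, it generally needs to be quoted so that the shell does not interpret the braces and dollar signs.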


Shell Improvements
Improved Shell User Interface

2.2 includes a number of changes that improve the overall quality and consistency of the user interface for the mongo shell:
Full Unicode support.
Bash-like line editing features. See SERVER-4312 for more information.
Multi-line command support in shell history. See SERVER-3470 for more information.
Windows support for the edit command. See SERVER-3998 for more information.
Helper to load Server-Side Functions

The db.loadServerScripts() (page 469) helper loads the contents of the current database's system.js collection into the current mongo shell session. See SERVER-1651 for more information.
Support for Bulk Inserts

If you pass an array of documents to the insert() (page 460) method, the mongo shell will now perform a bulk insert operation. See SERVER-3819 and SERVER-2395 for more information.

Operations
Support for Logging to Syslog

See the SERVER-2957 case and the documentation of the syslog (page 496) run-time option or the mongod --syslog (page 486) and mongos --syslog command-line options.
touch Command

Added the touch command to read the data and/or indexes from a collection into memory. See: SERVER-2023 and touch for more information.
indexCounters No Longer Report Sampled Data

indexCounters (page 543) now report actual counters that reflect index use and state. In previous versions, these data were sampled. See SERVER-5784 and indexCounters (page 543) for more information.
Padding Specifiable on compact Command

See the documentation of the compact command and the SERVER-4018 issue for more information.


Added Build Flag to Use System Libraries

The Boost library, version 1.49, is now embedded in the MongoDB code base. If you want to build MongoDB binaries using system Boost libraries, you can pass scons the --use-system-boost flag, as follows:
scons --use-system-boost

When building MongoDB, you can also pass scons a flag to compile MongoDB using only system libraries rather than the included versions of the libraries. For example:
scons --use-system-all

See the SERVER-3829 and SERVER-5172 issues for more information.


Memory Allocator Changed to TCMalloc

To improve performance, MongoDB 2.2 uses the TCMalloc memory allocator from Google Perftools. For more information about this change, see SERVER-188 and SERVER-4683. For more information about TCMalloc, see the documentation of TCMalloc itself.

Replication
Improved Logging for Replica Set Lag

When secondary members of a replica set fall behind in replication, mongod now provides better reporting in the log. This makes it possible to track replication in general and identify what process may produce errors or halt replication. See SERVER-3575 for more information.
Replica Set Members can Sync from Specific Members

The new replSetSyncFrom command and new rs.syncFrom() (page 479) helper in the mongo shell make it possible for you to manually configure from which member of the set a replica will poll oplog entries. Use these commands to override the default selection logic if needed. Always exercise caution with replSetSyncFrom when overriding the default behavior.
Replica Set Members will not Sync from Members Without Indexes Unless buildIndexes: false

To prevent inconsistency between members of replica sets, if a member of a replica set has members[n].buildIndexes (page 561) set to false, other members of the replica set will not sync from this member, unless they also have members[n].buildIndexes (page 561) set to false. See SERVER-4160 for more information.
New Option To Configure Index Pre-Fetching during Replication
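As the heading of this section indicates, the restriction concerns members that do not build indexes (buildIndexes: false): a member only syncs from such a source if it also runs without indexes. A minimal sketch of that eligibility rule in plain JavaScript follows; maySyncFrom is a hypothetical helper, the member documents are invented, and the rule as coded is this sketch's reading of the section rather than the server's implementation.

```javascript
// Sketch only: may "syncing" use "source" as its sync source?
function maySyncFrom(syncing, source) {
  // buildIndexes defaults to true when it is not set.
  const builds = (m) => m.buildIndexes !== false;
  return builds(source) || !builds(syncing);
}

// Hypothetical member configurations.
const normal = { host: "db1.example.net" };
const noIndexes = { host: "db2.example.net", buildIndexes: false };
const alsoNoIndexes = { host: "db3.example.net", buildIndexes: false };

console.log(
  maySyncFrom(normal, noIndexes),
  maySyncFrom(alsoNoIndexes, noIndexes),
  maySyncFrom(noIndexes, normal)
);
```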

By default, when replicating operations, secondaries will pre-fetch Indexes (page 177) associated with a query to improve replication throughput in most cases. The replIndexPrefetch (page 501) setting and --replIndexPrefetch (page 490) option allow administrators to disable this feature or allow the mongod to pre-fetch only the index on the _id field. See SERVER-6718 for more information.


Map Reduce Improvements

In 2.2 Map Reduce received the following improvements: improved support for sharded MapReduce, and MapReduce will retry jobs following a config error.

Sharding Improvements
Index on Shard Keys Can Now Be a Compound Index

If your shard key uses the prefix of an existing index, then you do not need to maintain a separate index for your shard key in addition to your existing index. This index, however, cannot be a multi-key index. See the Shard Key Indexes (page 119) documentation and SERVER-1506 for more information.
Migration Thresholds Modified
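Whether a shard key is a prefix of an existing compound index can be checked by comparing the leading fields of the two key patterns. The following plain JavaScript sketch illustrates the idea; isIndexPrefix is a hypothetical helper, the field names are invented, and the requirement that sort directions also match is an assumption of this sketch.

```javascript
// Sketch only: is the shard key pattern a prefix of the index key pattern?
function isIndexPrefix(shardKey, indexKey) {
  const shardFields = Object.keys(shardKey);
  const indexFields = Object.keys(indexKey);
  if (shardFields.length > indexFields.length) return false;
  // Fields must appear in the same leading positions; this sketch also
  // assumes the directions have to match.
  return shardFields.every(
    (f, i) => indexFields[i] === f && indexKey[f] === shardKey[f]
  );
}

const compoundIndex = { zipcode: 1, username: 1 };
console.log(
  isIndexPrefix({ zipcode: 1 }, compoundIndex),
  isIndexPrefix({ username: 1 }, compoundIndex)
);
```

In this example, a shard key on zipcode could reuse the compound index, while a shard key on username alone could not.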

The migration thresholds (page 120) have changed in 2.2 to permit more even distribution of chunks in collections that have smaller quantities of data. See the Migration Thresholds (page 120) documentation for more information.

34.1.3 Licensing Changes


Added a license notice for Google Perftools (the TCMalloc utility). See the License Notice and SERVER-4683 for more information.

34.1.4 Resources
MongoDB Downloads
All JIRA Issues resolved in 2.2
All Backward Incompatible Changes
All Third Party License Notices
See /release-notes/2.2-changes for an overview of all changes in 2.2.
Previous stable releases:
/release-notes/2.0
/release-notes/1.8


INDEX

Symbols
--all mongostat command line option, 517 --auth mongod command line option, 486 --autoresync mongod command line option, 491 --bind_ip <ip address> mongod command line option, 485 mongos command line option, 492 --chunkSize <value> mongos command line option, 493 --collection <collection>, -c <c> mongooplog command line option, 524 --collection <collection>, -c <collection> mongodump command line option, 506 mongoexport command line option, 515 mongofiles command line option, 528 mongoimport command line option, 512 mongorestore command line option, 509 --config <filename>, -f <filename> mongod command line option, 485 mongos command line option, 492 --configdb <config1>,<config2><:port>,<config3> mongos command line option, 493 --configsvr mongod command line option, 491 --cpu mongod command line option, 487 --csv mongoexport command line option, 515 --db <db>, -d <db> mongodump command line option, 506 mongoexport command line option, 515 mongofiles command line option, 528 mongoimport command line option, 512 mongooplog command line option, 524 mongorestore command line option, 509 --dbpath <path> mongod command line option, 487 mongodump command line option, 506

mongoexport command line option, 514 mongofiles command line option, 527 mongoimport command line option, 512 mongooplog command line option, 523 mongorestore command line option, 509 --diaglog <value> mongod command line option, 487 --directoryperdb mongod command line option, 487 mongodump command line option, 506 mongoexport command line option, 515 mongofiles command line option, 527 mongoimport command line option, 512 mongooplog command line option, 523 mongorestore command line option, 509 --discover mongostat command line option, 517 --drop mongoimport command line option, 512 mongorestore command line option, 510 --eval <JAVASCRIPT> mongo command line option, 504 --fastsync mongod command line option, 490 --fieldFile <file> mongoexport command line option, 515 mongooplog command line option, 524 --fieldFile <filename> mongoimport command line option, 512 --fields <field1[,field2]>, -f <field1[,field2]> mongoexport command line option, 515 --fields <field1[,field2]>, -f <field1[,field2]> mongoimport command line option, 512 --fields [field1[,field2]], -f [field1[,field2]] mongooplog command line option, 524 --file <filename> mongoimport command line option, 512 --filter <JSON> bsondump command line option, 528 mongorestore command line option, 509 --forceTableScan mongodump command line option, 507

--fork mongod command line option, 486 mongos command line option, 493 --forward <host>:<port> mongosniff command line option, 525 --from <host[:port]> mongooplog command line option, 524 --headerline mongoimport command line option, 513 --help bsondump command line option, 528 mongodump command line option, 505 mongoexport command line option, 514 mongofiles command line option, 527 mongoimport command line option, 511 mongooplog command line option, 523 mongorestore command line option, 508 mongosniff command line option, 525 mongostat command line option, 516 mongotop command line option, 520 --help, -h mongo command line option, 504 --help, -h mongod command line option, 485 mongos command line option, 492 --host <HOSTNAME> mongo command line option, 504 --host <hostname><:port> mongodump command line option, 506 mongoexport command line option, 514 mongofiles command line option, 527 mongorestore command line option, 508 mongostat command line option, 517 mongotop command line option, 520 --host <hostname><:port>, -h mongoimport command line option, 511 mongooplog command line option, 523 --http mongostat command line option, 517 --ignoreBlanks mongoimport command line option, 512 --install mongod.exe command line option, 529 mongos.exe command line option, 530 --ipv6 mongo command line option, 504 mongod command line option, 487 mongodump command line option, 506 mongoexport command line option, 514 mongofiles command line option, 527 mongoimport command line option, 511 mongooplog command line option, 523 mongorestore command line option, 509 mongos command line option, 493

mongostat command line option, 517 mongotop command line option, 520 --journal mongod command line option, 487 mongodump command line option, 506 mongoexport command line option, 515 mongofiles command line option, 527 mongoimport command line option, 512 mongooplog command line option, 523 mongorestore command line option, 509 --journalCommitInterval <value> mongod command line option, 487 --journalOptions <arguments> mongod command line option, 487 --jsonArray mongoexport command line option, 515 mongoimport command line option, 513 --jsonp mongod command line option, 487 mongos command line option, 493 --keepIndexVersion mongorestore command line option, 510 --keyFile <file> mongod command line option, 486 mongos command line option, 493 --local <filename>, -l <filename> mongofiles command line option, 528 --localThreshold mongos command line option, 494 --locks mongotop command line option, 521 --logappend mongod command line option, 486 mongos command line option, 493 --logpath <path> mongod command line option, 486 mongos command line option, 492 --master mongod command line option, 490 --maxConns <number> mongod command line option, 486 mongos command line option, 492 --noIndexRestore mongorestore command line option, 510 --noMoveParanoia mongod command line option, 491 --noOptionsRestore mongorestore command line option, 510 --noauth mongod command line option, 488 --nodb mongo command line option, 504 --noheaders mongostat command line option, 517

--nohttpinterface mongod command line option, 488 mongos command line option, 494 --nojournal mongod command line option, 488 --noprealloc mongod command line option, 488 --norc mongo command line option, 504 --noscripting mongod command line option, 488 mongos command line option, 494 --notablescan mongod command line option, 488 --nounixsocket mongod command line option, 486 mongos command line option, 493 --nssize <value> mongod command line option, 488 --objcheck bsondump command line option, 528 mongod command line option, 486 mongorestore command line option, 509 mongos command line option, 492 mongosniff command line option, 525 --only <arg> mongod command line option, 491 --oplog mongodump command line option, 507 --oplogLimit <timestamp> mongorestore command line option, 510 --oplogReplay mongorestore command line option, 510 --oplogSize <value> mongod command line option, 490 --oplogns <namespace> mongooplog command line option, 524 --out <file>, -o <file> mongoexport command line option, 515 --out <path>, -o <path> mongodump command line option, 507 --password <password> mongodump command line option, 506 mongoexport command line option, 514 mongofiles command line option, 527 mongoimport command line option, 512 mongorestore command line option, 509 mongostat command line option, 517 mongotop command line option, 521 --password <password>, -p <password> mongo command line option, 504 mongooplog command line option, 523 --pidfilepath <path> mongod command line option, 486

mongos command line option, 493 --port mongooplog command line option, 523 --port <PORT> mongo command line option, 504 --port <port> mongod command line option, 485 mongodump command line option, 506 mongoexport command line option, 514 mongofiles command line option, 527 mongoimport command line option, 511 mongorestore command line option, 509 mongos command line option, 492 mongostat command line option, 517 mongotop command line option, 520 --profile <level> mongod command line option, 488 --query <JSON> mongoexport command line option, 515 --query <json>, -q <json> mongodump command line option, 507 --quiet mongo command line option, 504 mongod command line option, 485 mongos command line option, 492 --quota mongod command line option, 488 --quotaFiles <number> mongod command line option, 488 --reinstall mongod.exe command line option, 529 mongos.exe command line option, 530 --remove mongod.exe command line option, 529 mongos.exe command line option, 530 --repair mongod command line option, 488 mongodump command line option, 507 --repairpath <path> mongod command line option, 489 --replIndexPrefetch mongod command line option, 490 --replSet <setname> mongod command line option, 490 --replace, -r mongofiles command line option, 528 --rest mongod command line option, 488 --rowcount <number>, -n <number> mongostat command line option, 517 --seconds <number>, -s <number> mongooplog command line option, 524 --serviceDescription <description> mongod.exe command line option, 530

mongos.exe command line option, 531 --serviceDisplayName <name> mongod.exe command line option, 529 mongos.exe command line option, 531 --serviceName <name> mongod.exe command line option, 529 mongos.exe command line option, 530 --servicePassword <password> mongod.exe command line option, 530 mongos.exe command line option, 531 --serviceUser <user> mongod.exe command line option, 530 mongos.exe command line option, 531 --shardsvr mongod command line option, 491 --shell mongo command line option, 503 --shutdown mongod command line option, 489 --slave mongod command line option, 490 --slaveOk, -k mongoexport command line option, 515 --slavedelay <value> mongod command line option, 491 --slowms <value> mongod command line option, 489 --smallfiles mongod command line option, 489 --source <NET [interface]>, <FILE [filename]>, <DIAGLOG [filename]> mongosniff command line option, 525 --source <host>:<port> mongod command line option, 491 --stopOnError mongoimport command line option, 513 --syncdelay <value> mongod command line option, 489 --sysinfo mongod command line option, 489 --syslog mongod command line option, 486 --test mongos command line option, 493 --traceExceptions mongod command line option, 490 --type <=json|=debug> bsondump command line option, 528 --type <MIME>, t <MIME> mongofiles command line option, 528 --type <json|csv|tsv> mongoimport command line option, 512 --unixSocketPrefix <path> mongod command line option, 486
mongos command line option, 493 --upgrade mongod command line option, 489 mongos command line option, 493 --upsert mongoimport command line option, 513 --upsertFields <field1[,field2]> mongoimport command line option, 513 --username <USERNAME>, -u <USERNAME> mongo command line option, 504 --username <username>, -u <username> mongodump command line option, 506 mongoexport command line option, 514 mongofiles command line option, 527 mongoimport command line option, 512 mongooplog command line option, 523 mongorestore command line option, 509 mongostat command line option, 517 mongotop command line option, 520 --verbose mongo command line option, 504 --verbose, -v bsondump command line option, 528 mongod command line option, 485 mongodump command line option, 506 mongoexport command line option, 514 mongofiles command line option, 527 mongoimport command line option, 511 mongooplog command line option, 523 mongorestore command line option, 508 mongos command line option, 492 mongostat command line option, 516 mongotop command line option, 520 --version bsondump command line option, 528 mongo command line option, 504 mongod command line option, 485 mongodump command line option, 506 mongoexport command line option, 514 mongofiles command line option, 527 mongoimport command line option, 511 mongooplog command line option, 523 mongorestore command line option, 508 mongos command line option, 492 mongostat command line option, 517 mongotop command line option, 520 --w <number of replicas per write> mongorestore command line option, 510 $addToSet (operator), 373 $all (operator), 373 $and (operator), 374 $atomic (operator), 375 $bit (operator), 375 $box (operator), 375, 395

$center (operator), 376, 395 $centerSphere (operator), 376 $cmd, 574 $comment (operator), 376 $each (operator), 373, 376 $elemMatch (operator), 377 $elemMatch (projection operator), 396 $exists (operator), 377 $explain (operator), 377 $gt (operator), 378 $gte (operator), 378 $hint (operator), 379 $in (operator), 379 $inc (operator), 379 $lt (operator), 380 $lte (operator), 380 $max (operator), 381 $maxDistance (operator), 381 $maxScan (operator), 381 $min (operator), 381 $mod (operator), 382 $ne (operator), 382 $near (operator), 383 $nearSphere (operator), 383 $nin (operator), 383 $nor (operator), 384 $not (operator), 385 $options (operator), 389 $or (operator), 386 $orderby (operator), 387 $polygon (operator), 387, 395 $pop (operator), 387 $pull (operator), 388 $pullAll (operator), 388 $push (operator), 388 $pushAll (operator), 389 $query (operator), 389 $regex (operator), 389 $rename (operator), 390 $returnKey (operator), 390 $set (operator), 390 $showDiskLoc (operator), 390 $size (operator), 391 $slice (projection operator), 397 $snapshot (operator), 391 $type (operator), 391 $uniqueDocs (operator), 393, 396 $unset (operator), 394 $where (operator), 394 $within (operator), 394 _id, 179, 574 _id index, 179 _isSelf (database command), 430 _isWindows (shell method), 474

_migrateClone (database command), 433 _rand (shell method), 475 _recvChunkAbort (database command), 436 _recvChunkCommit (database command), 436 _recvChunkStart (database command), 436 _recvChunkStatus (database command), 437 _skewClockCommand (database command), 444 _srand (shell method), 484 _startMongoProgram (shell method), 484 _testDistLockWithSkew (database command), 445 _testDistLockWithSyncCluster (database command), 445 _transferMods (database command), 447 <timestamp> (shell output), 521

A
accumulator, 574 active (shell output), 569 addShard (database command), 410 admin database, 574 administration sharding, 99 aggregate (database command), 410 aggregation, 574 aggregation framework, 574 applyOps (database command), 411 arbiter, 574 architectures sharding, 114 asserts (status), 548 asserts.msg (status), 548 asserts.regular (status), 548 asserts.rollovers (status), 549 asserts.user (status), 548 asserts.warning (status), 548 auth (setting), 497 authenticate (database command), 412 Automatic Data Volume Distribution, 93 autoresync (setting), 502 availableQueryOptions (database command), 412 avgObjSize (statistic), 552, 553

B
backgroundFlushing (status), 544 backgroundFlushing.average_ms (status), 544 backgroundFlushing.flushes (status), 544 backgroundFlushing.last_finished (status), 545 backgroundFlushing.last_ms (status), 544 backgroundFlushing.total_ms (status), 544 balancer, 574 balancing, 98 internals, 119 migration, 120 operations, 108 bind_ip (setting), 495


box, 574 BSON, 574 BSON types, 575 bsondump command line option --filter <JSON>, 528 --help, 528 --objcheck, 528 --type <=json|=debug>, 528 --verbose, -v, 528 --version, 528 btree, 575 buildInfo (database command), 446 bytesWithHeaders (shell output), 556 bytesWithoutHeaders (shell output), 556

C
CAP Theorem, 575 capped collection, 575 cat (shell method), 448 cd (shell method), 448 checkShardingIndex (database command), 412 checksum, 575 chunk, 575 chunkSize (setting), 503 circle, 575 clean (database command), 412 clearRawMongoProgramOutput (shell method), 448 client, 575 client (shell output), 569 clone (database command), 412 cloneCollection (database command), 413 closeAllDatabases (database command), 413 cluster, 575 collection, 575 collections (statistic), 551 collMod (database command), 413 collStats (database command), 414 compact (database command), 414 compound index, 575 config database, 575 config servers, 96 operations, 110 configdb (setting), 502 configsvr (setting), 502 configuration replica set members, 33 connectionId (shell output), 569 connections (status), 542 connections.available (status), 543 connections.current (status), 542 connPoolStats (database command), 416 connPoolSync (database command), 416 consistency replica set, 34

rollbacks, 35 control script, 576 convertToCapped (database command), 416 copydb (database command), 417 copydbgetnonce (database command), 418 copyDbpath (shell method), 449 count (database command), 418 count (statistic), 553 cpu (setting), 497 create (database command), 418 createdByType (statistic), 558 createdByType.master (statistic), 558 createdByType.set (statistic), 558 createdByType.sync (statistic), 558 CRUD, 576 CSV, 576 cursor, 576 cursor.count (shell method), 449 cursor.explain (shell method), 449 cursor.forEach (shell method), 449 cursor.hasNext (shell method), 450 cursor.hint (shell method), 450 cursor.limit (shell method), 450 cursor.map (shell method), 450 cursor.next (shell method), 451 cursor.readPref (shell method), 451 cursor.showDiskLoc (shell method), 451 cursor.size (shell method), 451 cursor.skip (shell method), 452 cursor.snapshot (shell method), 452 cursor.sort (shell method), 452 cursorInfo (database command), 419 cursors (status), 545 cursors.clientCursors_size (status), 545 cursors.timedOut (status), 545 cursors.totalOpen (status), 545

D
daemon, 576 data-center awareness, 576 database, 576 database command, 576 database profiler, 576 database references, 225 dataSize (database command), 419 datasize (shell output), 555 dataSize (statistic), 552 Date (shell method), 448 db (shell output), 521 db (statistic), 551 db.addUser (shell method), 453 db.auth (shell method), 453 db.cloneDatabase (shell method), 453 db.collection.aggregate (shell method), 454


db.collection.dataSize (shell method), 454 db.collection.distinct (shell method), 454 db.collection.drop (shell method), 455 db.collection.dropIndex (shell method), 455 db.collection.dropIndexes (shell method), 455 db.collection.ensureIndex (shell method), 455 db.collection.find (shell method), 457 db.collection.findAndModify (shell method), 457 db.collection.findOne (shell method), 458 db.collection.getIndexes (shell method), 458 db.collection.group (shell method), 459 db.collection.insert (shell method), 460 db.collection.mapReduce (shell method), 460 db.collection.reIndex (shell method), 462 db.collection.remove (shell method), 462 db.collection.renameCollection (shell method), 462 db.collection.save (shell method), 463 db.collection.stats (shell method), 463 db.collection.storageSize (shell method), 463 db.collection.totalIndexSize (shell method), 464 db.collection.update (shell method), 464 db.collection.validate (shell method), 464 db.commandHelp (shell method), 465 db.copyDatabase (shell method), 465 db.createCollection (shell method), 465 db.currentOp (shell method), 466 db.dropDatabase (shell method), 466 db.eval (shell method), 466 db.fsyncLock (shell method), 467 db.fsyncUnlock (shell method), 467 db.getCollection (shell method), 467 db.getCollectionNames (shell method), 467 db.getLastError (shell method), 467 db.getLastErrorObj (shell method), 468 db.getMongo (shell method), 468 db.getName (shell method), 468 db.getPrevError (shell method), 468 db.getProfilingLevel (shell method), 468 db.getProfilingStatus (shell method), 468 db.getReplicationInfo (shell method), 469 db.getSiblingDB (shell method), 469 db.isMaster (shell method), 469 db.killOP (shell method), 469 db.listCommands (shell method), 469 db.loadServerScripts (shell method), 469 db.logout (shell method), 470 db.printCollectionStats (shell method), 470 db.printReplicationInfo (shell method), 470 db.printShardingStatus (shell method), 470 db.printSlaveReplicationInfo (shell method), 470 db.removeUser (shell method), 470 db.repairDatabase (shell method), 471 db.resetError (shell method), 471 db.runCommand (shell method), 471

db.serverStatus (shell method), 471 db.setProfilingLevel (shell method), 471 db.shutdownServer (shell method), 472 db.stats (shell method), 472 db.version (shell method), 473 dbHash (database command), 419 dbpath, 576 dbpath (setting), 497 DBRef, 225 dbStats (database command), 419 delayed member, 576 deletedCount (shell output), 556 deletedSize (shell output), 556 desc (shell output), 569 diaglog (setting), 497 diagLogging (database command), 419 directoryperdb (setting), 497 distinct (database command), 420 document, 576 space allocation, 413 draining, 576 driver, 577 driverOIDTest (database command), 420 drop (database command), 420 dropDatabase (database command), 420 dropIndexes (database command), 421 dur (status), 549 dur.commits (status), 549 dur.commitsInWriteLock (status), 550 dur.compression (status), 550 dur.earlyCommits (status), 550 dur.journaledMB (status), 549 dur.timeMS (status), 550 dur.timeMS.dt (status), 550 dur.timeMS.prepLogBuffer (status), 550 dur.timeMS.remapPrivateView (status), 550 dur.timeMS.writeToDataFiles (status), 550 dur.timeMS.writeToJournal (status), 550 dur.writeToDataFilesMB (status), 549

E
election, 577 emptycapped (database command), 421 enableSharding (database command), 421 errmsg (status), 567 errors (shell output), 556 eval (database command), 421 eventual consistency, 577 expression, 577 extentCount (shell output), 554 extents (shell output), 554 extents.firstRecord (shell output), 555 extents.lastRecord (shell output), 555 extents.loc (shell output), 554


extents.nsdiag (shell output), 555 extents.size (shell output), 555 extents.xnext (shell output), 555 extents.xprev (shell output), 555 extra_info (status), 543 extra_info.heap_usage_bytes (status), 543 extra_info.note (status), 543 extra_info.page_faults (status), 543

F
failover, 577 elections, 34 replica set, 46 fastsync (setting), 501 features (database command), 422 field, 577 filemd5 (database command), 422 fileSize (statistic), 552 findAndModify (database command), 422 firewall, 577 firstExtent (shell output), 554 firstExtentDetails (shell output), 555 firstExtentDetails.firstRecord (shell output), 555 firstExtentDetails.lastRecord (shell output), 556 firstExtentDetails.loc (shell output), 555 firstExtentDetails.nsdiag (shell output), 555 firstExtentDetails.size (shell output), 555 firstExtentDetails.xnext (shell output), 555 firstExtentDetails.xprev (shell output), 555 flags (statistic), 553 flushRouterConfig (database command), 423 forceerror (database command), 423 fork (setting), 497 fsync, 577 fsync (database command), 423 fundamentals sharding, 93 fuzzFile (shell method), 473

G
Geohash, 577 geoNear (database command), 425 geoSearch (database command), 425 geospatial, 577 geoWalk (database command), 426 getCmdLineOpts (database command), 426 getDB (shell method), 473 getHostName (shell method), 473 getIndexes.key (shell output), 458 getIndexes.name (shell output), 458 getIndexes.ns (shell output), 458 getIndexes.v (shell output), 458 getLastError (database command), 426 getLog (database command), 427 getMemInfo (shell method), 473 getnonce (database command), 428 getoptime (database command), 428 getParameter (database command), 427 getPrevError (database command), 427 getShardDistribution (shell method), 473 getShardMap (database command), 428 getShardVersion (database command), 428 getShardVersion (shell method), 473 globalLock (status), 540 globalLock.activeClients (status), 541 globalLock.activeClients.readers (status), 541 globalLock.activeClients.total (status), 541 globalLock.activeClients.writers (status), 541 globalLock.currentQueue (status), 541 globalLock.currentQueue.readers (status), 541 globalLock.currentQueue.total (status), 541 globalLock.currentQueue.writers (status), 541 globalLock.lockTime (status), 540 globalLock.ratio (status), 541 globalLock.totalTime (status), 540 godinsert (database command), 428 GridFS, 577 group (database command), 428

H
handshake (database command), 429 haystack index, 577 hidden member, 577 Horizontal Capacity, 94 host (status), 537 hostname (shell method), 474 hosts (statistic), 557 hosts.[host].available (statistic), 557 hosts.[host].created (statistic), 557

I
idempotent, 578 index, 578 _id, 179 sparse, 182 index types, 179 primary key, 179 indexCounters (status), 543 indexCounters.btree (status), 543 indexCounters.btree.accesses (status), 543 indexCounters.btree.hits (status), 544 indexCounters.btree.misses (status), 544 indexCounters.btree.missRatio (status), 544 indexCounters.btree.resets (status), 544 indexes (statistic), 552 indexSize (statistic), 552 indexSizes (statistic), 554 internals sharding, 116 invalidObjects (shell output), 556 IPv6, 578 ipv6 (setting), 498 isdbGrid (database command), 430 isMaster (database command), 429 isMaster.hosts (shell output), 430 isMaster.ismaster (shell output), 429 isMaster.localTime (shell output), 430 isMaster.maxBsonObjectSize (shell output), 430 isMaster.me (shell output), 430 isMaster.primary (shell output), 430 isMaster.secondary (shell output), 429 isMaster.setname (shell output), 429 ISODate, 578

J
JavaScript, 578 journal, 578 journal (setting), 498 journalCommitInterval (setting), 498 journalLatencyTest (database command), 430 JSON, 578 JSON document, 578 JSONP, 578 jsonp (setting), 498

K
keyFile (setting), 496 keysPerIndex (shell output), 556

L
lastExtent (shell output), 554 lastExtentSize (shell output), 555 lastExtentSize (statistic), 553 listCommands (database command), 431 listDatabases (database command), 431 listFiles (shell method), 474 listShards (database command), 431 load (shell method), 474 localThreshold (setting), 503 localTime (status), 538 locks (shell output), 569 locks (status), 538 locks.. (status), 538 locks...timeAcquiringMicros (status), 539 locks...timeAcquiringMicros.R (status), 539 locks...timeAcquiringMicros.W (status), 539 locks...timeLockedMicros (status), 538 locks...timeLockedMicros.R (status), 538 locks...timeLockedMicros.r (status), 539 locks...timeLockedMicros.W (status), 538 locks...timeLockedMicros.w (status), 539 locks.^ (shell output), 569 locks.^<database> (shell output), 570 locks.^local (shell output), 569 locks.<database> (status), 540 locks.<database>.timeAcquiringMicros (status), 540 locks.<database>.timeAcquiringMicros.r (status), 540 locks.<database>.timeAcquiringMicros.w (status), 540 locks.<database>.timeLockedMicros (status), 540 locks.<database>.timeLockedMicros.r (status), 540 locks.<database>.timeLockedMicros.w (status), 540 locks.admin (status), 539 locks.admin.timeAcquiringMicros (status), 539 locks.admin.timeAcquiringMicros.r (status), 539 locks.admin.timeAcquiringMicros.w (status), 539 locks.admin.timeLockedMicros (status), 539 locks.admin.timeLockedMicros.r (status), 539 locks.admin.timeLockedMicros.w (status), 539 locks.local (status), 539 locks.local.timeAcquiringMicros (status), 539 locks.local.timeAcquiringMicros.r (status), 540 locks.local.timeAcquiringMicros.w (status), 540 locks.local.timeLockedMicros (status), 539 locks.local.timeLockedMicros.r (status), 539 locks.local.timeLockedMicros.w (status), 539 lockStats (shell output), 570 logappend (setting), 496 logout (database command), 431 logpath (setting), 496 logRotate (database command), 431 logSizeMB (status), 567 ls (shell method), 474 LVM, 578

M
map-reduce, 578 mapReduce (database command), 432 mapreduce.shardedfinish (database command), 433 master, 578 master (setting), 501 maxConns (setting), 495 md5, 578 md5sumFile (shell method), 475 medianKey (database command), 433 mem (status), 542 mem.bits (status), 542 mem.mapped (status), 542 mem.resident (status), 542 mem.supported (status), 542 mem.virtual (status), 542 members.errmsg (status), 560 members.health (status), 560 members.lastHeartbeat (status), 560 members.name (status), 559 members.optime (status), 560 members.optime.i (status), 560


members.optime.t (status), 560 members.optimeDate (status), 560 members.pingMS (status), 560 members.self (status), 559 members.state (status), 560 members.stateStr (status), 560 members.uptime (status), 560 members[n]._id (shell output), 561 members[n].arbiterOnly (shell output), 561 members[n].buildIndexes (shell output), 561 members[n].hidden (shell output), 562 members[n].host (shell output), 561 members[n].priority (shell output), 562 members[n].slaveDelay (shell output), 562 members[n].tags (shell output), 562 members[n].votes (shell output), 563 MIME, 578 mkdir (shell method), 475 mongo, 579 mongo command line option --eval <JAVASCRIPT>, 504 --help, -h, 504 --host <HOSTNAME>, 504 --ipv6, 504 --nodb, 504 --norc, 504 --password <password>, -p <password>, 504 --port <PORT>, 504 --quiet, 504 --shell, 503 --username <USERNAME>, -u <USERNAME>, 504 --verbose, 504 --version, 504 mongo.setSlaveOk (shell method), 475 mongod, 579 mongod command line option --auth, 486 --autoresync, 491 --bind_ip <ip address>, 485 --config <filename>, -f <filename>, 485 --configsvr, 491 --cpu, 487 --dbpath <path>, 487 --diaglog <value>, 487 --directoryperdb, 487 --fastsync, 490 --fork, 486 --help, -h, 485 --ipv6, 487 --journal, 487 --journalCommitInterval <value>, 487 --journalOptions <arguments>, 487 --jsonp, 487 --keyFile <file>, 486 --logappend, 486 --logpath <path>, 486 --master, 490 --maxConns <number>, 486 --noMoveParanoia, 491 --noauth, 488 --nohttpinterface, 488 --nojournal, 488 --noprealloc, 488 --noscripting, 488 --notablescan, 488 --nounixsocket, 486 --nssize <value>, 488 --objcheck, 486 --only <arg>, 491 --oplogSize <value>, 490 --pidfilepath <path>, 486 --port <port>, 485 --profile <level>, 488 --quiet, 485 --quota, 488 --quotaFiles <number>, 488 --repair, 488 --repairpath <path>, 489 --replIndexPrefetch, 490 --replSet <setname>, 490 --rest, 488 --shardsvr, 491 --shutdown, 489 --slave, 490 --slavedelay <value>, 491 --slowms <value>, 489 --smallfiles, 489 --source <host>:<port>, 491 --syncdelay <value>, 489 --sysinfo, 489 --syslog, 486 --traceExceptions, 490 --unixSocketPrefix <path>, 486 --upgrade, 489 --verbose, -v, 485 --version, 485 mongod.exe command line option --install, 529 --reinstall, 529 --remove, 529 --serviceDescription <description>, 530 --serviceDisplayName <name>, 529 --serviceName <name>, 529 --servicePassword <password>, 530 --serviceUser <user>, 530 MongoDB, 579 mongodump command line option


--collection <collection>, -c <collection>, 506 --db <db>, -d <db>, 506 --dbpath <path>, 506 --directoryperdb, 506 --forceTableScan, 507 --help, 505 --host <hostname><:port>, 506 --ipv6, 506 --journal, 506 --oplog, 507 --out <path>, -o <path>, 507 --password <password>, 506 --port <port>, 506 --query <json>, -q <json>, 507 --repair, 507 --username <username>, -u <username>, 506 --verbose, -v, 506 --version, 506 mongoexport command line option --collection <collection>, -c <collection>, 515 --csv, 515 --db <db>, -d <db>, 515 --dbpath <path>, 514 --directoryperdb, 515 --fieldFile <file>, 515 --fields <field1[,field2]>, -f <field1[,field2]>, 515 --help, 514 --host <hostname><:port>, 514 --ipv6, 514 --journal, 515 --jsonArray, 515 --out <file>, -o <file>, 515 --password <password>, 514 --port <port>, 514 --query <JSON>, 515 --slaveOk, -k, 515 --username <username>, -u <username>, 514 --verbose, -v, 514 --version, 514 mongofiles command line option --collection <collection>, -c <collection>, 528 --db <db>, -d <db>, 528 --dbpath <path>, 527 --directoryperdb, 527 --help, 527 --host <hostname><:port>, 527 --ipv6, 527 --journal, 527 --local <filename>, -l <filename>, 528 --password <password>, 527 --port <port>, 527 --replace, -r, 528 --type <MIME>, t <MIME>, 528 --username <username>, -u <username>, 527

--verbose, -v, 527 --version, 527 mongoimport command line option --collection <collection>, -c <collection>, 512 --db <db>, -d <db>, 512 --dbpath <path>, 512 --directoryperdb, 512 --drop, 512 --fieldFile <filename>, 512 --fields <field1[,field2]>, -f <field1[,field2]>, 512 --file <filename>, 512 --headerline, 513 --help, 511 --host <hostname><:port>, -h, 511 --ignoreBlanks, 512 --ipv6, 511 --journal, 512 --jsonArray, 513 --password <password>, 512 --port <port>, 511 --stopOnError, 513 --type <json|csv|tsv>, 512 --upsert, 513 --upsertFields <field1[,field2]>, 513 --username <username>, -u <username>, 512 --verbose, -v, 511 --version, 511 mongooplog command line option --collection <collection>, -c <c>, 524 --db <db>, -d <db>, 524 --dbpath <path>, 523 --directoryperdb, 523 --fieldFile <file>, 524 --fields [field1[,field2]], -f [field1[,field2]], 524 --from <host[:port]>, 524 --help, 523 --host <hostname><:port>, -h, 523 --ipv6, 523 --journal, 523 --oplogns <namespace>, 524 --password <password>, -p <password>, 523 --port, 523 --seconds <number>, -s <number>, 524 --username <username>, -u <username>, 523 --verbose, -v, 523 --version, 523 mongorestore command line option --collection <collection>, -c <collection>, 509 --db <db>, -d <db>, 509 --dbpath <path>, 509 --directoryperdb, 509 --drop, 510 --filter <JSON>, 509 --help, 508


--host <hostname><:port>, 508 --ipv6, 509 --journal, 509 --keepIndexVersion, 510 --noIndexRestore, 510 --noOptionsRestore, 510 --objcheck, 509 --oplogLimit <timestamp>, 510 --oplogReplay, 510 --password <password>, 509 --port <port>, 509 --username <username>, -u <username>, 509 --verbose, -v, 508 --version, 508 --w <number of replicas per write>, 510 mongos, 97, 579 mongos command line option --bind_ip <ip address>, 492 --chunkSize <value>, 493 --config <filename>, -f <filename>, 492 --configdb <config1>,<config2><:port>,<config3>, 493 --fork, 493 --help, -h, 492 --ipv6, 493 --jsonp, 493 --keyFile <file>, 493 --localThreshold, 494 --logappend, 493 --logpath <path>, 492 --maxConns <number>, 492 --nohttpinterface, 494 --noscripting, 494 --nounixsocket, 493 --objcheck, 492 --pidfilepath <path>, 493 --port <port>, 492 --quiet, 492 --test, 493 --unixSocketPrefix <path>, 493 --upgrade, 493 --verbose, -v, 492 --version, 492 mongos.exe command line option --install, 530 --reinstall, 530 --remove, 530 --serviceDescription <description>, 531 --serviceDisplayName <name>, 531 --serviceName <name>, 530 --servicePassword <password>, 531 --serviceUser <user>, 531 mongosniff command line option --forward <host>:<port>, 525

--help, 525 --objcheck, 525 --source <NET [interface]>, <FILE [filename]>, <DIAGLOG [filename]>, 525 mongostat command line option --all, 517 --discover, 517 --help, 516 --host <hostname><:port>, 517 --http, 517 --ipv6, 517 --noheaders, 517 --password <password>, 517 --port <port>, 517 --rowcount <number>, -n <number>, 517 --username <username>, -u <username>, 517 --verbose, -v, 516 --version, 517 mongotop command line option --help, 520 --host <hostname><:port>, 520 --ipv6, 520 --locks, 521 --password <password>, 521 --port <port>, 520 --username <username>, -u <username>, 520 --verbose, -v, 520 --version, 520 moveChunk (database command), 434 movePrimary (database command), 434 msg (shell output), 570 multi-master replication, 579

N
namespace, 579 natural order, 579 nearest (read preference mode), 53 netstat (database command), 435 network (status), 545 network.bytesIn (status), 545 network.bytesOut (status), 545 network.numRequests (status), 545 nIndexes (shell output), 556 nindexes (statistic), 553 noauth (setting), 498 nohttpinterface (setting), 498 nojournal (setting), 498 noMoveParanoia (setting), 502 noprealloc (setting), 498 noscripting (setting), 498 notablescan (setting), 499 nounixsocket (setting), 496 now (status), 567 nrecords (shell output), 555


ns (shell output), 521, 554, 569 ns (statistic), 553 nssize (setting), 499 nsSizeMB (statistic), 552 numAScopedConnection (statistic), 558 numDBClientConnection (statistic), 558 numExtents (statistic), 552, 553 numYields (shell output), 570

O
objcheck (setting), 496 ObjectId, 579 objects (statistic), 552 objectsFound (shell output), 556 ok (shell output), 556 only (setting), 502 op (shell output), 569 opcounters (status), 548 opcounters.command (status), 548 opcounters.delete (status), 548 opcounters.getmore (status), 548 opcounters.insert (status), 548 opcounters.query (status), 548 opcounters.update (status), 548 opcountersRepl (status), 546 opcountersRepl.command (status), 547 opcountersRepl.delete (status), 547 opcountersRepl.getmore (status), 547 opcountersRepl.insert (status), 547 opcountersRepl.query (status), 547 opcountersRepl.update (status), 547 operator, 579 opid (shell output), 569 oplog, 579 oplogMainRowCount (status), 567 oplogSize (setting), 501

P
padding, 579 padding (shell output), 555 padding factor, 579 paddingFactor (statistic), 553 page fault, 579 partition, 580 pcap, 580 PID, 580 pidfilepath (setting), 496 ping (database command), 435 pipe, 580 pipeline, 580 polygon, 580 port (setting), 495 powerOf2Sizes, 580 pre-splitting, 580 primary, 580 primary (read preference mode), 52 primary key, 580 primaryPreferred (read preference mode), 52 printShardingStatus (database command), 435 priority, 580 process (status), 538 profile (database command), 435 profile (setting), 499 projection, 580 pwd (shell method), 475

Q
query, 580 query (shell output), 569 quiet (setting), 495 quit (shell method), 475 quota (setting), 499 quotaFiles (setting), 499

R
Range-based Data Partitioning, 93 rawMongoProgramOutput (shell method), 476 RDBMS, 580 read (shell output), 521 read preference, 51, 580 background, 51 behavior, 54 member selection, 55 modes, 52 mongos, 56 nearest, 55 ping time, 55 semantics, 52 sharding, 56 tag sets, 53, 565 read-lock, 581 recordStats (status), 550 recordStats.<database>.accessNotInMemory (status), 551 recordStats.<database>.pageFaultExceptionsThrown (status), 551 recordStats.accessesNotInMemory (status), 550 recordStats.admin.accessNotInMemory (status), 551 recordStats.admin.pageFaultExceptionsThrown (status), 551 recordStats.local.accessNotInMemory (status), 550 recordStats.local.pageFaultExceptionsThrown (status), 551 recordStats.pageFaultExceptionsThrown (status), 550 recovering, 581 references, 225 reIndex (database command), 436 removeFile (shell method), 476 removeShard (database command), 437


renameCollection (database command), 437 repair (setting), 499 repairDatabase (database command), 438 repairpath (setting), 499 repl (status), 546 repl.hosts (status), 546 repl.ismaster (status), 546 repl.secondary (status), 546 repl.setName (status), 546 replica pairs, 581 replica set, 581 consistency, 34 elections, 34, 58 failover, 46, 58 oplog, 36, 57 priority, 34 rollbacks, 35 security, 37 tag sets, 565 replica set members arbiters, 41 delayed, 40 hidden, 39 non-voting, 41 secondary only, 39 replicaSets (statistic), 557 replicaSets.shard (statistic), 557 replicaSets.[shard].host (statistic), 557 replicaSets.[shard].host[n].addr (statistic), 557 replicaSets.[shard].host[n].hidden (statistic), 557 replicaSets.[shard].host[n].ismaster (statistic), 557 replicaSets.[shard].host[n].ok (statistic), 557 replicaSets.[shard].host[n].pingTimeMillis (statistic), 558 replicaSets.[shard].host[n].secondary (statistic), 557 replicaSets.[shard].host[n].tags (statistic), 558 replicaSets.[shard].master (statistic), 558 replicaSets.[shard].nextSlave (statistic), 558 replication, 581 replication lag, 581 replIndexPrefetch (setting), 501 replNetworkQueue (status), 547 replNetworkQueue.numBytes (status), 547 replNetworkQueue.numElems (status), 547 replNetworkQueue.waitTimeMs (status), 547 replSet (setting), 501 replSetElect (database command), 439 replSetFreeze (database command), 439 replSetFresh (database command), 439 replSetGetRBID (database command), 439 replSetGetStatus (database command), 439 replSetHeartbeat (database command), 440 replSetInitiate (database command), 440 replSetReconfig (database command), 440 replSetStepDown (database command), 441

replSetSyncFrom (database command), 441 replSetTest (database command), 442 resetDbpath (shell method), 476 resetError (database command), 442 resident memory, 581 REST, 581 rest (setting), 499 resync (database command), 442 rollback, 581 rollbacks, 35 rs.add (shell method), 476 rs.addArb (shell method), 476 rs.conf (shell method), 477 rs.conf._id (shell output), 561 rs.conf.members (shell output), 561 rs.freeze (shell method), 477 rs.help (shell method), 477 rs.initiate (shell method), 477 rs.reconfig (shell method), 478 rs.remove (shell method), 478 rs.slaveOk (shell method), 479 rs.status (shell method), 479 rs.status.date (status), 559 rs.status.members (status), 559 rs.status.myState (status), 559 rs.status.set (status), 559 rs.status.syncingTo (status), 559 rs.stepDown (shell method), 479 rs.syncFrom (shell method), 479 run (shell method), 480 runMongoProgram (shell method), 480 runProgram (shell method), 480

S
secondary, 581 secondary (read preference mode), 53 secondary index, 581 secondaryPreferred (read preference mode), 53 secs_running (shell output), 569 security replica set, 37 serverStatus (database command), 442 set name, 581 setParameter (database command), 443 setShardVersion (database command), 443 settings (shell output), 563 settings.getLastErrorDefaults (shell output), 563 settings.getLastErrorModes (shell output), 563 sh.addShard (shell method), 480 sh.addShardTag (shell method), 480 sh.addTagRange (shell method), 481 sh.enableSharding (shell method), 481 sh.getBalancerState (shell method), 482 sh.help (shell method), 481


sh.isBalancerRunning (shell method), 481 sh.moveChunk (shell method), 482 sh.removeShardTag (shell method), 482 sh.setBalancerState (shell method), 482 sh.shardCollection (shell method), 483 sh.splitAt (shell method), 483 sh.splitFind (shell method), 483 sh.status (shell method), 484 shard, 581 shard cluster, 581 shard key, 95, 582 cardinality, 116 internals, 116 query isolation, 117 write scaling, 117 shardCollection (database command), 443 sharding, 582 architecture, 114 chunk size, 120 config servers, 96 localhost, 95 requirements, 94 security, 98 shard key, 95 shard key indexes, 119 troubleshooting, 112 shardingState (database command), 444 shardsvr (setting), 502 shell helper, 582 shutdown (database command), 444 single-master replication, 582 size (statistic), 553 slave, 582 slave (setting), 501 slavedelay (setting), 502 slaveOk, 51 sleep (database command), 445 slowms (setting), 500 smallfiles (setting), 500 source (setting), 502 split, 582 split (database command), 445 splitChunk (database command), 445 SQL, 582 SSD, 582 stopMongod (shell method), 484 stopMongoProgram (shell method), 484 stopMongoProgramByPid (shell method), 484 storageSize (statistic), 552, 553 strict consistency, 582 syncdelay (setting), 500 sysinfo (setting), 500 syslog, 582 syslog (setting), 496

systemFlags (statistic), 553

T
tag, 582 tag sets, 53 configuration, 565 test (setting), 503 tFirst (status), 567 threadId (shell output), 569 timeAcquiringMicros.R (shell output), 570 timeAcquiringMicros.r (shell output), 570 timeAcquiringMicros.W (shell output), 570 timeAcquiringMicros.w (shell output), 571 timeAquiringMicros (shell output), 570 timeDiff (status), 567 timeDiffHours (status), 567 timeLockedMicros (shell output), 570 timeLockedMicros.R (shell output), 570 timeLockedMicros.r (shell output), 570 timeLockedMicros.W (shell output), 570 timeLockedMicros.w (shell output), 570 tLast (status), 567 top (database command), 446 total (shell output), 521 totalAvailable (statistic), 558 totalCreated (statistic), 558 totalIndexSize (statistic), 554 touch (database command), 446 traceExceptions (setting), 501 Transparent Query Routing, 94 troubleshooting sharding, 112 TSV, 582 TTL, 582 tutorials administration, 164 replica sets, 60 sharding, 121

U
unique index, 582 unixSocketPrefix (setting), 497 unsetSharding (database command), 447 upgrade (setting), 500 upsert, 582 uptime (status), 538 uptimeEstimate (status), 538 usedMB (status), 567 usePowerOf2Sizes, 413 usePowerOf2Sizes (collection flag), 413 userFlags (statistic), 553

V
v (setting), 495


valid (shell output), 556 validate (database command), 447 verbose (setting), 495 version (status), 537 virtual memory, 582 vv (setting), 495 vvv (setting), 495 vvvv (setting), 495 vvvvv (setting), 495

W
waitingForLock (shell output), 570 waitMongoProgramOnPort (shell method), 484 waitProgram (shell method), 484 whatsmyuri (database command), 447 working set, 582 write (shell output), 521 write concern, 583 write-lock, 583 writebacklisten (database command), 448 writeBacks, 583 writeBacksQueued (database command), 448 writeBacksQueued (status), 549
