Python and MongoDB
MongoDB + Python
Norberto Leite
Technical Evangelist
norberto@mongodb.com
Agenda
Introduction to MongoDB
pymongo
CRUD
Aggregation
GridFS
Indexes
ODMs
Hello, I'm Norberto
Norberto Leite
Technical Evangelist
Madrid, Spain
@nleite
norberto@mongodb.com
http://www.mongodb.com/norberto
MongoDB
MongoDB
• General purpose
• Document database
• Open-source
Fully Featured
MongoDB Features
• JSON Document Model with Dynamic Schemas
• Auto-Sharding for Horizontal Scalability
• Text Search
• Aggregation Framework and MapReduce
• Full, Flexible Index Support and Rich Queries
• Built-In Replication for High Availability
• Advanced Security
• Large Media Storage with GridFS
MongoDB Inc.
• 400+ employees
• 2,000+ customers
• Over $311 million in funding
• 13 offices around the world
THE LARGEST ECOSYSTEM
• 9,000,000+ MongoDB Downloads
• 250,000+ Online Education Registrants
• 35,000+ MongoDB User Group Members
• 35,000+ MongoDB Management Service (MMS) Users
• 750+ Technology and Services Partners
• 2,000+ Customers Across All Industries
pymongo
pymongo
• MongoDB Python official driver
• Rockstar developer team
• Jesse Jiryu Davis, Bernie Hackett
• One of the oldest and best-maintained drivers
• Python and MongoDB are a natural fit
• BSON is very similar to dictionaries
• (everyone likes dictionaries)
• http://api.mongodb.org/python/current/
• https://github.com/mongodb/mongo-python-driver
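To make the dictionary point concrete, here is a minimal sketch. It uses the standard-library json module as a stand-in for BSON (which ships with pymongo and adds types such as dates, binary data and ObjectId); the field names and values are made up for illustration.

```python
import json

# A document is just a (possibly nested) dictionary; the fields are illustrative.
talk = {
    "title": "MongoDB + Python",
    "speaker": {"name": "Norberto Leite", "city": "Madrid"},
    "tags": ["pymongo", "gridfs", "indexes"],
}

# JSON round trip as a rough stand-in for what the driver does with BSON.
encoded = json.dumps(talk)
decoded = json.loads(encoded)

print(decoded["speaker"]["city"])
print(decoded == talk)
```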
pymongo 3.0
• Server discovery spec
• Monitoring spec
• Faster client startup when connecting to Replica Set
• Faster failover
• More robust replica set connections
• API clean up
Connecting
Connecting
#!/bin/python
from pymongo import MongoClient

mc = MongoClient()  # client instance
Connecting
#!/bin/python
from pymongo import MongoClient

uri = 'mongodb://127.0.0.1'
mc = MongoClient(uri)
Connecting
#!/bin/python
from pymongo import MongoClient

uri = 'mongodb://127.0.0.1'
# pymongo 3.0 spells the pool size option maxPoolSize
mc = MongoClient(host=uri, maxPoolSize=10)
Connecting to Replica Set
#!/bin/python
from pymongo import MongoClient

# URI options follow a "/" after the host list
uri = 'mongodb://127.0.0.1/?replicaSet=MYREPLICA'
mc = MongoClient(uri)
Connecting to Replica Set
#!/bin/python
from pymongo import MongoClient

uri = 'mongodb://127.0.0.1'
mc = MongoClient(host=uri, replicaSet='MYREPLICA')
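Connection strings get fiddly once there are several hosts, credentials and options. The sketch below assembles one from its parts using only the standard library; build_mongo_uri is a hypothetical helper, not part of pymongo, and the second host is an assumption added for illustration.

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(hosts, database="", username=None, password=None, **options):
    """Assemble a mongodb:// URI from its parts (hypothetical helper)."""
    credentials = ""
    if username is not None:
        # User names and passwords must be percent-encoded inside a URI.
        credentials = quote_plus(username)
        if password is not None:
            credentials += ":" + quote_plus(password)
        credentials += "@"
    uri = "mongodb://" + credentials + ",".join(hosts) + "/" + database
    if options:
        # Options follow the "/" separator as a query string.
        uri += "?" + urlencode(options)
    return uri


uri = build_mongo_uri(["127.0.0.1:27017", "127.0.0.1:27018"], replicaSet="MYREPLICA")
print(uri)
```

The resulting string can be passed straight to MongoClient(uri).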
Database Instance
#!/bin/python
from pymongo import MongoClient
mc = MongoClient()

db = mc['madrid_pug']  # database instance

# or

db = mc.madrid_pug
Collection Instance
#!/bin/python
from pymongo import MongoClient
mc = MongoClient()

coll = mc['madrid_pug']['testcollection']  # collection instance

# or

coll = mc.madrid_pug.testcollection
CRUD
http://www.ferdychristant.com/blog//resources/Web/$FILE/crud.jpg
Operations
• Insert
• Remove
• Update
• Query
• Aggregate
• Create Indexes
• …
CRUD
• Insert
• Remove
• Update
• Query
• Aggregate
• Create Indexes
• …
Insert
#!/bin/python
from pymongo import MongoClient
mc = MongoClient()

coll = mc['madrid_pug']['testcollection']

# insert_one is the pymongo 3.0 replacement for the deprecated insert
coll.insert_one({'field_one': 'some value'})
Find
#!/bin/python
from pymongo import MongoClient
mc = MongoClient()

coll = mc['madrid_pug']['testcollection']

# find returns a cursor; find_one would return a single document (or None)
cur = coll.find({'field_one': 'some value'})

for d in cur:
    print(d)
Update
#!/bin/python
from pymongo import MongoClient
mc = MongoClient()

coll = mc['madrid_pug']['testcollection']

result = coll.update_one({'field_one': 'some value'},
                         {"$set": {'field_one': 'new_value'}})
# or

result = coll.update_many({'field_one': 'some value'},
                          {"$set": {'field_one': 'new_value'}})

print(result)
Remove
#!/bin/python
from pymongo import MongoClient
mc = MongoClient()

coll = mc['madrid_pug']['testcollection']

result = coll.delete_one({'field_one': 'some value'})

# or

result = coll.delete_many({'field_one': 'some value'})

print(result)
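Printing a result object only shows its repr, so it can help to read the counters it carries. A minimal sketch, assuming coll behaves like a pymongo 3.x Collection; crud_summary is a hypothetical helper name, and nothing here runs until a collection is passed in.

```python
def crud_summary(coll):
    """Run one insert/update/delete cycle and report what each call did.

    `coll` is expected to behave like a pymongo 3.x Collection.
    """
    inserted = coll.insert_one({"field_one": "some value"})
    updated = coll.update_many(
        {"field_one": "some value"},
        {"$set": {"field_one": "new_value"}},
    )
    deleted = coll.delete_many({"field_one": "new_value"})
    # Each write returns a result object with its own counters.
    return {
        "inserted_id": inserted.inserted_id,
        "matched": updated.matched_count,
        "modified": updated.modified_count,
        "deleted": deleted.deleted_count,
    }
```

With a mongod running locally, crud_summary(MongoClient().madrid_pug.testcollection) would return the generated _id and the three counters.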
Aggregate
http://4.bp.blogspot.com/-0IT3rIJkAtM/Uud2pTrGCbI/AAAAAAAABZM/-XUK7j4ZHmI/s1600/snowflakes.jpg
Aggregation Framework
• Analytical workload solution
• Pipeline processing
• Several Stages
• $match
• $group
• $project
• $unwind
• $sort
• $limit
• $skip
• $out
• http://docs.mongodb.org/manual/aggregation/
Aggregation Framework
#!/bin/python
from pymongo import MongoClient
mc = MongoClient()

coll = mc['madrid_pug']['testcollection']

cur = coll.aggregate([
    {"$match": {'field_one': {"$exists": True}}},
    {"$project": {"new_label": "$field_one"}},
])

for d in cur:
    print(d)
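To see what the two stages above do to the data, here is a pure-Python imitation of the pipeline over a list of dictionaries. It is only a sketch of the semantics, not how the server executes it; the sample documents and the helper names match_exists and project_rename are assumptions for illustration.

```python
docs = [
    {"_id": 1, "field_one": "some value"},
    {"_id": 2, "other_field": 42},
    {"_id": 3, "field_one": "new_value"},
]


def match_exists(documents, field):
    """Rough equivalent of {"$match": {field: {"$exists": True}}}."""
    return (d for d in documents if field in d)


def project_rename(documents, new_label, source_field):
    """Rough equivalent of {"$project": {new_label: "$" + source_field}}."""
    # $project keeps _id unless it is explicitly excluded.
    return ({"_id": d["_id"], new_label: d[source_field]} for d in documents)


# Stages are chained: the output of one is the input of the next.
result = list(project_rename(match_exists(docs, "field_one"), "new_label", "field_one"))
print(result)
```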
GridFS
http://www.appuntidigitali.it/site/wp-content/uploads/rawdata.png
GridFS
• MongoDB has a 16MB document size limit
• So how can we store data bigger than 16MB?
• Media files (images, PDFs, long binary files, …)
• GridFS
• Convention more than a feature
• All drivers implement this convention
• pymongo is no different
• Very flexible approach
• Handy out-of-the-box solution
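The convention boils down to splitting a file into numbered chunks plus one metadata document. The sketch below imitates that in plain Python, assuming the 255 KiB default chunk size used by recent drivers; split_into_chunks and reassemble are hypothetical helpers, and real fs.files documents carry more fields (upload date, optional filename and metadata).

```python
CHUNK_SIZE = 255 * 1024  # assumed default GridFS chunk size (255 KiB)


def split_into_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Return (files_doc, chunk_docs) shaped like the GridFS convention."""
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    files_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    return files_doc, chunk_docs


def reassemble(chunk_docs):
    """Concatenate chunks in order of their sequence number n."""
    return b"".join(c["data"] for c in sorted(chunk_docs, key=lambda c: c["n"]))


payload = b"x" * 600_000
files_doc, chunk_docs = split_into_chunks("example-id", payload)
print(len(chunk_docs), files_doc["length"])
print(reassemble(chunk_docs) == payload)
```

Because every chunk is an ordinary document well under 16MB, the size limit never comes into play.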
GridFS
#!/bin/python
from pymongo import MongoClient
import gridfs

mc = MongoClient()
database = mc.grid_example

gfs = gridfs.GridFS(database)  # call the gridfs library with the database

# binary mode, since GridFS stores bytes
read_file = open('/tmp/somefile', 'rb')

gfs.put(read_file, author='Norberto', tags=['awesome', 'madrid', 'pug'])
GridFS
#!/bin/python
from pymongo import MongoClient
import gridfs

mc = MongoClient()
database = mc.grid_example

gfs = gridfs.GridFS(database)

# open file for reading (binary mode, since GridFS stores bytes)
read_file = open('/tmp/somefile', 'rb')

gfs.put(read_file, author='Norberto', tags=['awesome', 'madrid', 'pug'])
GridFS
#!/bin/python
from pymongo import MongoClient
import gridfs

mc = MongoClient()
database = mc.grid_example

gfs = gridfs.GridFS(database)

# binary mode, since GridFS stores bytes
read_file = open('/tmp/somefile', 'rb')

# call put to store file and metadata
gfs.put(read_file, author='Norberto', tags=['awesome', 'madrid', 'pug'])
GridFS
mongo
nair(mongod-3.1.0-pre-) grid_sample> show dbs
grid_sample  0.246GB
local        0.000GB
nair(mongod-3.1.0-pre-) grid_sample> show collections
fs.chunks  258.995MB / 252.070MB
fs.files     0.000MB /   0.016MB
database created
GridFS
mongo
nair(mongod-3.1.0-pre-) grid_sample> show dbs
grid_sample  0.246GB
local        0.000GB
nair(mongod-3.1.0-pre-) grid_sample> show collections
fs.chunks  258.995MB / 252.070MB
fs.files     0.000MB /   0.016MB
2 collections
GridFS
mongo
nair(mongod-3.1.0-pre-) grid_sample> show dbs
grid_sample  0.246GB
local        0.000GB
nair(mongod-3.1.0-pre-) grid_sample> show collections
fs.chunks  258.995MB / 252.070MB
fs.files     0.000MB /   0.016MB
chunks collection holds binary data
files collection holds metadata
Indexes
Indexes
• Single Field
• Compound
• Multikey
• Geospatial
• 2d
• 2dSphere - GeoJSON
• Full Text
• Hash Based
• TTL indexes
• Unique
• Sparse
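Several of these index types are just options on create_index. A minimal sketch, assuming coll behaves like a pymongo Collection; create_example_indexes and all field names (email, nickname, created_at, user_id, body) are hypothetical. The literal 1 stands in for pymongo.ASCENDING so the snippet has no imports.

```python
def create_example_indexes(coll):
    """Create one index per option; `coll` should act like a pymongo Collection."""
    ASCENDING = 1  # same value as pymongo.ASCENDING

    # Unique: reject documents that repeat an existing 'email' value.
    coll.create_index([("email", ASCENDING)], unique=True)
    # Sparse: only index documents that actually contain 'nickname'.
    coll.create_index([("nickname", ASCENDING)], sparse=True)
    # TTL: expire documents roughly one hour after their 'created_at' date.
    coll.create_index([("created_at", ASCENDING)], expireAfterSeconds=3600)
    # Hashed and text indexes use a string in place of the sort order.
    coll.create_index([("user_id", "hashed")])
    coll.create_index([("body", "text")])
```

A TTL index only works on a field that holds a date value.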
Single Field Index
from pymongo import ASCENDING, MongoClient
mc = MongoClient()

coll = mc.madrid_pug.testcollection

# (indexed field, indexing order); create_index replaces the deprecated ensure_index
coll.create_index([('some_single_field', ASCENDING)])
Compound Field Index
from pymongo import ASCENDING, DESCENDING, MongoClient
mc = MongoClient()

coll = mc.madrid_pug.testcollection

# list of (indexed field, indexing order) pairs
coll.create_index([('field_ascending', ASCENDING),
                   ('field_descending', DESCENDING)])
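The indexing order of each field determines how entries are laid out in a compound index. As a rough picture (a real index is a B-tree, not a sorted list), the sketch below sorts a few made-up documents the way an ascending-then-descending index would order them.

```python
docs = [
    {"field_ascending": "b", "field_descending": 1},
    {"field_ascending": "a", "field_descending": 1},
    {"field_ascending": "a", "field_descending": 3},
    {"field_ascending": "b", "field_descending": 2},
]

# Ascending on the first key, then descending on the second (numeric) key.
index_order = sorted(
    docs,
    key=lambda d: (d["field_ascending"], -d["field_descending"]),
)

for d in index_order:
    print(d["field_ascending"], d["field_descending"])
```

A query that sorts in this same order (or its exact reverse) can walk the index directly instead of sorting in memory.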
Multikey Field Index
from pymongo import MongoClient
mc = MongoClient()

coll = mc.madrid_pug.testcollection

coll.insert_one({'array_field': [1, 2, 54, 89]})

coll.create_index('array_field')  # indexed field
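What makes an index multikey is that each array element gets its own entry. A small in-memory sketch of that idea, with made-up documents; build_multikey_index is a hypothetical helper and says nothing about the server's actual storage format.

```python
from collections import defaultdict

docs = [
    {"_id": 1, "array_field": [1, 2, 54, 89]},
    {"_id": 2, "array_field": [2, 3]},
]


def build_multikey_index(documents, field):
    """Map every array element to the _ids of the documents containing it."""
    index = defaultdict(set)
    for doc in documents:
        for element in doc.get(field, []):
            index[element].add(doc["_id"])
    return index


index = build_multikey_index(docs, "array_field")
print(sorted(index[2]))   # ids of documents whose array contains 2
print(sorted(index[54]))  # ids of documents whose array contains 54
```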
Geospatial Field Index
from pymongo import GEOSPHERE, MongoClient
import geojson

coll = MongoClient().madrid_pug.testcollection

p = geojson.Point([-73.9858603477478, 40.75929362758241])

coll.insert_one({'point': p})

coll.create_index([('point', GEOSPHERE)])  # index type
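Once a 2dsphere index exists, a proximity query is just another filter document. The sketch below builds one with plain dictionaries (no geojson package needed); geojson_point and near_query are hypothetical helpers, and the 500 metre radius is an arbitrary value chosen for illustration.

```python
def geojson_point(longitude, latitude):
    """Build a GeoJSON Point; GeoJSON puts longitude before latitude."""
    return {"type": "Point", "coordinates": [longitude, latitude]}


def near_query(field, longitude, latitude, max_distance_m):
    """Build a $near filter for a field covered by a 2dsphere index."""
    return {
        field: {
            "$near": {
                "$geometry": geojson_point(longitude, latitude),
                "$maxDistance": max_distance_m,  # metres for GeoJSON points
            }
        }
    }


query = near_query("point", -73.9858603477478, 40.75929362758241, 500)
print(query)
```

Passing query to coll.find() should return matching documents ordered from nearest to farthest.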
ODM and others
Friends
• mongoengine
• http://mongoengine.org/
• Motor
• http://motor.readthedocs.org/en/stable/
• async driver
• Tornado
• Greenlets
• ming
• http://sourceforge.net/projects/merciless/
Let's recap
Recap
• MongoDB is Awesome
• Especially for working with Python
• pymongo
• super well supported
• fully in sync with MongoDB server
MongoDB 3.0 is here!
Go and Play!
https://www.mongodb.com/lp/download/mongodb-enterprise?jmp=homepage
http://www.mongodb.com/norberto
Thank you!
Norberto Leite
Technical Evangelist
@nleite
norberto@mongodb.com