MongoDB
about | developer | name | support |
MongoDB was created by the founders of DoubleClick.
The name Mongo comes from the word "humongous", not, as is sometimes claimed, from the character Mongo in the movie Blazing Saddles.
Enterprise support for MongoDB is available through 10gen.
benchmarks | mongodb vs couchdb and mysql |
Ben benchmarked MongoDB against MySQL and CouchDB on a single machine and found MongoDB to be considerably faster. [mongodb vs couchdb and mysql]
case_studies | mongodb at sourceforge | mongodb at foursquare | mongodb at gamechanger | mongodb at shutterfly | mongodb at boxedice | mongodb at chartbeat | foursquare outage |
Slides from presentation regarding MongoDB's implementation at sourceforge. [mongodb at sourceforge]
A video detailing the MongoDB implementation at foursquare, presented by Harry Heymann. foursquare originally used MySQL, then migrated to PostgreSQL, and finally settled on MongoDB. [mongodb at foursquare]
Case study detailing how and why gamechanger moved to MongoDB. [mongodb at gamechanger]
Kenny Gorman talks about MongoDB implementation at shutterfly. [mongodb at shutterfly]
boxedice evaluated Cassandra, CouchDB, Hypertable, Tokyo Cabinet and Project Voldemort when deciding how to scale their infrastructure. Ultimately, boxedice settled on MongoDB. [mongodb at boxedice]
Kushal Dave, CTO of chartbeat (a real-time analytics startup), presents details on MongoDB deployment at chartbeat. [mongodb at chartbeat]
Post-mortem of the foursquare outage, which was caused when data on one of the shards exceeded RAM capacity. [foursquare outage]
deployment | sourceforge | sourceforge |
[sourceforge]
[sourceforge]
development | ORM | connection |
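Several drivers and object-document mappers are available for MongoDB, including pymongo (the Python driver) and mongoengine (an object-document mapper built on top of pymongo).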
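The following code sample shows how you can connect to the customers database and insert a document into its orders collection using the Java driver.
import com.mongodb.BasicDBList;
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;

// Assumes a mongod instance listening on localhost:27017.
Mongo connection = new Mongo("localhost", 27017);
DB db = connection.getDB("customers");
DBCollection collection = db.getCollection("orders");

// Build the order document field by field, then insert it.
BasicDBObject order = new BasicDBObject();
order.put("id", "3212323");
order.put("first_name", "Joel");
order.put("last_name", "Dash");
order.put("order_total", "432.50");
order.put("comments", new BasicDBList());
collection.insert(order);
indexing | full index support | inner objects | geo spatial | compound indexes | primary keys | create index |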
MongoDB offers full index support, including support for fulltext indexes, allowing you to index any attribute. General and index-assisted queries are fast in MongoDB; however, a large MapReduce job blocks other requests until it completes.
MongoDB offers the ability to index on an inner object. This makes it ideal for queries such as finding all orders for a given SKU.
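To illustrate what indexing on an inner object buys you, the following Python sketch builds a small in-memory lookup keyed by a dotted path such as item.sku. It only illustrates the idea and is not how MongoDB implements indexes; the helper names get_path and build_index, the field layout and the sample orders are all hypothetical.

```python
from collections import defaultdict

def get_path(doc, dotted_path):
    """Follow a dotted path such as 'item.sku' into nested dicts."""
    value = doc
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value

def build_index(docs, dotted_path):
    """Map each value found at dotted_path to the documents holding it."""
    index = defaultdict(list)
    for doc in docs:
        value = get_path(doc, dotted_path)
        if value is not None:
            index[value].append(doc)
    return index

# Hypothetical orders, each with an inner "item" object.
orders = [
    {"id": 1, "item": {"sku": "A-100", "qty": 2}},
    {"id": 2, "item": {"sku": "B-200", "qty": 1}},
    {"id": 3, "item": {"sku": "A-100", "qty": 5}},
]
sku_index = build_index(orders, "item.sku")

# All orders for one SKU are found with a single lookup instead of a scan.
matching_ids = [order["id"] for order in sku_index["A-100"]]
```

In the MongoDB shell, the corresponding index would be declared on the dotted key, for example db.orders.ensureIndex({'item.sku': 1}), assuming that field layout.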
MongoDB features built-in geospatial capabilities (a feature foursquare found especially attractive when settling on MongoDB).
MongoDB offers the ability to create compound indexes, but there is a 1KB limit on key size: the data that makes up an index key cannot be larger than 1KB. When the key for a particular document exceeds that limit, that document is excluded from the index.
MongoDB automatically generates a primary key which can be retrieved using _id.
collection.insert(post);
System.out.println(post.get("_id"));
To create an index, use ensureIndex:
collection.ensureIndex(new BasicDBObject("phone_number", 1));
mongodb_vs_couchdb | mongodb team | gabrielle lane | scott motte | newsicare |
A comparison by MongoDB team addressing topics like MVCC, in-place updates, horizontal scalability, query expression, atomicity, durability and MapReduce. [mongodb team]
A presentation on SlideShare by Gabriele Lane provides an in-depth comparison of MongoDB and CouchDB, including various MapReduce examples. [gabrielle lane]
A presentation by Scott Motte on SlideShare comparing MongoDB with CouchDB. [scott motte]
The NewsICare team compares MongoDB to CouchDB. [newsicare]
performance | high performance | serial performance | concurrency | memory mapped files |
Since MongoDB is designed to be a high performance data store, performance is a primary focus of development.
MongoDB provides fast serial performance for single clients.
MongoDB features read concurrency (versions >= 1.3.x).
MongoDB uses memory mapped files for faster performance.
querying | MapReduce | dynamic queries | query optimizer | fulltext search | query examples | query plans | how queries work with sharding |
MongoDB features MapReduce for batch processing of data and aggregate operations. A large MapReduce job blocks other requests until it completes, meaning no reads or writes can be performed while the job is in progress.
MongoDB supports fast dynamic (ad-hoc) queries, allowing you to find data using any criteria, even on non-indexed fields.
Like MySQL, MongoDB has a query optimizer that generates a query plan for each client-submitted query.
MongoDB supports cursors, regular expressions, conditional operators and groupings, along with basic full text search (comparison). It does not offer a powerful full text search engine, however. The recommended practice for full text search with MongoDB is to place all keywords in a single field; splitting and stemming the content into keywords is the responsibility of your code. One area where MongoDB shines is real-time search: dedicated full text search solutions depend on bulk index building, which makes real-time search difficult, if not impossible.
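As a rough sketch of the keyword-field practice described above, the following Python code splits text into words, applies a deliberately naive stemmer, and stores the result in a single array field. The field name _keywords, the helper names and the suffix rules are assumptions made for illustration; a real application would likely use a proper stemming library.

```python
import re

def naive_stem(word):
    """Very rough suffix stripping; a real stemmer does far more than this."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def extract_keywords(text):
    """Lowercase, split on non-alphanumerics, stem, and de-duplicate."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sorted({naive_stem(word) for word in words})

# Hypothetical document: all searchable keywords end up in one field.
post = {"title": "Indexing nested documents", "body": "Indexes help queries."}
post["_keywords"] = extract_keywords(post["title"] + " " + post["body"])
```

With the keywords held in one indexed array field, a search term is stemmed the same way and matched against that field, for instance db.posts.find({_keywords: 'index'}) in the shell (the posts collection name is hypothetical).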
Queries in MongoDB are expressed as BSON documents. An empty document acts as a wildcard and selects every document in a collection. Example queries using the MongoDB shell follow:
db.profiles.find({'location': 'Palo Alto'});
db.profiles.find().skip(20).limit(10);
db.profiles.find({}).sort({rating: 1});
[query examples]
MongoDB tests several query plans in parallel. The first one to finish is adopted and the remaining tests are terminated. Because MongoDB is a non-relational system with no JOINs, the space of possible query plans is much smaller.
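The plan selection described above can be pictured as a race. The Python sketch below is a simplified, single-threaded simulation in which each candidate plan is a generator that does one unit of work per step; the plans, their names and the sample data are hypothetical, and this is not MongoDB's actual optimizer code.

```python
def scan_plan(docs, predicate):
    """Hypothetical plan: examine every document, one step at a time."""
    results = []
    for doc in docs:
        if predicate(doc):
            results.append(doc)
        yield None   # one unit of work done, no result yet
    yield results    # final step hands back the result set

def index_plan(index, key):
    """Hypothetical plan: a single index lookup."""
    yield index.get(key, [])

def race(plans):
    """Advance each plan one step per round; the first to finish wins."""
    running = dict(plans)
    while running:
        for name, plan in list(running.items()):
            outcome = next(plan)
            if outcome is not None:
                for other in running.values():
                    other.close()  # terminate the remaining plans
                return name, outcome
    raise RuntimeError("no plan produced a result")

docs = [{"city": "Palo Alto", "n": i} for i in range(3)] + [{"city": "Oslo", "n": 3}]
index = {"Oslo": [docs[3]]}
winner, result = race({
    "full_scan": scan_plan(docs, lambda d: d["city"] == "Oslo"),
    "city_index": index_plan(index, "Oslo"),
})
```

Here the index lookup finishes in its first step while the full scan is still examining documents, so the index plan is adopted.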
Slides with information on how queries work with auto-sharding in MongoDB [how queries work with sharding]
replication | master slave replication | replication requirements | configuring master | configuring slave | transaction logs | auto restarting replication | slave getting out of sync | restarting replication |
MongoDB supports master / slave replication. [master slave replication]
At a minimum, replication requires two MongoDB instances, one configured in master mode and the other in slave mode.
The following command will allow you to configure a MongoDB database instance to run as master.
$ bin/mongod --master [--dbpath /data/path/to/masterdb]
The following command will allow you to configure a MongoDB database instance to run as a slave.
$ bin/mongod --slave --source <masterhostname>[:] [--dbpath /data/path/to/slavedb/]
On the master, the local.oplog.$main collection is created; it serves as the transaction log, holding the queued operations to be applied later to the slaves. On the slaves, a local.sources collection is maintained.
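The role of the operation log can be sketched with a small in-memory model: the master records every write in an ordered log, and a slave replays the entries it has not yet applied. The Master and Slave classes below are hypothetical and greatly simplified; they do not reflect MongoDB's storage or wire format.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.oplog = []  # ordered list of operations, oldest first

    def insert(self, key, doc):
        self.data[key] = doc
        self.oplog.append(("insert", key, doc))

    def remove(self, key):
        self.data.pop(key, None)
        self.oplog.append(("remove", key, None))

class Slave:
    def __init__(self):
        self.data = {}
        self.position = 0  # index of the next oplog entry to apply

    def sync(self, master):
        """Apply every operation logged since the last sync."""
        for op, key, doc in master.oplog[self.position:]:
            if op == "insert":
                self.data[key] = doc
            elif op == "remove":
                self.data.pop(key, None)
        self.position = len(master.oplog)

master, slave = Master(), Slave()
master.insert("a", {"total": 10})
master.insert("b", {"total": 20})
slave.sync(master)      # slave catches up with both inserts
master.remove("a")
slave.sync(master)      # only the new remove is replayed
```

In this model the log grows without bound. The real oplog has a fixed size, which is why a slave that falls too far behind can get out of sync.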
In order for MongoDB to resume replication upon breaking, specify the --autoresync option on the command line. When replication breaks (gets out of sync), MongoDB will attempt to restart replication after a ten second pause. If for some reason replication gets out of sync again within ten minutes of resuming, MongoDB will wait for ten minutes to pass since initial resumption before attempting to auto-sync again. In other words, only one auto resync attempt is made every ten minutes.
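The timing rule above can be written down as a small function. This Python sketch only restates the policy as described here (a ten second pause, and at most one automatic resync every ten minutes); the function and constant names are hypothetical and times are plain seconds.

```python
RESYNC_PAUSE = 10          # seconds to wait before a resync attempt
MIN_RESYNC_INTERVAL = 600  # at most one auto resync per ten minutes

def next_resync_time(out_of_sync_at, last_resync_at=None):
    """Return the earliest time (in seconds) at which a resync may start."""
    earliest = out_of_sync_at + RESYNC_PAUSE
    if last_resync_at is not None:
        # Never resync again until ten minutes after the previous resync.
        earliest = max(earliest, last_resync_at + MIN_RESYNC_INTERVAL)
    return earliest

first = next_resync_time(100)                       # first break at t=100
second = next_resync_time(300, last_resync_at=110)  # breaks again soon after
third = next_resync_time(2000, last_resync_at=110)  # breaks much later
```

The first break is retried after ten seconds, the second has to wait until ten minutes after the earlier resync, and the third is again retried after the normal ten second pause.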
Without the --autoresync option, when a MongoDB slave gets out of sync, replication terminates and can't be restarted without manual intervention.
If your MongoDB slave instance gets out of sync, you can restart replication using the {resync:1} command.
resources | kristina chodrow | schema optimization | mongodb administration | cap theorem |
Slides from Kristina Chodorow's talk at a Meetup. Kristina works at 10gen. [kristina chodrow]
Kyle Banker talks about optimizing MongoDB schemas. [schema optimization]
Video by Mathias Stearn with tips for database administrators and operations staff on administering MongoDB. [mongodb administration]
David Strauss talks about MongoDB and Dr. Eric Brewer's CAP theorem. [cap theorem]
scalability | auto sharding |
MongoDB is designed to be easily scalable and features auto-sharding out of the box.
storage | storage format | file system | key value | BSON | data types |
MongoDB is a document-oriented data store using BSON as the storage format.
MongoDB features GridFS, which allows you to store files of any size by breaking them into chunks (usually 256KB). From a storage point of view, each chunk becomes a separate document in a chunks collection. Metadata about the file is stored in a files collection. [file system]
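The chunking scheme can be sketched in a few lines of Python. The code below splits a byte string into 256KB pieces and builds dictionaries shaped roughly like the files and chunks documents. It is an in-memory illustration that assumes the common GridFS field names (files_id, n, data, length, chunkSize), not a replacement for a driver's GridFS API; the helper names and sample file are hypothetical.

```python
CHUNK_SIZE = 256 * 1024  # 256KB, the usual chunk size mentioned above

def split_into_chunks(file_id, filename, data, chunk_size=CHUNK_SIZE):
    """Return (file_doc, chunk_docs) roughly shaped like GridFS documents."""
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    file_doc = {"_id": file_id, "filename": filename,
                "length": len(data), "chunkSize": chunk_size}
    return file_doc, chunk_docs

def reassemble(chunk_docs):
    """Concatenate the chunks in order to recover the original bytes."""
    ordered = sorted(chunk_docs, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)

payload = bytes(600 * 1024)  # a hypothetical 600KB file of zero bytes
file_doc, chunk_docs = split_into_chunks(1, "report.bin", payload)
```

A 600KB payload yields two full 256KB chunks plus one 88KB remainder, and reading the file back is a matter of fetching its chunks ordered by n.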
The BSON format used by MongoDB contains zero or more key value pairs. These key value pairs comprise a single entity called document.
BSON objects in MongoDB are limited to 4MB in size (the limit was raised to 16MB in later releases).
Basic BSON data types are:
byte 1 byte (8-bits)
int32 4 bytes (32-bit signed integer)
int64 8 bytes (64-bit signed integer)
double 8 bytes (64-bit IEEE 754 floating point)
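The sizes listed above can be checked with Python's struct module, using little-endian packing, which is the byte order BSON uses. The second part of the sketch hand-encodes a one-field document with an int32 value to show how the key value pairs are framed; encode_int32_document is a hypothetical helper that covers only this one case, not a general BSON encoder.

```python
import struct

# "<" selects little-endian, standard-size packing.
encoded = {
    "byte": struct.pack("<B", 1),
    "int32": struct.pack("<i", 1),
    "int64": struct.pack("<q", 1),
    "double": struct.pack("<d", 1.0),
}
sizes = {name: len(raw) for name, raw in encoded.items()}

def encode_int32_document(key, value):
    """Encode the one-field document {key: value} with an int32 value."""
    # Element = type byte 0x10 (int32) + key as a C string + 4-byte value.
    element = b"\x10" + key.encode("utf-8") + b"\x00" + struct.pack("<i", value)
    body = element + b"\x00"                         # document terminator
    return struct.pack("<i", len(body) + 4) + body   # length prefix counts itself

document = encode_int32_document("a", 1)
```

The resulting document is 12 bytes long: a 4-byte length, a 7-byte element and a 1-byte terminator.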
updates | atomic updates |
MongoDB supports atomic, in-place updates that modify just a portion of the document, in addition to traditional updates that replace an entire document.
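The difference between the two kinds of update can be shown with a small in-memory model. The Python sketch below handles only the $set and $inc modifiers and a whole-document replacement. It illustrates the semantics of the update document, not the atomicity guarantee, which comes from the server; apply_update is a hypothetical helper.

```python
import copy

def apply_update(doc, update):
    """Return a copy of doc with $set / $inc applied, or replaced outright."""
    if not any(key.startswith("$") for key in update):
        # Traditional update: the whole document is replaced, keeping _id.
        replaced = copy.deepcopy(update)
        if "_id" in doc:
            replaced["_id"] = doc["_id"]
        return replaced
    result = copy.deepcopy(doc)
    for field, value in update.get("$set", {}).items():
        result[field] = value
    for field, amount in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + amount
    return result

customer = {"_id": 1, "name": "Joel", "visits": 2}

# Partial update: only the named fields change.
partial = apply_update(customer, {"$inc": {"visits": 1}, "$set": {"name": "Joel D."}})

# Full replacement: every field except _id comes from the new document.
replaced = apply_update(customer, {"name": "Ann"})
```

Note that the replacement drops the visits field entirely, whereas the modifier form leaves untouched fields in place.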
use_cases | ideal | real time analytics |
MongoDB is ideal for write- and update-intensive use cases such as logging.
Damon Cortesi talks about why MongoDB works great for real-time stats generation and data collection. [real time analytics]
MongoDB vs CouchDB
| | MongoDB | CouchDB |
| Architecture | Update-in-place | MVCC based |
| Replication | Master-Slave | Master-Master |
| Data Model | BSON | JSON |
| Document-oriented | Yes | Yes |
| Schema-free | Yes | Yes |
| Interface | TCP/IP - Custom | HTTP/REST |
| In-place Updates | Yes | No |
| MVCC | No | Yes |
| Need drivers | Yes | No |
| MapReduce | Yes | Yes |
| Dynamic queries / ad-hoc queries | Yes | No |
| Indexing | Yes | No |
| Benchmark with 250MB file size | MongoDB is faster | CouchDB is slower |
| Benchmark (heavy load/query condition) | MongoDB is faster | CouchDB is slower |
| Repair required if table crashed? | Yes, run repairDatabase() | No (database termination doesn't affect consistency) |