Getting Started with MongoDB and Python

In this post I’ll walk through getting started with MongoDB using the Python PyMongo module. I’ll go through the installation process, and then walk through an example of inserting data into a MongoDB database from Python. (In a future post I’ll cover querying documents.) For the installation, I’ll assume that you’re running Ubuntu, but there are instructions for all major operating systems on the link that I have provided.

Installation

My first inclination was to just run sudo apt-get install mongodb from the command line. This installed something, but it wasn’t what I needed to reproduce the Python tutorials I had found on the internet. Then I found this page on the MongoDB website that did have what I needed. I’ll reproduce their installation process here.

Import the public key used by the package management system

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10

Create a list file for MongoDB

echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | sudo tee /etc/apt/sources.list.d/mongodb.list

Reload the local package database

sudo apt-get update

Install the MongoDB packages

sudo apt-get install mongodb-org

Start/Stop/Restart MongoDB

sudo service mongod start
sudo service mongod stop
sudo service mongod restart

Check the MongoDB log file

less /var/log/mongodb/mongod.log

An Example with Python

First, we’ll need to install PyMongo using pip.

sudo pip install pymongo

Now, we can create a database, and a collection, which is like a table in a traditional RDBMS. First we’ll start a MongoDB instance by running the following line at the terminal.

sudo service mongod start

Next, we’ll fire up python (or ipython, or an ipython notebook), import MongoClient, and connect to the mongod instance we just started. This can be done in several ways. The first line after the import connects to the default host and port. The next two lines show alternate ways to specify that same default host and port explicitly.

from pymongo import MongoClient
client = MongoClient()
# alternatively..
client = MongoClient("localhost",27017)
client = MongoClient("mongodb://localhost:27017/")

At this point we can create a database using attribute notation or dictionary-style notation. Here, the MongoDB database is called “mydb”, and the Python variable referring to that database is “db”. (This is useful if you’re opening up a database and it has a ridiculously long name.)

db = client.mydb
db = client["mydb"]

Next we can create collections in the database. Collections are analogous to tables in a traditional RDBMS. (Collections and databases aren’t actually created until you start adding documents, which are analogous to rows or records.) Again, we have the attribute and dictionary styles for creating collections.

coll = db.mycollection
coll = db['mycollection']

We can create a document as a Python dict. A document can hold strings, numbers, and lists. We can insert this into a collection using the insert_one() method. (Older tutorials use insert(), which was deprecated in PyMongo 3.0 and removed in PyMongo 4.0.)

document = {"fname": "connor", "weight": 170.5, "height": [5, 10]}
coll.insert_one(document)

Since databases and collections aren’t created until we insert a document, we can now see the collection by calling the list_collection_names() method on the database. (The older collection_names() method was removed in PyMongo 4.0.)

db.list_collection_names()

If we have a lot of documents we’d like to put into a collection, we can insert them all at once by passing a list of dictionaries to insert_many(). Remember, since we’re dealing with documents and not tables, the documents can have all sorts of different fields.

connor_doc = {"fname": "connor", "weight": 170.5, "height": [5, 10]}
roger_doc = {"name": "roger", "species": "dog", "breed": "awesome", "weight": 20.2}
docs = [connor_doc, roger_doc]
coll.insert_many(docs)