Tutorial
========

.. testsetup::

  from pymongo import Connection
  connection = Connection()
  connection.drop_database('test-database')

This tutorial is intended as an introduction to working with
**MongoDB** and **PyMongo**.

Prerequisites
-------------
Before we start, make sure that you have the **PyMongo** distribution
:doc:`installed <installation>`. In the Python shell, the following
should run without raising an exception:

.. doctest::

  >>> import pymongo

This tutorial also assumes that a MongoDB instance is running on the
default host and port. Assuming you have `downloaded and installed
<http://www.mongodb.org/display/DOCS/Getting+Started>`_ MongoDB, you
can start it like so:

.. code-block:: bash

  $ mongod

Making a Connection
-------------------
The first step when working with **PyMongo** is to create a
:class:`~pymongo.connection.Connection` to the running **mongod**
instance. Doing so is easy:

.. doctest::

  >>> from pymongo import Connection
  >>> connection = Connection()

The above code will connect to the default host and port. We can also
specify the host and port explicitly, as follows:

.. doctest::

  >>> connection = Connection('localhost', 27017)

Getting a Database
------------------
A single instance of MongoDB can support multiple independent
`databases <http://www.mongodb.org/display/DOCS/Databases>`_. When
working with PyMongo you access databases using attribute style access
on :class:`~pymongo.connection.Connection` instances:

.. doctest::

  >>> db = connection.test_database

If your database name is such that using attribute style access won't
work (like ``test-database``), you can use dictionary style access
instead:

.. doctest::

  >>> db = connection['test-database']

Getting a Collection
--------------------
A `collection <http://www.mongodb.org/display/DOCS/Collections>`_ is a
group of documents stored in MongoDB, and can be thought of as roughly
the equivalent of a table in a relational database. Getting a
collection in PyMongo works the same as getting a database:

.. doctest::

  >>> collection = db.test_collection

or (using dictionary style access):

.. doctest::

  >>> collection = db['test-collection']

An important note about collections (and databases) in MongoDB is that
they are created lazily: none of the above commands have actually
performed any operations on the MongoDB server. Collections and
databases are created when the first document is inserted into them.
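
The idea of lazy creation can be sketched with a toy model in plain
Python. The ``ToyDatabase`` and ``ToyCollection`` classes below are
hypothetical and are not how PyMongo or MongoDB are implemented; they
only illustrate that getting a collection object does no work, while
the first insert is what brings the collection into existence:

.. code-block:: python

  class ToyCollection(object):
      """Hypothetical handle: holds a name, touches no storage by itself."""

      def __init__(self, database, name):
          self.database = database
          self.name = name

      def insert(self, document):
          # Storage for this collection is created on the first insert.
          self.database.storage.setdefault(self.name, []).append(document)


  class ToyDatabase(object):
      """Hypothetical stand-in for a database; ``storage`` plays the server."""

      def __init__(self):
          self.storage = {}

      def __getitem__(self, name):
          # Dictionary style access only builds a handle.
          return ToyCollection(self, name)

      def collection_names(self):
          return sorted(self.storage)


  toy_db = ToyDatabase()
  toy_posts = toy_db['posts']
  names_before = toy_db.collection_names()   # nothing has been created yet
  toy_posts.insert({"author": "Mike"})
  names_after = toy_db.collection_names()    # 'posts' exists after the insert

In this sketch ``names_before`` is empty and ``names_after`` contains
only ``'posts'``.
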

Documents
---------
Data in MongoDB is represented (and stored) using JSON-style
documents. In PyMongo we use dictionaries to represent documents. As
an example, the following dictionary might be used to represent a blog
post:

.. doctest::

  >>> import datetime
  >>> post = {"author": "Mike",
  ...         "text": "My first blog post!",
  ...         "tags": ["mongodb", "python", "pymongo"],
  ...         "date": datetime.datetime.utcnow()}

Note that documents can contain native Python types (like
:class:`datetime.datetime` instances) which will be automatically
converted to and from the appropriate `BSON
<http://www.mongodb.org/display/DOCS/BSON>`_ types.
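
To see why a richer format than plain JSON is useful here, consider
the following sketch. It uses only the standard library (not PyMongo),
a fixed example date, and a hypothetical ``to_json_safe`` helper. Plain
JSON has no date type, so the datetime has to be flattened into a
string and comes back as a string; BSON has a dedicated datetime type,
which is why the conversion described above can be automatic:

.. code-block:: python

  import datetime
  import json

  post = {"author": "Mike",
          "text": "My first blog post!",
          "tags": ["mongodb", "python", "pymongo"],
          "date": datetime.datetime(2009, 11, 12, 11, 14)}


  def to_json_safe(value):
      """Hypothetical fallback: render datetimes as ISO 8601 strings."""
      if isinstance(value, datetime.datetime):
          return value.isoformat()
      raise TypeError("cannot serialize %r" % (value,))


  # Plain JSON cannot represent the datetime at all.
  try:
      json.dumps(post)
      plain_json_ok = True
  except TypeError:
      plain_json_ok = False

  # With the fallback, the date survives only as a string.
  encoded = json.dumps(post, default=to_json_safe)
  decoded = json.loads(encoded)

Here ``decoded["date"]`` is the string ``'2009-11-12T11:14:00'`` rather
than a :class:`datetime.datetime` instance.
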

.. todo:: link to table of Python <-> BSON types

Inserting a Document
--------------------
To insert a document into a collection we can use the
:meth:`~pymongo.collection.Collection.insert` method:

.. doctest::

  >>> posts = db.posts
  >>> posts.insert(post)
  ObjectId('...')

When a document is inserted, a special key, ``"_id"``, is automatically
added if the document doesn't already contain an ``"_id"`` key. The value
of ``"_id"`` must be unique across the
collection. :meth:`~pymongo.collection.Collection.insert` returns the
value of ``"_id"`` for the inserted document. For more information, see the
`documentation on _id
<http://www.mongodb.org/display/DOCS/Object+IDs>`_.
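
The rule for ``"_id"`` can be sketched with a toy in-memory model. This
is not PyMongo's implementation: ``toy_insert`` is a hypothetical
helper, the collection is a plain dictionary, and ``uuid`` merely
stands in for a real :class:`~bson.objectid.ObjectId`:

.. code-block:: python

  import uuid


  def toy_insert(collection, document):
      """Toy stand-in for insert: add an "_id" if missing and return it.

      ``collection`` is a plain dict keyed by "_id".
      """
      if "_id" not in document:
          # Placeholder for a generated ObjectId.
          document["_id"] = uuid.uuid4().hex
      if document["_id"] in collection:
          # "_id" values must be unique within the collection.
          raise ValueError("duplicate _id: %r" % (document["_id"],))
      collection[document["_id"]] = document
      return document["_id"]


  toy_posts = {}
  generated_id = toy_insert(toy_posts, {"author": "Mike"})
  explicit_id = toy_insert(toy_posts, {"_id": 1, "author": "Eliot"})

The first call generates an identifier, the second keeps the ``"_id"``
that was supplied, and inserting another document with ``"_id"`` equal
to ``1`` would raise an error in this sketch.
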

.. todo:: notes on the differences between save and insert

After inserting the first document, the *posts* collection has
actually been created on the server. We can verify this by listing all
of the collections in our database:

.. doctest::

  >>> db.collection_names()
  [u'posts', u'system.indexes']

.. note:: The *system.indexes* collection is a special internal
   collection that was created automatically.


Getting a Single Document With :meth:`~pymongo.collection.Collection.find_one`
------------------------------------------------------------------------------
The most basic type of query that can be performed in MongoDB is
:meth:`~pymongo.collection.Collection.find_one`. This method returns a
single document matching a query (or ``None`` if there are no
matches). It is useful when you know there is only one matching
document, or are only interested in the first match. Here we use
:meth:`~pymongo.collection.Collection.find_one` to get the first
document from the posts collection:

.. doctest::

  >>> posts.find_one()
  {u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}

The result is a dictionary matching the one that we inserted previously.

.. note:: The returned document contains an ``"_id"``, which was
   automatically added on insert.

:meth:`~pymongo.collection.Collection.find_one` also supports querying
on specific elements that the resulting document must match. To limit
our results to a document with author "Mike" we do:

.. doctest::

  >>> posts.find_one({"author": "Mike"})
  {u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}

If we try with a different author, like "Eliot", we'll get no result:

.. doctest::

  >>> posts.find_one({"author": "Eliot"})
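
Conceptually, a query document like ``{"author": "Mike"}`` matches any
document whose fields equal the given values. The sketch below models
that for top-level equality only, using hypothetical ``matches`` and
``toy_find_one`` helpers on a plain list; real MongoDB matching is
considerably richer (operators, arrays, nested fields) and happens on
the server:

.. code-block:: python

  def matches(document, query):
      """True if every key/value pair in ``query`` equals the
      corresponding top-level field in ``document``."""
      return all(key in document and document[key] == value
                 for key, value in query.items())


  def toy_find_one(documents, query=None):
      """Return the first matching document, or None if nothing matches."""
      query = query or {}
      for document in documents:
          if matches(document, query):
              return document
      return None


  toy_posts = [{"author": "Mike", "text": "My first blog post!"}]

  first = toy_find_one(toy_posts)                        # empty query
  by_mike = toy_find_one(toy_posts, {"author": "Mike"})
  by_eliot = toy_find_one(toy_posts, {"author": "Eliot"})

An empty query matches every document, so ``first`` and ``by_mike`` are
the same post, while ``by_eliot`` is ``None``.
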

A Note On Unicode Strings
-------------------------
You probably noticed that the regular Python strings we stored earlier look
different when retrieved from the server (e.g. ``u'Mike'`` instead of
``'Mike'``). A short explanation is in order.

MongoDB stores data in `BSON format <http://bsonspec.org>`_. BSON strings are
UTF-8 encoded, so PyMongo must ensure that any strings it stores contain only
valid UTF-8 data. Regular strings (``<type 'str'>``) are validated and stored
unaltered. Unicode strings (``<type 'unicode'>``) are encoded to UTF-8 first.
The reason our example string is represented in the Python shell as
``u'Mike'`` instead of ``'Mike'`` is that PyMongo decodes each BSON string to
a Python unicode string, not a regular str.
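
The round trip can be reproduced without a database. This sketch
assumes Python 3, where ``bytes`` and ``str`` play the roles that
``str`` and ``unicode`` play in the Python 2 sessions shown in this
tutorial:

.. code-block:: python

  text = u"Mike"                       # a text (unicode) string
  encoded = text.encode("utf-8")       # UTF-8 bytes, as BSON stores strings
  decoded = encoded.decode("utf-8")    # decoding yields a text string again

  # Bytes that are not valid UTF-8 cannot be decoded, which is why
  # byte strings have to be validated before they are stored.
  try:
      b"\xff\xfe".decode("utf-8")
      invalid_accepted = True
  except UnicodeDecodeError:
      invalid_accepted = False

The decoded value is always a text string, which is the same effect we
saw when ``'Mike'`` came back from the server as ``u'Mike'``.
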

`You can read more about Python unicode strings here
<http://docs.python.org/howto/unicode.html>`_.

Bulk Inserts
------------
In order to make querying a little more interesting, let's insert a
few more documents. In addition to inserting a single document, we can
also perform *bulk insert* operations, by passing an iterable as the
first argument to :meth:`~pymongo.collection.Collection.insert`. This
will insert each document in the iterable, sending only a single
command to the server:

.. doctest::

  >>> new_posts = [{"author": "Mike",
  ...               "text": "Another post!",
  ...               "tags": ["bulk", "insert"],
  ...               "date": datetime.datetime(2009, 11, 12, 11, 14)},
  ...              {"author": "Eliot",
  ...               "title": "MongoDB is fun",
  ...               "text": "and pretty easy too!",
  ...               "date": datetime.datetime(2009, 11, 10, 10, 45)}]
  >>> posts.insert(new_posts)
  [ObjectId('...'), ObjectId('...')]

There are a couple of interesting things to note about this example:

  - The call to :meth:`~pymongo.collection.Collection.insert` now
    returns two :class:`~bson.objectid.ObjectId` instances, one for
    each inserted document.
  - ``new_posts[1]`` has a different "shape" than the other posts -
    there is no ``"tags"`` field and we've added a new field,
    ``"title"``. This is what we mean when we say that MongoDB is
    *schema-free*.

Querying for More Than One Document
-----------------------------------
To get more than a single document as the result of a query we use the
:meth:`~pymongo.collection.Collection.find`
method. :meth:`~pymongo.collection.Collection.find` returns a
:class:`~pymongo.cursor.Cursor` instance, which allows us to iterate
over all matching documents. For example, we can iterate over every
document in the ``posts`` collection:

.. doctest::

  >>> for post in posts.find():
  ...   post
  ...
  {u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}
  {u'date': datetime.datetime(2009, 11, 12, 11, 14), u'text': u'Another post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'bulk', u'insert']}
  {u'date': datetime.datetime(2009, 11, 10, 10, 45), u'text': u'and pretty easy too!', u'_id': ObjectId('...'), u'author': u'Eliot', u'title': u'MongoDB is fun'}

Just like we did with :meth:`~pymongo.collection.Collection.find_one`,
we can pass a document to :meth:`~pymongo.collection.Collection.find`
to limit the returned results. Here, we get only those documents whose
author is "Mike":

.. doctest::

  >>> for post in posts.find({"author": "Mike"}):
  ...   post
  ...
  {u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}
  {u'date': datetime.datetime(2009, 11, 12, 11, 14), u'text': u'Another post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'bulk', u'insert']}

Counting
--------
If we just want to know how many documents match a query we can
perform a :meth:`~pymongo.cursor.Cursor.count` operation instead of a
full query. We can get a count of all of the documents in a
collection:

.. doctest::

  >>> posts.count()
  3

or just of those documents that match a specific query:

.. doctest::

  >>> posts.find({"author": "Mike"}).count()
  2

Range Queries
-------------
MongoDB supports many different types of `advanced queries
<http://www.mongodb.org/display/DOCS/Advanced+Queries>`_. As an
example, let's perform a query where we limit results to posts older
than a certain date, but also sort the results by author:

.. doctest::

  >>> d = datetime.datetime(2009, 11, 12, 12)
  >>> for post in posts.find({"date": {"$lt": d}}).sort("author"):
  ...   post
  ...
  {u'date': datetime.datetime(2009, 11, 10, 10, 45), u'text': u'and pretty easy too!', u'_id': ObjectId('...'), u'author': u'Eliot', u'title': u'MongoDB is fun'}
  {u'date': datetime.datetime(2009, 11, 12, 11, 14), u'text': u'Another post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'bulk', u'insert']}

Here we use the special ``"$lt"`` operator to do a range query, and
also call :meth:`~pymongo.cursor.Cursor.sort` to sort the results
by author.
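
For intuition, the same filter and sort can be written in plain Python
over a list of dictionaries. This only mirrors the semantics of the
query (the real work is done by the server), and the third post's date
is a made-up stand-in for the post we inserted with ``utcnow()``:

.. code-block:: python

  import datetime

  toy_posts = [
      {"author": "Mike", "date": datetime.datetime(2009, 11, 12, 11, 14)},
      {"author": "Eliot", "date": datetime.datetime(2009, 11, 10, 10, 45)},
      # Hypothetical date standing in for the first, more recent post.
      {"author": "Mike", "date": datetime.datetime(2009, 11, 13, 9, 0)},
  ]
  d = datetime.datetime(2009, 11, 12, 12)

  # {"date": {"$lt": d}} keeps documents whose "date" is strictly before d.
  older = [post for post in toy_posts if post["date"] < d]
  # sort("author") orders the matches by author, ascending by default.
  result = sorted(older, key=lambda post: post["author"])

As in the doctest above, ``result`` holds Eliot's post followed by
Mike's older post.
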

Indexing
--------
To make the above query fast we can add a compound index on
``"date"`` and ``"author"``. To start, let's use the
:meth:`~pymongo.cursor.Cursor.explain` method to get some information
about how the query is being performed without the index:

.. doctest::

  >>> posts.find({"date": {"$lt": d}}).sort("author").explain()["cursor"]
  u'BasicCursor'
  >>> posts.find({"date": {"$lt": d}}).sort("author").explain()["nscanned"]
  3

We can see that the query is using the *BasicCursor* and scanning over
all 3 documents in the collection. Now let's add a compound index and
look at the same information:

.. doctest::

  >>> from pymongo import ASCENDING, DESCENDING
  >>> posts.create_index([("date", DESCENDING), ("author", ASCENDING)])
  u'date_-1_author_1'
  >>> posts.find({"date": {"$lt": d}}).sort("author").explain()["cursor"]
  u'BtreeCursor date_-1_author_1'
  >>> posts.find({"date": {"$lt": d}}).sort("author").explain()["nscanned"]
  2

Now the query is using a *BtreeCursor* (the index) and only scanning
over the 2 matching documents.
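
Why an index reduces the number of scanned documents can be
illustrated with a sorted list and binary search. This is only an
analogy under simplifying assumptions: MongoDB's indexes are B-trees,
while this toy version indexes ``"date"`` alone in ascending order and
ignores the ``"author"`` component. The most recent date is a made-up
stand-in for the post inserted with ``utcnow()``:

.. code-block:: python

  import bisect
  import datetime

  toy_posts = [
      # Hypothetical date standing in for the first, more recent post.
      {"author": "Mike", "date": datetime.datetime(2009, 11, 13, 9, 0)},
      {"author": "Mike", "date": datetime.datetime(2009, 11, 12, 11, 14)},
      {"author": "Eliot", "date": datetime.datetime(2009, 11, 10, 10, 45)},
  ]
  d = datetime.datetime(2009, 11, 12, 12)

  # Without an index every document has to be examined.
  full_scan = [post for post in toy_posts if post["date"] < d]
  scanned_without_index = len(toy_posts)

  # A toy "index": (date, position) pairs kept sorted by date.
  index = sorted((post["date"], pos) for pos, post in enumerate(toy_posts))
  keys = [key for key, _ in index]

  # Binary search finds where dates >= d begin; every entry before that
  # point matches, so only those entries need to be visited.
  cut = bisect.bisect_left(keys, d)
  indexed = [toy_posts[pos] for _, pos in index[:cut]]
  scanned_with_index = cut

Both approaches find the same two posts, but the sorted structure lets
the lookup visit only the 2 matching entries instead of all 3
documents.
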

.. seealso:: The MongoDB documentation on `indexes <http://www.mongodb.org/display/DOCS/Indexes>`_