Tutorial
========

.. testsetup::

  from pymongo import Connection
  connection = Connection()
  connection.drop_database('test-database')

This tutorial is intended as an introduction to working with
**MongoDB** and **PyMongo**.

Prerequisites
-------------
Before we start, make sure that you have the **PyMongo** distribution
:doc:`installed <installation>`. In the Python shell, the following
should run without raising an exception:

.. doctest::

  >>> import pymongo

This tutorial also assumes that a MongoDB instance is running on the
default host and port. Assuming you have `downloaded and installed
<http://www.mongodb.org/display/DOCS/Getting+Started>`_ MongoDB, you
can start it like so:

.. code-block:: bash

  $ mongod

Making a Connection
-------------------
The first step when working with **PyMongo** is to create a
:class:`~pymongo.connection.Connection` to the running **mongod**
instance. Doing so is easy:

.. doctest::

  >>> from pymongo import Connection
  >>> connection = Connection()

The above code will connect on the default host and port. We can also
specify the host and port explicitly, as follows:

.. doctest::

  >>> connection = Connection('localhost', 27017)

Getting a Database
------------------
A single instance of MongoDB can support multiple independent
`databases <http://www.mongodb.org/display/DOCS/Databases>`_. When
working with PyMongo you access databases using attribute style access
on :class:`~pymongo.connection.Connection` instances:

.. doctest::

  >>> db = connection.test_database

If your database name is such that using attribute style access won't
work (like ``test-database``), you can use dictionary style access
instead:

.. doctest::

  >>> db = connection['test-database']

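A quick way to tell whether a name will work with attribute style
access is to ask whether it is a valid Python identifier. The sketch
below uses only the standard library and assumes Python 3 (for
``str.isidentifier``); the helper name ``access_style`` is hypothetical
and is not part of PyMongo:

```python
import keyword


def access_style(name):
    """Report which access styles a database name allows (hypothetical helper)."""
    # Attribute access needs a valid identifier that is not a reserved word.
    if name.isidentifier() and not keyword.iskeyword(name):
        return "attribute or dictionary"
    # A name such as "test-database" would parse as a subtraction,
    # so only dictionary style access can reach it.
    return "dictionary only"


print(access_style("test_database"))
print(access_style("test-database"))
```

Dictionary style access works for every name, so it is the safer choice
when the name comes from configuration or user input.
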
Getting a Collection
--------------------
A `collection <http://www.mongodb.org/display/DOCS/Collections>`_ is a
group of documents stored in MongoDB, and can be thought of as roughly
the equivalent of a table in a relational database. Getting a
collection in PyMongo works the same as getting a database:

.. doctest::

  >>> collection = db.test_collection

or (using dictionary style access):

.. doctest::

  >>> collection = db['test-collection']

An important note about collections (and databases) in MongoDB is that
they are created lazily - none of the above commands have actually
performed any operations on the MongoDB server. Collections and
databases are created when the first document is inserted into them.

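To make the idea of lazy creation concrete, here is a toy in-memory
model. The classes ``ToyDatabase`` and ``ToyCollection`` are
hypothetical stand-ins written for this illustration; they are not
PyMongo classes and say nothing about how PyMongo or the server is
implemented:

```python
class ToyDatabase:
    """In-memory stand-in for a database (hypothetical, not PyMongo)."""

    def __init__(self):
        self._collections = {}  # holds only collections that contain data

    def __getitem__(self, name):
        # Handing out a handle does not create anything.
        return ToyCollection(self, name)

    def collection_names(self):
        return sorted(self._collections)


class ToyCollection:
    """Handle to a named collection inside a ToyDatabase."""

    def __init__(self, database, name):
        self._database = database
        self._name = name

    def insert(self, document):
        # The collection comes into existence on the first insert.
        self._database._collections.setdefault(self._name, []).append(document)


db = ToyDatabase()
posts = db["posts"]
names_before = db.collection_names()
posts.insert({"author": "Mike"})
names_after = db.collection_names()
print(names_before, names_after)
```

Getting the handle leaves the list of collections empty; only the
insert makes ``posts`` appear.
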
Documents
---------
Data in MongoDB is represented (and stored) using JSON-style
documents. In PyMongo we use dictionaries to represent documents. As
an example, the following dictionary might be used to represent a blog
post:

.. doctest::

  >>> import datetime
  >>> post = {"author": "Mike",
  ...         "text": "My first blog post!",
  ...         "tags": ["mongodb", "python", "pymongo"],
  ...         "date": datetime.datetime.utcnow()}

Note that documents can contain native Python types (like
:class:`datetime.datetime` instances) which will be automatically
converted to and from the appropriate `BSON
<http://www.mongodb.org/display/DOCS/BSON>`_ types.

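One consequence of this conversion is worth knowing: BSON stores a
date as a whole number of milliseconds since the Unix epoch, so any
finer precision is not preserved. The sketch below mimics that round
trip with the standard library only; ``to_millis`` and ``from_millis``
are hypothetical helper names, not PyMongo functions:

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)


def to_millis(dt):
    """Naive UTC datetime -> whole milliseconds since the epoch."""
    delta = dt - EPOCH
    # Integer arithmetic only, so microseconds are truncated, not rounded.
    return (delta.days * 86400 + delta.seconds) * 1000 + delta.microseconds // 1000


def from_millis(millis):
    """Whole milliseconds since the epoch -> naive UTC datetime."""
    return EPOCH + datetime.timedelta(milliseconds=millis)


original = datetime.datetime(2009, 11, 12, 11, 14, 0, 123456)
round_tripped = from_millis(to_millis(original))
print(original)
print(round_tripped)
```

A datetime read back from the server can therefore differ from the one
that was stored by up to 999 microseconds.
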
.. todo:: link to table of Python <-> BSON types

Inserting a Document
--------------------
To insert a document into a collection we can use the
:meth:`~pymongo.collection.Collection.insert` method:

.. doctest::

  >>> posts = db.posts
  >>> posts.insert(post)
  ObjectId('...')

When a document is inserted a special key, ``"_id"``, is automatically
added if the document doesn't already contain an ``"_id"`` key. The value
of ``"_id"`` must be unique across the
collection. :meth:`~pymongo.collection.Collection.insert` returns the
value of ``"_id"`` for the inserted document. For more information, see the
`documentation on _id
<http://www.mongodb.org/display/DOCS/Object+IDs>`_.

.. todo:: notes on the differences between save and insert

After inserting the first document, the *posts* collection has
actually been created on the server. We can verify this by listing all
of the collections in our database:

.. doctest::

  >>> db.collection_names()
  [u'posts', u'system.indexes']

.. note:: The *system.indexes* collection is a special internal
   collection that was created automatically.

Getting a Single Document With :meth:`~pymongo.collection.Collection.find_one`
------------------------------------------------------------------------------
The most basic type of query that can be performed in MongoDB is
:meth:`~pymongo.collection.Collection.find_one`. This method returns a
single document matching a query (or ``None`` if there are no
matches). It is useful when you know there is only one matching
document, or are only interested in the first match. Here we use
:meth:`~pymongo.collection.Collection.find_one` to get the first
document from the posts collection:

.. doctest::

  >>> posts.find_one()
  {u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}

The result is a dictionary matching the one that we inserted previously.

.. note:: The returned document contains an ``"_id"``, which was
   automatically added on insert.

:meth:`~pymongo.collection.Collection.find_one` also supports querying
on specific elements that the resulting document must match. To limit
our results to a document with author "Mike" we do:

.. doctest::

  >>> posts.find_one({"author": "Mike"})
  {u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}

If we try with a different author, like "Eliot", we'll get no result:

.. doctest::

  >>> posts.find_one({"author": "Eliot"})

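The matching behaviour described above can be approximated in plain
Python. This is a deliberately simplified sketch: it handles only exact
equality on top-level keys and ignores query operators, arrays and
null handling, all of which the real server supports. The function
names mirror the PyMongo method for readability but are local to this
example:

```python
def matches(document, query):
    """True if every key in ``query`` equals the value stored in ``document``."""
    missing = object()  # sentinel so an absent key never equals a queried value
    return all(document.get(key, missing) == value for key, value in query.items())


def find_one(documents, query=None):
    """Return the first matching document, or None when nothing matches."""
    query = query or {}
    for document in documents:
        if matches(document, query):
            return document
    return None


stored = [{"author": "Mike", "text": "My first blog post!"}]
print(find_one(stored, {"author": "Mike"}))
print(find_one(stored, {"author": "Eliot"}))
```

An empty query matches every document, which is why calling
``find_one`` with no arguments returns the first document it sees.
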
A Note On Unicode Strings
-------------------------
You probably noticed that the regular Python strings we stored earlier look
different when retrieved from the server (e.g. ``u'Mike'`` instead of
``'Mike'``). A short explanation is in order.

MongoDB stores data in `BSON format <http://bsonspec.org>`_. BSON strings are
UTF-8 encoded, so PyMongo must ensure that any strings it stores contain only
valid UTF-8 data. Regular strings (``<type 'str'>``) are validated and stored
unaltered. Unicode strings (``<type 'unicode'>``) are first encoded as UTF-8.
The reason our example string is represented in the Python shell as
``u'Mike'`` instead of ``'Mike'`` is that PyMongo decodes each BSON string to
a Python unicode string, not a regular str.

`You can read more about Python unicode strings here
<http://docs.python.org/howto/unicode.html>`_.

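The encode, validate and decode steps can be tried out with the
standard library alone. The sketch assumes Python 3, where ``str``
plays the role of Python 2's ``unicode`` and ``bytes`` plays the role
of Python 2's ``str``; it does not call PyMongo:

```python
name = "Mike"
encoded = name.encode("utf-8")     # the bytes a UTF-8 string field would hold
decoded = encoded.decode("utf-8")  # text again, as handed back to the caller

# Validation amounts to checking that the bytes decode cleanly.
try:
    b"\xff".decode("utf-8")        # 0xFF never appears in valid UTF-8
    is_valid = True
except UnicodeDecodeError:
    is_valid = False

print(encoded, decoded, is_valid)
```
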
Bulk Inserts
------------
In order to make querying a little more interesting, let's insert a
few more documents. In addition to inserting a single document, we can
also perform *bulk insert* operations, by passing an iterable as the
first argument to :meth:`~pymongo.collection.Collection.insert`. This
will insert each document in the iterable, sending only a single
command to the server:

.. doctest::

  >>> new_posts = [{"author": "Mike",
  ...               "text": "Another post!",
  ...               "tags": ["bulk", "insert"],
  ...               "date": datetime.datetime(2009, 11, 12, 11, 14)},
  ...              {"author": "Eliot",
  ...               "title": "MongoDB is fun",
  ...               "text": "and pretty easy too!",
  ...               "date": datetime.datetime(2009, 11, 10, 10, 45)}]
  >>> posts.insert(new_posts)
  [ObjectId('...'), ObjectId('...')]

There are a couple of interesting things to note about this example:

- The call to :meth:`~pymongo.collection.Collection.insert` now
  returns two :class:`~bson.objectid.ObjectId` instances, one for
  each inserted document.
- ``new_posts[1]`` has a different "shape" than the other posts -
  there is no ``"tags"`` field and we've added a new field,
  ``"title"``. This is what we mean when we say that MongoDB is
  *schema-free*.

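The difference in "shape" between the two documents can be checked
directly by comparing their keys. This small sketch works on the plain
dictionaries from the example above and needs no server:

```python
import datetime

new_posts = [{"author": "Mike",
              "text": "Another post!",
              "tags": ["bulk", "insert"],
              "date": datetime.datetime(2009, 11, 12, 11, 14)},
             {"author": "Eliot",
              "title": "MongoDB is fun",
              "text": "and pretty easy too!",
              "date": datetime.datetime(2009, 11, 10, 10, 45)}]

# Iterating over a dictionary yields its keys, so set() collects field names.
mike_keys, eliot_keys = (set(post) for post in new_posts)
only_in_mike = mike_keys - eliot_keys
only_in_eliot = eliot_keys - mike_keys
print(sorted(only_in_mike), sorted(only_in_eliot))
```
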
Querying for More Than One Document
-----------------------------------
To get more than a single document as the result of a query we use the
:meth:`~pymongo.collection.Collection.find`
method. :meth:`~pymongo.collection.Collection.find` returns a
:class:`~pymongo.cursor.Cursor` instance, which allows us to iterate
over all matching documents. For example, we can iterate over every
document in the ``posts`` collection:

.. doctest::

  >>> for post in posts.find():
  ...   post
  ...
  {u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}
  {u'date': datetime.datetime(2009, 11, 12, 11, 14), u'text': u'Another post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'bulk', u'insert']}
  {u'date': datetime.datetime(2009, 11, 10, 10, 45), u'text': u'and pretty easy too!', u'_id': ObjectId('...'), u'author': u'Eliot', u'title': u'MongoDB is fun'}

Just like we did with :meth:`~pymongo.collection.Collection.find_one`,
we can pass a document to :meth:`~pymongo.collection.Collection.find`
to limit the returned results. Here, we get only those documents whose
author is "Mike":

.. doctest::

  >>> for post in posts.find({"author": "Mike"}):
  ...   post
  ...
  {u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}
  {u'date': datetime.datetime(2009, 11, 12, 11, 14), u'text': u'Another post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'bulk', u'insert']}

Counting
--------
If we just want to know how many documents match a query we can
perform a :meth:`~pymongo.cursor.Cursor.count` operation instead of a
full query. We can get a count of all of the documents in a
collection:

.. doctest::

  >>> posts.count()
  3

or just of those documents that match a specific query:

.. doctest::

  >>> posts.find({"author": "Mike"}).count()
  2

Range Queries
-------------
MongoDB supports many different types of `advanced queries
<http://www.mongodb.org/display/DOCS/Advanced+Queries>`_. As an
example, let's perform a query where we limit results to posts older
than a certain date, but also sort the results by author:

.. doctest::

  >>> d = datetime.datetime(2009, 11, 12, 12)
  >>> for post in posts.find({"date": {"$lt": d}}).sort("author"):
  ...   post
  ...
  {u'date': datetime.datetime(2009, 11, 10, 10, 45), u'text': u'and pretty easy too!', u'_id': ObjectId('...'), u'author': u'Eliot', u'title': u'MongoDB is fun'}
  {u'date': datetime.datetime(2009, 11, 12, 11, 14), u'text': u'Another post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'bulk', u'insert']}

Here we use the special ``"$lt"`` operator to do a range query, and
also call :meth:`~pymongo.cursor.Cursor.sort` to sort the results
by author.

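For intuition, the same filter and sort can be written against a plain
list of dictionaries. This is only an analogy for what the server
computes, not how PyMongo executes the query. The first date is an
assumed stand-in for the ``utcnow()`` value of the first post, chosen
to be later than ``d``:

```python
import datetime
from operator import itemgetter

posts = [
    {"author": "Mike", "date": datetime.datetime(2009, 11, 13, 9, 0)},  # stand-in date
    {"author": "Mike", "date": datetime.datetime(2009, 11, 12, 11, 14)},
    {"author": "Eliot", "date": datetime.datetime(2009, 11, 10, 10, 45)},
]
d = datetime.datetime(2009, 11, 12, 12)

# {"date": {"$lt": d}} keeps documents whose date is strictly before d.
older = [post for post in posts if post["date"] < d]
# .sort("author") orders the remaining documents by author, ascending.
older.sort(key=itemgetter("author"))

authors = [post["author"] for post in older]
print(authors)
```
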
Indexing
--------
To make the above query fast we can add a compound index on
``"date"`` and ``"author"``. To start, let's use the
:meth:`~pymongo.cursor.Cursor.explain` method to get some information
about how the query is being performed without the index:

.. doctest::

  >>> posts.find({"date": {"$lt": d}}).sort("author").explain()["cursor"]
  u'BasicCursor'
  >>> posts.find({"date": {"$lt": d}}).sort("author").explain()["nscanned"]
  3

We can see that the query is using the *BasicCursor* and scanning over
all 3 documents in the collection. Now let's add a compound index and
look at the same information:

.. doctest::

  >>> from pymongo import ASCENDING, DESCENDING
  >>> posts.create_index([("date", DESCENDING), ("author", ASCENDING)])
  u'date_-1_author_1'
  >>> posts.find({"date": {"$lt": d}}).sort("author").explain()["cursor"]
  u'BtreeCursor date_-1_author_1'
  >>> posts.find({"date": {"$lt": d}}).sort("author").explain()["nscanned"]
  2

Now the query is using a *BtreeCursor* (the index) and only scanning
over the 2 matching documents.

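Why an index reduces the number of scanned entries can be illustrated
with a sorted list and a binary search. This is a toy model under
simplifying assumptions: it keeps a single ascending key on the date,
whereas the index created above is a compound B-tree with a descending
date and an ascending author. The first date is again an assumed
stand-in for the ``utcnow()`` post:

```python
import bisect
import datetime

dates = [
    datetime.datetime(2009, 11, 13, 9, 0),   # stand-in for the utcnow() post
    datetime.datetime(2009, 11, 12, 11, 14),
    datetime.datetime(2009, 11, 10, 10, 45),
]
d = datetime.datetime(2009, 11, 12, 12)

# Without an index every stored date has to be examined.
scanned_without_index = 0
matches_without_index = []
for date in dates:
    scanned_without_index += 1
    if date < d:
        matches_without_index.append(date)

# With a sorted index, a binary search finds where the matching range ends,
# so only the matching entries are read.
index = sorted(dates)
end = bisect.bisect_left(index, d)
matches_with_index = index[:end]
scanned_with_index = len(matches_with_index)

print(scanned_without_index, scanned_with_index)
```

Both approaches find the same two dates; the sorted structure simply
avoids looking at the entry that cannot match.
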
.. seealso:: The MongoDB documentation on `indexes <http://www.mongodb.org/display/DOCS/Indexes>`_