PageRenderTime 183ms CodeModel.GetById 100ms app.highlight 2ms RepoModel.GetById 79ms app.codeStats 0ms

/docs/topics/db/optimization.txt

https://code.google.com/p/mango-py/
Plain Text | 270 lines | 192 code | 78 blank | 0 comment | 0 complexity | 2befd0934ba3e1bcf8d047e2aa837eac MD5 | raw file
  1============================
  2Database access optimization
  3============================
  4
  5Django's database layer provides various ways to help developers get the most
  6out of their databases. This document gathers together links to the relevant
  7documentation, and adds various tips, organized under a number of headings that
  8outline the steps to take when attempting to optimize your database usage.
  9
 10Profile first
 11=============
 12
 13As general programming practice, this goes without saying. Find out :ref:`what
 14queries you are doing and what they are costing you
 15<faq-see-raw-sql-queries>`. You may also want to use an external project like
 16django-debug-toolbar_, or a tool that monitors your database directly.
 17
 18Remember that you may be optimizing for speed or memory or both, depending on
 19your requirements. Sometimes optimizing for one will be detrimental to the
 20other, but sometimes they will help each other. Also, work that is done by the
 21database process might not have the same cost (to you) as the same amount of
 22work done in your Python process. It is up to you to decide what your
 23priorities are, where the balance must lie, and profile all of these as required
 24since this will depend on your application and server.
 25
 26With everything that follows, remember to profile after every change to ensure
 27that the change is a benefit, and a big enough benefit given the decrease in
 28readability of your code. **All** of the suggestions below come with the caveat
 29that in your circumstances the general principle might not apply, or might even
 30be reversed.
 31
 32.. _django-debug-toolbar: http://robhudson.github.com/django-debug-toolbar/
 33
 34Use standard DB optimization techniques
 35=======================================
 36
 37...including:
 38
 39* Indexes. This is a number one priority, *after* you have determined from
 40  profiling what indexes should be added. Use
 41  :attr:`django.db.models.Field.db_index` to add these from Django.
 42
 43* Appropriate use of field types.
 44
 45We will assume you have done the obvious things above. The rest of this document
 46focuses on how to use Django in such a way that you are not doing unnecessary
 47work. This document also does not address other optimization techniques that
 48apply to all expensive operations, such as :doc:`general purpose caching
 49</topics/cache>`.
 50
 51Understand QuerySets
 52====================
 53
 54Understanding :doc:`QuerySets </ref/models/querysets>` is vital to getting good
 55performance with simple code. In particular:
 56
 57Understand QuerySet evaluation
 58------------------------------
 59
 60To avoid performance problems, it is important to understand:
 61
 62* that :ref:`QuerySets are lazy <querysets-are-lazy>`.
 63
 64* when :ref:`they are evaluated <when-querysets-are-evaluated>`.
 65
 66* how :ref:`the data is held in memory <caching-and-querysets>`.
 67
 68Understand cached attributes
 69----------------------------
 70
 71As well as caching of the whole ``QuerySet``, there is caching of the result of
 72attributes on ORM objects. In general, attributes that are not callable will be
 73cached. For example, assuming the :ref:`example Weblog models
 74<queryset-model-example>`::
 75
 76  >>> entry = Entry.objects.get(id=1)
 77  >>> entry.blog   # Blog object is retrieved at this point
 78  >>> entry.blog   # cached version, no DB access
 79
 80But in general, callable attributes cause DB lookups every time::
 81
 82  >>> entry = Entry.objects.get(id=1)
 83  >>> entry.authors.all()   # query performed
 84  >>> entry.authors.all()   # query performed again
 85
 86Be careful when reading template code - the template system does not allow use
 87of parentheses, but will call callables automatically, hiding the above
 88distinction.
 89
 90Be careful with your own custom properties - it is up to you to implement
 91caching.
 92
 93Use the ``with`` template tag
 94-----------------------------
 95
 96To make use of the caching behavior of ``QuerySet``, you may need to use the
 97:ttag:`with` template tag.
 98
 99Use ``iterator()``
100------------------
101
102When you have a lot of objects, the caching behavior of the ``QuerySet`` can
103cause a large amount of memory to be used. In this case,
104:meth:`~django.db.models.QuerySet.iterator()` may help.
105
106Do database work in the database rather than in Python
107======================================================
108
109For instance:
110
111* At the most basic level, use :ref:`filter and exclude <queryset-api>` to do
112  filtering in the database.
113
114* Use :ref:`F() object query expressions <query-expressions>` to do filtering
115  against other fields within the same model.
116
117* Use :doc:`annotate to do aggregation in the database </topics/db/aggregation>`.
118
119If these aren't enough to generate the SQL you need:
120
121Use ``QuerySet.extra()``
122------------------------
123
124A less portable but more powerful method is
125:meth:`~django.db.models.QuerySet.extra()`, which allows some SQL to be
126explicitly added to the query. If that still isn't powerful enough:
127
128Use raw SQL
129-----------
130
131Write your own :doc:`custom SQL to retrieve data or populate models
132</topics/db/sql>`. Use ``django.db.connection.queries`` to find out what Django
133is writing for you and start from there.
134
135Retrieve everything at once if you know you will need it
136========================================================
137
138Hitting the database multiple times for different parts of a single 'set' of
139data that you will need all parts of is, in general, less efficient than
140retrieving it all in one query. This is particularly important if you have a
141query that is executed in a loop, and could therefore end up doing many database
142queries, when only one was needed. So:
143
144Use ``QuerySet.select_related()``
145---------------------------------
146
147Understand :ref:`QuerySet.select_related() <select-related>` thoroughly, and use it:
148
149* in view code,
150
151* and in :doc:`managers and default managers </topics/db/managers>` where
152  appropriate. Be aware when your manager is and is not used; sometimes this is
153  tricky so don't make assumptions.
154
155Don't retrieve things you don't need
156====================================
157
158Use ``QuerySet.values()`` and ``values_list()``
159-----------------------------------------------
160
161When you just want a ``dict`` or ``list`` of values, and don't need ORM model
162objects, make appropriate usage of :meth:`~django.db.models.QuerySet.values()`.
163These can be useful for replacing model objects in template code - as long as
164the dicts you supply have the same attributes as those used in the template,
165you are fine.
166
167Use ``QuerySet.defer()`` and ``only()``
168---------------------------------------
169
170Use :meth:`~django.db.models.QuerySet.defer()` and
171:meth:`~django.db.models.QuerySet.only()` if there are database columns you
172know that you won't need (or won't need in most cases) to avoid loading
173them. Note that if you *do* use them, the ORM will have to go and get them in
174a separate query, making this a pessimization if you use it inappropriately.
175
176Also, be aware that there is some (small extra) overhead incurred inside
177Django when constructing a model with deferred fields. Don't be too aggressive
178in deferring fields without profiling as the database has to read most of the
179non-text, non-VARCHAR data from the disk for a single row in the results, even
180if it ends up only using a few columns. The ``defer()`` and ``only()`` methods
181are most useful when you can avoid loading a lot of text data or for fields
182that might take a lot of processing to convert back to Python. As always,
183profile first, then optimize.
184
185Use QuerySet.count()
186--------------------
187
188...if you only want the count, rather than doing ``len(queryset)``.
189
190Use QuerySet.exists()
191---------------------
192
193...if you only want to find out if at least one result exists, rather than ``if
194queryset``.
195
196But:
197
198Don't overuse ``count()`` and ``exists()``
199------------------------------------------
200
201If you are going to need other data from the QuerySet, just evaluate it.
202
203For example, assuming an Email model that has a ``body`` attribute and a
204many-to-many relation to User, the following template code is optimal:
205
206.. code-block:: html+django
207
208   {% if display_inbox %}
209     {% with emails=user.emails.all %}
210       {% if emails %}
211         <p>You have {{ emails|length }} email(s)</p>
212         {% for email in emails %}
213           <p>{{ email.body }}</p>
214         {% endfor %}
215       {% else %}
216         <p>No messages today.</p>
217       {% endif %}
218     {% endwith %}
219   {% endif %}
220
221
222It is optimal because:
223
224 1. Since QuerySets are lazy, this does no database queries if 'display_inbox'
225    is False.
226
227 #. Use of ``with`` means that we store ``user.emails.all`` in a variable for
228    later use, allowing its cache to be re-used.
229
230 #. The line ``{% if emails %}`` causes ``QuerySet.__nonzero__()`` to be called,
231    which causes the ``user.emails.all()`` query to be run on the database, and
232    at the least the first line to be turned into an ORM object. If there aren't
233    any results, it will return False, otherwise True.
234
235 #. The use of ``{{ emails|length }}`` calls ``QuerySet.__len__()``, filling
236    out the rest of the cache without doing another query.
237
238 #. The ``for`` loop iterates over the already filled cache.
239
240In total, this code does either one or zero database queries. The only
241deliberate optimization performed is the use of the ``with`` tag. Using
242``QuerySet.exists()`` or ``QuerySet.count()`` at any point would cause
243additional queries.
244
245Use ``QuerySet.update()`` and ``delete()``
246------------------------------------------
247
248Rather than retrieve a load of objects, set some values, and save them
249individual, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update()
250<topics-db-queries-update>`. Similarly, do :ref:`bulk deletes
251<topics-db-queries-delete>` where possible.
252
253Note, however, that these bulk update methods cannot call the ``save()`` or
254``delete()`` methods of individual instances, which means that any custom
255behavior you have added for these methods will not be executed, including
256anything driven from the normal database object :doc:`signals </ref/signals>`.
257
258Use foreign key values directly
259-------------------------------
260
261If you only need a foreign key value, use the foreign key value that is already on
262the object you've got, rather than getting the whole related object and taking
263its primary key. i.e. do::
264
265   entry.blog_id
266
267instead of::
268
269   entry.blog.id
270