/SQLAlchemy-0.7.8/doc/_sources/orm/loading.txt
Plain Text | 502 lines | 385 code | 117 blank | 0 comment | 0 complexity | eaa49ef33576f2072cb9c8fe8fee0a7b MD5 | raw file
1.. _loading_toplevel:
2
3.. currentmodule:: sqlalchemy.orm
4
5Relationship Loading Techniques
6===============================
7
8A big part of SQLAlchemy is providing a wide range of control over how related objects get loaded when querying. This behavior
9can be configured at mapper construction time using the ``lazy`` parameter to the :func:`.relationship` function,
10as well as by using options with the :class:`.Query` object.
11
12Using Loader Strategies: Lazy Loading, Eager Loading
13----------------------------------------------------
14
15By default, all inter-object relationships are **lazy loading**. The scalar or
16collection attribute associated with a :func:`~sqlalchemy.orm.relationship`
17contains a trigger which fires the first time the attribute is accessed. This
18trigger, in all but one case, issues a SQL call at the point of access
19in order to load the related object or objects:
20
21.. sourcecode:: python+sql
22
23 {sql}>>> jack.addresses
24 SELECT addresses.id AS addresses_id, addresses.email_address AS addresses_email_address,
25 addresses.user_id AS addresses_user_id
26 FROM addresses
27 WHERE ? = addresses.user_id
28 [5]
29 {stop}[<Address(u'jack@google.com')>, <Address(u'j25@yahoo.com')>]
30
31The one case where SQL is not emitted is for a simple many-to-one relationship, when
32the related object can be identified by its primary key alone and that object is already
33present in the current :class:`.Session`.
34
35This default behavior of "load upon attribute access" is known as "lazy" or
36"select" loading - the name "select" because a "SELECT" statement is typically emitted
37when the attribute is first accessed.
38
39In the :ref:`ormtutorial_toplevel`, we introduced the concept of **Eager
40Loading**. We used an ``option`` in conjunction with the
41:class:`~sqlalchemy.orm.query.Query` object in order to indicate that a
42relationship should be loaded at the same time as the parent, within a single
43SQL query. This option, known as :func:`.joinedload`, connects a JOIN (by default
44a LEFT OUTER join) to the statement and populates the scalar/collection from the
45same result set as that of the parent:
46
47.. sourcecode:: python+sql
48
49 {sql}>>> jack = session.query(User).\
50 ... options(joinedload('addresses')).\
51 ... filter_by(name='jack').all() #doctest: +NORMALIZE_WHITESPACE
52 SELECT addresses_1.id AS addresses_1_id, addresses_1.email_address AS addresses_1_email_address,
53 addresses_1.user_id AS addresses_1_user_id, users.id AS users_id, users.name AS users_name,
54 users.fullname AS users_fullname, users.password AS users_password
55 FROM users LEFT OUTER JOIN addresses AS addresses_1 ON users.id = addresses_1.user_id
56 WHERE users.name = ?
57 ['jack']
58
59
60In addition to "joined eager loading", a second option for eager loading
61exists, called "subquery eager loading". This kind of eager loading emits an
62additional SQL statement for each collection requested, aggregated across all
63parent objects:
64
65.. sourcecode:: python+sql
66
67 {sql}>>> jack = session.query(User).\
68 ... options(subqueryload('addresses')).\
69 ... filter_by(name='jack').all()
70 SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname,
71 users.password AS users_password
72 FROM users
73 WHERE users.name = ?
74 ('jack',)
75 SELECT addresses.id AS addresses_id, addresses.email_address AS addresses_email_address,
76 addresses.user_id AS addresses_user_id, anon_1.users_id AS anon_1_users_id
77 FROM (SELECT users.id AS users_id
78 FROM users
79 WHERE users.name = ?) AS anon_1 JOIN addresses ON anon_1.users_id = addresses.user_id
80 ORDER BY anon_1.users_id, addresses.id
81 ('jack',)
82
83The default **loader strategy** for any :func:`~sqlalchemy.orm.relationship`
84is configured by the ``lazy`` keyword argument, which defaults to ``select`` - this indicates
85a "select" statement .
86Below we set it as ``joined`` so that the ``children`` relationship is eager
87loading, using a join:
88
89.. sourcecode:: python+sql
90
91 # load the 'children' collection using LEFT OUTER JOIN
92 mapper(Parent, parent_table, properties={
93 'children': relationship(Child, lazy='joined')
94 })
95
96We can also set it to eagerly load using a second query for all collections,
97using ``subquery``:
98
99.. sourcecode:: python+sql
100
101 # load the 'children' attribute using a join to a subquery
102 mapper(Parent, parent_table, properties={
103 'children': relationship(Child, lazy='subquery')
104 })
105
106When querying, all three choices of loader strategy are available on a
107per-query basis, using the :func:`~sqlalchemy.orm.joinedload`,
108:func:`~sqlalchemy.orm.subqueryload` and :func:`~sqlalchemy.orm.lazyload`
109query options:
110
111.. sourcecode:: python+sql
112
113 # set children to load lazily
114 session.query(Parent).options(lazyload('children')).all()
115
116 # set children to load eagerly with a join
117 session.query(Parent).options(joinedload('children')).all()
118
119 # set children to load eagerly with a second statement
120 session.query(Parent).options(subqueryload('children')).all()
121
122To reference a relationship that is deeper than one level, separate the names by periods:
123
124.. sourcecode:: python+sql
125
126 session.query(Parent).options(joinedload('foo.bar.bat')).all()
127
128When using dot-separated names with :func:`~sqlalchemy.orm.joinedload` or
129:func:`~sqlalchemy.orm.subqueryload`, the option applies **only** to the actual
130attribute named, and **not** its ancestors. For example, suppose a mapping
131from ``A`` to ``B`` to ``C``, where the relationships, named ``atob`` and
132``btoc``, are both lazy-loading. A statement like the following:
133
134.. sourcecode:: python+sql
135
136 session.query(A).options(joinedload('atob.btoc')).all()
137
138will load only ``A`` objects to start. When the ``atob`` attribute on each
139``A`` is accessed, the returned ``B`` objects will *eagerly* load their ``C``
140objects.
141
142Therefore, to modify the eager load to load both ``atob`` as well as ``btoc``,
143place joinedloads for both:
144
145.. sourcecode:: python+sql
146
147 session.query(A).options(joinedload('atob'), joinedload('atob.btoc')).all()
148
149or more succinctly just use :func:`~sqlalchemy.orm.joinedload_all` or
150:func:`~sqlalchemy.orm.subqueryload_all`:
151
152.. sourcecode:: python+sql
153
154 session.query(A).options(joinedload_all('atob.btoc')).all()
155
156There are two other loader strategies available, **dynamic loading** and **no
157loading**; these are described in :ref:`largecollections`.
158
159Default Loading Strategies
160--------------------------
161
162.. versionadded:: 0.7.5
163 Default loader strategies as a new feature.
164
165Each of :func:`.joinedload`, :func:`.subqueryload`, :func:`.lazyload`,
166and :func:`.noload` can be used to set the default style of
167:func:`.relationship` loading
168for a particular query, affecting all :func:`.relationship` -mapped
169attributes not otherwise
170specified in the :class:`.Query`. This feature is available by passing
171the string ``'*'`` as the argument to any of these options::
172
173 session.query(MyClass).options(lazyload('*'))
174
175Above, the ``lazyload('*')`` option will supercede the ``lazy`` setting
176of all :func:`.relationship` constructs in use for that query,
177except for those which use the ``'dynamic'`` style of loading.
178If some relationships specify
179``lazy='joined'`` or ``lazy='subquery'``, for example,
180using ``default_strategy(lazy='select')`` will unilaterally
181cause all those relationships to use ``'select'`` loading.
182
183The option does not supercede loader options stated in the
184query, such as :func:`.eagerload`,
185:func:`.subqueryload`, etc. The query below will still use joined loading
186for the ``widget`` relationship::
187
188 session.query(MyClass).options(
189 lazyload('*'),
190 joinedload(MyClass.widget)
191 )
192
193If multiple ``'*'`` options are passed, the last one overrides
194those previously passed.
195
196.. _zen_of_eager_loading:
197
198The Zen of Eager Loading
199-------------------------
200
201The philosophy behind loader strategies is that any set of loading schemes can be
202applied to a particular query, and *the results don't change* - only the number
203of SQL statements required to fully load related objects and collections changes. A particular
204query might start out using all lazy loads. After using it in context, it might be revealed
205that particular attributes or collections are always accessed, and that it would be more
206efficient to change the loader strategy for these. The strategy can be changed with no other
207modifications to the query, the results will remain identical, but fewer SQL statements would be emitted.
208In theory (and pretty much in practice), nothing you can do to the :class:`.Query` would make it load
209a different set of primary or related objects based on a change in loader strategy.
210
211How :func:`joinedload` in particular achieves this result of not impacting
212entity rows returned in any way is that it creates an anonymous alias of the joins it adds to your
213query, so that they can't be referenced by other parts of the query. For example,
214the query below uses :func:`.joinedload` to create a LEFT OUTER JOIN from ``users``
215to ``addresses``, however the ``ORDER BY`` added against ``Address.email_address``
216is not valid - the ``Address`` entity is not named in the query:
217
218.. sourcecode:: python+sql
219
220 >>> jack = session.query(User).\
221 ... options(joinedload(User.addresses)).\
222 ... filter(User.name=='jack').\
223 ... order_by(Address.email_address).all()
224 {opensql}SELECT addresses_1.id AS addresses_1_id, addresses_1.email_address AS addresses_1_email_address,
225 addresses_1.user_id AS addresses_1_user_id, users.id AS users_id, users.name AS users_name,
226 users.fullname AS users_fullname, users.password AS users_password
227 FROM users LEFT OUTER JOIN addresses AS addresses_1 ON users.id = addresses_1.user_id
228 WHERE users.name = ? ORDER BY addresses.email_address <-- this part is wrong !
229 ['jack']
230
231Above, ``ORDER BY addresses.email_address`` is not valid since ``addresses`` is not in the
232FROM list. The correct way to load the ``User`` records and order by email
233address is to use :meth:`.Query.join`:
234
235.. sourcecode:: python+sql
236
237 >>> jack = session.query(User).\
238 ... join(User.addresses).\
239 ... filter(User.name=='jack').\
240 ... order_by(Address.email_address).all()
241 {opensql}
242 SELECT users.id AS users_id, users.name AS users_name,
243 users.fullname AS users_fullname, users.password AS users_password
244 FROM users JOIN addresses ON users.id = addresses.user_id
245 WHERE users.name = ? ORDER BY addresses.email_address
246 ['jack']
247
248The statement above is of course not the same as the previous one, in that the columns from ``addresses``
249are not included in the result at all. We can add :func:`.joinedload` back in, so that
250there are two joins - one is that which we are ordering on, the other is used anonymously to
251load the contents of the ``User.addresses`` collection:
252
253.. sourcecode:: python+sql
254
255 >>> jack = session.query(User).\
256 ... join(User.addresses).\
257 ... options(joinedload(User.addresses)).\
258 ... filter(User.name=='jack').\
259 ... order_by(Address.email_address).all()
260 {opensql}SELECT addresses_1.id AS addresses_1_id, addresses_1.email_address AS addresses_1_email_address,
261 addresses_1.user_id AS addresses_1_user_id, users.id AS users_id, users.name AS users_name,
262 users.fullname AS users_fullname, users.password AS users_password
263 FROM users JOIN addresses ON users.id = addresses.user_id
264 LEFT OUTER JOIN addresses AS addresses_1 ON users.id = addresses_1.user_id
265 WHERE users.name = ? ORDER BY addresses.email_address
266 ['jack']
267
268What we see above is that our usage of :meth:`.Query.join` is to supply JOIN clauses we'd like
269to use in subsequent query criterion, whereas our usage of :func:`.joinedload` only concerns
270itself with the loading of the ``User.addresses`` collection, for each ``User`` in the result.
271In this case, the two joins most probably appear redundant - which they are. If we
272wanted to use just one JOIN for collection loading as well as ordering, we use the
273:func:`.contains_eager` option, described in :ref:`contains_eager` below. But
274to see why :func:`joinedload` does what it does, consider if we were **filtering** on a
275particular ``Address``:
276
277.. sourcecode:: python+sql
278
279 >>> jack = session.query(User).\
280 ... join(User.addresses).\
281 ... options(joinedload(User.addresses)).\
282 ... filter(User.name=='jack').\
283 ... filter(Address.email_address=='someaddress@foo.com').\
284 ... all()
285 {opensql}SELECT addresses_1.id AS addresses_1_id, addresses_1.email_address AS addresses_1_email_address,
286 addresses_1.user_id AS addresses_1_user_id, users.id AS users_id, users.name AS users_name,
287 users.fullname AS users_fullname, users.password AS users_password
288 FROM users JOIN addresses ON users.id = addresses.user_id
289 LEFT OUTER JOIN addresses AS addresses_1 ON users.id = addresses_1.user_id
290 WHERE users.name = ? AND addresses.email_address = ?
291 ['jack', 'someaddress@foo.com']
292
293Above, we can see that the two JOINs have very different roles. One will match exactly
294one row, that of the join of ``User`` and ``Address`` where ``Address.email_address=='someaddress@foo.com'``.
295The other LEFT OUTER JOIN will match *all* ``Address`` rows related to ``User``,
296and is only used to populate the ``User.addresses`` collection, for those ``User`` objects
297that are returned.
298
299By changing the usage of :func:`.joinedload` to another style of loading, we can change
300how the collection is loaded completely independently of SQL used to retrieve
301the actual ``User`` rows we want. Below we change :func:`.joinedload` into
302:func:`.subqueryload`:
303
304.. sourcecode:: python+sql
305
306 >>> jack = session.query(User).\
307 ... join(User.addresses).\
308 ... options(subqueryload(User.addresses)).\
309 ... filter(User.name=='jack').\
310 ... filter(Address.email_address=='someaddress@foo.com').\
311 ... all()
312 {opensql}SELECT users.id AS users_id, users.name AS users_name,
313 users.fullname AS users_fullname, users.password AS users_password
314 FROM users JOIN addresses ON users.id = addresses.user_id
315 WHERE users.name = ? AND addresses.email_address = ?
316 ['jack', 'someaddress@foo.com']
317
318 # ... subqueryload() emits a SELECT in order
319 # to load all address records ...
320
321When using joined eager loading, if the
322query contains a modifier that impacts the rows returned
323externally to the joins, such as when using DISTINCT, LIMIT, OFFSET
324or equivalent, the completed statement is first
325wrapped inside a subquery, and the joins used specifically for joined eager
326loading are applied to the subquery. SQLAlchemy's
327joined eager loading goes the extra mile, and then ten miles further, to
328absolutely ensure that it does not affect the end result of the query, only
329the way collections and related objects are loaded, no matter what the format of the query is.
330
331What Kind of Loading to Use ?
332-----------------------------
333
334Which type of loading to use typically comes down to optimizing the tradeoff
335between number of SQL executions, complexity of SQL emitted, and amount of
336data fetched. Lets take two examples, a :func:`~sqlalchemy.orm.relationship`
337which references a collection, and a :func:`~sqlalchemy.orm.relationship` that
338references a scalar many-to-one reference.
339
340* One to Many Collection
341
342 * When using the default lazy loading, if you load 100 objects, and then access a collection on each of
343 them, a total of 101 SQL statements will be emitted, although each statement will typically be a
344 simple SELECT without any joins.
345
346 * When using joined loading, the load of 100 objects and their collections will emit only one SQL
347 statement. However, the
348 total number of rows fetched will be equal to the sum of the size of all the collections, plus one
349 extra row for each parent object that has an empty collection. Each row will also contain the full
350 set of columns represented by the parents, repeated for each collection item - SQLAlchemy does not
351 re-fetch these columns other than those of the primary key, however most DBAPIs (with some
352 exceptions) will transmit the full data of each parent over the wire to the client connection in
353 any case. Therefore joined eager loading only makes sense when the size of the collections are
354 relatively small. The LEFT OUTER JOIN can also be performance intensive compared to an INNER join.
355
356 * When using subquery loading, the load of 100 objects will emit two SQL statements. The second
357 statement will fetch a total number of rows equal to the sum of the size of all collections. An
358 INNER JOIN is used, and a minimum of parent columns are requested, only the primary keys. So a
359 subquery load makes sense when the collections are larger.
360
361 * When multiple levels of depth are used with joined or subquery loading, loading collections-within-
362 collections will multiply the total number of rows fetched in a cartesian fashion. Both forms
363 of eager loading always join from the original parent class.
364
365* Many to One Reference
366
367 * When using the default lazy loading, a load of 100 objects will like in the case of the collection
368 emit as many as 101 SQL statements. However - there is a significant exception to this, in that
369 if the many-to-one reference is a simple foreign key reference to the target's primary key, each
370 reference will be checked first in the current identity map using :meth:`.Query.get`. So here,
371 if the collection of objects references a relatively small set of target objects, or the full set
372 of possible target objects have already been loaded into the session and are strongly referenced,
373 using the default of `lazy='select'` is by far the most efficient way to go.
374
375 * When using joined loading, the load of 100 objects will emit only one SQL statement. The join
376 will be a LEFT OUTER JOIN, and the total number of rows will be equal to 100 in all cases.
377 If you know that each parent definitely has a child (i.e. the foreign
378 key reference is NOT NULL), the joined load can be configured with ``innerjoin=True``, which is
379 usually specified within the :func:`~sqlalchemy.orm.relationship`. For a load of objects where
380 there are many possible target references which may have not been loaded already, joined loading
381 with an INNER JOIN is extremely efficient.
382
383 * Subquery loading will issue a second load for all the child objects, so for a load of 100 objects
384 there would be two SQL statements emitted. There's probably not much advantage here over
385 joined loading, however, except perhaps that subquery loading can use an INNER JOIN in all cases
386 whereas joined loading requires that the foreign key is NOT NULL.
387
388.. _joinedload_and_join:
389
390.. _contains_eager:
391
392Routing Explicit Joins/Statements into Eagerly Loaded Collections
393------------------------------------------------------------------
394
395The behavior of :func:`~sqlalchemy.orm.joinedload()` is such that joins are
396created automatically, using anonymous aliases as targets, the results of which
397are routed into collections and
398scalar references on loaded objects. It is often the case that a query already
399includes the necessary joins which represent a particular collection or scalar
400reference, and the joins added by the joinedload feature are redundant - yet
401you'd still like the collections/references to be populated.
402
403For this SQLAlchemy supplies the :func:`~sqlalchemy.orm.contains_eager()`
404option. This option is used in the same manner as the
405:func:`~sqlalchemy.orm.joinedload()` option except it is assumed that the
406:class:`~sqlalchemy.orm.query.Query` will specify the appropriate joins
407explicitly. Below it's used with a ``from_statement`` load::
408
409 # mapping is the users->addresses mapping
410 mapper(User, users_table, properties={
411 'addresses': relationship(Address, addresses_table)
412 })
413
414 # define a query on USERS with an outer join to ADDRESSES
415 statement = users_table.outerjoin(addresses_table).select().apply_labels()
416
417 # construct a Query object which expects the "addresses" results
418 query = session.query(User).options(contains_eager('addresses'))
419
420 # get results normally
421 r = query.from_statement(statement)
422
423It works just as well with an inline :meth:`.Query.join` or
424:meth:`.Query.outerjoin`::
425
426 session.query(User).outerjoin(User.addresses).options(contains_eager(User.addresses)).all()
427
428If the "eager" portion of the statement is "aliased", the ``alias`` keyword
429argument to :func:`~sqlalchemy.orm.contains_eager` may be used to indicate it.
430This is a string alias name or reference to an actual
431:class:`~sqlalchemy.sql.expression.Alias` (or other selectable) object:
432
433.. sourcecode:: python+sql
434
435 # use an alias of the Address entity
436 adalias = aliased(Address)
437
438 # construct a Query object which expects the "addresses" results
439 query = session.query(User).\
440 outerjoin(adalias, User.addresses).\
441 options(contains_eager(User.addresses, alias=adalias))
442
443 # get results normally
444 {sql}r = query.all()
445 SELECT users.user_id AS users_user_id, users.user_name AS users_user_name, adalias.address_id AS adalias_address_id,
446 adalias.user_id AS adalias_user_id, adalias.email_address AS adalias_email_address, (...other columns...)
447 FROM users LEFT OUTER JOIN email_addresses AS email_addresses_1 ON users.user_id = email_addresses_1.user_id
448
449The ``alias`` argument is used only as a source of columns to match up to the
450result set. You can use it to match up the result to arbitrary label
451names in a string SQL statement, by passing a :func:`.select` which links those
452labels to the mapped :class:`.Table`::
453
454 # label the columns of the addresses table
455 eager_columns = select([
456 addresses.c.address_id.label('a1'),
457 addresses.c.email_address.label('a2'),
458 addresses.c.user_id.label('a3')])
459
460 # select from a raw SQL statement which uses those label names for the
461 # addresses table. contains_eager() matches them up.
462 query = session.query(User).\
463 from_statement("select users.*, addresses.address_id as a1, "
464 "addresses.email_address as a2, addresses.user_id as a3 "
465 "from users left outer join addresses on users.user_id=addresses.user_id").\
466 options(contains_eager(User.addresses, alias=eager_columns))
467
468The path given as the argument to :func:`.contains_eager` needs
469to be a full path from the starting entity. For example if we were loading
470``Users->orders->Order->items->Item``, the string version would look like::
471
472 query(User).options(contains_eager('orders', 'items'))
473
474Or using the class-bound descriptor::
475
476 query(User).options(contains_eager(User.orders, Order.items))
477
478
479Relation Loader API
480--------------------
481
482.. autofunction:: contains_alias
483
484.. autofunction:: contains_eager
485
486.. autofunction:: eagerload
487
488.. autofunction:: eagerload_all
489
490.. autofunction:: immediateload
491
492.. autofunction:: joinedload
493
494.. autofunction:: joinedload_all
495
496.. autofunction:: lazyload
497
498.. autofunction:: noload
499
500.. autofunction:: subqueryload
501
502.. autofunction:: subqueryload_all