/Doc/library/pickle.rst

:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>.
.. sectionauthor:: Barry Warsaw <barry@zope.com>


The :mod:`pickle` module implements a fundamental but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling," [#]_ or "flattening"; to avoid confusion, the
terms used here are "pickling" and "unpickling".

This documentation describes both the :mod:`pickle` module and the
:mod:`cPickle` module.

Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has an optimized cousin called the :mod:`cPickle`
module. As its name implies, :mod:`cPickle` is written in C, so it can be up to
1000 times faster than :mod:`pickle`. However, it does not support subclassing
of the :func:`Pickler` and :func:`Unpickler` classes, because in :mod:`cPickle`
these are functions, not classes. Most applications have no need for this
functionality, and can benefit from the improved performance of :mod:`cPickle`.
Other than that, the interfaces of the two modules are nearly identical; the
common interface is described in this manual and differences are pointed out
where necessary. In the following discussions, we use the term "pickle" to
collectively describe the :mod:`pickle` and :mod:`cPickle` modules.

The data streams the two modules produce are guaranteed to be interchangeable.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized. :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy. Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module
  as when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions. Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.
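
The handling of shared and recursive objects described in the first point can be observed directly. The following is a minimal sketch rather than part of the reference text: the variable names are illustrative, and it is written to run on a current Python 3 interpreter, where pickled data is a ``bytes`` object.

```python
import pickle

# One mutable list referenced from two places in the same structure.
shared = [1, 2, 3]
container = {'first': shared, 'second': shared}

# A recursive object: a list that contains a reference to itself.
recursive = [0]
recursive.append(recursive)

restored = pickle.loads(pickle.dumps(container, 2))
restored_recursive = pickle.loads(pickle.dumps(recursive, 2))

# The sharing survives the round trip: both keys still refer to one list,
# so a change made through one reference is visible through the other.
restored['first'].append(4)
print(restored['second'])
print(restored['first'] is restored['second'])
print(restored_recursive[1] is restored_recursive)
```

If the two keys had been restored as independent copies, the appended item would appear under only one of them.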

.. warning::

   The :mod:`pickle` module is not intended to be secure against erroneous or
   maliciously constructed data. Never unpickle data received from an untrusted
   or unauthenticated source.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.

Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a printable ASCII representation.
This is slightly more voluminous than a binary representation. The big
advantage of using printable ASCII (and of some other characteristics of
:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
possible for a human to read the pickled file with a standard text editor.

There are currently 3 different protocols which can be used for pickling.

* Protocol version 0 is the original ASCII protocol and is backwards compatible
  with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es.

Refer to :pep:`307` for more information.

If a *protocol* is not specified, protocol 0 is used. If *protocol* is specified
as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol version
available will be used.

.. versionchanged:: 2.3
   Introduced the *protocol* parameter.

A binary format, which is slightly more efficient, can be chosen by specifying a
*protocol* version >= 1.
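
The effect of the *protocol* argument can be sketched as follows. This is an illustrative example only: the ``record`` data is made up, and the protocol is always passed explicitly because newer Python versions use a different default than the protocol 0 default described here. It is written for a current Python 3 interpreter.

```python
import pickle

record = {'name': 'spam', 'counts': [1, 2, 3]}

# Protocol 0 is the ASCII format; protocol 2 is a binary format.
as_text = pickle.dumps(record, 0)
as_binary = pickle.dumps(record, 2)

# A negative value and HIGHEST_PROTOCOL both select the newest protocol.
newest = pickle.dumps(record, -1)
also_newest = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)

# Every byte of the protocol 0 stream is in the 7-bit ASCII range.
is_ascii = all(byte < 128 for byte in bytearray(as_text))

# No protocol argument is needed when loading; the format is detected.
results = [pickle.loads(data) for data in (as_text, as_binary, newest)]
print(is_ascii, all(result == record for result in results))
```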

Usage
-----

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constant:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

   .. versionadded:: 2.3

.. note::

   Be sure to always open pickle files created with protocols >= 1 in binary mode.
   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
   mode as long as you stay consistent.

   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
   as line terminators and therefore will look "funny" when viewed in Notepad or
   other editors which do not support this format.

The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:


.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*. This is
   equivalent to ``Pickler(file, protocol).dump(obj)``.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be a file object opened for writing, a :mod:`StringIO` object, or
   any other custom object that meets this interface.


.. function:: load(file)

   Read a string from the open file object *file* and interpret it as a pickle data
   stream, reconstructing and returning the original object hierarchy. This is
   equivalent to ``Unpickler(file).load()``.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   This function automatically determines whether the data stream was written in
   binary mode or not.


.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a string, instead of writing
   it to a file.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      The *protocol* parameter was added.


.. function:: loads(string)

   Read a pickled object hierarchy from a string. Characters in the string past
   the pickled object's representation are ignored.
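
The file interface required by :func:`dump` and :func:`load` is small enough to implement by hand. The sketch below assumes two hypothetical classes, ``ChunkWriter`` and ``ChunkReader``, that provide only the methods named above. It targets a current Python 3 interpreter, where the "strings" exchanged with the file object are ``bytes``.

```python
import pickle

class ChunkWriter(object):
    """Hypothetical write-only target: collects whatever the pickler writes."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

class ChunkReader(object):
    """Hypothetical read-only source offering just read() and readline()."""
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self, size):
        chunk = self.data[self.pos:self.pos + size]
        self.pos += len(chunk)
        return chunk

    def readline(self):
        end = self.data.find(b'\n', self.pos)
        end = len(self.data) if end < 0 else end + 1
        chunk = self.data[self.pos:end]
        self.pos = end
        return chunk

original = {'key': ('value', 42)}

writer = ChunkWriter()
pickle.dump(original, writer, 0)
stream = b''.join(writer.chunks)

from_file = pickle.load(ChunkReader(stream))

# loads() stops at the end of the first pickle and ignores what follows.
from_string = pickle.loads(stream + b'trailing bytes')
print(from_file == original, from_string == original)
```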

The :mod:`pickle` module also defines three exceptions:


.. exception:: PickleError

   A common base class for the other exceptions defined below. This inherits from
   :exc:`Exception`.


.. exception:: PicklingError

   This exception is raised when an unpicklable object is passed to the
   :meth:`dump` method.


.. exception:: UnpicklingError

   This exception is raised when there is a problem unpickling an object. Note that
   other exceptions may also be raised during unpickling, including (but not
   necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.
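
Because unpickling can fail with exceptions other than :exc:`UnpicklingError`, code that reads possibly damaged data usually handles more than one exception type. A minimal sketch, assuming a stream that has simply been cut short (written for a current Python 3 interpreter):

```python
import pickle

good = pickle.dumps(['spam', 'eggs'], 2)
damaged = good[:-1]          # drop the final byte of the stream

try:
    pickle.loads(damaged)
    outcome = 'loaded'
except (pickle.UnpicklingError, EOFError) as exc:
    # Which of the two is raised depends on the implementation and on
    # where the stream ends, so both are handled.
    outcome = 'rejected: %s' % type(exc).__name__

print(outcome)
```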

The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
:class:`Unpickler`:


.. class:: Pickler(file[, protocol])

   This takes a file-like object to which it will write a pickle data stream.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be an open file object, a :mod:`StringIO` object, or any other
   custom object that meets this interface.

   :class:`Pickler` objects define one (or two) public methods:


   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in the
      constructor. Either the binary or ASCII format will be used, depending on the
      value of the *protocol* argument passed to the constructor.


   .. method:: clear_memo()

      Clears the pickler's "memo". The memo is the data structure that remembers
      which objects the pickler has already seen, so that shared or recursive
      objects are pickled by reference and not by value. This method is useful
      when re-using picklers.

      .. note::

         Prior to Python 2.3, :meth:`clear_memo` was only available on the picklers
         created by :mod:`cPickle`. In the :mod:`pickle` module, picklers have an
         instance variable called :attr:`memo` which is a Python dictionary. So to
         clear the memo for a :mod:`pickle` module pickler, you could do the
         following::

            mypickler.memo.clear()

         Code that does not need to support older versions of Python should simply
         use :meth:`clear_memo`.

   It is possible to make multiple calls to the :meth:`dump` method of the same
   :class:`Pickler` instance. These must then be matched to the same number of
   calls to the :meth:`load` method of the corresponding :class:`Unpickler`
   instance. If the same object is pickled by multiple :meth:`dump` calls, the
   :meth:`load` calls will all yield references to the same object. [#]_
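
The interaction between repeated :meth:`dump` calls, the memo, and :meth:`clear_memo` can be sketched as below. The example is illustrative only: it uses an in-memory ``io.BytesIO`` buffer in place of a real file and is written for a current Python 3 interpreter.

```python
import io
import pickle

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer, 2)

basket = ['spam', 'eggs']
pickler.dump(basket)
pickler.dump(basket)      # already in the memo: written as a back-reference
pickler.clear_memo()
pickler.dump(basket)      # memo was cleared: written out in full again

# Three dump() calls are matched by three load() calls on one unpickler.
buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()
third = unpickler.load()

print(first is second)    # True: both loads refer to one restored list
print(third is first)     # False: a separate list with equal contents
print(third == first)     # True
```

Reading the second pickle with a fresh unpickler would not work, since the back-reference only has meaning relative to the memo built up by the first :meth:`load`.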

:class:`Unpickler` objects are defined as:


.. class:: Unpickler(file)

   This takes a file-like object from which it will read a pickle data stream.
   This class automatically determines whether the data stream was written in
   binary mode or not, so it does not need a flag as in the :class:`Pickler`
   factory.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   :class:`Unpickler` objects have one (or two) public methods:


   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein.

      This method automatically determines whether the data stream was written
      in binary mode or not.


   .. method:: noload()

      This is just like :meth:`load` except that it doesn't actually create any
      objects. This is useful primarily for finding what's called "persistent
      ids" that may be referenced in a pickle data stream. See section
      :ref:`pickle-protocol` below for more details.

      **Note:** the :meth:`noload` method is currently only available on
      :class:`Unpickler` objects created with the :mod:`cPickle` module.
      :mod:`pickle` module :class:`Unpickler`\ s do not have the :meth:`noload`
      method.

What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, long integers, floating point numbers, complex numbers

* normal and Unicode strings

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`__dict__` or the result of calling
  :meth:`__getstate__` is picklable (see section :ref:`pickle-protocol` for
  details)

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in. Neither
the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   class Foo:
       attr = 'a class attr'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.
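
Pickling by name reference is easy to observe with a function from the standard library. The sketch below uses ``json.dumps`` purely as a convenient module-level function; it is an illustration written for a current Python 3 interpreter, not part of the reference text.

```python
import json
import pickle

# Pickling a module-level function stores only where to find it.
data = pickle.dumps(json.dumps, 0)
print(repr(data))

# The stream names the module and the function; no code is included.
names_only = b'json' in data and b'dumps' in data

# Unpickling imports the module and looks the name up again, so the
# result is the very same function object, not a copy of it.
restored = pickle.loads(data)
print(names_only, restored is json.dumps)
```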

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.


.. _pickle-protocol:

The pickle protocol
-------------------

.. currentmodule:: None

This section describes the "pickling protocol" that defines the interface
between the pickler/unpickler and the objects that are being serialized. This
protocol provides a standard way for you to define, customize, and control how
your objects are serialized and de-serialized. The description in this section
doesn't cover specific customizations that you can employ to make the unpickling
environment slightly safer from untrusted pickle data streams; see section
:ref:`pickle-sub` for more details.


.. _pickle-inst:

Pickling and unpickling normal class instances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. method:: object.__getinitargs__()

   When a pickled class instance is unpickled, its :meth:`__init__` method is
   normally *not* invoked. If it is desirable that the :meth:`__init__` method
   be called on unpickling, an old-style class can define a method
   :meth:`__getinitargs__`, which should return a *tuple* containing the
   arguments to be passed to the class constructor (:meth:`__init__` for
   example). The :meth:`__getinitargs__` method is called at pickle time; the
   tuple it returns is incorporated in the pickle for the instance.

.. method:: object.__getnewargs__()

   New-style types can provide a :meth:`__getnewargs__` method that is used for
   protocol 2. Implementing this method is needed if the type establishes some
   internal invariants when the instance is created, or if the memory allocation
   is affected by the values passed to the :meth:`__new__` method for the type
   (as it is for tuples and strings). Instances of a :term:`new-style class`
   ``C`` are created using ::

      obj = C.__new__(C, *args)

   where *args* is the result of calling :meth:`__getnewargs__` on the original
   object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.
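
A tuple subclass is a typical case where :meth:`__getnewargs__` is needed, because the contents of a tuple are fixed when :meth:`__new__` runs. The ``Point`` class below is a hypothetical example; the sketch assumes it is defined at the top level of an importable module or script, and it is written for a current Python 3 interpreter.

```python
import pickle

class Point(tuple):
    """Hypothetical immutable pair; its contents are fixed in __new__."""
    def __new__(cls, x, y):
        return tuple.__new__(cls, (x, y))

    def __getnewargs__(self):
        # Arguments handed back to Point.__new__ when unpickling.
        return (self[0], self[1])

original = Point(3, 4)
restored = pickle.loads(pickle.dumps(original, 2))
print(type(restored) is Point, restored == original)
```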

.. method:: object.__getstate__()

   Classes can further influence how their instances are pickled; if the class
   defines the method :meth:`__getstate__`, it is called and the return state is
   pickled as the contents for the instance, instead of the contents of the
   instance's dictionary. If there is no :meth:`__getstate__` method, the
   instance's :attr:`__dict__` is pickled.

.. method:: object.__setstate__(state)

   Upon unpickling, if the class also defines the method :meth:`__setstate__`,
   it is called with the unpickled state. [#]_ If there is no
   :meth:`__setstate__` method, the pickled state must be a dictionary and its
   items are assigned to the new instance's dictionary. If a class defines both
   :meth:`__getstate__` and :meth:`__setstate__`, the state object needn't be a
   dictionary and these methods can do what they want. [#]_

   .. note::

      For :term:`new-style class`\es, if :meth:`__getstate__` returns a false
      value, the :meth:`__setstate__` method will not be called.

.. note::

   At unpickling time, some methods like :meth:`__getattr__`,
   :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
   instance. In case those methods rely on some internal invariant being
   true, the type should implement either :meth:`__getinitargs__` or
   :meth:`__getnewargs__` to establish such an invariant; otherwise, neither
   :meth:`__new__` nor :meth:`__init__` will be called.
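
The following sketch combines :meth:`__getstate__` and :meth:`__setstate__` with a version number stored in the pickled state. The ``Account`` class, its attribute names, and the assumed older layout (an ``amount`` key instead of ``balance``) are all hypothetical, and the code is written for a current Python 3 interpreter.

```python
import pickle

class Account(object):
    """Hypothetical class used only to illustrate state versioning."""
    init_calls = 0

    def __init__(self, owner, balance):
        Account.init_calls += 1
        self.owner = owner
        self.balance = balance

    def __getstate__(self):
        # Pickle a copy of the instance dictionary plus a version tag.
        state = self.__dict__.copy()
        state['version'] = 2
        return state

    def __setstate__(self, state):
        # Suppose version 1 objects stored the amount under 'amount'.
        if state.pop('version', 1) < 2:
            state['balance'] = state.pop('amount')
        self.__dict__.update(state)

account = Account('guido', 100)
restored = pickle.loads(pickle.dumps(account, 2))
print(restored.owner, restored.balance, Account.init_calls)

# State saved by the older layout is converted while loading.
old = Account.__new__(Account)
old.__setstate__({'owner': 'barry', 'amount': 5})
print(old.balance)
```

The ``init_calls`` counter stays at one after the round trip, which shows that :meth:`__init__` was not invoked when the instance was unpickled.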

Pickling and unpickling extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. method:: object.__reduce__()

   When the :class:`Pickler` encounters an object of a type it knows nothing
   about --- such as an extension type --- it looks in two places for a hint of
   how to pickle it. One alternative is for the object to implement a
   :meth:`__reduce__` method. If provided, at pickling time :meth:`__reduce__`
   will be called with no arguments, and it must return either a string or a
   tuple.

   If a string is returned, it names a global variable whose contents are
   pickled as normal. The string returned by :meth:`__reduce__` should be the
   object's local name relative to its module; the pickle module searches the
   module namespace to determine the object's module.

   When a tuple is returned, it must be between two and five elements long.
   Optional elements can either be omitted, or ``None`` can be provided as their
   value. The contents of this tuple are pickled as normal and used to
   reconstruct the object at unpickling time. The semantics of each element
   are:

   * A callable object that will be called to create the initial version of the
     object. The next element of the tuple will provide arguments for this
     callable, and later elements provide additional state information that will
     subsequently be used to fully reconstruct the pickled data.

     In the unpickling environment this object must be either a class, a
     callable registered as a "safe constructor" (see below), or it must have an
     attribute :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
     :exc:`UnpicklingError` will be raised in the unpickling environment. Note
     that as usual, the callable itself is pickled by name.

   * A tuple of arguments for the callable object.

     .. versionchanged:: 2.5
        Formerly, this argument could also be ``None``.

   * Optionally, the object's state, which will be passed to the object's
     :meth:`__setstate__` method as described in section :ref:`pickle-inst`. If
     the object has no :meth:`__setstate__` method, then, as above, the value
     must be a dictionary and it will be added to the object's :attr:`__dict__`.

   * Optionally, an iterator (and not a sequence) yielding successive list
     items. These list items will be pickled, and appended to the object using
     either ``obj.append(item)`` or ``obj.extend(list_of_items)``. This is
     primarily used for list subclasses, but may be used by other classes as
     long as they have :meth:`append` and :meth:`extend` methods with the
     appropriate signature. (Whether :meth:`append` or :meth:`extend` is used
     depends on which pickle protocol version is used as well as the number of
     items to append, so both must be supported.)

   * Optionally, an iterator (not a sequence) yielding successive dictionary
     items, which should be tuples of the form ``(key, value)``. These items
     will be pickled and stored to the object using ``obj[key] = value``. This
     is primarily used for dictionary subclasses, but may be used by other
     classes as long as they implement :meth:`__setitem__`.

.. method:: object.__reduce_ex__(protocol)

   It is sometimes useful to know the protocol version when implementing
   :meth:`__reduce__`. This can be done by implementing a method named
   :meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`,
   when it exists, is called in preference over :meth:`__reduce__` (you may
   still provide :meth:`__reduce__` for backwards compatibility). The
   :meth:`__reduce_ex__` method will be called with a single integer argument,
   the protocol version.

   The :class:`object` class implements both :meth:`__reduce__` and
   :meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__`
   but not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation
   detects this and calls :meth:`__reduce__`.

An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled is to register the callable with the :mod:`copy_reg` module. This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types. Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.
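
A three-element :meth:`__reduce__` tuple can be sketched as follows. The ``Interval`` class is hypothetical and exists only for illustration; the example uses the class itself as the callable and a dictionary as the optional state, and it is written for a current Python 3 interpreter.

```python
import pickle

class Interval(object):
    """Hypothetical class that requires both bounds at construction time."""
    def __init__(self, low, high):
        if low > high:
            raise ValueError('low must not exceed high')
        self.low = low
        self.high = high
        self.label = None

    def __reduce__(self):
        # (callable, arguments for the callable, optional state)
        return (Interval, (self.low, self.high), {'label': self.label})

span = Interval(1, 5)
span.label = 'working hours'

restored = pickle.loads(pickle.dumps(span, 2))
print(restored.low, restored.high, restored.label)
```

Because the callable is the class, unpickling goes through ``Interval(1, 5)`` and therefore re-runs the bounds check; the state dictionary is applied afterwards.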

Pickling and unpickling external objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a "persistent id", which is just an arbitrary string
of printable ASCII characters. The resolution of such names is not defined by
the :mod:`pickle` module; it will delegate this resolution to user defined
functions on the pickler and unpickler. [#]_

To define external persistent id resolution, you need to set the
:attr:`persistent_id` attribute of the pickler object and the
:attr:`persistent_load` attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler must have a
custom :func:`persistent_id` method that takes an object as an argument and
returns either ``None`` or the persistent id for that object. When ``None`` is
returned, the pickler simply pickles the object as normal. When a persistent id
string is returned, the pickler will pickle that string, along with a marker so
that the unpickler will recognize the string as a persistent id.

To unpickle external objects, the unpickler must have a custom
:func:`persistent_load` function that takes a persistent id string and returns
the referenced object.

Here's a silly example that *might* shed more light::

   import pickle
   from cStringIO import StringIO

   src = StringIO()
   p = pickle.Pickler(src)

   def persistent_id(obj):
       if hasattr(obj, 'x'):
           return 'the value %d' % obj.x
       else:
           return None

   p.persistent_id = persistent_id

   class Integer:
       def __init__(self, x):
           self.x = x
       def __str__(self):
           return 'My name is integer %d' % self.x

   i = Integer(7)
   print i
   p.dump(i)

   datastream = src.getvalue()
   print repr(datastream)
   dst = StringIO(datastream)

   up = pickle.Unpickler(dst)

   class FancyInteger(Integer):
       def __str__(self):
           return 'I am the integer %d' % self.x

   def persistent_load(persid):
       if persid.startswith('the value '):
           value = int(persid.split()[2])
           return FancyInteger(value)
       else:
           raise pickle.UnpicklingError('Invalid persistent id')

   up.persistent_load = persistent_load

   j = up.load()
   print j

In the :mod:`cPickle` module, the unpickler's :attr:`persistent_load` attribute
can also be set to a Python list, in which case, when the unpickler reaches a
persistent id, the persistent id string will simply be appended to this list.
This functionality exists so that a pickle data stream can be "sniffed" for
object references without actually instantiating all the objects in a pickle.
[#]_ Setting :attr:`persistent_load` to a list is usually used in conjunction
with the :meth:`noload` method on the Unpickler.

.. BAW: Both pickle and cPickle support something called inst_persistent_id()
   which appears to give unknown types a second shot at producing a persistent
   id. Since Jim Fulton can't remember why it was added or what it's for, I'm
   leaving it undocumented.


.. _pickle-sub:

Subclassing Unpicklers
----------------------

.. index::
   single: load_global() (pickle protocol)
   single: find_global() (pickle protocol)

By default, unpickling will import any class that it finds in the pickle data.
You can control exactly what gets unpickled and what gets called by customizing
your unpickler. Unfortunately, exactly how you do this is different depending
on whether you're using :mod:`pickle` or :mod:`cPickle`. [#]_

In the :mod:`pickle` module, you need to derive a subclass from
:class:`Unpickler`, overriding the :meth:`load_global` method.
:meth:`load_global` should read two lines from the pickle data stream where the
first line will be the name of the module containing the class and the second
line will be the name of the instance's class. It then looks up the class,
possibly importing the module and digging out the attribute, then it appends
what it finds to the unpickler's stack. Later on, this class will be assigned
to the :attr:`__class__` attribute of an empty class, as a way of magically
creating an instance without calling its class's :meth:`__init__`. Your job
(should you choose to accept it) would be to have :meth:`load_global` push onto
the unpickler's stack a known safe version of any class you deem safe to
unpickle. It is up to you to produce such a class. Or you could raise an error
if you want to disallow all unpickling of instances. If this sounds like a
hack, you're right. Refer to the source code to make this work.

Things are a little cleaner with :mod:`cPickle`, but not by much. To control
what gets unpickled, you can set the unpickler's :attr:`find_global` attribute
to a function or ``None``. If it is ``None`` then any attempts to unpickle
instances will raise an :exc:`UnpicklingError`. If it is a function, then it
should accept a module name and a class name, and return the corresponding class
object. It is responsible for looking up the class and performing any necessary
imports, and it may raise an error to prevent instances of the class from being
unpickled.

The moral of the story is that you should be really careful about the source of
the strings your application unpickles.
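
The idea behind the :attr:`find_global` hook, a function that receives a module name and a class name and either returns the class or raises an error, can be sketched with an allow-list. Note that the example below substitutes a different mechanism: it overrides the ``find_class`` method that Python 3's ``pickle.Unpickler`` provides for the same purpose, not the :meth:`load_global` or :attr:`find_global` hooks of the modules documented here. The ``ALLOWED`` set and the names ``RestrictedUnpickler`` and ``restricted_loads`` are hypothetical.

```python
import collections
import io
import json
import pickle

# Hypothetical allow-list of (module name, global name) pairs.
ALLOWED = {('collections', 'OrderedDict')}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called with the module and global name found in the stream.
        if (module, name) not in ALLOWED:
            raise pickle.UnpicklingError(
                'global %s.%s is not allowed' % (module, name))
        return pickle.Unpickler.find_class(self, module, name)

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

allowed = restricted_loads(pickle.dumps(collections.OrderedDict(a=1), 2))

try:
    restricted_loads(pickle.dumps(json.dumps, 2))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(allowed, blocked)
```

An allow-list narrows what a pickle can import, but it does not make untrusted data safe to load; the warning at the top of this document still applies.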
  496. .. _pickle-example:
  497. Example
  498. -------
For the simplest code, use the :func:`dump` and :func:`load` functions. Note
that a self-referencing list is pickled and restored correctly. ::

   import pickle

   data1 = {'a': [1, 2.0, 3, 4+6j],
            'b': ('string', u'Unicode string'),
            'c': None}

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)

   output = open('data.pkl', 'wb')

   # Pickle dictionary using protocol 0.
   pickle.dump(data1, output)

   # Pickle the list using the highest protocol available.
   pickle.dump(selfref_list, output, -1)

   output.close()

The following example reads the resulting pickled data. When reading a
pickle-containing file, you should open the file in binary mode because you
can't be sure if the ASCII or binary format was used. ::

   import pprint, pickle

   pkl_file = open('data.pkl', 'rb')

   data1 = pickle.load(pkl_file)
   pprint.pprint(data1)

   data2 = pickle.load(pkl_file)
   pprint.pprint(data2)

   pkl_file.close()
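
The same round trip can be sketched without touching the file system. The
example below is an illustration only: it assumes an in-memory ``io.BytesIO``
buffer in place of ``data.pkl``, and ``load_all`` is a hypothetical helper, not
part of :mod:`pickle`. It relies on :func:`load` raising :exc:`EOFError` once
the stream is exhausted.

```python
import pickle
from io import BytesIO


def load_all(fileobj):
    """Yield each pickled object in fileobj until the stream is exhausted."""
    while True:
        try:
            yield pickle.load(fileobj)
        except EOFError:
            return


selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)

buf = BytesIO()                       # stands in for a file opened in 'wb' mode
pickle.dump({'c': None}, buf)         # default protocol
pickle.dump(selfref_list, buf, -1)    # highest protocol available
buf.seek(0)                           # rewind before reading back

objects = list(load_all(buf))
restored = objects[1]
# The restored list contains itself, just like the original.
print(restored[3] is restored)
```

Each call to :func:`load` consumes exactly one pickle from the stream, so
objects come back in the order in which they were dumped.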

Here's a larger example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   #!/usr/local/bin/python

   class TextReader:
       """Print and number lines in a text file."""
       def __init__(self, file):
           self.file = file
           self.fh = open(file)
           self.lineno = 0

       def readline(self):
           self.lineno = self.lineno + 1
           line = self.fh.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%d: %s" % (self.lineno, line)

       def __getstate__(self):
           odict = self.__dict__.copy() # copy the dict since we change it
           del odict['fh']              # remove filehandle entry
           return odict

       def __setstate__(self, dict):
           fh = open(dict['file'])      # reopen file
           count = dict['lineno']       # read from file...
           while count:                 # until line count is restored
               fh.readline()
               count = count - 1
           self.__dict__.update(dict)   # update attributes
           self.fh = fh                 # save the file object

A sample usage might be something like this::

   >>> import TextReader
   >>> obj = TextReader.TextReader("TextReader.py")
   >>> obj.readline()
   '1: #!/usr/local/bin/python'
   >>> obj.readline()
   '2: '
   >>> obj.readline()
   '3: class TextReader:'
   >>> import pickle
   >>> pickle.dump(obj, open('save.p', 'wb'))

If you want to see that :mod:`pickle` works across Python processes, start
another Python session before continuing. What follows can happen from either
the same process or a new process. ::

   >>> import pickle
   >>> reader = pickle.load(open('save.p', 'rb'))
   >>> reader.readline()
   '4: """Print and number lines in a text file."""'
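
For readers who want to try the same idea as a single script, here is a rough,
self-contained variant. It is a sketch under stated assumptions rather than the
example above verbatim: ``LineReader`` is a hypothetical stand-in for
:class:`TextReader`, the input is a temporary file created on the spot, and the
line counter only advances when a line was actually read.

```python
import os
import pickle
import tempfile


class LineReader(object):
    """Hypothetical stand-in for TextReader, defined inline for this demo."""

    def __init__(self, path):
        self.path = path
        self.fh = open(path)
        self.lineno = 0

    def readline(self):
        line = self.fh.readline()
        if not line:
            return None
        self.lineno += 1
        return "%d: %s" % (self.lineno, line.rstrip("\n"))

    def __getstate__(self):
        state = self.__dict__.copy()   # copy, since we are about to change it
        del state['fh']                # file objects cannot be pickled
        return state

    def __setstate__(self, state):
        fh = open(state['path'])       # reopen the file
        for _ in range(state['lineno']):
            fh.readline()              # skip the lines already consumed
        self.__dict__.update(state)
        self.fh = fh


# Create a small temporary input file (an assumption made for the demo).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write("first\nsecond\nthird\n")

reader = LineReader(path)
first_line = reader.readline()

# Pickle to a string and restore; the copy resumes where the original stopped.
clone = pickle.loads(pickle.dumps(reader))
next_line = clone.readline()

reader.fh.close()
clone.fh.close()
os.remove(path)
```

Because the restored object reopens the file by name, the file must still exist
and be unchanged when the pickle is loaded.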


.. seealso::

   Module :mod:`copy_reg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


:mod:`cPickle` --- A faster :mod:`pickle`
=========================================

.. module:: cPickle
   :synopsis: Faster version of pickle, but not subclassable.
.. moduleauthor:: Jim Fulton <jim@zope.com>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. index:: module: pickle

The :mod:`cPickle` module supports serialization and de-serialization of Python
objects, providing an interface and functionality nearly identical to the
:mod:`pickle` module. There are several differences, the most important being
performance and subclassability.

First, :mod:`cPickle` can be up to 1000 times faster than :mod:`pickle` because
the former is implemented in C. Second, in the :mod:`cPickle` module the
callables :func:`Pickler` and :func:`Unpickler` are functions, not classes.
This means that you cannot use them to derive custom pickling and unpickling
subclasses. Most applications have no need for this functionality and should
benefit from the greatly improved performance of the :mod:`cPickle` module.

The pickle data streams produced by :mod:`pickle` and :mod:`cPickle` are
identical, so it is possible to use :mod:`pickle` and :mod:`cPickle`
interchangeably with existing pickles. [#]_

There are additional minor differences in API between :mod:`cPickle` and
:mod:`pickle`; however, for most applications they are interchangeable. More
documentation is provided in the :mod:`pickle` module documentation, which
includes a list of the documented differences.

.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] In the :mod:`pickle` module these callables are classes, which you could
   subclass to customize the behavior. However, in the :mod:`cPickle` module these
   callables are factory functions and so cannot be subclassed. One common reason
   to subclass is to control what objects can actually be unpickled. See section
   :ref:`pickle-sub` for more details.

.. [#] *Warning*: this is intended for pickling multiple objects without intervening
   modifications to the objects or their parts. If you modify an object and then
   pickle it again using the same :class:`Pickler` instance, the object is not
   pickled again --- a reference to it is pickled and the :class:`Unpickler` will
   return the old value, not the modified one. There are two problems here: (1)
   detecting changes, and (2) marshalling a minimal set of changes. Garbage
   collection may also become a problem here.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError`, but it could be something else.

.. [#] These methods can also be used to implement copying class instances.

.. [#] This protocol is also used by the shallow and deep copying operations defined in
   the :mod:`copy` module.

.. [#] The actual mechanism for associating these user-defined functions is slightly
   different for :mod:`pickle` and :mod:`cPickle`. The description given here
   works the same for both implementations. Users of the :mod:`pickle` module
   could also use subclassing to effect the same results, overriding the
   :meth:`persistent_id` and :meth:`persistent_load` methods in the derived
   classes.

.. [#] We'll leave you with the image of Guido and Jim sitting around sniffing pickles
   in their living rooms.

.. [#] A word of caution: the mechanisms described here use internal attributes and
   methods, which are subject to change in future versions of Python. We intend to
   someday provide a common interface for controlling this behavior, which will
   work in either :mod:`pickle` or :mod:`cPickle`.

.. [#] Since the pickle data format is actually a tiny stack-oriented programming
   language, and some freedom is taken in the encodings of certain objects, it is
   possible that the two modules produce different data streams for the same input
   objects. However, it is guaranteed that they will always be able to read each
   other's data streams.