:mod:`email`: Parsing email messages
------------------------------------

.. module:: email.parser
   :synopsis: Parse flat text email messages to produce a message object structure.


Message object structures can be created in one of two ways: they can be created
from whole cloth by instantiating :class:`~email.message.Message` objects and
stringing them together via :meth:`attach` and :meth:`set_payload` calls, or they
can be created by parsing a flat text representation of the email message.

The :mod:`email` package provides a standard parser that understands most email
document structures, including MIME documents. You can pass the parser a string
or a file object, and the parser will return to you the root
:class:`~email.message.Message` instance of the object structure. For simple,
non-MIME messages the payload of this root object will likely be a string
containing the text of the message. For MIME messages, the root object will
return ``True`` from its :meth:`is_multipart` method, and the subparts can be
accessed via the :meth:`get_payload` and :meth:`walk` methods.

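
As a rough illustration of the paragraph above, the following sketch parses a
small MIME message from a string and inspects the resulting tree. The message
text, the boundary string, and the variable names are invented for this example
and are not part of the :mod:`email` package.

```python
import email

# Hypothetical two-part MIME message, invented for this illustration.
raw_text = (
    "MIME-Version: 1.0\n"
    "Subject: Greetings\n"
    'Content-Type: multipart/mixed; boundary="BOUNDARY"\n'
    "\n"
    "--BOUNDARY\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello in plain text.\n"
    "--BOUNDARY\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hello in HTML.</p>\n"
    "--BOUNDARY--\n"
)

# Parsing returns the root Message object of the structure.
root = email.message_from_string(raw_text)

# A MIME multipart root reports True here; its payload is a list of subparts.
root_is_multipart = root.is_multipart()
subparts = root.get_payload()

# walk() visits the root first and then each subpart in turn.
content_types = [part.get_content_type() for part in root.walk()]
print(root_is_multipart, len(subparts), content_types)
```

For a message without MIME structure, the same call would be expected to yield
a root object whose payload is a plain string.
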

There are actually two parser interfaces available for use, the classic
:class:`Parser` API and the incremental :class:`FeedParser` API. The classic
:class:`Parser` API is fine if you have the entire text of the message in memory
as a string, or if the entire message lives in a file on the file system.
:class:`FeedParser` is more appropriate for when you're reading the message from
a stream which might block waiting for more input (e.g. reading an email message
from a socket). The :class:`FeedParser` can consume and parse the message
incrementally, and only returns the root object when you close the parser [#]_.

Note that the parser can be extended in limited ways, and of course you can
implement your own parser completely from scratch. There is no magical
connection between the :mod:`email` package's bundled parser and the
:class:`~email.message.Message` class, so your custom parser can create message
object trees any way it finds necessary.


FeedParser API
^^^^^^^^^^^^^^

.. versionadded:: 2.4

The :class:`FeedParser`, imported from the :mod:`email.feedparser` module,
provides an API that is conducive to incremental parsing of email messages, such
as would be necessary when reading the text of an email message from a source
that can block (e.g. a socket). The :class:`FeedParser` can of course be used
to parse an email message fully contained in a string or a file, but the classic
:class:`Parser` API may be more convenient for such use cases. The semantics
and results of the two parser APIs are identical.

The :class:`FeedParser`'s API is simple; you create an instance, feed it a bunch
of text until there's no more to feed it, then close the parser to retrieve the
root message object. The :class:`FeedParser` is extremely accurate when parsing
standards-compliant messages, and it does a very good job of parsing
non-compliant messages, providing information about how a message was deemed
broken. It will populate a message object's *defects* attribute with a list of
any problems it found in a message. See the :mod:`email.errors` module for the
list of defects that it can find.

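
The *defects* attribute can be inspected as in the following minimal sketch.
The broken message text is invented for illustration: it claims to be
:mimetype:`multipart` but declares no boundary, which is assumed here to be
reported through the :class:`NoBoundaryInMultipartDefect` class from
:mod:`email.errors`.

```python
from email.errors import NoBoundaryInMultipartDefect
from email.feedparser import FeedParser

# Hypothetical broken message: it claims to be multipart but declares no
# boundary parameter, so the parts cannot be located.
broken_text = (
    "Subject: Broken\n"
    "Content-Type: multipart/mixed\n"
    "\n"
    "This body was supposed to contain several parts.\n"
)

parser = FeedParser()
parser.feed(broken_text)
message = parser.close()

# Each problem found is recorded as an instance of a defect class.
defect_names = [type(defect).__name__ for defect in message.defects]
has_boundary_defect = any(
    isinstance(defect, NoBoundaryInMultipartDefect)
    for defect in message.defects
)
print(defect_names, has_boundary_defect)
```

Parsing does not raise an exception for this input. The headers remain
available and the unparsed body is kept as a string payload.
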

Here is the API for the :class:`FeedParser`:


.. class:: FeedParser([_factory])

   Create a :class:`FeedParser` instance. Optional *_factory* is a no-argument
   callable that will be called whenever a new message object is needed. It
   defaults to the :class:`email.message.Message` class.


   .. method:: feed(data)

      Feed the :class:`FeedParser` some more data. *data* should be a string
      containing one or more lines. The lines can be partial and the
      :class:`FeedParser` will stitch such partial lines together properly. The
      lines in the string can have any of the three common line endings:
      carriage return, newline, or carriage return and newline (they can even be
      mixed).


   .. method:: close()

      Closing a :class:`FeedParser` completes the parsing of all previously fed
      data, and returns the root message object. It is undefined what happens
      if you feed more data to a closed :class:`FeedParser`.

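
The feed-then-close cycle described above might look like the following sketch.
The list of chunks is a stand-in for data arriving from a blocking source such
as a socket; the chunk contents and variable names are assumptions made for
this example.

```python
from email.feedparser import FeedParser

# Hypothetical chunks, as they might arrive from a socket. The first chunk
# ends in the middle of a line, and the line endings are mixed.
chunks = [
    "Subject: He",
    "llo\r\nFrom: sender@example.com\n",
    "\n",
    "First body line\n",
    "Second body line\n",
]

parser = FeedParser()
for chunk in chunks:
    parser.feed(chunk)

# Only close() returns the root message object.
message = parser.close()
print(message["Subject"], message["From"])
print(repr(message.get_payload()))
```

The split ``Subject`` line is stitched back together before the header is
parsed, so the chunk boundaries do not need to line up with line boundaries.
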

Parser class API
^^^^^^^^^^^^^^^^

The :class:`Parser` class, imported from the :mod:`email.parser` module,
provides an API that can be used to parse a message when the complete contents
of the message are available in a string or file. The :mod:`email.parser`
module also provides a second class, called :class:`HeaderParser`, which can be
used if you're only interested in the headers of the message.
:class:`HeaderParser` can be much faster in these situations, since it does not
attempt to parse the message body, instead setting the payload to the raw body
as a string. :class:`HeaderParser` has the same API as the :class:`Parser`
class.

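
To make the difference concrete, the sketch below parses the same text with
both classes. The sample message is invented for this example; only the
:class:`Parser` and :class:`HeaderParser` classes and their :meth:`parsestr`
method come from the module.

```python
from email.parser import HeaderParser, Parser

# Hypothetical multipart message, invented for this illustration.
raw_text = (
    "Subject: Greetings\n"
    'Content-Type: multipart/mixed; boundary="BOUNDARY"\n'
    "\n"
    "--BOUNDARY\n"
    "Content-Type: text/plain\n"
    "\n"
    "First part.\n"
    "--BOUNDARY\n"
    "Content-Type: text/plain\n"
    "\n"
    "Second part.\n"
    "--BOUNDARY--\n"
)

headers_only = HeaderParser().parsestr(raw_text)
full = Parser().parsestr(raw_text)

# HeaderParser leaves the body unparsed: the payload is one raw string.
raw_body = headers_only.get_payload()
print(headers_only["Subject"], headers_only.is_multipart())

# Parser splits the same body into a list of sub-message objects.
print(full.is_multipart(), len(full.get_payload()))
```

The headers are identical in both results; only the treatment of the body
differs.
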

.. class:: Parser([_class])

   The constructor for the :class:`Parser` class takes an optional argument
   *_class*. This must be a callable factory (such as a function or a class), and
   it is used whenever a sub-message object needs to be created. It defaults to
   :class:`~email.message.Message` (see :mod:`email.message`). The factory will
   be called without arguments.

   The optional *strict* flag is ignored.

   .. deprecated:: 2.4
      Because the :class:`Parser` class is a backward compatible API wrapper
      around the new-in-Python 2.4 :class:`FeedParser`, *all* parsing is
      effectively non-strict. You should simply stop passing a *strict* flag to
      the :class:`Parser` constructor.

   .. versionchanged:: 2.2.2
      The *strict* flag was added.

   .. versionchanged:: 2.4
      The *strict* flag was deprecated.

   The other public :class:`Parser` methods are:


   .. method:: parse(fp[, headersonly])

      Read all the data from the file-like object *fp*, parse the resulting
      text, and return the root message object. *fp* must support both the
      :meth:`readline` and the :meth:`read` methods on file-like objects.

      The text contained in *fp* must be formatted as a block of :rfc:`2822`
      style headers and header continuation lines, optionally preceded by an
      envelope header. The header block is terminated either by the end of the
      data or by a blank line. Following the header block is the body of the
      message (which may contain MIME-encoded subparts).

      Optional *headersonly* is a flag specifying whether to stop parsing after
      reading the headers or not. The default is ``False``, meaning it parses
      the entire contents of the file.

      .. versionchanged:: 2.2.2
         The *headersonly* flag was added.


   .. method:: parsestr(text[, headersonly])

      Similar to the :meth:`parse` method, except it takes a string object
      instead of a file-like object. Calling this method on a string is exactly
      equivalent to wrapping *text* in a :class:`StringIO` instance first and
      calling :meth:`parse`.

      Optional *headersonly* is a flag specifying whether to stop parsing after
      reading the headers or not. The default is ``False``, meaning it parses
      the entire contents of the string.

      .. versionchanged:: 2.2.2
         The *headersonly* flag was added.

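
The two methods can be exercised side by side as in the sketch below. It
assumes a Python version that provides ``io.StringIO`` as the in-memory
file-like object (older releases use the ``StringIO`` module for the same
purpose). The message text and the ``TaggedMessage`` subclass are hypothetical
and exist only to show the *_class* hook.

```python
import io
from email.message import Message
from email.parser import Parser

# Hypothetical message text with an envelope ("From ") header line first.
raw_text = (
    "From sender@example.com Sat Jan  1 00:00:00 2000\n"
    "From: sender@example.com\n"
    "Subject: Report\n"
    "\n"
    "The report is attached.\n"
)


class TaggedMessage(Message):
    """Hypothetical Message subclass, used only to show the _class hook."""


# parse() reads from a file-like object; io.StringIO stands in for a file.
parser = Parser(_class=TaggedMessage)
from_file = parser.parse(io.StringIO(raw_text))

# parsestr() takes the text directly; headersonly=True skips body parsing.
from_string = parser.parsestr(raw_text, headersonly=True)

print(type(from_file).__name__, from_file.get_unixfrom())
print(from_file["Subject"], repr(from_file.get_payload()))
print(from_string["Subject"])
```

The envelope line is not treated as an ordinary header, so it is read back
here with :meth:`get_unixfrom` rather than by header name.
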

Since creating a message object structure from a string or a file object is such
a common task, two functions are provided as a convenience. They are available
in the top-level :mod:`email` package namespace.

.. currentmodule:: email

.. function:: message_from_string(s[, _class[, strict]])

   Return a message object structure from a string. This is exactly equivalent to
   ``Parser().parsestr(s)``. Optional *_class* and *strict* are interpreted as
   with the :class:`Parser` class constructor.

   .. versionchanged:: 2.2.2
      The *strict* flag was added.


.. function:: message_from_file(fp[, _class[, strict]])

   Return a message object structure tree from an open file object. This is
   exactly equivalent to ``Parser().parse(fp)``. Optional *_class* and *strict*
   are interpreted as with the :class:`Parser` class constructor.

   .. versionchanged:: 2.2.2
      The *strict* flag was added.


Here's an example of how you might use this at an interactive Python prompt::

   >>> import email
   >>> myString = 'Subject: Hello\n\nThis is the body.\n'
   >>> msg = email.message_from_string(myString)

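
The file-based convenience function can be tried in the same spirit. The
following sketch writes an invented message to a temporary file purely so that
there is something to open; the file suffix and variable names are assumptions
of the example.

```python
import email
import os
import tempfile

# Hypothetical message text, invented for this illustration.
message_text = "Subject: Saved message\n\nStored on disk.\n"

# Write the text to a temporary file so there is something to open.
handle, path = tempfile.mkstemp(suffix=".eml")
with os.fdopen(handle, "w") as fp:
    fp.write(message_text)

try:
    with open(path) as fp:
        from_file = email.message_from_file(fp)
finally:
    os.remove(path)

# The string-based function should produce an equivalent structure.
from_string = email.message_from_string(message_text)
print(from_file["Subject"] == from_string["Subject"])
print(from_file.get_payload() == from_string.get_payload())
```
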

Additional notes
^^^^^^^^^^^^^^^^

Here are some notes on the parsing semantics:

* Most non-\ :mimetype:`multipart` type messages are parsed as a single message
  object with a string payload. These objects will return ``False`` for
  :meth:`is_multipart`. Their :meth:`get_payload` method will return a string
  object.

* All :mimetype:`multipart` type messages will be parsed as a container message
  object with a list of sub-message objects for their payload. The outer
  container message will return ``True`` for :meth:`is_multipart` and its
  :meth:`get_payload` method will return the list of :class:`~email.message.Message`
  subparts.

* Most messages with a content type of :mimetype:`message/\*` (e.g.
  :mimetype:`message/delivery-status` and :mimetype:`message/rfc822`) will also be
  parsed as a container object containing a list payload of length 1. Their
  :meth:`is_multipart` method will return ``True``. The single element in the
  list payload will be a sub-message object.

* Some non-standards-compliant messages may not be internally consistent about
  their :mimetype:`multipart`\ -edness. Such messages may have a
  :mailheader:`Content-Type` header of type :mimetype:`multipart`, but their
  :meth:`is_multipart` method may return ``False``. If such messages were parsed
  with the :class:`FeedParser`, they will have an instance of the
  :class:`MultipartInvariantViolationDefect` class in their *defects* attribute
  list. See :mod:`email.errors` for details.

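
Several of the cases listed above can be checked with a short sketch. All three
message texts are invented for illustration, and the last one is deliberately
malformed (a :mimetype:`multipart` content type with no boundary) to provoke
the inconsistency described in the final note.

```python
import email
from email.errors import MultipartInvariantViolationDefect

# A non-multipart message: a single object with a string payload.
plain = email.message_from_string("Subject: Plain\n\nJust text.\n")

# A message/rfc822 message: a container whose list payload has one element.
wrapper = email.message_from_string(
    "Content-Type: message/rfc822\n"
    "\n"
    "Subject: Inner\n"
    "\n"
    "Inner body.\n"
)
inner = wrapper.get_payload()[0]

# A malformed message: declared multipart, but no parts can be found.
inconsistent = email.message_from_string(
    "Content-Type: multipart/mixed\n"
    "\n"
    "No boundary and no parts here.\n"
)
violation_found = any(
    isinstance(defect, MultipartInvariantViolationDefect)
    for defect in inconsistent.defects
)

print(plain.is_multipart(), type(plain.get_payload()).__name__)
print(wrapper.is_multipart(), len(wrapper.get_payload()), inner["Subject"])
print(inconsistent.is_multipart(), violation_found)
```

Checking :meth:`is_multipart` before iterating over a payload is a reasonable
precaution, since the declared content type alone does not guarantee a list
payload.
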

.. rubric:: Footnotes

.. [#] As of email package version 3.0, introduced in Python 2.4, the classic
   :class:`Parser` was re-implemented in terms of the :class:`FeedParser`, so the
   semantics and results are identical between the two parsers.