/Doc/library/email.parser.rst

https://bitbucket.org/arigo/cpython-withatomic/
:mod:`email`: Parsing email messages
------------------------------------

.. module:: email.parser
   :synopsis: Parse flat text email messages to produce a message object structure.

Message object structures can be created in one of two ways: they can be created
from whole cloth by instantiating :class:`~email.message.Message` objects and
stringing them together via :meth:`attach` and :meth:`set_payload` calls, or they
can be created by parsing a flat text representation of the email message.

The :mod:`email` package provides a standard parser that understands most email
document structures, including MIME documents. You can pass the parser a string
or a file object, and the parser will return to you the root
:class:`~email.message.Message` instance of the object structure. For simple,
non-MIME messages the payload of this root object will likely be a string
containing the text of the message. For MIME messages, the root object will
return ``True`` from its :meth:`is_multipart` method, and the subparts can be
accessed via the :meth:`get_payload` and :meth:`walk` methods.

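As a minimal sketch of the difference between the two cases, the following parses
one plain message and one two-part MIME message. The message texts, addresses and
the ``XYZ`` boundary are made-up sample data, not part of the package.

```python
import email

# Sample data (hypothetical): a plain, non-MIME message.
plain_text = "From: alice@example.com\nSubject: Hello\n\nJust a short note.\n"
plain = email.message_from_string(plain_text)

# Sample data (hypothetical): a MIME message with two subparts.
mime_text = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "first part\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>second part</p>\n"
    "--XYZ--\n"
)
mime = email.message_from_string(mime_text)

# The plain message carries its text directly as a string payload.
plain_is_multipart = plain.is_multipart()
plain_payload = plain.get_payload()

# The MIME message is a container; its payload is a list of subparts.
mime_is_multipart = mime.is_multipart()
subpart_types = [part.get_content_type() for part in mime.get_payload()]

# walk() visits the root first, then every subpart in depth-first order.
walked_types = [part.get_content_type() for part in mime.walk()]
```
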
Two parser interfaces are available: the classic :class:`Parser` API and the
incremental :class:`FeedParser` API. The classic :class:`Parser` API is fine if
you have the entire text of the message in memory as a string, or if the entire
message lives in a file on the file system. :class:`FeedParser` is more
appropriate when you're reading the message from a stream which might block
waiting for more input (e.g. reading an email message from a socket). The
:class:`FeedParser` can consume and parse the message incrementally, and only
returns the root object when you close the parser [#]_.

Note that the parser can be extended in limited ways, and of course you can
implement your own parser completely from scratch. There is no magical
connection between the :mod:`email` package's bundled parser and the
:class:`~email.message.Message` class, so your custom parser can create message
object trees any way it finds necessary.

FeedParser API
^^^^^^^^^^^^^^

The :class:`FeedParser`, imported from the :mod:`email.feedparser` module,
provides an API that is conducive to incremental parsing of email messages, such
as would be necessary when reading the text of an email message from a source
that can block (e.g. a socket). The :class:`FeedParser` can of course be used
to parse an email message fully contained in a string or a file, but the classic
:class:`Parser` API may be more convenient for such use cases. The semantics
and results of the two parser APIs are identical.

The :class:`FeedParser`'s API is simple; you create an instance, feed it a bunch
of text until there's no more to feed it, then close the parser to retrieve the
root message object. The :class:`FeedParser` is extremely accurate when parsing
standards-compliant messages, and it does a very good job of parsing
non-compliant messages, providing information about how a message was deemed
broken. It will populate a message object's *defects* attribute with a list of
any problems it found in a message. See the :mod:`email.errors` module for the
list of defects that it can find.

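To illustrate how defects are reported, here is a small sketch that feeds the
parser a deliberately broken message: it claims to be :mimetype:`multipart` but
declares no boundary. The message text is invented for the example.

```python
from email.feedparser import FeedParser

parser = FeedParser()
# Hypothetical broken input: a multipart Content-Type without a boundary.
parser.feed("Content-Type: multipart/mixed\n")
parser.feed("\n")
parser.feed("This body is not really split into parts.\n")
msg = parser.close()

# Parsing does not fail; the problems are recorded on the message instead.
defect_names = [type(defect).__name__ for defect in msg.defects]
is_multipart = msg.is_multipart()
```

With the default settings the parser does not raise an exception here. The
returned message still has a string payload, and the recorded defects explain
why it was not split into subparts.
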
Here is the API for the :class:`FeedParser`:


.. class:: FeedParser(_factory=email.message.Message, *, policy=policy.default)

   Create a :class:`FeedParser` instance. Optional *_factory* is a no-argument
   callable that will be called whenever a new message object is needed. It
   defaults to the :class:`email.message.Message` class.

   The *policy* keyword specifies a :mod:`~email.policy` object that controls a
   number of aspects of the parser's operation. The default policy maintains
   backward compatibility.

   .. versionchanged:: 3.3 Added the *policy* keyword.

   .. method:: feed(data)

      Feed the :class:`FeedParser` some more data. *data* should be a string
      containing one or more lines. The lines can be partial and the
      :class:`FeedParser` will stitch such partial lines together properly. The
      lines in the string can have any of the three common line endings:
      carriage return, newline, or carriage return and newline (they can even be
      mixed).

   .. method:: close()

      Closing a :class:`FeedParser` completes the parsing of all previously fed
      data, and returns the root message object. It is undefined what happens
      if you feed more data to a closed :class:`FeedParser`.

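The feed-then-close cycle can be sketched as follows. The chunks stand in for
data arriving from a socket; they are hypothetical sample data, split in the
middle of lines and using mixed line endings on purpose.

```python
from email.feedparser import FeedParser

# Hypothetical chunks, as they might arrive from a blocking source.
chunks = [
    "Subject: incre",
    "mental\r\nFrom: bob@exam",
    "ple.com\n\nline one\nline",
    " two\n",
]

parser = FeedParser()
for chunk in chunks:
    # Partial lines are buffered until the rest of the line arrives.
    parser.feed(chunk)

# close() finishes parsing and hands back the root message object.
msg = parser.close()

subject = msg["Subject"]
sender = msg["From"]
body = msg.get_payload()
```
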
.. class:: BytesFeedParser(_factory=email.message.Message)

   Works exactly like :class:`FeedParser` except that the input to the
   :meth:`~FeedParser.feed` method must be bytes and not string.

   .. versionadded:: 3.2

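The bytes variant is used the same way. A brief sketch, again with invented
sample data:

```python
from email.parser import BytesFeedParser

parser = BytesFeedParser()
# Each chunk must be a bytes object rather than a str.
for chunk in (b"Subject: bytes in", b"put\n\n", b"payload\n"):
    parser.feed(chunk)
msg = parser.close()

subject = msg["Subject"]
payload = msg.get_payload()
```
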
Parser class API
^^^^^^^^^^^^^^^^

The :class:`Parser` class, imported from the :mod:`email.parser` module,
provides an API that can be used to parse a message when the complete contents
of the message are available in a string or file. The :mod:`email.parser`
module also provides header-only parsers, called :class:`HeaderParser` and
:class:`BytesHeaderParser`, which can be used if you're only interested in the
headers of the message. :class:`HeaderParser` and :class:`BytesHeaderParser`
can be much faster in these situations, since they do not attempt to parse the
message body, instead setting the payload to the raw body as a string. They
have the same API as the :class:`Parser` and :class:`BytesParser` classes.

.. versionadded:: 3.3 BytesHeaderParser

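A short sketch of the difference between a header-only parse and a full parse of
the same text. The message and its ``XYZ`` boundary are made-up sample data.

```python
from email.parser import HeaderParser, Parser

# Hypothetical sample message with one MIME subpart.
text = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "Subject: report\n"
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "body text\n"
    "--XYZ--\n"
)

headers_only = HeaderParser().parsestr(text)
full = Parser().parsestr(text)

# Headers are available either way.
subject = headers_only["Subject"]

# The header-only parser leaves the body unparsed, as one raw string.
raw_body = headers_only.get_payload()

# The full parser splits the body into sub-message objects.
full_is_multipart = full.is_multipart()
```
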
.. class:: Parser(_class=email.message.Message, *, policy=policy.default)

   The constructor for the :class:`Parser` class takes an optional argument
   *_class*. This must be a callable factory (such as a function or a class), and
   it is used whenever a sub-message object needs to be created. It defaults to
   :class:`~email.message.Message` (see :mod:`email.message`). The factory will
   be called without arguments.

   The *policy* keyword specifies a :mod:`~email.policy` object that controls a
   number of aspects of the parser's operation. The default policy maintains
   backward compatibility.

   .. versionchanged:: 3.3
      Removed the *strict* argument that was deprecated in 2.4. Added the
      *policy* keyword.

   The other public :class:`Parser` methods are:

   .. method:: parse(fp, headersonly=False)

      Read all the data from the file-like object *fp*, parse the resulting
      text, and return the root message object. *fp* must support both the
      :meth:`readline` and the :meth:`read` methods on file-like objects.

      The text contained in *fp* must be formatted as a block of :rfc:`2822`
      style headers and header continuation lines, optionally preceded by an
      envelope header. The header block is terminated either by the end of the
      data or by a blank line. Following the header block is the body of the
      message (which may contain MIME-encoded subparts).

      Optional *headersonly* is a flag specifying whether to stop parsing after
      reading the headers or not. The default is ``False``, meaning it parses
      the entire contents of the file.

   .. method:: parsestr(text, headersonly=False)

      Similar to the :meth:`parse` method, except it takes a string object
      instead of a file-like object. Calling this method on a string is exactly
      equivalent to wrapping *text* in a :class:`~io.StringIO` instance first and
      calling :meth:`parse`.

      Optional *headersonly* is as with the :meth:`parse` method.

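The relationship between :meth:`parse` and :meth:`parsestr` can be sketched like
this; the message text is hypothetical sample data.

```python
import io
from email.parser import Parser

# Hypothetical sample message.
text = "To: carol@example.com\nSubject: status\n\nAll systems nominal.\n"

parser = Parser()

# Parse directly from a string...
from_string = parser.parsestr(text)

# ...or from a file-like object wrapping the same text.
from_file = parser.parse(io.StringIO(text))

# With headersonly=True the body is kept as an unparsed string payload.
headers_only = parser.parsestr(text, headersonly=True)
```

Both calls produce equivalent message objects, so the choice depends only on
where the text comes from.
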
.. class:: BytesParser(_class=email.message.Message, *, policy=policy.default)

   This class is exactly parallel to :class:`Parser`, but handles bytes input.
   The *_class* argument is interpreted in the same way as for the
   :class:`Parser` constructor.

   The *policy* keyword specifies a :mod:`~email.policy` object that
   controls a number of aspects of the parser's operation. The default
   policy maintains backward compatibility.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

   .. method:: parse(fp, headersonly=False)

      Read all the data from the binary file-like object *fp*, parse the
      resulting bytes, and return the message object. *fp* must support
      both the :meth:`readline` and the :meth:`read` methods on file-like
      objects.

      The bytes contained in *fp* must be formatted as a block of :rfc:`2822`
      style headers and header continuation lines, optionally preceded by an
      envelope header. The header block is terminated either by the end of the
      data or by a blank line. Following the header block is the body of the
      message (which may contain MIME-encoded subparts, including subparts
      with a :mailheader:`Content-Transfer-Encoding` of ``8bit``).

      Optional *headersonly* is a flag specifying whether to stop parsing after
      reading the headers or not. The default is ``False``, meaning it parses
      the entire contents of the file.

   .. method:: parsebytes(bytes, headersonly=False)

      Similar to the :meth:`parse` method, except it takes a byte string object
      instead of a file-like object. Calling this method on a byte string is
      exactly equivalent to wrapping *bytes* in a :class:`~io.BytesIO` instance
      first and calling :meth:`parse`.

      Optional *headersonly* is as with the :meth:`parse` method.

   .. versionadded:: 3.2

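As a sketch of why the bytes interface matters, the following parses a message
whose body contains non-ASCII bytes with a
:mailheader:`Content-Transfer-Encoding` of ``8bit``. The message itself is
invented sample data.

```python
from email.parser import BytesParser

# Hypothetical sample: UTF-8 encoded body sent as raw 8bit data.
raw = (
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    + "caf\u00e9\n".encode("utf-8")
)

msg = BytesParser().parsebytes(raw)

# decode=True returns the body as bytes, which can then be decoded
# using the charset declared in the Content-Type header.
body_bytes = msg.get_payload(decode=True)
charset = msg.get_content_charset()
body_text = body_bytes.decode(charset)
```
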
Since creating a message object structure from a string or a file object is such
a common task, four functions are provided as a convenience. They are available
in the top-level :mod:`email` package namespace.

.. currentmodule:: email

.. function:: message_from_string(s, _class=email.message.Message, *, \
                                  policy=policy.default)

   Return a message object structure from a string. This is exactly equivalent to
   ``Parser().parsestr(s)``. *_class* and *policy* are interpreted as
   with the :class:`Parser` class constructor.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

.. function:: message_from_bytes(s, _class=email.message.Message, *, \
                                 policy=policy.default)

   Return a message object structure from a byte string. This is exactly
   equivalent to ``BytesParser().parsebytes(s)``. Optional *_class* and
   *policy* are interpreted as with the :class:`Parser` class constructor.

   .. versionadded:: 3.2
   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

.. function:: message_from_file(fp, _class=email.message.Message, *, \
                                policy=policy.default)

   Return a message object structure tree from an open :term:`file object`.
   This is exactly equivalent to ``Parser().parse(fp)``. *_class*
   and *policy* are interpreted as with the :class:`Parser` class constructor.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

.. function:: message_from_binary_file(fp, _class=email.message.Message, *, \
                                       policy=policy.default)

   Return a message object structure tree from an open binary :term:`file
   object`. This is exactly equivalent to ``BytesParser().parse(fp)``.
   *_class* and *policy* are interpreted as with the :class:`Parser`
   class constructor.

   .. versionadded:: 3.2
   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

Here's an example of how you might use this at an interactive Python prompt::

   >>> import email
   >>> myString = 'Subject: Hello\n\nBody text\n'  # sample message text
   >>> msg = email.message_from_string(myString)

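The other three convenience functions follow the same pattern. In this sketch,
in-memory :class:`io.BytesIO` and :class:`io.StringIO` objects stand in for real
open files, and the message is made-up sample data.

```python
import email
import io

# Hypothetical sample message as raw bytes.
raw = b"Subject: convenience\n\nhello\n"

# From a byte string.
from_bytes = email.message_from_bytes(raw)

# From a binary file object (BytesIO stands in for an open binary file).
from_binary_file = email.message_from_binary_file(io.BytesIO(raw))

# From a text file object (StringIO stands in for an open text file).
from_text_file = email.message_from_file(io.StringIO(raw.decode("ascii")))

subjects = [m["Subject"] for m in (from_bytes, from_binary_file, from_text_file)]
```
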
Additional notes
^^^^^^^^^^^^^^^^

Here are some notes on the parsing semantics:

* Most non-\ :mimetype:`multipart` type messages are parsed as a single message
  object with a string payload. These objects will return ``False`` for
  :meth:`is_multipart`. Their :meth:`get_payload` method will return a string
  object.

* All :mimetype:`multipart` type messages will be parsed as a container message
  object with a list of sub-message objects for their payload. The outer
  container message will return ``True`` for :meth:`is_multipart` and its
  :meth:`get_payload` method will return the list of :class:`~email.message.Message`
  subparts.

* Most messages with a content type of :mimetype:`message/\*` (e.g.
  :mimetype:`message/delivery-status` and :mimetype:`message/rfc822`) will also be
  parsed as a container object containing a list payload of length 1. Their
  :meth:`is_multipart` method will return ``True``. The single element in the
  list payload will be a sub-message object.

* Some non-standards-compliant messages may not be internally consistent about
  their :mimetype:`multipart`\ -edness. Such messages may have a
  :mailheader:`Content-Type` header of type :mimetype:`multipart`, but their
  :meth:`is_multipart` method may return ``False``. If such messages were parsed
  with the :class:`FeedParser`, they will have an instance of the
  :class:`MultipartInvariantViolationDefect` class in their *defects* attribute
  list. See :mod:`email.errors` for details.

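The :mimetype:`message/rfc822` case from the notes above can be sketched as
follows, using an invented message that wraps another message:

```python
import email

# Hypothetical sample: an outer message whose body is itself a message.
text = (
    "Content-Type: message/rfc822\n"
    "\n"
    "Subject: inner message\n"
    "\n"
    "inner body\n"
)

outer = email.message_from_string(text)

# The outer object is a container with a list payload of length 1.
outer_is_multipart = outer.is_multipart()
payload = outer.get_payload()

# The single list element is the enclosed sub-message object.
inner = payload[0]
inner_subject = inner["Subject"]
inner_body = inner.get_payload()
```
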
.. rubric:: Footnotes

.. [#] As of email package version 3.0, introduced in Python 2.4, the classic
   :class:`Parser` was re-implemented in terms of the :class:`FeedParser`, so the
   semantics and results are identical between the two parsers.