:mod:`xml.sax.handler` --- Base classes for SAX handlers
========================================================

.. module:: xml.sax.handler
   :synopsis: Base classes for SAX event handlers.
.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>

.. versionadded:: 2.0

The SAX API defines four kinds of handlers: content handlers, DTD handlers,
error handlers, and entity resolvers. Applications normally only need to
implement those interfaces whose events they are interested in; they can
implement the interfaces in a single object or in multiple objects. Handler
implementations should inherit from the base classes provided in the module
:mod:`xml.sax.handler`, so that all methods get default implementations.

.. class:: ContentHandler

   This is the main callback interface in SAX, and the one most important to
   applications. The order of events in this interface mirrors the order of the
   information in the document.


.. class:: DTDHandler

   Handle DTD events.

   This interface specifies only those DTD events required for basic parsing
   (unparsed entities and attributes).


.. class:: EntityResolver

   Basic interface for resolving entities. If you create an object implementing
   this interface and register it with your parser, the parser will call
   the method in your object to resolve all external entities.


.. class:: ErrorHandler

   Interface used by the parser to present error and warning messages to the
   application. The methods of this object control whether errors are immediately
   converted to exceptions or are handled in some other way.

In addition to these classes, :mod:`xml.sax.handler` provides symbolic constants
for the feature and property names.

.. data:: feature_namespaces

   Value: ``"http://xml.org/sax/features/namespaces"``

   * true: Perform Namespace processing.
   * false: Optionally do not perform Namespace processing (implies
     namespace-prefixes; default).
   * access: (parsing) read-only; (not parsing) read/write


.. data:: feature_namespace_prefixes

   Value: ``"http://xml.org/sax/features/namespace-prefixes"``

   * true: Report the original prefixed names and attributes used for Namespace
     declarations.
   * false: Do not report attributes used for Namespace declarations, and
     optionally do not report original prefixed names (default).
   * access: (parsing) read-only; (not parsing) read/write


.. data:: feature_string_interning

   Value: ``"http://xml.org/sax/features/string-interning"``

   * true: All element names, prefixes, attribute names, Namespace URIs, and
     local names are interned using the built-in intern function.
   * false: Names are not necessarily interned, although they may be (default).
   * access: (parsing) read-only; (not parsing) read/write


.. data:: feature_validation

   Value: ``"http://xml.org/sax/features/validation"``

   * true: Report all validation errors (implies external-general-entities and
     external-parameter-entities).
   * false: Do not report validation errors.
   * access: (parsing) read-only; (not parsing) read/write


.. data:: feature_external_ges

   Value: ``"http://xml.org/sax/features/external-general-entities"``

   * true: Include all external general (text) entities.
   * false: Do not include external general entities.
   * access: (parsing) read-only; (not parsing) read/write


.. data:: feature_external_pes

   Value: ``"http://xml.org/sax/features/external-parameter-entities"``

   * true: Include all external parameter entities, including the external DTD
     subset.
   * false: Do not include any external parameter entities, even the external
     DTD subset.
   * access: (parsing) read-only; (not parsing) read/write


.. data:: all_features

   List of all features.
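
The feature constants are meant to be passed to a reader's ``getFeature`` and
``setFeature`` methods before parsing starts. The following is a minimal
sketch, assuming a Python 3 interpreter and the default ``expat`` reader
returned by ``xml.sax.make_parser()``; the variable names are illustrative
only.

```python
import xml.sax
from xml.sax.handler import all_features, feature_namespaces

# Create a reader; this sketch assumes it is the bundled expat reader.
parser = xml.sax.make_parser()

# Features may be read and changed as long as parsing has not started.
default_state = parser.getFeature(feature_namespaces)
parser.setFeature(feature_namespaces, True)
enabled_state = parser.getFeature(feature_namespaces)
```

With the ``expat`` reader, ``default_state`` is expected to be false, which
matches the "default" noted for :data:`feature_namespaces` above.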

.. data:: property_lexical_handler

   Value: ``"http://xml.org/sax/properties/lexical-handler"``

   * data type: xml.sax.sax2lib.LexicalHandler (not supported in Python 2)
   * description: An optional extension handler for lexical events like
     comments.
   * access: read/write


.. data:: property_declaration_handler

   Value: ``"http://xml.org/sax/properties/declaration-handler"``

   * data type: xml.sax.sax2lib.DeclHandler (not supported in Python 2)
   * description: An optional extension handler for DTD-related events other
     than notations and unparsed entities.
   * access: read/write


.. data:: property_dom_node

   Value: ``"http://xml.org/sax/properties/dom-node"``

   * data type: org.w3c.dom.Node (not supported in Python 2)
   * description: When parsing, the current DOM node being visited if this is
     a DOM iterator; when not parsing, the root DOM node for iteration.
   * access: (parsing) read-only; (not parsing) read/write


.. data:: property_xml_string

   Value: ``"http://xml.org/sax/properties/xml-string"``

   * data type: String
   * description: The literal string of characters that was the source for the
     current event.
   * access: read-only


.. data:: all_properties

   List of all known property names.

.. _content-handler-objects:

ContentHandler Objects
----------------------

Users are expected to subclass :class:`ContentHandler` to support their
application. The following methods are called by the parser on the appropriate
events in the input document:


.. method:: ContentHandler.setDocumentLocator(locator)

   Called by the parser to give the application a locator for locating the origin
   of document events.

   SAX parsers are strongly encouraged (though not absolutely required) to supply a
   locator: if a parser does so, it must supply the locator to the application by
   invoking this method before invoking any of the other methods in the
   :class:`ContentHandler` interface.

   The locator allows the application to determine the end position of any
   document-related event, even if the parser is not reporting an error. Typically,
   the application will use this information for reporting its own errors (such as
   character content that does not match an application's business rules). The
   information returned by the locator is probably not sufficient for use with a
   search engine.

   Note that the locator will return correct information only during the invocation
   of the events in this interface. The application should not attempt to use it at
   any other time.
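
As an illustration of the locator, the sketch below stores the object passed
to ``setDocumentLocator`` and queries it only from inside another event
callback. It assumes a Python 3 interpreter and the default ``expat`` reader;
``PositionRecorder`` is a hypothetical name, not part of the module.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PositionRecorder(ContentHandler):
    """Record the line on which each start tag is reported."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.locator = None
        self.positions = []

    def setDocumentLocator(self, locator):
        # Called once by the parser, before any other event.
        self.locator = locator

    def startElement(self, name, attrs):
        # The locator is only queried while an event is being delivered.
        self.positions.append((name, self.locator.getLineNumber()))


recorder = PositionRecorder()
xml.sax.parseString(b"<root>\n  <item/>\n  <item/>\n</root>", recorder)
```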

.. method:: ContentHandler.startDocument()

   Receive notification of the beginning of a document.

   The SAX parser will invoke this method only once, before any other methods in
   this interface or in DTDHandler (except for :meth:`setDocumentLocator`).


.. method:: ContentHandler.endDocument()

   Receive notification of the end of a document.

   The SAX parser will invoke this method only once, and it will be the last method
   invoked during the parse. The parser shall not invoke this method until it has
   either abandoned parsing (because of an unrecoverable error) or reached the end
   of input.

.. method:: ContentHandler.startPrefixMapping(prefix, uri)

   Begin the scope of a prefix-URI Namespace mapping.

   The information from this event is not necessary for normal Namespace
   processing: the SAX XML reader will automatically replace prefixes for element
   and attribute names when the ``feature_namespaces`` feature is enabled
   (it is disabled by default).

   There are cases, however, when applications need to use prefixes in character
   data or in attribute values, where they cannot safely be expanded automatically;
   the :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events supply the
   information to the application to expand prefixes in those contexts itself, if
   necessary.

   Note that :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events are not
   guaranteed to be properly nested relative to each other: all
   :meth:`startPrefixMapping` events will occur before the corresponding
   :meth:`startElement` event, and all :meth:`endPrefixMapping` events will occur
   after the corresponding :meth:`endElement` event, but their order is not
   guaranteed.


.. method:: ContentHandler.endPrefixMapping(prefix)

   End the scope of a prefix-URI mapping.

   See :meth:`startPrefixMapping` for details. This event will always occur after
   the corresponding :meth:`endElement` event, but the order of
   :meth:`endPrefixMapping` events is not otherwise guaranteed.
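
The prefix-mapping events can be observed with a handler like the one
sketched below. It assumes a Python 3 interpreter and the default ``expat``
reader with ``feature_namespaces`` switched on explicitly; ``PrefixTracker``
and the ``urn:example:x`` namespace are made up for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class PrefixTracker(ContentHandler):
    """Log the start and end of every prefix-URI mapping."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.events = []

    def startPrefixMapping(self, prefix, uri):
        self.events.append(("start", prefix, uri))

    def endPrefixMapping(self, prefix):
        self.events.append(("end", prefix))


tracker = PrefixTracker()
parser = xml.sax.make_parser()
# Prefix-mapping events are only delivered in namespace mode.
parser.setFeature(feature_namespaces, True)
parser.setContentHandler(tracker)
parser.parse(io.BytesIO(b'<doc xmlns:x="urn:example:x"><x:item/></doc>'))
```

An application that needs to expand prefixes found in attribute values could
keep a dictionary updated from these two callbacks.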

.. method:: ContentHandler.startElement(name, attrs)

   Signals the start of an element in non-namespace mode.

   The *name* parameter contains the raw XML 1.0 name of the element type as a
   string and the *attrs* parameter holds an object of the :class:`Attributes`
   interface (see :ref:`attributes-objects`) containing the attributes of
   the element. The object passed as *attrs* may be re-used by the parser; holding
   on to a reference to it is not a reliable way to keep a copy of the attributes.
   To keep a copy of the attributes, use the :meth:`copy` method of the *attrs*
   object.


.. method:: ContentHandler.endElement(name)

   Signals the end of an element in non-namespace mode.

   The *name* parameter contains the name of the element type, just as with the
   :meth:`startElement` event.
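
A small sketch of the two non-namespace element events follows. It assumes a
Python 3 interpreter; ``ElementLogger`` is a hypothetical class that tracks
nesting depth and copies the attributes, since the *attrs* object may be
re-used by the parser.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ElementLogger(ContentHandler):
    """Record each element with its nesting depth and attributes."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.depth = 0
        self.seen = []

    def startElement(self, name, attrs):
        # Copy the attributes instead of keeping a reference to attrs.
        saved = dict(attrs.copy().items())
        self.seen.append((self.depth, name, saved))
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1


logger = ElementLogger()
xml.sax.parseString(b'<a id="1"><b/></a>', logger)
```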

.. method:: ContentHandler.startElementNS(name, qname, attrs)

   Signals the start of an element in namespace mode.

   The *name* parameter contains the name of the element type as a ``(uri,
   localname)`` tuple, the *qname* parameter contains the raw XML 1.0 name used in
   the source document, and the *attrs* parameter holds an instance of the
   :class:`AttributesNS` interface (see :ref:`attributes-ns-objects`)
   containing the attributes of the element. If no namespace is associated with
   the element, the *uri* component of *name* will be ``None``. The object passed
   as *attrs* may be re-used by the parser; holding on to a reference to it is not
   a reliable way to keep a copy of the attributes. To keep a copy of the
   attributes, use the :meth:`copy` method of the *attrs* object.

   Parsers may set the *qname* parameter to ``None``, unless the
   ``feature_namespace_prefixes`` feature is activated.


.. method:: ContentHandler.endElementNS(name, qname)

   Signals the end of an element in namespace mode.

   The *name* and *qname* parameters contain the same values as in the
   corresponding :meth:`startElementNS` call.
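
In namespace mode the element name arrives as a ``(uri, localname)`` tuple.
The sketch below, assuming a Python 3 interpreter and the default ``expat``
reader, collects those tuples; ``NSCollector`` and the ``urn:example:x``
namespace are illustrative names only.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NSCollector(ContentHandler):
    """Collect the (uri, localname) pair of every element."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElementNS(self, name, qname, attrs):
        uri, localname = name  # uri is None for elements without a namespace
        self.names.append((uri, localname))


collector = NSCollector()
parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)  # switch to namespace mode
parser.setContentHandler(collector)
parser.parse(io.BytesIO(b'<x:doc xmlns:x="urn:example:x"><plain/></x:doc>'))
```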

.. method:: ContentHandler.characters(content)

   Receive notification of character data.

   The parser will call this method to report each chunk of character data. SAX
   parsers may return all contiguous character data in a single chunk, or they may
   split it into several chunks; however, all of the characters in any single event
   must come from the same external entity so that the Locator provides useful
   information.

   *content* may be a Unicode string or a byte string; the ``expat`` reader module
   always produces Unicode strings.

   .. note::

      The earlier SAX 1 interface provided by the Python XML Special Interest Group
      used a more Java-like interface for this method. Since most parsers used from
      Python did not take advantage of the older interface, the simpler signature was
      chosen to replace it. To convert old code to the new interface, use *content*
      instead of slicing content with the old *offset* and *length* parameters.
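
Because character data may be delivered in several chunks, handlers usually
buffer it until the enclosing element ends. The following sketch assumes a
Python 3 interpreter and a flat document without nested elements;
``TextCollector`` is a hypothetical name.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Join the character chunks reported for each element."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.chunks = []
        self.texts = []

    def startElement(self, name, attrs):
        self.chunks = []

    def characters(self, content):
        # One run of text may arrive as one call or as several.
        self.chunks.append(content)

    def endElement(self, name):
        self.texts.append((name, "".join(self.chunks)))
        self.chunks = []


collector = TextCollector()
xml.sax.parseString(b"<p>Tom &amp; Jerry</p>", collector)
```

Joining the chunks in ``endElement`` gives the same result regardless of how
the parser chose to split the text.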

.. method:: ContentHandler.ignorableWhitespace(whitespace)

   Receive notification of ignorable whitespace in element content.

   Validating parsers must use this method to report each chunk of ignorable
   whitespace (see the W3C XML 1.0 recommendation, section 2.10): non-validating
   parsers may also use this method if they are capable of parsing and using
   content models.

   SAX parsers may return all contiguous whitespace in a single chunk, or they may
   split it into several chunks; however, all of the characters in any single event
   must come from the same external entity, so that the Locator provides useful
   information.

.. method:: ContentHandler.processingInstruction(target, data)

   Receive notification of a processing instruction.

   The parser will invoke this method once for each processing instruction found:
   note that processing instructions may occur before or after the main document
   element.

   A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a
   text declaration (XML 1.0, section 4.3.1) using this method.
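
To illustrate, the sketch below collects processing instructions from a
document that also carries an XML declaration. It assumes a Python 3
interpreter and the default ``expat`` reader; ``PICollector`` is a
hypothetical name.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PICollector(ContentHandler):
    """Collect (target, data) pairs of processing instructions."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.instructions = []

    def processingInstruction(self, target, data):
        self.instructions.append((target, data))


pi_collector = PICollector()
document = b'<?xml version="1.0"?><?xml-stylesheet href="s.css"?><doc/>'
xml.sax.parseString(document, pi_collector)
```

Only the ``xml-stylesheet`` instruction is collected here; the XML
declaration is not reported through this method.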

.. method:: ContentHandler.skippedEntity(name)

   Receive notification of a skipped entity.

   The parser will invoke this method once for each entity skipped. Non-validating
   processors may skip entities if they have not seen the declarations (because,
   for example, the entity was declared in an external DTD subset). All processors
   may skip external entities, depending on the values of the
   ``feature_external_ges`` and ``feature_external_pes`` features.

.. _dtd-handler-objects:

DTDHandler Objects
------------------

:class:`DTDHandler` instances provide the following methods:


.. method:: DTDHandler.notationDecl(name, publicId, systemId)

   Handle a notation declaration event.


.. method:: DTDHandler.unparsedEntityDecl(name, publicId, systemId, ndata)

   Handle an unparsed entity declaration event.
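
A DTD handler is registered on the reader rather than passed to
``parseString``. The following sketch assumes a Python 3 interpreter and the
default ``expat`` reader, which reports declarations from the internal DTD
subset; ``DeclRecorder`` and the sample declarations are invented for the
example.

```python
import io
import xml.sax
from xml.sax.handler import DTDHandler


class DeclRecorder(DTDHandler):
    """Record notation and unparsed entity declarations."""

    def __init__(self):
        self.declarations = []

    def notationDecl(self, name, publicId, systemId):
        self.declarations.append(("notation", name, publicId, systemId))

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.declarations.append(("entity", name, publicId, systemId, ndata))


document = (
    b"<!DOCTYPE doc [\n"
    b'  <!NOTATION gif SYSTEM "image/gif">\n'
    b'  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>\n'
    b"]>\n"
    b"<doc/>"
)

dtd_recorder = DeclRecorder()
dtd_parser = xml.sax.make_parser()
dtd_parser.setDTDHandler(dtd_recorder)
dtd_parser.parse(io.BytesIO(document))
```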

.. _entity-resolver-objects:

EntityResolver Objects
----------------------


.. method:: EntityResolver.resolveEntity(publicId, systemId)

   Resolve the system identifier of an entity and return either the system
   identifier to read from as a string, or an InputSource to read from. The default
   implementation returns *systemId*.
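
One possible use of ``resolveEntity`` is to substitute empty content for
entities that would otherwise be fetched over the network. The sketch below
assumes a Python 3 interpreter; ``LocalOnlyResolver`` and its policy of
treating ``http:`` and ``https:`` identifiers as remote are assumptions made
for the example. A resolver like this would be registered with the reader's
``setEntityResolver`` method.

```python
import io
from xml.sax.handler import EntityResolver
from xml.sax.xmlreader import InputSource


class LocalOnlyResolver(EntityResolver):
    """Return empty content for remote entities, pass others through."""

    def resolveEntity(self, publicId, systemId):
        if systemId.startswith(("http:", "https:")):
            # Returning an InputSource lets the resolver supply the data.
            source = InputSource(systemId)
            source.setByteStream(io.BytesIO(b""))
            return source
        # Returning the system identifier keeps the default behaviour.
        return systemId


resolver = LocalOnlyResolver()
local_result = resolver.resolveEntity(None, "chapter1.xml")
remote_result = resolver.resolveEntity(None, "http://example.com/ext.dtd")
```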

.. _sax-error-handler:

ErrorHandler Objects
--------------------

Objects with this interface are used to receive error and warning information
from the :class:`XMLReader`. If you create an object that implements this
interface and register it with your :class:`XMLReader`, the parser
will call the methods in your object to report all warnings and errors. There
are three levels of errors available: warnings, (possibly) recoverable errors,
and unrecoverable errors. All methods take a :exc:`SAXParseException` as the
only parameter. Errors and warnings may be converted to an exception by raising
the passed-in exception object.


.. method:: ErrorHandler.error(exception)

   Called when the parser encounters a recoverable error. If this method does not
   raise an exception, parsing may continue, but further document information
   should not be expected by the application. Allowing the parser to continue may
   allow additional errors to be discovered in the input document.


.. method:: ErrorHandler.fatalError(exception)

   Called when the parser encounters an error it cannot recover from; parsing is
   expected to terminate when this method returns.


.. method:: ErrorHandler.warning(exception)

   Called when the parser presents minor warning information to the application.
   Parsing is expected to continue when this method returns, and document
   information will continue to be passed to the application. Raising an exception
   in this method will cause parsing to end.
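
The three error levels can be handled differently in one object. The sketch
below, assuming a Python 3 interpreter and the default ``expat`` reader,
collects warnings and recoverable errors but raises on fatal errors;
``CollectingErrorHandler`` is a hypothetical name and the input is
deliberately not well-formed.

```python
import xml.sax
from xml.sax import SAXParseException
from xml.sax.handler import ContentHandler, ErrorHandler


class CollectingErrorHandler(ErrorHandler):
    """Collect warnings and recoverable errors; raise on fatal errors."""

    def __init__(self):
        self.warnings = []
        self.errors = []

    def warning(self, exception):
        self.warnings.append(exception)

    def error(self, exception):
        self.errors.append(exception)

    def fatalError(self, exception):
        # Raising the passed-in exception converts the event to an exception.
        raise exception


error_handler = CollectingErrorHandler()
try:
    # The closing tag does not match the open <b> element.
    xml.sax.parseString(b"<a><b></a>", ContentHandler(), error_handler)
    well_formed = True
    failure_line = None
except SAXParseException as exc:
    well_formed = False
    failure_line = exc.getLineNumber()
```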