/Doc/library/xml.sax.reader.rst
http://unladen-swallow.googlecode.com/ · ReStructuredText · 386 lines · 207 code · 179 blank · 0 comment · 0 complexity · c925258f8d5ac3691b57594ab4c50899 MD5 · raw file
- :mod:`xml.sax.xmlreader` --- Interface for XML parsers
- ======================================================
- .. module:: xml.sax.xmlreader
- :synopsis: Interface which SAX-compliant XML parsers must implement.
- .. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
- .. sectionauthor:: Martin v. Lรถwis <martin@v.loewis.de>
- .. versionadded:: 2.0
- SAX parsers implement the :class:`XMLReader` interface. They are implemented in
- a Python module, which must provide a function :func:`create_parser`. This
- function is invoked by :func:`xml.sax.make_parser` with no arguments to create
- a new parser object.
- .. class:: XMLReader()
- Base class which can be inherited by SAX parsers.
- .. class:: IncrementalParser()
- In some cases, it is desirable not to parse an input source at once, but to feed
- chunks of the document as they get available. Note that the reader will normally
- not read the entire file, but read it in chunks as well; still :meth:`parse`
- won't return until the entire document is processed. So these interfaces should
- be used if the blocking behaviour of :meth:`parse` is not desirable.
- When the parser is instantiated it is ready to begin accepting data from the
- feed method immediately. After parsing has been finished with a call to close
- the reset method must be called to make the parser ready to accept new data,
- either from feed or using the parse method.
- Note that these methods must *not* be called during parsing, that is, after
- parse has been called and before it returns.
- By default, the class also implements the parse method of the XMLReader
- interface using the feed, close and reset methods of the IncrementalParser
- interface as a convenience to SAX 2.0 driver writers.
- .. class:: Locator()
- Interface for associating a SAX event with a document location. A locator object
- will return valid results only during calls to DocumentHandler methods; at any
- other time, the results are unpredictable. If information is not available,
- methods may return ``None``.
- .. class:: InputSource([systemId])
- Encapsulation of the information needed by the :class:`XMLReader` to read
- entities.
- This class may include information about the public identifier, system
- identifier, byte stream (possibly with character encoding information) and/or
- the character stream of an entity.
- Applications will create objects of this class for use in the
- :meth:`XMLReader.parse` method and for returning from
- EntityResolver.resolveEntity.
- An :class:`InputSource` belongs to the application, the :class:`XMLReader` is
- not allowed to modify :class:`InputSource` objects passed to it from the
- application, although it may make copies and modify those.
- .. class:: AttributesImpl(attrs)
- This is an implementation of the :class:`Attributes` interface (see section
- :ref:`attributes-objects`). This is a dictionary-like object which
- represents the element attributes in a :meth:`startElement` call. In addition
- to the most useful dictionary operations, it supports a number of other
- methods as described by the interface. Objects of this class should be
- instantiated by readers; *attrs* must be a dictionary-like object containing
- a mapping from attribute names to attribute values.
- .. class:: AttributesNSImpl(attrs, qnames)
- Namespace-aware variant of :class:`AttributesImpl`, which will be passed to
- :meth:`startElementNS`. It is derived from :class:`AttributesImpl`, but
- understands attribute names as two-tuples of *namespaceURI* and
- *localname*. In addition, it provides a number of methods expecting qualified
- names as they appear in the original document. This class implements the
- :class:`AttributesNS` interface (see section :ref:`attributes-ns-objects`).
- .. _xmlreader-objects:
- XMLReader Objects
- -----------------
- The :class:`XMLReader` interface supports the following methods:
- .. method:: XMLReader.parse(source)
- Process an input source, producing SAX events. The *source* object can be a
- system identifier (a string identifying the input source -- typically a file
- name or an URL), a file-like object, or an :class:`InputSource` object. When
- :meth:`parse` returns, the input is completely processed, and the parser object
- can be discarded or reset. As a limitation, the current implementation only
- accepts byte streams; processing of character streams is for further study.
- .. method:: XMLReader.getContentHandler()
- Return the current :class:`ContentHandler`.
- .. method:: XMLReader.setContentHandler(handler)
- Set the current :class:`ContentHandler`. If no :class:`ContentHandler` is set,
- content events will be discarded.
- .. method:: XMLReader.getDTDHandler()
- Return the current :class:`DTDHandler`.
- .. method:: XMLReader.setDTDHandler(handler)
- Set the current :class:`DTDHandler`. If no :class:`DTDHandler` is set, DTD
- events will be discarded.
- .. method:: XMLReader.getEntityResolver()
- Return the current :class:`EntityResolver`.
- .. method:: XMLReader.setEntityResolver(handler)
- Set the current :class:`EntityResolver`. If no :class:`EntityResolver` is set,
- attempts to resolve an external entity will result in opening the system
- identifier for the entity, and fail if it is not available.
- .. method:: XMLReader.getErrorHandler()
- Return the current :class:`ErrorHandler`.
- .. method:: XMLReader.setErrorHandler(handler)
- Set the current error handler. If no :class:`ErrorHandler` is set, errors will
- be raised as exceptions, and warnings will be printed.
- .. method:: XMLReader.setLocale(locale)
- Allow an application to set the locale for errors and warnings.
- SAX parsers are not required to provide localization for errors and warnings; if
- they cannot support the requested locale, however, they must throw a SAX
- exception. Applications may request a locale change in the middle of a parse.
- .. method:: XMLReader.getFeature(featurename)
- Return the current setting for feature *featurename*. If the feature is not
- recognized, :exc:`SAXNotRecognizedException` is raised. The well-known
- featurenames are listed in the module :mod:`xml.sax.handler`.
- .. method:: XMLReader.setFeature(featurename, value)
- Set the *featurename* to *value*. If the feature is not recognized,
- :exc:`SAXNotRecognizedException` is raised. If the feature or its setting is not
- supported by the parser, *SAXNotSupportedException* is raised.
- .. method:: XMLReader.getProperty(propertyname)
- Return the current setting for property *propertyname*. If the property is not
- recognized, a :exc:`SAXNotRecognizedException` is raised. The well-known
- propertynames are listed in the module :mod:`xml.sax.handler`.
- .. method:: XMLReader.setProperty(propertyname, value)
- Set the *propertyname* to *value*. If the property is not recognized,
- :exc:`SAXNotRecognizedException` is raised. If the property or its setting is
- not supported by the parser, *SAXNotSupportedException* is raised.
- .. _incremental-parser-objects:
- IncrementalParser Objects
- -------------------------
- Instances of :class:`IncrementalParser` offer the following additional methods:
- .. method:: IncrementalParser.feed(data)
- Process a chunk of *data*.
- .. method:: IncrementalParser.close()
- Assume the end of the document. That will check well-formedness conditions that
- can be checked only at the end, invoke handlers, and may clean up resources
- allocated during parsing.
- .. method:: IncrementalParser.reset()
- This method is called after close has been called to reset the parser so that it
- is ready to parse new documents. The results of calling parse or feed after
- close without calling reset are undefined.
- .. _locator-objects:
- Locator Objects
- ---------------
- Instances of :class:`Locator` provide these methods:
- .. method:: Locator.getColumnNumber()
- Return the column number where the current event ends.
- .. method:: Locator.getLineNumber()
- Return the line number where the current event ends.
- .. method:: Locator.getPublicId()
- Return the public identifier for the current event.
- .. method:: Locator.getSystemId()
- Return the system identifier for the current event.
- .. _input-source-objects:
- InputSource Objects
- -------------------
- .. method:: InputSource.setPublicId(id)
- Sets the public identifier of this :class:`InputSource`.
- .. method:: InputSource.getPublicId()
- Returns the public identifier of this :class:`InputSource`.
- .. method:: InputSource.setSystemId(id)
- Sets the system identifier of this :class:`InputSource`.
- .. method:: InputSource.getSystemId()
- Returns the system identifier of this :class:`InputSource`.
- .. method:: InputSource.setEncoding(encoding)
- Sets the character encoding of this :class:`InputSource`.
- The encoding must be a string acceptable for an XML encoding declaration (see
- section 4.3.3 of the XML recommendation).
- The encoding attribute of the :class:`InputSource` is ignored if the
- :class:`InputSource` also contains a character stream.
- .. method:: InputSource.getEncoding()
- Get the character encoding of this InputSource.
- .. method:: InputSource.setByteStream(bytefile)
- Set the byte stream (a Python file-like object which does not perform
- byte-to-character conversion) for this input source.
- The SAX parser will ignore this if there is also a character stream specified,
- but it will use a byte stream in preference to opening a URI connection itself.
- If the application knows the character encoding of the byte stream, it should
- set it with the setEncoding method.
- .. method:: InputSource.getByteStream()
- Get the byte stream for this input source.
- The getEncoding method will return the character encoding for this byte stream,
- or None if unknown.
- .. method:: InputSource.setCharacterStream(charfile)
- Set the character stream for this input source. (The stream must be a Python 1.6
- Unicode-wrapped file-like that performs conversion to Unicode strings.)
- If there is a character stream specified, the SAX parser will ignore any byte
- stream and will not attempt to open a URI connection to the system identifier.
- .. method:: InputSource.getCharacterStream()
- Get the character stream for this input source.
- .. _attributes-objects:
- The :class:`Attributes` Interface
- ---------------------------------
- :class:`Attributes` objects implement a portion of the mapping protocol,
- including the methods :meth:`copy`, :meth:`get`, :meth:`has_key`, :meth:`items`,
- :meth:`keys`, and :meth:`values`. The following methods are also provided:
- .. method:: Attributes.getLength()
- Return the number of attributes.
- .. method:: Attributes.getNames()
- Return the names of the attributes.
- .. method:: Attributes.getType(name)
- Returns the type of the attribute *name*, which is normally ``'CDATA'``.
- .. method:: Attributes.getValue(name)
- Return the value of attribute *name*.
- .. getValueByQName, getNameByQName, getQNameByName, getQNames available
- .. here already, but documented only for derived class.
- .. _attributes-ns-objects:
- The :class:`AttributesNS` Interface
- -----------------------------------
- This interface is a subtype of the :class:`Attributes` interface (see section
- :ref:`attributes-objects`). All methods supported by that interface are also
- available on :class:`AttributesNS` objects.
- The following methods are also available:
- .. method:: AttributesNS.getValueByQName(name)
- Return the value for a qualified name.
- .. method:: AttributesNS.getNameByQName(name)
- Return the ``(namespace, localname)`` pair for a qualified *name*.
- .. method:: AttributesNS.getQNameByName(name)
- Return the qualified name for a ``(namespace, localname)`` pair.
- .. method:: AttributesNS.getQNames()
- Return the qualified names of all attributes.