:mod:`xml.dom.minidom` --- Lightweight DOM implementation
=========================================================

.. module:: xml.dom.minidom
   :synopsis: Lightweight Document Object Model (DOM) implementation.
.. moduleauthor:: Paul Prescod <email@example.com>
.. sectionauthor:: Paul Prescod <firstname.lastname@example.org>
.. sectionauthor:: Martin v. Löwis <email@example.com>


.. versionadded:: 2.0

:mod:`xml.dom.minidom` is a light-weight implementation of the Document Object
Model interface.  It is intended to be simpler than the full DOM and also
significantly smaller.

DOM applications typically start by parsing some XML into a DOM.  With
:mod:`xml.dom.minidom`, this is done through the parse functions::

   from xml.dom.minidom import parse, parseString

   dom1 = parse('c:\\temp\\mydata.xml') # parse an XML file by name

   datasource = open('c:\\temp\\mydata.xml')
   dom2 = parse(datasource)   # parse an open file

   dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')

The :func:`parse` function can take either a filename or an open file object.


.. function:: parse(filename_or_file[, parser[, bufsize]])

   Return a :class:`Document` from the given input. *filename_or_file* may be
   either a file name, or a file-like object. *parser*, if given, must be a SAX2
   parser object. This function will change the document handler of the parser and
   activate namespace support; other parser configuration (like setting an entity
   resolver) must have been done in advance.

If you have XML in a string, you can use the :func:`parseString` function
instead:


.. function:: parseString(string[, parser])

   Return a :class:`Document` that represents the *string*. This function creates a
   :class:`StringIO` object for the string and passes that on to :func:`parse`.

Both functions return a :class:`Document` object representing the content of the
document.
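
The two entry points can be compared side by side.  The following is a minimal
sketch rather than a definitive recipe.  It assumes a Python 3 interpreter and
uses an in-memory :class:`io.BytesIO` stream in place of a real file, so the
stream and the ``XML_TEXT`` name are illustrative assumptions, not part of the
module.

```python
import io
from xml.dom.minidom import parse, parseString

# Sample document; the name XML_TEXT is only used for this illustration.
XML_TEXT = '<myxml>Some data<empty/> some more data</myxml>'

# parseString() takes the XML text directly.
dom_a = parseString(XML_TEXT)

# parse() takes a file name or a file-like object.  An in-memory byte
# stream stands in for an open file here (an assumption of this sketch).
dom_b = parse(io.BytesIO(XML_TEXT.encode('utf-8')))

root_a = dom_a.documentElement
root_b = dom_b.documentElement

# Both calls build an equivalent tree: a text node, an <empty/> element,
# and another text node under the root element.
print(root_a.tagName, root_b.tagName, len(root_a.childNodes))
```

Either way the result is a :class:`Document`, so the rest of a program does not
need to know which function produced it.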

What the :func:`parse` and :func:`parseString` functions do is connect an XML
parser with a "DOM builder" that can accept parse events from any SAX parser and
convert them into a DOM tree.  The names of the functions are perhaps misleading,
but are easy to grasp when learning the interfaces.  The parsing of the document
will be completed before these functions return; it's simply that these
functions do not provide a parser implementation themselves.

You can also create a :class:`Document` by calling a method on a "DOM
Implementation" object.  You can get this object by calling the
:func:`getDOMImplementation` function in either the :mod:`xml.dom` package or the
:mod:`xml.dom.minidom` module.  Using the implementation from the
:mod:`xml.dom.minidom` module will always return a :class:`Document` instance
from the minidom implementation, while the version from :mod:`xml.dom` may
provide an alternate implementation (this is likely if you have the `PyXML
package <http://pyxml.sourceforge.net/>`_ installed).  Once you have a
:class:`Document`, you can add child nodes to it to populate the DOM::

   from xml.dom.minidom import getDOMImplementation

   impl = getDOMImplementation()

   newdoc = impl.createDocument(None, "some_tag", None)
   top_element = newdoc.documentElement
   text = newdoc.createTextNode('Some textual content.')
   top_element.appendChild(text)

Once you have a DOM document object, you can access the parts of your XML
document through its properties and methods.  These properties are defined in
the DOM specification.  The main property of the document object is the
:attr:`documentElement` property.  It gives you the main element in the XML
document: the one that holds all others.  Here is an example program::

   dom3 = parseString("<myxml>Some data</myxml>")
   assert dom3.documentElement.tagName == "myxml"

When you are finished with a DOM, you should clean it up.
This is necessary
because some versions of Python do not support garbage collection of objects
that refer to each other in a cycle.  Until this restriction is removed from all
versions of Python, it is safest to write your code as if cycles would not be
cleaned up.

The way to clean up a DOM is to call its :meth:`unlink` method::

   dom1.unlink()
   dom2.unlink()
   dom3.unlink()

:meth:`unlink` is a :mod:`xml.dom.minidom`\ -specific extension to the DOM API.
After calling :meth:`unlink` on a node, the node and its descendants are
essentially useless.


.. seealso::

   `Document Object Model (DOM) Level 1 Specification <http://www.w3.org/TR/REC-DOM-Level-1/>`_
      The W3C recommendation for the DOM supported by :mod:`xml.dom.minidom`.


.. _minidom-objects:

DOM Objects
-----------

The definition of the DOM API for Python is given as part of the :mod:`xml.dom`
module documentation.  This section lists the differences between the API and
:mod:`xml.dom.minidom`.


.. method:: Node.unlink()

   Break internal references within the DOM so that it will be garbage collected on
   versions of Python without cyclic GC.  Even when cyclic GC is available, using
   this can make large amounts of memory available sooner, so calling this on DOM
   objects as soon as they are no longer needed is good practice.  This only needs
   to be called on the :class:`Document` object, but may be called on child nodes
   to discard children of that node.


.. method:: Node.writexml(writer[, indent=""[, addindent=""[, newl=""[, encoding=""]]]])

   Write XML to the writer object.  The writer should have a :meth:`write` method
   which matches that of the file object interface.  The *indent* parameter is the
   indentation of the current node.  The *addindent* parameter is the incremental
   indentation to use for subnodes of the current one.  The *newl* parameter
   specifies the string to use to terminate lines.

   .. versionchanged:: 2.1
      The optional keyword parameters *indent*, *addindent*, and *newl* were added to
      support pretty output.

   .. versionchanged:: 2.3
      For the :class:`Document` node, an additional keyword argument
      *encoding* can be used to specify the encoding field of the XML header.


.. method:: Node.toxml([encoding])

   Return the XML that the DOM represents as a string.

   With no argument, the XML header does not specify an encoding, and the result is
   a Unicode string if the default encoding cannot represent all characters in the
   document.  Encoding this string in an encoding other than UTF-8 is likely
   incorrect, since UTF-8 is the default encoding of XML.

   With an explicit *encoding* [#]_ argument, the result is a byte string in the
   specified encoding.  It is recommended that this argument always be specified.  To
   avoid :exc:`UnicodeError` exceptions in case of unrepresentable text data, the
   encoding argument should be specified as "utf-8".

   .. versionchanged:: 2.3
      The *encoding* argument was introduced; see :meth:`writexml`.


.. method:: Node.toprettyxml([indent=""[, newl=""[, encoding=""]]])

   Return a pretty-printed version of the document.  *indent* specifies the
   indentation string and defaults to a tab character; *newl* specifies the string
   emitted at the end of each line and defaults to ``\n``.

   .. versionadded:: 2.1

   .. versionchanged:: 2.3
      The *encoding* argument was introduced; see :meth:`writexml`.

The following standard DOM methods have special considerations with
:mod:`xml.dom.minidom`:


.. method:: Node.cloneNode(deep)

   Although this method was present in the version of :mod:`xml.dom.minidom`
   packaged with Python 2.0, it was seriously broken.  This has been corrected for
   subsequent releases.


.. _dom-example:

DOM Example
-----------

This example program is a fairly realistic example of a simple program.  In this
particular case, we do not take much advantage of the flexibility of the DOM.

.. literalinclude:: ../includes/minidom-example.py


.. _minidom-and-dom:

minidom and the DOM standard
----------------------------

The :mod:`xml.dom.minidom` module is essentially a DOM 1.0-compatible DOM with
some DOM 2 features (primarily namespace features).

Usage of the DOM interface in Python is straightforward.  The following mapping
rules apply:

* Interfaces are accessed through instance objects.  Applications should not
  instantiate the classes themselves; they should use the creator functions
  available on the :class:`Document` object.  Derived interfaces support all
  operations (and attributes) from the base interfaces, plus any new operations.

* Operations are used as methods.  Since the DOM uses only :keyword:`in`
  parameters, the arguments are passed in normal order (from left to right).
  There are no optional arguments.  ``void`` operations return ``None``.

* IDL attributes map to instance attributes.  For compatibility with the OMG IDL
  language mapping for Python, an attribute ``foo`` can also be accessed through
  accessor methods :meth:`_get_foo` and :meth:`_set_foo`.  ``readonly``
  attributes must not be changed; this is not enforced at runtime.

* The types ``short int``, ``unsigned int``, ``unsigned long long``, and
  ``boolean`` all map to Python integer objects.

* The type ``DOMString`` maps to Python strings.  :mod:`xml.dom.minidom` supports
  either byte or Unicode strings, but will normally produce Unicode strings.
  Values of type ``DOMString`` may also be ``None`` where allowed to have the IDL
  ``null`` value by the DOM specification from the W3C.

* ``const`` declarations map to variables in their respective scope (e.g.
  ``xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE``); they must not be changed.

* ``DOMException`` is currently not supported in :mod:`xml.dom.minidom`.
  Instead, :mod:`xml.dom.minidom` uses standard Python exceptions such as
  :exc:`TypeError` and :exc:`AttributeError`.

* :class:`NodeList` objects are implemented using Python's built-in list type.
  Starting with Python 2.2, these objects provide the interface defined in the DOM
  specification, but with earlier versions of Python they do not support the
  official API.  They are, however, much more "Pythonic" than the interface
  defined in the W3C recommendations.

The following interfaces have no implementation in :mod:`xml.dom.minidom`:

* :class:`DOMTimeStamp`

* :class:`DocumentType` (added in Python 2.1)

* :class:`DOMImplementation` (added in Python 2.1)

* :class:`CharacterData`

* :class:`CDATASection`

* :class:`Notation`

* :class:`Entity`

* :class:`EntityReference`

* :class:`DocumentFragment`

Most of these reflect information in the XML document that is not of general
utility to most DOM users.

.. rubric:: Footnotes

.. [#] The encoding string included in XML output should conform to the
   appropriate standards.  For example, "UTF-8" is valid, but "UTF8" is
   not.  See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets .
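
As an illustration of the encoding note above and of the serialization methods
listed under DOM Objects, the following is a minimal sketch.  It assumes a
Python 3 interpreter, where :meth:`toxml` returns text when called without
*encoding* and bytes when an encoding is given; the element name and text
content are placeholders chosen for the example.

```python
from xml.dom.minidom import getDOMImplementation

# Build a small document containing a non-ASCII character.
impl = getDOMImplementation()
doc = impl.createDocument(None, "some_tag", None)
doc.documentElement.appendChild(doc.createTextNode("caf\u00e9"))

# Without an encoding the XML header names no encoding and, on the
# assumed Python 3 interpreter, the result is text.
as_text = doc.toxml()

# With an explicit encoding the header names it and the result is bytes.
as_bytes = doc.toxml(encoding="utf-8")

# toprettyxml() accepts the indentation and line terminator to use.
pretty = doc.toprettyxml(indent="  ", newl="\n")

print(as_text)
print(as_bytes)
print(pretty)
```

Passing a standard encoding name keeps the generated XML header valid for other
parsers, which is the point of the footnote.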