<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "DTD/xhtml1-strict.dtd">
<html>
<head>
<title>API docs for “pymine.beautifulsoup.BeautifulSoup”</title>
<meta content="text/html;charset=utf-8" http-equiv="Content-Type" />
<link href="apidocs.css" type="text/css" rel="stylesheet" />
</head>
<body>
<h1 class="module">Module pymine.beautifulsoup.BeautifulSoup</h1>
<p>
<span id="part">Part of <a href="pymine.html">pymine</a>.<a href="pymine.beautifulsoup.html">beautifulsoup</a></span>
</p>
<div>
</div>
<pre>Beautiful Soup
Elixir and Tonic
"The Screen-Scraper's Friend"
http://www.crummy.com/software/BeautifulSoup/

Beautiful Soup parses a (possibly invalid) XML or HTML document into a
tree representation. It provides methods and Pythonic idioms that make
it easy to navigate, search, and modify the tree.

A well-formed XML/HTML document yields a well-formed data
structure. An ill-formed XML/HTML document yields a correspondingly
ill-formed data structure. If your document is only locally
well-formed, you can use this library to find and process the
well-formed part of it.

Beautiful Soup works with Python 2.2 and up. It has no external
dependencies, but you'll have more success at converting data to UTF-8
if you also install these three packages:

* chardet, for auto-detecting character encodings
  http://chardet.feedparser.org/
* cjkcodecs and iconv_codec, which add more encodings to the ones supported
  by stock Python.
  http://cjkpython.i18n.org/

Beautiful Soup defines classes for two main parsing strategies:

* BeautifulStoneSoup, for parsing XML, SGML, or your domain-specific
  language that kind of looks like XML.
* BeautifulSoup, for parsing run-of-the-mill HTML code, be it valid
  or invalid. This class has web browser-like heuristics for
  obtaining a sensible parse tree in the face of common HTML errors.

Beautiful Soup also defines a class (UnicodeDammit) for autodetecting
the encoding of an HTML or XML document, and converting it to
Unicode. Much of this code is taken from Mark Pilgrim's Universal Feed Parser.

For more than you ever wanted to know about Beautiful Soup, see the
documentation:
http://www.crummy.com/software/BeautifulSoup/documentation.html

Here, have some legalese:

Copyright (c) 2004-2009, Leonard Richardson

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.

  * Redistributions in binary form must reproduce the above
    copyright notice, this list of conditions and the following
    disclaimer in the documentation and/or other materials provided
    with the distribution.

  * Neither the name of the the Beautiful Soup Consortium and All
    Night Kosher Bakery nor the names of its contributors may be
    used to endorse or promote products derived from this software
    without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE, DAMMIT.</pre>
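<p>The two parsing strategies described in the module docstring differ mainly in how much the parser assumes about tag behavior. The sketch below illustrates the kind of browser-like heuristics the <code>BeautifulSoup</code> class is described as applying. It is not Beautiful Soup's own code: it assumes Python 3 and the standard <code>html.parser</code> module, and the names <code>VOID_TAGS</code>, <code>RESET_TAGS</code>, <code>Node</code>, <code>LenientTreeBuilder</code> and <code>parse</code>, as well as the deliberately small tag sets, are hypothetical. Three rules are applied. Tags that have no closing tag (such as <code>br</code>) are closed as soon as they are seen. A new <code>p</code> or <code>li</code> implicitly closes the previous open one unless a boundary tag lies between them. End tags with no matching open tag are ignored.</p>

```python
from html.parser import HTMLParser

# Hypothetical, deliberately small rule tables for illustration only.
VOID_TAGS = frozenset({"br", "hr", "img", "input", "meta", "link"})
# A new tag of this name closes the previous open one, unless a listed
# boundary tag lies between them on the stack of open tags.
RESET_TAGS = {"p": frozenset({"div", "table"}), "li": frozenset({"ul", "ol"})}


class Node:
    def __init__(self, name, attrs=()):
        self.name = name
        self.attrs = dict(attrs)
        self.children = []  # mix of Node objects and text strings

    def __repr__(self):
        return f"Node({self.name!r}, {self.children!r})"


class LenientTreeBuilder(HTMLParser):
    """Builds a tree from possibly invalid HTML using a few browser-like rules."""

    def __init__(self, void_tags=VOID_TAGS, reset_tags=RESET_TAGS):
        super().__init__(convert_charrefs=True)
        self.void_tags = void_tags
        self.reset_tags = reset_tags
        self.root = Node("[document]")
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        if tag in self.reset_tags:
            self._close(tag, stop_at=self.reset_tags[tag])
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in self.void_tags:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # Explicitly self-closed tag (<br/>): add it but never open it.
        self.stack[-1].children.append(Node(tag, attrs))

    def handle_endtag(self, tag):
        # Closes the innermost open tag of this name and everything inside it;
        # an end tag with no matching open tag is silently ignored.
        self._close(tag)

    def handle_data(self, data):
        self.stack[-1].children.append(data)

    def _close(self, tag, stop_at=frozenset()):
        for i in range(len(self.stack) - 1, 0, -1):
            name = self.stack[i].name
            if name == tag:
                del self.stack[i:]
                return
            if name in stop_at:
                return


def parse(markup, **rules):
    builder = LenientTreeBuilder(**rules)
    builder.feed(markup)
    builder.close()
    return builder.root
```

<p>With the default rules, <code>parse("&lt;p&gt;one&lt;p&gt;two&lt;br&gt;three&lt;/p&gt;")</code> yields two sibling <code>p</code> nodes rather than one paragraph nested inside another. Passing <code>void_tags=frozenset(), reset_tags={}</code> leaves only explicit end tags to close elements, which is roughly the XML-oriented behavior.</p>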
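<p>The module docstring describes UnicodeDammit as autodetecting a document's encoding and converting it to Unicode. The function below is a simplified Python 3 sketch of that idea, not UnicodeDammit's actual algorithm. The name <code>to_unicode</code>, the candidate order (byte-order mark, then an encoding declared in an XML declaration or HTML <code>meta</code> tag, then UTF-8, then windows-1252) and the regular expressions are all assumptions made for illustration. It does not use chardet or the CJK codecs mentioned in the docstring.</p>

```python
import codecs
import re

# Byte-order marks, with UTF-32 checked before UTF-16 because the UTF-32-LE
# BOM starts with the same two bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Hypothetical patterns for in-document encoding declarations.
DECLARATIONS = [
    re.compile(rb"""<\?xml[^>]*encoding=["']([\w.-]+)["']""", re.IGNORECASE),
    re.compile(rb"""<meta[^>]*charset=["']?([\w.-]+)""", re.IGNORECASE),
]


def to_unicode(data: bytes) -> tuple:
    """Return (text, encoding_used) for a byte string."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            try:
                return data[len(bom):].decode(encoding), encoding
            except UnicodeDecodeError:
                break  # misleading BOM: fall back to the other strategies

    candidates = []
    for pattern in DECLARATIONS:
        match = pattern.search(data)
        if match:
            candidates.append(match.group(1).decode("ascii").lower())
    candidates += ["utf-8", "windows-1252"]

    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except (LookupError, UnicodeDecodeError):
            continue  # unknown codec name, or bytes invalid in this encoding

    # Nothing decoded cleanly: substitute U+FFFD for the undecodable bytes.
    return data.decode("utf-8", errors="replace"), "utf-8"
```

<p>Trying windows-1252 last is a common practical fallback. Nearly every byte sequence decodes under it, but for documents in other single-byte encodings it can silently produce the wrong characters.</p>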
<div id="splitTables">
<table class="children sortable" id="id44">
<tr class="function">
  <td>Function</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.html#_match_css_class">_match_css_class</a></td>
  <td><span>Build a RE to match the given CSS class.</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.PageElement.html">PageElement</a></td>
  <td><span>Contains the navigational information for some part of the page</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.NavigableString.html">NavigableString</a></td>
  <td><span class="undocumented">No class docstring; 2/5 methods documented</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.CData.html">CData</a></td>
  <td><span class="undocumented">Undocumented</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.ProcessingInstruction.html">ProcessingInstruction</a></td>
  <td><span class="undocumented">Undocumented</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.Comment.html">Comment</a></td>
  <td><span class="undocumented">Undocumented</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.Declaration.html">Declaration</a></td>
  <td><span class="undocumented">Undocumented</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.Tag.html">Tag</a></td>
  <td><span>Represents a found HTML tag with its attributes and contents.</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.SoupStrainer.html">SoupStrainer</a></td>
  <td><span>Encapsulates a number of ways of matching a markup element (tag or text).</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.ResultSet.html">ResultSet</a></td>
  <td><span>A ResultSet is just a list that keeps track of the SoupStrainer that created it.</span></td>
</tr>
<tr class="function">
  <td>Function</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.html#buildTagMap">buildTagMap</a></td>
  <td><span>Turns a list of maps, lists, or scalars into a single map.</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.BeautifulStoneSoup.html">BeautifulStoneSoup</a></td>
  <td><span>This class contains the basic parser and search code.</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.BeautifulSoup.html">BeautifulSoup</a></td>
  <td><span>This parser knows a number of facts about HTML, such as which tags have no closing tag.</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.StopParsing.html">StopParsing</a></td>
  <td><span class="undocumented">Undocumented</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.ICantBelieveItsBeautifulSoup.html">ICantBelieveItsBeautifulSoup</a></td>
  <td><span>The BeautifulSoup class is oriented towards skipping over common HTML errors like unclosed tags.</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.MinimalSoup.html">MinimalSoup</a></td>
  <td><span>The MinimalSoup class is for parsing HTML that contains pathologically bad markup.</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.BeautifulSOAP.html">BeautifulSOAP</a></td>
  <td><span>This class will push a tag with only a single string child into the tag's parent as an attribute.</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.RobustXMLParser.html">RobustXMLParser</a></td>
  <td><span class="undocumented">Undocumented</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.RobustHTMLParser.html">RobustHTMLParser</a></td>
  <td><span class="undocumented">Undocumented</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.RobustWackAssHTMLParser.html">RobustWackAssHTMLParser</a></td>
  <td><span class="undocumented">Undocumented</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.RobustInsanelyWackAssHTMLParser.html">RobustInsanelyWackAssHTMLParser</a></td>
  <td><span class="undocumented">Undocumented</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.SimplifyingSOAPParser.html">SimplifyingSOAPParser</a></td>
  <td><span class="undocumented">Undocumented</span></td>
</tr>
<tr class="class">
  <td>Class</td>
  <td><a href="pymine.beautifulsoup.BeautifulSoup.UnicodeDammit.html">UnicodeDammit</a></td>
  <td><span>A class for detecting the encoding of a *ML document and converting it to a Unicode string.</span></td>
</tr>
</table>
</div>
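<p>The <code>SoupStrainer</code> and <code>ResultSet</code> rows above describe a matcher object and a list that remembers which matcher produced it. The sketch below illustrates that pairing in Python 3. It is not the module's implementation. The names <code>matches</code>, <code>Strainer</code> and <code>ResultList</code> are hypothetical. The set of accepted criteria is also an assumption chosen to show the general idea: <code>None</code> for no constraint, <code>True</code> for "must be present", an exact string, a compiled regular expression, a list of alternatives, or a callable.</p>

```python
import re
from typing import Any, Optional


def matches(value: Optional[str], criterion: Any) -> bool:
    """Check one tag name or attribute value against one criterion."""
    if criterion is None:
        return True                      # no constraint at all
    if criterion is True:
        return value is not None         # the value merely has to exist
    if value is None:
        return False
    if callable(criterion):
        return bool(criterion(value))    # arbitrary predicate
    if isinstance(criterion, re.Pattern):
        return criterion.search(value) is not None
    if isinstance(criterion, (list, tuple, set, frozenset)):
        return any(matches(value, c) for c in criterion)
    return value == criterion            # plain string: exact match


class Strainer:
    """Hypothetical SoupStrainer-like matcher for a tag name plus attributes."""

    def __init__(self, name: Any = None, **attrs: Any):
        self.name = name
        self.attrs = attrs

    def search(self, tag_name: str, tag_attrs: dict) -> bool:
        if not matches(tag_name, self.name):
            return False
        return all(matches(tag_attrs.get(key), crit)
                   for key, crit in self.attrs.items())


class ResultList(list):
    """A plain list that also records the strainer that produced it."""

    def __init__(self, source: Strainer, items=()):
        super().__init__(items)
        self.source = source
```

<p>For example, <code>Strainer("a", href=True)</code> accepts only <code>a</code> tags that carry an <code>href</code> attribute. Because the result is still a list, callers can index and iterate it as usual, while its <code>source</code> attribute keeps the query available for later inspection.</p>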
<div class="function">
<a name="pymine.beautifulsoup.BeautifulSoup._match_css_class"></a>
<a name="_match_css_class"></a>
<div class="functionHeader">
def _match_css_class(str):
</div>
<div class="functionBody">
<div>Build a RE to match the given CSS class.<table class="fieldTable"></table></div>
</div>
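<p>A class attribute can hold several whitespace-separated names, so a matcher for one CSS class must look for a whole token rather than a substring. The following is a minimal Python 3 sketch of such a builder, not necessarily this function's exact code. It is named <code>match_css_class</code> and takes <code>css_class</code>, which avoids shadowing the built-in <code>str</code>. It also escapes the name with <code>re.escape</code> so that characters like <code>.</code> or <code>+</code> match literally. Both choices are assumptions of this sketch.</p>

```python
import re


def match_css_class(css_class: str) -> "re.Pattern[str]":
    """Build a regex that finds css_class as a whole token in a class attribute."""
    # The name must be preceded by the start of the string or whitespace and
    # followed by whitespace or the end of the string, so "nav" does not
    # match inside "navbar" or "sidenav".
    return re.compile(r"(?:^|\s)" + re.escape(css_class) + r"(?:\s|$)")
```

<p>Call <code>.search()</code> on the result with the tag's full class attribute value, for example <code>match_css_class("nav").search("top nav main")</code>.</p>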
</div><div class="function">
<a name="pymine.beautifulsoup.BeautifulSoup.buildTagMap"></a>
<a name="buildTagMap"></a>
<div class="functionHeader">
def buildTagMap(default, *args):
</div>
<div class="functionBody">
<div>Turns a list of maps, lists, or scalars into a single map. Used to build
the SELF_CLOSING_TAGS, NESTABLE_TAGS, and NESTING_RESET_TAGS maps out of
lists and partial maps.<table class="fieldTable"></table></div>
</div>
</div>
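<p>The merging rule described above can be sketched in Python 3 as follows. This is an illustrative reimplementation, not the module's source. It treats strings explicitly as scalars, because in Python 3 a <code>str</code> is iterable and a naive "is it iterable?" test would split a single tag name into characters. The snake_case name <code>build_tag_map</code> and the tag names in the tests are placeholders, not the module's actual <code>SELF_CLOSING_TAGS</code> or <code>NESTABLE_TAGS</code> contents.</p>

```python
from typing import Any, Dict


def build_tag_map(default: Any, *args: Any) -> Dict[Any, Any]:
    """Merge maps, lists, and scalars into one dict; non-map entries get `default`."""
    built: Dict[Any, Any] = {}
    for portion in args:
        if hasattr(portion, "items"):
            # A map: copy its key/value pairs as-is (later arguments win).
            built.update(portion)
        elif isinstance(portion, (str, bytes)) or not hasattr(portion, "__iter__"):
            # A scalar, including a single tag name: map it to the default.
            built[portion] = default
        else:
            # A list or other iterable: map every item to the default.
            for key in portion:
                built[key] = default
    return built
```

<p>For example, <code>build_tag_map(None, ["br", "hr"], "img")</code> maps all three names to <code>None</code>. In this sketch, when several arguments supply the same key, the later argument wins.</p>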
<address>
<a href="index.html">API Documentation</a> for pymine, generated by <a href="http://codespeak.net/~mwh/pydoctor/">pydoctor</a> at 2010-04-07 23:15:24.
</address>
</body>
</html>