/doc/tagger.Reader-class.html


  1. <?xml version="1.0" encoding="ascii"?>
  2. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  3. "DTD/xhtml1-transitional.dtd">
  4. <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  5. <head>
  6. <title>tagger.Reader</title>
  7. <link rel="stylesheet" href="epydoc.css" type="text/css" />
  8. <script type="text/javascript" src="epydoc.js"></script>
  9. </head>
  10. <body bgcolor="white" text="black" link="blue" vlink="#204080"
  11. alink="#204080">
  12. <!-- ==================== NAVIGATION BAR ==================== -->
  13. <table class="navbar" border="0" width="100%" cellpadding="0"
  14. bgcolor="#a0c0ff" cellspacing="0">
  15. <tr valign="middle">
  16. <!-- Tree link -->
  17. <th>&nbsp;&nbsp;&nbsp;<a
  18. href="module-tree.html">Trees</a>&nbsp;&nbsp;&nbsp;</th>
  19. <!-- Index link -->
  20. <th>&nbsp;&nbsp;&nbsp;<a
  21. href="identifier-index.html">Indices</a>&nbsp;&nbsp;&nbsp;</th>
  22. <!-- Help link -->
  23. <th>&nbsp;&nbsp;&nbsp;<a
  24. href="help.html">Help</a>&nbsp;&nbsp;&nbsp;</th>
  25. <!-- Project homepage -->
  26. <th class="navbar" align="right" width="100%">
  27. <table border="0" cellpadding="0" cellspacing="0">
  28. <tr><th class="navbar" align="center"
  29. ><a class="navbar" target="_top" href="http://github.com/apresta/tagger">tagger</a></th>
  30. </tr></table></th>
  31. </tr>
  32. </table>
  33. <table width="100%" cellpadding="0" cellspacing="0">
  34. <tr valign="top">
  35. <td width="100%">
  40. </td>
  41. <td>
  42. <table cellpadding="0" cellspacing="0">
  43. <!-- hide/show private -->
  44. <tr><td align="right"><span class="options">[<a href="javascript:void(0);" class="privatelink"
  45. onclick="toggle_private();">hide&nbsp;private</a>]</span></td></tr>
  46. <tr><td align="right"><span class="options"
  47. >[<a href="frames.html" target="_top">frames</a
  48. >]&nbsp;|&nbsp;<a href="tagger.Reader-class.html"
  49. target="_top">no&nbsp;frames</a>]</span></td></tr>
  50. </table>
  51. </td>
  52. </tr>
  53. </table>
  54. <!-- ==================== CLASS DESCRIPTION ==================== -->
  55. <h1 class="epydoc">Class Reader</h1><p class="nomargin-top"><span class="codelink"><a href="tagger-pysrc.html#Reader">source&nbsp;code</a></span></p>
  56. <center>
  57. <center> <map id="class_hierarchy_for_reader" name="class_hierarchy_for_reader">
  58. <area shape="rect" id="node1" href="extras.HTMLReader-class.html" title="extras.HTMLReader" alt="" coords="188,117,335,144"/>
  59. <area shape="rect" id="node2" href="extras.SimpleReader-class.html" title="extras.SimpleReader" alt="" coords="5,61,157,88"/>
  60. <area shape="rect" id="node3" href="extras.UnicodeReader-class.html" title="extras.UnicodeReader" alt="" coords="181,61,341,88"/>
  61. <area shape="rect" id="node4" href="tagger.Reader-class.html" title="Reader" alt="" coords="137,5,204,32"/>
  62. </map>
  63. <img src="class_hierarchy_for_reader.gif" alt='' usemap="#class_hierarchy_for_reader" ismap="ismap" class="graph-without-title" />
  64. </center>
  65. </center>
  66. <hr />
  67. <p>Class for parsing a string of text to obtain tags.</p>
  68. <p>This base implementation lowercases the string and splits it on
  69. whitespace and punctuation, identifying proper nouns and terminal words.
  70. Subclasses can apply different rules or handle formats other than plain text.</p>
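<p>The splitting described above can be illustrated with a minimal Python sketch. It is not the library's implementation: <code>SketchTag</code>, <code>sketch_read</code> and the capitalisation rule used to flag proper nouns are assumptions made for this example. The patterns are modelled on the class variables listed on this page, with the hyphen escaped and with hex escapes standing in for the angle brackets and the ampersand so that the listing stays valid inside this page.</p>

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the library's tag objects.
SketchTag = namedtuple("SketchTag", ["string", "proper", "terminal"])

# Sentence-level separators: full stops, question marks, line breaks...
MATCH_PARAGRAPHS = re.compile(r"[.?!\t\n\r\f\v]+")
# Phrase-level separators; \x3c and \x3e are the two angle brackets.
MATCH_PHRASES = re.compile(r"[,;:()\[\]{}\x3c\x3e]+")
# Word characters; \x26 is the ampersand, and the hyphen is escaped.
MATCH_WORDS = re.compile(r"[\w\-'_/\x26]+")


def sketch_read(text):
    """Return SketchTag items in the order the words occur in the text."""
    tags = []
    for paragraph in MATCH_PARAGRAPHS.split(text):
        first_in_sentence = True
        for phrase in MATCH_PHRASES.split(paragraph):
            words = MATCH_WORDS.findall(phrase)
            for index, word in enumerate(words):
                # The last word of a phrase is treated as terminal.
                terminal = index == len(words) - 1
                # Assumed rule: a capitalised word is a proper noun
                # unless it opens the sentence.
                proper = word[0].isupper() and not first_in_sentence
                tags.append(SketchTag(word.lower(), proper, terminal))
                first_in_sentence = False
    return tags
```

<p>Under these assumptions, <code>sketch_read("Alice met Bob.")</code> yields the tags <code>alice</code>, <code>met</code> and <code>bob</code> in that order, and only <code>bob</code> is flagged as both proper and terminal.</p>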
  71. <!-- ==================== INSTANCE METHODS ==================== -->
  72. <a name="section-InstanceMethods"></a>
  73. <table class="summary" border="1" cellpadding="3"
  74. cellspacing="0" width="100%" bgcolor="white">
  75. <tr bgcolor="#70b0f0" class="table-header">
  76. <td colspan="2" class="table-header">
  77. <table border="0" cellpadding="0" cellspacing="0" width="100%">
  78. <tr valign="top">
  79. <td align="left"><span class="table-header">Instance Methods</span></td>
  80. <td align="right" valign="top"
  81. ><span class="options">[<a href="#section-InstanceMethods"
  82. class="privatelink" onclick="toggle_private();"
  83. >hide private</a>]</span></td>
  84. </tr>
  85. </table>
  86. </td>
  87. </tr>
  88. <tr>
  89. <td width="15%" align="right" valign="top" class="summary">
  90. <span class="summary-type">&nbsp;</span>
  91. </td><td class="summary">
  92. <table width="100%" cellpadding="0" cellspacing="0" border="0">
  93. <tr>
  94. <td><span class="summary-sig"><a href="tagger.Reader-class.html#__call__" class="summary-sig-name">__call__</a>(<span class="summary-sig-arg">self</span>,
  95. <span class="summary-sig-arg">text</span>)</span><br />
  96. Returns:
  97. a list of tags in the order in which they occur in the text</td>
  98. <td align="right" valign="top">
  99. <span class="codelink"><a href="tagger-pysrc.html#Reader.__call__">source&nbsp;code</a></span>
  100. </td>
  101. </tr>
  102. </table>
  103. </td>
  104. </tr>
  105. <tr>
  106. <td width="15%" align="right" valign="top" class="summary">
  107. <span class="summary-type">&nbsp;</span>
  108. </td><td class="summary">
  109. <table width="100%" cellpadding="0" cellspacing="0" border="0">
  110. <tr>
  111. <td><span class="summary-sig"><a href="tagger.Reader-class.html#preprocess" class="summary-sig-name">preprocess</a>(<span class="summary-sig-arg">self</span>,
  112. <span class="summary-sig-arg">text</span>)</span><br />
  113. Returns:
  114. the processed text</td>
  115. <td align="right" valign="top">
  116. <span class="codelink"><a href="tagger-pysrc.html#Reader.preprocess">source&nbsp;code</a></span>
  117. </td>
  118. </tr>
  119. </table>
  120. </td>
  121. </tr>
  122. </table>
  123. <!-- ==================== CLASS VARIABLES ==================== -->
  124. <a name="section-ClassVariables"></a>
  125. <table class="summary" border="1" cellpadding="3"
  126. cellspacing="0" width="100%" bgcolor="white">
  127. <tr bgcolor="#70b0f0" class="table-header">
  128. <td colspan="2" class="table-header">
  129. <table border="0" cellpadding="0" cellspacing="0" width="100%">
  130. <tr valign="top">
  131. <td align="left"><span class="table-header">Class Variables</span></td>
  132. <td align="right" valign="top"
  133. ><span class="options">[<a href="#section-ClassVariables"
  134. class="privatelink" onclick="toggle_private();"
  135. >hide private</a>]</span></td>
  136. </tr>
  137. </table>
  138. </td>
  139. </tr>
  140. <tr>
  141. <td width="15%" align="right" valign="top" class="summary">
  142. <span class="summary-type">&nbsp;</span>
  143. </td><td class="summary">
  144. <a name="match_apostrophes"></a><span class="summary-name">match_apostrophes</span> = <code title="re.compile(r'`|\xe2\x80\x99')">re.compile(r'`<code class="re-op">|</code>\xe2\x80\x99')</code>
  145. </td>
  146. </tr>
  147. <tr>
  148. <td width="15%" align="right" valign="top" class="summary">
  149. <span class="summary-type">&nbsp;</span>
  150. </td><td class="summary">
  151. <a name="match_paragraphs"></a><span class="summary-name">match_paragraphs</span> = <code title="re.compile(r'[\.\?!\t\n\r\f\v]+')">re.compile(r'<code class="re-group">[</code>\.\?!\t\n\r\f\v<code class="re-group">]</code><code class="re-op">+</code>')</code>
  152. </td>
  153. </tr>
  154. <tr>
  155. <td width="15%" align="right" valign="top" class="summary">
  156. <span class="summary-type">&nbsp;</span>
  157. </td><td class="summary">
  158. <a name="match_phrases"></a><span class="summary-name">match_phrases</span> = <code title="re.compile(r'[,;:\(\)\[\]\{\}&lt;&gt;]+')">re.compile(r'<code class="re-group">[</code>,;:\(\)\[\]\{\}&lt;&gt;<code class="re-group">]</code><code class="re-op">+</code>')</code>
  159. </td>
  160. </tr>
  161. <tr>
  162. <td width="15%" align="right" valign="top" class="summary">
  163. <span class="summary-type">&nbsp;</span>
  164. </td><td class="summary">
  165. <a name="match_words"></a><span class="summary-name">match_words</span> = <code title="re.compile(r'[\w-\'_/&amp;]+')">re.compile(r'<code class="re-group">[</code>\w-\'_/&amp;<code class="re-group">]</code><code class="re-op">+</code>')</code>
  166. </td>
  167. </tr>
  168. </table>
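<p>The escapes in <code>match_apostrophes</code> are the three UTF-8 bytes of U+2019, the right single quotation mark, so the pattern is meant for byte strings. The sketch below shows one way to apply it in Python 3. Replacing both variants with a plain ASCII apostrophe is an assumed use, not something this page states. In a Python 3 <code>str</code> pattern the same escapes would instead match three separate code points.</p>

```python
import re

# Byte-string pattern: a backtick, or the UTF-8 encoding of U+2019.
match_apostrophes = re.compile(br"`|\xe2\x80\x99")

# Sample input containing both variants, encoded as UTF-8 bytes.
raw = "don\u2019t `quote".encode("utf-8")

# Assumed use: normalise either variant to a plain ASCII apostrophe.
normalised = match_apostrophes.sub(b"'", raw)
print(normalised.decode("ascii"))
```

<p>The word pattern shown above may also need its hyphen escaped (<code>\-</code>) before recent Python versions accept it, since a range cannot start from <code>\w</code>.</p>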
  169. <!-- ==================== METHOD DETAILS ==================== -->
  170. <a name="section-MethodDetails"></a>
  171. <table class="details" border="1" cellpadding="3"
  172. cellspacing="0" width="100%" bgcolor="white">
  173. <tr bgcolor="#70b0f0" class="table-header">
  174. <td colspan="2" class="table-header">
  175. <table border="0" cellpadding="0" cellspacing="0" width="100%">
  176. <tr valign="top">
  177. <td align="left"><span class="table-header">Method Details</span></td>
  178. <td align="right" valign="top"
  179. ><span class="options">[<a href="#section-MethodDetails"
  180. class="privatelink" onclick="toggle_private();"
  181. >hide private</a>]</span></td>
  182. </tr>
  183. </table>
  184. </td>
  185. </tr>
  186. </table>
  187. <a name="__call__"></a>
  188. <div>
  189. <table class="details" border="1" cellpadding="3"
  190. cellspacing="0" width="100%" bgcolor="white">
  191. <tr><td>
  192. <table width="100%" cellpadding="0" cellspacing="0" border="0">
  193. <tr valign="top"><td>
  194. <h3 class="epydoc"><span class="sig"><span class="sig-name">__call__</span>(<span class="sig-arg">self</span>,
  195. <span class="sig-arg">text</span>)</span>
  196. <br /><em class="fname">(Call operator)</em>
  197. </h3>
  198. </td><td align="right" valign="top"
  199. ><span class="codelink"><a href="tagger-pysrc.html#Reader.__call__">source&nbsp;code</a></span>&nbsp;
  200. </td>
  201. </tr></table>
  202. <dl class="fields">
  203. <dt>Parameters:</dt>
  204. <dd><ul class="nomargin-top">
  205. <li><strong class="pname"><code>text</code></strong> - the string of text to be tagged</li>
  206. </ul></dd>
  207. <dt>Returns:</dt>
  208. <dd>a list of tags in the order in which they occur in the text</dd>
  209. </dl>
  210. </td></tr></table>
  211. </div>
  212. <a name="preprocess"></a>
  213. <div>
  214. <table class="details" border="1" cellpadding="3"
  215. cellspacing="0" width="100%" bgcolor="white">
  216. <tr><td>
  217. <table width="100%" cellpadding="0" cellspacing="0" border="0">
  218. <tr valign="top"><td>
  219. <h3 class="epydoc"><span class="sig"><span class="sig-name">preprocess</span>(<span class="sig-arg">self</span>,
  220. <span class="sig-arg">text</span>)</span>
  221. </h3>
  222. </td><td align="right" valign="top"
  223. ><span class="codelink"><a href="tagger-pysrc.html#Reader.preprocess">source&nbsp;code</a></span>&nbsp;
  224. </td>
  225. </tr></table>
  226. <dl class="fields">
  227. <dt>Parameters:</dt>
  228. <dd><ul class="nomargin-top">
  229. <li><strong class="pname"><code>text</code></strong> - a string containing the text document, on which any required
  230. transformation is performed before splitting</li>
  231. </ul></dd>
  232. <dt>Returns:</dt>
  233. <dd>the processed text</dd>
  234. </dl>
  235. </td></tr></table>
  236. </div>
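<p>Because <code>preprocess</code> runs before the text is split, a subclass can override it to clean up its input. The sketch below illustrates that pattern with a hypothetical stand-in base class, <code>SketchReader</code>, so that it runs without the library installed. Neither <code>SketchReader</code> nor <code>DehyphenatingReader</code> is part of tagger, and the real <code>Reader.__call__</code> returns tag objects rather than plain strings.</p>

```python
import re


class SketchReader:
    """Hypothetical stand-in for tagger.Reader, reduced to two methods."""

    # Word pattern modelled on match_words; \x26 is the ampersand.
    match_words = re.compile(r"[\w\-'_/\x26]+")

    def preprocess(self, text):
        # Default hook: return the text unchanged.
        return text

    def __call__(self, text):
        # Transform first, then split into lowercase words.
        text = self.preprocess(text)
        return [word.lower() for word in self.match_words.findall(text)]


class DehyphenatingReader(SketchReader):
    """Example override: rejoin words hyphenated across line breaks."""

    broken_word = re.compile(r"-\r?\n")

    def preprocess(self, text):
        return self.broken_word.sub("", text)


reader = DehyphenatingReader()
print(reader("key-\nword extraction"))
```

<p>Keeping the transformation in <code>preprocess</code> leaves the splitting logic untouched, so the same override style would suit other input formats.</p>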
  237. <br />
  259. <table border="0" cellpadding="0" cellspacing="0" width="100%">
  260. <tr>
  261. <td align="left" class="footer">
  262. Generated by Epydoc 3.0.1 on Fri May 13 11:13:08 2011
  263. </td>
  264. <td align="right" class="footer">
  265. <a target="mainFrame" href="http://epydoc.sourceforge.net"
  266. >http://epydoc.sourceforge.net</a>
  267. </td>
  268. </tr>
  269. </table>
  270. <script type="text/javascript">
  271. <!--
  272. // Private objects are initially displayed (because if
  273. // javascript is turned off then we want them to be
  274. // visible); but by default, we want to hide them. So hide
  275. // them unless we have a cookie that says to show them.
  276. checkCookie();
  277. // -->
  278. </script>
  279. </body>
  280. </html>