:mod:`tokenize` --- Tokenizer for Python source
===============================================

.. module:: tokenize
   :synopsis: Lexical scanner for Python source code.
.. moduleauthor:: Ka Ping Yee
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>

The :mod:`tokenize` module provides a lexical scanner for Python source code,
implemented in Python.  The scanner in this module returns comments as tokens as
well, making it useful for implementing "pretty-printers," including colorizers
for on-screen displays.

The primary entry point is a :term:`generator`:

.. function:: generate_tokens(readline)

   The :func:`generate_tokens` generator requires one argument, *readline*,
   which must be a callable object that provides the same interface as the
   :meth:`readline` method of built-in file objects (see section
   :ref:`bltin-file-objects`).  Each call to the function should return one
   line of input as a string.

   The generator produces 5-tuples with these members: the token type; the token
   string; a 2-tuple ``(srow, scol)`` of ints specifying the row and column
   where the token begins in the source; a 2-tuple ``(erow, ecol)`` of ints
   specifying the row and column where the token ends in the source; and the
   line on which the token was found.  The line passed (the last tuple item) is
   the *logical* line; continuation lines are included.

   .. versionadded:: 2.2

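   For example, source text held in a string can be scanned by wrapping it in a
   :class:`~StringIO.StringIO` object and passing that object's ``readline``
   method (a minimal sketch; the source string is made up for illustration)::

      from StringIO import StringIO
      from tokenize import generate_tokens, tok_name

      source = "total = 3.14 * radius ** 2\n"
      for toknum, tokval, (srow, scol), (erow, ecol), line in \
              generate_tokens(StringIO(source).readline):
          # tok_name maps the numeric token type back to a readable name
          print "%-8s %-12r (%d, %d) -> (%d, %d)" % (
              tok_name[toknum], tokval, srow, scol, erow, ecol)
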
An older entry point is retained for backward compatibility:


.. function:: tokenize(readline[, tokeneater])

   The :func:`tokenize` function accepts two parameters: one representing the
   input stream, and one providing an output mechanism for :func:`tokenize`.

   The first parameter, *readline*, must be a callable object that provides the
   same interface as the :meth:`readline` method of built-in file objects (see
   section :ref:`bltin-file-objects`).  Each call to the function should return
   one line of input as a string.  Alternatively, *readline* may be a callable
   object that signals completion by raising :exc:`StopIteration`.

   .. versionchanged:: 2.5
      Added :exc:`StopIteration` support.

   The second parameter, *tokeneater*, must also be a callable object.  It is
   called once for each token, with five arguments, corresponding to the tuples
   generated by :func:`generate_tokens`.

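   A *tokeneater* might simply report each token it is handed.  As a rough
   sketch (the file name ``example.py`` is hypothetical)::

      import tokenize

      def print_token(toknum, tokval, start, end, line):
          # Called once per token with the same five values that
          # generate_tokens() would yield as a tuple.
          print tokenize.tok_name[toknum], repr(tokval)

      f = open('example.py')
      try:
          tokenize.tokenize(f.readline, print_token)
      finally:
          f.close()
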
All constants from the :mod:`token` module are also exported from
:mod:`tokenize`, as are two additional token type values that might be passed to
the *tokeneater* function by :func:`tokenize`:


.. data:: COMMENT

   Token value used to indicate a comment.


.. data:: NL

   Token value used to indicate a non-terminating newline.  The NEWLINE token
   indicates the end of a logical line of Python code; NL tokens are generated
   when a logical line of code is continued over multiple physical lines.

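The difference can be seen by scanning a logical line that spans two physical
lines: only the newline that ends the logical line is reported as NEWLINE.  A
small sketch using :func:`generate_tokens`, with a made-up line of code::

   from StringIO import StringIO
   from tokenize import generate_tokens, tok_name, COMMENT, NL, NEWLINE

   source = "x = (1 +   # a comment\n     2)\n"
   for toknum, tokval, _, _, _ in generate_tokens(StringIO(source).readline):
       if toknum in (COMMENT, NL, NEWLINE):
           print tok_name[toknum], repr(tokval)
   # prints:
   #   COMMENT '# a comment'
   #   NL '\n'
   #   NEWLINE '\n'
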
Another function is provided to reverse the tokenization process.  This is
useful for creating tools that tokenize a script, modify the token stream, and
write back the modified script.


.. function:: untokenize(iterable)

   Converts tokens back into Python source code.  The *iterable* must return
   sequences with at least two elements, the token type and the token string.
   Any additional sequence elements are ignored.

   The reconstructed script is returned as a single string.  The result is
   guaranteed to tokenize back to match the input, so the conversion is
   lossless and round-trips are assured.  The guarantee applies only to the
   token type and token string, as the spacing between tokens (column
   positions) may change.

   .. versionadded:: 2.5

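For instance, a round trip through :func:`generate_tokens` and
:func:`untokenize` can be checked directly (a brief sketch; the statement being
round-tripped is arbitrary)::

   from StringIO import StringIO
   from tokenize import generate_tokens, untokenize

   source = "if x == 1 :\n    print 'hello'\n"
   tokens = [(toknum, tokval) for toknum, tokval, _, _, _
             in generate_tokens(StringIO(source).readline)]
   rebuilt = untokenize(tokens)
   # The rebuilt source may be spaced differently, but it tokenizes to
   # the same (type, string) pairs as the original.
   assert [(t, v) for t, v, _, _, _
           in generate_tokens(StringIO(rebuilt).readline)] == tokens
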
Example of a script re-writer that transforms float literals into Decimal
objects::

   from StringIO import StringIO
   from tokenize import generate_tokens, untokenize, NUMBER, NAME, OP, STRING

   def decistmt(s):
       """Substitute Decimals for floats in a string of statements.

       >>> from decimal import Decimal
       >>> s = 'print +21.3e-5*-.1234/81.7'
       >>> decistmt(s)
       "print +Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7')"

       >>> exec(s)
       -3.21716034272e-007
       >>> exec(decistmt(s))
       -3.217160342717258261933904529E-7

       """
       result = []
       g = generate_tokens(StringIO(s).readline)   # tokenize the string
       for toknum, tokval, _, _, _ in g:
           if toknum == NUMBER and '.' in tokval:  # replace NUMBER tokens
               result.extend([
                   (NAME, 'Decimal'),
                   (OP, '('),
                   (STRING, repr(tokval)),
                   (OP, ')')
               ])
           else:
               result.append((toknum, tokval))
       return untokenize(result)