
``tokenize`` --- Tokenizer for Python source
********************************************

The ``tokenize`` module provides a lexical scanner for Python source
code, implemented in Python.  The scanner in this module returns
comments as tokens as well, making it useful for implementing
"pretty-printers," including colorizers for on-screen displays.

The primary entry point is a *generator*:

tokenize.generate_tokens(readline)

   The ``generate_tokens()`` generator requires one argument,
   *readline*, which must be a callable object that provides the same
   interface as the ``readline()`` method of built-in file objects
   (see section *File Objects*).  Each call to the function should
   return one line of input as a string.

   The generator produces 5-tuples with these members: the token type;
   the token string; a 2-tuple ``(srow, scol)`` of ints specifying the
   row and column where the token begins in the source; a 2-tuple
   ``(erow, ecol)`` of ints specifying the row and column where the
   token ends in the source; and the line on which the token was
   found.  The line passed (the last tuple item) is the *logical*
   line; continuation lines are included.

   New in version 2.2.
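
   For example, a minimal sketch along these lines tokenizes a short
   string and prints each token (the source snippet and variable
   names are illustrative):

      from StringIO import StringIO
      from tokenize import generate_tokens, tok_name

      source = "x = 3.14  # pi, roughly\n"
      g = generate_tokens(StringIO(source).readline)
      for toknum, tokval, start, end, line in g:
          # tok_name maps a token type to a readable name
          print tok_name[toknum], repr(tokval), start, end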

An older entry point is retained for backward compatibility:

tokenize.tokenize(readline[, tokeneater])

   The ``tokenize()`` function accepts two parameters: one
   representing the input stream, and one providing an output
   mechanism for ``tokenize()``.

   The first parameter, *readline*, must be a callable object that
   provides the same interface as the ``readline()`` method of
   built-in file objects (see section *File Objects*).  Each call to
   the function should return one line of input as a string.
   Alternatively, *readline* may be a callable object that signals
   completion by raising ``StopIteration``.

   Changed in version 2.5: Added ``StopIteration`` support.

   The second parameter, *tokeneater*, must also be a callable object.
   It is called once for each token, with five arguments,
   corresponding to the tuples generated by ``generate_tokens()``.
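
   For example, a minimal sketch of a *tokeneater* that prints each
   token it receives (the function name ``printtoken`` is
   illustrative):

      from StringIO import StringIO
      import tokenize

      def printtoken(toknum, tokval, start, end, line):
          # receives the same five values that make up the tuples
          # produced by generate_tokens()
          print tokenize.tok_name[toknum], repr(tokval)

      tokenize.tokenize(StringIO("x = 1\n").readline, printtoken)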

All constants from the ``token`` module are also exported from
``tokenize``, as are two additional token type values that might be
passed to the *tokeneater* function by ``tokenize()``:

tokenize.COMMENT

   Token value used to indicate a comment.

tokenize.NL

   Token value used to indicate a non-terminating newline.  The
   NEWLINE token indicates the end of a logical line of Python code;
   NL tokens are generated when a logical line of code is continued
   over multiple physical lines.
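
For example, tokenizing a logical line that is continued inside
parentheses shows the distinction: the embedded line break produces an
NL token, while the NEWLINE token appears only at the end of the
logical line.  A minimal sketch (the source snippet is illustrative):

   from StringIO import StringIO
   from tokenize import generate_tokens, COMMENT, NL, NEWLINE

   source = "total = (1 +\n         2)  # sum\n"
   for toknum, tokval, _, _, _ in generate_tokens(StringIO(source).readline):
       if toknum == NL:
           print "NL: physical line break inside a logical line"
       elif toknum == NEWLINE:
           print "NEWLINE: end of the logical line"
       elif toknum == COMMENT:
           print "COMMENT:", repr(tokval)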

Another function is provided to reverse the tokenization process. This
is useful for creating tools that tokenize a script, modify the token
stream, and write back the modified script.

tokenize.untokenize(iterable)

   Converts tokens back into Python source code.  The *iterable* must
   return sequences with at least two elements, the token type and the
   token string.  Any additional sequence elements are ignored.

   The reconstructed script is returned as a single string.  The
   result is guaranteed to tokenize back to match the input, so the
   conversion is lossless and round-trips are assured.  The guarantee
   applies only to the token type and token string; the spacing
   between tokens (column positions) may change.

   New in version 2.5.
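
   For example, a minimal round-trip sketch (the source snippet is
   illustrative):

      from StringIO import StringIO
      from tokenize import generate_tokens, untokenize

      source = "if x == 1:\n    print x\n"
      tokens = [(toknum, tokval) for toknum, tokval, _, _, _
                in generate_tokens(StringIO(source).readline)]
      # untokenize() accepts the 2-tuples; the result tokenizes back
      # to the same (type, string) pairs, though spacing may differ
      newsource = untokenize(tokens)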

Example of a script re-writer that transforms float literals into
Decimal objects:

   from StringIO import StringIO
   from tokenize import generate_tokens, untokenize, NUMBER, STRING, NAME, OP

   def decistmt(s):
       """Substitute Decimals for floats in a string of statements.

       >>> from decimal import Decimal
       >>> s = 'print +21.3e-5*-.1234/81.7'
       >>> decistmt(s)
       "print +Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7')"

       >>> exec(s)
       -3.21716034272e-007
       >>> exec(decistmt(s))
       -3.217160342717258261933904529E-7

       """
       result = []
       g = generate_tokens(StringIO(s).readline)   # tokenize the string
       for toknum, tokval, _, _, _ in g:
           if toknum == NUMBER and '.' in tokval:  # replace NUMBER tokens
               result.extend([
                   (NAME, 'Decimal'),
                   (OP, '('),
                   (STRING, repr(tokval)),
                   (OP, ')')
               ])
           else:
               result.append((toknum, tokval))
       return untokenize(result)
