\section{\module{tokenize} ---
         Tokenizer for Python source}

\declaremodule{standard}{tokenize}
\modulesynopsis{Lexical scanner for Python source code.}
\moduleauthor{Ka Ping Yee}{}
\sectionauthor{Fred L. Drake, Jr.}{[email protected]}


The \module{tokenize} module provides a lexical scanner for Python
source code, implemented in Python.  The scanner in this module
returns comments as tokens as well, making it useful for implementing
``pretty-printers,'' including colorizers for on-screen displays.

The scanner is exposed as a single function:

\begin{funcdesc}{tokenize}{readline\optional{, tokeneater}}
  The \function{tokenize()} function accepts two parameters: one
  representing the input stream, and one providing an output mechanism
  for \function{tokenize()}.

  The first parameter, \var{readline}, must be a callable object which
  provides the same interface as the \method{readline()} method of
  built-in file objects (see section~\ref{bltin-file-objects}).  Each
  call to the function should return one line of input as a string.

  The second parameter, \var{tokeneater}, must also be a callable
  object.  It is called once for each token, with five parameters: the
  token type, the token string, a tuple \code{(\var{srow}, \var{scol})}
  giving the row and column where the token begins in the source, a
  tuple \code{(\var{erow}, \var{ecol})} giving the position where the
  token ends, and the line on which the token was found.  The line
  passed is the \emph{logical} line; continuation lines are included.
\end{funcdesc}
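As a sketch of how these pieces fit together, the following defines a
\var{tokeneater}-style callback and feeds it tokens.  (The driving loop
uses the module's \function{generate_tokens()} helper, available in
later versions of the module, which yields the same five values that
\function{tokenize()} passes to \var{tokeneater}; the source string is
a made-up example.)

```python
import io
import tokenize

def print_token(tok_type, tok_string, start, end, line):
    """A tokeneater-style callback: receives the five values
    described above for each token."""
    srow, scol = start
    erow, ecol = end
    print("%-10s %-14r (%d,%d)-(%d,%d)"
          % (tokenize.tok_name[tok_type], tok_string,
             srow, scol, erow, ecol))

# The readline argument may be any callable that returns one line of
# input per call, like the readline() method of a file object.
readline = io.StringIO("x = 1  # a comment\n").readline

# Drive the callback once per token.
for tok in tokenize.generate_tokens(readline):
    print_token(*tok)
```

Note that the comment is reported as a token in its own right, which
is what makes colorizers and pretty-printers straightforward to build
on top of this interface.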


All constants from the \refmodule{token} module are also exported from
\module{tokenize}, as is one additional token type value that might be
passed to the \var{tokeneater} function by \function{tokenize()}:

\begin{datadesc}{COMMENT}
  Token value used to indicate a comment.
\end{datadesc}
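Because comments carry their own token type, a caller can pick them
out of the token stream directly.  A minimal sketch (again using the
generator form of the scanner, on a made-up source string):

```python
import io
import tokenize

source = "a = 1  # first\nb = 2  # second\n"

# Collect every COMMENT token along with its (row, col) start position.
comments = [
    (start, tok_string)
    for tok_type, tok_string, start, end, line
    in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok_type == tokenize.COMMENT
]
print(comments)   # [((1, 7), '# first'), ((2, 7), '# second')]
```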