|
| 1 | +% Module and documentation by Eric S. Raymond, 21 Dec 1998 |
| 2 | +\section{Standard Module \module{shlex}} |
| 3 | +\stmodindex{shlex} |
| 4 | +\label{module-shlex} |
| 5 | + |
| 6 | +The \class{shlex} class makes it easy to write lexical analyzers for |
| 7 | +simple syntaxes resembling that of the Unix shell. This will often |
| 8 | +be useful for writing minilanguages, e.g. in run control files for |
| 9 | +Python applications. |
| 10 | + |
| 11 | +\begin{classdesc}{shlex}{\optional{stream}} |
| 12 | +A \class{shlex} instance or subclass instance is a lexical analyzer |
| 13 | +object. The initialization argument, if present, specifies where to |
| 14 | +read characters from. It must be a file- or stream-like object with |
| 15 | +\method{read} and \method{readline} methods. If no argument is given, |
| 16 | +input will be taken from \code{sys.stdin}. |
| 17 | + |
| 18 | +\end{classdesc} |
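
As an illustration of the constructor described above, here is a minimal sketch that assumes a current Python 3 interpreter and uses \code{io.StringIO} as the stream-like object (the input line is a made-up run control entry, not taken from any real application):

```python
import io
import shlex

# Hypothetical run-control line; any object with read() and readline() works.
stream = io.StringIO("set color = red  # trailing comment\n")
lexer = shlex.shlex(stream)

tokens = []
while True:
    tok = lexer.get_token()
    if tok == '':          # an empty string signals end-of-file
        break
    tokens.append(tok)

print(tokens)
```

The loop stops on the empty string, and the comment introduced by the default commenter character never appears among the tokens.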
| 19 | + |
| 20 | +\subsection{shlex Objects} |
| 21 | +\label{shlex-objects} |
| 22 | + |
| 23 | +A \class{shlex} instance has the following methods: |
| 24 | + |
| 25 | +\begin{methoddesc}{get_token}{} |
| 26 | +Return a token. If tokens have been stacked using \method{push_token}, |
| 27 | +pop a token off the stack. Otherwise, read one from the input stream. |
| 28 | +If reading encounters an immediate end-of-file, an empty string (\code{''}) is returned. |
| 29 | +\end{methoddesc} |
| 30 | + |
| 31 | +\begin{methoddesc}{push_token}{str} |
| 32 | +Push the argument onto the token stack. |
| 33 | +\end{methoddesc} |
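
The interplay of the two methods can be sketched as one-token lookahead: read a token, inspect it, and push it back so the next call returns it again. The example assumes Python 3 and an illustrative input string:

```python
import io
import shlex

lexer = shlex.shlex(io.StringIO("alpha beta"))

first = lexer.get_token()      # read from the input stream
lexer.push_token(first)        # put it back on the token stack
again = lexer.get_token()      # popped from the stack, not re-read
following = lexer.get_token()  # the stream continues where it left off

print(first, again, following)
```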
| 34 | + |
| 35 | +Instances of \class{shlex} subclasses have some public instance |
| 36 | +variables which either control lexical analysis or can be used |
| 37 | +for debugging: |
| 38 | + |
| 39 | +\begin{memberdesc}{commenters} |
| 40 | +The string of characters that are recognized as comment beginners. |
| 41 | +All characters from the comment beginner to end of line are ignored. |
| 42 | +Includes just \code{\#} by default. |
| 43 | +\end{memberdesc} |
| 44 | + |
| 45 | +\begin{memberdesc}{wordchars} |
| 46 | +The string of characters that will accumulate into multi-character |
| 47 | +tokens. By default, includes all ASCII alphanumerics and underscore. |
| 48 | +\end{memberdesc} |
| 49 | + |
| 50 | +\begin{memberdesc}{whitespace} |
| 51 | +Characters that will be considered whitespace and skipped. Whitespace |
| 52 | +bounds tokens. By default, includes space, tab, linefeed, and |
| 53 | +carriage return. |
| 54 | +\end{memberdesc} |
| 55 | + |
| 56 | +\begin{memberdesc}{quotes} |
| 57 | +Characters that will be considered string quotes. The token |
| 58 | +accumulates until the same quote is encountered again (thus, different |
| 59 | +quote types protect each other as in the shell). By default, includes |
| 60 | +ASCII single and double quotes. |
| 61 | +\end{memberdesc} |
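
A short sketch of the quoting rule, assuming Python 3 and the default (non-POSIX) behaviour of the class, in which the quote characters stay part of the returned token. The input text is invented for the example:

```python
import io
import shlex

# A double-quoted string containing a single quote, then the reverse.
text = "say \"it's here\" 'a \"b\" c'"
lexer = shlex.shlex(io.StringIO(text))

# iter() with a sentinel calls get_token() until it returns ''.
tokens = list(iter(lexer.get_token, ''))

for tok in tokens:
    print(tok)
```

Each quoted token ends only at the quote character that opened it, so the other quote type passes through untouched.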
| 62 | + |
| 63 | +Note that any character not declared to be a word character, |
| 64 | +whitespace, or a quote will be returned as a single-character token. |
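
The effect of \code{wordchars} on single-character tokens can be sketched as follows, assuming Python 3. The option-like input string and the helper name \code{tokenize} are made up for illustration:

```python
import io
import shlex

def tokenize(text, extra_wordchars=""):
    """Return all tokens in text, optionally widening wordchars."""
    lexer = shlex.shlex(io.StringIO(text))
    lexer.wordchars += extra_wordchars
    return list(iter(lexer.get_token, ''))

# With the defaults, '-', '=' and '.' each come back as their own token.
default_tokens = tokenize("--file=a.txt")

# Declaring '-' and '.' as word characters lets them accumulate.
custom_tokens = tokenize("--file=a.txt", extra_wordchars="-.")

print(default_tokens)
print(custom_tokens)
```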
| 65 | + |
| 66 | +Quote and comment characters are not recognized within words. Thus, |
| 67 | +the bare words ``ain't'' and ``ain\#t'' would be returned as single |
| 68 | +tokens by the default parser. |
| 69 | + |
| 70 | +\begin{memberdesc}{lineno} |
| 71 | +Source line number (count of newlines seen so far plus one). |
| 72 | +\end{memberdesc} |
| 73 | + |
| 74 | +\begin{memberdesc}{token} |
| 75 | +The token buffer. It may be useful to examine this when catching exceptions. |
| 76 | +\end{memberdesc} |
| 77 | + |