@@ -4,43 +4,67 @@ \section{\module{shlex} ---
44\declaremodule {standard}{shlex}
55\modulesynopsis {Simple lexical analysis for \UNIX \ shell-like languages.}
66\moduleauthor {Eric S. Raymond}{
[email protected] }
7+ \moduleauthor {Gustavo Niemeyer}{
[email protected] }
78\sectionauthor {Eric S. Raymond}{
[email protected] }
9+ \sectionauthor {Gustavo Niemeyer}{
[email protected] }
810
911\versionadded {1.5.2}
1012
1113The \class {shlex} class makes it easy to write lexical analyzers for
1214simple syntaxes resembling that of the \UNIX {} shell. This will often
13- be useful for writing minilanguages, e.g.\ in run control files for
14- Python applications.
15-
16- \begin {classdesc }{shlex}{\optional {stream\optional {, file}}}
17- A \class {shlex} instance or subclass instance is a lexical analyzer
18- object. The initialization argument, if present, specifies where to
19- read characters from. It must be a file- or stream-like object with
20- \method {read()} and \method {readline()} methods. If no argument is given,
21- input will be taken from \code {sys.stdin}. The second optional
22- argument is a filename string, which sets the initial value of the
23- \member {infile} member. If the stream argument is omitted or
24- equal to \code {sys.stdin}, this second argument defaults to `` stdin'' .
25- \end {classdesc }
26-
15+ be useful for writing minilanguages (e.g.\ in run control files for
16+ Python applications) or for parsing quoted strings.
2717
2818\begin {seealso }
2919 \seemodule {ConfigParser}{Parser for configuration files similar to the
3020 Windows \file {.ini} files.}
3121\end {seealso }
3222
3323
24+ \subsection {Module Contents }
25+
26+ The \module {shlex} module defines the following functions:
27+
28+ \begin {funcdesc }{split}{s\optional {, posix=\code {True}\optional {,
29+ spaces=\code {True}}}}
30+ Split the string \var {s} using shell-like syntax. If \code {posix} is
31+ \code {True}, operate in posix mode. If \code {spaces} is \code {True}, words
32+ are split only on whitespace (setting the
33+ \member {whitespace_split} member of the \class {shlex} instance).
34+ \versionadded {2.3}
35+ \end {funcdesc }
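
A minimal sketch of \function{split}, assuming a current Python 3
interpreter. There the \var{posix} keyword is accepted; the \var{spaces}
argument described above is deliberately not passed, since it may not
exist in a given release. The variable names are illustrative only.

```python
import shlex

command = 'cp "my file.txt" backup/'

# posix mode (the default for split): quotes are stripped from the tokens.
posix_tokens = shlex.split(command)

# Non-posix mode: the quote characters stay part of the token.
compat_tokens = shlex.split(command, posix=False)

print(posix_tokens)   # ['cp', 'my file.txt', 'backup/']
print(compat_tokens)  # ['cp', '"my file.txt"', 'backup/']
```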
36+
37+ The \module {shlex} module defines the following classes:
38+
39+ \begin {classdesc }{shlex}{\optional {instream=\code {sys.stdin}\optional {,
40+ infile=\code {None}\optional {,
41+ posix=\code {False}}}}}
42+ A \class {shlex} instance or subclass instance is a lexical analyzer
43+ object. The initialization argument, if present, specifies where to
44+ read characters from. It must be a file-/stream-like object with
45+ \method {read()} and \method {readline()} methods, or a string (strings
46+ are accepted since Python 2.3). If no argument is given, input will be
47+ taken from \code {sys.stdin}. The second optional argument is a filename
48+ string, which sets the initial value of the \member {infile} member. If
49+ the \var {instream} argument is omitted or equal to \code {sys.stdin},
50+ this second argument defaults to `` stdin'' . The \var {posix} argument
51+ was introduced in Python 2.3, and defines the operational mode. When
52+ \var {posix} is not true (default), the \class {shlex} instance will
53+ operate in compatibility mode. When operating in posix mode,
54+ \class {shlex} will try to be as close as possible to the posix shell
55+ parsing rules. See~\ref {shlex-objects }.
56+ \end {classdesc }
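
The two operational modes can be compared with a short sketch, assuming
a Python 3 interpreter in which \class{shlex} accepts a string as
\var{instream} and supports iteration. The input text and variable names
are made up for illustration.

```python
import shlex

text = 'color = "dark blue"'

# Compatibility (non-posix) mode is the default for the class.
compat_tokens = list(shlex.shlex(text))

# posix mode strips the quotes from the quoted token.
posix_tokens = list(shlex.shlex(text, posix=True))

print(compat_tokens)  # ['color', '=', '"dark blue"']
print(posix_tokens)   # ['color', '=', 'dark blue']
```

Note that \code{=} comes back as its own token in both modes, because it
is neither a word character nor whitespace.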
57+
3458\subsection {shlex Objects \label {shlex-objects } }
3559
3660A \class {shlex} instance has the following methods:
3761
38-
3962\begin {methoddesc }{get_token}{}
4063Return a token. If tokens have been stacked using
4164\method {push_token()}, pop a token off the stack. Otherwise, read one
4265from the input stream. If reading encounters an immediate
43- end-of-file, an empty string is returned.
66+ end-of-file, \member {self.eof} is returned: the empty string (\code {""})
67+ in non-posix mode, and \code {None} in posix mode.
4468\end {methoddesc }
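
Because the end-of-file value depends on the mode, a reading loop can
compare against \member{eof} instead of hard-coding it. The following is
a sketch under that assumption, for Python 3; \code{read_all} is a
hypothetical helper, not part of the module.

```python
import shlex

def read_all(lexer):
    """Collect tokens until get_token() returns the lexer's eof marker."""
    tokens = []
    while True:
        token = lexer.get_token()
        if token == lexer.eof:
            return tokens
        tokens.append(token)

compat = shlex.shlex('one two')
posix = shlex.shlex('one two', posix=True)

compat_tokens = read_all(compat)
posix_tokens = read_all(posix)

print(compat_tokens, repr(compat.eof))  # ['one', 'two'] ''
print(posix_tokens, repr(posix.eof))    # ['one', 'two'] None
```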
4569
4670\begin {methoddesc }{push_token}{str}
@@ -132,13 +156,33 @@ \subsection{shlex Objects \label{shlex-objects}}
132156carriage-return.
133157\end {memberdesc }
134158
159+ \begin {memberdesc }{escape}
160+ Characters that will be considered escape characters. This is only used
161+ in posix mode, and includes just \character {\textbackslash } by default.
162+ \versionadded {2.3}
163+ \end {memberdesc }
164+
135165\begin {memberdesc }{quotes}
136166Characters that will be considered string quotes. The token
137167accumulates until the same quote is encountered again (thus, different
138168quote types protect each other as in the shell.) By default, includes
139169\ASCII {} single and double quotes.
140170\end {memberdesc }
141171
172+ \begin {memberdesc }{escapedquotes}
173+ Characters in \member {quotes} that will interpret escape characters
174+ defined in \member {escape}. This is only used in posix mode, and includes
175+ just \character {"} by default.
176+ \versionadded {2.3}
177+ \end {memberdesc }
178+
179+ \begin {memberdesc }{whitespace_split}
180+ If true, tokens will only be split on whitespace. This is useful, for
181+ example, for parsing command lines with \class {shlex}, getting tokens
182+ in a similar way to shell arguments.
183+ \versionadded {2.3}
184+ \end {memberdesc }
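
The effect of \member{whitespace_split} on a command line can be
sketched as follows, assuming Python 3 and the default word characters.
The command line itself is an arbitrary example.

```python
import shlex

line = 'ls -l /tmp/foo'

# Default: characters such as '-' and '/' are not word characters,
# so each one is returned as a single-character token.
default_tokens = list(shlex.shlex(line, posix=True))

# With whitespace_split set, tokens are split only on whitespace.
lexer = shlex.shlex(line, posix=True)
lexer.whitespace_split = True
argv_tokens = list(lexer)

print(default_tokens)  # ['ls', '-', 'l', '/', 'tmp', '/', 'foo']
print(argv_tokens)     # ['ls', '-l', '/tmp/foo']
```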
185+
142186\begin {memberdesc }{infile}
143187The name of the current input file, as initially set at class
144188instantiation time or stacked by later source requests. It may
@@ -168,13 +212,6 @@ \subsection{shlex Objects \label{shlex-objects}}
168212details.
169213\end {memberdesc }
170214
171- Note that any character not declared to be a word character,
172- whitespace, or a quote will be returned as a single-character token.
173-
174- Quote and comment characters are not recognized within words. Thus,
175- the bare words \samp {ain't} and \samp {ain\# t} would be returned as single
176- tokens by the default parser.
177-
178215\begin {memberdesc }{lineno}
179216Source line number (count of newlines seen so far plus one).
180217\end {memberdesc }
@@ -183,3 +220,56 @@ \subsection{shlex Objects \label{shlex-objects}}
183220The token buffer. It may be useful to examine this when catching
184221exceptions.
185222\end {memberdesc }
223+
224+ \begin {memberdesc }{eof}
225+ Token used to determine end of file. This will be set to the empty
226+ string (\code {""}) in non-posix mode, and to \code {None} in posix
227+ mode.
228+ \versionadded {2.3}
229+ \end {memberdesc }
230+
231+ \subsection {Parsing Rules\label {shlex-parsing-rules } }
232+
233+ When operating in non-posix mode, \class {shlex} will try to obey the
234+ following rules.
235+
236+ \begin {itemize }
237+ \item Quote characters are not recognized within words
238+ (\code {Do"Not"Separate} is parsed as the single word
239+ \code {Do"Not"Separate});
240+ \item Escape characters are not recognized;
241+ \item Enclosing characters in quotes preserve the literal value of
242+ all characters within the quotes;
243+ \item Closing quotes separate words (\code {"Do"Separate} is parsed
244+ as \code {"Do"} and \code {Separate});
245+ \item If \member {whitespace_split} is \code {False}, any character not
246+ declared to be a word character, whitespace, or a quote will be
247+ returned as a single-character token. If it is \code {True},
248+ \class {shlex} will only split words on whitespace;
249+ \item EOF is signaled with an empty string (\code {""});
250+ \item It's not possible to parse empty strings, even if quoted.
251+ \end {itemize }
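
Some of the non-posix rules above can be checked with a short sketch.
It assumes Python 3 and uses \function{split} with \code{posix=False},
which also turns on \member{whitespace_split}; the variable names are
illustrative.

```python
import shlex

# Quote characters inside a word do not split it.
inner_quotes = shlex.split('Do"Not"Separate', posix=False)

# A closing quote ends the token, so the following text is a new word.
closing_quote = shlex.split('"Do"Separate', posix=False)

# Escape characters are not recognized: the backslash is kept as-is
# and the space still separates the two words.
no_escapes = shlex.split(r'a\ b', posix=False)

# EOF is signaled with an empty string.
eof_token = shlex.shlex('', posix=False).get_token()

print(inner_quotes)   # ['Do"Not"Separate']
print(closing_quote)  # ['"Do"', 'Separate']
print(no_escapes)     # ['a\\', 'b']
```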
252+
253+ When operating in posix mode, \class {shlex} will try to obey the
254+ following parsing rules.
255+
256+ \begin {itemize }
257+ \item Quotes are stripped out, and do not separate words
258+ (\code {"Do"Not"Separate"} is parsed as the single word
259+ \code {DoNotSeparate});
260+ \item Non-quoted escape characters (e.g. \character {\textbackslash })
261+ preserve the literal value of the next character that follows;
262+ \item Enclosing characters in quotes which are not part of
263+ \member {escapedquotes} (e.g. \character {'}) preserve the literal
264+ value of all characters within the quotes;
265+ \item Enclosing characters in quotes which are part of
266+ \member {escapedquotes} (e.g. \character {"}) preserves the literal
267+ value of all characters within the quotes, with the exception of
268+ the characters mentioned in \member {escape}. An escape character
269+ retains its special meaning only when followed by the quote in use,
270+ or by the escape character itself. Otherwise the escape character
271+ will be considered a normal character;
272+ \item EOF is signaled with a \code {None} value;
273+ \item Quoted empty strings (\code {""}) are allowed.
274+ \end {itemize }
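
The posix rules above can be illustrated the same way. This is a sketch
assuming Python 3, where \function{split} operates in posix mode by
default; raw string literals are used so that the backslashes reach
\module{shlex} unchanged.

```python
import shlex

# Quotes are stripped and do not separate words.
stripped = shlex.split('"Do"Not"Separate"')

# An unquoted escape character preserves the next character literally.
escaped_space = shlex.split(r'a\ b')

# Single quotes are not in escapedquotes: the backslash stays literal.
single_quoted = shlex.split(r"'a\"b'")

# Inside double quotes the escape only acts on the quote in use
# or on the escape character itself.
escaped_quote = shlex.split(r'"a\"b"')
kept_backslash = shlex.split(r'"a\nb"')

# Quoted empty strings are returned as empty tokens.
with_empty = shlex.split('a "" b')

print(stripped)        # ['DoNotSeparate']
print(escaped_space)   # ['a b']
print(single_quoted)   # ['a\\"b']
print(escaped_quote)   # ['a"b']
print(kept_backslash)  # ['a\\nb']
print(with_empty)      # ['a', '', 'b']
```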
275+