
Commit 48d0437

committed
AMK's latest version.
1 parent 7980826 commit 48d0437

2 files changed

Lines changed: 186 additions & 162 deletions

File tree

Doc/lib/libre.tex

Lines changed: 93 additions & 81 deletions
@@ -4,9 +4,7 @@ \section{Built-in Module \sectcode{re}}
44
\bimodindex{re}
55

66
% XXX Remove before 1.5final release.
7-
{\large\bf The \code{re} module is still in the process of being
8-
developed, and more features will be added in future 1.5 alphas and
9-
betas. This documentation is also preliminary and incomplete. If you
7+
{\large\bf This documentation is also preliminary and incomplete. If you
108
find a bug or documentation error, or just find something unclear,
119
please send a message to
1210
\code{[email protected]}, and we'll fix it.}
@@ -53,7 +51,7 @@ \section{Built-in Module \sectcode{re}}
5351
%Similarly, a backslash followed by a digit 0-7 should be doubled to
5452
%avoid interpretation as an octal escape.
5553

56-
\subsection{Regular Expressions}
54+
\subsection{Regular Expression Syntax}
5755

5856
A regular expression (or RE) specifies a set of strings that matches
5957
it; the functions in this module let you check if a particular string
@@ -92,9 +90,10 @@ \subsection{Regular Expressions}
9290
specified, this matches any character including a newline.
9391
\item[\code{\^}] (Caret.) Matches the start of the string, and in
9492
\code{MULTILINE} mode also immediately after each newline.
95-
\item[\code{\$}] Matches the end of the string.
93+
\item[\code{\$}] Matches the end of the string, and in
94+
\code{MULTILINE} mode also matches before a newline.
9695
\code{foo} matches both 'foo' and 'foobar', while the regular
97-
expression '\code{foo\$}' matches only 'foo'.
96+
expression \code{foo\$} matches only 'foo'.
9897
%
9998
\item[\code{*}] Causes the resulting RE to
10099
match 0 or more repetitions of the preceding RE, as many repetitions
@@ -130,17 +129,18 @@ \subsection{Regular Expressions}
130129
subsequent character are included in the resulting string. However,
131130
if Python would recognize the resulting sequence, the backslash should
132131
be repeated twice. This is complicated and hard to understand, so
133-
it's highly recommended that you use raw strings.
132+
it's highly recommended that you use raw strings for all but the simplest expressions.
134133
%
135134
\item[\code{[]}] Used to indicate a set of characters. Characters can
136-
be listed individually, or a range is indicated by giving two
137-
characters and separating them by a '-'. Special characters are not
138-
active inside sets. For example, \code{[akm\$]} will match any of the
139-
characters 'a', 'k', 'm', or '\$'; \code{[a-z]} will match any
140-
lowercase letter and \code{[a-zA-Z0-9]} matches any letter or digit.
141-
Character classes of the form \code{\e \var{X}} defined below are also acceptable.
142-
If you want to include a \code{]} or a \code{-} inside a
143-
set, precede it with a backslash.
135+
be listed individually, or a range of characters can be indicated by
136+
giving two characters and separating them by a '-'. Special
137+
characters are not active inside sets. For example, \code{[akm\$]}
138+
will match any of the characters 'a', 'k', 'm', or '\$'; \code{[a-z]}
139+
will match any lowercase letter and \code{[a-zA-Z0-9]} matches any
140+
letter or digit. Character classes such as \code{\e w} or \code {\e
141+
S} (defined below) are also acceptable inside a range. If you want to
142+
include a \code{]} or a \code{-} inside a set, precede it with a
143+
backslash.
144144

145145
Characters \emph{not} within a range can be matched by including a
146146
\code{\^} as the first character of the set; \code{\^} elsewhere will
@@ -151,11 +151,11 @@ \subsection{Regular Expressions}
151151
be used inside groups (see below) as well. To match a literal '|',
152152
use \code{\e|}, or enclose it inside a character class, like \code{[|]}.
153153
%
154-
\item[\code{(...)}] Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the
155-
contents of a group can be retrieved after a match has been performed,
156-
and can be matched later in the string with the
157-
\code{\e \var{number}} special sequence, described below. To match the
158-
literals '(' or ')',
154+
\item[\code{(...)}] Matches whatever regular expression is inside the
155+
parentheses, and indicates the start and end of a group; the contents
156+
of a group can be retrieved after a match has been performed, and can
157+
be matched later in the string with the \code{\e \var{number}} special
158+
sequence, described below. To match the literals '(' or ')',
159159
use \code{\e(} or \code{\e)}, or enclose them inside a character
160160
class: \code{[(] [)]}.
161161
%
@@ -167,9 +167,9 @@ \subsection{Regular Expressions}
167167
\item[\code{(?iLmsx)}] (One or more letters from the set 'i', 'L', 'm', 's',
168168
'x'.) The group matches the empty string; the letters set the
169169
corresponding flags (re.I, re.L, re.M, re.S, re.X) for the entire regular
170-
expression. (The flag 'L' is uppercase because it is not in standard Perl.)
171-
This is useful if you wish include the flags as part of the regular
172-
expression, instead of passing a \var{flag} argument to the \code{compile} function.
170+
expression. This is useful if you wish to include the flags as part of
171+
the regular expression, instead of passing a \var{flag} argument to
172+
the \code{compile} function.
173173
%
174174
\item[\code{(?:...)}] A non-grouping version of regular parentheses.
175175
Matches whatever's inside the parentheses, but the text matched by the
@@ -183,12 +183,14 @@ \subsection{Regular Expressions}
183183
named. So the group named 'id' in the example above can also be
184184
referenced as the numbered group 1.
185185

186-
For example, if the pattern string is
187-
\code{r'(?P<id>[a-zA-Z_]\e w*)'}, the group can be referenced by its
186+
For example, if the pattern is
187+
\code{(?P<id>[a-zA-Z_]\e w*)}, the group can be referenced by its
188188
name in arguments to methods of match objects, such as \code{m.group('id')}
189189
or \code{m.end('id')}, and also by name in pattern text (e.g. \code{(?P=id)}) and
190190
replacement text (e.g. \code{\e g<id>}).
191191
%
192+
\item[\code{(?P=\var{name})}] Matches whatever text was matched by the earlier group named \var{name}.
193+
%
192194
\item[\code{(?\#...)}] A comment; the contents of the parentheses are simply ignored.
193195
%
194196
\item[\code{(?=...)}] Matches if \code{...} matches next, but doesn't consume any of the string. This is called a lookahead assertion. For example,
@@ -203,8 +205,7 @@ \subsection{Regular Expressions}
203205
The special sequences consist of '\code{\e}' and a character from the
204206
list below. If the ordinary character is not on the list, then the
205207
resulting RE will match the second character. For example,
206-
\code{\e\$} matches the character '\$'. Ones where the backslash
207-
should be doubled are indicated.
208+
\code{\e\$} matches the character '\$'.
208209

209210
\begin{itemize}
210211

@@ -222,7 +223,9 @@ \subsection{Regular Expressions}
222223
\item[\code{\e b}] Matches the empty string, but only at the
223224
beginning or end of a word. A word is defined as a sequence of
224225
alphanumeric characters, so the end of a word is indicated by
225-
whitespace or a non-alphanumeric character.
226+
whitespace or a non-alphanumeric character. Inside a character range,
227+
\code{\e b} represents the backspace character, for compatibility with
228+
Python's string literals.
226229
%
227230
\item[\code{\e B}] Matches the empty string, but only when it is
228231
\emph{not} at the beginning or end of a word.
@@ -274,35 +277,42 @@ \subsection{Module Contents}
274277

275278
\begin{itemize}
276279

277-
\item[I ] or IGNORECASE:
278-
Perform case-insensitive matching; expressions like [A-Z] will match
279-
lowercase letters, too.
280+
\item {I or IGNORECASE or \code{(?i)}}
281+
282+
{Perform case-insensitive matching; expressions like \code{[A-Z]} will match
283+
lowercase letters, too. This is not affected by the current locale.
284+
}
285+
\item {L or LOCALE or \code{(?L)}}
280286

281-
\item[L ] or LOCALE:
282-
Make \code{\e w}, \code{\e W}, \code{\e b}, \code{\e B}, dependent on
283-
the current locale.
287+
{Make \code{\e w}, \code{\e W}, \code{\e b},
288+
\code{\e B}, dependent on the current locale.
289+
}
284290

285-
\item[M ] or MULTILINE:
286-
When specified, the pattern character \code{\^} matches at the
287-
beginning of the string and at the beginning of each line (immediately
288-
following each newline); and the pattern character \code{\$} matches
289-
at the end of the string and at the end of each line (immediately
290-
preceding each newline).
291+
\item {M or MULTILINE or \code{(?m)}}
291292

293+
{When specified, the pattern character \code{\^} matches at the
294+
beginning of the string and at the beginning of each line
295+
(immediately following each newline); and the pattern character
296+
\code{\$} matches at the end of the string and at the end of each line
297+
(immediately preceding each newline).
292298
By default, \code{\^} matches only at the beginning of the string, and
293299
\code{\$} only at the end of the string and immediately before the
294300
newline (if any) at the end of the string.
301+
}
302+
303+
\item {S or DOTALL or \code{(?s)}}
304+
305+
{Make the \code{.} special character match any character at all, including a
306+
newline; without this flag, \code{.} will match anything \emph{except}
307+
a newline.}
295308

296-
\item[S ] or DOTALL:
297-
Make the \code{.} special character match a newline; without this
298-
flag, \code{.} will match anything \emph{except} a newline.
309+
\item {X or VERBOSE or \code{(?x)}}
299310

300-
\item[X ] or VERBOSE:
301-
When specified, whitespace within the pattern string is ignored except
302-
when in a character class or preceded by an unescaped backslash, and,
303-
when a line contains a \code{\#} not in a character class or preceded
304-
by an unescaped backslash, all characters from the leftmost such
305-
\code{\#} through the end of the line are ignored.
311+
{Ignore whitespace within the pattern
312+
except when in a character class or preceded by an unescaped
313+
backslash, and, when a line contains a \code{\#} neither in a character
314+
class nor preceded by an unescaped backslash, all characters from the
315+
leftmost such \code{\#} through the end of the line are ignored. }
306316

307317
\end{itemize}
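
The flags listed above can be illustrated with a short sketch. It assumes the behaviour of the \code{re} module in current Python releases; the sample strings and variable names are made up for illustration.

```python
import re

text = "First line\nsecond LINE\n"

# IGNORECASE: [a-z] also matches uppercase letters.
words = re.findall(r"[a-z]+", "abc DEF", re.IGNORECASE)

# MULTILINE: ^ also matches immediately after each newline.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# DOTALL: . matches a newline too, so the pattern can span two lines.
dotall_match = re.match(r"First.*LINE", text, re.DOTALL)
default_match = re.match(r"First.*LINE", text)

# VERBOSE: whitespace and # comments inside the pattern are ignored.
number = re.compile(r"""
    \d+      # integer part
    \.       # decimal point
    \d*      # fractional digits
""", re.VERBOSE)

# The embedded form (?i) sets the same flag from inside the pattern.
inline_words = re.findall(r"(?i)[a-z]+", "abc DEF")
```

Passing a flag as an argument and embedding it with \code{(?i)} should give the same result here; which form to use is mostly a matter of where the pattern text comes from.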
308318

@@ -319,18 +329,18 @@ \subsection{Module Contents}
319329
result = re.match(pat, str)
320330
\end{verbatim}\ecode
321331
%
322-
but the version using \code{compile()} is more efficient when multiple
323-
regular expressions are used concurrently in a single program.
332+
but the version using \code{compile()} is more efficient when the
333+
expression will be used several times in a single program.
324334
%(The compiled version of the last pattern passed to \code{regex.match()} or
325335
%\code{regex.search()} is cached, so programs that use only a single
326336
%regular expression at a time needn't worry about compiling regular
327337
%expressions.)
328338
\end{funcdesc}
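
As a rough sketch of the point about reuse, the compiled object below is created once and applied to several strings, and it gives the same answers as calling the module-level function each time. The pattern and the sample lines are hypothetical.

```python
import re

pat = r"\d+"
lines = ["order 66", "no digits", "room 101"]

# Compile once, then reuse the resulting object for every line.
prog = re.compile(pat)
compiled_hits = [prog.search(line) is not None for line in lines]

# Equivalent results, but the pattern string is handed over on every call.
direct_hits = [re.search(pat, line) is not None for line in lines]
```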
329339

330340
\begin{funcdesc}{escape}{string}
331-
Return \var{string} with all non-alphanumerics backslashed; this is
332-
useful if you want to match some variable string which may have
333-
regular expression metacharacters in it.
341+
Return \var{string} with all non-alphanumerics backslashed; this is
342+
useful if you want to match an arbitrary literal string that may have
343+
regular expression metacharacters in it.
334344
\end{funcdesc}
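
A minimal sketch of why \code{escape} is useful, assuming a literal string (made up here) that happens to contain metacharacters:

```python
import re

literal = "3.5*(x+y)"
haystack = "value = 3.5*(x+y) approx"

# Escaped: every metacharacter is matched literally.
found = re.search(re.escape(literal), haystack)

# Unescaped: '*', '(', '+' and ')' are treated as RE syntax,
# so the pattern no longer describes the literal text.
unescaped = re.search(literal, haystack)
```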
335345

336346
\begin{funcdesc}{match}{pattern\, string\optional{\, flags}}
@@ -382,9 +392,9 @@ \subsection{Module Contents}
382392
\end{verbatim}\ecode
383393
%
384394
The pattern may be a string or a
385-
regexp object; if you need to specify
386-
regular expression flags, you must use a regexp object, or use
387-
embedded modifiers in a pattern string; e.g.
395+
regex object; if you need to specify
396+
regular expression flags, you must use a regex object, or use
397+
embedded modifiers in a pattern; e.g.
388398
%
389399
\bcode\begin{verbatim}
390400
sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'.
@@ -418,16 +428,14 @@ \subsection{Regular Expression Objects}
418428
\begin{funcdesc}{match}{string\optional{\, pos}\optional{\, endpos}}
419429
If zero or more characters at the beginning of \var{string} match
420430
this regular expression, return a corresponding
421-
\code{Match} object. Return \code{None} if the string does not
431+
\code{MatchObject} instance. Return \code{None} if the string does not
422432
match the pattern; note that this is different from a zero-length
423433
match.
424434

425435
The optional second parameter \var{pos} gives an index in the string
426-
where the search is to start; it defaults to \code{0}. This is not
427-
completely equivalent to slicing the string; the \code{'\^'} pattern
428-
character matches at the real begin of the string and at positions
429-
just after a newline, not necessarily at the index where the search
430-
is to start.
436+
where the search is to start; it defaults to \code{0}. The
437+
\code{'\^'} pattern character will match at the index where the
438+
search is to start.
431439

432440
The optional parameter \var{endpos} limits how far the string will
433441
be searched; it will be as if the string is \var{endpos} characters
@@ -441,8 +449,8 @@ \subsection{Regular Expression Objects}
441449
position in the string matches the pattern; note that this is
442450
different from finding a zero-length match at some point in the string.
443451

444-
The optional \var{pos} and \var{endpos} parameters have the same meaning as for the
445-
\code{match} method.
452+
The optional \var{pos} and \var{endpos} parameters have the same
453+
meaning as for the \code{match} method.
446454
\end{funcdesc}
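
The \var{pos} and \var{endpos} parameters might be used as in the sketch below. It assumes the behaviour of current Python releases, and the string and indices are chosen only for illustration.

```python
import re

prog = re.compile(r"\d+")
s = "a1 b22 c333"

first = prog.search(s)          # scans the whole string
later = prog.search(s, 3)       # scanning starts at index 3
limited = prog.search(s, 3, 5)  # as if the string ended at index 5
anchored = prog.match(s, 1)     # must match exactly at index 1
```

Note that the reported positions (for example \code{later.start()}) are indices into the full string, not into the searched portion.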
447455

448456
\begin{funcdesc}{split}{string\, \optional{, maxsplit=0}}
@@ -474,8 +482,8 @@ \subsection{Regular Expression Objects}
474482
The pattern string from which the regex object was compiled.
475483
\end{datadesc}
476484

477-
\subsection{Match Objects}
478-
Match objects support the following methods and attributes:
485+
\subsection{MatchObjects}
486+
\code{MatchObject} instances support the following methods and attributes:
479487

480488
\begin{funcdesc}{start}{group}
481489
\end{funcdesc}
@@ -504,23 +512,28 @@ \subsection{Match Objects}
504512
\code{(None, None)}.
505513
\end{funcdesc}
506514

507-
\begin{funcdesc}{group}{\optional{g1, g2, ...})}
508-
This method is only valid when the last call to the \code{match}
509-
or \code{search} method found a match. It returns one or more
510-
groups of the match. If there is a single \var{index} argument,
511-
the result is a single string; if there are multiple arguments, the
512-
result is a tuple with one item per argument. If the \var{index} is
513-
zero, the corresponding return value is the entire matching string; if
514-
it is in the inclusive range [1..99], it is the string matching the
515-
the corresponding parenthesized group (using the default syntax,
516-
groups are parenthesized using \code{\e (} and \code{\e )}). If no
517-
such group exists, the corresponding result is \code{None}.
515+
\begin{funcdesc}{group}{\optional{g1, g2, ...}}
516+
Returns one or more groups of the match. If there is a single
517+
\var{index} argument, the result is a single string; if there are
518+
multiple arguments, the result is a tuple with one item per argument.
519+
If the \var{index} is zero, the corresponding return value is the
520+
entire matching string; if it is in the inclusive range [1..99], it is
521+
the string matching the corresponding parenthesized group. If no
522+
such group exists, the corresponding result is
523+
\code{None}.
518524

519525
If the regular expression uses the \code{(?P<\var{name}>...)} syntax,
520526
the \var{index} arguments may also be strings identifying groups by
521527
their group name.
522528
\end{funcdesc}
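
A short sketch of \code{group()} with numeric and named arguments, reusing the \code{id} group from the syntax section. The input strings are hypothetical.

```python
import re

m = re.match(r"(?P<id>[a-zA-Z_]\w*)\s*=\s*(\d+)", "width = 42")

whole = m.group(0)        # the entire matching string
by_number = m.group(1)    # first group, by index
by_name = m.group("id")   # the same group, by name
pair = m.group(1, 2)      # several arguments give a tuple
name_end = m.end("id")    # names work for end() as well

# The name can also be used inside a pattern and in replacement text.
doubled = re.match(r"(?P<id>\w+) (?P=id)", "abc abc")
wrapped = re.sub(r"(?P<id>[a-zA-Z_]\w*)", r"<\g<id>>", "x y")
```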
523529

530+
\begin{funcdesc}{groups}{}
531+
Return a tuple containing all the subgroups of the match, from 1 up to
532+
however many groups are in the pattern. Groups that did not
533+
participate in the match have values of \code{None}. If the tuple
534+
would only be one element long, a string will be returned instead.
535+
\end{funcdesc}
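
A minimal sketch of \code{groups()} with an optional group that does not participate in the match. The pattern has several groups on purpose: the text above describes a one-element result being returned as a string, whereas current Python releases return a tuple in that case too, so the single-group case is left out here.

```python
import re

# Group 2 (fraction) and group 3 (unit) are both optional.
m = re.match(r"(\d+)(?:\.(\d+))?(px)?", "12.5")

parts = m.groups()   # one entry per group, None where a group did not match
```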
536+
524537
\begin{datadesc}{pos}
525538
The value of \var{pos} which was passed to the
526539
\code{search} or \code{match} function. This is the index into the
@@ -534,8 +547,8 @@ \subsection{Match Objects}
534547
\end{datadesc}
535548

536549
\begin{datadesc}{re}
537-
The regular expression object whose match() or search() method
538-
produced this match object.
550+
The regular expression object whose \code{match()} or \code{search()} method
551+
produced this \code{MatchObject} instance.
539552
\end{datadesc}
540553

541554
\begin{datadesc}{string}
@@ -545,4 +558,3 @@ \subsection{Match Objects}
545558
\begin{seealso}
546559
\seetext Jeffrey Friedl, \emph{Mastering Regular Expressions}.
547560
\end{seealso}
548-
