@@ -4,9 +4,7 @@ \section{Built-in Module \sectcode{re}}
44\bimodindex {re}
55
66% XXX Remove before 1.5final release.
7- {\large\bf The \code {re} module is still in the process of being
8- developed, and more features will be added in future 1.5 alphas and
9- betas. This documentation is also preliminary and incomplete. If you
7+ {\large\bf This documentation is preliminary and incomplete. If you
108find a bug or documentation error, or just find something unclear,
119please send a message to
1210\code {
[email protected] }, and we'll fix it.}
@@ -53,7 +51,7 @@ \section{Built-in Module \sectcode{re}}
5351% Similarly, a backslash followed by a digit 0-7 should be doubled to
5452% avoid interpretation as an octal escape.
5553
56- \subsection {Regular Expressions }
54+ \subsection {Regular Expression Syntax }
5755
5856A regular expression (or RE) specifies a set of strings that matches
5957it; the functions in this module let you check if a particular string
@@ -92,9 +90,10 @@ \subsection{Regular Expressions}
9290specified, this matches any character including a newline.
9391\item [\code {\^ }] (Caret.) Matches the start of the string, and in
9492\code {MULTILINE} mode also immediately after each newline.
95- \item [\code {\$ }] Matches the end of the string.
93+ \item [\code {\$ }] Matches the end of the string, and in
94+ \code {MULTILINE} mode also matches before a newline.
9695\code {foo} matches both 'foo' and 'foobar' , while the regular
97- expression ' \code{foo\$}' matches only 'foo' .
96+ expression \code {foo\$ } matches only 'foo' .
9897%
9998\item [\code {*}] Causes the resulting RE to
10099match 0 or more repetitions of the preceding RE, as many repetitions
@@ -130,17 +129,18 @@ \subsection{Regular Expressions}
130129subsequent character are included in the resulting string. However,
131130if Python would recognize the resulting sequence, the backslash should
132131be repeated twice. This is complicated and hard to understand, so
133- it's highly recommended that you use raw strings.
132+ it's highly recommended that you use raw strings for all but the simplest expressions.
134133%
135134\item [\code {[]}] Used to indicate a set of characters. Characters can
136- be listed individually, or a range is indicated by giving two
137- characters and separating them by a '-' . Special characters are not
138- active inside sets. For example, \code {[akm\$ ]} will match any of the
139- characters 'a' , 'k' , 'm' , or '\$' ; \code {[a-z]} will match any
140- lowercase letter and \code {[a-zA-Z0-9]} matches any letter or digit.
141- Character classes of the form \code {\e \var {X}} defined below are also acceptable.
142- If you want to include a \code {]} or a \code {-} inside a
143- set, precede it with a backslash.
135+ be listed individually, or a range of characters can be indicated by
136+ giving two characters and separating them by a '-' . Special
137+ characters are not active inside sets. For example, \code {[akm\$ ]}
138+ will match any of the characters 'a' , 'k' , 'm' , or '\$' ; \code {[a-z]}
139+ will match any lowercase letter and \code {[a-zA-Z0-9]} matches any
140+ letter or digit. Character classes such as \code {\e w} or \code {\e
141+ S} (defined below) are also acceptable inside a range. If you want to
142+ include a \code {]} or a \code {-} inside a set, precede it with a
143+ backslash.
144144
145145Characters \emph {not } within a range can be matched by including a
146146\code {\^ } as the first character of the set; \code {\^ } elsewhere will
@@ -151,11 +151,11 @@ \subsection{Regular Expressions}
151151be used inside groups (see below) as well. To match a literal '|' ,
152152use \code {\e |}, or enclose it inside a character class, like \code {[|]}.
153153%
154- \item [\code {(...)}] Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the
155- contents of a group can be retrieved after a match has been performed,
156- and can be matched later in the string with the
157- \code {\e \var {number}} special sequence, described below. To match the
158- literals '(' or ')' ,
154+ \item [\code {(...)}] Matches whatever regular expression is inside the
155+ parentheses, and indicates the start and end of a group; the contents
156+ of a group can be retrieved after a match has been performed, and can
157+ be matched later in the string with the \code {\e \var {number}} special
158+ sequence, described below. To match the literals '(' or ')' ,
159159use \code {\e (} or \code {\e )}, or enclose them inside a character
160160class: \code {[(] [)]}.
161161%
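The grouping behaviour described for \code{(...)} can be sketched as follows. This is a minimal illustration that assumes a current Python 3 interpreter rather than the 1.5 release this patch targets; the sample strings and variable names are invented for the example.

```python
import re

# (\w+) captures a word; \1 then requires the same text to appear again.
doubled = re.compile(r"(\w+) \1")
m = doubled.search("it was the the best of times")
repeated_word = m.group(1)  # text captured by group 1

# Literal parentheses can be escaped or placed inside a character class.
literal = re.search(r"\((\d+)[)]", "see note (42) below")
note_number = literal.group(1)

print(repeated_word, note_number)
```

Both spellings of a literal parenthesis appear in the second pattern: the opening one is escaped, the closing one sits in a one-character class.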
@@ -167,9 +167,9 @@ \subsection{Regular Expressions}
167167\item [\code {(?iLmsx)}] (One or more letters from the set 'i' , 'L' , 'm' , 's' ,
168168'x' .) The group matches the empty string; the letters set the
169169corresponding flags (re.I, re.L, re.M, re.S, re.X) for the entire regular
170- expression. (The flag 'L' is uppercase because it is not in standard Perl.)
171- This is useful if you wish include the flags as part of the regular
172- expression, instead of passing a \var {flag} argument to the \code {compile} function.
170+ expression. This is useful if you wish to include the flags as part of
171+ the regular expression, instead of passing a \var {flag} argument to
172+ the \code {compile} function.
173173%
174174\item [\code {(?:...)}] A non-grouping version of regular parentheses.
175175Matches whatever's inside the parentheses, but the text matched by the
@@ -183,12 +183,14 @@ \subsection{Regular Expressions}
183183named. So the group named 'id' in the example above can also be
184184referenced as the numbered group 1.
185185
186- For example, if the pattern string is
187- \code {r' (?P<id>[a-zA-Z_]\e w*)' }, the group can be referenced by its
186+ For example, if the pattern is
187+ \code {(?P<id>[a-zA-Z_]\e w*)}, the group can be referenced by its
188188name in arguments to methods of match objects, such as \code {m.group('id')}
189189or \code {m.end('id')}, and also by name in pattern text (e.g. \code {(?P=id)}) and
190190replacement text (e.g. \code {\e g<id>}).
191191%
192+ \item [\code {(?P=\var {name})}] Matches whatever text was matched by the earlier group named \var {name}.
193+ %
192194\item [\code {(?\# ...)}] A comment; the contents of the parentheses are simply ignored.
193195%
194196\item [\code {(?=...)}] Matches if \code {...} matches next, but doesn't consume any of the string. This is called a lookahead assertion. For example,
@@ -203,8 +205,7 @@ \subsection{Regular Expressions}
203205The special sequences consist of '\code{\e}' and a character from the
204206list below. If the ordinary character is not on the list, then the
205207resulting RE will match the second character. For example,
206- \code {\e \$ } matches the character '\$' . Ones where the backslash
207- should be doubled are indicated.
208+ \code {\e \$ } matches the character '\$' .
208209
209210\begin {itemize }
210211
@@ -222,7 +223,9 @@ \subsection{Regular Expressions}
222223\item [\code {\e b}] Matches the empty string, but only at the
223224beginning or end of a word. A word is defined as a sequence of
224225alphanumeric characters, so the end of a word is indicated by
225- whitespace or a non-alphanumeric character.
226+ whitespace or a non-alphanumeric character. Inside a character range,
227+ \code {\e b} represents the backspace character, for compatibility with
228+ Python's string literals.
226229%
227230\item [\code {\e B}] Matches the empty string, but only when it is
228231\emph {not } at the beginning or end of a word.
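A short sketch of the two meanings of \code{\e b} described above, assuming a current Python 3 interpreter; the sample text is made up for illustration.

```python
import re

# Outside a character class, \b matches only at a word boundary,
# so the "class" inside "subclass" and "classy" is skipped.
whole_words = re.findall(r"\bclass\b", "class subclass classy class.")

# Inside a character class, \b stands for the backspace character.
has_backspace = re.search(r"[\b]", "abc\x08def") is not None
no_backspace = re.search(r"[\b]", "a b") is None

print(whole_words, has_backspace, no_backspace)
```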
@@ -274,35 +277,42 @@ \subsection{Module Contents}
274277
275278\begin {itemize }
276279
277- \item [I ] or IGNORECASE:
278- Perform case-insensitive matching; expressions like [A-Z] will match
279- lowercase letters, too.
280+ \item {I or IGNORECASE or \code {(?i)}}
281+
282+ {Perform case-insensitive matching; expressions like \code {[A-Z]} will match
283+ lowercase letters, too. This is not affected by the current locale.
284+ }
285+ \item {L or LOCALE or \code {(?L)}}
280286
281- \item [L ] or LOCALE:
282- Make \code {\e w }, \code { \e W}, \code { \e b}, \code { \e B}, dependent on
283- the current locale.
287+ {Make \code { \e w}, \code { \e W}, \code { \e b},
288+ \code {\e B } dependent on the current locale.
289+ }
284290
285- \item [M ] or MULTILINE:
286- When specified, the pattern character \code {\^ } matches at the
287- beginning of the string and at the beginning of each line (immediately
288- following each newline); and the pattern character \code {\$ } matches
289- at the end of the string and at the end of each line (immediately
290- preceding each newline).
291+ \item {M or MULTILINE or \code {(?m)}}
291292
293+ {When specified, the pattern character \code {\^ } matches at the
294+ beginning of the string and at the beginning of each line
295+ (immediately following each newline); and the pattern character
296+ \code {\$ } matches at the end of the string and at the end of each line
297+ (immediately preceding each newline).
292298By default, \code {\^ } matches only at the beginning of the string, and
293299\code {\$ } only at the end of the string and immediately before the
294300newline (if any) at the end of the string.
301+ }
302+
303+ \item {S or DOTALL or \code {(?s)}}
304+
305+ {Make the \code {.} special character match any character at all, including a
306+ newline; without this flag, \code {.} will match anything \emph {except }
307+ a newline.}
295308
296- \item [S ] or DOTALL:
297- Make the \code {.} special character match a newline; without this
298- flag, \code {.} will match anything \emph {except } a newline.
309+ \item {X or VERBOSE or \code {(?x)}}
299310
300- \item [X ] or VERBOSE:
301- When specified, whitespace within the pattern string is ignored except
302- when in a character class or preceded by an unescaped backslash, and,
303- when a line contains a \code {\# } not in a character class or preceded
304- by an unescaped backslash, all characters from the leftmost such
305- \code {\# } through the end of the line are ignored.
311+ {Ignore whitespace within the pattern
312+ except when in a character class or preceded by an unescaped
313+ backslash, and, when a line contains a \code {\# } neither in a character
314+ class nor preceded by an unescaped backslash, all characters from the
315+ leftmost such \code {\# } through the end of the line are ignored. }
306316
307317\end {itemize }
308318
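The flags listed above can be compared side by side in a small sketch. It assumes a current Python 3 interpreter, where \code{re.I}, \code{re.M}, \code{re.S} and \code{re.X} are the short names; the sample text and variable names are illustrative only.

```python
import re

text = "first line\nSecond LINE\n"

# IGNORECASE: letters match regardless of case.
ignorecase_hits = re.findall(r"line", text, re.I)

# MULTILINE: ^ also matches immediately after each newline.
default_starts = re.findall(r"^\w+", text)
line_starts = re.findall(r"^\w+", text, re.M)

# DOTALL: without it, "." refuses to match the newline.
dot_default = re.search(r"line.Second", text)
dot_all = re.search(r"line.Second", text, re.S)

# VERBOSE: whitespace and "#" comments in the pattern are ignored.
number = re.compile(r"""
    \d+   # integer part
    \.    # decimal point
    \d*   # optional fraction
""", re.X)

# An embedded modifier has the same effect as passing the flag.
embedded_hits = re.findall(r"(?i)line", text)

print(ignorecase_hits, line_starts, number.match("3.14").group())
```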
@@ -319,18 +329,18 @@ \subsection{Module Contents}
319329result = re.match(pat, str)
320330\end {verbatim }\ecode
321331%
322- but the version using \code {compile()} is more efficient when multiple
323- regular expressions are used concurrently in a single program.
332+ but the version using \code {compile()} is more efficient when the
333+ expression will be used several times in a single program.
324334% (The compiled version of the last pattern passed to \code{regex.match()} or
325335% \code{regex.search()} is cached, so programs that use only a single
326336% regular expression at a time needn't worry about compiling regular
327337% expressions.)
328338\end {funcdesc }
329339
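As a rough sketch of the reuse case mentioned above, a pattern can be compiled once and applied to many strings. This assumes a current Python 3 interpreter; the pattern and input lines are invented for the example.

```python
import re

# Compile once, outside the loop, then reuse the compiled object.
word = re.compile(r"[a-z]+")

lines = ["alpha beta", "42", "gamma"]
counts = [len(word.findall(line)) for line in lines]

print(counts)
```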
330340\begin {funcdesc }{escape}{string}
331- Return \var {string} with all non-alphanumerics backslashed; this is
332- useful if you want to match some variable string which may have
333- regular expression metacharacters in it.
341+ Return \var {string} with all non-alphanumerics backslashed; this is
342+ useful if you want to match an arbitrary literal string that may have
343+ regular expression metacharacters in it.
334344\end {funcdesc }
335345
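A minimal sketch of why \code{escape} matters when the text to find contains metacharacters. It assumes a current Python 3 interpreter, whose exact escaping rules may differ from those of the release documented here, so the escaped pattern itself is not shown.

```python
import re

user_text = "1+1=2 (really?)"
haystack = "proof that 1+1=2 (really?) holds"

# Escaped: every metacharacter in user_text is matched literally.
found = re.search(re.escape(user_text), haystack)

# Unescaped: "+", "(", ")" and "?" are read as regex syntax instead.
unescaped = re.search(user_text, haystack)

print(found.group(), unescaped)
```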
336346\begin {funcdesc }{match}{pattern\, string\optional {\, flags}}
@@ -382,9 +392,9 @@ \subsection{Module Contents}
382392\end {verbatim }\ecode
383393%
384394The pattern may be a string or a
385- regexp object; if you need to specify
386- regular expression flags, you must use a regexp object, or use
387- embedded modifiers in a pattern string ; e.g.
395+ regex object; if you need to specify
396+ regular expression flags, you must use a regex object, or use
397+ embedded modifiers in a pattern; e.g.
388398%
389399\bcode \begin {verbatim }
390400sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'.
@@ -418,16 +428,14 @@ \subsection{Regular Expression Objects}
418428\begin {funcdesc }{match}{string\optional {\, pos}\optional {\, endpos}}
419429 If zero or more characters at the beginning of \var {string} match
420430 this regular expression, return a corresponding
421- \code {Match} object . Return \code {None} if the string does not
431+ \code {MatchObject} instance. Return \code {None} if the string does not
422432 match the pattern; note that this is different from a zero-length
423433 match.
424434
425435 The optional second parameter \var {pos} gives an index in the string
426- where the search is to start; it defaults to \code {0}. This is not
427- completely equivalent to slicing the string; the \code {'\^ '} pattern
428- character matches at the real begin of the string and at positions
429- just after a newline, not necessarily at the index where the search
430- is to start.
436+ where the search is to start; it defaults to \code {0}. The
437+ \code {'\^ '} pattern character will match at the index where the
438+ search is to start.
431439
432440 The optional parameter \var {endpos} limits how far the string will
433441 be searched; it will be as if the string is \var {endpos} characters
@@ -441,8 +449,8 @@ \subsection{Regular Expression Objects}
441449 position in the string matches the pattern; note that this is
442450 different from finding a zero-length match at some point in the string.
443451
444- The optional \var {pos} and \var {endpos} parameters have the same meaning as for the
445- \code {match} method.
452+ The optional \var {pos} and \var {endpos} parameters have the same
453+ meaning as for the \code {match} method.
446454\end {funcdesc }
447455
448456\begin {funcdesc }{split}{string\, \optional {, maxsplit=0}}
@@ -474,8 +482,8 @@ \subsection{Regular Expression Objects}
474482The pattern string from which the regex object was compiled.
475483\end {datadesc }
476484
477- \subsection {Match Objects }
478- Match objects support the following methods and attributes:
485+ \subsection {MatchObjects }
486+ \code {MatchObject} instances support the following methods and attributes:
479487
480488\begin {funcdesc }{start}{group}
481489\end {funcdesc }
@@ -504,23 +512,28 @@ \subsection{Match Objects}
504512\code {(None, None)}.
505513\end {funcdesc }
506514
507- \begin {funcdesc }{group}{\optional {g1, g2, ...})}
508- This method is only valid when the last call to the \code {match}
509- or \code {search} method found a match. It returns one or more
510- groups of the match. If there is a single \var {index} argument,
511- the result is a single string; if there are multiple arguments, the
512- result is a tuple with one item per argument. If the \var {index} is
513- zero, the corresponding return value is the entire matching string; if
514- it is in the inclusive range [1..99], it is the string matching the
515- the corresponding parenthesized group (using the default syntax,
516- groups are parenthesized using \code {\e (} and \code {\e )}). If no
517- such group exists, the corresponding result is \code {None}.
515+ \begin {funcdesc }{group}{\optional {g1, g2, ...}}
516+ Returns one or more groups of the match. If there is a single
517+ \var {index} argument, the result is a single string; if there are
518+ multiple arguments, the result is a tuple with one item per argument.
519+ If the \var {index} is zero, the corresponding return value is the
520+ entire matching string; if it is in the inclusive range [1..99], it is
521+ the string matching the corresponding parenthesized group. If no
522+ such group exists, the corresponding result is
523+ \code {None}.
518524
519525If the regular expression uses the \code {(?P<\var {name}>...)} syntax,
520526the \var {index} arguments may also be strings identifying groups by
521527their group name.
522528\end {funcdesc }
523529
530+ \begin {funcdesc }{groups}{}
531+ Return a tuple containing all the subgroups of the match, from 1 up to
532+ however many groups are in the pattern. Groups that did not
533+ participate in the match have values of \code {None}. If the tuple
534+ would only be one element long, a string will be returned instead.
535+ \end {funcdesc }
536+
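The \code{group} and \code{groups} methods can be exercised together in a short sketch. It assumes a current Python 3 interpreter and a pattern with two groups; the pattern, input string and variable names are illustrative.

```python
import re

m = re.match(r"(?P<key>[a-zA-Z_]\w*)\s*=\s*(\d+)", "retries = 3")

whole = m.group(0)        # index 0 is the entire match
key = m.group("key")      # a named group, referenced by name
pair = m.group(1, 2)      # several indices give a tuple
all_groups = m.groups()   # every subgroup, starting from 1
value_span = m.span(2)    # (start, end) of the second group

print(whole, key, pair, all_groups, value_span)
```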
524537\begin {datadesc }{pos}
525538The value of \var {pos} which was passed to the
526539\code {search} or \code {match} function. This is the index into the
@@ -534,8 +547,8 @@ \subsection{Match Objects}
534547\end {datadesc }
535548
536549\begin {datadesc }{re}
537- The regular expression object whose match() or search() method
538- produced this match object .
550+ The regular expression object whose \code { match()} or \code { search()} method
551+ produced this \code {MatchObject} instance.
539552\end {datadesc }
540553
541554\begin {datadesc }{string}
@@ -545,4 +558,3 @@ \subsection{Match Objects}
545558\begin {seealso }
546559\seetext Jeffrey Friedl, \emph {Mastering Regular Expressions }.
547560\end {seealso }
548-