 \section{\module{re} ---
-Perl-style regular expression operations.}
+Regular expression operations}
 \declaremodule{standard}{re}
 \moduleauthor{Andrew M. Kuchling}{[email protected]}
+\moduleauthor{Fredrik Lundh}{[email protected]}
 \sectionauthor{Andrew M. Kuchling}{[email protected]}
 
 
-\modulesynopsis{Perl-style regular expression search and match
-operations.}
+\modulesynopsis{Regular expression search and match operations with a
+Perl-style expression syntax.}
 
 
 This module provides regular expression matching operations similar to
-those found in Perl. It's 8-bit clean: the strings being processed
-may contain both null bytes and characters whose high bit is set. Regular
-expression pattern strings may not contain null bytes, but can specify
-the null byte using the \code{\e\var{number}} notation.
-Characters with the high bit set may be included. The \module{re}
-module is always available.
+those found in Perl. Regular expression pattern strings may not
+contain null bytes, but can specify the null byte using the
+\code{\e\var{number}} notation. Both patterns and strings to be
+searched can be Unicode strings as well as 8-bit strings. The
+\module{re} module is always available.
 
 Regular expressions use the backslash character (\character{\e}) to
 indicate special forms or to allow special characters to be used
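The revised paragraph makes two claims: the null byte is written with an escape, and both patterns and subjects may be Unicode or 8-bit strings. Below is a minimal sketch of both points. It assumes the current Python 3 `re` module rather than the much older Python this commit targets. In Python 3, `str` is the Unicode string type and `bytes` plays the role of the 8-bit string. The escape is written here in its three-digit octal form, `\000`.

```python
import re

# Match the null byte through an escape in the pattern; r"\000" is the
# three-digit octal form and is read as chr(0).
null_pattern = re.compile(r"\000")
text = "key\x00value"
null_pos = null_pattern.search(text).start()  # 3

# str patterns and subjects are Unicode: \w also matches accented letters.
words = re.findall(r"\w+", "café naïve")  # ['café', 'naïve']

# bytes patterns stand in for 8-bit strings and only search bytes subjects.
byte_words = re.findall(rb"\w+", b"abc def")  # [b'abc', b'def']

print(null_pos, words, byte_words)
```

A `str` pattern cannot be applied to a `bytes` subject, or the reverse; Python 3 raises `TypeError` when the two are mixed.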
@@ -34,6 +34,15 @@ \section{\module{re} ---
 Usually patterns will be expressed in Python code using this raw
 string notation.
 
+\strong{Implementation note:}
+The \module{re}\refstmodindex{pre} module has two distinct
+implementations: \module{sre} is the default implementation and
+includes Unicode support, but may run into stack limitations for some
+patterns. Though this will be fixed for a future release of Python,
+the older implementation (without Unicode support) is still available
+as the \module{pre}\refstmodindex{pre} module.
+
+
 \subsection{Regular Expression Syntax \label{re-syntax}}
 
 A regular expression (or RE) specifies a set of strings that matches
@@ -155,9 +164,16 @@ \subsection{Regular Expression Syntax \label{re-syntax}}
 will match any character except \character{5}.
 
 \item[\character{|}]\code{A|B}, where A and B can be arbitrary REs,
-creates a regular expression that will match either A or B. This can
-be used inside groups (see below) as well. To match a literal \character{|},
-use \regexp{\e|}, or enclose it inside a character class, as in \regexp{[|]}.
+creates a regular expression that will match either A or B. An
+arbitrary number of REs can be separated by the \character{|} in this
+way. This can be used inside groups (see below) as well. REs
+separated by \character{|} are tried from left to right, and the first
+one that allows the complete pattern to match is considered the
+accepted branch. This means that if \code{A} matches, \code{B} will
+never be tested, even if it would produce a longer overall match. In
+other words, the \character{|} operator is never greedy. To match a
+literal \character{|}, use \regexp{\e|}, or enclose it inside a
+character class, as in \regexp{[|]}.
 
 \item[\code{(...)}] Matches whatever regular expression is inside the
 parentheses, and indicates the start and end of a group; the contents
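The new text says `|` tries its branches left to right and is never greedy. The short sketch below checks that behaviour. It assumes Python 3.4 or later because it uses `re.fullmatch()`, which did not exist when this commit was written. The last two lines use the two ways the text gives for matching a literal `|`.

```python
import re

# Branches are tried from left to right; the first one that lets the whole
# pattern succeed is accepted, even if a later branch would match more.
short_first = re.match(r"a|ab", "abc").group()  # 'a'
long_first = re.match(r"ab|a", "abc").group()   # 'ab'

# When the first branch cannot complete the overall match (here, the
# fullmatch() requirement to reach the end), the next branch is tried.
backtracked = re.fullmatch(r"a|ab", "ab").group()  # 'ab'

# A literal '|' needs an escape or a character class.
escaped = re.split(r"\|", "a|b|c")    # ['a', 'b', 'c']
in_class = re.split(r"[|]", "a|b|c")  # ['a', 'b', 'c']

print(short_first, long_first, backtracked, escaped, in_class)
```

Putting the longer alternative first (`ab|a`) is the usual way to prefer the longer match when branches share a prefix.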
@@ -184,6 +200,11 @@ \subsection{Regular Expression Syntax \label{re-syntax}}
 include the flags as part of the regular expression, instead of
 passing a \var{flag} argument to the \function{compile()} function.
 
+Note that the \regexp{(?x)} flag changes how the expression is parsed.
+It should be used first in the expression string, or after one or more
+whitespace characters. If there are non-whitespace characters before
+the flag, the results are undefined.
+
 \item[\code{(?:...)}] A non-grouping version of regular parentheses.
 Matches whatever regular expression is inside the parentheses, but the
 substring matched by the
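The following is a minimal sketch of the placement rule for `(?x)`, assuming Python 3. The flag comes first in the pattern, so the whitespace and `#` comments after it are ignored. The second pattern passes `re.VERBOSE` to `re.compile()` and is equivalent, but uses no inline flag.

```python
import re

# Verbose mode is switched on by the inline (?x) flag at the very start of
# the pattern; after it, whitespace is ignored and '#' begins a comment.
inline_verbose = re.compile(r"""(?x)
    \d+    # integer part
    \.     # decimal point
    \d+    # fractional part
""")

# The same pattern with the flag passed to compile() instead of inline.
flag_verbose = re.compile(r"\d+ \. \d+", re.VERBOSE)

for pattern in (inline_verbose, flag_verbose):
    # Spaces in the pattern were ignored, so "3.14" matches and
    # "3 . 14" does not.
    print(bool(pattern.fullmatch("3.14")), bool(pattern.fullmatch("3 . 14")))
```

Current Python is stricter than "the results are undefined". Since Python 3.11, a global inline flag that is not at the start of the expression raises `re.error`. Releases from 3.6 onward emitted a `DeprecationWarning` for it instead.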