Commit 062ea2e

Made a number of revisions suggested by Fredrik Lundh.
Revised the first paragraph so it doesn't sound like it was written when 7-bit strings were assumed; note that Unicode strings can be used.
1 parent e2b7c4d commit 062ea2e

1 file changed

Lines changed: 33 additions & 12 deletions

Doc/lib/libre.tex

@@ -1,21 +1,21 @@
 \section{\module{re} ---
-Perl-style regular expression operations.}
+Regular expression operations}
 \declaremodule{standard}{re}
 \moduleauthor{Andrew M. Kuchling}{[email protected]}
+\moduleauthor{Fredrik Lundh}{[email protected]}
 \sectionauthor{Andrew M. Kuchling}{[email protected]}


-\modulesynopsis{Perl-style regular expression search and match
-operations.}
+\modulesynopsis{Regular expression search and match operations with a
+Perl-style expression syntax.}


 This module provides regular expression matching operations similar to
-those found in Perl. It's 8-bit clean: the strings being processed
-may contain both null bytes and characters whose high bit is set. Regular
-expression pattern strings may not contain null bytes, but can specify
-the null byte using the \code{\e\var{number}} notation.
-Characters with the high bit set may be included. The \module{re}
-module is always available.
+those found in Perl. Regular expression pattern strings may not
+contain null bytes, but can specify the null byte using the
+\code{\e\var{number}} notation. Both patterns and strings to be
+searched can be Unicode strings as well as 8-bit strings. The
+\module{re} module is always available.

 Regular expressions use the backslash character (\character{\e}) to
 indicate special forms or to allow special characters to be used
@@ -34,6 +34,15 @@ \section{\module{re} ---
 Usually patterns will be expressed in Python code using this raw
 string notation.

+\strong{Implementation note:}
+The \module{re}\refstmodindex{pre} module has two distinct
+implementations: \module{sre} is the default implementation and
+includes Unicode support, but may run into stack limitations for some
+patterns. Though this will be fixed for a future release of Python,
+the older implementation (without Unicode support) is still available
+as the \module{pre}\refstmodindex{pre} module.
+
+
 \subsection{Regular Expression Syntax \label{re-syntax}}

 A regular expression (or RE) specifies a set of strings that matches
@@ -155,9 +164,16 @@ \subsection{Regular Expression Syntax \label{re-syntax}}
 will match any character except \character{5}.

 \item[\character{|}]\code{A|B}, where A and B can be arbitrary REs,
-creates a regular expression that will match either A or B. This can
-be used inside groups (see below) as well. To match a literal \character{|},
-use \regexp{\e|}, or enclose it inside a character class, as in \regexp{[|]}.
+creates a regular expression that will match either A or B. An
+arbitrary number of REs can be separated by the \character{|} in this
+way. This can be used inside groups (see below) as well. REs
+separated by \character{|} are tried from left to right, and the first
+one that allows the complete pattern to match is considered the
+accepted branch. This means that if \code{A} matches, \code{B} will
+never be tested, even if it would produce a longer overall match. In
+other words, the \character{|} operator is never greedy. To match a
+literal \character{|}, use \regexp{\e|}, or enclose it inside a
+character class, as in \regexp{[|]}.

 \item[\code{(...)}] Matches whatever regular expression is inside the
 parentheses, and indicates the start and end of a group; the contents
@@ -184,6 +200,11 @@ \subsection{Regular Expression Syntax \label{re-syntax}}
 include the flags as part of the regular expression, instead of
 passing a \var{flag} argument to the \function{compile()} function.

+Note that the \regexp{(?x)} flag changes how the expression is parsed.
+It should be used first in the expression string, or after one or more
+whitespace characters. If there are non-whitespace characters before
+the flag, the results are undefined.
+
 \item[\code{(?:...)}] A non-grouping version of regular parentheses.
 Matches whatever regular expression is inside the parentheses, but the
 substring matched by the

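Review note: the left-to-right alternation rule added in the third hunk can be checked with a short sketch. It assumes a current Python 3 `re` module rather than the implementation this commit documents, and the sample patterns and variable names are illustrative only.

```python
import re

# Branches are tried left to right; the first one that lets the whole
# pattern match is accepted, even if a later branch would match more.
first = re.match(r"a|ab", "ab").group()
second = re.match(r"ab|a", "ab").group()

# Here 'a' is tried first, but the following 'c' cannot match 'b',
# so the engine backtracks and accepts the 'ab' branch instead.
whole = re.match(r"(?:a|ab)c", "abc").group()

# Two ways to match a literal '|': escape it or use a character class.
escaped = re.findall(r"\|", "x|y")
in_class = re.findall(r"[|]", "x|y")

print(first, second, whole, escaped, in_class)
```

Swapping the branch order changes the result of the first two calls, which is the behaviour the new wording "never greedy" describes.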