@@ -33,8 +33,7 @@ \section{\module{re} ---
3333Usually patterns will be expressed in Python code using this raw
3434string notation.
3535
36- \subsection {Regular Expression Syntax }
37- \label {re-syntax }
36+ \subsection {Regular Expression Syntax \label {re-syntax } }
3837
3938A regular expression (or RE) specifies a set of strings that matches
4039it; the functions in this module let you check if a particular string
@@ -70,29 +69,31 @@ \subsection{Regular Expression Syntax}
7069% define these since they're used twice:
7170\newcommand {\MyLeftMargin }{0.7in}
7271\newcommand {\MyLabelWidth }{0.65in}
72+
7373\begin {list }{}{\leftmargin \MyLeftMargin \labelwidth \MyLabelWidth }
74+
7475\item [\character {.}] (Dot.) In the default mode, this matches any
7576character except a newline. If the \constant {DOTALL} flag has been
7677specified, this matches any character including a newline.
77- %
78+
7879\item [\character {\^ }] (Caret.) Matches the start of the string, and in
7980\constant {MULTILINE} mode also matches immediately after each newline.
80- %
81+
8182\item [\character {\$ }] Matches the end of the string, and in
8283\constant {MULTILINE} mode also matches before a newline.
8384\regexp {foo} matches both 'foo' and 'foobar' , while the regular
8485expression \regexp {foo\$ } matches only 'foo' .
85- %
86+
8687\item [\character {*}] Causes the resulting RE to
8788match 0 or more repetitions of the preceding RE, as many repetitions
8889as are possible. \regexp {ab*} will
8990match 'a' , 'ab' , or 'a' followed by any number of 'b' s.
90- %
91+
9192\item [\character {+}] Causes the
9293resulting RE to match 1 or more repetitions of the preceding RE.
9394\regexp {ab+} will match 'a' followed by any non-zero number of 'b' s; it
9495will not match just 'a' .
95- %
96+
9697\item [\character {?}] Causes the resulting RE to
9798match 0 or 1 repetitions of the preceding RE. \regexp {ab?} will
9899match either 'a' or 'ab' .
@@ -105,24 +106,26 @@ \subsection{Regular Expression Syntax}
105106\dfn {non-greedy} or \dfn {minimal} fashion; as \emph {few } characters as
106107possible will be matched. Using \regexp {.*?} in the previous
107108expression will match only \code {'<H1>'}.
108- %
109+
109110\item [\code {\{ \var {m},\var {n}\} }] Causes the resulting RE to match from
110111\var {m} to \var {n} repetitions of the preceding RE, attempting to
111112match as many repetitions as possible. For example, \regexp {a\{ 3,5\} }
112113will match from 3 to 5 \character {a} characters. Omitting \var {m} is the same
113114as specifying 0 for the lower bound; omitting \var {n} specifies an
114115infinite upper bound.
115- %
116+
116117\item [\code {\{ \var {m},\var {n}\} ?}] Causes the resulting RE to
117118match from \var {m} to \var {n} repetitions of the preceding RE,
118119attempting to match as \emph {few } repetitions as possible. This is
119120the non-greedy version of the previous qualifier. For example, on the
120- 6-character string \code {'aaaaaa'}, \regexp {a\{ 3,5\} } will match 5 \character {a}
121- characters, while \regexp {a\{ 3,5\} ?} will only match 3 characters.
122- %
123- \item [\character {\e }] Either escapes special characters (permitting you to match
124- characters like \character {*}, \character {?}, and so forth), or
125- signals a special sequence; special sequences are discussed below.
121+ 6-character string \code {'aaaaaa'}, \regexp {a\{ 3,5\} } will match 5
122+ \character {a} characters, while \regexp {a\{ 3,5\} ?} will only match 3
123+ characters.
124+
125+ \item [\character {\e }] Either escapes special characters (permitting
126+ you to match characters like \character {*}, \character {?}, and so
127+ forth), or signals a special sequence; special sequences are discussed
128+ below.
126129
127130If you're not using a raw string to
128131express the pattern, remember that Python also uses the
@@ -133,7 +136,7 @@ \subsection{Regular Expression Syntax}
133136be repeated twice. This is complicated and hard to understand, so
134137it's highly recommended that you use raw strings for all but the
135138simplest expressions.
136- %
139+
137140\item [\code {[]}] Used to indicate a set of characters. Characters can
138141be listed individually, or a range of characters can be indicated by
139142giving two characters and separating them by a \character {-}. Special
@@ -153,42 +156,41 @@ \subsection{Regular Expression Syntax}
153156simply match the \character {\^ } character. For example, \regexp {[\^ 5]}
154157will match any character except \character {5}.
155158
156- %
157159\item [\character {|}]\code {A|B}, where A and B can be arbitrary REs,
158160creates a regular expression that will match either A or B. This can
159161be used inside groups (see below) as well. To match a literal \character {|},
160162use \regexp {\e |}, or enclose it inside a character class, as in \regexp {[|]}.
161- %
163+
162164\item [\code {(...)}] Matches whatever regular expression is inside the
163165parentheses, and indicates the start and end of a group; the contents
164166of a group can be retrieved after a match has been performed, and can
165167be matched later in the string with the \regexp {\e \var {number}} special
166- sequence, described below. To match the literals \character {(} or \character {')},
167- use \regexp {\e (} or \regexp {\e )}, or enclose them inside a character
168- class: \regexp {[(] [)]}.
169- %
170- \item [\code {(?...)}] This is an extension notation (a \character {?} following a
171- \character {(} is not meaningful otherwise). The first character after
172- the \character {?}
168+ sequence, described below. To match the literals \character {(} or
169+ \character {')}, use \regexp {\e (} or \regexp {\e )}, or enclose them
170+ inside a character class: \regexp {[(] [)]}.
171+
172+ \item [\code {(?...)}] This is an extension notation (a \character {?}
173+ following a \character {(} is not meaningful otherwise). The first
174+ character after the \character {?}
173175determines what the meaning and further syntax of the construct is.
174176Extensions usually do not create a new group;
175177\regexp {(?P<\var {name}>...)} is the only exception to this rule.
176178Following are the currently supported extensions.
177- %
179+
178180\item [\code {(?iLmsx)}] (One or more letters from the set \character {i},
179181\character {L}, \character {m}, \character {s}, \character {x}.) The group matches
180182the empty string; the letters set the corresponding flags
181183(\constant {re.I}, \constant {re.L}, \constant {re.M}, \constant {re.S},
182184\constant {re.X}) for the entire regular expression. This is useful if
183185you wish to include the flags as part of the regular expression, instead
184186of passing a \var {flag} argument to the \function {compile()} function.
185- %
187+
186188\item [\code {(?:...)}] A non-grouping version of regular parentheses.
187189Matches whatever regular expression is inside the parentheses, but the
188190substring matched by the
189191group \emph {cannot } be retrieved after performing a match or
190192referenced later in the pattern.
191- %
193+
192194\item [\code {(?P<\var {name}>...)}] Similar to regular parentheses, but
193195the substring matched by the group is accessible via the symbolic group
194196name \var {name}. Group names must be valid Python identifiers. A
@@ -201,18 +203,18 @@ \subsection{Regular Expression Syntax}
201203name in arguments to methods of match objects, such as \code {m.group('id')}
202204or \code {m.end('id')}, and also by name in pattern text
203205(e.g. \regexp {(?P=id)}) and replacement text (e.g. \code {\e g<id>}).
204- %
206+
205207\item [\code {(?P=\var {name})}] Matches whatever text was matched by the
206208earlier group named \var {name}.
207- %
209+
208210\item [\code {(?\# ...)}] A comment; the contents of the parentheses are
209211simply ignored.
210- %
212+
211213\item [\code {(?=...)}] Matches if \regexp {...} matches next, but doesn't
212214consume any of the string. This is called a lookahead assertion. For
213215example, \regexp {Isaac (?=Asimov)} will match \code {'Isaac~'} only if it's
214216followed by \code {'Asimov'}.
215- %
217+
216218\item [\code {(?!...)}] Matches if \regexp {...} doesn't match next. This
217219is a negative lookahead assertion. For example,
218220\regexp {Isaac (?!Asimov)} will match \code {'Isaac~'} only if it's \emph {not }
@@ -474,8 +476,7 @@ \subsection{Module Contents}
474476\end {excdesc }
475477
476478
477- \subsection {Regular Expression Objects }
478- \label {re-objects }
479+ \subsection {Regular Expression Objects \label {re-objects } }
479480
480481Compiled regular expression objects support the following methods and
481482attributes:
@@ -547,8 +548,7 @@ \subsection{Regular Expression Objects}
547548\end {memberdesc }
548549
549550
550- \subsection {Match Objects }
551- \label {match-objects }
551+ \subsection {Match Objects \label {match-objects } }
552552
553553\class {MatchObject} instances support the following methods and attributes:
554554