Python - by Sanjay Mate , IICMR-MCA 1
A regular expression is a powerful tool for matching
text against a predefined pattern. It can detect the
presence or absence of text that fits a
particular pattern, and it can also break a pattern into one
or more sub-patterns.
Regular expressions use a sequence of characters and
symbols to define a pattern of text.
Regular expressions are useful for finding phone
numbers, email addresses, dates, and any other data
that has a consistent format.
The Python standard library provides the re module for
regular expressions. Its central operation is a
search, which takes a regular expression and a
string and looks for the pattern in that string.
A Regular Expression (RegEx) is a sequence of
characters that defines a search pattern. For example,
^a...s$
matches any five-character string starting with
a and ending with s.
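The pattern above can be tried out with a short sketch. The word list is made up for illustration; only the pattern comes from the text.

```python
import re

# Pattern from the text: "a", then any three characters, then "s".
pattern = r"^a...s$"

# Hypothetical sample words, chosen only to illustrate the pattern.
words = ["abs", "alias", "abyss", "Alias", "an abacus"]

# Keep the words in which the pattern matches.
matches = [w for w in words if re.search(pattern, w)]
print(matches)  # ['alias', 'abyss']
```

"abs" is too short, "Alias" starts with an uppercase A, and "an abacus" is longer than five characters, so none of them match.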
Raw string
A raw string, written with an r prefix such as r"\d", is slightly
different from a regular string: it does not interpret the \ character
as an escape character.
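A minimal sketch of the difference between regular and raw strings. The sample text is an assumption made for illustration.

```python
import re

# In a regular string the backslash must be doubled to reach the regex engine.
regular = "\\d+"
# In a raw string the backslash is kept exactly as written.
raw = r"\d+"
same = regular == raw            # True: both hold the characters \ d +

# "\n" is one newline character; r"\n" is a backslash followed by n.
newline_length = len("\n")       # 1
raw_newline_length = len(r"\n")  # 2

# Raw strings keep regex patterns readable.
digits = re.findall(r"\d+", "room 12, floor 3")
print(same, newline_length, raw_newline_length, digits)
```

Writing patterns as raw strings is a common convention because regex syntax itself relies heavily on backslashes.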
Some characters are metacharacters, also called
special characters. They do not match themselves;
instead, they have a special meaning in a pattern.
Metacharacter [ ]
Square brackets specify a character class. Characters can be
listed individually, or a range of characters can be
indicated by giving two characters separated
by a '-'.
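The two ways of writing a character class can be sketched as follows. The sample string is hypothetical.

```python
import re

text = "Room 42B"  # hypothetical sample string

# Characters listed individually: match R or B.
listed = re.findall(r"[RB]", text)

# A range: any lowercase letter from a to z.
lowercase = re.findall(r"[a-z]", text)

# Several ranges combined in one class.
alphanumeric = re.findall(r"[A-Za-z0-9]", text)

print(listed)        # ['R', 'B']
print(lowercase)     # ['o', 'o', 'm']
print(alphanumeric)  # ['R', 'o', 'o', 'm', '4', '2', 'B']
```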
^ symbol in [ ]
A caret as the first character inside [ ] negates the
class: [^abc] matches any single character except a, b, or c.
. dot
The dot matches any single character except a newline.
^ Caret
The caret matches at the start of the string.
$ dollar
The dollar matches at the end of the string.
* star
The star matches zero or more repetitions of the
preceding item.
+ plus
The plus matches one or more repetitions of the
preceding item.
{ } Braces
{n} matches exactly n repetitions of the preceding item;
{n,m} matches at least n and at most m repetitions.
| Alternation
A|B matches either A or B.
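As a rough illustration of the metacharacters named above (negated class, dot, caret, dollar, star, plus, braces, alternation), here is a short sketch. All sample strings are assumptions chosen for the demonstration.

```python
import re

# [^...] negates a character class: everything that is not a digit.
negated = re.findall(r"[^0-9]", "a1b2")

# . matches any single character except a newline.
dot = re.findall(r"h.t", "hat hot h\nt")

# ^ anchors at the start of the string, $ at the end.
starts = bool(re.search(r"^Hello", "Hello world"))
ends = bool(re.search(r"world$", "Hello world"))

# * means zero or more, + means one or more of the preceding item.
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")

# {2,3} means at least two and at most three repetitions.
braces = re.findall(r"[0-9]{2,3}", "1 22 333 4444")

# | matches either alternative.
alternation = re.findall(r"cat|dog", "cat, dog, bird")

print(negated, dot, starts, ends)
print(star, plus, braces, alternation)
```

Note that "4444" yields only "444": the quantifier takes at most three digits, and the single digit left over is too short to match again.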
( ) Group
Grouping constructs break up a regex in
Python into subexpressions, or groups. This
serves two purposes:
◦ Grouping: A group represents a single syntactic
entity. Additional metacharacters apply to the entire
group as a unit.
◦ Capturing: Some grouping constructs also capture
the portion of the search string that matches the
subexpression in the group. You can retrieve
captured matches later through several different
mechanisms.
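Both purposes can be sketched briefly. The strings and the date format are hypothetical examples, not taken from the slides.

```python
import re

# Grouping: the + applies to the whole group (ab), not just to b.
grouped = re.fullmatch(r"(ab)+", "ababab")    # matches
ungrouped = re.fullmatch(r"ab+", "ababab")    # None: only b may repeat

# Capturing: each pair of parentheses stores the text it matched.
m = re.search(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", "Released on 2023-10-02")
year, month, day = m.groups()

print(grouped is not None, ungrouped is None)
print(year, month, day)  # 2023 10 02
```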
\ Backslash
The backslash \ is used to escape various
characters, including all metacharacters.
\$a matches if a string contains $ followed
by a. Here, $ is not interpreted by the RegEx
engine in a special way.
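A small sketch of escaping, using made-up sample strings. The call to re.escape() is an addition not mentioned in the text; it is a standard library helper that escapes special characters for you.

```python
import re

text = "$a and $b"  # hypothetical sample string

# Escaped: \$ matches a literal dollar sign.
escaped = re.findall(r"\$a", text)

# Unescaped: $ means "end of string", so nothing can follow it.
unescaped = re.findall(r"$a", text)

# The same idea applies to the dot.
literal_dot = re.findall(r"3\.14", "3.14 3x14")
any_char = re.findall(r"3.14", "3.14 3x14")

# re.escape() builds an escaped pattern from literal text.
auto_escaped = re.findall(re.escape("$a"), text)

print(escaped, unescaped)     # ['$a'] []
print(literal_dot, any_char)  # ['3.14'] ['3.14', '3x14']
```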
The following special sequences are available:
\A - Matches if the specified characters are at the
start of a string.
\b - Matches if the specified characters are at the
beginning or end of a word.
\B - Opposite of \b. Matches if the specified
characters are not at the beginning or end of a word.
\d - Matches any decimal digit. Equivalent to [0-9]
when the ASCII flag is set; for Unicode (str) patterns
it also matches decimal digits from other scripts.
\D - Matches any character that is not a decimal digit.
Equivalent to [^0-9] in ASCII mode.
\s - Matches any whitespace
character. Equivalent to [ \t\n\r\f\v] in ASCII mode.
\S - Matches any non-whitespace
character. Equivalent to [^ \t\n\r\f\v] in ASCII mode.
\w - Matches any alphanumeric character (digits and
letters). Equivalent to [a-zA-Z0-9_] in ASCII mode.
The underscore _ is also considered an alphanumeric
character.
\W - Matches any non-alphanumeric character.
Equivalent to [^a-zA-Z0-9_] in ASCII mode.
\Z - Matches if the specified characters are at
the end of a string.
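The special sequences can be tried out with a sketch like the one below. Every sample string is an assumption chosen to keep the results easy to check.

```python
import re

# \A and \Z anchor at the very start and very end of the string.
at_start = re.findall(r"\Athe", "the other the")   # only the first "the"
at_end = re.findall(r"the\Z", "the other the")     # only the last "the"

# \b is a word boundary, \B is a position that is not a boundary.
whole_word = re.findall(r"\bcat\b", "cat concat cat.")
inside_word = re.findall(r"\Bcat", "cat concat")

# \d / \D: digits and non-digits.
digits = re.findall(r"\d", "a1 b22")
non_digits = re.findall(r"\D", "a1 b2")

# \s / \S: whitespace and non-whitespace.
spaces = re.findall(r"\s", "a b\tc\n")
non_spaces = re.findall(r"\S+", "a b\tc\n")

# \w / \W: word characters (including _) and everything else.
words = re.findall(r"\w+", "user_1 @ home!")
non_words = re.findall(r"\W", "a-b c")

print(at_start, at_end, whole_word, inside_word)
print(digits, non_digits, spaces, non_spaces)
print(words, non_words)
```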
Python has a module named re to work with
regular expressions. To use it, we need to
import the module.
re.findall()
The re.findall() method returns a list of strings
containing all non-overlapping matches. If there is no
match, it returns an empty list.
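A minimal sketch of re.findall(), assuming a made-up sample string:

```python
import re

text = "hello 12 hi 89. Howdy 34"  # hypothetical sample string

# Collect every run of digits in the text.
numbers = re.findall(r"\d+", text)

# With no match, the result is an empty list rather than None.
nothing = re.findall(r"\d+", "no digits here")

print(numbers)  # ['12', '89', '34']
print(nothing)  # []
```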
re.split()
The re.split() method splits the string wherever the
pattern matches and returns a list of the resulting
substrings. If the pattern is not found, the list
contains the original string.
You can pass a maxsplit argument to
the re.split() method. It is the maximum number
of splits that will occur.
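The behaviour of re.split(), with and without maxsplit, can be sketched as follows. The sample string is an assumption.

```python
import re

text = "Twelve:12 Eighty nine:89."  # hypothetical sample string

# Split wherever a run of digits occurs.
parts = re.split(r"\d+", text)

# Limit the number of splits to one; the rest of the string stays intact.
limited = re.split(r"\d+", text, maxsplit=1)

# No match: the list holds the original string unchanged.
unsplit = re.split(r"\d+", "no digits")

print(parts)    # ['Twelve:', ' Eighty nine:', '.']
print(limited)  # ['Twelve:', ' Eighty nine:89.']
print(unsplit)  # ['no digits']
```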
re.sub()
The re.sub() method returns a string in which matched
occurrences are replaced with the content
of the replace variable, as in re.sub(pattern, replace, string).
You can pass count as a fourth parameter to
the re.sub() method. If omitted, it defaults to 0,
which replaces all occurrences.
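A short sketch of re.sub() with and without count. The sample string is hypothetical, and count is passed by keyword here, which reads more clearly than a bare fourth positional argument.

```python
import re

text = "abc 12 de 23 f45 6"  # hypothetical sample string

# Remove every run of whitespace (count defaults to 0: replace all).
no_spaces = re.sub(r"\s+", "", text)

# Replace only the first occurrence.
first_only = re.sub(r"\s+", "", text, count=1)

print(no_spaces)   # abc12de23f456
print(first_only)  # abc12 de 23 f45 6
```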
re.subn()
The re.subn() is similar to re.sub() except it
returns a tuple of 2 items containing the new
string and the number of substitutions made.
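For comparison with re.sub(), a minimal re.subn() sketch on a made-up string:

```python
import re

text = "abc 12 de 23 f45 6"  # hypothetical sample string

# subn returns (new_string, number_of_substitutions).
new_text, replaced = re.subn(r"\s+", "", text)

print(new_text)  # abc12de23f456
print(replaced)  # 5
```

The count is useful when you need to know whether anything changed without comparing the two strings yourself.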
re.search()
The re.search() method takes two arguments: a
pattern and a string. The method looks for the
first location where the RegEx pattern produces
a match with the string.
If the search is successful, re.search() returns a
match object; if not, it returns None.
match = re.search(pattern, string)
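A minimal sketch of re.search(), showing both outcomes. The sample string and patterns are assumptions.

```python
import re

text = "Python is fun"  # hypothetical sample string

# A successful search returns a match object.
found = re.search(r"is", text)
found_text = found.group() if found else None

# An unsuccessful search returns None, so test the result before using it.
missing = re.search(r"Java", text)

print(found_text)  # is
print(missing)     # None
```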
re.fullmatch()
Unlike the match() method, which performs the
pattern matching only at the beginning of the
string, the re.fullmatch method returns a match
object if and only if the entire target string from
the first to the last character matches the
regular expression pattern.
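The contrast between match() and fullmatch() can be sketched like this, using hypothetical strings:

```python
import re

pattern = r"\d{4}"  # exactly four digits

# match() only requires the pattern to fit at the beginning.
prefix = re.match(pattern, "2023-10")

# fullmatch() requires the whole string to fit the pattern.
whole_fail = re.fullmatch(pattern, "2023-10")
whole_ok = re.fullmatch(pattern, "2023")

print(prefix.group())  # 2023
print(whole_fail)      # None
print(whole_ok.group())  # 2023
```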
Whenever a match for the regex pattern is found,
Python returns a Match object.
match.group()
The group() method returns the part of the
string where there is a match.
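A brief sketch of group(), assuming a made-up string and a pattern with two capturing groups:

```python
import re

text = "39801 356, 2102 1111"  # hypothetical sample string

# Three digits, a space, then two digits.
m = re.search(r"(\d{3}) (\d{2})", text)

whole = m.group()      # the entire matched text
first = m.group(1)     # text captured by the first group
second = m.group(2)    # text captured by the second group
both = m.group(1, 2)   # several groups at once, as a tuple

print(whole)  # 801 35
print(first, second)
print(both)
```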
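match.start(), match.end() and match.span()
The start() function returns the index of the start
of the matched substring.
end() returns the end index of the matched
substring, which is one past its last character.
span() returns a tuple containing both the start and
end indices.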
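The three position methods can be sketched as follows, again with an assumed sample string:

```python
import re

text = "39801 356, 2102 1111"  # hypothetical sample string
m = re.search(r"(\d{3}) (\d{2})", text)

start = m.start()  # index where the match begins
end = m.end()      # index just past the last matched character
span = m.span()    # (start, end) as a tuple

# Slicing the original string with these indices gives the match back.
sliced = text[start:end]

print(start, end, span)  # 2 8 (2, 8)
print(sliced)            # 801 35
```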
match.re and match.string
The re attribute of a match object returns the
regular expression object that produced the match.
Similarly, the string attribute returns the string
passed to the matching function.
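A small sketch of the two attributes, using a hypothetical string and pattern:

```python
import re

text = "39801 356, 2102 1111"  # hypothetical sample string
m = re.search(r"(\d{3}) (\d{2})", text)

# m.re is the compiled pattern object; its .pattern is the source text.
pattern_source = m.re.pattern

# m.string is the string that was searched.
searched = m.string

print(pattern_source)  # (\d{3}) (\d{2})
print(searched)
```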
Python regex functions accept optional flags that
modify how a pattern is matched. Flags can be used
with match(), search(), and split(), among others.
IGNORECASE flag
◦ This flag stands for ignoring case. Specify it
in the regex method as an argument to perform
case-insensitive matching.
◦ re.I
◦ re.IGNORECASE
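A minimal sketch of case-insensitive matching, with an assumed sample string:

```python
import re

text = "Python PYTHON python"  # hypothetical sample string

# Without the flag, only the exact lowercase spelling matches.
case_sensitive = re.findall(r"python", text)

# With re.IGNORECASE (short form re.I), all spellings match.
case_insensitive = re.findall(r"python", text, flags=re.IGNORECASE)

# The short and long names refer to the same flag.
same_flag = re.I is re.IGNORECASE

print(case_sensitive)    # ['python']
print(case_insensitive)  # ['Python', 'PYTHON', 'python']
```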
DOTALL flag
◦ By default, the dot (.) metacharacter inside the regular
expression pattern matches any character, be it a
letter, digit, symbol, or punctuation mark, except the
newline character, \n. With the DOTALL flag, the dot
matches the newline as well.
◦ re.S
◦ re.DOTALL
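The effect of the flag on the dot can be sketched with a two-line string (a made-up example):

```python
import re

text = "first\nsecond"  # hypothetical two-line string

# By default the dot stops at the newline.
default_lines = re.findall(r".+", text)
default_search = re.search(r"first.second", text)

# With re.DOTALL (short form re.S) the dot also matches "\n".
dotall_lines = re.findall(r".+", text, flags=re.DOTALL)
dotall_search = re.search(r"first.second", text, flags=re.S)

print(default_lines)  # ['first', 'second']
print(dotall_lines)   # ['first\nsecond']
```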
VERBOSE flag
◦ The re.X flag stands for verbose. This flag allows more
flexibility and better formatting when writing
complex regex patterns: most whitespace in the pattern
is ignored and # starts a comment, so a pattern passed
to match(), search(), or other regex methods can be
spread over several commented lines.
◦ re.X
◦ re.VERBOSE
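A sketch of a verbose pattern next to its compact equivalent. The date format and the sample string are assumptions made for illustration.

```python
import re

# Verbose form: whitespace is ignored and # starts a comment.
verbose_date = re.compile(
    r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
    """,
    re.VERBOSE,
)

# The same pattern written compactly.
compact_date = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

text = "Due 2024-01-15"  # hypothetical sample string
verbose_groups = verbose_date.search(text).groups()
compact_groups = compact_date.search(text).groups()

print(verbose_groups)  # ('2024', '01', '15')
```

Both patterns match the same text; the verbose one is simply easier to read and maintain.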
MULTILINE flag
◦ The re.M flag is used as an argument inside the regex
method to perform a match inside a multiline block of
text.
◦ re.M
◦ re.MULTILINE
◦ This flag is used with the metacharacters ^ and $.
By default, the caret (^) matches a pattern only at the
beginning of the string,
and the dollar ($) matches the regular expression
pattern only at the end of the string.
When this flag is specified, the pattern
character ^ matches at the beginning of the string and
at the start of each line (immediately after each \n).
The metacharacter $ matches at the end of the string
and at the end of each line (immediately before each \n).
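The change in how ^ and $ behave can be sketched with a three-line string (a made-up example):

```python
import re

text = "one apple\ntwo pears\nthree plums"  # hypothetical multiline text

# Default: ^ and $ refer to the whole string only.
first_default = re.findall(r"^\w+", text)
last_default = re.findall(r"\w+$", text)

# With re.MULTILINE (short form re.M): ^ and $ apply to every line.
first_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)
last_multiline = re.findall(r"\w+$", text, flags=re.M)

print(first_default, last_default)  # ['one'] ['plums']
print(first_multiline)              # ['one', 'two', 'three']
print(last_multiline)               # ['apple', 'pears', 'plums']
```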
ASCII flag
With this flag, \w, \W, \b, \B, \d, \D, \s and \S perform
ASCII-only matching instead of full Unicode
matching. This is only meaningful for Unicode
patterns and is ignored for byte patterns.
◦ re.A
◦ re.ASCII
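A sketch of the difference between Unicode and ASCII-only matching. The sample text is an assumption: it uses Unicode escapes for an accented letter and for Arabic-Indic digits so that the source stays unambiguous.

```python
import re

# "caf\u00e9" is "café"; "\u0661\u0662\u0663" are the Arabic-Indic digits 1, 2, 3.
text = "caf\u00e9 \u0661\u0662\u0663 123"

# Default (Unicode) matching for str patterns.
unicode_words = re.findall(r"\w+", text)
unicode_digits = re.findall(r"\d+", text)

# ASCII-only matching with re.ASCII (short form re.A).
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
ascii_digits = re.findall(r"\d+", text, flags=re.A)

print(unicode_words, unicode_digits)
print(ascii_words, ascii_digits)  # ['caf', '123'] ['123']
```

With the flag set, the accented letter no longer counts as a word character and only the digits 0 to 9 count as digits.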
Python - by Sanjay Mate , IICMR-MCA 65
Python - by Sanjay Mate , IICMR-MCA 66
Python - by Sanjay Mate , IICMR-MCA 67