
Nested Character Class Subtraction in Python
Nested character class subtraction in Python regular expressions lets us define complex character sets by removing characters from an existing character class. The characters to remove can themselves be written as a character class, and the subtraction is written with the '--' operator.
Nested classes are resolved from the innermost to the outermost, and chained subtractions within the square brackets [ ] are evaluated from left to right.
Python's built-in re module does not support character class subtraction, so a pattern such as [[abc]--[bc]] (the characters a, b, or c, excluding b and c) is not treated as a set difference there. Instead, install the third-party regex module (pip install regex) and use it in place of re. In regex, nested sets and set operators belong to its version 1 behaviour, which is enabled with the regex.V1 flag or the inline (?V1) prefix.
For instance, with version 1 behaviour enabled, '[0-9--[4-6]]' matches the digits 0-9 but excludes 4, 5, and 6, effectively matching 0, 1, 2, 3, 7, 8, and 9. Some other engines, such as .NET, write the same class with a single hyphen, '[0-9-[4-6]]'; the regex module uses the double hyphen instead.
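The digit example can be sketched as follows. This is a minimal illustration that assumes the third-party regex module is installed; the variable names are chosen here for illustration only. It shows the two ways of switching on version 1 behaviour, the flag and the inline prefix.

```python
import regex  # third-party module: pip install regex

text = "0123456789"

# Version 1 behaviour enabled through the flags argument:
# '--' is then read as the set-difference operator.
with_flag = regex.findall(r'[0-9--[4-6]]', text, flags=regex.V1)

# The same behaviour enabled inline, at the start of the pattern.
inline = regex.findall(r'(?V1)[0-9--[4-6]]', text)

print(with_flag)  # ['0', '1', '2', '3', '7', '8', '9']
print(inline)     # ['0', '1', '2', '3', '7', '8', '9']
```

Either form works; the inline prefix is convenient when the pattern is stored as a plain string and passed around without flags.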
- Character Classes: A character class (defined within square brackets `[]`) represents a set of characters. Ex: '[abc]' matches 'a', 'b', or 'c'.
- Character Class Subtraction: Subtraction uses the syntax '[abc--[ab]]'. It means match a character that is in the class '[abc]' but not in the class '[ab]'. In this case, only 'c' would match.
- Nested Subtraction: The subtracted class can itself contain a subtraction. For example, '[a-z--[[aeiou]--[e]]]' removes every vowel except 'e' from '[a-z]', so 'e' still matches.
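As a rough sketch of nesting and left-to-right chaining, the snippet below runs two patterns over the alphabet. It assumes the regex module with version 1 behaviour enabled; the patterns and variable names are illustrative choices, not taken from the module's documentation.

```python
import regex  # third-party module: pip install regex

text = "abcdefghijklmnopqrstuvwxyz"

# Nested: the inner class is "vowels except e", so subtracting it
# from a-z removes a, i, o, u and leaves 'e' in place.
nested = regex.findall(r'[a-z--[[aeiou]--[e]]]', text, flags=regex.V1)

# Chained: evaluated left to right, i.e. (a-z minus vowels) minus x, y, z.
chained = regex.findall(r'[a-z--[aeiou]--[xyz]]', text, flags=regex.V1)

print("".join(nested))   # bcdefghjklmnpqrstvwxyz
print("".join(chained))  # bcdfghjklmnpqrstvw
```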
Matching All Letters Except Vowels (Basic Subtraction)
Suppose we want to match any lowercase letter except the vowels. This means subtracting [aeiou] from [a-z]. With the regex module's subtraction operator, the pattern is [a-z--[aeiou]].
Example
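In the following example, the regular expression matches a lowercase letter that is not a vowel. We pass a pattern that uses character class subtraction to regex.findall(), together with the regex.V1 flag so that '--' is treated as the subtraction operator. The program prints all letters from a to z except the vowels.
    import regex

    text = "abcdefghijklmnopqrstuvwxyz"
    # a-z with the vowels subtracted
    pattern = r'[a-z--[aeiou]]'
    # regex.V1 enables nested sets and set operators
    matches = regex.findall(pattern, text, flags=regex.V1)
    print(matches)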
Following is the output of the above code -
['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'v', 'w', 'x', 'y', 'z']
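For comparison, the same list can be produced with the standard re module, which has no set subtraction, by using a negative lookahead. This is a sketch of an alternative technique rather than character class subtraction itself; the variable name consonants is chosen here for illustration.

```python
import re

text = "abcdefghijklmnopqrstuvwxyz"

# (?![aeiou]) rejects any position where the next character is a vowel;
# [a-z] then consumes one lowercase letter.
consonants = re.findall(r'(?![aeiou])[a-z]', text)

print(consonants)
```

The lookahead form needs no third-party package, but it becomes harder to read than a subtraction once several exclusions are combined.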
Matching Lowercase Consonants
To match all lowercase consonants in a given string, start from the set of all lowercase letters, \p{Lower}, and subtract the vowels [aeiou] from it. Unlike [a-z], \p{Lower} also covers lowercase letters outside ASCII.
Example
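In the following example, [\p{Lower}--[aeiou]] matches any lowercase letter except the vowels, and regex.findall() returns all consonants in the text. The regex.V1 flag is needed so that '--' is treated as the subtraction operator.
    import regex

    # all lowercase letters with the vowels subtracted
    pattern = r'[\p{Lower}--[aeiou]]'
    text = "hello world"
    # regex.V1 enables nested sets and set operators
    matches = regex.findall(pattern, text, flags=regex.V1)
    print(matches)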
Following is the output of the above code -
['h', 'l', 'l', 'w', 'r', 'l', 'd']
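The difference between \p{Lower} and [a-z] only shows up on non-ASCII input. The sketch below assumes the regex module with version 1 behaviour enabled and uses a made-up sample string containing two accented letters.

```python
import regex  # third-party module: pip install regex

# "héllo wörld", written with escapes so the source stays ASCII
text = "h\u00e9llo w\u00f6rld"

# \p{Lower} includes accented lowercase letters; only a, e, i, o, u are removed.
unicode_aware = regex.findall(r'[\p{Lower}--[aeiou]]', text, flags=regex.V1)

# [a-z] is limited to the 26 ASCII letters, so the accented letters are skipped.
ascii_only = regex.findall(r'[a-z--[aeiou]]', text, flags=regex.V1)

print(unicode_aware)  # ['h', 'é', 'l', 'l', 'w', 'ö', 'r', 'l', 'd']
print(ascii_only)     # ['h', 'l', 'l', 'w', 'r', 'l', 'd']
```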
Matching Digits Except '0' and '1'
To match every digit except '0' and '1', use \d to match digits and subtract '0' and '1' from it: [\d--[01]].
Example
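The following example demonstrates how to exclude '0' and '1' from the set of digits. Here, regex.findall() returns the remaining digits. The regex.V1 flag is needed so that '--' is treated as the subtraction operator.
    import regex

    # all digits with '0' and '1' subtracted
    pattern = r'[\d--[01]]'
    text = "0123456789"
    # regex.V1 enables nested sets and set operators
    matches = regex.findall(pattern, text, flags=regex.V1)
    print(matches)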
Following is the output of the above code -
['2', '3', '4', '5', '6', '7', '8', '9']