Nested Character Class Subtraction in Python

Nested character class subtraction lets a regular expression define a complex character set by removing characters from an existing character class. The characters to remove can themselves be written as another character class. In the third-party regex module for Python, the subtraction operator is '--'.

Evaluation proceeds from the innermost character class to the outermost, and subtractions at the same nesting level are applied from left to right within the square brackets [ ].

Python's built-in re module does not support character class subtraction. There, a pattern such as [[abc]--[bc]] (match a, b, or c, but not b or c) is not interpreted as a set difference. The third-party regex module does support it (install it with pip install regex), but only under its version 1 behaviour, which is enabled with the regex.V1 flag or with an inline (?V1) at the start of the pattern.
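
When only the standard library is available, a common workaround is to place a negative lookahead in front of the class. The sketch below illustrates that idea with re. It is not the regex module's set syntax, and the sample text is made up for the example.

```python
import re

# Emulate "a, b, or c, but not b or c" with the standard library.
# The negative lookahead (?![bc]) rejects the excluded characters
# before [abc] consumes a single character.
pattern = re.compile(r'(?![bc])[abc]')

matches = pattern.findall("aabbcc")
print(matches)  # ['a', 'a']
```

The lookahead consumes no input, so each match is still exactly one character, just as with a real character class.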

For instance, '[0-9--[4-6]]' matches the digits 0-9 but excludes 4, 5, and 6, so it matches 0, 1, 2, 3, 7, 8, and 9.

Character Classes: A character class (defined within square brackets `[]`) represents a set of characters. Ex: '[abc]' matches 'a', 'b', or 'c'.
Character Class Subtraction: Subtraction uses the syntax '[abc--[ab]]'. It means match a character that is in the class '[abc]' but not in the class '[ab]'. In this case, only 'c' would match.
Nested Subtraction: The subtracted class can itself contain a subtraction. Ex: '[a-z--[aeiou--[e]]]' first removes 'e' from the vowels, then removes the remaining vowels ('a', 'i', 'o', 'u') from 'a-z', so 'e' and every consonant match.
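
The innermost-first evaluation order can be illustrated with ordinary Python sets standing in for character classes. This is only a sketch of the semantics, not how a regex engine is implemented. The variable names are hypothetical, and the pattern mentioned in the comments assumes the regex module's '--' syntax under version 1 behaviour.

```python
import string

# Stand-ins for the character classes [a-z] and [aeiou]
letters = set(string.ascii_lowercase)
vowels = set("aeiou")

# Innermost subtraction first: [aeiou--[e]] leaves a, i, o, u
inner = vowels - {"e"}

# Outer subtraction next: [a-z--[...]] removes what the inner class kept
outer = letters - inner

# Assumed to mirror the pattern [a-z--[aeiou--[e]]]
result = "".join(sorted(outer))
print(result)  # bcdefghjklmnpqrstvwxyz
```

Because 'e' was taken out of the subtracted class, it survives in the final set along with the consonants.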

Matching All Letters Except Vowels (Basic Subtraction)

Suppose we want to match any lowercase letter except vowels, that is, subtract [aeiou] from [a-z]. With the regex module, the pattern is [a-z--[aeiou]].

Example

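In the following example, the pattern [a-z--[aeiou]] matches any lowercase letter that is not a vowel. The regex.V1 flag turns on the version 1 behaviour that set operations require, and regex.findall() returns every match, so the program prints all letters from a to z except vowels.

import regex

text = "abcdefghijklmnopqrstuvwxyz"
# Lowercase letters minus the vowels
pattern = r'[a-z--[aeiou]]'

# regex.V1 enables nested sets and set operations such as '--'
matches = regex.findall(pattern, text, flags=regex.V1)
print(matches)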

Following is the output of the above code -

['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'v', 'w', 'x', 'y', 'z']

Matching Lowercase Consonants

To match all lowercase consonants in a given string, start from the set of all lowercase letters, \p{Lower}, and subtract the vowels [aeiou] from it.

Example

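In the following example, [\p{Lower}--[aeiou]] matches every lowercase letter except the vowels. The regex.V1 flag enables the set operation, and regex.findall() returns all consonants in the text. The space is skipped because it is not a lowercase letter.

import regex

# Lowercase letters (Unicode property) minus the vowels
pattern = r'[\p{Lower}--[aeiou]]'
text = "hello world"

# regex.V1 enables nested sets and set operations such as '--'
matches = regex.findall(pattern, text, flags=regex.V1)
print(matches)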

Following is the output of the above code -

['h', 'l', 'l', 'w', 'r', 'l', 'd']

Matching Digits Except '0' and '1'

To match every digit except '0' and '1', use \d to match digits and subtract '0' and '1' from it with [\d--[01]].

Example

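The following example demonstrates how to exclude '0' and '1' from the digits. The regex.V1 flag enables the set operation, and regex.findall() lists the remaining digits.

import regex

# Digits minus '0' and '1'
pattern = r'[\d--[01]]'
text = "0123456789"

# regex.V1 enables nested sets and set operations such as '--'
matches = regex.findall(pattern, text, flags=regex.V1)
print(matches)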

Following is the output of the above code -

['2', '3', '4', '5', '6', '7', '8', '9']
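
For comparison, the same result can be approximated with the standard re module by using a negated class: [^\D01] matches a character that is neither a non-digit nor '0' or '1'. This is a sketch of a well-known workaround, not an equivalent of the regex module's set operations in general.

```python
import re

# [^\D01] reads as: not (a non-digit, '0', or '1'),
# which leaves the digits other than 0 and 1.
pattern = re.compile(r'[^\D01]')

text = "0123456789"
matches = pattern.findall(text)
print(matches)  # ['2', '3', '4', '5', '6', '7', '8', '9']
```

The trick works only when the base class has a built-in complement such as \D, \W, or \S. Arbitrary classes still need a lookahead or the regex module.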

SaiKrishna Tavva

Updated on: 2025-05-15T19:42:08+05:30
