Commit 1f26828

Issue #6561: '\d' in a regular expression should match only Unicode
character category [Nd], not [No].
1 parent 6bd13fb commit 1f26828

4 files changed

Lines changed: 32 additions & 6 deletions


Doc/library/re.rst

Lines changed: 6 additions & 5 deletions
@@ -338,11 +338,12 @@ the second character. For example, ``\$`` matches the character ``'$'``.

 ``\d``
    For Unicode (str) patterns:
-      Matches any Unicode digit (which includes ``[0-9]``, and also many
-      other digit characters). If the :const:`ASCII` flag is used only
-      ``[0-9]`` is matched (but the flag affects the entire regular
-      expression, so in such cases using an explicit ``[0-9]`` may be a
-      better choice).
+      Matches any Unicode decimal digit (that is, any character in
+      Unicode character category [Nd]). This includes ``[0-9]``, and
+      also many other digit characters. If the :const:`ASCII` flag is
+      used only ``[0-9]`` is matched (but the flag affects the entire
+      regular expression, so in such cases using an explicit ``[0-9]``
+      may be a better choice).
    For 8-bit (bytes) patterns:
       Matches any decimal digit; this is equivalent to ``[0-9]``.
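
To make the documented behaviour concrete, here is a minimal sketch (not part of the commit) that reports, for a single character, its Unicode category and whether ``\d`` matches it with and without the ASCII flag. The helper name `digit_report` is hypothetical, and the sketch assumes a Python 3 interpreter that already includes this fix.

```python
import re
import unicodedata


def digit_report(ch):
    # Hypothetical helper: returns (category, matches \d, matches \d with re.ASCII).
    category = unicodedata.category(ch)
    unicode_match = re.fullmatch(r"\d", ch) is not None
    ascii_match = re.fullmatch(r"\d", ch, re.ASCII) is not None
    return category, unicode_match, ascii_match


if __name__ == "__main__":
    for ch in ("7", "\N{THAI DIGIT SIX}", "\N{FULLWIDTH DIGIT ZERO}", "\N{SUBSCRIPT TWO}"):
        print(unicodedata.name(ch), digit_report(ch))
```

Characters in category Nd should match in Unicode mode, only ``[0-9]`` should match under `re.ASCII`, and a category No character such as SUBSCRIPT TWO should match in neither mode.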

Lib/test/test_re.py

Lines changed: 21 additions & 0 deletions
@@ -605,6 +605,27 @@ def test_bug_817234(self):
         self.assertEqual(next(iter).span(), (4, 4))
         self.assertRaises(StopIteration, next, iter)

+    def test_bug_6561(self):
+        # '\d' should match characters in Unicode category 'Nd'
+        # (Number, Decimal Digit), but not those in 'Nl' (Number,
+        # Letter) or 'No' (Number, Other).
+        decimal_digits = [
+            '\u0037', # '\N{DIGIT SEVEN}', category 'Nd'
+            '\u0e58', # '\N{THAI DIGIT SIX}', category 'Nd'
+            '\uff10', # '\N{FULLWIDTH DIGIT ZERO}', category 'Nd'
+            ]
+        for x in decimal_digits:
+            self.assertEqual(re.match('^\d$', x).group(0), x)
+
+        not_decimal_digits = [
+            '\u2165', # '\N{ROMAN NUMERAL SIX}', category 'Nl'
+            '\u3039', # '\N{HANGZHOU NUMERAL TWENTY}', category 'Nl'
+            '\u2082', # '\N{SUBSCRIPT TWO}', category 'No'
+            '\u32b4', # '\N{CIRCLED NUMBER THIRTY NINE}', category 'No'
+            ]
+        for x in not_decimal_digits:
+            self.assertIsNone(re.match('^\d$', x))
+
     def test_empty_array(self):
         # SF buf 1647541
         import array

Misc/NEWS

Lines changed: 4 additions & 0 deletions
@@ -108,6 +108,10 @@ Library
 Extension Modules
 -----------------

+- Issue #6561: '\d' in a regex now matches only characters with
+  Unicode category 'Nd' (Number, Decimal Digit). Previously it also
+  matched characters with category 'No'.
+
 - Issue #4509: Array objects are no longer modified after an operation
   failing due to the resize restriction in-place when the object has exported
   buffers.

Modules/_sre.c

Lines changed: 1 addition & 1 deletion
@@ -168,7 +168,7 @@ static unsigned int sre_lower_locale(unsigned int ch)

 #if defined(HAVE_UNICODE)

-#define SRE_UNI_IS_DIGIT(ch) Py_UNICODE_ISDIGIT((Py_UNICODE)(ch))
+#define SRE_UNI_IS_DIGIT(ch) Py_UNICODE_ISDECIMAL((Py_UNICODE)(ch))
 #define SRE_UNI_IS_SPACE(ch) Py_UNICODE_ISSPACE((Py_UNICODE)(ch))
 #define SRE_UNI_IS_LINEBREAK(ch) Py_UNICODE_ISLINEBREAK((Py_UNICODE)(ch))
 #define SRE_UNI_IS_ALNUM(ch) Py_UNICODE_ISALNUM((Py_UNICODE)(ch))
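
The one-line C change swaps a "digit" predicate for a "decimal" predicate. As a rough Python-level illustration (not part of the commit), the sketch below compares `str.isdigit()` and `str.isdecimal()`, which to my understanding correspond to the same two Unicode properties as `Py_UNICODE_ISDIGIT` and `Py_UNICODE_ISDECIMAL`. The helper name `classify` is hypothetical.

```python
import unicodedata


def classify(name):
    # Hypothetical helper: look up a character by its Unicode name and
    # return (category, isdecimal, isdigit).
    ch = unicodedata.lookup(name)
    return unicodedata.category(ch), ch.isdecimal(), ch.isdigit()


if __name__ == "__main__":
    for name in ("DIGIT SEVEN", "THAI DIGIT SIX", "SUBSCRIPT TWO", "ROMAN NUMERAL SIX"):
        print(name, classify(name))
```

SUBSCRIPT TWO is the interesting case: it counts as a digit but not as a decimal, which is the kind of character ``\d`` stopped matching once the macro was switched.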
