Commit b22273e

bpo-31672: Fix string.Template accidentally matching non-ASCII identifiers (GH-3872)
The pattern `[a-z]` with the `IGNORECASE` flag can match some non-ASCII characters. The straightforward solution is to use the `IGNORECASE | ASCII` flags. But users may subclass `Template` and override only `idpattern`, so we want to avoid changing `Template.flags`. Instead, this commit uses the local flag `-i` for `idpattern` and changes `[a-z]` to `[a-zA-Z]`, so that uppercase letters still match once case-insensitivity is switched off.
1 parent 9255104 commit b22273e
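
The failure mode described in the commit message can be reproduced with the `re` module alone. The sketch below is illustrative, not part of the commit: `matches` is a hypothetical helper, and the two sample characters are the ones used by the new tests. It compares the old class `[a-z]` under `re.IGNORECASE`, the rejected `re.IGNORECASE | re.ASCII` alternative, and the local `(?-i:...)` form the commit adopts.

```python
import re

# The two characters exercised by the commit's new tests.
DOTLESS_I = "\u0131"           # LATIN SMALL LETTER DOTLESS I
CAPITAL_I_WITH_DOT = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
SAMPLES = (DOTLESS_I, CAPITAL_I_WITH_DOT)


def matches(pattern: str, text: str, flags: int = 0) -> bool:
    """Hypothetical helper: True if `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text, flags) is not None


# Old behaviour: Unicode-aware case folding lets [a-z] accept both letters.
old_unicode = [matches(r"[a-z]", ch, re.IGNORECASE) for ch in SAMPLES]

# Rejected alternative: restrict case folding to ASCII through the flags.
old_ascii = [matches(r"[a-z]", ch, re.IGNORECASE | re.ASCII) for ch in SAMPLES]

# Approach taken by the commit: disable IGNORECASE locally, list both cases.
new_local = [matches(r"(?-i:[a-zA-Z])", ch, re.IGNORECASE) for ch in SAMPLES]

print(old_unicode, old_ascii, new_local)  # [True, True] [False, False] [False, False]
```

Both alternatives reject the non-ASCII letters; the local flag was chosen because it leaves `Template.flags` untouched for subclasses that override only `idpattern`.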

4 files changed: 24 additions & 3 deletions

Doc/library/string.rst

Lines changed: 11 additions & 2 deletions
@@ -755,8 +755,17 @@ attributes:

 * *idpattern* -- This is the regular expression describing the pattern for
   non-braced placeholders. The default value is the regular expression
-  ``[_a-z][_a-z0-9]*``. If this is given and *braceidpattern* is ``None``
-  this pattern will also apply to braced placeholders.
+  ``(?-i:[_a-zA-Z][_a-zA-Z0-9]*)``. If this is given and *braceidpattern* is
+  ``None`` this pattern will also apply to braced placeholders.
+
+  .. note::
+
+     Since default *flags* is ``re.IGNORECASE``, pattern ``[a-z]`` can match
+     with some non-ASCII characters. That's why we use local ``-i`` flag here.
+
+     While *flags* is kept to ``re.IGNORECASE`` for backward compatibility,
+     you can override it to ``0`` or ``re.IGNORECASE | re.ASCII`` when
+     subclassing. It's simple way to avoid unexpected match like above example.

   .. versionchanged:: 3.7
      *braceidpattern* can be used to define separate patterns used inside and
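
The note above suggests overriding *flags* when subclassing. A minimal sketch of that route follows; the subclass name `AsciiTemplate` is hypothetical, and it deliberately restores the pre-fix `idpattern` to show that `re.IGNORECASE | re.ASCII` alone is enough to reject non-ASCII names while still accepting ASCII names in either case.

```python
import re
from string import Template


class AsciiTemplate(Template):
    """Hypothetical subclass: the pre-fix idpattern plus ASCII-only case folding."""

    idpattern = r"[_a-z][_a-z0-9]*"
    flags = re.IGNORECASE | re.ASCII


# ASCII placeholders in either case are still recognised.
greeting = AsciiTemplate("$who likes $WHAT").substitute(who="tim", WHAT="ham")
print(greeting)  # tim likes ham

# A non-ASCII name after "$" is not a placeholder, so substitute() treats
# the "$" as an ill-formed delimiter and raises ValueError.
try:
    AsciiTemplate("$who likes $\u0131").substitute(who="tim")
    rejected = False
except ValueError:
    rejected = True

# safe_substitute() leaves the ill-formed "$" in the output instead.
partial = AsciiTemplate("$who likes $\u0131").safe_substitute(who="tim")
print(rejected, partial)  # True tim likes $ı
```

Because the pattern is compiled from the subclass's own *flags*, this works without touching `Template` itself.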

Lib/string.py

Lines changed: 5 additions & 1 deletion
@@ -79,7 +79,11 @@ class Template(metaclass=_TemplateMetaclass):
     """A string class for supporting $-substitutions."""

     delimiter = '$'
-    idpattern = r'[_a-z][_a-z0-9]*'
+    # r'[a-z]' matches to non-ASCII letters when used with IGNORECASE,
+    # but without ASCII flag. We can't add re.ASCII to flags because of
+    # backward compatibility. So we use local -i flag and [a-zA-Z] pattern.
+    # See https://bugs.python.org/issue31672
+    idpattern = r'(?-i:[_a-zA-Z][_a-zA-Z0-9]*)'
     braceidpattern = None
     flags = _re.IGNORECASE
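
Switching case-insensitivity off with `(?-i:...)` would, on its own, stop uppercase names from matching, which is why the class was widened to `[a-zA-Z]`. The comparison below is a sketch: `COMMIT_IDPATTERN` is copied from the diff, while `NAIVE_IDPATTERN` and `is_identifier` are hypothetical; both patterns are matched with `re.IGNORECASE`, the value `Template.flags` keeps.

```python
import re

# idpattern introduced by this commit.
COMMIT_IDPATTERN = r"(?-i:[_a-zA-Z][_a-zA-Z0-9]*)"
# Hypothetical variant that adds only the local -i flag.
NAIVE_IDPATTERN = r"(?-i:[_a-z][_a-z0-9]*)"
# Template keeps IGNORECASE as its class-level flags.
FLAGS = re.IGNORECASE


def is_identifier(idpattern: str, name: str) -> bool:
    """Hypothetical helper: True if `idpattern` accepts all of `name` under FLAGS."""
    return re.fullmatch(idpattern, name, FLAGS) is not None


# Map each sample name to (accepted by commit pattern, accepted by naive pattern).
results = {
    name: (is_identifier(COMMIT_IDPATTERN, name), is_identifier(NAIVE_IDPATTERN, name))
    for name in ["who", "Who", "_x1", "\u0131", "\u0130"]
}
for name, (commit_ok, naive_ok) in results.items():
    print(f"{name!r}: commit={commit_ok}, naive={naive_ok}")
```

Only the commit's pattern accepts `Who`, and neither accepts the two non-ASCII letters, so existing templates that use capitalised names keep working.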

Lib/test/test_string.py

Lines changed: 6 additions & 0 deletions
@@ -270,6 +270,12 @@ def test_invalid_placeholders(self):
         raises(ValueError, s.substitute, dict(who='tim'))
         s = Template('$who likes $100')
         raises(ValueError, s.substitute, dict(who='tim'))
+        # Template.idpattern should match to only ASCII characters.
+        # https://bugs.python.org/issue31672
+        s = Template("$who likes $\u0131") # (DOTLESS I)
+        raises(ValueError, s.substitute, dict(who='tim'))
+        s = Template("$who likes $\u0130") # (LATIN CAPITAL LETTER I WITH DOT ABOVE)
+        raises(ValueError, s.substitute, dict(who='tim'))

     def test_idpattern_override(self):
         class PathPattern(Template):
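
The new test cases can be restated outside `unittest` as a quick check. This sketch assumes an interpreter that already contains this fix (or a later equivalent); `substitute_or_error` is a hypothetical helper that returns the exception name instead of raising. On an unpatched interpreter, `"$who likes $\u0131"` and `"$who likes $\u0130"` would instead be read as placeholders with non-ASCII names and fail with `KeyError`, which is what the new `ValueError` assertions rule out.

```python
from string import Template


def substitute_or_error(text: str, **mapping: str) -> str:
    """Hypothetical helper: substitute, or return the exception's type name."""
    try:
        return Template(text).substitute(mapping)
    except (KeyError, ValueError) as exc:
        return type(exc).__name__


outcomes = {
    "ascii": substitute_or_error("$who likes $what", who="tim", what="ham"),
    "digits": substitute_or_error("$who likes $100", who="tim"),
    "dotless_i": substitute_or_error("$who likes $\u0131", who="tim"),
    "dotted_capital_i": substitute_or_error("$who likes $\u0130", who="tim"),
    "missing_key": substitute_or_error("$who likes $what", who="tim"),
}
for case, outcome in outcomes.items():
    print(f"{case}: {outcome}")
```

The `missing_key` case is included for contrast: a well-formed ASCII name that is absent from the mapping still raises `KeyError`, whereas a non-ASCII name is now an ill-formed placeholder and raises `ValueError`.
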
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+``idpattern`` in ``string.Template`` matched some non-ASCII characters. Now
+it uses ``-i`` regular expression local flag to avoid non-ASCII characters.
