[3.6] bpo-31672: Fix string.Template accidentally matched non-ASCII identifiers (GH-3872) #3982

Merged 2 commits on Oct 14, 2017
14 changes: 12 additions & 2 deletions Doc/library/string.rst
@@ -746,8 +746,18 @@ to parse template strings. To do this, you can override these class attributes:

 * *idpattern* -- This is the regular expression describing the pattern for
   non-braced placeholders (the braces will be added automatically as
-  appropriate). The default value is the regular expression
-  ``[_a-z][_a-z0-9]*``.
+  appropriate). The default value is the regular expression
+  ``(?-i:[_a-zA-Z][_a-zA-Z0-9]*)``.
+
+  .. note::
+
+     Since default *flags* is ``re.IGNORECASE``, pattern ``[a-z]`` can match
+     with some non-ASCII characters. That's why we use local ``-i`` flag here.
+
+     While *flags* is kept to ``re.IGNORECASE`` for backward compatibility,
+     you can override it to ``0`` or ``re.IGNORECASE | re.ASCII`` when
+     subclassing.
+

 * *flags* -- The regular expression flags that will be applied when compiling
   the regular expression used for recognizing substitutions. The default value
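
The note in this hunk turns on how `re.IGNORECASE` treats `[a-z]` in a Unicode pattern. A minimal sketch of the difference between the old and new default `idpattern`, using only the `re` module; the constant and variable names are illustrative and not part of the patch, and only the two pattern strings come from the diff:

```python
import re

# Only the two pattern strings are taken from the diff; the names are illustrative.
OLD_IDPATTERN = r"[_a-z][_a-z0-9]*"
NEW_IDPATTERN = r"(?-i:[_a-zA-Z][_a-zA-Z0-9]*)"

# string.Template compiles its pattern with re.IGNORECASE by default,
# so both patterns are compiled with that flag here.
old_re = re.compile(OLD_IDPATTERN, re.IGNORECASE)
new_re = re.compile(NEW_IDPATTERN, re.IGNORECASE)

# The last two samples are U+0131 (dotless i) and U+0130 (capital I with dot above).
samples = ["who", "Who", "\u0131", "\u0130"]

old_hits = [s for s in samples if old_re.fullmatch(s)]
new_hits = [s for s in samples if new_re.fullmatch(s)]

print("old:", [ascii(s) for s in old_hits])
print("new:", [ascii(s) for s in new_hits])
```

The local `(?-i:...)` group switches case-insensitive matching off for the identifier only, which is why the new pattern has to spell out `a-zA-Z` explicitly.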
6 changes: 5 additions & 1 deletion Lib/string.py
@@ -78,7 +78,11 @@ class Template(metaclass=_TemplateMetaclass):
     """A string class for supporting $-substitutions."""

     delimiter = '$'
-    idpattern = r'[_a-z][_a-z0-9]*'
+    # r'[a-z]' matches to non-ASCII letters when used with IGNORECASE,
+    # but without ASCII flag. We can't add re.ASCII to flags because of
+    # backward compatibility. So we use local -i flag and [a-zA-Z] pattern.
+    # See https://bugs.python.org/issue31672
+    idpattern = r'(?-i:[_a-zA-Z][_a-zA-Z0-9]*)'
     flags = _re.IGNORECASE

     def __init__(self, template):
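
The comment in this hunk says `re.ASCII` cannot be added to the default `flags` for compatibility reasons, and the documentation change suggests overriding `flags` in a subclass instead. A rough sketch of that option follows; `AsciiTemplate` is a hypothetical subclass written for illustration and is not part of this PR:

```python
import re
from string import Template


class AsciiTemplate(Template):
    """Hypothetical subclass: simple pattern, ASCII-only case folding."""

    # With re.ASCII set, IGNORECASE only folds ASCII letters, so the
    # plain [a-z] pattern no longer picks up U+0130 or U+0131.
    idpattern = r"[_a-z][_a-z0-9]*"
    flags = re.IGNORECASE | re.ASCII


# Mixed-case ASCII identifiers still work because IGNORECASE is kept.
rendered = AsciiTemplate("$Who likes $what").substitute(Who="tim", what="kung pao")

# A placeholder starting with a non-ASCII letter is treated as invalid.
try:
    AsciiTemplate("$who likes $\u0131").substitute(who="tim")
    rejected = False
except ValueError:
    rejected = True

print(rendered, rejected)
```

Subclasses that already override `idpattern` do not inherit the local `-i` group from the new default, so they may want to set `flags` along these lines.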
6 changes: 6 additions & 0 deletions Lib/test/test_string.py
@@ -271,6 +271,12 @@ def test_invalid_placeholders(self):
         raises(ValueError, s.substitute, dict(who='tim'))
         s = Template('$who likes $100')
         raises(ValueError, s.substitute, dict(who='tim'))
+        # Template.idpattern should match to only ASCII characters.
+        # https://bugs.python.org/issue31672
+        s = Template("$who likes $\u0131") # (DOTLESS I)
+        raises(ValueError, s.substitute, dict(who='tim'))
+        s = Template("$who likes $\u0130") # (LATIN CAPITAL LETTER I WITH DOT ABOVE)
+        raises(ValueError, s.substitute, dict(who='tim'))

     def test_idpattern_override(self):
         class PathPattern(Template):
@@ -0,0 +1,2 @@
+``idpattern`` in ``string.Template`` matched some non-ASCII characters. Now
+it uses ``-i`` regular expression local flag to avoid non-ASCII characters.
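
The user-visible effect described in this entry can be checked with a few lines. This is a small sketch that assumes an interpreter which already includes the fix; on an unpatched 3.6 build, `substitute` would instead look up `\u0131` as a key:

```python
from string import Template

# "\u0131" is LATIN SMALL LETTER DOTLESS I, one of the characters from the tests.
t = Template("$who likes $\u0131")

# With the fix, "$\u0131" is not a valid placeholder, so substitute() raises.
try:
    t.substitute(who="tim")
    raised = False
except ValueError:
    raised = True

# safe_substitute() leaves the invalid placeholder in place instead of raising.
safe = t.safe_substitute(who="tim")

print(raised, ascii(safe))
```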