gh-140797: Forbid capturing groups in re.Scanner lexicon patterns #140944
This PR adds validation to re.Scanner.__init__ that rejects lexicon patterns containing capturing groups. If a user-supplied pattern contains a capturing group, Scanner now raises ValueError with a message advising the use of non-capturing groups (?:...) instead.
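As a rough illustration of the rule described above, the following sketch pre-checks a lexicon using only the public `re` API. The helper name `check_lexicon` is hypothetical and is not part of this PR; it relies on the documented `Pattern.groups` attribute, whereas the PR itself inspects the parser state inside `Scanner.__init__`.

```python
import re

def check_lexicon(lexicon, flags=0):
    """Hypothetical helper: reject lexicon patterns that define capturing groups."""
    for phrase, _action in lexicon:
        # Pattern.groups is the number of capturing groups in the compiled pattern
        # (named groups count as capturing groups too).
        if re.compile(phrase, flags).groups:
            raise ValueError(
                f"pattern {phrase!r} contains capturing groups; "
                "use non-capturing groups (?:...) instead"
            )

# A lexicon that only uses non-capturing groups passes the check
# and can be handed to re.Scanner.
lexicon = [(r"(?:a)b", lambda scanner, token: token)]
check_lexicon(lexicon)
scanner = re.Scanner(lexicon)
result, rest = scanner.scan("ab")
print(result, repr(rest))
```

A check like this may be useful on Python versions that do not yet include the validation, since it fails early with a ValueError instead of relying on Scanner's behavior.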
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -397,9 +397,16 @@ def __init__(self, lexicon, flags=0): | |
| s = _parser.State() | ||
| s.flags = flags | ||
| for phrase, action in lexicon: | ||
| sub_pattern = _parser.parse(phrase, flags) | ||
| if sub_pattern.state.groups != 1: # <- 1 means always has \0 | ||
| raise ValueError( | ||
|
You can write it on one line. A line should be <= 80 characters.
Contributor
Author
Thank you for the suggestion. I have resolved it now.
||
| "re.Scanner lexicon patterns must not contain capturing groups;\n" | ||
|
|
||
| "Please use non-capturing groups (?:...) instead" | ||
| ) | ||
|
|
||
| gid = s.opengroup() | ||
|
|
||
| p.append(_parser.SubPattern(s, [ | ||
| (SUBPATTERN, (gid, 0, 0, _parser.parse(phrase, flags))), | ||
| (SUBPATTERN, (gid, 0, 0, sub_pattern)), | ||
| ])) | ||
| s.closegroup(gid, p[-1]) | ||
| p = _parser.SubPattern(s, [(BRANCH, (None, p))]) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1639,6 +1639,25 @@ def s_int(scanner, token): return int(token) | |
| (['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, | ||
| 'op+', 'bar'], '')) | ||
|
|
||
| def test_bug_140797(self): | ||
|
My negligence, please use
- # bug 140797: remove capturing groups compilation form re.Scanner
+ # gh140797: capturing groups is not allowed in re.Scanner
Contributor
Author
Done! Thank you again for your time and suggestions!
||
| #bug 140797: remove capturing groups compilation form re.Scanner | ||
|
Please add a space after the #:
- #Capturing group throws an error
+ # Capturing group throws an error
And add a space after the comma:
- with self.assertRaisesRegex(ValueError,msg):
+ with self.assertRaisesRegex(ValueError, msg):
Then looks good to me.
Contributor
Author
Oops! Thank you again! Resolved.
||
|
|
||
| #Presence of Capturing group throws an error | ||
| lex = [("(a)b", None)] | ||
| with self.assertRaises(ValueError): | ||
|
You may check the exception message:
msg = "Can not use capturing groups in re.Scanner"
with self.assertRaisesRegex(ValueError, msg):
    ...
Contributor
Author
Thank you for the suggestion! I have resolved it now. Need to learn a lot!
||
| Scanner(lex) | ||
|
This saves a line...
- Scanner(lex)
+ Scanner([("(a)b", None)])
Contributor
Author
Done! Testing sure takes the time 😂
||
|
|
||
| #Presence of non-capturing groups should pass normally | ||
| s = Scanner([("(?:a)b", lambda scanner, token: token)]) | ||
| result, rem = s.scan("ab") | ||
| self.assertEqual(result,['ab']) | ||
| self.assertEqual(rem,'') | ||
|
|
||
|
|
||
| #Testing a very complex capturing group | ||
|
|
||
| pattern= "(?P<name>a)" | ||
| with self.assertRaises(ValueError): | ||
| Scanner([(pattern, None)]) | ||
|
|
||
| def test_bug_448951(self): | ||
| # bug 448951 (similar to 429357, but with single char match) | ||
| # (Also test greedy matches.) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| The re.Scanner class now forbids regular expressions containing capturing | ||
|
Member
Mention that that class is undocumented. You can also use some formatting, even if the link does not work:
Contributor
Author
Done! Thank you 😊
||
| groups in its lexicon patterns. Patterns using capturing groups could | ||
| previously lead to crashes with segmentation fault. Use non-capturing groups | ||
| (?:...) instead. | ||
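To give some intuition for why extra capturing groups are a problem for a scanner that dispatches on group numbers, here is a small analogy built only on the public `re` API. It is an assumption-laden sketch, not Scanner's actual implementation: the function `entry_index` and the way the combined pattern is assembled are hypothetical, and the real class builds its pattern through internal parser calls as shown in the diff above.

```python
import re

# Two hypothetical lexicon entries. Each one is wrapped in its own outer group
# so that Match.lastindex can tell which entry matched.
phrases_ok = [r"(?:a)b", r"c"]
phrases_bad = [r"(a)b", r"c"]  # the first entry adds an extra capturing group

def entry_index(phrases, text):
    """Return the lexicon index implied by the group number of the match."""
    combined = re.compile("|".join(f"({p})" for p in phrases))
    m = combined.match(text)
    # One outer group per entry is assumed, so group N maps to entry N - 1.
    return m.lastindex - 1

print(entry_index(phrases_ok, "c"))   # index of the second entry
print(entry_index(phrases_bad, "c"))  # shifted by the inner group of entry one
```

With the non-capturing version, the group number and the entry position stay aligned. With the capturing version, the inner group shifts the numbering, so the computed index no longer refers to a valid entry of a two-item lexicon.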