Fix potentially cubic-time regex in parsePatch #647
Merged
+149
−4
Previously, we parsed filenames (or whatever) out of headers like `Index: foo.txt` or `diff -r 9117c6561b0b -r 273ce12ad8f1 blabla.js` using a single regex that matched the entire line.

Against non-malicious input, this is probably fine almost 100% of the time. However, against adversarially crafted input, it has cubic time complexity. Note that this is not because of the nested quantifiers in the subexpression `(?: -r \w+)+` (these are actually fine; any backtracking is linear, so no catastrophic backtracking can occur) but rather because of the subexpression `\s+(.+?)\s*`.

That subexpression may look like it will match pretty much anything that starts with a whitespace character and is then followed by at least one other character. Not so! The crucial nuance here is that the wildcard `.` does not match line break characters. There are four things JavaScript considers to be line break characters (`\r`, `\n`, `\u2028`, and `\u2029`), but we only split on `\n` when parsing the patch. So the "line" might include, say, some `\u2028`s.

This lets us construct an adversarial input that contains e.g. 1000 spaces followed by a `\u2028` followed by a non-whitespace character followed by a `\u2028` followed by a non-whitespace character. Such a line does not match `\s+(.+?)\s*`, and will cause cubic backtracking as the regex engine tries to make it match. (It's cubic because there are on the order of 1000³ different ways to split those thousand spaces into four segments, respectively corresponding to the subexpressions `\s+`, `.+?`, and `\s*`, plus some leftover at the end, and the regex engine will test every single one of them in the course of backtracking.)

Awkward! The simplest fix, implemented here, is basically to lean less heavily on regexes for this parsing. Instead of a single regex matching the entire line, we can just match the `Index:` or `diff -r ...` prefix that we want to discard, chop it off, and `.trim()` the remainder. This fixes the performance issue but gets us the same results in almost all cases. (The exception is lines in which line break characters occur; those characters now get treated like any other whitespace, which seems like a fairly unimportant behaviour change and probably a bugfix in and of itself.)

Thanks to @ShiyuBanzhou for reporting this (along with one other yet-to-be-fixed ReDoS) in #644.
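
For readers skimming the description, the prefix-stripping idea can be sketched roughly as below. This is a minimal illustration, not the code merged in this PR: the helper name `parseIndexHeader` and the exact prefix pattern are assumptions made for the example.

```javascript
// Hypothetical helper (name and prefix pattern are assumptions for
// illustration; see the PR diff for the real change).
function parseIndexHeader(line) {
  // Match only the prefix we want to discard. Nothing follows the final
  // \s+, so the engine never has to retry different ways of dividing the
  // whitespace between several quantifiers.
  const prefix = /^(?:Index:|diff(?: -r \w+)+)\s+/.exec(line);
  if (!prefix) {
    return null;
  }
  // Chop the prefix off and trim the remainder. trim() removes leading and
  // trailing whitespace, which includes \r, \u2028 and \u2029.
  return line.substring(prefix[0].length).trim();
}

// The wildcard does not match line break characters, but \s does:
console.log(/^.$/.test('\u2028'));  // false
console.log(/^\s$/.test('\u2028')); // true

console.log(parseIndexHeader('Index: foo.txt')); // foo.txt
console.log(
  parseIndexHeader('diff -r 9117c6561b0b -r 273ce12ad8f1 blabla.js')
); // blabla.js

// A line shaped like the adversarial input described above is handled in a
// single pass, because the prefix regex cannot fail after consuming spaces.
const adversarial = 'Index:' + ' '.repeat(1000) + '\u2028X\u2028Y';
console.log(parseIndexHeader(adversarial).length); // 3
```

With this shape, a `\u2028` directly after the run of spaces is consumed by `\s+` as part of the prefix, and any line break characters inside the remainder are simply kept, which matches the behaviour change noted above.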