@ExplodingCabbage ExplodingCabbage commented Jan 7, 2026

Previously, we parsed filenames (or whatever) out of headers like Index: foo.txt or diff -r 9117c6561b0b -r 273ce12ad8f1 blabla.js using the following regex:

/^(?:Index:|diff(?: -r \w+)+)\s+(.+?)\s*$/

Against non-malicious input, this is probably fine almost 100% of the time. However, against adversarially-crafted input, it has cubic time complexity. Note that this is not because of the nested quantifiers in the subexpression (?: -r \w+)+ (these are actually fine; any backtracking is linear so no catastrophic backtracking can occur) but rather because of the subexpression \s+(.+?)\s*.
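For reference, here is a small sketch of what the old regex captures on the two benign headers quoted above. The variable names are just for illustration and are not taken from the library's source.

```javascript
// The old header regex, copied from the description above.
const oldHeaderRegex = /^(?:Index:|diff(?: -r \w+)+)\s+(.+?)\s*$/;

// The two example headers mentioned in the description.
const samples = [
  'Index: foo.txt',
  'diff -r 9117c6561b0b -r 273ce12ad8f1 blabla.js',
];

// Capture group 1 holds the filename (or whatever follows the prefix).
const captured = samples.map((line) => {
  const match = oldHeaderRegex.exec(line);
  return match ? match[1] : null;
});

console.log(captured); // [ 'foo.txt', 'blabla.js' ]
```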

That subexpression may look like it will match pretty much anything that starts with a whitespace character and then is followed by at least one other character. Not so! The crucial nuance here is that the wildcard . does not match line break characters. There are four things JavaScript considers to be line break characters - \r, \n, \u2028, and \u2029 - but we only split on \n when parsing the patch. So the "line" might include, say, some \u2028s.
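The mismatch between . and \s is easy to check from a Node REPL. The snippet below is only an illustration; the sample string and variable names are made up for the purpose.

```javascript
// The four characters JavaScript treats as line terminators.
const lineBreaks = ['\r', '\n', '\u2028', '\u2029'];

for (const ch of lineBreaks) {
  const code = 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  // Without the s (dotAll) flag, . rejects all four, while \s accepts all four.
  console.log(code, '| matched by . :', /./.test(ch), '| matched by \\s :', /\s/.test(ch));
}

// Splitting only on \n leaves a \u2028 embedded inside the first "line".
const lines = 'Index: a\u2028b\nnext line'.split('\n');
console.log(lines.length); // 2

// The old regex cannot match that line: . refuses to cross the \u2028.
const oldHeaderRegex = /^(?:Index:|diff(?: -r \w+)+)\s+(.+?)\s*$/;
console.log(oldHeaderRegex.test(lines[0])); // false
```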

This lets us construct an adversarial input that contains e.g. 1000 spaces followed by a \u2028 followed by a non-whitespace character followed by a \u2028 followed by a non-whitespace character. Such a line cannot match \s+(.+?)\s*$, and will cause cubic backtracking as the regex engine tries to make it match. (It's cubic because there are on the order of 1000^3 different ways to split those thousand spaces into four segments respectively corresponding to the subexpressions \s+, .+?, and \s*, and some leftover at the end - and the regex engine will test every single one of them in the course of backtracking.)
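A rough way to see the growth is to time the old regex directly against lines of this shape at a few sizes. This is a sketch under assumptions: the helper names buildAdversarialLine and timeMatch are hypothetical, the Index: prefix and the letter x are arbitrary choices, and the sizes are kept small so it finishes quickly. If the behaviour is cubic, each doubling of the space count should cost very roughly eight times as long, though actual timings will vary by machine and engine.

```javascript
// The old header regex, copied from the description above.
const headerRegex = /^(?:Index:|diff(?: -r \w+)+)\s+(.+?)\s*$/;

// Hypothetical helper: a header prefix, then n spaces, then
// \u2028 + non-whitespace + \u2028 + non-whitespace, as described above.
function buildAdversarialLine(nSpaces) {
  return 'Index:' + ' '.repeat(nSpaces) + '\u2028x\u2028x';
}

// Hypothetical helper: runs the regex once and reports elapsed milliseconds.
function timeMatch(nSpaces) {
  const line = buildAdversarialLine(nSpaces);
  const start = process.hrtime.bigint();
  const result = headerRegex.exec(line);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { result, elapsedMs };
}

for (const n of [100, 200, 400]) {
  const { result, elapsedMs } = timeMatch(n);
  // result is null every time: the line can never match.
  console.log(n, result, elapsedMs.toFixed(1) + ' ms');
}
```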

Awkward! The simplest fix, implemented here, is basically to lean less heavily on regexes for this parsing. Instead of a single regex matching the entire line, we can just match the Index: or diff -r ... prefix that we want to discard, chop it off, and .trim() the remainder. This fixes the performance issue while giving us the same results in almost all cases. (The exception is lines that contain line break characters; those characters now get treated like any other whitespace, which seems like a fairly unimportant behaviour change and probably a bugfix in and of itself.)
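The idea can be sketched roughly as follows. This is not the code from the diff itself: the names HEADER_PREFIX and extractHeaderValue are hypothetical, and the exact prefix regex is an assumption based on the old one.

```javascript
// Assumed prefix regex: the same prefix as before, plus the whitespace after it.
// Nothing follows the final \s+, so there is no reason for it to backtrack.
const HEADER_PREFIX = /^(?:Index:|diff(?: -r \w+)+)\s+/;

// Hypothetical helper: returns the text after the prefix, trimmed,
// or null when the line is not an Index:/diff -r header at all.
function extractHeaderValue(line) {
  const match = HEADER_PREFIX.exec(line);
  if (match === null) {
    return null;
  }
  // Chop off the matched prefix and trim the remainder.
  // trim() also strips line terminators such as \u2028.
  return line.slice(match[0].length).trim();
}

console.log(extractHeaderValue('Index: foo.txt')); // foo.txt
console.log(extractHeaderValue('diff -r 9117c6561b0b -r 273ce12ad8f1 blabla.js')); // blabla.js
console.log(extractHeaderValue('--- a/foo.txt')); // null

// A long whitespace run no longer causes trouble: it is consumed once.
const adversarial = 'Index: ' + '\t'.repeat(5000) + '◎!\u2028!\u2028◎!\u2028!';
console.log(extractHeaderValue(adversarial).length); // 9
```

Note that a sketch like this returns an empty string for a header with nothing after the prefix, whereas the old regex would not have matched such a line at all.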

Thanks to @ShiyuBanzhou for reporting this (along with one other yet-to-be-fixed ReDOS) in #644.

(This is the feature involved in #644, and I want to try to avoid regressions. It previously had no automated tests.)
@ExplodingCabbage ExplodingCabbage self-assigned this Jan 7, 2026
@ExplodingCabbage ExplodingCabbage marked this pull request as ready for review January 7, 2026 13:51

ExplodingCabbage commented Jan 7, 2026

I'm gonna merge this and fix the other ReDOS in a further PR. Review would be welcome (though supererogatory), @ShiyuBanzhou - if I've made any mistakes here, there's still time to fix them before the next release.

ExplodingCabbage commented

For the record, here's a simple test function that I used from the Node REPL to test whether my fix had indeed fixed the ReDOS:

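const { parsePatch } = require('./libcjs');

// Times a single parsePatch call on an adversarial "Index:" header line.
function timeWithRepeats(nRepeats) {
    const prefix = "Index: ";
    // A long run of whitespace that the old regex could split in many ways.
    const infix = "\t".repeat(nRepeats);
    // \u2028 is matched by \s but not by ., so the old regex cannot match this line.
    const suffix = "◎!\u2028!\u2028◎!\u2028!";
    const payload = prefix + infix + suffix;
    const startTime = new Date();
    parsePatch(payload);
    // Elapsed wall-clock time in milliseconds.
    return new Date() - startTime;
}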

On my PC, timeWithRepeats(5000) takes 40 seconds before this fix, and 0ms after it.
