Description
Atomic groups (a.k.a. non-backtracking subexpressions or possessive matches) aren't supported in RE2. I've removed them to make Linguist's heuristics more portable across the libraries that reuse them.
Note that atomic groups can improve performance, but the risk of excessive backtracking in the cases here is minimal, and where it isn't, I've added other countermeasures.
Win32 Message File
There was no real use for atomic grouping there, since `\/\*\s*` can't really cause any branching in this context.

INI
The original use of an atomic group in `(?>[^\s\[][^\r\n]*(?:\r?\n|\r))*` can prevent some backtracking when a file contains "InternetShortcut]" but no "URL" property afterwards. However, even without the atomic grouping, the execution time remains linear, since the repeated group has only one way of matching a whole line. I've also limited the search to 20 lines for the "URL" property to appear, which should be more than enough.

GSC
First pattern: the atomic grouping only prevented minor backtracking caused by the two alternatives.
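To illustrate why dropping the atomic group costs little here, this sketch uses a hypothetical GSC-style pattern of the same shape. The `autoexec`/`private` keywords, the `{0,2}` bound and the rest of the pattern are assumptions, not the exact Linguist heuristic. With only two alternatives, each of which must be followed by whitespace, a backtracking engine has very little to retry when a line fails to match.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical GSC-style pattern after removing the atomic group: up to two
// modifier keywords (assumed to be `autoexec` or `private`) may precede
// `function`, and each keyword must be followed by whitespace.
var gscFunc = regexp.MustCompile(`^\s*(?:(?:autoexec|private)\s+){0,2}function\s+\w+\s*\(`)

func main() {
	for _, line := range []string{
		"function init()",
		"autoexec function main()",
		"private autoexec function setup()",
		"functional()",
	} {
		fmt.Println(line, "=>", gscFunc.MatchString(line))
	}
}
```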
Second pattern: that was probably the most useful case of atomic grouping among the ones presented here. However, simply combining `(?>\w+\.)*\w+` into `(\w+\.)+`, as proposed in this PR, already prevents the vast majority of potential backtracking.
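A minimal sketch of why the rewritten fragment leaves little room for backtracking. Because `\w` never matches `.`, each repetition of `(\w+\.)` can end only at the next dot, so from a given starting position there is exactly one way to split a dotted name. The sample input `maps.mp.util.func` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Rewritten fragment from the second GSC pattern, as quoted in the description.
var dotted = regexp.MustCompile(`(\w+\.)+`)

func main() {
	// Each repetition consumes one identifier plus its dot, so the match
	// covers the dotted prefix of a qualified name.
	fmt.Println(dotted.FindString("maps.mp.util.func")) // maps.mp.util.
}
```

The trailing identifier (`func`) is not part of this match on its own. How the rest of the full heuristic handles it isn't shown in this description.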