This repository was archived by the owner on Jan 23, 2023. It is now read-only.
Improve performance of Regex ctor and IsMatch #231
Merged
The Regex class maintains a cache of byte codes, which the Regex ctor indexes into using a key. It builds that key with a seemingly innocuous line that concatenates the options, the culture key, and the pattern into a single string.
This, however, has the unfortunate effect of allocating a string for the options, a string array for the five strings passed to the String.Concat call generated by the compiler, another string array inside Concat, and then the resulting string for the whole operation. The cost of those allocations causes a noticeable slowdown for repeated Regex.IsMatch calls with simple regular expressions, such as the phone-number pattern "^\d{3}-\d{3}-\d{4}$" from the MSDN docs.
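A minimal sketch of the kind of key construction described above, assuming the key is the integer options value, a ':' separator, the culture key, another ':', and the pattern (five operands in total). This is not the original line; the `StringKeyDemo` class and the ':' separator are illustrative assumptions.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical illustration of a string-based cache key; not the original source.
public static class StringKeyDemo
{
    public static string MakeKey(RegexOptions options, string cultureKey, string pattern)
    {
        // Allocates a string for the options value, then concatenates five
        // operands. Depending on the compiler and target framework, five
        // operands may be handed to String.Concat through a string[].
        // Because the options string and ':' are never empty, the result is
        // always a freshly allocated string.
        return ((int)options).ToString(NumberFormatInfo.InvariantInfo)
            + ":" + cultureKey + ":" + pattern;
    }
}
```

Every lookup pays for these allocations even when the entry is already cached, because the key has to be built before the cache can be searched.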
This commit adds a new struct key type that just stores the constituent options, cultureKey, and pattern, rather than creating a string to store them. That key is then what's stored in each entry in the cache.
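A minimal sketch of a struct key along these lines, assuming ordinal string comparison and a simple multiply-and-add hash combination. `RegexCacheKey` and `StructKeyCache` are hypothetical names, the cache is a plain, non-thread-safe `Dictionary`, and using `CultureInfo.CurrentCulture.Name` as the culture key is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical value-type key holding the three components directly.
public readonly struct RegexCacheKey : IEquatable<RegexCacheKey>
{
    public readonly RegexOptions Options;
    public readonly string CultureKey;
    public readonly string Pattern;

    public RegexCacheKey(RegexOptions options, string cultureKey, string pattern)
    {
        Options = options;
        CultureKey = cultureKey;
        Pattern = pattern;
    }

    // Field-by-field comparison; no combined string is ever built.
    public bool Equals(RegexCacheKey other) =>
        Options == other.Options &&
        string.Equals(CultureKey, other.CultureKey, StringComparison.Ordinal) &&
        string.Equals(Pattern, other.Pattern, StringComparison.Ordinal);

    public override bool Equals(object obj) => obj is RegexCacheKey other && Equals(other);

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = (int)Options;
            hash = hash * 31 + (CultureKey?.GetHashCode() ?? 0);
            hash = hash * 31 + (Pattern?.GetHashCode() ?? 0);
            return hash;
        }
    }
}

// Hypothetical cache keyed by the struct (not thread-safe; illustration only).
public static class StructKeyCache
{
    private static readonly Dictionary<RegexCacheKey, Regex> s_cache =
        new Dictionary<RegexCacheKey, Regex>();

    public static Regex GetOrCreate(string pattern, RegexOptions options)
    {
        // Building the key only copies references; nothing is allocated on the heap.
        var key = new RegexCacheKey(options, CultureInfo.CurrentCulture.Name, pattern);
        if (!s_cache.TryGetValue(key, out Regex regex))
        {
            regex = new Regex(pattern, options);
            s_cache.Add(key, regex);
        }
        return regex;
    }
}
```

Because the key implements `IEquatable<RegexCacheKey>`, `Dictionary` compares keys through `EqualityComparer<T>.Default` without boxing, so a cache hit does not allocate to build or compare the key.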
For repeated Regex.IsMatch calls with basic regular expressions like the phone-number one mentioned above, on my machine this improves throughput by ~35%, in large part due to an ~80% reduction in the number of allocations and (for this particular test case) a ~70% reduction in the number of bytes allocated (the byte savings depend primarily on the length of the pattern and the length of the culture name).
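One way to take this kind of measurement is sketched below, assuming a runtime that exposes `GC.GetAllocatedBytesForCurrentThread` (for example, current .NET). `IsMatchBenchmark` is a hypothetical helper and this is a rough harness rather than a rigorous benchmark; since today's Regex cache differs from the one this change targeted, it is not expected to reproduce the percentages above.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical measurement helper (illustration only).
public static class IsMatchBenchmark
{
    // Returns the average bytes allocated per call and the total elapsed
    // milliseconds for repeated static Regex.IsMatch calls.
    public static (double BytesPerCall, long ElapsedMs) Measure(string input, string pattern, int iterations)
    {
        // Warm up so first-call costs (parsing, populating the cache) are excluded.
        Regex.IsMatch(input, pattern);

        long before = GC.GetAllocatedBytesForCurrentThread();
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            Regex.IsMatch(input, pattern);
        }
        sw.Stop();
        long after = GC.GetAllocatedBytesForCurrentThread();

        return ((after - before) / (double)iterations, sw.ElapsedMilliseconds);
    }
}
```

Running the same harness before and after a change, on the same machine and runtime, gives a like-for-like comparison of allocations per call and elapsed time.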