Thanks to visit codestin.com
Credit goes to github.com

Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

stephentoub
Copy link
Member

The Regex class maintains a cache of byte codes, which the Regex ctor indexes into using a key. It uses this seemingly innocuous line to create that key:

String key = ((int)options).ToString(NumberFormatInfo.InvariantInfo) + ":" + cultureKey + ":" + pattern;

This, however, has the unfortunate effect of allocating a string for the options, a string array for the five strings to be passed to the String.Concat call generated by the compiler, another string array allocation inside of Concat, and then the resulting string for the whole operation. The cost of those allocations is causing a non-trivial slowdown for repeated Regex.IsMatch calls for simple regular expressions, such as for a phone number (e.g. from the MSDN docs "^\d{3}-\d{3}-\d{4}$").

This commit adds a new struct key type that just stores the constitutent options, cultureKey, and pattern, rather than creating a string to store them. That key is then what's stored in each entry in the cache.

For repeated Regex.IsMatch calls for basic regular expressions like the phone number one previously mentioned, on my machine this improves throughput by ~35%, in large part due to an ~80% reduction in number of allocations, and (for this particular test case) an ~70% reduction in number of bytes allocated (it depends primarily on the length of the pattern and the length of the culture name).

The Regex class maintains a cache of byte codes, which the Regex ctor indexes into using a key.  It uses this seemingly innocuous line to create that key:

String key = ((int)options).ToString(NumberFormatInfo.InvariantInfo) + ":" + cultureKey + ":" + pattern;

This, however, has the unfortunate effect of allocating a string for the options, a string array for the five strings to be passed to the String.Concat call generated by the compiler, another string array allocation inside of Concat, and then the resulting string for the whole operation.  The cost of those allocations is causing a non-trivial slowdown for repeated Regex.IsMatch calls for simple regular expressions, such as for a phone number (e.g. "^\\d{3}-\\d{3}-\\d{4}$").

This commit adds a new struct key type that just stores the constitutent options, cultureKey, and pattern, rather than creating a string to store them.  That key is then what's stored in each entry in the cache.

For repeated Regex.IsMatch calls for basic regular expressions like the phone number one previously mentioned, on my machine this improves throughput by ~35%, in large part due to ~80% reduction in number of allocations, and (for this particular test case) an ~70% reduction in number of bytes allocated (it depends primarily on the length of the pattern and the length of the culture name).
@ellismg
Copy link
Contributor

ellismg commented Dec 12, 2014

Looks good.

@nguerrera
Copy link
Contributor

👍

stephentoub added a commit that referenced this pull request Dec 12, 2014
Improve performance of Regex ctor and IsMatch
@stephentoub stephentoub merged commit a3a485e into dotnet:master Dec 12, 2014
@stephentoub stephentoub deleted the regex_ctor_perf branch December 12, 2014 22:26
@karelz karelz modified the milestone: 1.0.0-rtm Dec 3, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants