This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

Alois-xx
Contributor

This fix should save roughly 40% in the common case where a compiled Regex emits many Char.ToLower calls, each of which accesses CultureInfo.CurrentCulture.

…xOptions.IgnoreCase or RegexOptions.CultureInvariant.

This saves over 40% in these cases.
……xOptions.IgnoreCase or RegexOptions.CultureInvariant.

This saves over 40% in these cases.
…ith RegexOptions.IgnoreCase or RegexOptions.CultureInvariant."

This reverts commit e151e93.

private void CallToLower()
{
Ldloc(_cultureV);
Member

What is the lifetime of this? e.g. does this cache end up spanning multiple regex calls and user code such that user code could change the current culture and end up with different behavior than before?

Contributor Author

_cultureV is a local variable. The lifetime of the cache is therefore local to the FindFirstChar and Go methods of the compiled regular expression.
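
The method-local caching described here can be illustrated in plain C#. This is only a sketch of the idea, not the IL that RegexCompiler emits: `CountIgnoreCase` is a hypothetical helper, and the only assumption is that the culture is read once per call and then passed to the `char.ToLower(char, CultureInfo)` overload, instead of letting `char.ToLower(char)` look up `CultureInfo.CurrentCulture` for every character.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of System.Text.RegularExpressions):
// counts the characters of `text` that equal `lowerTarget` after lowercasing.
static int CountIgnoreCase(string text, char lowerTarget)
{
    // Read the ambient culture once per call. Because the value lives in a
    // local, a culture change made by user code is picked up on the next call.
    CultureInfo culture = CultureInfo.CurrentCulture;

    int count = 0;
    foreach (char c in text)
    {
        // The two-argument overload uses the cached culture and avoids
        // re-reading CultureInfo.CurrentCulture for each character.
        if (char.ToLower(c, culture) == lowerTarget)
        {
            count++;
        }
    }
    return count;
}

Console.WriteLine(CountIgnoreCase("AbcA", 'a'));
```

Since the cache never outlives a single call, user code that changes the current culture between two calls still sees the new culture on the second one.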

Member

Ah, I see. Thanks.

private void CallToLower()
{
Ldloc(_cultureV);
Call(s_chartolowerM);
Member

@ViktorHofer, am I misremembering, or did you do some ToLower-related optimizations/changes when you were cleaning up the regex code recently? If yes, should we instead just port the appropriate changes to the compiled version? If no, is there a similar optimization to be done on the interpreted side of things?

Member

Right, but only minor allocation optimizations. I wouldn't do that as part of this PR; instead, we should revisit the compiled code paths later to make sure we don't diverge, and bring the optimizations over then.

@stephentoub
Member

Do we have regex tests (interpreted and compiled) that rely on the current culture being something specific? If not, it'd be good to add as part of this.

@ViktorHofer
Member

> Do we have regex tests (interpreted and compiled) that rely on the current culture being something specific? If not, it'd be good to add as part of this.

Only very few for Unicode characters. We should definitely add a bunch.

Member

@ViktorHofer ViktorHofer left a comment

Can you please post performance numbers with BenchmarkDotNet baselined on master including allocation?

@Alois-xx
Contributor Author

Alois-xx commented Oct 19, 2018

@ViktorHofer
It was a little tricky to get BenchmarkDotNet to use the locally compiled DLLs rather than the ones from the framework SDK. Here are the results with the patch applied:

                        Method | N |         Mean |        Error |       StdDev | Allocated |
------------------------------ |-- |-------------:|-------------:|-------------:|----------:|
                      Original | 1 | 438,212.4 ns | 4,647.930 ns | 4,347.676 ns |       0 B |
                 CaseSensitive | 1 | 227,563.0 ns | 5,142.693 ns | 5,281.167 ns |       0 B |
 CaseSensitiveCultureInvariant | 1 | 215,541.1 ns | 1,187.442 ns | 1,110.734 ns |       0 B |
          SimplifiedExpression | 1 |   1,525.5 ns |    11.969 ns |    11.195 ns |       0 B |
      Case_Sensitive_Substring | 1 |     691.5 ns |     4.897 ns |     4.581 ns |       0 B |

And here is the baseline, where I reverted my commit so that only the impact of my change is measured:

                        Method | N |           Mean |         Error |        StdDev | Allocated |
------------------------------ |-- |---------------:|--------------:|--------------:|----------:|
                      Original | 1 | 1,055,537.2 ns | 31,271.169 ns | 36,011.922 ns |       0 B |
                 CaseSensitive | 1 |   216,579.6 ns |  1,827.153 ns |  1,709.119 ns |       0 B |
 CaseSensitiveCultureInvariant | 1 |   217,116.7 ns |  2,212.376 ns |  1,961.213 ns |       0 B |
          SimplifiedExpression | 1 |     1,523.0 ns |     13.191 ns |     12.339 ns |       0 B |
      Case_Sensitive_Substring | 1 |       671.3 ns |      8.552 ns |      8.000 ns |       0 B |

In case you did not see it, here is the ETW chart:
[image: ETW chart]

Actually, the BenchmarkDotNet numbers are even better: the Original case drops from about 1,056 µs to about 438 µs, a reduction of roughly 58%. From the numbers it looks like the slow code-gen path is only triggered when RegexOptions.IgnoreCase is used, but that is certainly a widely used option.

@Alois-xx
Contributor Author

@stephentoub: I could think of some Turkish I tests which behave differently in different locales. That would be a good test to verify that the right culture was used.

@ViktorHofer
Member

ViktorHofer commented Oct 22, 2018

That sounds good and should suffice. As mentioned before, we currently don't have a comprehensive set of inputs with different cultures. We should definitely fix that.

@Alois-xx
Contributor Author

@ViktorHofer: Added some locale tests which really show different behavior under the Turkish locale.
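
A Turkish-I check along these lines can be sketched with the public Regex API. This is not the test code added in the PR: `IsMatchUnderCulture` is a hypothetical helper, and the pattern and input are chosen only for illustration. Under a Turkish culture, uppercase `I` lowercases to dotless `ı`, so a case-insensitive pattern `I` is expected to treat `i` differently than it does under the invariant culture. The exact results depend on the runtime version and the globalization data available, so the sketch prints them rather than asserting them.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: constructs the regex and runs the match while `culture`
// is the current culture, then restores the previous culture.
static bool IsMatchUnderCulture(string pattern, string input, CultureInfo culture, RegexOptions options)
{
    CultureInfo previous = CultureInfo.CurrentCulture;
    try
    {
        CultureInfo.CurrentCulture = culture;
        return new Regex(pattern, options).IsMatch(input);
    }
    finally
    {
        CultureInfo.CurrentCulture = previous;
    }
}

try
{
    CultureInfo turkish = new CultureInfo("tr-TR");

    // Exercise both the interpreted and the compiled engine.
    foreach (RegexOptions engine in new[] { RegexOptions.None, RegexOptions.Compiled })
    {
        bool invariantResult = IsMatchUnderCulture("I", "i", CultureInfo.InvariantCulture,
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | engine);
        bool turkishResult = IsMatchUnderCulture("I", "i", turkish,
            RegexOptions.IgnoreCase | engine);

        Console.WriteLine($"{engine}: invariant={invariantResult}, turkish={turkishResult}");
    }
}
catch (CultureNotFoundException)
{
    // Happens when the runtime has no tr-TR data, e.g. in invariant globalization mode.
    Console.WriteLine("tr-TR culture data is not available on this system.");
}
```

Running both engines side by side is the point of the comparison: the compiled path should agree with the interpreted path for every culture.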

@Alois-xx
Contributor Author

@dotnet-bot test this please

string input = "Iıİi";

var cultInvariantRegex = Create(input, CultureInfo.InvariantCulture, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
var turkishRegex = Create(input, turkish, RegexOptions.IgnoreCase);
Member

Nit: please change var to be the actual type here... we only use var when the type is obvious from the right-hand side, namely a ctor or an explicit cast.

@Alois-xx
Contributor Author

@dotnet-bot test this please

@stephentoub
Member

Thanks!

@stephentoub stephentoub merged commit 58e2b4c into dotnet:master Oct 25, 2018
@danmoseley
Member

Nice win here @Alois-xx many thanks! Any interest in more work on regex?

@Alois-xx
Contributor Author

@danmosemsft: If I find new issues or ideas I will definitely file an issue. But currently my free time is quite limited.

@Alois-xx Alois-xx deleted the Issue_32764 branch October 29, 2018 11:28
@karelz karelz added this to the 3.0 milestone Nov 15, 2018
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Fix for Regex performance issue when compiled Regex is used with RegexOptions.IgnoreCase or RegexOptions.CultureInvariant.
This saves over 40% in these cases.

* Fix for Regex performance issue when compiled Regex is used with Rege…xOptions.IgnoreCase or RegexOptions.CultureInvariant.

This saves over 40% in these cases.

* Revert "Fix for Regex performance issue when compiled Regex is used with RegexOptions.IgnoreCase or RegexOptions.CultureInvariant."

This reverts commit dotnet/corefx@e151e93.

* Added TurkishI tests which check compiled and interpreted regular expressions.

* Removed var of test


Commit migrated from dotnet/corefx@58e2b4c