[mscorlib] Improve perf for many Char methods #4881

jamesqo · 2016-05-10T01:57:26Z

Right now many methods in Char are implemented like this (for example):

public static bool IsDigit(char c) => c >= '0' && c <= '9';

This isn't the most efficient way to implement it, though. Since these methods are often used in a loop, I've taken advantage of the fact that subtracting the lower bound and casting the result to uint makes any value below the range wrap around to a very large number, so a single comparison covers both bounds and avoids an additional branch. Here is the above method rewritten using this tactic:

// Similar to what @jkotas suggested in dotnet/corefx#7546
public static bool IsDigit(char c) => (uint)(c - '0') <= (uint)('9' - '0');
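To make the wraparound argument concrete, here is a minimal sketch that compares the two forms over every possible char value. The class and method names (RangeCheckSketch, IsDigitTwoCompares, IsDigitOneCompare, FormsAgreeForAllChars) are hypothetical and exist only for illustration. It covers just the ASCII '0'..'9' check shown above, not the full Char.IsDigit behaviour, and it assumes the default unchecked arithmetic context.

```csharp
using System;

// Hypothetical illustration class; not part of mscorlib.
public static class RangeCheckSketch
{
    // Straightforward form: two comparisons.
    public static bool IsDigitTwoCompares(char c) => c >= '0' && c <= '9';

    // Single-comparison form: c - '0' is an int. For chars below '0' it is
    // negative, and the cast to uint (unchecked by default) turns it into a
    // very large value that fails the <= 9 test.
    public static bool IsDigitOneCompare(char c) => (uint)(c - '0') <= (uint)('9' - '0');

    // Returns true when both forms agree for all 65,536 char values.
    public static bool FormsAgreeForAllChars()
    {
        for (int i = char.MinValue; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (IsDigitTwoCompares(c) != IsDigitOneCompare(c))
                return false;
        }
        return true;
    }
}
```

Calling RangeCheckSketch.FormsAgreeForAllChars() only checks that the two forms are equivalent; it says nothing about which one is faster, which depends on the code the JIT generates.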

I've changed a bunch of static methods in Char to take advantage of this fact and avoid an additional branch, along with a couple of other changes:

Used stackalloc, instead of creating a new char array on the heap, for ConvertFromUtf32.
Many overloads that accept a string and an index can simply forward to the char-based overload.
Added HIGH_SURROGATE_END and LOW_SURROGATE_START consts, and removed references to CharUnicodeInfo for getting those values.

Note: Some of the corefx tests are failing with my changes, but unfortunately I'm not sure why/for what chars as the error messages are not very helpful.

cc @JonHanna @jkotas @mikedn @hughbe

edit: Looks like the string-and-index overloads actually can't just forward to the char-based ones, that's likely why the tests are failing. Will fix in a moment.

jkotas · 2016-05-10T13:29:31Z

cc @AlexGhiondea @ellismg

hughbe · 2016-05-10T15:27:50Z

src/mscorlib/src/System/Char.cs




       /*================================= ConvertFromUtf32 ============================
        ** Convert an UTF32 value into a surrogate pair.
        ==============================================================================*/

-        public static String ConvertFromUtf32(int utf32)
+        public unsafe static String ConvertFromUtf32(int utf32)


This is a public method. Does adding unsafe affect the public signature?

It does not.

I would, however, like the unsafe modifier scoped to the part of the method that needs it (specifically around the stackalloc). We should strive to minimize the amount of unsafe regions and how much code is in an unsafe context.

@ellismg I'm going to move the stackalloc part of this PR into a new PR, since (I'm hoping) it won't introduce any merge conflicts and it doesn't really fit in with the rest of this PR. Will address your feedback there.
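For readers following the point about keeping unsafe regions small: with the pointer-based stackalloc from the diff, the usual way to do this is to wrap only the stackalloc lines in an unsafe { } block instead of marking the whole method unsafe. Later C# versions (7.2 and up, which did not exist when this PR was opened) also allow stackalloc to target a Span<char>, which needs no unsafe context at all. The sketch below uses that newer form. The class, method, and constant names are hypothetical, and it assumes a runtime that has the string(ReadOnlySpan<char>) constructor (.NET Core 2.1 or later).

```csharp
using System;

// Hypothetical illustration class; not the actual Char.cs code.
public static class SurrogateSketch
{
    private const int HighSurrogateStart = 0xD800;
    private const int LowSurrogateStart = 0xDC00;
    private const int Plane01Start = 0x10000;
    private const int MaxCodePoint = 0x10FFFF;

    // Builds the surrogate pair for a supplementary-plane code point
    // (0x10000..0x10FFFF) using a stack buffer and no unsafe code.
    public static string FromSupplementaryCodePoint(int utf32)
    {
        if (utf32 < Plane01Start || utf32 > MaxCodePoint)
            throw new ArgumentOutOfRangeException(nameof(utf32));

        // Offset into the supplementary planes: a 20-bit value.
        int offset = utf32 - Plane01Start;

        // Span-based stackalloc: the two chars live on the stack.
        Span<char> surrogate = stackalloc char[2];
        surrogate[0] = (char)((offset / 0x400) + HighSurrogateStart); // upper 10 bits
        surrogate[1] = (char)((offset % 0x400) + LowSurrogateStart);  // lower 10 bits
        return new string(surrogate);
    }
}
```

For any supplementary code point the result should match char.ConvertFromUtf32, for example for 0x1F600. Whether the stack buffer actually beats a small heap array is a separate question that needs measuring.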

hughbe · 2016-05-10T15:27:58Z

Good stuff! Assuming you've fixed the corefx test failures, this looks good, especially the ConvertFromUtf32 allocation stuff.

ellismg · 2016-05-10T18:27:04Z

Could you tease out the stack-allocation 8cba55f into its own commit? It would also be interesting to understand the impact of each of these changes on the relevant microbenchmarks.

ellismg · 2016-05-10T18:30:38Z

Added HIGH_SURROGATE_END and LOW_SURROGATE_START consts, and removed references to CharUnicodeInfo for getting those values.

If we are going to clean up here, I would rather just always get the values from CharUnicodeInfo so the constants are defined in one place. They are const in CharUnicodeInfo so there should not be a codegen impact either way.

hughbe · 2016-05-10T18:37:11Z

src/mscorlib/src/System/Char.cs

+            char* surrogate = stackalloc char[2];
+            surrogate[0] = (char)((utf32 / 0x400) + HIGH_SURROGATE_START);
+            surrogate[1] = (char)((utf32 % 0x400) + LOW_SURROGATE_START);
+            return new string(surrogate, 0, 2);


I'm not sure what may be more performant, stackallocing a char* and constructing a string, or the following that uses an internal method I've seen in System.Globalization, StringBuilder and String itself:

string result = string.FastAllocateString(2);
fixed (char* pResult = result)
{
    pResult[0] = (char)((utf32 / 0x400) + HIGH_SURROGATE_START);
    pResult[1] = (char)((utf32 % 0x400) + LOW_SURROGATE_START);
}
return result;

@ellismg let me know what you think

@hughbe Maybe, but 1) that's more of an implementation detail (it wouldn't work if you took the code and copy/pasted it outside of mscorlib), and 2) it requires pinning/unpinning the string while we write characters to it, so the benefit from that would be questionable. For now, I'm sticking to stackalloc.

jamesqo · 2016-05-10T23:18:46Z

@ellismg Regarding the constant values, maybe we should alias the consts in this file to the ones in CharUnicodeInfo, or add a using static CharUnicodeInfo at the top? I'd like to avoid code duplication as well, but personally I find CharUnicodeInfo.HIGH_SURROGATE_START (for example) a little bit verbose.

jamesqo · 2016-05-10T23:25:37Z

src/mscorlib/src/System/Char.cs

@@ -203,7 +203,7 @@ public bool Equals(Char obj)
      [Pure]
      public static bool IsDigit(char c) {
          if (IsLatin1(c)) {
-            return (c >= '0' && c <= '9');
+            return (uint)(c - '0') <= (uint)('9' - '0');


Self-note: I'm considering moving all of these boolean returns into a private helper method like the following:

private bool IsBetweenInclusive(char lowerBound, char upperBound)
{
    return (uint)(m_value - lowerBound) <= (uint)(upperBound - lowerBound);
}

// Usage:
c.IsBetweenInclusive('0', '9');

This way we avoid repeating the value for the lower bound, and the call sites are less verbose and less error-prone.

JonHanna · 2016-05-11T09:25:37Z

Sorry, I was tagged to take a look at this, but I'm a bit busy and not going to be able to pay much attention to .NET Core things for the next couple of days.
I'll just add two thoughts: The first is that the reduction in branching of the first change described could have very different effects with different degrees of inlining, so if it seems to just break even it might be worth doing anyway. It should still be tested though, and an honest test would try several variations of whether the same branch was taken all the time, alternating, random, etc. and then look at them all.
The second is that I think the stackalloc should definitely be examined separately from the rest. Stackalloc optimisations can sometimes be very disappointing pessimisations, so one should be on guard for that.
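As a rough sketch of what such input variations might look like, the helper below builds char arrays whose digit/non-digit pattern is constant, alternating, or pseudo-random, so that the same loop can be timed against each. All names here (BranchPatternInputs, Constant, Alternating, RandomOrder, CountDigits) are hypothetical, and the choice of '5' and 'x' as the two representative chars is arbitrary.

```csharp
using System;

// Hypothetical helper for building benchmark inputs; names are illustrative.
public static class BranchPatternInputs
{
    // Every element takes the same branch (all digits).
    public static char[] Constant(int length)
    {
        var data = new char[length];
        for (int i = 0; i < length; i++) data[i] = '5';
        return data;
    }

    // Digit and non-digit strictly alternate.
    public static char[] Alternating(int length)
    {
        var data = new char[length];
        for (int i = 0; i < length; i++) data[i] = (i % 2 == 0) ? '5' : 'x';
        return data;
    }

    // Digit and non-digit in pseudo-random order; a fixed seed keeps runs repeatable.
    public static char[] RandomOrder(int length, int seed)
    {
        var rng = new System.Random(seed);
        var data = new char[length];
        for (int i = 0; i < length; i++) data[i] = (rng.Next(2) == 0) ? '5' : 'x';
        return data;
    }

    // The loop under test. Returning the count means the IsDigit results are used.
    public static int CountDigits(char[] data)
    {
        int count = 0;
        foreach (char c in data)
        {
            if (char.IsDigit(c)) count++;
        }
        return count;
    }
}
```

Timing CountDigits separately over each of the three arrays, with and without the change, would show whether the outcome depends on how predictable the branch is.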

jamesqo · 2016-05-14T17:49:10Z

Alright, so I finally got around to making perf tests for this PR. Here are the results:

Old
New
Source code (warning: will use ~95% CPU for a good while)

Since I used Parallel.ForEach to calculate the results (I did 1 billion iterations), the results for each char in the old/new files may not be in the same order.

Notes:

IsUpper and IsLower seem to be consistently faster for ASCII chars
IsDigit and IsNumber have regressions of ~300%, ~25% respectively across the board
The numbers of IsSurrogate, IsHighSurrogate and IsLowSurrogate have mostly stayed the same

Would be much appreciated if someone could validate these numbers for me. 😄

mikedn · 2016-05-14T20:02:21Z

Would be much appreciated if someone could validate these numbers for me

The benchmark code should be adjusted so that the result of char.IsX calls is used. As is now the JIT may eliminate code and then you get skewed results.

Self-note: I'm considering moving all of these boolean returns into a private helper method like the following: private bool IsBetweenInclusive(char lowerBound, char upperBound)

IsBetween generates worse code even if it is inlined.

jamesqo · 2016-05-14T20:08:04Z

@mikedn

The benchmark code should be adjusted so that the result of char.IsX calls is used. As is now the JIT may eliminate code and then you get skewed results.

Good point, I had just realized that. I'm going to alter my test scheme to do something like the following:

byte unused = 3;

for (int i = 0; i < Outer; i++)
{
    var watch = Stopwatch.StartNew();
    for (int j = 0; j < Inner; j++)
    {
        if (char.IsX(c)) unused++;
    }
    watch.Stop();
    Console.WriteLine(watch.Elapsed);
}

// At the end...
GC.KeepAlive(unused);

This way, if I understand correctly, the JIT will be forced to generate code for the method as it's being used in a branch.

IsBetween generates worse code even if it is inlined.

Wait, really? Why so? (Also I've refactored it into a new static method named IsIntBetween that takes 3 ints, not sure if this helps or not.)

mikedn · 2016-05-14T20:55:54Z

Good point, I had just realized that. I'm going to alter my test scheme to do something like the following:

Yeah, that should work.

Wait, really? Why so? (Also I've refactored it into a new static method named IsIntBetween that takes 3 ints, not sure if this helps or not.)

The code that the JIT generates for this kind of method isn't very good, see #914

ellismg · 2016-05-15T07:34:14Z

maybe we should alias the consts in this file to the ones in CharUnicodeInfo, or add a using static CharUnicodeInfo at the top? I'd like to avoid code duplication as well, but personally I find CharUnicodeInfo.HIGH_SURROGATE_START (for example) a little bit verbose.

My preference would be to just alias them instead of using static.

jamesqo · 2016-05-15T15:21:39Z

Ok guys, so I finally got around to making another round of perf tests for this change. Here is the source code, the old results, and the new results.

Since the files are quite large / tedious to go through manually, I wrote a script to analyze the test results (you can view the output here).

Notes:

Most of the timings have improved with this change (433 improved vs 127 regressed)
Most of the regressions seem to come from IsDigit (34)
There were zero IsWhiteSpace, IsSymbol or IsPunctuation cases that regressed
Only two IsControl cases have regressed
All of the IsLetter benchmarks that regressed come from ASCII characters
IsUpper only regresses for non-ASCII characters, IsLower for the most part as well (although it has a few regressions for ASCII)
There were only 5 regressions for IsNumber, and they all came from ASCII

edit: Ok, I've removed all of the ASCII-related changes, and only kept the ones affecting switch statements like IsSymbol, IsWhiteSpace, or IsPunctuation. Here are my final results: https://gist.github.com/anonymous/c5550fbe75bf36a13b94b7859ea127fa

I think this is finally ready to be merged. 😄

jamesqo · 2016-08-20T02:06:42Z

Closing this for now, I have a better one coming up in the future...

dnfclas added the cla-already-signed label May 10, 2016

jamesqo force-pushed the char-branching branch from d73b30f to 8cba55f Compare May 10, 2016 02:08

hughbe reviewed May 10, 2016

jamesqo reviewed May 10, 2016

jamesqo force-pushed the char-branching branch 2 times, most recently from 61f3444 to d44f7ab Compare May 12, 2016 05:52

jamesqo mentioned this pull request May 14, 2016

Use stackalloc in WebUtility + avoid some branching dotnet/corefx#8535

Closed

jamesqo changed the title ~~[mscorlib] Improve perf for many Char methods~~ [wip] [mscorlib] Improve perf for many Char methods May 14, 2016

jamesqo changed the title ~~[wip] [mscorlib] Improve perf for many Char methods~~ [mscorlib] Improve perf for many Char methods May 14, 2016

jamesqo changed the title ~~[mscorlib] Improve perf for many Char methods~~ [wip] [mscorlib] Improve perf for many Char methods May 15, 2016

jamesqo force-pushed the char-branching branch from a36ad45 to 3835490 Compare May 15, 2016 16:02

Reduce branching in many Char methods

e27a9d6

jamesqo force-pushed the char-branching branch from 3835490 to e27a9d6 Compare May 15, 2016 16:06

jamesqo changed the title ~~[wip] [mscorlib] Improve perf for many Char methods~~ [mscorlib] Improve perf for many Char methods May 15, 2016

jamesqo mentioned this pull request May 21, 2016

Reduce the IL size of some methods in String #5148

Closed

jamesqo mentioned this pull request Jul 6, 2016

Avoid heap allocating in char.ConvertFromUtf32 #6141

Merged

jamesqo closed this Aug 20, 2016

jamesqo deleted the char-branching branch August 20, 2016 02:06

jkotas mentioned this pull request Jun 21, 2017

Optimizing Int32 Primitive Parsers and clean up dotnet/corefxlab#1616

Merged
