Add vectorization to improve CRC32 performance #83321

brantburnett · 2023-03-13T02:31:54Z

This significantly improves performance for System.IO.Hashing.Crc32 for cases where the source span is 64 bytes or larger on Intel x86/x64 and modern ARM architectures. It also improves the performance on ARM in cases where vectorization is not an option, such as systems without the necessary intrinsic or for short source spans.

The vectorization change only applies to .NET 7 and later targets of System.IO.Hashing because it uses some Vector128 APIs added in .NET 7. The scalar processing ARM changes also apply to .NET 6 and later.

The vectorization algorithm is a C# implementation of the algorithm put forth in the Intel paper "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction" in December 2009. It is a modernization of the implementation found in ImageSharp offered here: #40244 (comment).
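For orientation, the checksum being accelerated can be sketched as a bit-at-a-time reference. This is an illustrative baseline only, not the table-driven code the PR replaces and not the folding algorithm from the paper; the class name `Crc32Reference` is hypothetical.

```csharp
using System;
using System.Text;

public static class Crc32Reference
{
    // Reflected form of the CRC-32 polynomial 0x04C11DB7.
    private const uint ReflectedPolynomial = 0xEDB88320u;

    public static uint Compute(ReadOnlySpan<byte> source)
    {
        uint crc = 0xFFFFFFFFu; // standard initial value
        foreach (byte b in source)
        {
            crc ^= b;
            for (int bit = 0; bit < 8; bit++)
            {
                // Shift right one bit; if the bit shifted out was set, XOR in the polynomial.
                crc = (crc & 1) != 0 ? (crc >> 1) ^ ReflectedPolynomial : crc >> 1;
            }
        }
        return ~crc; // final XOR
    }
}
```

`Crc32Reference.Compute(Encoding.ASCII.GetBytes("123456789"))` yields `0xCBF43926`, the standard CRC-32 check value. The vectorized path computes the same function but folds 64 bytes per iteration with carry-less multiplication instead of visiting each bit.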

BenchmarkDotNet=v0.13.2.2052-nightly, OS=Windows 11 (10.0.22621.1413)
Intel Core i7-10850H CPU 2.70GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=8.0.100-preview.1.23115.2
[Host] : .NET 8.0.0 (8.0.23.11008), X64 RyuJIT AVX2
Job-UHMIUW : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
Job-IZYDKJ : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1

Method	Job	BufferSize	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD
Append	Current	16	31.41 ns	0.637 ns	0.734 ns	31.35 ns	30.23 ns	32.36 ns	1.00	0.00
Append	Intrinsics	16	30.97 ns	0.778 ns	0.896 ns	31.14 ns	29.59 ns	32.12 ns	0.99	0.04

Append	Current	128	250.96 ns	4.743 ns	5.075 ns	251.44 ns	242.05 ns	257.93 ns	1.00	0.00
Append	Intrinsics	128	19.05 ns	0.297 ns	0.263 ns	18.99 ns	18.55 ns	19.57 ns	0.08	0.00

Append	Current	1024	1,990.18 ns	31.113 ns	29.104 ns	1,994.39 ns	1,948.98 ns	2,030.06 ns	1.00	0.00
Append	Intrinsics	1024	58.31 ns	1.452 ns	1.672 ns	58.49 ns	55.71 ns	60.18 ns	0.03	0.00

BenchmarkDotNet=v0.13.2.2052-nightly, OS=ubuntu 22.04
AWS m6g.xlarge Graviton2
.NET SDK=8.0.100-preview.1.23115.2
[Host] : .NET 8.0.0 (8.0.23.11008), Arm64 RyuJIT AdvSIMD
Job-LINWAX : .NET 8.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
Job-SOJHQU : .NET 8.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1

Method	Job	BufferSize	Mean	Error	StdDev	Median	Min	Max	Ratio
Append	Current	16	44.797 ns	0.0125 ns	0.0117 ns	44.796 ns	44.777 ns	44.815 ns	1.00
Append	Intrinsics	16	8.521 ns	0.0502 ns	0.0445 ns	8.525 ns	8.444 ns	8.590 ns	0.19

Append	Current	128	363.017 ns	0.0252 ns	0.0223 ns	363.021 ns	362.977 ns	363.048 ns	1.00
Append	Intrinsics	128	29.491 ns	0.0412 ns	0.0385 ns	29.498 ns	29.414 ns	29.543 ns	0.08

Append	Current	1024	2,887.236 ns	0.3149 ns	0.2946 ns	2,887.090 ns	2,886.898 ns	2,887.818 ns	1.00
Append	Intrinsics	1024	92.073 ns	0.4069 ns	0.3807 ns	92.078 ns	91.529 ns	92.833 ns	0.03

ghost · 2023-03-13T02:32:16Z

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

This significantly improves performance for System.IO.Hashing.Crc32 for cases where the source span is 64 bytes or larger on Intel x86/x64 architectures.

The change only applies to .NET 7 and later targets of System.IO.Hashing because it uses some Vector128 APIs added in .NET 7.

This is a C# implementation of the algorithm put forth in the Intel paper "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction" in December 2009. It is a modernization of the implementation found in ImageSharp offered here: #40244 (comment).

BenchmarkDotNet=v0.13.2.2052-nightly, OS=Windows 11 (10.0.22000.1641/21H2)
Intel Core i7-10850H CPU 2.70GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=8.0.100-preview.1.23115.2
[Host] : .NET 8.0.0 (8.0.23.11008), X64 RyuJIT AVX2
Job-PBKTIR : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2
Job-TVEBLV : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1

Method	Job	BufferSize	Mean	Error	StdDev	Median	Min	Max	Ratio
Append	Current	128	228.20 ns	2.366 ns	2.213 ns	228.07 ns	225.54 ns	232.75 ns	1.00
Append	Intrinsics	128	17.62 ns	0.096 ns	0.075 ns	17.59 ns	17.56 ns	17.80 ns	0.08

Append	Current	1024	1,988.07 ns	47.120 ns	54.264 ns	1,990.18 ns	1,892.83 ns	2,089.15 ns	1.00
Append	Intrinsics	1024	64.71 ns	0.794 ns	0.704 ns	64.67 ns	63.13 ns	65.96 ns	0.03

Author:	brantburnett
Assignees:	-
Labels:	`area-System.IO`
Milestone:	-

stephentoub

Nice improvements. Thanks.

stephentoub · 2023-03-15T01:52:42Z

src/libraries/System.IO.Hashing/src/System/IO/Hashing/Crc32.X86.cs

+{
+    public partial class Crc32
+    {
+        private const int X86BlockSize = 64;


Nit: I'm generally a fan of putting values like this into named consts, but in this particular case I think it actually muddies the water. There are a bunch of other related const values throughout the code, e.g. 16, 32, 48, that don't have or need such a name, but then when 64 is used there is a name, which to me at least makes it harder to understand the relationship and code. I'd just inline this number into where it's used, and put a comment on the very first use in the up-front guard check that explains where the number comes from.

You could also avoid the numbers and named consts and use things like Vector128<byte>.Count and Vector128<byte>.Count * 4 throughout the code.

After considering it, I agree. The constant was a holdover from a previous iteration where I was checking the length before calling the method, which made it harder to intuit the value. Since you requested that the length check be moved to the Update method, I've left the constant for that purpose only and renamed it appropriately. All the other sites use Vector128<byte>.Count.

Let me know if you still think the Update method should just use Vector128<byte>.Count * 4. I just thought it made things clearer when the logic is split between two files.

I believe I have this all resolved. Thanks.
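The suggestion to derive sizes from `Vector128<byte>.Count` rather than literals can be sketched as below. The `BlockPlan` type and `Split` method are hypothetical names for illustration and do not appear in the PR; the sketch only shows how the 64-byte guard and the 16-byte chunk size relate to the vector width.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class BlockPlan
{
    // Splits a buffer length into 4-vector blocks, single-vector chunks, and tail bytes.
    public static (int Blocks, int Chunks, int TailBytes) Split(int length)
    {
        int vectorSize = Vector128<byte>.Count; // 16 bytes per 128-bit vector
        int blockSize = vectorSize * 4;         // 64 bytes: four vectors folded per iteration

        // Up-front guard: inputs shorter than one full block stay on the scalar path.
        if (length < blockSize)
        {
            return (0, 0, length);
        }

        int rest = length % blockSize;
        return (length / blockSize, rest / vectorSize, rest % vectorSize);
    }
}
```

For a 100-byte input this gives one 64-byte block, two 16-byte chunks, and 4 tail bytes. Expressing both sizes in terms of the vector width keeps the relationship between 16 and 64 visible at each use.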

stephentoub · 2023-03-15T02:00:38Z

src/libraries/System.IO.Hashing/src/System/IO/Hashing/Crc32.X86.cs

+
+        // Processes the bytes in source in X86BlockSize chunks using x86 intrinsics, followed by processing 16
+        // byte chunks, and then processing remaining bytes individually. Requires support for Sse2 and Pclmulqdq intrinsics.
+        [MethodImpl(MethodImplOptions.AggressiveInlining)]


This results in ~800 bytes of asm. We don't want to inline it :)

stephentoub · 2023-03-15T02:25:04Z

src/libraries/System.IO.Hashing/src/System/IO/Hashing/Crc32.X86.cs

+        private const byte CarrylessMultiplyLeftLowerRightUpper = 0x10;
+
+        // Processes the bytes in source in X86BlockSize chunks using x86 intrinsics, followed by processing 16
+        // byte chunks, and then processing remaining bytes individually. Requires support for Sse2 and Pclmulqdq intrinsics.


It'd be nice to include the name of the paper this is based on.

src/libraries/System.IO.Hashing/src/System.IO.Hashing.csproj

ghost · 2023-03-15T03:27:13Z

Tagging subscribers to this area: @dotnet/area-system-security, @vcsjones
See info in area-owners.md if you want to be subscribed.

Author:	brantburnett
Assignees:	-
Labels:	`area-System.Security`, `community-contribution`
Milestone:	-

danmoseley · 2023-03-15T03:40:11Z

Given this references other work, do we need to add an entry in the third party notices file?

tannergooding · 2023-03-15T03:59:05Z

src/libraries/System.IO.Hashing/src/System/IO/Hashing/Crc32.X86.cs

                x5 = Pclmulqdq.CarrylessMultiply(x1, x0, CarrylessMultiplyLower);
                Vector128<ulong> x6 = Pclmulqdq.CarrylessMultiply(x2, x0, CarrylessMultiplyLower);


If you abstract this out into a helper, you can also support it on Arm64 by using the "polynomial multiply" instructions. Believe you specifically want PMULL and PMULL2 which correspond to AdvSimd.PolynomialMultiplyWideningLower/Upper

Polynomial Multiply Long. This instruction multiplies corresponding elements in the lower or upper half of the
vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination
SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
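A minimal sketch of such a helper follows, assuming the 64-bit polynomial multiply overloads are the ones exposed on `System.Runtime.Intrinsics.Arm.Aes` (the comment above names `AdvSimd`). The class and method names are hypothetical, and the code merged in the PR may differ.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
using ArmAes = System.Runtime.Intrinsics.Arm.Aes;

public static class CarrylessHelper
{
    public static bool IsSupported => Pclmulqdq.IsSupported || ArmAes.IsSupported;

    // Carry-less multiply of the lower 64-bit lanes of both operands.
    public static Vector128<ulong> MultiplyLower(Vector128<ulong> left, Vector128<ulong> right)
    {
        if (Pclmulqdq.IsSupported)
        {
            return Pclmulqdq.CarrylessMultiply(left, right, 0x00); // control 0x00: low x low
        }
        if (ArmAes.IsSupported)
        {
            return ArmAes.PolynomialMultiplyWideningLower(left.GetLower(), right.GetLower()); // PMULL
        }
        throw new PlatformNotSupportedException();
    }

    // Carry-less multiply of the upper 64-bit lanes of both operands.
    public static Vector128<ulong> MultiplyUpper(Vector128<ulong> left, Vector128<ulong> right)
    {
        if (Pclmulqdq.IsSupported)
        {
            return Pclmulqdq.CarrylessMultiply(left, right, 0x11); // control 0x11: high x high
        }
        if (ArmAes.IsSupported)
        {
            return ArmAes.PolynomialMultiplyWideningUpper(left, right); // PMULL2
        }
        throw new PlatformNotSupportedException();
    }
}
```

Callers would check `CarrylessHelper.IsSupported` once before entering the vectorized path. Because the `IsSupported` properties are treated as JIT-time constants, the untaken branches are expected to be eliminated from the generated code.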

I actually considered using PMULL and PMULL2 on ARM when I was writing this, but I didn't do so because:

1. Some of this implementation feels like it might be affected by byte order. Probably addressable, but it was a concern.
2. I couldn't find an equivalent of Sse2.ShiftRightLogical128BitLane on ARM, so we'd need to create a less performant equivalent (correct me if I'm just blind on this one).
3. ARM has a built-in CRC32 intrinsic that uses the same polynomial, so I assumed (potentially incorrectly) that it would be a better choice. So I was thinking I'd come back next and add an ARM-specific implementation.

I've done a bit more research on this, and it appears that the PMULL approach is more performant on modern ARM that supports it than the CRC32 intrinsic on larger buffers because it can operate on wider data sets. I see evidence of commits in the Linux kernel and Java based on this. It seems like they use the CRC32 intrinsic as a fallback when PMULL is not available.

Based on this, I'm considering refactoring this to support ARM PMULL as well. However, I'm a bit stumped on how to test it since my dev laptop is Intel. Are there any tricks documented to test using QEMU or something similar?

The easiest way is likely just to let our CI cover it if you don't have your own box.

I wouldn't expect any significant differences and you can filter on BitConverter.IsLittleEndian to ensure it works on BigEndian platforms if that's a concern.
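The selection order discussed in this thread (carry-less multiply for large buffers on little-endian hardware, the ARM CRC32 instruction as a fallback, and scalar code otherwise) might be sketched as follows. The enum and selector names are hypothetical, and the 64-byte threshold is taken from the PR description rather than from the merged code.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
using ArmAes = System.Runtime.Intrinsics.Arm.Aes;
using ArmCrc32 = System.Runtime.Intrinsics.Arm.Crc32;

public enum Crc32Path { Vectorized, ArmCrc32Intrinsic, ScalarTable }

public static class Crc32PathSelector
{
    public static Crc32Path Select(int length)
    {
        bool carrylessMultiply = Pclmulqdq.IsSupported || ArmAes.IsSupported;

        // The folding code loads bytes as 64-bit lanes, so it is restricted to little-endian here.
        if (BitConverter.IsLittleEndian && carrylessMultiply && length >= Vector128<byte>.Count * 4)
        {
            return Crc32Path.Vectorized;
        }

        // Short spans, or ARM hardware without PMULL, can still use the CRC32 instruction.
        if (ArmCrc32.IsSupported)
        {
            return Crc32Path.ArmCrc32Intrinsic;
        }

        return Crc32Path.ScalarTable;
    }
}
```

On an x64 machine this returns `ScalarTable` for any span shorter than 64 bytes, which is consistent with the unchanged 16-byte result in the Intel benchmark above.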

src/libraries/System.IO.Hashing/src/System/IO/Hashing/Crc32.cs

brantburnett · 2023-03-15T12:19:09Z

> Given this references other work, do we need to add an entry in the third party notices file?

Good question, I was hoping someone would tell me. I'm not really an expert on licensing. This work was based on the ImageSharp implementation of the algorithm, but it's a pretty major overhaul of the C# so I'd assume that any required attribution would just be to the Intel paper that it was originally based on. The Intel paper doesn't really have clear licensing, at least to my untrained eye. https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf

stephentoub · 2023-03-15T16:36:32Z

If it's a derivative of the ImageSharp code, the relevant information should be added to the third party notices file. If it's instead just based on the algorithm in the paper, I think including the paper title in the source is sufficient. But @richlander would have the final say here.

brantburnett · 2023-03-15T17:28:31Z

> If it's a derivative of the ImageSharp code, the relevant information should be added to the third party notices file. If it's instead just based on the algorithm in the paper, I think including the paper title in the source is sufficient. But @richlander would have the final say here.

For reference here is the ImageSharp code: https://github.com/SixLabors/ImageSharp/blob/f4f689ce67ecbcc35cebddba5aacb603e6d1068a/src/ImageSharp/Formats/Png/Zlib/Crc32.cs#L80

richlander · 2023-03-15T17:42:47Z

Overhaul or not overhaul isn't the bar, nor is the bar high. If you started with the ImageSharp code, then attribution should be given. We're not trying to limit attributions but to give them where warranted. From what I read, it is. Please correct me if I've got that wrong.

ImageSharp is going through a license change, but this file says Apache 2, so we're fine. /cc @JimBobSquarePants.

We should double check with our legal staff on the Intel paper. I suspect it is fine, however the licensing aspects at the end of the doc are confusing. I'll ask.

JimBobSquarePants · 2023-03-17T07:15:55Z

All good by me. Would be great if someone could backport the ARM intrinsics to ImageSharp though.

richlander · 2023-03-17T17:16:38Z

Will this change be suitable for you to use @JimBobSquarePants, or do you mean so that you have an improved implementation generally (since this change is .NET 8+)?

brantburnett · 2023-03-17T19:46:05Z

> All good by me. Would be great if someone could backport the ARM intrinsics to ImageSharp though.

Assuming I get the ARM intrinsics working, you should be able to take System.IO.Hashing 8.0.0 (once released) as a dependency for net7.0 and forward targets to get the intrinsics, without the need to port it. If you want net6.0 support it would require a port, though. This version is currently using some APIs added in .NET 7 to Vector128.

JimBobSquarePants · 2023-03-18T04:52:58Z

@richlander @brantburnett

I've just realised we've already added ARM support since that commit so no need for changes there.

The only thing we're missing is the endianness check.
https://github.com/dotnet/runtime/pull/83321/files#diff-8c4ea1d8b9624f9e2f25b7ffc2f776d6ea77492d9758f116bf1058c6882c2481R176-R180

We'd definitely delete our code once ImageSharp targets .NET 8 and use this.

gfoidl

One question / suggestion, otherwise LGTM.

src/libraries/System.IO.Hashing/src/System/IO/Hashing/Crc32.Vectorized.cs

richlander · 2023-03-22T23:17:35Z

I'm assuming that, as the developer of this implementation, that you find that this .asm file you linked to has the same/sufficient information as the paper such that you would have been able to move forward if you'd only had this file initially

I heard back. I was told that this logic is good. If you believe you could have made this implementation with just the .asm file, then we can go forward with making a 3PN entry for just it. We should still add one for ImageSharp if you also based your implementation on that code (even if you ended up with something quite different).

tannergooding · 2023-03-23T14:57:00Z

Assigned myself to give a final review pass and merge if everything looks good.

Please let me know once you've resolved or responded to all feedback and I'll get started (there are a number of open, but likely already handled comments still above).

brantburnett · 2023-03-23T16:30:28Z

> Assigned myself to give a final review pass and merge if everything looks good.
>
> Please let me know once you've resolved or responded to all feedback and I'll get started (there are a number of open, but likely already handled comments still above).

To my knowledge, all feedback is resolved above. I just wasn't sure what the procedure is, if I'm supposed to mark it resolved or let the reviewer mark it as resolved. I'm happy to do so if that's the procedure.

tannergooding · 2023-03-23T16:36:47Z

It varies from repo to repo and reviewer to reviewer, unfortunately.

The "safest" thing to do is to at least leave a little comment indicating "Fixed" or "Resolved" if it's been explicitly addressed. You can optionally leave a link or comment elaborating if appropriate.

That at least lets other reviewers see that something isn't still pending or simply missed.

brantburnett · 2023-03-23T17:01:27Z

> It varies from repo to repo and reviewer to reviewer, unfortunately.
>
> The "safest" thing to do is to at least leave a little comment indicating "Fixed" or "Resolved" if it's been explicitly addressed. You can optionally leave a link or comment elaborating if appropriate.
>
> That at least lets other reviewers see that something isn't still pending or simply missed.

Okay, I resolved some from gfoidl since he approved the PR and replied on the other conversations. Thanks.

brantburnett · 2023-04-01T12:57:24Z

@tannergooding

I'm almost done with vectorizing the CRC64 implementation as well. I wanted to check and see, from a workflow perspective, what would be best for you. Would it make sense to get this merged first and then do a separate PR for CRC64? There is some dependency between them, so I don't want to do them both as parallel PRs. But separate commits to main may give a clearer history. Or would it save you time to deal with them both at once in a single PR?

stephentoub · 2023-04-01T13:03:22Z

Thanks. Let's get this one merged and then do the other.

tannergooding · 2023-04-05T14:24:58Z

src/libraries/System.IO.Hashing/src/System/IO/Hashing/Crc32.Arm.cs

+            // Compute in 8 byte chunks
+            if (source.Length >= sizeof(ulong))
+            {
+                ReadOnlySpan<ulong> longSource = MemoryMarshal.Cast<byte, ulong>(source);


This should really use ReadUnaligned instead of casting since there is no guarantee that things are "properly aligned" otherwise.

Rewritten to use ref byte and Unsafe.ReadUnaligned
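The read pattern being requested can be sketched as below. To keep the example runnable on any architecture it XORs the 8-byte words together instead of calling the ARM CRC32 intrinsic; `UnalignedReads.XorWords` is a hypothetical name, and the real loop would feed each word to the CRC step instead.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class UnalignedReads
{
    // Reads the span in 8-byte words without assuming any alignment of the first byte.
    public static ulong XorWords(ReadOnlySpan<byte> source)
    {
        ref byte start = ref MemoryMarshal.GetReference(source);
        int wordBytes = source.Length & ~0x7; // largest multiple of 8 that fits in the span

        ulong accumulator = 0;
        for (int offset = 0; offset < wordBytes; offset += sizeof(ulong))
        {
            // ReadUnaligned is valid at any byte offset, unlike a load through a cast span,
            // which the review comment notes has no alignment guarantee.
            accumulator ^= Unsafe.ReadUnaligned<ulong>(ref Unsafe.Add(ref start, offset));
        }
        return accumulator;
    }
}
```

Slicing a buffer at an odd offset, for example `XorWords(buffer.AsSpan(1))`, exercises the unaligned case. Bytes past the last full word are ignored here and would be handled by a byte-wise tail loop.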

tannergooding · 2023-04-05T14:28:31Z

src/libraries/System.IO.Hashing/src/System/IO/Hashing/Crc32.Vectorized.cs

+            Debug.Fail("This path should be unreachable.");
+            return default;


We have a new System.Diagnostics.UnreachableException which is probably better.

You'd then want:

ThrowHelper.ThrowUnreachableException(); return default;
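`ThrowHelper` is internal to the runtime libraries, so the sketch below defines its own stand-in to show the pattern. `ThrowHelperSketch` and `LaneCount` are hypothetical names, and `UnreachableException` requires .NET 7 or later.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.CodeAnalysis;

public static class ThrowHelperSketch
{
    // Keeping the throw in a separate method keeps the caller's body small.
    [DoesNotReturn]
    public static void ThrowUnreachableException() => throw new UnreachableException();
}

public static class VectorInfo
{
    public static int LaneCount(int vectorBits)
    {
        if (vectorBits == 128) return 16;
        if (vectorBits == 256) return 32;

        // Unlike Debug.Fail, this also fails in release builds.
        ThrowHelperSketch.ThrowUnreachableException();
        return default; // still required: the compiler does not treat the call as terminating
    }
}
```

Calling `VectorInfo.LaneCount(64)` throws `UnreachableException` in both debug and release builds, whereas the original `Debug.Fail` followed by `return default` would silently return 0 in release.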

tannergooding · 2023-04-05T14:34:09Z

Changes overall LGTM. Should probably have a secondary review/sign-off before merging.

ghost added the area-System.IO and community-contribution labels Mar 13, 2023

gfoidl reviewed Mar 13, 2023

Use vector operator overloads and ref byte indexing

24114dc

jkotas reviewed Mar 13, 2023

build-analysis bot mentioned this pull request Mar 13, 2023

Roslyn source generator crash on mono/linux/arm64 #81123

Closed

Fix error and remove ref ROS

7df1df4

brantburnett marked this pull request as ready for review March 14, 2023 12:37

stephentoub reviewed Mar 15, 2023

jozkee added area-System.Security and removed area-System.IO labels Mar 15, 2023

Drop aggressive inlining and legibility improvements

ddfbe79

tannergooding reviewed Mar 15, 2023

Don't overcheck intrinsics

340317e

build-analysis bot mentioned this pull request Mar 15, 2023

Tracking issue for CI build timeouts #76454

Closed

First pass at ARM support

49f970d

Merge branch 'main' into crc32-x86

6bacc2a

brantburnett requested review from stephentoub and removed request for gfoidl March 22, 2023 12:13

gfoidl reviewed Mar 22, 2023

Move vector shift right to helper function

3b7c981

brantburnett requested review from gfoidl and removed request for stephentoub March 22, 2023 23:00

gfoidl approved these changes Mar 23, 2023

tannergooding self-assigned this Mar 23, 2023

A bit of cleanup

eebae5e

tannergooding reviewed Apr 5, 2023

tannergooding approved these changes Apr 5, 2023

brantburnett added 2 commits April 5, 2023 10:53

Use System.Diagnostics.UnreachableException

7122e02

Use ReadUnaligned for ARM CRC

ec5ed7c

stephentoub merged commit d0ca558 into dotnet:main Apr 22, 2023

brantburnett deleted the crc32-x86 branch April 23, 2023 01:19

This was referenced Apr 23, 2023

Add .NET Core 2.1 and 3.0 perf improvements force-net/Crc32.NET#19

Open

Vectorize the CRC64 implementation #85221

Merged

adamsitnik added the tenet-performance label May 17, 2023

adamsitnik added this to the 8.0.0 milestone May 17, 2023

ghost locked as resolved and limited conversation to collaborators Jun 16, 2023
