Build clrcompression.dll with optimized Zlib-Intel #5674

bjjones · 2016-01-25T19:09:16Z

#3986

Zlib-Intel is an improved version of Zlib that contains optimizations for processors with the SSE4.2 instruction set. In the most common case, CompressionLevel Optimal, we see a 25-30% performance improvement.

Currently, the version of clrcompression.dll built in the repo is not copied into any of the tests, so to test I copied the binary into System.IO.Compression.Tests by hand and ran the tests manually, with no errors.

Please note that, due to a gzip incompatibility, I have modified the library to no longer use the aggressive deflate_quick strategy and instead use the traditional deflate_fast strategy for CompressionLevel Fastest. As a result, performance for CompressionLevel Fastest is similar to Zlib-Adler.

For ARM platforms, clrcompression.dll is still built using Zlib-Adler.

Performance numbers gathered on an Intel Core i5-4670 CPU for the Canterbury Corpus Benchmark follow:

innerIterations	filename	compressLevel	Adler	Intel	Intel / Adler
25	alice29.txt	Optimal	11.12274	7.505396	67.48%
25	asyoulik.txt	Optimal	10.22775	6.969144	68.14%
25	cp.html	Optimal	1.28132	1.4261	111.30%
25	fields.c	Optimal	0.81646	0.8408881	102.99%
25	grammar.lsp	Optimal	0.7443761	0.670172	90.03%
25	kennedy.xls	Optimal	48.62418	18.24115	37.51%
25	lcet10.txt	Optimal	26.82975	17.67865	65.89%
25	plrabn12.txt	Optimal	39.65364	25.09914	63.30%
25	ptt5	Optimal	14.31153	9.271148	64.78%
25	sum	Optimal	10.1662	3.163316	31.12%
25	xargs.1	Optimal	0.831952	0.7740601	93.04%

innerIterations	CompressLevel	Average Intel / Adler
1	Optimal	75.69%
10	Optimal	70.62%
25	Optimal	70.51%
50	Optimal	71.75%
1	Fastest	101.53%
10	Fastest	100.38%
25	Fastest	95.73%
50	Fastest	99.54%
1	NoCompression	100.07%
10	NoCompression	100.78%
25	NoCompression	97.40%
50	NoCompression	99.26%
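For anyone wanting to run a comparable measurement, here is a minimal sketch of the timing loop, not the harness used for the numbers above. It uses Python's standard `zlib` module rather than clrcompression.dll, and the mapping of CompressionLevel names to zlib levels, the helper names `time_compress` and `relative_time`, and the sample input are all assumptions made for illustration.

```python
import time
import zlib

# Assumed mapping from .NET CompressionLevel names to zlib levels (illustrative only).
LEVELS = {"NoCompression": 0, "Fastest": 1, "Optimal": zlib.Z_DEFAULT_COMPRESSION}


def time_compress(data: bytes, level: int, inner_iterations: int) -> float:
    """Return the mean wall-clock seconds per raw-deflate compression of `data`."""
    start = time.perf_counter()
    for _ in range(inner_iterations):
        # wbits=-15 requests a raw deflate stream with no zlib/gzip header.
        compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
        compressor.compress(data)
        compressor.flush()
    return (time.perf_counter() - start) / inner_iterations


def relative_time(candidate_seconds: float, baseline_seconds: float) -> float:
    """Express candidate time as a percentage of baseline (below 100 means faster)."""
    return 100.0 * candidate_seconds / baseline_seconds


if __name__ == "__main__":
    # Hypothetical sample input; a real run would read the Canterbury Corpus files.
    sample = b"It was the best of times, it was the worst of times. " * 4096
    for name, level in LEVELS.items():
        seconds = time_compress(sample, level, inner_iterations=25)
        print(f"{name:14s} {seconds * 1000:.3f} ms per iteration")
```

`relative_time` corresponds to the "Intel / Adler" column: feeding it the two timings from one row of the per-file table gives that row's percentage.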

@ianhays @stephentoub @joshfree

stephentoub · 2016-01-25T19:12:26Z

Thanks, @bjjones. Does this / how does this affect compression levels / output file sizes?

bjjones · 2016-01-25T19:33:23Z

For the Canterbury Corpus at CompressionLevel Optimal using deflateStream, I saw:

Library	Initial Size	Compressed Size	Ratio (Compressed/Initial)
Zlib-Adler	2670KB	713KB	.26704
Zlib-Intel	2670KB	733KB	.27453

As the Zlib-Intel whitepaper also notes, there is a very small increase in compression ratio (that is, slightly larger output): http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/zlib-compression-whitepaper-copy.pdf

stephentoub · 2016-01-25T19:36:16Z

> For the Canterbury Corpus at CompressionLevel Optimal using deflateStream

Thanks for the data. What about at Fastest?

bjjones · 2016-01-25T19:44:50Z

In the same scenario, I saw a slightly improved compression ratio using CompressionLevel Fastest.

Library	Initial Size	Compressed Size	Ratio (Compressed/Initial)
Zlib-Adler	2670KB	842KB	.31535
Zlib-Intel	2670KB	812KB	.30412
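To make the size comparison easier to check, here is a small sketch of the arithmetic behind the Ratio column, applied to the Optimal and Fastest figures reported in this thread. The helper names are hypothetical, and because the KB sizes are rounded, the last digit of a recomputed ratio may differ from the reported one.

```python
# Sizes in KB, copied from the Optimal and Fastest tables in this thread.
SIZES_KB = {
    "Optimal": {"Zlib-Adler": 713, "Zlib-Intel": 733},
    "Fastest": {"Zlib-Adler": 842, "Zlib-Intel": 812},
}
INITIAL_KB = 2670


def compression_ratio(compressed_kb: float, initial_kb: float) -> float:
    """Compressed size divided by initial size; smaller means better compression."""
    return compressed_kb / initial_kb


def size_change_percent(intel_kb: float, adler_kb: float) -> float:
    """Percent change in output size going from Zlib-Adler to Zlib-Intel."""
    return 100.0 * (intel_kb - adler_kb) / adler_kb


for level, sizes in SIZES_KB.items():
    adler, intel = sizes["Zlib-Adler"], sizes["Zlib-Intel"]
    print(
        f"{level}: Adler {compression_ratio(adler, INITIAL_KB):.5f}, "
        f"Intel {compression_ratio(intel, INITIAL_KB):.5f}, "
        f"size change {size_change_percent(intel, adler):+.2f}%"
    )
```

With these figures, the Zlib-Intel output is roughly 2.8% larger at Optimal and roughly 3.6% smaller at Fastest.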

ianhays · 2016-01-25T19:45:30Z

How many of the zlib-intel files are different from their zlib-adler equivalents? For those that are the same, could we just modify the CMakeLists for zlib-intel to reference the ones in zlib-adler? That way we avoid duplicating identical files, and it's easier to see what zlib-intel changes.

It would mess up the Makefile build, but we don't use that anyway.

ianhays · 2016-01-25T19:45:52Z

src/Native/Windows/build-native.cmd

Shouldn't this be configured based on the type of build being done?

It is. This is the default setting. It is set on lines 19/20.

I see. Why change the default? Don't we normally default to debug unless release is explicitly chosen?

ianhays · 2016-01-25T20:15:18Z

I ran the perf and unit tests again on the zlibs produced by the cmake build and replicated your results.

Looks good other than a few minor comments about duplication :)

ianhays · 2016-01-25T22:09:02Z

@mellinoe OSX_Release failure https://github.com/dotnet/corefx/issues/4833 presumably
13:30:18 System.Numerics.Tests.Matrix4x4Tests.Matrix4x4CreateFromYawPitchRollTest2 [FAIL]
13:30:18 Yaw:155 Pitch:-20 Roll:50
13:30:18 Expected: True
13:30:18 Actual: False
13:30:18 Stack Trace:
13:30:18 at System.Numerics.Tests.Matrix4x4Tests.Matrix4x4CreateFromYawPitchRollTest2()

On x86/x64 platforms, use Zlib-Intel for native compression instead of Zlib-Adler. Performance tests show a +30% improvement when compressing files by using SSE4.2 features. ARM platforms still use Zlib-Adler for clrcompression.dll.

Fix #3986

bjjones · 2016-01-27T21:32:07Z

Unnecessary/duplicate files have been removed and I added a README.txt to the Zlib-Intel folder.

ianhays · 2016-01-27T22:28:59Z

LGTM. @stephentoub ?

stephentoub · 2016-01-29T00:55:16Z

src/Native/Windows/clrcompression/CMakeLists.txt

I don't know what other potential future values __BuildArch could/should be, but should this just be an else?

stephentoub · 2016-01-29T01:32:49Z

A few nits, but generally looks good to me.

stephentoub · 2016-01-30T01:58:34Z

Thanks, @bjjones.

Build clrcompression.dll with optimized Zlib-Intel Commit migrated from dotnet/corefx@1295b0a

dnfclas added the cla-already-signed label Jan 25, 2016

ianhays reviewed Jan 25, 2016

stephentoub reviewed Jan 29, 2016

stephentoub added a commit that referenced this pull request Jan 30, 2016

Merge pull request #5674 from bjjones/zlib-intel

1295b0a

Build clrcompression.dll with optimized Zlib-Intel

stephentoub merged commit 1295b0a into dotnet:master Jan 30, 2016

bjjones mentioned this pull request Mar 14, 2016

Add Zlib-Intel Opimizations to Linux System.IO.Compression.Native.so #6876

Closed

stephentoub added the netfx-port-consider label Apr 13, 2016

karelz modified the milestone: 1.0.0-rtm Dec 3, 2016

jkotas mentioned this pull request Mar 13, 2018

Upgrade our ZLib version for Windows to 1.2.11 #28014

Closed

jherby2k mentioned this pull request Sep 14, 2018

Different PNG results under Windows / .net core and Mac, Linux, .net47 SixLabors/ImageSharp#702

Closed

vkvenkat mentioned this pull request Oct 10, 2018

Update ZLib-Intel to v1.2.11 #32732

Merged

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

Merge pull request dotnet/corefx#5674 from bjjones/zlib-intel

212ca25

Build clrcompression.dll with optimized Zlib-Intel Commit migrated from dotnet/corefx@1295b0a
