bpo-38256: Fix binascii.crc32() when inputs are 4+GiB #32000

Merged (10 commits) on Mar 20, 2022

gpshead
Member

@gpshead gpshead commented Mar 19, 2022

When compiled using USE_ZLIB_CRC32 (configure sets this when zlib.h is present on POSIX systems), binascii.crc32(...) failed to compute the correct value when the input data was >= 4GiB, because the zlib crc32 API is limited to a 32-bit length.

This lines it up with the zlib.crc32(...) implementation that doesn't have that flaw.
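
A minimal sketch of the idea, assuming the fix works by feeding the buffer to zlib's crc32 in pieces that each fit in a 32-bit length and chaining the running value. The helper name `crc32_chunked` and its `max_chunk` parameter are hypothetical and not taken from the patch, which is written in C:

```python
import binascii
import zlib


def crc32_chunked(data, value=0, max_chunk=0xFFFF_FFFF):
    """Compute a CRC-32 by feeding `data` to zlib.crc32 in bounded pieces.

    Hypothetical helper for illustration only: each call passes at most
    `max_chunk` bytes and carries the running CRC forward as the start value.
    """
    view = memoryview(data)
    for start in range(0, len(view), max_chunk):
        value = zlib.crc32(view[start:start + max_chunk], value)
    return value & 0xFFFF_FFFF


payload = bytes(range(256)) * 64
# A tiny max_chunk stands in for the 32-bit limit so the demo stays small.
assert crc32_chunked(payload, max_chunk=1000) == binascii.crc32(payload)
```

Because a CRC can be continued from a previous value, the chunked result matches a single pass over the whole buffer regardless of where the pieces are split.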

Performance: It also adopts the same logic that zlib.crc32 uses to release the GIL for larger inputs, and makes the Windows build always use zlib's crc32 instead of our own slower C code, as zlib is a required build dependency on Windows. If we're ever willing to declare zlib as a build requirement on POSIX, we could get rid of the C crc32 code entirely.

https://bugs.python.org/issue38256

gpshead added 2 commits March 19, 2022 15:54
Fixed binascii.crc32 when USE_ZLIB_CRC32 was defined at configure
time. It now computes the correct value on inputs larger than
0xffff_ffff in length. It also releases the GIL around large
computation as zlib.crc32 already did.
@gpshead gpshead added type-bug An unexpected behavior, bug, or error needs backport to 3.9 only security fixes needs backport to 3.10 only security fixes labels Mar 19, 2022
@gpshead gpshead added 🔨 test-with-buildbots Test PR w/ buildbots; report in status section and removed needs backport to 3.9 only security fixes needs backport to 3.10 only security fixes labels Mar 20, 2022
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @gpshead for commit d200d19 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 20, 2022
@gpshead gpshead marked this pull request as draft March 20, 2022 00:38
@gpshead
Member Author

gpshead commented Mar 20, 2022

This is a draft because I'm seeing sporadic crashing behavior when I use a non-USE_ZLIB_CRC32 build that I need to understand first.

@ghost

ghost commented Mar 20, 2022

Other functions in binascii.c do not release the GIL, so I suggest not releasing it here either.
Then there is no need for the if (data->len > 1024*5) condition check.

Or discuss "the problem of GIL in binascii.c" in python-dev.

@gpshead
Member Author

gpshead commented Mar 20, 2022

> Other functions in binascii.c do not release the GIL, so I suggest not releasing it here either. Then there is no need for the if (data->len > 1024*5) condition check.
>
> Or discuss "the problem of GIL in binascii.c" in python-dev.

crc32 is a hash function, and hash functions are standalone computational time sinks. zlib's crc32 releases the GIL for this reason, as does everything in hashlib:

https://github.com/python/cpython/blob/main/Modules/zlibmodule.c#L1436
https://github.com/python/cpython/blob/main/Modules/_hashopenssl.c#L578

The ultimate end state for binascii and zlib crc32 is to be a single identical piece of code somewhere. This is another step in that direction as the code in the two is now identical.
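
To illustrate why releasing the GIL matters for a checksum, here is a rough sketch in which several threads checksum separate buffers through zlib.crc32. The helper name `checksum_all` and the buffer sizes are hypothetical, and any speedup depends on the build and the machine, so none is claimed here:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def checksum_all(buffers, workers=4):
    """Checksum each buffer on a thread pool and return CRCs in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.crc32, buffers))


# Each buffer is well above the 1024*5 byte threshold quoted in the review.
buffers = [bytes([i]) * (1 << 20) for i in range(8)]
parallel = checksum_all(buffers)
# The threaded results must match a plain sequential pass.
assert parallel == [zlib.crc32(b) for b in buffers]
```

When the C function drops the GIL around the computation, the worker threads are free to run at the same time; when it does not, the same code still works but the calls run one after another.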

@gpshead gpshead marked this pull request as ready for review March 20, 2022 06:42
@gpshead gpshead requested a review from a team as a code owner March 20, 2022 06:42
@gpshead gpshead requested a review from tiran March 20, 2022 06:43
@gpshead
Member Author

gpshead commented Mar 20, 2022

@python/windows-team, could you confirm my PCbuild edit that adds a preprocessor define? It appears to work on the CI Windows builds and looks correct based on other things I see in the file.

Windows builds always require zlib from what I can tell, so having binascii.c always use zlib.h there for a 2-3x faster crc32 implementation makes sense.

Most POSIX systems have zlib so it happens by default thanks to configure on those platforms.

@gpshead gpshead added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 20, 2022
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @gpshead for commit 65222b2 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 20, 2022
@gpshead gpshead added the performance Performance or resource usage label Mar 20, 2022
@gpshead gpshead merged commit 9d1c4d6 into python:main Mar 20, 2022
@gpshead gpshead deleted the binascii-zlib-crc32-64bit branch March 20, 2022 19:28
gpshead added a commit that referenced this pull request Mar 20, 2022
Inputs >= 4GiB to `binascii.crc32(...)` when compiled to use the zlib
crc32 implementation (the norm on POSIX) no longer return the wrong
result.
gpshead added a commit to gpshead/cpython that referenced this pull request Mar 20, 2022
…ythonGH-32013)

Inputs >= 4GiB to `binascii.crc32(...)` when compiled to use the zlib
crc32 implementation (the norm on POSIX) no longer return the wrong
result.

(cherry picked from commit 4c989e1)
gpshead added a commit that referenced this pull request Mar 21, 2022
…32015)

Inputs >= 4GiB to `binascii.crc32(...)` when compiled to use the zlib
crc32 implementation (the norm on POSIX) no longer return the wrong
result.

(cherry picked from commit 4c989e1)
@ghost

ghost commented Mar 22, 2022

I confirm that the MSVC build's binascii.crc32() is faster.
On 5 GiB of data, this commit speeds it up from 8.34 sec to 3.23 sec.
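
A comparison like this can be reproduced roughly as follows. This is only a sketch: the helper name `time_crc32` and the 64 MiB buffer are hypothetical choices made to keep the run short, and timings will differ by machine and build:

```python
import binascii
import time
import zlib


def time_crc32(func, data, repeats=3):
    """Return (crc, best_seconds) for `func(data)` over a few repeats."""
    best = float("inf")
    crc = None
    for _ in range(repeats):
        start = time.perf_counter()
        crc = func(data)
        best = min(best, time.perf_counter() - start)
    return crc, best


# 64 MiB keeps the demo quick; the measurement quoted above used 5 GiB.
data = bytes(64 * 1024 * 1024)
for name, func in (("binascii.crc32", binascii.crc32), ("zlib.crc32", zlib.crc32)):
    crc, seconds = time_crc32(func, data)
    print(f"{name}: {crc:#010x} in {seconds:.3f}s")
```

Both functions should print the same CRC value. If binascii.crc32 is noticeably slower than zlib.crc32, that suggests the build is not using zlib's implementation.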

It would be nice if Linux/macOS builds could use it as well.
They are supposed to, but I don't know why it doesn't work. In the configure file:

if test "x$have_zlib" = xyes; then :

  BINASCII_CFLAGS="-DUSE_ZLIB_CRC32 $ZLIB_CFLAGS"
  BINASCII_LIBS="$ZLIB_LIBS"

fi

https://github.com/python/cpython/blob/v3.11.0a6/configure

@gpshead
Member Author

gpshead commented Mar 22, 2022

It does work on Linux and BSD if the zlib-devel or zlib1g-dev or whatever similar package is installed when you do your own builds. Note that Linux distro packagers may do things that alter how their configure and build works.
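
One way to inspect how a given interpreter was built is to look at its recorded build configuration. The sketch below assumes that the build stores a variable whose name mentions BINASCII, which may not hold for every Python version or platform (Windows builds in particular record few such variables); the helper name `binascii_build_flags` is hypothetical:

```python
import sysconfig


def binascii_build_flags():
    """Return build-time config variables whose name mentions BINASCII.

    The result may be empty: which variables are recorded is an assumption
    that varies by Python version and platform.
    """
    config = sysconfig.get_config_vars()
    return {k: v for k, v in config.items() if "BINASCII" in k.upper()}


flags = binascii_build_flags()
uses_zlib = any("USE_ZLIB_CRC32" in str(v) for v in flags.values())
print(flags or "no BINASCII variables recorded", "| USE_ZLIB_CRC32 seen:", uses_zlib)
```

An empty result says nothing either way about the build, so treat this as a hint rather than a definitive check.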

I'd prefer to move zlib to "required" status instead of optional. It'd simplify and shrink a pile of code... Testing to see if that is possible: https://bugs.python.org/issue47090

@ghost

ghost commented Mar 22, 2022

> I'd prefer to move zlib to "required" status instead of optional.

Some minimal systems, such as embedded systems, may not have zlib available.
Is it possible to find a fast crc32 implementation to replace the binascii.crc32() implementation?

hello-adam pushed a commit to hello-adam/cpython that referenced this pull request Jun 2, 2022
…-32013) (pythonGH-32015)

Inputs >= 4GiB to `binascii.crc32(...)` when compiled to use the zlib
crc32 implementation (the norm on POSIX) no longer return the wrong
result.

(cherry picked from commit 4c989e1)
Labels
performance Performance or resource usage type-bug An unexpected behavior, bug, or error