bpo-38256: Fix binascii.crc32() when inputs are 4+GiB #32000
Conversation
Fixed binascii.crc32 when USE_ZLIB_CRC32 was defined at configure time. It now computes the correct value on inputs larger than 0xffff_ffff in length. It also releases the GIL around large computations, as zlib.crc32 already did.
This is a draft because I'm seeing sporadic crashing behavior when I use a non-USE_ZLIB_CRC32 build, which I need to understand first.
It is 2-3x faster than our generic fallback C implementation.
Other functions in Or discuss "the problem of GIL in
crc32 is a hash function, and hash functions are standalone computational time sinks. zlib's crc32 releases the GIL, as does everything in hashlib: https://github.com/python/cpython/blob/main/Modules/zlibmodule.c#L1436 The ultimate end state is for binascii and zlib crc32 to be a single identical piece of code somewhere. This is another step in that direction, as the code in the two is now identical.
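As a rough illustration of why releasing the GIL matters for a checksum, the sketch below computes CRCs for several buffers on worker threads. The helper name `crc32_many` and the buffer sizes are made up for this example, and whether the threads actually overlap depends on the build and on the input being large enough for the GIL to be released.

```python
import binascii
import zlib
from concurrent.futures import ThreadPoolExecutor


def crc32_many(buffers, max_workers=4):
    """Checksum each buffer on a worker thread; CRCs come back in input order.

    Hypothetical helper for illustration only. Threads can only run the
    checksums concurrently if the underlying call releases the GIL.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(zlib.crc32, buffers))


# Four 8 MiB buffers with different fill bytes (arbitrary sizes for the demo).
buffers = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]

parallel = crc32_many(buffers)
serial = [binascii.crc32(b) for b in buffers]

# binascii.crc32 and zlib.crc32 are expected to agree on the same input.
assert parallel == serial
```

The threaded and serial results must match regardless of scheduling, so a check like this is also a cheap way to confirm that the two modules agree on a given build.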
@python/windows-team, could you confirm my PCbuild edit that adds a preprocessor define? It appears to work on the CI Windows builds and looks correct based on other things I see in the file. Windows builds always require zlib from what I can tell, so having binascii.c always use zlib.h there for a 2-3x faster crc32 implementation makes sense. Most POSIX systems have zlib, so it happens by default thanks to configure on those platforms.
…ythonGH-32013) Inputs >= 4GiB to `binascii.crc32(...)` when compiled to use the zlib crc32 implementation (the norm on POSIX) no longer return the wrong result. (cherry picked from commit 4c989e1)
I confirm MSVC build's It would be nice if Linux/macOS builds could use it as well.

    if test "x$have_zlib" = xyes; then :
      BINASCII_CFLAGS="-DUSE_ZLIB_CRC32 $ZLIB_CFLAGS"
      BINASCII_LIBS="$ZLIB_LIBS"
    fi
It does work on Linux and BSD if the zlib-devel or zlib1g-dev or whatever similar package is installed when you do your own builds. Note that Linux distro packagers may do things that alter how their configure and build works. I'd prefer to move zlib to "required" status instead of optional. It'd simplify and shrink a pile of code... Testing to see if that is possible: https://bugs.python.org/issue47090
Maybe some thin systems do not have zlib configured, such as some embedded systems.
…-32013) (pythonGH-32015) Inputs >= 4GiB to `binascii.crc32(...)` when compiled to use the zlib crc32 implementation (the norm on POSIX) no longer return the wrong result. (cherry picked from commit 4c989e1)
When compiled using `USE_ZLIB_CRC32` (configure sets this when zlib.h is present on POSIX systems), `binascii.crc32(...)` failed to compute the correct value when the input data was >= 4GiB, because the zlib crc32 API is limited to a 32-bit length. This lines it up with the `zlib.crc32(...)` implementation, which doesn't have that flaw.

Performance: It also adopts the same logic that zlib.crc32 has for releasing the GIL on larger inputs, and causes the Windows build to always use zlib's crc32 instead of our own slower C code, as zlib is a required build dependency on Windows. If we're ever willing to declare zlib as a build requirement on POSIX, we could get rid of the C crc32 code entirely.
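The usual way around a 32-bit length limit is to feed the data in pieces no larger than the limit while carrying the running CRC forward. The sketch below shows that idea at the Python level. It is not the C code from this change; the names `crc32_in_chunks` and `wrapped_length` are hypothetical, and the input is assumed to be a bytes-like object with one-byte items.

```python
import zlib

UINT_MAX = 0xFFFF_FFFF  # largest length a 32-bit unsigned parameter can hold


def crc32_in_chunks(data, value=0, max_chunk=UINT_MAX):
    """CRC-32 of `data`, computed over slices of at most `max_chunk` bytes.

    Each call continues from the previous running value, so the result
    equals a single call over the whole buffer.
    """
    view = memoryview(data)  # slicing a memoryview avoids copying the data
    for start in range(0, len(view), max_chunk):
        value = zlib.crc32(view[start:start + max_chunk], value)
    return value


def wrapped_length(length):
    """What a length becomes if it is narrowed to 32 unsigned bits."""
    return length & UINT_MAX


payload = b"binascii and zlib should agree " * 1000
# A tiny chunk size exercises the same loop without needing 4 GiB of memory.
assert crc32_in_chunks(payload, max_chunk=7) == zlib.crc32(payload)

# A length of 4 GiB + 5 bytes would look like 5 bytes after narrowing.
assert wrapped_length((1 << 32) + 5) == 5
```

The small `max_chunk` in the usage lines is only there to make the loop iterate many times on a small input; the chunk boundary itself does not affect the result.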
https://bugs.python.org/issue38256