Use zstandard implementation from stdlib (PEP-784) #2034
Conversation
It will take me some time to review this, but thanks so much for your efforts on standard library inclusion! Out of curiosity, have you run benchmarks compared to the `zstandard` package?
Not yet. This is something I would like to do, but producing a meaningful benchmark is kind of hard. So far I chose instead to invest my time in PEP-784, its integration into CPython, and `backports.zstd`. I believe that even if the implementation in CPython (or in the backport) is slower by a huge margin, it would not change much for most applications. Of course this would depend on the difference and the application.
Please let me know whenever you come up with even a simple benchmark, I'm very interested!
This benchmark is a work in progress, but I compared the one-shot compression/decompression of bytes already in memory (in a variable), for different sizes (1 kB / 1 MB / 1 GB) and different levels (the default of 3 as well as higher-compression levels: 10 and 17). Each operation was run several times with `timeit`.
In the results table, green marks the faster implementation for each case. For the use case of `hatch` (one-shot decompression), there is a measurable difference between the two. However, unless you are doing it numerous times, it does not change much, as the operation takes less than 2 milliseconds. Edit: updated the benchmark results after finding an issue in the `timeit` invocation.
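For reference, a minimal sketch of this kind of one-shot micro-benchmark (assumptions on my part: `zstandard` and `backports.zstd` are installed, and random bytes stand in for the real test data, which skews compression ratios but still exercises both code paths):

import os
from timeit import timeit

import zstandard  # third-party package
from backports.zstd import compress, decompress  # stdlib-style one-shot API

# 1 MB of random bytes as a stand-in payload; the real benchmark used other
# data and sizes, and random bytes barely compress.
payload = os.urandom(1_000_000)
n = 20

for level in (3, 10, 17):
    frame = zstandard.compress(payload, level)
    c_zs = timeit(lambda: zstandard.compress(payload, level), number=n) / n
    c_bp = timeit(lambda: compress(payload, level), number=n) / n
    d_zs = timeit(lambda: zstandard.decompress(frame), number=n) / n
    d_bp = timeit(lambda: decompress(frame), number=n) / n
    print(f"level {level}: compress {c_zs:.4f}s vs {c_bp:.4f}s, "
          f"decompress {d_zs:.4f}s vs {d_bp:.4f}s")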
Thanks! For extra context, the main/only use case for this currently in Hatch is decompressing Python distributions from this project: https://github.com/astral-sh/python-build-standalone. As an example, cpython-3.11.13+20250814-x86_64_v4-unknown-linux-musl-lto-full.tar.zst is 62.5 MB.
Using your example archive, here is a quick benchmark comparing extraction with `zstandard` and with `backports.zstd`:
Benchmark code
import shutil
import tarfile as orig_tarfile
from pathlib import Path
from timeit import timeit

import zstandard
from backports.zstd import tarfile as new_tarfile

archive = "cpython-3.11.13+20250814-x86_64_v4-unknown-linux-musl-lto-full.tar.zst"
directory = Path("/tmp/ramdisk").absolute()  # mounted as tmpfs to avoid disk I/O noise

def clean():
    shutil.rmtree(directory, ignore_errors=True)
    directory.mkdir()

def run_zstandard():
    # Stream-decompress with the third-party zstandard package, feeding the
    # stdlib tarfile in streaming ("r|") mode.
    with open(archive, "rb") as ifh:
        dctx = zstandard.ZstdDecompressor()
        with (
            dctx.stream_reader(ifh) as reader,
            orig_tarfile.open(mode="r|", fileobj=reader) as tf,
        ):
            tf.extractall(directory, filter="data")

def run_backportszstd():
    # backports.zstd ships a tarfile that understands "r:zst" directly.
    with new_tarfile.open(archive, "r:zst") as tf:
        tf.extractall(directory, filter="data")

number = 50
timing = timeit(
    stmt="run()",
    setup="clean()",
    number=number,
    globals={"clean": clean, "run": run_zstandard},
)
print("zstandard", timing / number)

timing = timeit(
    stmt="run()",
    setup="clean()",
    number=number,
    globals={"clean": clean, "run": run_backportszstd},
)
print("backports.zstd", timing / number)
That's impressive, and I'm sure there are optimizations yet to be made. Thank you very much! I will try to review this soon; I've been very busy. Please know that every review I have yet to do and every email I have yet to respond to is a permanent weight on my psyche 😅
I have rebased to fix the conflicts, and took that opportunity to make a few additional changes. Feel free to ping me if you have any questions!
Because we would still require backports for older versions of Python, I ran a benchmark on the load time of the modules. For a CLI like hatch we are very conscious of performance, and I am curious about what is adding ~6 ms to the module load time on average. I don't think this is a blocker, but it is something I want to keep in mind.
Code for generating benchmarks
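A rough sketch of an in-process measurement of this kind (an illustration only, assuming both packages are installed and that clearing `sys.modules` is enough to force a fresh import):

# Rough sketch: time a "cold" import inside one process by dropping the module
# from sys.modules first. Shared dependencies may stay cached, so the numbers
# are approximate.
import importlib
import sys
import time

def time_import(name: str) -> float:
    for loaded in list(sys.modules):
        if loaded == name or loaded.startswith(name + "."):
            del sys.modules[loaded]
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start

for name in ("zstandard", "backports.zstd"):
    print(name, f"{time_import(name) * 1000:.2f} ms")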
Hello @cjames23 and thank you for your interest in the topic! I tried your script, and although it shows a ×2 factor when I run it on my machine, the figures are not the same at all: on average it adds less than 0.4 ms. For reference, my Linux machine is almost 10 years old now. I tried running the script several times with different CPython versions, but they all have very similar numbers.
Results
Even for the 6 ms difference you see, I am wondering which use cases you have in mind that would be impacted? I don't want to downplay it, but from an external viewer it seems to be a negligible amount of time.
@Rogdham This was mostly something Ofek and I needed to be aware of and make a decision on. It won't block us merging this PR, and it might seem odd to care about 6 ms, but these load times and other startup costs eventually add up.
Perhaps running tuna's import profiler against the output of `python -X importtime` could show where the time goes. Also, is this only for the backport? How does the standard library zstd in 3.14 compare?
I remembered this morning: the zstandard import isn't at top level (hatch/src/hatch/python/resolve.py, line 82 in 8695db4). Isn't considering the impact on hatch's import time a little unfair? Or are you worried about the extra cost on the first call to that code path?
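For illustration, a deferred import of that shape keeps the cost off CLI startup and pays it on the first call (a hypothetical sketch, not the actual hatch code):

# Hypothetical sketch of the deferred-import pattern referenced above; this is
# not the actual hatch code.
def extract_distribution(archive_path: str, destination: str) -> None:
    import tarfile

    import zstandard  # paid on the first call, not at CLI startup

    with open(archive_path, "rb") as ifh:
        dctx = zstandard.ZstdDecompressor()
        with (
            dctx.stream_reader(ifh) as reader,
            tarfile.open(mode="r|", fileobj=reader) as tf,
        ):
            tf.extractall(destination, filter="data")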
ofek left a comment:
Thanks again for working on this!
Measuring import timings from within Python doesn't yield results that are reflective of the user's experience. It's best to time, from outside the interpreter, how long the always-loaded imports take.

Regarding these comments: as Cary said, this won't be a blocker, especially since there is no way to avoid it on 3.14, but these little timings that one would perceive to be small are incredibly important to execute lazily, at the moment of necessity. Users in 2025 expect CLIs to be approximately as fast as the ones they use most frequently, and such tools are primarily written in compiled languages.

I want users of Hatch to never feel like the experience is hindered by its implementation. As an easy illustration of what happens when one does not do this, check the time it takes to see the version of any of the CLIs (all written in Python) for the 3 major cloud providers.
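One way to measure from outside the interpreter is to spawn a fresh process per run, so nothing is cached. A sketch (the module names are the ones discussed above; the approach itself is just one option alongside tools like hyperfine):

# Sketch: measure import cost in a fresh interpreter per run so nothing is
# cached, then subtract bare interpreter startup.
import statistics
import subprocess
import sys
import time

def fresh_import_time(module: str, runs: int = 20) -> float:
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

baseline = fresh_import_time("sys")  # interpreter startup alone
for module in ("zstandard", "backports.zstd"):
    extra = (fresh_import_time(module) - baseline) * 1000
    print(f"{module}: +{extra:.1f} ms over interpreter startup")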
Thank you for your feedback and the reference for better timing comparisons. On my machine, after making the optimization I mentioned earlier (Rogdham/backports.zstd#56), it closes 65% of the current gap compared to `zstandard`.
This is a good point, but my implementation keeps that import out of the top level as well. Previously, the deferred code path imported `zstandard` on top of the already-loaded stdlib `tarfile`; now it imports the `tarfile` from `backports.zstd` instead. This makes me think that we should not be comparing the import time of the compression modules in isolation, but of everything each code path pulls in. Here is the comparison (16% slower for the `backports.zstd` path).
Thank you for the reviews, everyone, and for the merge! ❤️
Thank you for taking the time to contribute! |



Thanks to PEP-784, Zstandard will be included in Python starting from version 3.14, including in the `tarfile` module. So for Python 3.14+, we don't need an external lib. For older versions of Python, I'm using `backports.zstd`.

This also allows removing the Python version check against 3.12+: we can use the `filter` parameter in all cases (the implementation of the `tarfile` module in `backports.zstd` comes from the CPython 3.14 codebase). As a result, this allowed for a small refactor that improves clarity and reduces redundancy.
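In code, the approach boils down to something like this sketch (a minimal illustration with hypothetical file names, not the exact hatch change):

import sys

if sys.version_info >= (3, 14):
    import tarfile  # stdlib tarfile understands "r:zst" as of PEP-784
else:
    from backports.zstd import tarfile  # same tarfile API, backported

# filter="data" works in all cases, since backports.zstd's tarfile is taken
# from the CPython 3.14 codebase.
with tarfile.open("dist.tar.zst", "r:zst") as tf:
    tf.extractall("dest", filter="data")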
I was expecting the typing tests with `mypy` to fail if checked against version 3.14+ of Python, because the proper type hints for the `tarfile` module (to allow the `r:zst` mode) were not in the latest released version of `mypy` yet (they are in `typeshed` and on the `mypy` master branch). Edit: works starting from `mypy` 1.18.1.

Also, `backports.zstd` did not initially support PyPy, so if you do want to work with PyPy I suggested waiting a little bit before merging the PR. Edit: now supported starting from `backports.zstd` version 0.5.0.

Note that I don't know the `hatch` codebase, but did my best to fit in. Feel free to suggest any changes!

Full disclosure: I'm the author and maintainer of `backports.zstd`, and the maintainer of `pyzstd` (whose code was used as a base for the integration into Python). I also helped with PEP-784 and its integration into CPython.

Fixes #1801
Fixes #2007
Fixes #2077