
Conversation

@Rogdham
Contributor

@Rogdham Rogdham commented Aug 3, 2025

Thanks to PEP-784, Zstandard support will be included in Python starting from version 3.14, including in the tarfile module.

So for Python 3.14+, we don't need an external lib. For older versions of Python, I'm using backports.zstd.

This also allows us to remove the Python version check against 3.12+: we can use the filter parameter in all cases (the tarfile module implementation in backports.zstd comes from the CPython 3.14 codebase).

As a result, this allowed for a small refactor that improves clarity and reduces redundancy.
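
For illustration, the resulting pattern looks roughly like this (a minimal sketch, not the exact code from the PR; the archive and destination names are placeholders):

import sys

if sys.version_info >= (3, 14):
    import tarfile  # the stdlib tarfile supports Zstandard natively on 3.14+
else:
    from backports.zstd import tarfile  # same module, backported from the 3.14 codebase

# "r:zst" and the extraction filter work identically in both cases
with tarfile.open("dist.tar.zst", "r:zst") as tf:
    tf.extractall("dest", filter="data")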


I'm expecting the typing tests with mypy to fail if checked against version 3.14+ of Python, because the proper type hints for the tarfile module (to allow the r:zst mode) are not in the latest released version of mypy yet (they are in typeshed and on the mypy master branch, so probably in the next release). Edit: works starting from mypy 1.18.1.

Also, backports.zstd does not currently support PyPy (but should before October), so if you do want to work with PyPy, I suggest waiting a little bit before merging the PR. Edit: now supported starting from backports.zstd version 0.5.0.

Note that I don't know the hatch codebase, but did my best to fit in. Feel free to suggest any changes!

Full disclosure: I'm the author and maintainer of backports.zstd, and the maintainer of pyzstd (whose code was used as the base for the integration into Python). I also helped with PEP-784 and its integration into CPython.


Fixes #1801
Fixes #2007
Fixes #2077

@ofek
Contributor

ofek commented Aug 3, 2025

It will take me some time to review this, but thanks so much for your efforts on standard library inclusion! Out of curiosity, have you run benchmarks compared to the zstandard package? https://pypi.org/project/zstandard/

@Rogdham
Contributor Author

Rogdham commented Aug 3, 2025

Out of curiosity, have you ran benchmarks compared to the zstandard package?

Not yet. This is something I would like to do, but coming up with a meaningful benchmark is kind of hard.

So far I chose instead to invest my time in PEP-784, integration into CPython, and backports.zstd.

I believe that even if the implementation in CPython (or in the backport) is slower by a huge margin, it would not change much for most applications. Of course, this depends on the size of the difference and on the application.

@ofek
Contributor

ofek commented Aug 3, 2025

Please let me know whenever you come up with even a simple benchmark, I'm very interested!

@Rogdham
Contributor Author

Rogdham commented Aug 16, 2025

Please let me know whenever you come up with even a simple benchmark

This benchmark is a work in progress, but I compared oneshot compression/decompression of bytes already in memory (in a variable), for different sizes (1 kB/1 MB/1 GB) and different levels (the default of 3 as well as higher-compression levels: 10 and 17). The data comes from enwik9.

Each operation was run with timeit, using a number value big enough that each measurement runs for about 10 seconds.
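
As a rough sketch of the method (the sample size and number here are illustrative; on Python <3.14 the import would come from backports.zstd instead):

from timeit import timeit

from compression import zstd  # Python 3.14+; use backports.zstd on older versions

with open("enwik9", "rb") as f:
    data = f.read(1_000_000)  # 1 MB sample of enwik9

number = 5_000  # illustrative; chosen so the total run lasts about 10 seconds
total = timeit(lambda: zstd.compress(data, level=3), number=number)
print(f"{total / number * 1000:.3f} ms per compression")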

[image: benchmark table comparing zstandard and backports.zstd across sizes and compression levels]

green means backports.zstd is faster


For the use case of hatch, my guess is that you would decompress archives around 1 MB in size, so switching from zstandard to backports.zstd would make decompression around 3% slower.

However, unless you are doing it numerous times, it does not change much, as the operation takes less than 2 milliseconds.

Edit: updated the benchmark results after finding an issue in the timeit invocation.

@ofek
Contributor

ofek commented Aug 16, 2025

Thanks! For extra context, the main/only use case for this currently in Hatch is decompressing Python distributions from this project: https://github.com/astral-sh/python-build-standalone

As an example, cpython-3.11.13+20250814-x86_64_v4-unknown-linux-musl-lto-full.tar.zst is 62.5 MB

@Rogdham
Contributor Author

Rogdham commented Aug 16, 2025

Using your .tar.zst file as an example, I measured the following timings (running each implementation 50 times):

| Implementation | Average time per run |
| --- | --- |
| zstandard | 1.5027 s |
| backports.zstd | 1.5099 s (0.5% slower) |
Benchmark code

import shutil
import tarfile as orig_tarfile
from pathlib import Path
from timeit import timeit

import zstandard
from backports.zstd import tarfile as new_tarfile

archive = "cpython-3.11.13+20250814-x86_64_v4-unknown-linux-musl-lto-full.tar.zst"
directory = Path("/tmp/ramdisk").absolute()  # mounted as tmpfs


def clean():
    shutil.rmtree(directory, ignore_errors=True)
    directory.mkdir()


def run_zstandard():
    with open(archive, "rb") as ifh:
        dctx = zstandard.ZstdDecompressor()
        with (
            dctx.stream_reader(ifh) as reader,
            orig_tarfile.open(mode="r|", fileobj=reader) as tf,
        ):
            tf.extractall(directory, filter="data")


def run_backportszstd():
    with new_tarfile.open(archive, "r:zst") as tf:
        tf.extractall(directory, filter="data")


number = 50

timing = timeit(
    stmt="run()",
    setup="clean()",
    number=number,
    globals={"clean": clean, "run": run_zstandard},
)
print("zstandard", timing / number)

timing = timeit(
    stmt="run()",
    setup="clean()",
    number=number,
    globals={"clean": clean, "run": run_backportszstd},
)
print("backports.zstd", timing / number)

@ofek
Contributor

ofek commented Aug 16, 2025

That's impressive, and I'm sure there are optimizations yet to be made. Thank you very much!

I will try to review this soon; I've been very busy. Please know that every review I have yet to do and every email I have yet to respond to is a permanent weight on my psyche 😅

@Rogdham
Contributor Author

Rogdham commented Oct 10, 2025

I have rebased to fix the conflicts. I took that opportunity to:

  • Bump backports.zstd dependency to >=1.0.0
  • Run the tests again: type checks now pass with a more recent version of mypy

Feel free to ping me if you have any questions!

@cjames23
Member

Because we would still require the backport for older versions of Python, I ran a benchmark on the load time of the modules.

============================================================
RESULTS (in milliseconds)
============================================================

✓ zstandard:
  Min:    0.153 ms
  Max:    364.895 ms
  Avg:    4.171 ms
  Median: 0.274 ms

✓ backports.zstd:
  Min:    0.822 ms
  Max:    426.419 ms
  Avg:    10.156 ms
  Median: 1.849 ms

zstandard is 2.43x faster on average

For a CLI like hatch, we are very conscious of performance, and I am curious what is adding 6 ms to the module load time on average. I don't think this is a blocker, but it is something I want to keep in mind.

Code for generating benchmarks

import time
import sys


def benchmark_import(module_name: str, iterations: int = 100) -> dict:
    """Benchmark import time for a module."""
    times = []

    for _ in range(iterations):
        # Clear module from sys.modules to force re-import
        if module_name in sys.modules:
            del sys.modules[module_name]
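        # note: this only removes the top-level module; submodules and C
        # extensions stay cached in sys.modules, so iterations after the
        # first mostly re-execute __init__.py rather than a true cold import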

        start = time.perf_counter()
        try:
            __import__(module_name)
            end = time.perf_counter()
            times.append((end - start) * 1000)  # Convert to milliseconds
        except ImportError as e:
            return {"error": str(e)}

    return {
        "min": min(times),
        "max": max(times),
        "avg": sum(times) / len(times),
        "median": sorted(times)[len(times) // 2],
    }


def main():
    print("Benchmarking zstandard vs backports.zstd import times\n")
    print(f"Running {100} iterations for each module...\n")

    # Benchmark zstandard
    print("Testing 'zstandard'...")
    zstd_results = benchmark_import("zstandard")

    # Benchmark backports.zstd
    print("Testing 'backports.zstd'...")
    backports_results = benchmark_import("backports.zstd")

    # Print results
    print("\n" + "=" * 60)
    print("RESULTS (in milliseconds)")
    print("=" * 60)

    if "error" in zstd_results:
        print(f"\n❌ zstandard: {zstd_results['error']}")
    else:
        print(f"\n✓ zstandard:")
        print(f"  Min:    {zstd_results['min']:.3f} ms")
        print(f"  Max:    {zstd_results['max']:.3f} ms")
        print(f"  Avg:    {zstd_results['avg']:.3f} ms")
        print(f"  Median: {zstd_results['median']:.3f} ms")

    if "error" in backports_results:
        print(f"\n❌ backports.zstd: {backports_results['error']}")
    else:
        print(f"\n✓ backports.zstd:")
        print(f"  Min:    {backports_results['min']:.3f} ms")
        print(f"  Max:    {backports_results['max']:.3f} ms")
        print(f"  Avg:    {backports_results['avg']:.3f} ms")
        print(f"  Median: {backports_results['median']:.3f} ms")

    # Compare if both succeeded
    if "error" not in zstd_results and "error" not in backports_results:
        faster = "zstandard" if zstd_results["avg"] < backports_results["avg"] else "backports.zstd"
        ratio = max(zstd_results["avg"], backports_results["avg"]) / min(zstd_results["avg"], backports_results["avg"])
        print(f"\n {faster} is {ratio:.2f}x faster on average")

    print("=" * 60)


if __name__ == "__main__":
    main()

@Rogdham
Contributor Author

Rogdham commented Nov 19, 2025

Hello @cjames23 and thank you for your interest in the topic!

I tried your script, and although it also shows a ×2 factor when I run it on my machine, the absolute figures are not the same at all: on average it adds less than 0.4 ms. For reference, my Linux machine is almost 10 years old now.

I also tried running the script several times with different CPython versions, and they all give very similar numbers.

Results

Benchmarking zstandard vs backports.zstd import times

Running 100 iterations for each module...

Testing 'zstandard'...
Testing 'backports.zstd'...

============================================================
RESULTS (in milliseconds)
============================================================

✓ zstandard:
  Min:    0.175 ms
  Max:    16.497 ms
  Avg:    0.373 ms
  Median: 0.189 ms

✓ backports.zstd:
  Min:    0.564 ms
  Max:    6.517 ms
  Avg:    0.723 ms
  Median: 0.602 ms

 zstandard is 1.94x faster on average
============================================================

Even for the 6 ms difference you measured, I am wondering which use cases you have in mind that would be impacted? I don't want to downplay it, but from an outside perspective it seems to be a negligible amount of time.

@cjames23
Member

@Rogdham This was mostly something Ofek and I needed to be aware of and make a decision on. It won't block us from merging this PR, and it might seem odd to care about 6 ms, but these load times and other startup costs eventually start to add up.

@ngoldbaum
Contributor

Perhaps running tuna's import profiler against backports.zstd would indicate that there's an import you could make lazy and optimize this a teeny bit.
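
Typically that means capturing -X importtime output and feeding it to tuna, e.g.:

python -X importtime -c "import backports.zstd" 2> import.log
tuna import.log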

Also is this only for the backport? How does the standard library zstd in 3.14 compare?

@Rogdham
Contributor Author

Rogdham commented Nov 20, 2025

Also is this only for the backport? How does the standard library zstd in 3.14 compare?

Running the script gives similar results (note: I'm not on the same machine as yesterday).

Details

Benchmarking zstandard vs compression.zstd import times

Running 100 iterations for each module...

Testing 'zstandard'...
Testing 'compression.zstd'...

============================================================
RESULTS (in milliseconds)
============================================================

✓ zstandard:
  Min:    0.061 ms
  Max:    5.086 ms
  Avg:    0.119 ms
  Median: 0.066 ms

✓ compression.zstd:
  Min:    0.295 ms
  Max:    1.073 ms
  Avg:    0.344 ms
  Median: 0.318 ms

 zstandard is 2.88x faster on average
============================================================

running tuna's import profiler

I noticed that the results vary a lot from one run to the next, so take the following with a grain of salt!

Here it is for backports.zstd on 3.13. Part (1) is needed because the library lives in the backports namespace package. Part (2) should be relatively easy to lazy-load.

[image: tuna import profile for backports.zstd on Python 3.13]

Here it is for compression.zstd on 3.14:

[image: tuna import profile for compression.zstd on Python 3.14]

@ngoldbaum
Contributor

I remembered this morning: the zstandard import isn't at top-level:

import zstandard

Isn't considering the impact on hatch's import time a little unfair? Or are you worried about the extra cost on the first call to that code path?
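
For reference, the deferred-import shape in question looks roughly like this (names are illustrative, adapted from the benchmark earlier in this thread):

import tarfile

def _extract_zst_archive(archive: str, dest: str) -> None:
    # the import cost is only paid on the code path that actually needs
    # Zstandard, not at CLI startup
    import zstandard

    with open(archive, "rb") as fh:
        dctx = zstandard.ZstdDecompressor()
        with dctx.stream_reader(fh) as reader, tarfile.open(mode="r|", fileobj=reader) as tf:
            tf.extractall(dest, filter="data")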

Contributor

@ofek ofek left a comment

Thanks again for working on this!

@ofek
Contributor

ofek commented Nov 20, 2025

Measuring import timings from within Python doesn't yield results that are reflective of the user's experience. It's best to time how long it takes to import the always-loaded site module and then subtract that from what you're actually measuring. Here's a reproducible example:

❯ docker run --rm -it buildpack-deps bash
root@4131ef92d33c:/# curl -Ls https://github.com/astral-sh/uv/releases/latest/download/uv-x86_64-unknown-linux-musl.tar.gz https://github.com/sharkdp/hyperfine/releases/download/v1.20.0/hyperfine-v1.20.0-x86_64-unknown-linux-musl.tar.gz | tar -xziC /usr/local/bin --strip=1 --wildcards '*/uv' '*/hyperfine'
root@4131ef92d33c:/# uv venv -qp 3.13 test && source test/bin/activate
(test) root@4131ef92d33c:/# uv pip install -q backports.zstd zstandard
(test) root@4131ef92d33c:/# hyperfine --shell none -m 100 --warmup 10 "python -c \"import site\""
Benchmark 1: python -c "import site"
  Time (mean ± σ):      13.3 ms ±   0.7 ms    [User: 9.2 ms, System: 3.7 ms]
  Range (min … max):    12.5 ms …  17.8 ms    190 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

(test) root@4131ef92d33c:/# hyperfine --shell none -m 100 --warmup 10 "python -c \"import backports.zstd\""
Benchmark 1: python -c "import backports.zstd"
  Time (mean ± σ):      30.1 ms ±   1.0 ms    [User: 22.8 ms, System: 6.9 ms]
  Range (min … max):    28.7 ms …  34.3 ms    100 runs

(test) root@4131ef92d33c:/# hyperfine --shell none -m 100 --warmup 10 "python -c \"import zstandard\""
Benchmark 1: python -c "import zstandard"
  Time (mean ± σ):      21.7 ms ±   0.6 ms    [User: 16.5 ms, System: 4.8 ms]
  Range (min … max):    20.5 ms …  23.9 ms    137 runs

Regarding these comments:

I am wondering which usecases you have in mind that would be impacted? I don't want to downsize it, but from an external viewer it seems to be a negligible amount of time.
...
Isn't considering the impact on hatch's import time a little unfair? Or are you worried about the extra cost on the first call to that code path?

As Cary said, this won't be a blocker, especially since there is no way to avoid it on 3.14, but these little timings that one would perceive to be small are exactly the ones that are incredibly important to defer until the moment of necessity.

Users in 2025 expect CLIs to be approximately as fast as the ones they use most frequently, and such tools are primarily written in compiled languages:

  • Ubiquitous: git, grep/rg + other UNIX utilities, curl, jq, docker/podman, etc.
  • Language-specific: cargo, go, esbuild, deno, bun, swc, etc.
  • Python: uv, ruff, ty, pyrefly, py-spy, etc.
  • Niche, fast-growing: mise, typst, buf, etc.

I want users of Hatch to never feel like the experience is hindered by its implementation. As an easy illustration of what happens when one does not do this, check the time it takes to see the version of any of the CLIs (all written in Python) for the 3 major cloud providers:

hyperfine -m 10 --warmup 1 "(aws|az|gcloud) --version"

@Rogdham
Contributor Author

Rogdham commented Nov 21, 2025

Thank you for your feedback and the reference for better timing comparisons.

Here are the results on my machine:

# hyperfine --shell none -m 100 --warmup 10 "python -c \"import site\""
Benchmark 1: python -c "import site"
  Time (mean ± σ):      12.8 ms ±   0.6 ms    [User: 9.9 ms, System: 2.6 ms]
  Range (min … max):    12.0 ms …  16.9 ms    240 runs

# hyperfine --shell none -m 100 --warmup 10 "python -c \"import zstandard\""
Benchmark 1: python -c "import zstandard"
  Time (mean ± σ):      21.6 ms ±   0.6 ms    [User: 17.4 ms, System: 3.8 ms]
  Range (min … max):    20.6 ms …  25.0 ms    138 runs

# hyperfine --shell none -m 100 --warmup 10 "python -c \"import backports.zstd\"" 
Benchmark 1: python -c "import backports.zstd"
  Time (mean ± σ):      29.8 ms ±   0.5 ms    [User: 24.6 ms, System: 4.8 ms]
  Range (min … max):    28.9 ms …  34.0 ms    101 runs

I have made the optimization I mentioned earlier at Rogdham/backports.zstd#56 and here are the results:

# hyperfine --shell none -m 100 --warmup 10 "python -c \"import backports.zstd\"" 
Benchmark 1: python -c "import backports.zstd"
  Time (mean ± σ):      24.4 ms ±   0.4 ms    [User: 20.3 ms, System: 3.8 ms]
  Range (min … max):    23.7 ms …  27.3 ms    120 runs

It closes 65% of the current gap compared to zstandard.
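
For context, the technique is PEP 562's module-level __getattr__, which defers an import until first attribute access; a simplified sketch of the idea (illustrative, not the exact diff in Rogdham/backports.zstd#56):

# in the package's __init__.py (simplified sketch of PEP 562 lazy loading)

def __getattr__(name):
    if name == "tarfile":
        # imported on first access instead of at package import time
        from backports.zstd import tarfile

        return tarfile
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")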

@Rogdham
Contributor Author

Rogdham commented Nov 21, 2025

I remembered this morning: the zstandard import isn't at top-level:

Isn't considering the impact on hatch's import time a little unfair? Or are you worried about the extra cost on the first call to that code path?

This is a good point, but my implementation loads backports.zstd's tarfile module in other code paths (for Python <3.14) than the one using Zstd. This is to make the code easier to read and maintain (no special case for Python <3.12 nor for Zstd).

Previously, the zstandard module was imported only in the use case of decompressing a Python distribution from PBS, which takes 1.5 s, so the import time did not matter.


This makes me think that we should not be comparing the import time of zstandard vs backports.zstd, but rather tarfile + zstandard compared to backports.zstd.tarfile.

Here is the comparison (16% slower for backports.zstd.tarfile):

a1c0eb434bb1:~# hyperfine --shell none -m 100 --warmup 10 "python -c \"import tarfile; import zstandard\""
Benchmark 1: python -c "import tarfile; import zstandard"
  Time (mean ± σ):      26.5 ms ±   0.5 ms    [User: 21.9 ms, System: 4.3 ms]
  Range (min … max):    25.7 ms …  29.7 ms    113 runs

# hyperfine --shell none -m 100 --warmup 10 "python -c \"from backports.zstd import tarfile\""
Benchmark 1: python -c "from backports.zstd import tarfile"
  Time (mean ± σ):      31.6 ms ±   0.7 ms    [User: 26.3 ms, System: 5.0 ms]
  Range (min … max):    30.6 ms …  36.9 ms    100 runs


# with the optimization from https://github.com/Rogdham/backports.zstd/pull/56
# hyperfine --shell none -m 100 --warmup 10 "python -c \"from backports.zstd import tarfile\""
Benchmark 1: python -c "from backports.zstd import tarfile"
  Time (mean ± σ):      30.7 ms ±   0.9 ms    [User: 25.3 ms, System: 5.0 ms]
  Range (min … max):    29.9 ms …  37.0 ms    100 runs

@cjames23 cjames23 merged commit 61e342f into pypa:master Nov 23, 2025
86 of 90 checks passed
@Rogdham
Contributor Author

Rogdham commented Nov 24, 2025

Thank you for the reviews everyone and the merge! ❤️

@Rogdham Rogdham deleted the zstd-pep784 branch November 24, 2025 08:17
@cjames23
Member

Thank you for taking the time to contribute!
