
Conversation

@Rogdham
Contributor

@Rogdham Rogdham commented Aug 3, 2025

Thanks to PEP-784, Zstandard support will be included in Python starting from version 3.14, including in the tarfile module.

So for Python 3.14+, we don't need an external lib. For older versions of Python, I'm using backports.zstd.

This also allows us to remove the Python version check against 3.12+: we can use the filter parameter in all cases (the tarfile module implementation in backports.zstd comes from the CPython 3.14 codebase).

As a result, this allowed for a small refactor that improves clarity and reduces redundancy.
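
For illustration, the resulting pattern looks roughly like this (a minimal sketch, not the exact code from the PR; the archive and destination names are placeholders):

import sys

if sys.version_info >= (3, 14):
    import tarfile  # the stdlib tarfile supports Zstandard natively on 3.14+
else:
    from backports.zstd import tarfile  # same module, backported from the 3.14 codebase

# "r:zst" and the extraction filter work identically in both cases
with tarfile.open("dist.tar.zst", "r:zst") as tf:
    tf.extractall("dest", filter="data")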


I'm expecting the typing tests with mypy to fail if checked against version 3.14+ of Python, because the proper type hints for the tarfile module (to allow the r:zst mode) are not in the latest released version of mypy yet (they are in typeshed and on the mypy master branch, so probably in the next release). Edit: works starting from mypy 1.18.1.

Also, backports.zstd does not currently support PyPy (but should before October), so if you do want to work with PyPy, I suggest waiting a little bit before merging the PR. Edit: now supported starting from backports.zstd version 0.5.0.

Note that I don't know the hatch codebase, but did my best to fit in. Feel free to suggest any changes!

Full disclosure: I'm the author and maintainer of backports.zstd, and the maintainer of pyzstd (whose code was used as the base for the integration into Python). I also helped with PEP-784 and its integration into CPython.


Fixes #1801
Fixes #2007
Fixes #2077

@ofek
Contributor

ofek commented Aug 3, 2025

It will take me some time to review this, but thanks so much for your efforts on standard library inclusion! Out of curiosity, have you run benchmarks compared to the zstandard package? https://pypi.org/project/zstandard/

@Rogdham
Contributor Author

Rogdham commented Aug 3, 2025

Out of curiosity, have you ran benchmarks compared to the zstandard package?

Not yet. This is something I would like to do, but coming up with a meaningful benchmark is kind of hard.

So far I chose instead to invest my time in PEP-784, integration into CPython, and backports.zstd.

I believe that even if the implementation in CPython (or in the backport) is slower by a huge margin, it would not change much for most applications. Of course, this depends on the size of the difference and on the application.

@ofek
Contributor

ofek commented Aug 3, 2025

Please let me know whenever you come up with even a simple benchmark, I'm very interested!

@Rogdham
Contributor Author

Rogdham commented Aug 16, 2025

Please let me know whenever you come up with even a simple benchmark

This benchmark is a work in progress, but I compared oneshot compression/decompression of bytes already in memory (in a variable), for different sizes (1 kB/1 MB/1 GB) and different levels (the default of 3 as well as higher-compression levels: 10 and 17). The data comes from enwik9.

Each operation was run with timeit, using a number value big enough that each measurement runs for about 10 seconds.
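
As a rough sketch of the method (the sample size and number here are illustrative; on Python <3.14 the import would come from backports.zstd instead):

from timeit import timeit

from compression import zstd  # Python 3.14+; use backports.zstd on older versions

with open("enwik9", "rb") as f:
    data = f.read(1_000_000)  # 1 MB sample of enwik9

number = 5_000  # illustrative; chosen so the total run lasts about 10 seconds
total = timeit(lambda: zstd.compress(data, level=3), number=number)
print(f"{total / number * 1000:.3f} ms per compression")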

[image: benchmark table comparing zstandard and backports.zstd across sizes and compression levels]

green means backports.zstd is faster


For the use case of hatch, my guess is that you would decompress archives around 1 MB in size, so switching from zstandard to backports.zstd would make decompression around 3% slower.

However, unless you are doing it numerous times, it does not change much, as the operation takes less than 2 milliseconds.

Edit: updated the benchmark results after finding an issue in the timeit invocation.

@ofek
Contributor

ofek commented Aug 16, 2025

Thanks! For extra context, the main/only use case for this currently in Hatch is decompressing Python distributions from this project: https://github.com/astral-sh/python-build-standalone

As an example, cpython-3.11.13+20250814-x86_64_v4-unknown-linux-musl-lto-full.tar.zst is 62.5 MB

@Rogdham
Contributor Author

Rogdham commented Aug 16, 2025

Using your .tar.zst file as an example, I measured the following timings (running each implementation 50 times):

| Implementation | Average time per run |
| --- | --- |
| zstandard | 1.5027 s |
| backports.zstd | 1.5099 s (0.5% slower) |
Benchmark code

import shutil
import tarfile as orig_tarfile
from pathlib import Path
from timeit import timeit

import zstandard
from backports.zstd import tarfile as new_tarfile

archive = "cpython-3.11.13+20250814-x86_64_v4-unknown-linux-musl-lto-full.tar.zst"
directory = Path("/tmp/ramdisk").absolute()  # mounted as tmpfs


def clean():
    shutil.rmtree(directory, ignore_errors=True)
    directory.mkdir()


def run_zstandard():
    with open(archive, "rb") as ifh:
        dctx = zstandard.ZstdDecompressor()
        with (
            dctx.stream_reader(ifh) as reader,
            orig_tarfile.open(mode="r|", fileobj=reader) as tf,
        ):
            tf.extractall(directory, filter="data")


def run_backportszstd():
    with new_tarfile.open(archive, "r:zst") as tf:
        tf.extractall(directory, filter="data")


number = 50

timing = timeit(
    stmt="run()",
    setup="clean()",
    number=number,
    globals={"clean": clean, "run": run_zstandard},
)
print("zstandard", timing / number)

timing = timeit(
    stmt="run()",
    setup="clean()",
    number=number,
    globals={"clean": clean, "run": run_backportszstd},
)
print("backports.zstd", timing / number)

@ofek
Contributor

ofek commented Aug 16, 2025

That's impressive, and I'm sure there are optimizations yet to be made. Thank you very much!

I will try to review this soon; I've been very busy. Please know that every review I have yet to do and every email I have yet to respond to is a permanent weight on my psyche 😅

@Rogdham
Contributor Author

Rogdham commented Oct 10, 2025

I have rebased to fix the conflicts. I took that opportunity to:

  • Bump backports.zstd dependency to >=1.0.0
  • Run the tests again: type checks now pass with a more recent version of mypy

Feel free to ping me if you have any questions!

@cjames23
Member

Because we would still require the backport for older versions of Python, I ran a benchmark on the load time of the modules.

============================================================
RESULTS (in milliseconds)
============================================================

✓ zstandard:
  Min:    0.153 ms
  Max:    364.895 ms
  Avg:    4.171 ms
  Median: 0.274 ms

✓ backports.zstd:
  Min:    0.822 ms
  Max:    426.419 ms
  Avg:    10.156 ms
  Median: 1.849 ms

zstandard is 2.43x faster on average

For a CLI like hatch, we are very conscious of performance, and I am curious what is adding 6 ms to the module load time on average. I don't think this is a blocker, but it is something I want to keep in mind.

Code for generating benchmarks

import time
import sys


def benchmark_import(module_name: str, iterations: int = 100) -> dict:
    """Benchmark import time for a module."""
    times = []

    for _ in range(iterations):
        # Clear module from sys.modules to force re-import
        if module_name in sys.modules:
            del sys.modules[module_name]
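        # note: this only removes the top-level module; submodules and C
        # extensions stay cached in sys.modules, so iterations after the
        # first mostly re-execute __init__.py rather than a true cold import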

        start = time.perf_counter()
        try:
            __import__(module_name)
            end = time.perf_counter()
            times.append((end - start) * 1000)  # Convert to milliseconds
        except ImportError as e:
            return {"error": str(e)}

    return {
        "min": min(times),
        "max": max(times),
        "avg": sum(times) / len(times),
        "median": sorted(times)[len(times) // 2],
    }


def main():
    print("Benchmarking zstandard vs backports.zstd import times\n")
    print(f"Running {100} iterations for each module...\n")

    # Benchmark zstandard
    print("Testing 'zstandard'...")
    zstd_results = benchmark_import("zstandard")

    # Benchmark backports.zstd
    print("Testing 'backports.zstd'...")
    backports_results = benchmark_import("backports.zstd")

    # Print results
    print("\n" + "=" * 60)
    print("RESULTS (in milliseconds)")
    print("=" * 60)

    if "error" in zstd_results:
        print(f"\n❌ zstandard: {zstd_results['error']}")
    else:
        print(f"\n✓ zstandard:")
        print(f"  Min:    {zstd_results['min']:.3f} ms")
        print(f"  Max:    {zstd_results['max']:.3f} ms")
        print(f"  Avg:    {zstd_results['avg']:.3f} ms")
        print(f"  Median: {zstd_results['median']:.3f} ms")

    if "error" in backports_results:
        print(f"\n❌ backports.zstd: {backports_results['error']}")
    else:
        print(f"\n✓ backports.zstd:")
        print(f"  Min:    {backports_results['min']:.3f} ms")
        print(f"  Max:    {backports_results['max']:.3f} ms")
        print(f"  Avg:    {backports_results['avg']:.3f} ms")
        print(f"  Median: {backports_results['median']:.3f} ms")

    # Compare if both succeeded
    if "error" not in zstd_results and "error" not in backports_results:
        faster = "zstandard" if zstd_results["avg"] < backports_results["avg"] else "backports.zstd"
        ratio = max(zstd_results["avg"], backports_results["avg"]) / min(zstd_results["avg"], backports_results["avg"])
        print(f"\n {faster} is {ratio:.2f}x faster on average")

    print("=" * 60)


if __name__ == "__main__":
    main()

@Rogdham
Contributor Author

Rogdham commented Nov 19, 2025

Hello @cjames23 and thank you for your interest in the topic!

I tried your script, and although it also shows a ×2 factor when I run it on my machine, the absolute figures are not the same at all: on average it adds less than 0.4 ms. For reference, my Linux machine is almost 10 years old now.

I also tried running the script several times with different CPython versions, and they all give very similar numbers.

Results

Benchmarking zstandard vs backports.zstd import times

Running 100 iterations for each module...

Testing 'zstandard'...
Testing 'backports.zstd'...

============================================================
RESULTS (in milliseconds)
============================================================

✓ zstandard:
  Min:    0.175 ms
  Max:    16.497 ms
  Avg:    0.373 ms
  Median: 0.189 ms

✓ backports.zstd:
  Min:    0.564 ms
  Max:    6.517 ms
  Avg:    0.723 ms
  Median: 0.602 ms

 zstandard is 1.94x faster on average
============================================================

Even for the 6 ms difference you measured, I am wondering which use cases you have in mind that would be impacted? I don't want to downplay it, but from an outside perspective it seems to be a negligible amount of time.

@cjames23
Member

@Rogdham This was mostly something Ofek and I needed to be aware of and make a decision on. It won't block us from merging this PR, and it might seem odd to care about 6 ms, but these load times and other startup costs eventually start to add up.

@ngoldbaum
Contributor

Perhaps running tuna's import profiler against backports.zstd would indicate that there's an import you could make lazy and optimize this a teeny bit.
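
Typically that means capturing -X importtime output and feeding it to tuna, e.g.:

python -X importtime -c "import backports.zstd" 2> import.log
tuna import.log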

Also is this only for the backport? How does the standard library zstd in 3.14 compare?

@Rogdham
Contributor Author

Rogdham commented Nov 20, 2025

Also is this only for the backport? How does the standard library zstd in 3.14 compare?

Running the script gives similar results (note: I'm not on the same machine as yesterday).

Details

Benchmarking zstandard vs compression.zstd import times

Running 100 iterations for each module...

Testing 'zstandard'...
Testing 'compression.zstd'...

============================================================
RESULTS (in milliseconds)
============================================================

✓ zstandard:
  Min:    0.061 ms
  Max:    5.086 ms
  Avg:    0.119 ms
  Median: 0.066 ms

✓ compression.zstd:
  Min:    0.295 ms
  Max:    1.073 ms
  Avg:    0.344 ms
  Median: 0.318 ms

 zstandard is 2.88x faster on average
============================================================

running tuna's import profiler

I noticed that the results vary a lot from one run to the next, so take the following with a grain of salt!

Here it is for backports.zstd on 3.13. Part (1) is needed because the library lives in the backports namespace package. Part (2) should be relatively easy to lazy-load.

[image: tuna import profile for backports.zstd on Python 3.13]

Here it is for compression.zstd on 3.14:

[image: tuna import profile for compression.zstd on Python 3.14]

@ngoldbaum
Contributor

I remembered this morning: the zstandard import isn't at top-level:

import zstandard

Isn't considering the impact on hatch's import time a little unfair? Or are you worried about the extra cost on the first call to that code path?
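
For reference, the deferred-import shape in question looks roughly like this (names are illustrative, adapted from the benchmark earlier in this thread):

import tarfile

def _extract_zst_archive(archive: str, dest: str) -> None:
    # the import cost is only paid on the code path that actually needs
    # Zstandard, not at CLI startup
    import zstandard

    with open(archive, "rb") as fh:
        dctx = zstandard.ZstdDecompressor()
        with dctx.stream_reader(fh) as reader, tarfile.open(mode="r|", fileobj=reader) as tf:
            tf.extractall(dest, filter="data")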

Contributor

@ofek ofek left a comment

Thanks again for working on this!

@ofek
Contributor

ofek commented Nov 20, 2025

Measuring import timings from within Python doesn't yield results that are reflective of the user's experience. It's best to time how long it takes to import the always-loaded site module and then subtract that from what you're actually measuring. Here's a reproducible example:

❯ docker run --rm -it buildpack-deps bash
root@4131ef92d33c:/# curl -Ls https://github.com/astral-sh/uv/releases/latest/download/uv-x86_64-unknown-linux-musl.tar.gz https://github.com/sharkdp/hyperfine/releases/download/v1.20.0/hyperfine-v1.20.0-x86_64-unknown-linux-musl.tar.gz | tar -xziC /usr/local/bin --strip=1 --wildcards '*/uv' '*/hyperfine'
root@4131ef92d33c:/# uv venv -qp 3.13 test && source test/bin/activate
(test) root@4131ef92d33c:/# uv pip install -q backports.zstd zstandard
(test) root@4131ef92d33c:/# hyperfine --shell none -m 100 --warmup 10 "python -c \"import site\""
Benchmark 1: python -c "import site"
  Time (mean ± σ):      13.3 ms ±   0.7 ms    [User: 9.2 ms, System: 3.7 ms]
  Range (min … max):    12.5 ms …  17.8 ms    190 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

(test) root@4131ef92d33c:/# hyperfine --shell none -m 100 --warmup 10 "python -c \"import backports.zstd\""
Benchmark 1: python -c "import backports.zstd"
  Time (mean ± σ):      30.1 ms ±   1.0 ms    [User: 22.8 ms, System: 6.9 ms]
  Range (min … max):    28.7 ms …  34.3 ms    100 runs

(test) root@4131ef92d33c:/# hyperfine --shell none -m 100 --warmup 10 "python -c \"import zstandard\""
Benchmark 1: python -c "import zstandard"
  Time (mean ± σ):      21.7 ms ±   0.6 ms    [User: 16.5 ms, System: 4.8 ms]
  Range (min … max):    20.5 ms …  23.9 ms    137 runs

Regarding these comments:

I am wondering which usecases you have in mind that would be impacted? I don't want to downsize it, but from an external viewer it seems to be a negligible amount of time.
...
Isn't considering the impact on hatch's import time a little unfair? Or are you worried about the extra cost on the first call to that code path?

As Cary said, this won't be a blocker, especially since there is no way to avoid it on 3.14, but these little timings that one would perceive to be small are exactly the ones that are incredibly important to defer until the moment of necessity.

Users in 2025 expect CLIs to be approximately as fast as the ones they use most frequently, and such tools are primarily written in compiled languages:

  • Ubiquitous: git, grep/rg + other UNIX utilities, curl, jq, docker/podman, etc.
  • Language-specific: cargo, go, esbuild, deno, bun, swc, etc.
  • Python: uv, ruff, ty, pyrefly, py-spy, etc.
  • Niche, fast-growing: mise, typst, buf, etc.

I want users of Hatch to never feel like the experience is hindered by its implementation. As an easy illustration of what happens when one does not do this, check the time it takes to see the version of any of the CLIs (all written in Python) for the 3 major cloud providers:

hyperfine -m 10 --warmup 1 "(aws|az|gcloud) --version"

@Rogdham
Contributor Author

Rogdham commented Nov 21, 2025

Thank you for your feedback and the reference for better timing comparisons.

Here are the results on my machine:

# hyperfine --shell none -m 100 --warmup 10 "python -c \"import site\""
Benchmark 1: python -c "import site"
  Time (mean ± σ):      12.8 ms ±   0.6 ms    [User: 9.9 ms, System: 2.6 ms]
  Range (min … max):    12.0 ms …  16.9 ms    240 runs

# hyperfine --shell none -m 100 --warmup 10 "python -c \"import zstandard\""
Benchmark 1: python -c "import zstandard"
  Time (mean ± σ):      21.6 ms ±   0.6 ms    [User: 17.4 ms, System: 3.8 ms]
  Range (min … max):    20.6 ms …  25.0 ms    138 runs

# hyperfine --shell none -m 100 --warmup 10 "python -c \"import backports.zstd\"" 
Benchmark 1: python -c "import backports.zstd"
  Time (mean ± σ):      29.8 ms ±   0.5 ms    [User: 24.6 ms, System: 4.8 ms]
  Range (min … max):    28.9 ms …  34.0 ms    101 runs

I have made the optimization I mentioned earlier at Rogdham/backports.zstd#56 and here are the results:

# hyperfine --shell none -m 100 --warmup 10 "python -c \"import backports.zstd\"" 
Benchmark 1: python -c "import backports.zstd"
  Time (mean ± σ):      24.4 ms ±   0.4 ms    [User: 20.3 ms, System: 3.8 ms]
  Range (min … max):    23.7 ms …  27.3 ms    120 runs

It closes 65% of the current gap compared to zstandard.
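
For context, the technique is PEP 562's module-level __getattr__, which defers an import until first attribute access; a simplified sketch of the idea (illustrative, not the exact diff in Rogdham/backports.zstd#56):

# in the package's __init__.py (simplified sketch of PEP 562 lazy loading)

def __getattr__(name):
    if name == "tarfile":
        # imported on first access instead of at package import time
        from backports.zstd import tarfile

        return tarfile
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")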

@Rogdham
Contributor Author

Rogdham commented Nov 21, 2025

I remembered this morning: the zstandard import isn't at top-level:

Isn't considering the impact on hatch's import time a little unfair? Or are you worried about the extra cost on the first call to that code path?

This is a good point, but my implementation loads backports.zstd's tarfile module in other code paths (for Python <3.14) than the one using Zstd. This is to make the code easier to read and maintain (no special case for Python <3.12 nor for Zstd).

Previously, the zstandard module was imported only in the use case of decompressing a Python distribution from PBS, which takes 1.5 s, so the import time did not matter.


This makes me think that we should not be comparing the import time of zstandard vs backports.zstd, but rather tarfile + zstandard compared to backports.zstd.tarfile.

Here is the comparison (16% slower for backports.zstd.tarfile):

a1c0eb434bb1:~# hyperfine --shell none -m 100 --warmup 10 "python -c \"import tarfile; import zstandard\""
Benchmark 1: python -c "import tarfile; import zstandard"
  Time (mean ± σ):      26.5 ms ±   0.5 ms    [User: 21.9 ms, System: 4.3 ms]
  Range (min … max):    25.7 ms …  29.7 ms    113 runs

# hyperfine --shell none -m 100 --warmup 10 "python -c \"from backports.zstd import tarfile\""
Benchmark 1: python -c "from backports.zstd import tarfile"
  Time (mean ± σ):      31.6 ms ±   0.7 ms    [User: 26.3 ms, System: 5.0 ms]
  Range (min … max):    30.6 ms …  36.9 ms    100 runs


# with the optimization from https://github.com/Rogdham/backports.zstd/pull/56
# hyperfine --shell none -m 100 --warmup 10 "python -c \"from backports.zstd import tarfile\""
Benchmark 1: python -c "from backports.zstd import tarfile"
  Time (mean ± σ):      30.7 ms ±   0.9 ms    [User: 25.3 ms, System: 5.0 ms]
  Range (min … max):    29.9 ms …  37.0 ms    100 runs

@cjames23 cjames23 merged commit 61e342f into pypa:master Nov 23, 2025
86 of 90 checks passed
@Rogdham
Contributor Author

Rogdham commented Nov 24, 2025

Thank you for the reviews everyone and the merge! ❤️

@Rogdham Rogdham deleted the zstd-pep784 branch November 24, 2025 08:17
@cjames23
Member

Thank you for taking the time to contribute!
