bpo-47009: Let PRECALL_NO_KW_LIST_APPEND do its own POP_TOP #32239

Merged
merged 3 commits into from
Apr 5, 2022

sweeneyde commented Apr 1, 2022

Most code doesn't use the return value of L.append(x) (as in y = L.append(x)), so PRECALL_NO_KW_LIST_APPEND is almost always followed by POP_TOP. We can verify this at specialization time.

This saves a Py_INCREF(Py_None), a SET_TOP(Py_None), and POP_TOP's Py_DECREF(POP()); DISPATCH();.
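To make the saving concrete, here is a toy stack machine, not CPython's ceval.c. The instruction names APPEND_CALL and APPEND_AND_POP and the helpers specialize() and run() are hypothetical stand-ins invented for this sketch. It shows the two ideas from the description: the fused form is chosen only when the next instruction is verified to be POP_TOP, and the fused handler never pushes the None result that would then be popped.

```python
# Toy model only: instruction names and helpers are illustrative assumptions,
# not CPython's real opcodes or interpreter loop.

def specialize(code):
    """Rewrite APPEND_CALL to APPEND_AND_POP only when POP_TOP follows it."""
    out = list(code)
    for i in range(len(out) - 1):
        if out[i] == "APPEND_CALL" and out[i + 1] == "POP_TOP":
            out[i] = "APPEND_AND_POP"
    return out


def run(code, target, value):
    """Execute the toy code and return how many instructions were dispatched."""
    stack = []
    pc = 0
    dispatched = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        dispatched += 1
        if op == "APPEND_CALL":
            target.append(value)
            stack.append(None)  # generic path: push the None result
        elif op == "POP_TOP":
            stack.pop()  # ...only to discard it right away
        elif op == "APPEND_AND_POP":
            target.append(value)  # fused path: nothing is pushed
            pc += 1  # skip the POP_TOP that specialize() verified
        else:
            raise ValueError(f"unknown toy instruction: {op}")
    return dispatched


generic = ["APPEND_CALL", "POP_TOP"] * 3
fused = specialize(generic)

generic_list, fused_list = [], []
print(run(generic, generic_list, 1.0), run(fused, fused_list, 1.0))
```

Both runs build the same list; the fused version dispatches half as many toy instructions because each POP_TOP is skipped rather than executed.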

Some microbenchmarks:

from pyperf import Runner, perf_counter

def bench_append(loops, length):
    # Reuse one list: clear() keeps the object, so few lists are allocated.
    src = list(map(float, range(length)))
    arr = []
    t0 = perf_counter()

    for i in range(loops):
        arr.clear()
        for x in src:
            arr.append(x)

    return perf_counter() - t0

def bench_append_less_gc(loops, length):
    # Build a fresh list per loop and keep it alive in out, so no list
    # is deallocated inside the timed region.
    src = list(map(float, range(length)))
    out = [None] * loops
    t0 = perf_counter()

    for i in range(loops):
        arr = []
        for x in src:
            arr.append(x)
        out[i] = arr

    return perf_counter() - t0

runner = Runner()
for n in [100, 1_000, 10_000, 100_000]:
    # inner_loops=n makes pyperf report the time per single append.
    runner.bench_time_func(f"append {n}", bench_append, n, inner_loops=n)
    runner.bench_time_func(f"append-less-gc {n}", bench_append_less_gc, n, inner_loops=n)

Results with GCC, built with --enable-optimizations and --with-lto:

- append 100000: 14.9 ns +- 0.3 ns -> 13.3 ns +- 0.4 ns: 1.12x faster
- append 10000: 15.1 ns +- 0.3 ns -> 13.6 ns +- 0.5 ns: 1.11x faster
- append-less-gc 100000: 16.4 ns +- 0.5 ns -> 14.9 ns +- 0.4 ns: 1.10x faster
- append 1000: 15.6 ns +- 0.3 ns -> 14.2 ns +- 0.3 ns: 1.09x faster
- append 100: 18.9 ns +- 0.6 ns -> 17.3 ns +- 0.6 ns: 1.09x faster
- append-less-gc 100: 27.4 ns +- 1.1 ns -> 25.2 ns +- 1.2 ns: 1.09x faster
- append-less-gc 10000: 19.2 ns +- 0.3 ns -> 17.8 ns +- 0.2 ns: 1.08x faster
- append-less-gc 1000: 22.0 ns +- 0.6 ns -> 20.8 ns +- 0.3 ns: 1.06x faster

Geometric mean: 1.09x faster
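As a sanity check on the summary figure, the geometric mean can be recomputed from the before/after timings listed above. This is a minimal sketch using the rounded means as printed, so it will not necessarily match pyperf's internal calculation digit for digit; the variable names are chosen here for illustration.

```python
from statistics import geometric_mean

# (before_ns, after_ns) pairs copied from the results list above.
timings = [
    (14.9, 13.3),  # append 100000
    (15.1, 13.6),  # append 10000
    (16.4, 14.9),  # append-less-gc 100000
    (15.6, 14.2),  # append 1000
    (18.9, 17.3),  # append 100
    (27.4, 25.2),  # append-less-gc 100
    (19.2, 17.8),  # append-less-gc 10000
    (22.0, 20.8),  # append-less-gc 1000
]

# Speedup of each benchmark is the old time divided by the new time.
speedups = [before / after for before, after in timings]

mean_speedup = geometric_mean(speedups)
print(f"Geometric mean: {mean_speedup:.2f}x faster")
```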

https://bugs.python.org/issue47009

@sweeneyde sweeneyde requested a review from brandtbucher April 5, 2022 07:19
@markshannon

Looks good. I'm a bit wary of specialized superinstructions, but this seems solid.
I can imagine cases where list.append() wouldn't be followed by a POP_TOP, but they are contrived and highly unlikely.
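The common and the contrived case can be told apart with the standard dis module. The sketch below is only an illustration: opname_after_call(), statement_form() and expression_form() are helper names made up here, and the exact call opcode names vary between CPython versions (PRECALL exists only in 3.11), so the helper matches any opname starting with "CALL".

```python
import dis


def opname_after_call(func):
    """Return the name of the instruction that follows the last call in func."""
    names = [ins.opname for ins in dis.get_instructions(func)]
    # Call opcodes are named CALL, CALL_METHOD, ... depending on the version.
    call_index = max(i for i, name in enumerate(names) if name.startswith("CALL"))
    return names[call_index + 1]


def statement_form(lst, x):
    lst.append(x)  # result discarded: the usual case


def expression_form(lst, x):
    y = lst.append(x)  # result kept: the contrived case
    return y


print(opname_after_call(statement_form))
print(opname_after_call(expression_form))
```

In the statement form the call's result is discarded by a POP_TOP; in the expression form the instruction after the call is a store instead, so a specializer that checks for POP_TOP would leave that call site alone.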

@markshannon markshannon merged commit 6c6e040 into python:main Apr 5, 2022
tiran commented Apr 5, 2022

The assert is failing on the s390x Fedora buildbot: https://buildbot.python.org/all/#/builders/232/builds/524

_bootstrap_python: Python/ceval.c:5045: _PyEval_EvalFrameDefault: Assertion `next_instr[-1] == POP_TOP' failed.
make: *** [Makefile:1204: Python/frozen_modules/io.h] Aborted (core dumped)

@markshannon

Strange. The bytecode is exactly the same on all platforms.
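The thread does not diagnose the failure. One plausible explanation, offered here as an assumption rather than a confirmed cause: s390x is big-endian, and the failing assertion compares a whole 16-bit code unit against POP_TOP instead of extracting its opcode field. If a code unit is stored as two bytes, opcode first and oparg second, the same bytes read as a native 16-bit integer give different values on little-endian and big-endian machines. The sketch below simulates both reads with struct.

```python
import dis
import struct

POP_TOP = dis.opmap["POP_TOP"]

# Assumption: a code unit is two bytes in memory, opcode first, then oparg.
raw = struct.pack("BB", POP_TOP, 0)

# Reading those two bytes as one 16-bit integer depends on byte order.
as_little_endian = struct.unpack("<H", raw)[0]
as_big_endian = struct.unpack(">H", raw)[0]

print(as_little_endian == POP_TOP)  # comparison a little-endian host would make
print(as_big_endian == POP_TOP)  # comparison a big-endian host would make
```

Under that assumption the bytecode really is identical on every platform, while a direct integer comparison only succeeds where the opcode lands in the low byte.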

@sweeneyde sweeneyde deleted the listappend_pop branch April 5, 2022 22:00