Thanks to visit codestin.com
Credit goes to github.com

Skip to content

gh-97912: Avoid quadratic behavior when adding LOAD_FAST_CHECK #97952

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Oct 20, 2022

Conversation

sweeneyde
Copy link
Member

@sweeneyde sweeneyde commented Oct 6, 2022

@sweeneyde
Copy link
Member Author

A benchmark:

from time import perf_counter
from pathlib import Path
import sympy.integrals.rubi.rules.sine as sine
text = Path(sine.__file__).read_text("utf-8")
t0 = perf_counter()
for _ in range(10):
    compile(text, "sine", "exec")
t1 = perf_counter()
print((t1 - t0) / 10)

On my machine, this goes from about 0.96 seconds before this PR to 0.32 seconds after, both on windows without PGO.

Benchmarking 3.11 (no PGO) in the same way, I also get roughly 0.32 seconds.

@ambv
Copy link
Contributor

ambv commented Oct 7, 2022

The sympy file in the benchmark can be downloaded from
https://raw.githubusercontent.com/sympy/sympy/master/sympy/integrals/rubi/rules/sine.py

I can confirm the benchmark goes much faster compared to main. In fact, my benchmark results are more dramatic. That's a PGO build on macOS arm64.

Main: 1.9035797520901543
With the patch: 0.4364458600000944

Benchmark at 63 locals looks like this:

Main: 0.00048459595802705736
With the patch: 0.00043200499995145946

Benchmark at 64 locals looks like this:

Main: 0.0004902609169948846
With the patch: 0.00043480941600864754

The patch is still faster in those cases.

@sweeneyde
Copy link
Member Author

sweeneyde commented Oct 7, 2022

I could measure no compilation performance difference from adding fast_scan_many_locals, but it does restore some of the functionality. I ran this:

import sympy.integrals.rubi.rules.sine as sine
from collections import Counter
import dis
counts = Counter(i.opname for i in dis.get_instructions(sine.sine))
print(counts)

Before this PR:

Counter({'EXTENDED_ARG': 39840, 'LOAD_GLOBAL': 34145, 'CALL': 26331, 'LOAD_CONST': 19502, 'BINARY_OP': 14087, 'LOAD_FAST': 13861, 'STORE_FAST': 2836, 'LIST_APPEND': 1234, 'IMPORT_FROM': 368, 'RESUME': 1, 'IMPORT_NAME': 1, 'POP_TOP': 1, 'BUILD_LIST': 1, 'RETURN_VALUE': 1})

Intermediate: without fast_scan_many_locals

Counter({'EXTENDED_ARG': 39840, 'LOAD_GLOBAL': 34145, 'CALL': 26331, 'LOAD_CONST': 19502, 'BINARY_OP': 14087, 'LOAD_FAST': 8756, 'LOAD_FAST_CHECK': 5105, 'STORE_FAST': 2836, 'LIST_APPEND': 1234, 'IMPORT_FROM': 368, 'RESUME': 1, 'IMPORT_NAME': 1, 'POP_TOP': 1, 'BUILD_LIST': 1, 'RETURN_VALUE': 1})

After this PR, including fast_scan_many_locals: the same counts as before this PR. Since the sine() function has very few basicblocks (one?), fast_scan_many_locals was effective.

@sweeneyde sweeneyde added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Oct 20, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @sweeneyde for commit b07183c 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Oct 20, 2022
@sweeneyde
Copy link
Member Author

@iritkatriel or @markshannon, are there any objections to adding basicblock->b_unsafe_locals_mask? If not, I think this is ready.

@iritkatriel
Copy link
Member

@iritkatriel or @markshannon, are there any objections to adding basicblock->b_unsafe_locals_mask? If not, I think this is ready.

No objections. I wondered if we can have this called from optimize_cfg (so that it can be accessed from unit tests). It would probably just require passing in the nparams and nlocals rather than the compiler (c). We can make that change later though.

@sweeneyde sweeneyde merged commit 39bc70e into python:main Oct 20, 2022
@sweeneyde sweeneyde deleted the load_fast_check_faster2 branch October 20, 2022 22:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants