gh-100726: optimize construction of range object for medium sized integers (version 6) #100810

Merged
merged 24 commits into from
Jan 21, 2023

Contributor

@eendebakpt eendebakpt commented Jan 6, 2023

This is a simplified version of #100726.

Benchmark against main:

range(0, 5): Mean +- std dev: [main_range] 107 ns +- 0 ns -> [pr_v6_2] 65.1 ns +- 1.1 ns: 1.64x faster
list(range(0, 5)): Mean +- std dev: [main_range] 237 ns +- 2 ns -> [pr_v6_2] 194 ns +- 2 ns: 1.22x faster
range(-2, 3): Mean +- std dev: [main_range] 107 ns +- 1 ns -> [pr_v6_2] 65.5 ns +- 0.9 ns: 1.64x faster
list(range(-2, 3)): Mean +- std dev: [main_range] 239 ns +- 3 ns -> [pr_v6_2] 195 ns +- 2 ns: 1.23x faster
range(2, 60): Mean +- std dev: [main_range] 108 ns +- 1 ns -> [pr_v6_2] 65.5 ns +- 0.9 ns: 1.65x faster
list(range(2, 60)): Mean +- std dev: [main_range] 518 ns +- 14 ns -> [pr_v6_2] 477 ns +- 16 ns: 1.09x faster
range(2, 60, 2): Mean +- std dev: [main_range] 111 ns +- 2 ns -> [pr_v6_2] 70.6 ns +- 1.0 ns: 1.57x faster
list(range(2, 60, 2)): Mean +- std dev: [main_range] 372 ns +- 4 ns -> [pr_v6_2] 328 ns +- 5 ns: 1.13x faster
range(4398046511104, 4398046511109, 2): Mean +- std dev: [main_range] 133 ns +- 1 ns -> [pr_v6_2] 76.0 ns +- 0.6 ns: 1.75x faster
list(range(4398046511104, 4398046511109, 2)): Mean +- std dev: [main_range] 301 ns +- 3 ns -> [pr_v6_2] 242 ns +- 1 ns: 1.24x faster
range(73786976294838206464, 73786976294838206469, 2): Mean +- std dev: [main_range] 134 ns +- 2 ns -> [pr_v6_2] 145 ns +- 3 ns: 1.08x slower
range(-1, -1, -1): Mean +- std dev: [main_range] 72.3 ns +- 2.2 ns -> [pr_v6_2] 67.4 ns +- 1.2 ns: 1.07x faster
range(-1, -1, 2**66): Mean +- std dev: [main_range] 229 ns +- 4 ns -> [pr_v6_2] 248 ns +- 3 ns: 1.08x slower
for loop with range(10): Mean +- std dev: [main_range] 860 ns +- 29 ns -> [pr_v6_2] 826 ns +- 19 ns: 1.04x faster
for loop with range(20): Mean +- std dev: [main_range] 860 ns +- 16 ns -> [pr_v6_2] 827 ns +- 26 ns: 1.04x faster

Benchmark hidden because not significant (1): list(range(73786976294838206464, 73786976294838206469, 2))

Geometric mean: 1.23x faster
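
The thread does not say which tool produced these timings. As a rough sketch only, assuming the standard-library `timeit` module is an acceptable substitute, a few of the statements above could be timed as follows. The helper name `time_statements` is hypothetical, and absolute numbers will differ by machine and build.

```python
import timeit

# A few statements copied from the benchmark list above.
STATEMENTS = [
    "range(0, 5)",
    "list(range(0, 5))",
    "range(2, 60, 2)",
    "range(4398046511104, 4398046511109, 2)",
]


def time_statements(statements, number=100_000, repeat=5):
    """Return {statement: best time per call in nanoseconds}.

    Hypothetical helper for illustration; not the script used in the PR.
    """
    results = {}
    for stmt in statements:
        # The statement string doubles as the label, so the reported
        # name cannot drift from the code that is actually measured.
        best = min(timeit.repeat(stmt, number=number, repeat=repeat))
        results[stmt] = best / number * 1e9
    return results


if __name__ == "__main__":
    for stmt, ns in time_statements(STATEMENTS).items():
        print(f"{stmt}: {ns:.1f} ns")
```

Running the same script on a build of main and on a build of the PR branch gives two sets of numbers that can be compared by hand.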

@eendebakpt eendebakpt changed the title from "gh-100726: optimize construction of range object for medium sized integers" to "gh-100726: optimize construction of range object for medium sized integers (version 6)" Jan 6, 2023
@mdickinson mdickinson self-requested a review January 7, 2023 15:58
Member

@mdickinson mdickinson left a comment

The idea looks good to me; we're missing some error handling in the call to compute_range_length_long, and I have a few style nitpicks.

int overflow = 0;

long long_start = PyLong_AsLongAndOverflow(start, &overflow);
if (overflow || (long_start==-1 && PyErr_Occurred()) ) {
Member

Hmm, there's a slightly ugly problem here. If PyErr_Occurred() is true then we never clear the exception, so we leave this function with an exception set. That's okay, but then we should be checking PyErr_Occurred() in the calling function, too. And yes, I think it's true that given that start is a PyLong_Object, the current implementation of PyLong_AsLongAndOverflow can't possibly raise, so this seems like a non-issue. Except that it's not safe to rely on some future version of PyLong_AsLongAndOverflow not raising.

So we either need to check for PyErr_Occurred() in the caller, or go back to the idea of a new PyLong API function that's guaranteed not to raise. I think the extra check should be fairly cheap.

Contributor Author

I think I get the point. The current check (long_start==-1 && PyErr_Occurred()) is not good because it never fires, and if it did fire because the implementation of PyLong_AsLongAndOverflow changed, we would not be handling it correctly. Right?

I refactored so the check is like

...
    long long_start = PyLong_AsLongAndOverflow(start, &overflow);
    if (overflow)
        return -1;
    if (long_start==-1 && PyErr_Occurred()) {
        PyErr_Clear();
        return -2;
    }
...

For an overflow the documentation of PyLong_AsLongAndOverflow states there is no exception. For the other errors we check the value of long_start (which is fast) in combination with PyErr_Occurred(). If required, we clear the error.

We could also perform the check and clear outside compute_range_length_long, but in this way all the logic for the fast path is contained inside a single method.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@eendebakpt
Contributor Author

Updated benchmark against main:

range(0, 5): Mean +- std dev: [main_range] 107 ns +- 0 ns -> [pr_v6_rework] 63.4 ns +- 0.6 ns: 1.69x faster
list(range(0, 5)): Mean +- std dev: [main_range] 237 ns +- 2 ns -> [pr_v6_rework] 193 ns +- 2 ns: 1.22x faster
range(-2, 3): Mean +- std dev: [main_range] 107 ns +- 1 ns -> [pr_v6_rework] 64.0 ns +- 1.6 ns: 1.68x faster
list(range(-2, 3)): Mean +- std dev: [main_range] 239 ns +- 3 ns -> [pr_v6_rework] 195 ns +- 1 ns: 1.23x faster
range(2, 60): Mean +- std dev: [main_range] 108 ns +- 1 ns -> [pr_v6_rework] 64.2 ns +- 0.5 ns: 1.69x faster
list(range(2, 60)): Mean +- std dev: [main_range] 518 ns +- 14 ns -> [pr_v6_rework] 474 ns +- 14 ns: 1.09x faster
range(2, 60, 2): Mean +- std dev: [main_range] 111 ns +- 2 ns -> [pr_v6_rework] 70.3 ns +- 1.1 ns: 1.58x faster
list(range(2, 60, 2)): Mean +- std dev: [main_range] 372 ns +- 4 ns -> [pr_v6_rework] 329 ns +- 7 ns: 1.13x faster
range(4398046511104, 4398046511109, 2): Mean +- std dev: [main_range] 133 ns +- 1 ns -> [pr_v6_rework] 75.2 ns +- 0.6 ns: 1.77x faster
list(range(4398046511104, 4398046511109, 2)): Mean +- std dev: [main_range] 301 ns +- 3 ns -> [pr_v6_rework] 239 ns +- 1 ns: 1.26x faster
range(73786976294838206464, 73786976294838206469, 2): Mean +- std dev: [main_range] 134 ns +- 2 ns -> [pr_v6_rework] 144 ns +- 1 ns: 1.07x slower
range(-1, -1, -1): Mean +- std dev: [main_range] 72.3 ns +- 2.2 ns -> [pr_v6_rework] 67.9 ns +- 2.9 ns: 1.06x faster
range(-1, -1, 2**66): Mean +- std dev: [main_range] 229 ns +- 4 ns -> [pr_v6_rework] 249 ns +- 5 ns: 1.09x slower
for loop with range(10): Mean +- std dev: [main_range] 860 ns +- 29 ns -> [pr_v6_rework] 828 ns +- 16 ns: 1.04x faster
for loop with range(20): Mean +- std dev: [main_range] 860 ns +- 16 ns -> [pr_v6_rework] 828 ns +- 15 ns: 1.04x faster

Benchmark hidden because not significant (1): list(range(73786976294838206464, 73786976294838206469, 2))

Geometric mean: 1.24x faster

The commits addressing the review comments seem to not have changed the performance.

@pochmann
Contributor

pochmann commented Jan 8, 2023

for loop with range(10): Mean +- std dev: [main_range] 860 ns +- 29 ns -> [pr_v6_rework] 828 ns +- 16 ns: 1.04x faster
for loop with range(20): Mean +- std dev: [main_range] 860 ns +- 16 ns -> [pr_v6_rework] 828 ns +- 15 ns: 1.04x faster

That looks buggy, 20 should take longer than 10.

@eendebakpt
Contributor Author

for loop with range(10): Mean +- std dev: [main_range] 860 ns +- 29 ns -> [pr_v6_rework] 828 ns +- 16 ns: 1.04x faster
for loop with range(20): Mean +- std dev: [main_range] 860 ns +- 16 ns -> [pr_v6_rework] 828 ns +- 15 ns: 1.04x faster

That looks buggy, 20 should take longer than 10.

@pochmann You are right. In the test script I only changed the name for the tests, not the statement tested. Here are the updated results:

for loop with range(2): Mean +- std dev: [for] 202 ns +- 3 ns -> [pr_for] 159 ns +- 9 ns: 1.27x faster
for loop with range(10): Mean +- std dev: [for] 455 ns +- 3 ns -> [pr_for] 413 ns +- 13 ns: 1.10x faster
for loop with range(20): Mean +- std dev: [for] 852 ns +- 7 ns -> [pr_for] 827 ns +- 19 ns: 1.03x faster

Geometric mean: 1.13x faster
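
As a worked check of the summary line, the per-benchmark speedups and their geometric mean can be recomputed from the three mean timings listed above. This is only a sketch using `statistics.geometric_mean` (Python 3.8+); the original tool works from unrounded data, so small differences in later digits are expected.

```python
from statistics import geometric_mean

# (main mean, PR mean) in nanoseconds, copied from the three results above.
timings = {
    "for loop with range(2)": (202, 159),
    "for loop with range(10)": (455, 413),
    "for loop with range(20)": (852, 827),
}

# Speedup is the old time divided by the new time.
speedups = {name: old / new for name, (old, new) in timings.items()}
overall = geometric_mean(speedups.values())

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x faster")
print(f"Geometric mean: {overall:.2f}x faster")  # about 1.13x
```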

@eendebakpt
Contributor Author

With this PR, compute_range_length is still an expensive part of constructing a range. Results from valgrind:

[Screenshot from 2023-01-08 14-12-44: valgrind profile, image not preserved]

Performance could be improved for cases like range(N) or range(N, M) where for some of the arguments we already know they fit into a long. That requires refactoring or combining the range_from_array, make_range_object and compute_range_length. I am not convinced that is worth the effort, so I will leave it out of the PR.

@eendebakpt
Contributor Author

I have made the requested changes; please review again

@bedevere-bot

Thanks for making the requested changes!

@mdickinson: please review the changes made to this pull request.

    if (overflow)
        return -1;
    if (long_start==-1 && PyErr_Occurred()) {
        PyErr_Clear();
Member

With apologies for fussing about a case that currently can't even happen: this still isn't quite right. If we get an unexpected exception from PyLong_AsLongAndOverflow (and right now, any exception from PyLong_AsLongAndOverflow would count as unexpected), then we'll want to propagate that to the caller rather than clearing it. Otherwise we're doing the C-API equivalent of a Python "except: pass".

So I think all that's needed is to drop the PyErr_Clear() here, and check for a return of -2 in the calling function.

To avoid too much confusion, we could also consider swapping the return values around (so -2 means overflow and -1 means unexpected exception), since returning -1 for an exception is fairly consistent in the rest of the codebase.
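
The "except: pass" analogy can be made concrete in Python. The sketch below is a loose model, not CPython code: `to_c_long` and `broken_converter` are hypothetical stand-ins, a 64-bit C long is assumed, and `OverflowError` plays the role that the `overflow` flag plays in the real C function (where overflow is reported without raising).

```python
LONG_MAX = 2**63 - 1  # assumption: 64-bit C long
LONG_MIN = -LONG_MAX - 1


def to_c_long(value):
    """Hypothetical stand-in for a conversion that reports overflow."""
    if not LONG_MIN <= value <= LONG_MAX:
        raise OverflowError("value does not fit in a C long")
    return value


def broken_converter(value):
    """Hypothetical converter that fails in an unexpected way."""
    raise RuntimeError("unexpected failure")


def fast_path_swallowing(convert, value):
    """Anti-pattern: every error is discarded, like 'except: pass'."""
    try:
        return convert(value)
    except Exception:
        return None  # caller cannot tell overflow from a real bug


def fast_path_propagating(convert, value):
    """Only the expected overflow selects the slow path (None here)."""
    try:
        return convert(value)
    except OverflowError:
        return None
    # Any other exception is left to propagate to the caller.
```

With the swallowing variant, `fast_path_swallowing(broken_converter, 1)` quietly returns `None`; with the propagating variant the `RuntimeError` reaches the caller, which is the behaviour the review asks for.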

Contributor Author

You have a point. My line of reasoning was to bail out in case of any error, clear the exception and then continue with the regular path (which could encounter and then handle the same error). I updated the PR to not clear the error, but to check the return value in the calling function. In the hypothetical case that such an error occurs (we both agree it currently cannot), we can either clear the error in the caller or propagate it again. I have chosen the latter.

I chose -1 because that is the value PyLong_AsLongAndOverflow returns on overflow. But PyLong_AsLongAndOverflow also returns -1 when an error occurs, so I agree that using -1 for an exception is more in line with the rest of the code.

I also added assert statements to check that in compute_range_length the arguments have PyLong_Check equal to 1.

Contributor Author

@mdickinson The reasoning about the case that cannot happen is actually quite useful, as there was a bug in the code. The problem was that get_len_of_range guarantees the result fits into an unsigned long, but we cast to long without any checks. I added a check and a regression test. (not sure what the correct way is to check this in cpython, I just checked the result of the cast is negative)

Member

(not sure what the correct way is to check this in cpython, I just checked the result of the cast is negative)

Thanks. That's not totally safe in standard C, since conversions from unsigned to signed are implementation-defined when the value does not fit (C99 §6.3.1.3p3). I think we want to emulate what the other callers of get_len_of_range do and compare the return value to LONG_MAX.

I took the liberty of pushing a commit that does this, along with a couple of drive-by style and consistency fixes. If you're okay with the latest commit, I'll merge this PR once CI completes. And if not, I'm happy to revert and/or discuss further.
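
To illustrate why the length needs a range check before it is treated as a signed value, here is a small Python model of the fast path. It is a sketch under stated assumptions, not the code from the PR: a 64-bit C long is assumed, `fast_range_length` and `get_len` are hypothetical names, and `OVERFLOW = -2` follows the sentinel convention suggested earlier in the review.

```python
LONG_MAX = 2**63 - 1  # assumption: 64-bit C long
LONG_MIN = -LONG_MAX - 1
OVERFLOW = -2  # sentinel meaning "fall back to the arbitrary-precision path"


def get_len(lo, hi, step):
    """Number of items in lo, lo+step, ... below hi, for step > 0."""
    if lo < hi:
        return 1 + (hi - 1 - lo) // step
    return 0


def fast_range_length(start, stop, step):
    """Model of a fast length computation limited to C long values."""
    if not all(LONG_MIN <= v <= LONG_MAX for v in (start, stop, step)):
        return OVERFLOW
    if step > 0:
        length = get_len(start, stop, step)
    elif step < 0:
        length = get_len(stop, start, -step)
    else:
        raise ValueError("range() arg 3 must not be zero")
    # The length always fits in an unsigned long, but it may exceed
    # LONG_MAX; compare explicitly instead of inspecting a signed cast.
    if length > LONG_MAX:
        return OVERFLOW
    return length
```

For example, `fast_range_length(LONG_MIN, LONG_MAX, 1)` has a true length of 2**64 - 1, which fits in an unsigned 64-bit value but not in a signed one, so the model returns `OVERFLOW` rather than a wrapped negative number.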

Contributor Author

@mdickinson Thanks for the improvement with the cast. Changes are fine with me.

@mdickinson mdickinson added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jan 14, 2023
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @mdickinson for commit e037563 🤖

If you want to schedule another build, you need to add the :hammer: test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jan 14, 2023
@mdickinson mdickinson added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jan 21, 2023
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @mdickinson for commit 605256d 🤖

If you want to schedule another build, you need to add the :hammer: test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jan 21, 2023
Member

@mdickinson mdickinson left a comment

LGTM. Waiting for buildbots.

@mdickinson
Member

Three buildbot failures: a refleak in test_typing, a failure in test_asyncio, and a stack overflow. None of them appear to be related to this issue.

@mdickinson mdickinson merged commit f63f525 into python:main Jan 21, 2023
@eendebakpt
Contributor Author

Three buildbot failures: a refleak in test_typing, a failure in test_asyncio, and a stack overflow. None of them appear to be related to this issue.

Thanks for reviewing!

@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot s390x Fedora LTO 3.x has failed when building commit f63f525.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/55/builds/2891) and take a look at the build logs.
  4. Check if the failure is related to this commit (f63f525) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/55/builds/2891

Failed tests:

  • test_asyncio

Failed subtests:

  • test_fork_signal_handling - test.test_asyncio.test_unix_events.TestFork.test_fork_signal_handling

Summary of the results of the build (if available):

== Tests result: FAILURE then FAILURE ==

413 tests OK.

10 slowest tests:

  • test_tools: 2 min 34 sec
  • test_concurrent_futures: 2 min 27 sec
  • test_gdb: 2 min 19 sec
  • test_multiprocessing_spawn: 1 min 30 sec
  • test_signal: 1 min 12 sec
  • test_multiprocessing_forkserver: 1 min 11 sec
  • test_multiprocessing_fork: 1 min 6 sec
  • test_nntplib: 1 min 6 sec
  • test_asyncio: 1 min 6 sec
  • test_socket: 41.7 sec

1 test failed:
test_asyncio

19 tests skipped:
test_check_c_globals test_devpoll test_ioctl test_kqueue
test_launcher test_msilib test_nis test_peg_generator
test_perf_profiler test_readline test_startfile test_tix
test_tkinter test_ttk test_winconsoleio test_winreg test_winsound
test_wmi test_zipfile64

1 re-run test:
test_asyncio

Total duration: 5 min 17 sec

Traceback logs:
Traceback (most recent call last):
  File "/home/dje/cpython-buildarea/3.x.edelsohn-fedora-z.lto/build/Lib/unittest/async_case.py", line 90, in _callTestMethod
    if self._callMaybeAsync(method) is not None:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dje/cpython-buildarea/3.x.edelsohn-fedora-z.lto/build/Lib/unittest/async_case.py", line 117, in _callMaybeAsync
    return self._asyncioTestContext.run(func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dje/cpython-buildarea/3.x.edelsohn-fedora-z.lto/build/Lib/test/support/hashlib_helper.py", line 49, in wrapper
    return func_or_class(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dje/cpython-buildarea/3.x.edelsohn-fedora-z.lto/build/Lib/test/test_asyncio/test_unix_events.py", line 1938, in test_fork_signal_handling
    self.assertTrue(child_handled.is_set())
AssertionError: False is not true
