gh-100726: optimize construction of range object for medium sized integers (version 6) #100810

eendebakpt · 2023-01-06T20:49:01Z

This is a simplified version of #100726.

Benchmark against main:

range(0, 5): Mean +- std dev: [main_range] 107 ns +- 0 ns -> [pr_v6_2] 65.1 ns +- 1.1 ns: 1.64x faster
list(range(0, 5)): Mean +- std dev: [main_range] 237 ns +- 2 ns -> [pr_v6_2] 194 ns +- 2 ns: 1.22x faster
range(-2, 3): Mean +- std dev: [main_range] 107 ns +- 1 ns -> [pr_v6_2] 65.5 ns +- 0.9 ns: 1.64x faster
list(range(-2, 3)): Mean +- std dev: [main_range] 239 ns +- 3 ns -> [pr_v6_2] 195 ns +- 2 ns: 1.23x faster
range(2, 60): Mean +- std dev: [main_range] 108 ns +- 1 ns -> [pr_v6_2] 65.5 ns +- 0.9 ns: 1.65x faster
list(range(2, 60)): Mean +- std dev: [main_range] 518 ns +- 14 ns -> [pr_v6_2] 477 ns +- 16 ns: 1.09x faster
range(2, 60, 2): Mean +- std dev: [main_range] 111 ns +- 2 ns -> [pr_v6_2] 70.6 ns +- 1.0 ns: 1.57x faster
list(range(2, 60, 2)): Mean +- std dev: [main_range] 372 ns +- 4 ns -> [pr_v6_2] 328 ns +- 5 ns: 1.13x faster
range(4398046511104, 4398046511109, 2): Mean +- std dev: [main_range] 133 ns +- 1 ns -> [pr_v6_2] 76.0 ns +- 0.6 ns: 1.75x faster
list(range(4398046511104, 4398046511109, 2)): Mean +- std dev: [main_range] 301 ns +- 3 ns -> [pr_v6_2] 242 ns +- 1 ns: 1.24x faster
range(73786976294838206464, 73786976294838206469, 2): Mean +- std dev: [main_range] 134 ns +- 2 ns -> [pr_v6_2] 145 ns +- 3 ns: 1.08x slower
range(-1, -1, -1): Mean +- std dev: [main_range] 72.3 ns +- 2.2 ns -> [pr_v6_2] 67.4 ns +- 1.2 ns: 1.07x faster
range(-1, -1, 2**66): Mean +- std dev: [main_range] 229 ns +- 4 ns -> [pr_v6_2] 248 ns +- 3 ns: 1.08x slower
for loop with range(10): Mean +- std dev: [main_range] 860 ns +- 29 ns -> [pr_v6_2] 826 ns +- 19 ns: 1.04x faster
for loop with range(20): Mean +- std dev: [main_range] 860 ns +- 16 ns -> [pr_v6_2] 827 ns +- 26 ns: 1.04x faster

Benchmark hidden because not significant (1): list(range(73786976294838206464, 73786976294838206469, 2))

Geometric mean: 1.23x faster

Issue: Optimize Python range object for small integers #100726

mdickinson

The idea looks good to me; we're missing some error handling in the call to compute_range_length_long, and I have a few style nitpicks.

Objects/rangeobject.c

mdickinson · 2023-01-07T16:30:33Z

Objects/rangeobject.c

+    int overflow = 0;
+
+    long long_start = PyLong_AsLongAndOverflow(start, &overflow);
+    if (overflow || (long_start==-1 && PyErr_Occurred()) ) {


Hmm, there's a slightly ugly problem here. If PyErr_Occurred() is true then we never clear the exception, so we leave this function with an exception set. That's okay, but then we should be checking PyErr_Occurred() in the calling function, too. And yes, I think it's true that given that start is a PyLong_Object, the current implementation of PyLong_AsLongAndOverflow can't possibly raise, so this seems like a non-issue. Except that it's not safe to rely on some future version of PyLong_AsLongAndOverflow not raising.

So we either need to check for PyErr_Occurred() in the caller, or go back to the idea of a new PyLong API function that's guaranteed not to raise. I think the extra check should be fairly cheap.

eendebakpt · 2023-01-07

I think I get the point. The current check (long_start==-1 && PyErr_Occurred()) is not good because it never triggers, and if it did trigger because the implementation of PyLong_AsLongAndOverflow changed, we would not be handling it correctly. Right?

I refactored so the check is like

    ...
    long long_start = PyLong_AsLongAndOverflow(start, &overflow);
    if (overflow)
        return -1;
    if (long_start==-1 && PyErr_Occurred()) {
        PyErr_Clear();
        return -2;
    }
    ...

For an overflow the documentation of PyLong_AsLongAndOverflow states there is no exception. For the other errors we check the value of long_start (which is fast) in combination with PyErr_Occurred(). If required, we clear the error.

We could also perform the check and clear outside compute_range_length_long, but in this way all the logic for the fast path is contained inside a single method.

bedevere-bot · 2023-01-07T16:43:49Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

eendebakpt · 2023-01-07T22:23:56Z

Updated benchmark against main:

range(0, 5): Mean +- std dev: [main_range] 107 ns +- 0 ns -> [pr_v6_rework] 63.4 ns +- 0.6 ns: 1.69x faster
list(range(0, 5)): Mean +- std dev: [main_range] 237 ns +- 2 ns -> [pr_v6_rework] 193 ns +- 2 ns: 1.22x faster
range(-2, 3): Mean +- std dev: [main_range] 107 ns +- 1 ns -> [pr_v6_rework] 64.0 ns +- 1.6 ns: 1.68x faster
list(range(-2, 3)): Mean +- std dev: [main_range] 239 ns +- 3 ns -> [pr_v6_rework] 195 ns +- 1 ns: 1.23x faster
range(2, 60): Mean +- std dev: [main_range] 108 ns +- 1 ns -> [pr_v6_rework] 64.2 ns +- 0.5 ns: 1.69x faster
list(range(2, 60)): Mean +- std dev: [main_range] 518 ns +- 14 ns -> [pr_v6_rework] 474 ns +- 14 ns: 1.09x faster
range(2, 60, 2): Mean +- std dev: [main_range] 111 ns +- 2 ns -> [pr_v6_rework] 70.3 ns +- 1.1 ns: 1.58x faster
list(range(2, 60, 2)): Mean +- std dev: [main_range] 372 ns +- 4 ns -> [pr_v6_rework] 329 ns +- 7 ns: 1.13x faster
range(4398046511104, 4398046511109, 2): Mean +- std dev: [main_range] 133 ns +- 1 ns -> [pr_v6_rework] 75.2 ns +- 0.6 ns: 1.77x faster
list(range(4398046511104, 4398046511109, 2)): Mean +- std dev: [main_range] 301 ns +- 3 ns -> [pr_v6_rework] 239 ns +- 1 ns: 1.26x faster
range(73786976294838206464, 73786976294838206469, 2): Mean +- std dev: [main_range] 134 ns +- 2 ns -> [pr_v6_rework] 144 ns +- 1 ns: 1.07x slower
range(-1, -1, -1): Mean +- std dev: [main_range] 72.3 ns +- 2.2 ns -> [pr_v6_rework] 67.9 ns +- 2.9 ns: 1.06x faster
range(-1, -1, 2**66): Mean +- std dev: [main_range] 229 ns +- 4 ns -> [pr_v6_rework] 249 ns +- 5 ns: 1.09x slower
for loop with range(10): Mean +- std dev: [main_range] 860 ns +- 29 ns -> [pr_v6_rework] 828 ns +- 16 ns: 1.04x faster
for loop with range(20): Mean +- std dev: [main_range] 860 ns +- 16 ns -> [pr_v6_rework] 828 ns +- 15 ns: 1.04x faster

Benchmark hidden because not significant (1): list(range(73786976294838206464, 73786976294838206469, 2))

Geometric mean: 1.24x faster

The commits addressing the review comments seem to not have changed the performance.

pochmann · 2023-01-08T11:51:25Z

for loop with range(10): Mean +- std dev: [main_range] 860 ns +- 29 ns -> [pr_v6_rework] 828 ns +- 16 ns: 1.04x faster
for loop with range(20): Mean +- std dev: [main_range] 860 ns +- 16 ns -> [pr_v6_rework] 828 ns +- 15 ns: 1.04x faster

That looks buggy, 20 should take longer than 10.

eendebakpt · 2023-01-08T12:52:57Z

@pochmann You are right. In the test script I only changed the name for the tests, not the statement tested. Here are the updated results:

for loop with range(2): Mean +- std dev: [for] 202 ns +- 3 ns -> [pr_for] 159 ns +- 9 ns: 1.27x faster
for loop with range(10): Mean +- std dev: [for] 455 ns +- 3 ns -> [pr_for] 413 ns +- 13 ns: 1.10x faster
for loop with range(20): Mean +- std dev: [for] 852 ns +- 7 ns -> [pr_for] 827 ns +- 19 ns: 1.03x faster

Geometric mean: 1.13x faster

eendebakpt · 2023-01-08T13:18:08Z

With this PR, compute_range_length is still an expensive part of constructing a range. Results from valgrind: (profile image not preserved)

Performance could be improved for cases like range(N) or range(N, M) where for some of the arguments we already know they fit into a long. That requires refactoring or combining the range_from_array, make_range_object and compute_range_length. I am not convinced that is worth the effort, so I will leave it out of the PR.

eendebakpt · 2023-01-08T13:59:54Z

I have made the requested changes; please review again

bedevere-bot · 2023-01-08T13:59:57Z

Thanks for making the requested changes!

@mdickinson: please review the changes made to this pull request.

mdickinson · 2023-01-09T17:52:57Z

Objects/rangeobject.c

+    if (overflow)
+        return -1;
+    if (long_start==-1 && PyErr_Occurred()) {
+        PyErr_Clear();


With apologies for fussing about a case that currently can't even happen: this still isn't quite right. If we get an unexpected exception from PyLong_AsLongAndOverflow (and right now, any exception from PyLong_AsLongAndOverflow would count as unexpected), then we'll want to propagate that to the caller rather than clearing it. Otherwise we're doing the C-API equivalent of a Python "except: pass".

So I think all that's needed is to drop the PyErr_Clear() here, and check for a return of -2 in the calling function.

To avoid too much confusion, we could also consider swapping the return values around (so -2 means overflow and -1 means unexpected exception), since returning -1 for an exception is fairly consistent in the rest of the codebase.

eendebakpt · 2023-01-10

You have a point. My line of reasoning was to bail out on any error, clear the exception, and then continue with the regular path (which could encounter and then handle the same error). I updated the PR to not clear the error, but to check the return value in the calling function. In the hypothetical case that such an error is propagated (we both agree it cannot currently happen), we can either clear the error in the caller or propagate it again. I have chosen the latter.

The choice of -1 for overflow was because PyLong_AsLongAndOverflow returns -1 in that case. But PyLong_AsLongAndOverflow also returns -1 on error, so I agree that using -1 for an exception is more in line with the rest of the code.

I also added assert statements to check that the arguments of compute_range_length satisfy PyLong_Check.
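The return-code convention agreed on above (-1 for an unexpected error that the caller must propagate, -2 for an overflow that sends the caller to the slow path) can be illustrated outside the C-API. The following is only an analogy in plain C, not the code from this PR: strtol stands in for the conversion that may overflow, and the names fast_count, count_or_fallback, FAST_ERROR and FAST_OVERFLOW are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical sentinels mirroring the convention discussed above. */
#define FAST_ERROR    (-1L)  /* unexpected failure: caller must propagate */
#define FAST_OVERFLOW (-2L)  /* value too large: caller uses the slow path */

/* Hypothetical fast path: parse a non-negative count from decimal text.
 * strtol is only a stand-in for a conversion that can overflow. */
static long
fast_count(const char *text)
{
    char *end;
    long value;

    assert(text != NULL);
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno == ERANGE) {
        return FAST_OVERFLOW;   /* does not fit in a long */
    }
    if (end == text || *end != '\0' || value < 0) {
        return FAST_ERROR;      /* not a valid count at all */
    }
    return value;
}

/* Caller: fall back only on overflow, and never swallow FAST_ERROR.
 * Returns 0 on success and -1 on error. */
static int
count_or_fallback(const char *text, long *out, int *used_slow_path)
{
    long n = fast_count(text);

    *used_slow_path = 0;
    if (n == FAST_ERROR) {
        return -1;              /* propagate instead of clearing */
    }
    if (n == FAST_OVERFLOW) {
        *used_slow_path = 1;    /* a real slow path would use big integers */
        *out = 0;
        return 0;
    }
    *out = n;
    return 0;
}
```

Keeping the two sentinels distinct is what lets the caller take the slow path for large values without hiding a genuine failure.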

eendebakpt · 2023-01-10

@mdickinson The reasoning about the case that cannot happen is actually quite useful, as there was a bug in the code. The problem was that get_len_of_range guarantees that the result fits into an unsigned long, but we cast to long without any checks. I added a check and a regression test. (I am not sure what the correct way to check this is in CPython; I just checked whether the result of the cast is negative.)

mdickinson · 2023-01-14

Thanks. Checking whether the result of the cast is negative is not totally safe in standard C, since converting an unsigned value that does not fit into the signed type is implementation-defined (C99 §6.3.1.3p3). I think we want to emulate what the other callers of get_len_of_range do and compare the return value to LONG_MAX.

I took the liberty of pushing a commit that does this, along with a couple of drive-by style and consistency fixes. If you're okay with the latest commit, I'll merge this PR once CI completes. And if not, I'm happy to revert and/or discuss further.
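As a rough sketch of the LONG_MAX comparison described above (not the commit that was pushed): the helper names range_len_sketch and len_to_long are hypothetical, and the code assumes that unsigned long is wide enough to hold the difference between any two long values, as on common platforms.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: number of items in range(lo, hi, step) for C longs.
 * The count can exceed LONG_MAX, so it is returned as an unsigned long. */
static unsigned long
range_len_sketch(long lo, long hi, long step)
{
    assert(step != 0);
    if (step > 0 && lo < hi) {
        /* Unsigned arithmetic wraps, so hi - 1 - lo cannot overflow. */
        return 1UL + ((unsigned long)hi - 1UL - (unsigned long)lo)
                         / (unsigned long)step;
    }
    if (step < 0 && lo > hi) {
        /* 0UL - step is the magnitude of step, valid even for LONG_MIN. */
        return 1UL + ((unsigned long)lo - 1UL - (unsigned long)hi)
                         / (0UL - (unsigned long)step);
    }
    return 0UL;
}

/* Hypothetical helper: store len in *out if it fits in a long.
 * The comparison happens before the conversion, so no out-of-range
 * unsigned-to-signed cast is ever performed. Returns 0 on success
 * and -2 when the value is too large. */
static int
len_to_long(unsigned long len, long *out)
{
    if (len > (unsigned long)LONG_MAX) {
        return -2;
    }
    *out = (long)len;
    return 0;
}
```

For example, the length of the range from LONG_MIN to LONG_MAX with step 1 does not fit in a long, so len_to_long reports -2 and a caller would fall back to the arbitrary-precision path.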

eendebakpt · 2023-01-15

@mdickinson Thanks for the improvement with the cast. The changes are fine with me.

bedevere-bot · 2023-01-14T15:53:33Z

🤖 New build scheduled with the buildbot fleet by @mdickinson for commit e037563 🤖

If you want to schedule another build, you need to add the :hammer: test-with-buildbots label again.

bedevere-bot · 2023-01-21T10:25:14Z

🤖 New build scheduled with the buildbot fleet by @mdickinson for commit 605256d 🤖

If you want to schedule another build, you need to add the :hammer: test-with-buildbots label again.

mdickinson

LGTM. Waiting for buildbots.

mdickinson · 2023-01-21T19:31:59Z

Three buildbot failures: a refleak in test_typing, a failure in test_asyncio, and a stack overflow. None of them appear to be related to this issue.

eendebakpt · 2023-01-21T19:51:49Z

Thanks for reviewing!

bedevere-bot · 2023-01-21T20:43:08Z

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot s390x Fedora LTO 3.x has failed when building commit f63f525.

What do you need to do:

Don't panic.
Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/55/builds/2891) and take a look at the build logs.
Check if the failure is related to this commit (f63f525) or if it is a false positive.
If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/55/builds/2891

Failed tests:

test_asyncio

Failed subtests:

test_fork_signal_handling - test.test_asyncio.test_unix_events.TestFork.test_fork_signal_handling

Summary of the results of the build (if available):

== Tests result: FAILURE then FAILURE ==

413 tests OK.

10 slowest tests:

test_tools: 2 min 34 sec
test_concurrent_futures: 2 min 27 sec
test_gdb: 2 min 19 sec
test_multiprocessing_spawn: 1 min 30 sec
test_signal: 1 min 12 sec
test_multiprocessing_forkserver: 1 min 11 sec
test_multiprocessing_fork: 1 min 6 sec
test_nntplib: 1 min 6 sec
test_asyncio: 1 min 6 sec
test_socket: 41.7 sec

1 test failed:
test_asyncio

19 tests skipped:
test_check_c_globals test_devpoll test_ioctl test_kqueue
test_launcher test_msilib test_nis test_peg_generator
test_perf_profiler test_readline test_startfile test_tix
test_tkinter test_ttk test_winconsoleio test_winreg test_winsound
test_wmi test_zipfile64

1 re-run test:
test_asyncio

Total duration: 5 min 17 sec

Traceback logs:

Traceback (most recent call last):
  File "/home/dje/cpython-buildarea/3.x.edelsohn-fedora-z.lto/build/Lib/unittest/async_case.py", line 90, in _callTestMethod
    if self._callMaybeAsync(method) is not None:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dje/cpython-buildarea/3.x.edelsohn-fedora-z.lto/build/Lib/unittest/async_case.py", line 117, in _callMaybeAsync
    return self._asyncioTestContext.run(func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dje/cpython-buildarea/3.x.edelsohn-fedora-z.lto/build/Lib/test/support/hashlib_helper.py", line 49, in wrapper
    return func_or_class(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dje/cpython-buildarea/3.x.edelsohn-fedora-z.lto/build/Lib/test/test_asyncio/test_unix_events.py", line 1938, in test_fork_signal_handling
    self.assertTrue(child_handled.is_set())
AssertionError: False is not true

eendebakpt and others added 12 commits January 3, 2023 21:48

optimize construction of range object for medium sized integers

88b601c

📜🤖 Added by blurb_it.

477f341

fix news item

b755093

fix assert

7b3717c

Merge branch 'main' into range_fast_path

2f914bc

use PyLong_asLongAndOverflow

81fb4dd

refactor PyLong_AsLongAndOverflow

449b31e

fix error handling

388cc13

make method static

7e765a5

refactor

70035a4

whitespace

419352c

use -1 check

1b3e47c

bedevere-bot added the awaiting review label Jan 6, 2023

bedevere-bot mentioned this pull request Jan 6, 2023

Optimize Python range object for small integers #100726

Closed

whitespace

66a6998

eendebakpt changed the title ~~gh-100726: optimize construction of range object for medium sized integers~~ gh-100726: optimize construction of range object for medium sized integers (version 6) Jan 6, 2023

eendebakpt mentioned this pull request Jan 6, 2023

gh-100726: optimize construction of range object for medium sized integers #100727

Closed

mdickinson self-requested a review January 7, 2023 15:58

mdickinson suggested changes Jan 7, 2023

bedevere-bot removed the awaiting review label Jan 7, 2023

bedevere-bot added the awaiting changes label Jan 7, 2023

eendebakpt added 2 commits January 7, 2023 22:24

review comments - part 1

53259da

review comments - part 2

f0518a0

eendebakpt added 2 commits January 8, 2023 14:57

review comments

5e4bcac

whitespace

fdb59ab

bedevere-bot added awaiting change review and removed awaiting changes labels Jan 8, 2023

bedevere-bot requested a review from mdickinson January 8, 2023 13:59

Merge branch 'main' into range_fast_path_v6

e91f15f

mdickinson reviewed Jan 9, 2023

eendebakpt and others added 5 commits January 10, 2023 09:24

review comments

73fd3c9

Merge branch 'range_fast_path_v6' of github.com:eendebakpt/cpython in…

8c1d3a9

…to range_fast_path_v6

Merge branch 'main' into range_fast_path_v6

0f0762e

fix bug if unsigned long length does not fit into a long

9f8caef

Style fixes; make the conversion from unsigned long to long safe

e037563

mdickinson added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jan 14, 2023

bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jan 14, 2023

Merge branch 'main' into range_fast_path_v6

605256d

mdickinson added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jan 21, 2023

bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jan 21, 2023

mdickinson approved these changes Jan 21, 2023

bedevere-bot added awaiting merge and removed awaiting change review labels Jan 21, 2023

mdickinson merged commit f63f525 into python:main Jan 21, 2023

bedevere-bot removed the awaiting merge label Jan 21, 2023
