Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Make an O(1) fastpath for sum(range(...)) #107868

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mcognetta opened this issue Aug 11, 2023 · 14 comments
Closed

Make an O(1) fastpath for sum(range(...)) #107868

mcognetta opened this issue Aug 11, 2023 · 14 comments
Labels
performance Performance or resource usage type-feature A feature request or enhancement

Comments

@mcognetta
Copy link

mcognetta commented Aug 11, 2023

Currently, sum(range(...)) falls back to an iterative loop that takes O(n) time, where n is the length of the range. Instead, this can be done in O(1) time with some arithmetic.

This is slightly related to issue #68264, where it is noted that sum(range(...)) had a slightly performance regression between Python2 and 3. However, in both cases, an O(n) time implementation is used.

Note: this is my first time contributing here using only Github. I already have a patch prepared, but I think I needed to open an issue first.

Linked PRs

@mcognetta mcognetta added the type-feature A feature request or enhancement label Aug 11, 2023
@AlexWaygood AlexWaygood added the performance Performance or resource usage label Aug 11, 2023
@mcognetta mcognetta changed the title Make a O(1) fastpath for sum(range(...)) Make an O(1) fastpath for sum(range(...)) Aug 11, 2023
@Eclips4
Copy link
Member

Eclips4 commented Aug 11, 2023

Hello, and thanks for your effort in writing issue/PR!

I'm 50/50 on it.
Calling sum on range objects isn't most frequency situation, but on the other hand, this adds only one if in body of the sum implementation, therefore, it will hardly slow down the rest of the cases when arg of sum isn't a range object.

@mdickinson
Copy link
Member

This seems like too rare a use-case to be worth the extra complexity in the sum implementation. @mcognetta Do you have examples of code in the wild that would benefit from this change?

@jxu
Copy link

jxu commented Aug 12, 2023

Hopefully any programmer who actually has sum(range(...)) as a performance bottleneck will be able to remember the formula for arithmetic series!

// compute start * length + step * length * (length - 1) // 2

If you know the exact last number end, it's a little simpler to compute (start + end) * length // 2

@miccoli
Copy link

miccoli commented Aug 12, 2023

I would also argue that sometimes statements like sum(range(BigNumber)) are executed just to add a O(n) workload in some testing environment.

In general I do not think that it is desiderable to outsmart the programmer writing sum(range(...)) with an O(1) implementation.

@pochmann
Copy link
Contributor

pochmann commented Aug 13, 2023

The similar all(range(...)) issue has some more thoughts.

@sunmy2019
Copy link
Member

I am negative on this change. Complexity over rare use case.

@mcognetta
Copy link
Author

mcognetta commented Aug 13, 2023

It seems the overall consensus is that this change is not worth the added complexity and so I will close this issue and my PR. Thanks for the comments.

Since I am just starting out contributing to cpython, for my own knowledge, would one of you mind quickly clarifying the two questions I asked in these comments in my PR:

>>> sum(range(2**61))
2658455991569831744654692615953842176
>>> sum(range(2**62))
10633823966279326980924613473029062656
>>> sum(range(2**63 - 1))
42535295865117307919086767873688862721
>>> sum(range(2**63))
Segmentation fault (core dumped)

It can obviously generate numbers larger than the max integer size, and I think every operation in my implementation uses PyObjects not cint, so I am confused what causes this. It also works ok with larger N, as long as the range length isn't so big:

>>> sum(range(2**1000, 2**1000 + 1000, 999))
21430172143725346418968500981200036211228096234110672148875007767407021022498722449863967576313917162551893458351062936503742905713846280871969155149397149607869135549648461970842149210124742283755908364306092949967163882534797535118331087892154125829142392955373084335320859663305248773674411336139751

Thanks in advance.

@AA-Turner
Copy link
Member

Closing per Marco's rationale.

A

@AA-Turner AA-Turner closed this as not planned Won't fix, can't repro, duplicate, stale Aug 13, 2023
@mdickinson
Copy link
Member

  • If I needed to add a NULL check after each PyNumber_* operation in my implementation.

Yes: each PyNumber_* call can potentially raise an exception, in which case NULL will be returned and an exception set, so normal code will need to check each return value so that the exception can be propagated if necessary. You'll see this pattern all over the codebase, but the math module is one place you can find lots of PyNumber_* calls in particular. E.g.,

cpython/Modules/mathmodule.c

Lines 1769 to 1794 in cc2cf85

Py_SETREF(q, PyNumber_FloorDivide(q, a));
if (q == NULL) {
goto error;
}
/* a = (a << d - 1 - e) + q */
Py_SETREF(a, _PyLong_Lshift(a, d - 1U - e));
if (a == NULL) {
Py_DECREF(q);
goto error;
}
Py_SETREF(a, PyNumber_Add(a, q));
Py_DECREF(q);
if (a == NULL) {
goto error;
}
}
/* The correct result is either a or a - 1. Figure out which, and
decrement a if necessary. */
/* a_too_large = n < a * a */
b = PyNumber_Multiply(a, a);
if (b == NULL) {
goto error;
}

  • Why my implementation has a segfault when calling sum(range(N)) when N is too large

There are lots of potential causes for segfaults in Python C-API code; refcounting and exception-handling issues are probably the most common, but by no means the only ones. In this particular case, I think you're running into the issue that len of a range fails for large ranges:

>>> len(range(2**63))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C ssize_t

So your call to builtin_len returns NULL and sets an exception, and you then try to pass that NULL as a number into subsequent operations. This is good example of why NULL returns need to be handled: if you add error checks for both the builtin_len call and the range_sum_fastpath into your code, then you'd see a Python exception instead of the segfault:

>>> sum(range(2**63))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C ssize_t

@mdickinson
Copy link
Member

@corona10
Copy link
Member

IMHO, this kind of optimization should be achieved by bytecode-level specialization, not runtime-level optimization.

@terryjreedy
Copy link
Member

If we had a multitude of compressed (vitural) range collection objects, then it might make sense to add sum and possibly even stdev special method objects. But we do not. (And to me this seems like something for PyPI.)

@mcognetta
Copy link
Author

@mdickinson Thanks for the detailed comment! That makes sense to me.

One last note, I had planned for another range optimization: checking contains for non-integral types. E.g.,

>>> import time
>>> r = range(100000000)
>>> t=time.time();8000000 in r;print(time.time()-t)
True
4.57763671875e-05
>>> t=time.time();8000000.0 in r;print(time.time()-t)
True
0.20510268211364746

I imagine this would similarly be rejected, but I am putting it here as another place where operations on range are quick to fall back on an iterative method.

Anyway, thanks again for the comments, I appreciated the discussion.

@mdickinson
Copy link
Member

mdickinson commented Aug 14, 2023

@mcognetta

another range optimization: checking contains for non-integral types

Take a look at https://discuss.python.org/t/fast-path-for-float-in-range-checks/18248 for previous discussion. Indeed there didn't seem to be much consensus that this would be a good idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests