gh-107868 Add an O(1) fastpath for sum(range(...)) #107870
Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool.
Lib/test/test_builtin.py
Outdated
@@ -1626,8 +1626,12 @@ def test_sum(self):

        self.assertEqual(sum(range(10), 1000), 1045)
        self.assertEqual(sum(range(10), start=1000), 1045)
        self.assertEqual(sum(range(10), 0.1), 45.1)
I don't think adding new test cases for `sum` belongs in this PR.
If you want to improve the `sum` tests, you can open a separate issue/PR.
However, as a rule, float objects are not compared with `assertEqual`. You should use `assertAlmostEqual`.
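To illustrate the point above, here is a minimal sketch of how the float case could be written. The class name `SumFloatStartTest` is hypothetical and is not part of `test_builtin.py`; the assertions mirror the ones in the diff.

```python
import unittest


class SumFloatStartTest(unittest.TestCase):  # hypothetical name, for illustration only
    def test_float_start(self):
        # Integer results are exact, so assertEqual is appropriate here.
        self.assertEqual(sum(range(10), 1000), 1045)
        # A float start makes the result a float, so compare approximately.
        # By default assertAlmostEqual rounds the difference to 7 decimal places.
        self.assertAlmostEqual(sum(range(10), 0.1), 45.1)
```

The approximate comparison keeps the test from depending on the exact rounding of the float additions.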
Fixed. I will move them to another PR if that would be better.
    PyObject* start = PyObject_GetAttrString(range, "start");
    PyObject* step = PyObject_GetAttrString(range, "step");

    PyObject* one = PyLong_FromLong(1);
    PyObject* a = PyNumber_Subtract(length, one);

    PyObject* b = PyNumber_Multiply(a, length);

    PyObject* two = PyLong_FromLong(2);
    PyObject* c = PyNumber_FloorDivide(b, two);

    PyObject* d = PyNumber_Multiply(step, c);

    PyObject* e = PyNumber_Multiply(length, start);

    PyObject* result = PyNumber_Add(d, e);
All of these calls can return `NULL`. You should handle those cases.
To be clear, do you mean adding something like:
if (one == NULL) {
Py_DECREF(one);
return NULL;
}
after every one of these? I would have to do it after
PyObject* rangesum = range_sum_fastpath(module, iterable);
result = PyNumber_Add(result, rangesum);
Py_DECREF(rangesum);
return result;
as well then, I think.
Python/bltinmodule.c
Outdated
    Py_DecRef(length);
    Py_DecRef(start);
    Py_DecRef(step);

    Py_DecRef(one);
    Py_DecRef(a);
    Py_DecRef(b);
    Py_DecRef(two);
    Py_DecRef(c);
    Py_DecRef(d);
    Py_DecRef(e);
This seems incorrect. You should use the `Py_DECREF(...)` macro.
I mean, you should replace all calls to `Py_DecRef` with `Py_DECREF` (throughout the code you have written).
Done.
static PyObject *
builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
/*[clinic end generated code: output=df758cec7d1d302f input=162b50765250d222]*/
{
    PyObject *result = start;
    PyObject *temp, *item, *iter;

    if (PyRange_Check(iterable)) {
`builtin_sum_impl` is written using Argument Clinic, so you should run AC on this file to regenerate the checksums (you can see them at the beginning of this function).
You can read more about Argument Clinic here:
https://docs.python.org/3.13/howto/clinic.html
Also, I would prefer to have a |
Thanks for your reviews. I will fix the clinic stuff tomorrow. There is one other issue that I am concerned about. When running something like:
>>> r = range(2**1000, 2**1000 + 1000, 999)
>>> sum(r)
21430172143725346418968500981200036211228096234110672148875007767407021022498722449863967576313917162551893458351062936503742905713846280871969155149397149607869135549648461970842149210124742283755908364306092949967163882534797535118331087892154125829142392955373084335320859663305248773674411336139751
>>> r = range(2**64)
>>> sum(r)
Segmentation fault (core dumped)
There is a segfault, even though it should be able to represent all of these numbers. Actually, in a prior version of this implementation, doing I am not sure what the result would be if you ran that same code on |
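For reference, the values these two calls should produce can be worked out by hand with Python's arbitrary-precision integers. The sketch below is only a sanity check; the variable names are illustrative and it does not attempt to reproduce or explain the segfault.

```python
# The first range in the session above has exactly two elements:
# 2**1000 and 2**1000 + 999.
r = range(2**1000, 2**1000 + 1000, 999)
assert list(r) == [2**1000, 2**1000 + 999]
expected_small = 2**1001 + 999  # sum of those two elements

# range(2**64) is 0, 1, ..., 2**64 - 1, so its sum is n * (n - 1) / 2
# with n = 2**64. Python ints have no fixed width, so this is exact.
n = 2**64
expected_big = n * (n - 1) // 2
```

`expected_small` ends in the digits 751, matching the tail of the value printed in the session.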
Misc/NEWS.d/next/Core and Builtins/2023-08-11-16-44-29.gh-issue-107868.BgT7zE.rst
Outdated
Closing due to the consensus that the added complexity is not worth it since this use case is so rare.
This adds a fastpath for `sum(range(...))`, which takes O(1) time. This partially resolves #68264 (but does not address the core issue of why there was a slowdown overall) as well as #107868.

Note: I am still not too familiar with CPython's internals. I believe this is a reasonable place to implement it, but I think the best way would be to add `sum` to `PySequenceMethods`, except that that would break ABI backwards compatibility, right?

The other thing I am worried about is that I did not implement the chain of operators (and their reference decrementing) in the most idiomatic way. Any advice would be appreciated.
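The arithmetic that the C diff chains together (`length * start + step * (length * (length - 1) // 2)`) can be sketched in Python as follows. The helper names `range_length` and `range_sum` are hypothetical, not part of the PR. The length is computed from the range's attributes rather than with `len()`, on the assumption that very large ranges must be supported and `len()` is limited to values that fit in a C `Py_ssize_t`.

```python
def range_length(r: range) -> int:
    """Number of elements in r, computed without len() so huge ranges work."""
    if r.step > 0 and r.start < r.stop:
        return (r.stop - r.start - 1) // r.step + 1
    if r.step < 0 and r.start > r.stop:
        return (r.start - r.stop - 1) // (-r.step) + 1
    return 0  # empty range


def range_sum(r: range, start=0):
    """Closed-form equivalent of sum(r, start) for an integer start."""
    length = range_length(r)
    # 0 + 1 + ... + (length - 1); a product of consecutive ints is always even,
    # so the floor division is exact.
    triangular = length * (length - 1) // 2
    return start + length * r.start + r.step * triangular
```

For example, `range_sum(range(10), 1000)` gives 1045, the same value the test in this PR expects from `sum(range(10), 1000)`.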

Some example timings are below (I do not have python2 on my machine to compare to, but #68264 shows it would be closer to the `main` branch timings than this PR):

Main:
This PR:
O(1) fastpath for sum(range(...)) #107868