gh-107868 Add an O(1) fastpath for sum(range(...)) #107870

Closed
wants to merge 5 commits into from

Conversation

mcognetta
@mcognetta mcognetta commented Aug 11, 2023

This adds a fastpath for sum(range(...)), which takes O(1) time. This partially resolves #68264 (but does not address the core issue of why there was a slowdown overall) as well as #107868.

Note: I am still not too familiar with CPython's internals. I believe this is a reasonable place to implement it, but I think the best way would be to add sum to PySequenceMethods, except that doing so would break ABI backward compatibility, right?

The other thing I am worried about is that I did not implement the chain of operators (and their reference decrementing) in the most idiomatic way. Any advice would be appreciated.


Some example timings are below (I do not have Python 2 on my machine to compare against, but #68264 shows its timings would be closer to those of the main branch than to this PR's):

Main:

>>> import time
>>> t=time.time();sum(range(1,pow(10,8)+1));print(time.time()-t)
5000000050000000
3.165787696838379

This PR:

>>> import time
>>> t=time.time();sum(range(1,pow(10,8)+1));print(time.time()-t)
5000000050000000
7.939338684082031e-05
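For readers following along, here is a minimal Python sketch of the closed form that the fast path relies on. It mirrors the arithmetic in the C snippet under review below (length, start, step), but the function name `range_sum` is hypothetical and this is not the PR's actual code.

```python
def range_sum(r: range, start: int = 0) -> int:
    """Closed-form equivalent of sum(r, start) for a range object (sketch)."""
    n = len(r)  # number of terms; assumes the length fits in a C ssize_t
    # The terms are r.start + k * r.step for k = 0 .. n - 1, so the total is
    # n * r.start + r.step * (0 + 1 + ... + (n - 1)).
    triangular = (n - 1) * n // 2  # exact, because (n - 1) * n is always even
    return start + n * r.start + r.step * triangular


print(range_sum(range(1, 10**8 + 1)))  # 5000000050000000
```

The result matches the value printed in both timing runs above, and the work no longer depends on the number of items in the range.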

@bedevere-bot

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@@ -1626,8 +1626,12 @@ def test_sum(self):

self.assertEqual(sum(range(10), 1000), 1045)
self.assertEqual(sum(range(10), start=1000), 1045)
self.assertEqual(sum(range(10), 0.1), 45.1)
Member

I don't think adding new test cases for sum belongs in this PR.
If you want to improve the sum tests, you can open a separate issue/PR.

However, as a rule, float objects should not be compared with assertEqual. You should use assertAlmostEqual.
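As an illustration of the point about float comparisons, here is a small sketch using the standard unittest module. The class and method names are hypothetical and are not taken from CPython's test suite.

```python
import unittest


class SumFloatStartTest(unittest.TestCase):  # hypothetical test class
    def test_float_start(self):
        # assertAlmostEqual compares after rounding the difference
        # to 7 decimal places by default.
        self.assertAlmostEqual(sum(range(10), 0.1), 45.1)

    def test_exact_equality_is_fragile(self):
        total = 0.1 + 0.2
        # Binary floating point cannot represent these values exactly,
        # so the exact comparison fails while the approximate one passes.
        self.assertNotEqual(total, 0.3)
        self.assertAlmostEqual(total, 0.3)
```

Saved to a file, this can be run with `python -m unittest`.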

Author

Fixed. I will move them to another PR if that would be better.

Comment on lines +2509 to +2524
PyObject* start = PyObject_GetAttrString(range, "start");
PyObject* step = PyObject_GetAttrString(range, "step");

PyObject* one = PyLong_FromLong(1);
PyObject* a = PyNumber_Subtract(length, one);

PyObject* b = PyNumber_Multiply(a, length);

PyObject* two = PyLong_FromLong(2);
PyObject* c = PyNumber_FloorDivide(b, two);

PyObject* d = PyNumber_Multiply(step, c);

PyObject* e = PyNumber_Multiply(length, start);

PyObject* result = PyNumber_Add(d, e);
Member

All of these calls can return NULL. You should check for and handle these cases.

Author

To be clear, do you mean adding something like:

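if (one == NULL) {
    /* one is NULL here, so it must not be passed to Py_DECREF;
       release the references acquired earlier instead. */
    Py_DECREF(start);
    Py_DECREF(step);
    return NULL;
}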

after every one of these? I would have to do it after

        PyObject* rangesum =  range_sum_fastpath(module, iterable);
        result = PyNumber_Add(result, rangesum);
        Py_DECREF(rangesum);
        return result;

as well then, I think.

Comment on lines 2526 to 2536
Py_DecRef(length);
Py_DecRef(start);
Py_DecRef(step);

Py_DecRef(one);
Py_DecRef(a);
Py_DecRef(b);
Py_DecRef(two);
Py_DecRef(c);
Py_DecRef(d);
Py_DecRef(e);
Member

@Eclips4 Eclips4 Aug 11, 2023

This seems incorrect. You should use the Py_DECREF(...) macro.
I mean, you should replace all calls to Py_DecRef with Py_DECREF (throughout the code you have written).

Author

Done.

@Eclips4 Eclips4 added the performance Performance or resource usage label Aug 11, 2023
static PyObject *
builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
/*[clinic end generated code: output=df758cec7d1d302f input=162b50765250d222]*/
{
PyObject *result = start;
PyObject *temp, *item, *iter;

if (PyRange_Check(iterable)) {
Member

@Eclips4 Eclips4 Aug 11, 2023

builtin_sum_impl is written using Argument Clinic, so you should run AC on this file to regenerate the checksums (you can see them at the beginning of this function).
You can read more about Argument Clinic here:
https://docs.python.org/3.13/howto/clinic.html

@Eclips4
Member

Eclips4 commented Aug 11, 2023

Also, I would prefer to have a NEWS entry for this PR :)

@mcognetta
Author

Thanks for your reviews. I will fix the clinic stuff tomorrow.

There is one other issue that I am concerned about. When running something like:

>>> r = range(2**1000, 2**1000 + 1000, 999)
>>> sum(r)
21430172143725346418968500981200036211228096234110672148875007767407021022498722449863967576313917162551893458351062936503742905713846280871969155149397149607869135549648461970842149210124742283755908364306092949967163882534797535118331087892154125829142392955373084335320859663305248773674411336139751
>>> r = range(2**64)
>>> sum(r)
Segmentation fault (core dumped)

There is a segfault, even though it should be able to represent all of these numbers.

Actually, in a prior version of this implementation, sum(range(2**64)) didn't segfault; instead it raised an error saying the value could not be represented in a ssize_t (I don't have the exact error, unfortunately).

I am not sure what the result would be if you ran that same code on main, as it would take too long to finish. But it concerns me that this is clearly able to generate numbers larger than the system max, but in some cases it fails. What do you think?
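The ssize_t limit mentioned above can be reproduced from pure Python, which may help narrow this down. The sketch below shows that len() of such a range raises OverflowError, while the length and the sum can still be computed from the start, stop and step attributes with ordinary Python ints. The helper `range_length` is hypothetical, and this does not claim to identify the actual cause of the segfault in the C code.

```python
import sys


def range_length(r: range) -> int:
    """Number of items in r, computed with unbounded Python ints (sketch)."""
    if r.step > 0 and r.start < r.stop:
        return (r.stop - r.start - 1) // r.step + 1
    if r.step < 0 and r.start > r.stop:
        return (r.start - r.stop - 1) // (-r.step) + 1
    return 0


big = range(2**64)

try:
    len(big)  # the length does not fit in a C ssize_t
except OverflowError as exc:
    print("len() failed:", exc)

n = range_length(big)
print(n > sys.maxsize)  # True

# Same closed form as the fast path, evaluated with arbitrary-precision ints.
total = n * big.start + big.step * ((n - 1) * n // 2)
print(total == 2**63 * (2**64 - 1))  # True
```

If the C fast path obtains the length through an API that returns a Py_ssize_t or can fail for such ranges, an unchecked error return there would be consistent with the review comment about NULL results, but that is an assumption rather than something confirmed in this thread.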

@mcognetta
Author

Closing due to the consensus that the added complexity is not worth it since this use case is so rare.

@mcognetta mcognetta closed this Aug 13, 2023
Labels
awaiting review performance Performance or resource usage
Development

Successfully merging this pull request may close these issues.

sum() several times slower on Python 3 64-bit
3 participants