Thanks to visit codestin.com
Credit goes to github.com

Skip to content

GH-98363: Presize the list for batched() #98419

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Oct 20, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 13 additions & 13 deletions Modules/itertoolsmodule.c
Original file line number Diff line number Diff line change
Expand Up @@ -142,35 +142,35 @@ static PyObject *
batched_next(batchedobject *bo)
{
Py_ssize_t i;
Py_ssize_t n = bo->batch_size;
PyObject *it = bo->it;
PyObject *item;
PyObject *result;

if (it == NULL) {
return NULL;
}
result = PyList_New(0);
result = PyList_New(n);
if (result == NULL) {
return NULL;
}
for (i=0 ; i < bo->batch_size ; i++) {
for (i=0 ; i < n ; i++) {
item = PyIter_Next(it);
if (item == NULL) {
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care how this handles strange iterators that keep going after having raised StopIteration?

>>> from itertools import batched
>>> class A:
...     def __init__(self):
...         self.i = 0
...     def __iter__(self):
...         return self
...     def __next__(self):
...         i = self.i
...         self.i += 1
...         if i == 7:
...             raise StopIteration()
...         return i
...
>>> x = batched(A(), 5)
>>> next(x)
[0, 1, 2, 3, 4]
>>> next(x)
[5, 6]
>>> next(x)
[8, 9, 10, 11, 12]

Maybe the Py_CLEAR(bo->it) could be moved to before this break;.

Copy link
Contributor Author

@rhettinger rhettinger Oct 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of leaving this as is. The current PR matches the behavior of the pure python version and the strange iterator works the same way with other tools:

>>> it = zip(range(10), A())
>>> list(it)
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
>>> list(it)
[(8, 8), (9, 9)]

>>> it = enumerate(A())
>>> list(it)
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
>>> next(it)
(7, 8)
>>> next(it)
(8, 9)

That said, I'm not really sure about this one and could easily be persuaded to add a Py_CLEAR(bo->it) to the code path for the short list.

Copy link
Member

@sweeneyde sweeneyde Oct 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does seem like an edge case that could go either way.

I'm noticing that some iterators do Py_CLEAR(obj->it) as soon as the next() call fails, while other objects have internal iterators that are never NULL. My concern is that batched() clears the iterator sometimes (when the first next() of a batch fails), but not other times (when a subsequent next() fails).

So I would think to do one or the other, but not both:

  1. Clear the iterator immediately after any internal next() call fails, similar to pairwise, islice, chain, and cycle (sort of).

  2. Never clear the iterator and remove the if (it == NULL) check altogether, similar to dropwhile, takewhile, accumulate, compress, zip and enumerate. Unfortunately, it would mean allocating and throwing away PyList_New(n), but that's in the rare case that someone is abusing the iterator.

Maybe removing the Py_CLEAR and the NULL check is most natural?

}
if (PyList_Append(result, item) < 0) {
Py_DECREF(item);
Py_DECREF(result);
return NULL;
}
Py_DECREF(item);
PyList_SET_ITEM(result, i, item);
}
if (PyList_GET_SIZE(result) > 0) {
return result;
if (i == 0) {
Py_CLEAR(bo->it);
Py_DECREF(result);
return NULL;
}
Py_CLEAR(bo->it);
Py_DECREF(result);
return NULL;
if (i < n) {
PyObject *short_list = PyList_GetSlice(result, 0, i);
Py_SETREF(result, short_list);
}
return result;
}

static PyTypeObject batched_type = {
Expand Down