ENH/WIP: introduce a where keyword for reductions. #12635
The iterator masking is unsuited, as it prevents writing back an element to an array, while what is needed is to skip even reading/operating on an element. Might need a double internal loop, where the operand is filled in with an identity whenever the mask is set.
@seberg, @jaimefrio, since you are called the |
Ah, copying things like that sounds like a pretty smart hack. I don't really know offhand why buffering does not happen. The normal ufuncs must use the buffer when The other thing is that possibly a trivial loop is triggered. But that would have to happen before nditer, so I doubt it.
Do you have any plan for handling reductions that don't have an identity?
Yeah, that would be much nicer. I think it may be possible when the iteration is along the slow axis by adapting the current machinery (write back to the output array only those results that are not masked, but fetch everything every time). Or you would have to not copy everything, but I am not sure that can be hacked easily, and it means the inner loop size changes (although I think it can change already).
@njsmith - for ufuncs with no identity, my current plan was to let the docs tell that one has to give one (via the recently added |
@seberg - I tried changing the inner loop size first, just removing elements in the input that were not selected, but found it didn't work, at least not simply, since the inner loop is not guaranteed to go over the reduction axis (i.e., the output stride is not always 0). In the end, my present solution seemed to rely less on iterator implementation details. Of course, I could rely even less on the iterator by buffering myself, but this seemed silly since in quite a number of cases the iterator will buffer already.
It might be we can make something work that uses a different mechanism depending on which axis the reduction is working on. When it is not along the reduction axis, I think the current |
Hmm, you're right: when the inner loop is not along the reduction axis, not writing back the output would just do the trick. But forcing that would make 1-d very slow. Perhaps when it is along the axis, one could do the count-reduction. Though perhaps this is still trying to force things too much: the broadcasting of the where mask is different for a reduction, since it has to be to the input, not the output.
Then again, perhaps one can set the input as the masked array, then broadcasting will presumably be OK.
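To make the broadcasting point concrete, here is a small NumPy-level sketch (not the iterator code under discussion). It only shows shapes: for a reduction, a per-element mask lines up with the input, while the output has lost the reduced axis. The array values and variable names are made up for illustration.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)               # input, shape (2, 3)
out_shape = np.add.reduce(a, axis=1).shape   # reduction output, shape (2,)
mask = np.array([[True, False, True],
                 [False, False, True]])      # one flag per input element

# The mask broadcasts against the input without trouble...
mask_on_input = np.broadcast_to(mask, a.shape)

# ...but it cannot be broadcast to the smaller output shape.
try:
    np.broadcast_to(mask, out_shape)
    fits_output = True
except ValueError:
    fits_output = False
```

For an elementwise ufunc call the mask is matched against the output instead, which is why the reduction case needs separate handling.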
Not all reductions have an identity, though, so we'll eventually need some other way to handle them, and the no-elements-selected case. And doesn't Maybe we want a masked buffering mode, where first we collect unmasked elements into a buffer, and then we run the operation over the dense buffer? No-elements-selected creates some tricky API design issues. We already have a strategy for handling no-identity empty reductions, which is to raise an error. And that's ok because right now, this can only happen when a dimension has size 0, which means that in a vectorized reduction all of the core reductions are empty. But with Gufuncs have similar issues with reporting partial errors. Maybe it's time to come up with a standard numpy-wide convention for how to handle this. I don't think that needs to block |
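As a rough illustration of the "collect unmasked elements into a buffer, then reduce the dense buffer" idea, here is a 1-d sketch at the Python level. The helper name `masked_reduce_dense` is hypothetical, and raising on an empty selection for a ufunc without identity is just one possible policy, mirroring the existing behaviour for empty reductions mentioned above.

```python
import numpy as np

def masked_reduce_dense(ufunc, a, where):
    """Hypothetical helper: reduce only the selected elements of a 1-d array."""
    a = np.asarray(a)
    where = np.asarray(where, dtype=bool)
    dense = a[where]  # gather the selected elements into a contiguous buffer
    if dense.size == 0 and ufunc.identity is None:
        # Nothing selected and no identity to fall back on.
        raise ValueError("no elements selected and ufunc has no identity")
    return ufunc.reduce(dense)

data = np.array([3, 9, 7, 1])
largest_selected = masked_reduce_dense(np.maximum, data, [True, False, True, False])
empty_sum = masked_reduce_dense(np.add, data, [False, False, False, False])
```

This works without an identity because masked elements never reach the loop. The hard part in the vectorized case is that only some of the core reductions may end up empty, which this 1-d sketch does not address.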
Good point about |
@seberg - currently, the iterator explicitly forbids a mask that has more elements than the output along a reduction axis. But another question: can one force the iterator to have the inner loop along a reduction axis? That would allow the select-data and reduce-the-count method to be used always.
I do not think there is a flag to do that currently, though I am not 100% sure. In principle some ufuncs might even be better off doing this for reductions in any case. (Well, to be honest, the only ones are probably the float16 loops, which would need to cast back and forth less often.)
See #12640 for an alternative where I force the iterator to give the external loop an axis one reduces over. An advantage of this approach is that |
Just as a note: this implementation is not exactly speedy. E.g.,
The alternative implementation in #12640 is even slower, as it is moving data around (but can probably be sped up to be roughly equivalent). I guess part of the problem is the buffering, especially as I use the casting machinery for that. Though EDIT: with buffering turned off, this version is slightly faster than |
I would imagine that things get faster if you disable |
Closing in favour of the simpler and faster solution in #12644, which doesn't require messing with the iterator.
This introduces a `where` keyword for reductions - which would greatly help things like `MaskedArray`.

For lack of better ideas, the masking is done by setting elements that are not to be used to the identity. For this purpose, I have to ensure the operand reduced over is buffered - and this is now done with the drastic hack of (ab)using `NPY_ITER_READONLY|NPY_ITER_UPDATEIFCOPY` - I could not find a better way to tell the iterator to always buffer.

Work in progress - mostly here to request comments; surely this can be done better...
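
The identity-fill idea described above can be sketched at the NumPy level. This is not the C implementation in this PR, only an illustration of the semantics: masked-out elements are replaced in a copy by the ufunc's identity, so they do not affect the result. The helper name `masked_reduce_by_identity` is hypothetical.

```python
import numpy as np

def masked_reduce_by_identity(ufunc, a, where, axis=0):
    """Hypothetical helper: reduce `a`, ignoring elements where `where` is False."""
    if ufunc.identity is None:
        # The trick relies on an identity; e.g. np.maximum has none.
        raise ValueError(f"{ufunc.__name__} has no identity to fill with")
    a = np.asarray(a)
    # Work on a filled copy, so the original input is never modified.
    filled = np.where(where, a, ufunc.identity)
    return ufunc.reduce(filled, axis=axis)

a = np.array([[1, 2, 3], [4, 5, 6]])
mask = np.array([[True, False, True], [False, True, True]])
total = masked_reduce_by_identity(np.add, a, mask, axis=1)          # 1+3, 5+6
product = masked_reduce_by_identity(np.multiply, a, mask, axis=1)   # 1*3, 5*6
```

The copy made by `np.where` plays the role that the iterator's buffer plays in the PR, and the explicit error shows why reductions without an identity need some other treatment.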