ENH/WIP: introduce a where keyword for reductions. #12635


Closed
wants to merge 4 commits

Conversation

@mhvk (Contributor) commented Dec 31, 2018

This introduces a where keyword for reductions, which would greatly help things like MaskedArray.

For lack of better ideas, the masking is done by setting elements that are not to be used to the identity. For this purpose, I have to ensure that the operand being reduced over is buffered, and this is currently done with the drastic hack of (ab)using NPY_ITER_READONLY|NPY_ITER_UPDATEIFCOPY; I could not find a better way to tell the iterator to always buffer.

Work in progress, posted mostly to request comments; surely this can be done better...
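
For illustration, the approach can be sketched at the Python level roughly as follows (a pure-NumPy emulation of the "fill excluded elements with the identity" idea; the helper name reduce_where is made up for this sketch and is not part of the PR):

import numpy as np

def reduce_where(ufunc, a, where, axis=None, identity=None):
    # Sketch: emulate ufunc.reduce(a, where=...) by substituting the
    # ufunc's identity for every element that is masked out.
    if identity is None:
        identity = ufunc.identity  # e.g. 0 for np.add, 1 for np.multiply
    filled = np.where(where, a, identity)
    return ufunc.reduce(filled, axis=axis)

a = np.array([1.0, np.nan, 2.0, 3.0])
print(reduce_where(np.add, a, ~np.isnan(a)))  # 6.0, same as np.nansum(a)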

mhvk added 4 commits December 30, 2018 15:53
The iterator masking is unsuited, as it prevents writing back an element
to an array, while what is needed is to skip even reading/operating on an
element.  Might need a double internal loop, where the operand is filled
in with an identity whenever the mask is set.
@mhvk (Contributor, Author) commented Dec 31, 2018

@seberg, @jaimefrio, since you are called the nditer experts in #12362: that issue really followed from this work - I'd like the iterator to always buffer. Can this be done more elegantly with the existing machinery than with the hack here?

@seberg (Member) commented Dec 31, 2018

Ah, copying things like that sounds like a pretty smart hack. I don't immediately know myself why buffering does not happen. The normal ufuncs must use the buffer for the write operand when where is used. But maybe that is the magic: that it is only used for the write operand.

The other thing is that possibly a trivial loop is triggered. But that would have to happen before nditer, so I doubt it.

@njsmith (Member) commented Dec 31, 2018

Do you have any plan for handling reductions that don't have an identity?

@seberg (Member) commented Dec 31, 2018

Yeah, that would be much nicer. I think it may be possible when the iteration is along the slow axis by adapting the current machinery (write back to the output array only those results that are not masked, but fetch all of them every time). Or you would have to avoid copying everything, but I am not sure that can be hacked easily, and it means the inner loop size changes (although I think it can change already).

@mhvk (Contributor, Author) commented Dec 31, 2018

@njsmith - for ufuncs with no identity, my current plan was to have the docs say that one has to supply one (via the recently added initial keyword); this is needed anyway for the case where no elements are selected at all.
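
As an illustration of the intended call pattern (using the where and initial keywords as proposed here, so it assumes a NumPy in which where= for reductions exists): np.minimum has no identity, so the caller supplies initial, which also serves as the result when nothing is selected.

import numpy as np

a = np.array([5.0, np.nan, 2.0, 7.0])
mask = ~np.isnan(a)

# np.minimum.reduce has no identity, so an explicit initial is needed
# alongside where=; np.inf acts as the identity for minimum.
print(np.minimum.reduce(a, where=mask, initial=np.inf))  # 2.0
# With nothing selected, the result falls back to initial:
print(np.minimum.reduce(a, where=np.zeros_like(mask), initial=np.inf))  # inf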

@mhvk (Contributor, Author) commented Dec 31, 2018

@seberg - I tried changing the inner loop size first, just removing elements of the input that were not selected, but found it didn't work, at least not simply, since the inner loop is not guaranteed to run over the reduction axis (i.e., the output stride is not always 0). In the end, my present solution seemed to rely less on iterator implementation details.

Of course, I could rely even less on the iterator by doing the buffering myself, but that seemed silly since in quite a number of cases the iterator will already buffer anyway.

@seberg (Member) commented Dec 31, 2018

It might be that we can make something work that uses a different mechanism depending on which axis the reduction is working on. When it is not along the reduction axis, I think the current where mechanism may actually (almost) work?

@mhvk (Contributor, Author) commented Dec 31, 2018

Hmm, you're right: when the inner loop is not along the reduction axis, simply not writing back the output would do the trick. But forcing that would make the 1-d case very slow. Perhaps when the inner loop is along the reduction axis, one could do the count-reduction instead.

Though perhaps this is still trying to force things too much: the broadcasting of the where mask is different for a reduction - it has to broadcast against the input, not the output.

@mhvk (Contributor, Author) commented Dec 31, 2018

> Though perhaps this is still trying to force things too much: the broadcasting of the where mask is different for a reduction - it has to broadcast against the input, not the output.

Then again, perhaps one can set the input as the masked array; then broadcasting will presumably be OK.
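
To make the shape difference concrete (written with the where keyword as proposed, so it again assumes a NumPy that has where= for reductions): the mask matches the input being reduced, not the reduced output.

import numpy as np

a = np.arange(12.0).reshape(3, 4)
mask = a % 2 == 0                 # shape (3, 4): broadcasts against the input

# The reduction over axis=1 has output shape (3,), but where= still
# takes the full (3, 4) shape of the input.
out = np.add.reduce(a, axis=1, where=mask)
print(out)        # [ 2. 10. 18.]
print(out.shape)  # (3,)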

@njsmith (Member) commented Dec 31, 2018

Not all reductions have an identity, though, so we'll eventually need some other way to handle them, and the no-elements-selected case. And doesn't initial= already have other semantics? E.g., for a silly example, np.add.reduce(arr, initial=10) is equivalent to np.add.reduce(arr) + 10, right? So I would expect np.add.reduce(arr, where=arr2, initial=10) to be equivalent to np.add.reduce(arr, where=arr2) + 10.

Maybe we want a masked buffering mode, where first we collect unmasked elements into a buffer, and then we run the operation over the dense buffer?
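
A pure-Python sketch of that masked-buffering idea, with boolean indexing standing in for the compaction that a C-level buffered iterator would do:

import numpy as np

a = np.array([1.0, np.nan, 2.0, 3.0])
mask = ~np.isnan(a)

# First gather the selected elements into a dense buffer, then run the
# ordinary reduction over that buffer; no identity is needed unless the
# buffer turns out to be empty.
buffer = a[mask]
print(np.add.reduce(buffer))  # 6.0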

No-elements-selected creates some tricky API design issues. We already have a strategy for handling no-identity empty reductions, which is to raise an error. And that's ok because right now, this can only happen when a dimension has size 0, which means that in a vectorized reduction all of the core reductions are empty. But with where=, we could have a mix of empty and non-empty core reductions.

Gufuncs have similar issues with reporting partial errors. Maybe it's time to come up with a standard numpy-wide convention for how to handle this. I don't think that needs to block where= support in reductions though; it's ok if in the first version you get an error if any core reductions are empty and the operation has no identity, and then we refine that later.

@mhvk (Contributor, Author) commented Dec 31, 2018

Good point about initial - it indeed only works when it is the identity, otherwise results would be more than a little surprising....

@mhvk (Contributor, Author) commented Jan 1, 2019

@seberg - currently, the iterator explicitly forbids a mask that has more elements than the output along a reduction axis.

But another question: can one force the iterator to put the inner loop along a reduction axis? That would allow the select-the-data-and-reduce-the-count method to be used always.
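
For reference, a Python sketch of the select-the-data-and-reduce-the-count idea for the case where the inner loop runs along the reduction axis (the function name is made up for this sketch; a C implementation would compact into the iterator's buffer instead of using boolean indexing):

import numpy as np

def reduce_rows_where(a, mask, initial=0.0):
    # Sketch: with the inner loop along the reduction axis, each row can
    # compact its selected elements ("select the data") and then run the
    # unchanged inner loop over the smaller count ("reduce the count").
    out = np.empty(a.shape[0])
    for i, (row, row_mask) in enumerate(zip(a, mask)):
        selected = row[row_mask]
        acc = initial
        for x in selected:
            acc += x
        out[i] = acc
    return out

a = np.arange(12.0).reshape(3, 4)
mask = a % 2 == 0
print(reduce_rows_where(a, mask))  # [ 2. 10. 18.]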

@seberg (Member) commented Jan 1, 2019

I do not think there is a flag to do that currently, though I am not 100% sure. In principle some ufuncs might even be better off doing this for reductions in any case (well, to be honest, the only ones are probably the float16 loops, which would then need to cast back and forth less often).

@mhvk (Contributor, Author) commented Jan 1, 2019

See #12640 for an alternative in which I force the iterator to give the external loop an axis one reduces over. An advantage of that approach is that initial will now be used properly as an initial value, making it more reasonable to ask users to pass it in for ufuncs that do not have an identity.
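
The difference can be sketched in plain Python: with that approach, initial seeds the accumulator once per reduction instead of being used to fill masked-out elements, so no identity is required (reduce_with_initial is a made-up name for this sketch):

def reduce_with_initial(data, mask, op, initial):
    # Sketch: the accumulator starts at `initial` and masked-out
    # elements are simply never touched, so the ufunc needs no identity.
    acc = initial
    for x, m in zip(data, mask):
        if m:
            acc = op(acc, x)
    return acc

print(reduce_with_initial([5.0, float('nan'), 2.0, 7.0],
                          [True, False, True, True],
                          min, float('inf')))  # 2.0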

@mhvk (Contributor, Author) commented Jan 1, 2019

Just as a note: this implementation is not exactly speedy. E.g.,

a = np.arange(100000.)
a[1000] = np.nan
np.add.reduce(a, where=~np.isnan(a)) == np.nansum(a)
# True
%timeit np.nansum(a)
# 10000 loops, best of 5: 110 µs per loop
%timeit np.add.reduce(a, where=~np.isnan(a))
# 10000 loops, best of 5: 156 µs per loop
m = ~np.isnan(a)
%timeit np.add.reduce(a, where=m)
# 10000 loops, best of 5: 126 µs per loop

The alternative implementation in #12640 is even slower, as it is moving data around (but can probably be sped up to be roughly equivalent).

I guess part of the problem is the buffering, especially as I use the casting machinery for that. Though nansum obviously does a copy too.

EDIT: with buffering turned off, this version is slightly faster than nansum (but it changes a in place...).

@seberg (Member) commented Jan 1, 2019

I would imagine that things get faster if you disable GROWINNER, at least when buffering is enabled.
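
For context: NPY_ITER_GROWINNER is the C-level flag that, as I understand it, lets a buffered iterator grow the inner-loop chunks beyond the buffer size when no copying is actually needed. A small Python-level illustration of the chunking that buffering otherwise imposes (the forced float32 cast is only there to make buffering necessary):

import numpy as np

a = np.arange(10000, dtype=np.float64)

# With 'buffered' and 'external_loop', the inner loop receives
# buffer-sized chunks rather than the whole array at once.
it = np.nditer(a, flags=['external_loop', 'buffered'],
               op_dtypes=[np.float32], casting='same_kind',
               buffersize=512)
print({chunk.size for chunk in it})  # e.g. {512, 272}: limited by buffersize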

@mhvk (Contributor, Author) commented Jan 2, 2019

Closing in favour of the simpler and faster solution in #12644, which doesn't require messing with the iterator.

@mhvk closed this Jan 2, 2019