ENH: where for ufunc reductions #12644

Merged
merged 1 commit into from
Jan 4, 2019

Conversation

mhvk
Contributor

@mhvk mhvk commented Jan 2, 2019

Note: This is a simpler and more robust, yet faster, version of #12635 and #12640 (way too much time wasted on being clever, but the comments helped steer to this better solution).

This introduces a where keyword for reductions. It works well, but for ufuncs with no identity, one has to explicitly pass in an initial. If people agree this is the way to go, I will add documentation and more tests.

Unlike for my other attempts, this performs quite well and I think we should consider using it in nanfunctions and in MaskedArray:

a = np.arange(100000.)
a[1000] = np.nan
np.add.reduce(a, where=~np.isnan(a)) == np.nansum(a)
# True
%timeit np.nansum(a)
# 10000 loops, best of 5: 113 µs per loop
%timeit np.add.reduce(a, where=~np.isnan(a))
# 10000 loops, best of 5: 92.3 µs per loop

# Now try what will be the worst-case scenario.
a[::2] = np.nan
np.add.reduce(a, where=~np.isnan(a)) == np.nansum(a)
# True
%timeit np.nansum(a)
# 1000 loops, best of 5: 381 µs per loop
%timeit np.add.reduce(a, where=~np.isnan(a))
# 1000 loops, best of 5: 389 µs per loop
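
A minimal sketch of the same comparison as a plain script, assuming a NumPy release that includes this change (1.17 or later). The helper name `masked_sum` is made up for illustration and is not part of NumPy.

```python
import numpy as np

def masked_sum(a):
    """Sum the non-NaN entries of `a` without building a NaN-free copy."""
    # Elements where the mask is False are never passed to the add loop.
    return np.add.reduce(a, where=~np.isnan(a))

a = np.arange(100000.)
a[1000] = np.nan
sparse_ok = masked_sum(a) == np.nansum(a)

# Worst case from the comment above: every other element is NaN.
a[::2] = np.nan
dense_ok = masked_sum(a) == np.nansum(a)
print(sparse_ok, dense_ok)
```

Exact equality is safe here only because all values are integer-valued floats whose partial sums are exactly representable; for general data, `np.isclose` would be the more cautious comparison.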

Member

@eric-wieser eric-wieser left a comment

Looks pretty good!

npy_intp count = *countptr;
char *maskptr = dataptrs[2];
char mask = *maskptr;
if (strides[2] != 0) {
Member

Can you add a mask_stride alias for this?

Contributor Author

Done

goto fail;
}
op_flags[2] = NPY_ITER_READONLY |
NPY_ITER_ALIGNED;
Member

What's the aligned for here?

Contributor Author

Yes, I guess for bools that is pretty senseless. Removed.

@@ -493,6 +484,13 @@ PyUFunc_ReduceWrapper(PyArrayObject *operand, PyArrayObject *out,
Py_INCREF(op_view);
}
else {
/* Cannot use where when we initialize from the operand */
Member

This prevents passing where=np.True_ into identity-less ufuncs, despite allowing where=True. I think we might want to detect empty masks on a per iterator-loop basis, to allow both.

Fine to leave to another PR.

Contributor Author

Indeed, I prefer to leave this for another PR, as it needs some thought about what the ideal behaviour is (or even how to check whether an empty mask occurs).

if (wheremask != NULL) {
PyErr_SetString(PyExc_RuntimeError,
"Reduce operations with no identity only support "
"a where mask if 'initial' is passed in.");
Member

Might be nice to include the ufunc name in this message.

Contributor Author

Yes. Actually, what error do you think this should be? RuntimeError is more for something that is unexpected, whereas this is not. Just ValueError? I guess I should catch it earlier, in the keyword getting stage (can keep it here just in case, though ReduceWrapper is not part of the C API so it is just for internal checking).

Contributor Author

I made it a ValueError just above, where we check multiple axes and reorderable as well.
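
A short sketch of the behaviour discussed in this review thread, assuming a NumPy release that includes this change: `np.minimum` has no identity, so a masked reduction without `initial` is expected to raise `ValueError`. The exact message text is not asserted here, and later releases could relax the restriction.

```python
import numpy as np

a = np.array([0.5, 1.0, 2.0])
mask = np.array([False, True, True])  # the smallest value is masked out

# No identity and no initial: the where mask is rejected.
try:
    np.minimum.reduce(a, where=mask)
    raised = False
except ValueError:
    raised = True

# With an explicit starting value, the masked 0.5 is skipped.
result = np.minimum.reduce(a, where=mask, initial=np.inf)
print(raised, result)
```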

@njsmith
Member

njsmith commented Jan 2, 2019

Neat!

for ufuncs with no identity, one has to explicitly pass in an initial.

Can you explain why?

goto fail;
}
}
else {
- if (!PyArg_ParseTupleAndKeywords(args, kwds, "O|OO&O&iO:reduce",
+ if (!PyArg_ParseTupleAndKeywords(args, kwds, "O|OO&O&iOO:reduce",
Member

Can you add a $ here to make where keyword-only?

Contributor Author

Yes, makes sense.

Contributor Author

Actually, the way sum is done in _methods.py, this needs to be kept as a positional argument.

Member

@eric-wieser eric-wieser Jan 2, 2019

Well, only if we want to keep the 10% performance boost on tiny arrays. I suppose most users will pass by keyword-argument anyway, so there's little to gain by requiring it to be keyword-only.

@numpy numpy deleted a comment from njsmith Jan 2, 2019
@seberg
Member

seberg commented Jan 2, 2019

I think I may rather want to skip the initial part to leave the option of adding identity specifically.

@mhvk
Contributor Author

mhvk commented Jan 2, 2019

@shoyer - initial was initially called identity but after discussion was replaced only by initial - see #10635. In our current reductions, it is actually a better description of what happens...

@njsmith - the reason for having to pass in initial is the same as in my initial attempt, to be able to deal with the case where no elements are selected (but here, it is truly an initial, not an identity, as it is no longer used as a replacement).

@seberg
Member

seberg commented Jan 2, 2019

@mhvk sorry, I think I interpreted it the wrong way around. I thought initial was assumed to be an identity for the operation here.

@mhvk mhvk force-pushed the ufunc-reduce-where-simple branch from 4d8990a to b9b03cd Compare January 2, 2019 17:27
@mhvk
Contributor Author

mhvk commented Jan 2, 2019

OK, now with initial comments addressed and more tests and docs.

@njsmith
Member

njsmith commented Jan 2, 2019

So if you have a reduction that has no identity, and you try to use where= but don't pass initial=, then is that always an error, or is it only an error when one of the reductions turns out to have zero elements?

@mhvk
Contributor Author

mhvk commented Jan 2, 2019

Right now, without an identity and initial, using where always errors. To change this would mean quite a bit of hacking in the code that determines the initial from the data itself: currently, it simply takes a view of the first element of the input data along the reduction axis, but obviously that won't work if that first element might be masked. I would like to postpone this to later (or not do it at all; not sure it is worth it...).
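
To make the seeding problem concrete, here is a toy pure-Python model of a 1-D reduction. It is a sketch of the logic described in the comment above, not NumPy's implementation; `reduce_1d` and `_NO_VALUE` are hypothetical names.

```python
_NO_VALUE = object()  # hypothetical sentinel, standing in for "not given"

def reduce_1d(op, values, where=None, initial=_NO_VALUE, identity=_NO_VALUE):
    """Toy model of a 1-D reduction (empty input is not handled)."""
    start = initial if initial is not _NO_VALUE else identity
    if start is _NO_VALUE:
        if where is not None:
            # Seeding from values[0] is wrong if values[0] may be masked.
            raise ValueError("no identity: pass 'initial' to use a where mask")
        # Unmasked case: seed from the first element, reduce the rest.
        acc, rest = values[0], values[1:]
        keep = [True] * len(rest)
    else:
        acc, rest = start, values
        keep = where if where is not None else [True] * len(values)
    for value, selected in zip(rest, keep):
        if selected:
            acc = op(acc, value)
    return acc

print(reduce_1d(min, [3, 1, 2]))
print(reduce_1d(min, [0, 1, 2], where=[False, True, True], initial=float("inf")))
print(reduce_1d(lambda x, y: x + y, [1, 2, 3], where=[True, False, True], identity=0))
```

Lifting the restriction would mean replacing the "seed from `values[0]`" branch with a search for the first selected element along each reduction axis, which is the extra work the comment proposes to postpone.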

@eric-wieser
Member

not sure it is worth it

Without doing it, I don't think we can use it to implement things like summing masked arrays of dtype=object

@@ -4876,7 +4876,7 @@

add_newdoc('numpy.core', 'ufunc', ('reduce',
"""
- reduce(a, axis=0, dtype=None, out=None, keepdims=False, initial)
+ reduce(a, axis=0, dtype=None, out=None, keepdims=False, initial, *, where=True)
Member

This * is not true - I assume it's a remnant of the rejected suggestion to implement it this way.

Contributor Author

Oops, yes, corrected.

- /* Remove initial=np._NoValue */
- if (i == 5 && obj == NoValue) {
+ /* Remove {initial,where}=np._NoValue */
+ if (i >= 5 && obj == NoValue) {
Member

Why is where=np._NoValue even allowed?

Contributor Author

It gets used by sum etc in fromnumeric. In principle, though, those also strip it away again (unless overridden with __array_function__). Unlike initial, it does not get used in _methods, so I think it can indeed be removed. Will do so.

@mhvk
Contributor Author

mhvk commented Jan 3, 2019

Without doing it, I don't think we can use it to implement things like summing masked arrays of dtype=object

I don't quite follow. Currently, the MaskedArray.sum just fills the masked pieces with 0, so it is assumed that the object array knows what to do when summing with 0. It would seem that nothing would break if one were to use initial=0.

p.s. For object arrays one could pass in a special initial object that has def __add__(self, other): return other, and then at the end replace any remaining instances of those with masked, thus avoiding the need to reduce the mask as well. But I don't see how to generalize this to non-object.
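
A rough sketch of the placeholder idea for object arrays, assuming a NumPy release that includes this change. The class `_Skip` and the name `SKIP` are hypothetical; `__radd__` is defined as well so the result does not depend on which side of the addition the placeholder ends up on.

```python
import numpy as np

class _Skip:
    """Hypothetical placeholder that vanishes as soon as it is added to anything."""
    def __add__(self, other):
        return other
    __radd__ = __add__

SKIP = _Skip()
data = np.array([1, 2, 3], dtype=object)

# Some elements selected: the placeholder is replaced by the first selected one.
some = np.add.reduce(data, where=np.array([False, True, True]), initial=SKIP)
# Nothing selected: the placeholder survives and marks the result as masked.
none = np.add.reduce(data, where=np.array([False, False, False]), initial=SKIP)
print(some, none is SKIP)
```

Any `SKIP` left in the output would then be swapped for `masked` in a final pass, as the comment suggests.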

@mhvk
Contributor Author

mhvk commented Jan 3, 2019

@eric-wieser - on MaskedArray.sum, I tried replacing self.filled(0).sum(...) with self._data.sum(..., where=~_mask, initial=0) and almost all masked array tests pass. The exceptions are tests with masked matrices, because I did not update matrix.sum to recognize where (or initial for that matter). Which I guess brings up a general problem of using this in MaskedArray - it would need to be able to fall back to using filled for a while to support other masked subclasses, while emitting a FutureWarning or so.
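
The two approaches can be compared from the outside with the public `np.ma` API. This is only an illustration of the equivalence, assuming a NumPy release that includes this change; it does not reproduce the internals of `MaskedArray.sum`.

```python
import numpy as np

m = np.ma.MaskedArray(
    np.arange(12.0).reshape(3, 4),
    mask=np.arange(12).reshape(3, 4) % 3 == 0,
)
mask = np.ma.getmaskarray(m)

# Existing approach: copy the data with masked entries replaced by 0, then sum.
via_filled = m.filled(0).sum(axis=1)
# Approach tried in the comment: sum the raw data, skipping masked entries.
via_where = m.data.sum(axis=1, where=~mask, initial=0)
print(via_filled, via_where)
```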

@mhvk
Contributor Author

mhvk commented Jan 3, 2019

p.s. I guess the first use of this may be in nanfunctions...

@eric-wieser
Member

eric-wieser commented Jan 3, 2019

For object arrays one could pass in a special initial object that has def __add__(self, other): return other

That's a really neat idea!

self._data.sum(..., where=~_mask, initial=0) and almost all masked array tests pass

There's no need to pass initial since it's implied as the identity, right?

I think you're still going to have a bad time using where to implement ma_array.min, which will fail if a slice is all masked. Perhaps in the same way that np.minimum(a, b, where=False) leaves the output uninitialized, np.minimum.reduce(a, out, where=False) should do so too, and without any error. Either way, that can be left for another PR.
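
One possible workaround for the all-masked slice, sketched under assumptions: it is not something proposed in this thread. An explicit `initial=np.inf` keeps the reduction from failing, and a separate `any` over the mask records which slices had nothing selected.

```python
import numpy as np

a = np.array([[4.0, 2.0, 7.0],
              [5.0, 1.0, 3.0]])
keep = np.array([[True, True, False],
                 [False, False, False]])  # second row is fully masked

# initial=np.inf is a choice made for this sketch; a fully masked row
# simply returns it unchanged.
row_min = np.minimum.reduce(a, axis=1, where=keep, initial=np.inf)
# Rows with no selected element are masked in the final result.
all_masked = ~keep.any(axis=1)
result = np.ma.MaskedArray(row_min, mask=all_masked)
print(result)
```

The cost is a second pass over the mask, which the "leave the output uninitialized" behaviour suggested above would avoid.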

/*
* Optimization: where=True is the same as no where argument.
* This lets us document it as a default argument.
*/
Member

@eric-wieser eric-wieser Jan 3, 2019

This optimization needs to come before calling PyArray_DescrFromType, else you leak dtype

Member

Actually, I think you might leak it no matter what.

Contributor Author

PyArray_FromAny steals the reference to the dtype, so indeed the dtype should be created inside the if clause.

Contributor Author

Actually, turns out I wrote a _wheremask_converter when redoing the keyword parsing for the normal ufunc. Might as well use it...

@eric-wieser
Member

Other than the reference leak above, this looks good to go for me

@mhvk mhvk force-pushed the ufunc-reduce-where-simple branch from 2dcb83c to bb9b9bb Compare January 3, 2019 17:59
@mhvk
Contributor Author

mhvk commented Jan 3, 2019

OK, fixed the reference issue by re-using _wheremask_converter, and rebased/squashed since it seems ready to go in.

@mhvk mhvk force-pushed the ufunc-reduce-where-simple branch from bb9b9bb to 489483d Compare January 3, 2019 18:03
@ahaldane
Member

ahaldane commented Jan 3, 2019

Re: where this might be used:

I have a ducktype'd MaskedArray available here (link), which is pretty close to feature-complete. It has docs (link) in which I actually mention issues I had with masked reductions. It occurred to me that the functionality like in this PR might help fix it, so it's nice to see it here!

To use this PR there I would just need to modify the _Masked_BinOp.reduce method. I expect it should be pretty easy and would actually simplify the code, since currently I have to do some ugly manipulation by filling in masked elements with the identity element.

@ahaldane
Member

ahaldane commented Jan 3, 2019

Also, one case I thought about a lot has to do with non-associative (?) reductions. For instance:

>>> np.equal.reduce([False, False])                            
True
>>> np.equal.reduce([False, False, False])
False

if you work it out, you'll see that the end result depends on how successive values alternate. Therefore, there is a difference between replacing a value by an identity element before reducing (my strategy), and not using the element in the reduction in the first place (this PR, correct?).

@eric-wieser
Member

if you work it out, you'll see that the end result depends on how successive values alternate

Right, but np.equal has no identity. For the special case of the ??->? loop, it has an identity of True, for which your above would work just fine.
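
The point about the boolean case can be checked in pure Python, using `functools.reduce` with `operator.eq` as a stand-in for `np.equal.reduce` on booleans. Filling a masked element with `True` matches skipping it, while filling with `False` does not.

```python
from functools import reduce
from operator import eq

values = [False, False, False]
keep = [True, False, True]  # the middle element is masked

# Strategy of this PR: the masked element never enters the reduction.
skipped = reduce(eq, [v for v, k in zip(values, keep) if k])
# Fill strategy with True, the identity of == on booleans.
filled_true = reduce(eq, [v if k else True for v, k in zip(values, keep)])
# Fill strategy with a non-identity value, for contrast.
filled_false = reduce(eq, [v if k else False for v, k in zip(values, keep)])
print(skipped, filled_true, filled_false)  # True True False
```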

@mhvk
Contributor Author

mhvk commented Jan 3, 2019

@ahaldane - yes, I think this should work well for the general masking case. The one problem I noted earlier is that whatever is being masked has to be able to deal with where and initial in their own sum() method. But if a project is all up to date with the numpy API, it should be fine.

p.s. Will look at your MaskedArray class; curious to see if it would allow astropy finally to have a MaskedQuantity...

In this implementation, if the ufunc does not have an identity, it needs
an initial value to be supplied.
@mhvk mhvk force-pushed the ufunc-reduce-where-simple branch from 489483d to 5afe650 Compare January 3, 2019 22:33
@mattip mattip merged commit b60b583 into numpy:master Jan 4, 2019
@mattip
Member

mattip commented Jan 4, 2019

Nice and simple enhancement. Would be nice to add the comparison benchmark at some point

@seberg
Member

seberg commented Jan 4, 2019

Yeah, looked good to me as well. Only thing to maybe quickly follow up on: Do we want to allow non-boolean masks? I think I would prefer to not allow it for now but the code seems to me like it does, and there should be a test for it probably.

@seberg
Member

seberg commented Jan 4, 2019

Sorry, nevermind, the loop uses "safe" casting, thought I skimmed an unsafe cast there somewhere.

@mhvk
Contributor Author

mhvk commented Jan 4, 2019

Nice and simple enhancement. Would be nice to add the comparison benchmark at some point

I added a new issue #12662 to use this in nanfunctions - item 1 for that is to add benchmarks for those functions.

@mhvk
Contributor Author

mhvk commented Jan 4, 2019

@seberg - mask array creation and casting is now identical to what is used for the regular ufuncs, so OK to the extent those are OK!

@mhvk mhvk deleted the ufunc-reduce-where-simple branch January 4, 2019 15:16
@ahaldane
Member

ahaldane commented Jan 4, 2019

I added a commit using it in the MaskedArray ducktype, and it passes my test suite and gives a fair speedup. Setup:

>>> np.random.seed(12345)                                                      
>>> x = np.random.rand(515, 512)                                                
>>> a = MaskedArray(x, x < 0.5) 

Before this PR:

>>> %timeit np.add.reduce(a, axis=1)    
1.63 ms ± 8.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

After this PR:

>>> %timeit np.add.reduce(a, axis=1)                                           
1.39 ms ± 2.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The case I tested might be a worst-case for this PR, too, since it has lots of alternating masked elements.

@mhvk
Contributor Author

mhvk commented Jan 4, 2019

@ahaldane - happy to see that it helps! And, yes, that'll definitely be close to worst-case performance.

@ahaldane
Member

ahaldane commented Jan 4, 2019

Apologies, I mixed myself up on the timings. Here is the correct comparison:

Before:

>>> a = MaskedArray(x, x < 0.9)                                                
>>> %timeit np.add.reduce(a, axis=1)                                           
751 µs ± 3.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> a = MaskedArray(x, x < 0.5)
>>> %timeit np.add.reduce(a, axis=1)      
1.4 ms ± 3.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> a = MaskedArray(x, x < 0.1)                                               
>>> %timeit np.add.reduce(a, axis=1)                                          
748 µs ± 16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

After:

>>> a = MaskedArray(x, x < 0.9)                                                
>>> %timeit np.add.reduce(a, axis=1)                                           
600 µs ± 1.97 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)    
>>> a = MaskedArray(x, x < 0.5)
>>> %timeit np.add.reduce(a, axis=1)      
1.63 ms ± 3.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)                       
>>> a = MaskedArray(x, x < 0.1)                                               
>>> %timeit np.add.reduce(a, axis=1)                                          
684 µs ± 15.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So if I did the timings right it seems to be faster in the two biased cases, but slower in the worst case.

@mhvk
Contributor Author

mhvk commented Jan 4, 2019

@ahaldane - OK, pity that it isn't always faster! I think in part it may be the final change of doing this in a nested while loop - the original I had of a single for loop turns out to have been faster for this case (but substantially worse for the presumably more common case where only few elements are not selected). Still, it means no need to make a copy, which should help for very large arrays (for nansum, things are faster again for shape=(5120,5120); your case should be better since you only need to invert the mask, not also determine it using isnan).
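
The C loop itself is not shown in this thread, so the following is only a guess at the shape of a "nested while loop": skip a run of masked elements, then hand the following run of selected elements to the inner loop in one call. The function name and the call counter are hypothetical.

```python
from operator import add

def masked_reduce_runs(op, values, keep, initial):
    """Toy model: reduce each contiguous run of selected elements in one go."""
    acc = initial
    i, n = 0, len(values)
    inner_calls = 0
    while i < n:
        # Skip a run of masked-out elements.
        while i < n and not keep[i]:
            i += 1
        # Find the end of the following run of selected elements.
        j = i
        while j < n and keep[j]:
            j += 1
        if j > i:
            # Stand-in for one call to the ufunc's inner loop on this run.
            for v in values[i:j]:
                acc = op(acc, v)
            inner_calls += 1
        i = j
    return acc, inner_calls

values = list(range(8))
few_masked = [True] * 8
few_masked[3] = False
alternating = [i % 2 == 1 for i in range(8)]
print(masked_reduce_runs(add, values, few_masked, 0))   # two long runs
print(masked_reduce_runs(add, values, alternating, 0))  # four one-element runs
```

Under this model, an alternating mask produces one inner-loop call per selected element, which would be consistent with the 50% case being the slowest in the timings above.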

@seberg
Member

seberg commented Jan 4, 2019

Yeah, the only way to get better is probably to have specialized inner loop functions that handle the mask internally... Which makes it a ternary ufunc...

@mhvk
Contributor Author

mhvk commented Jan 4, 2019

@seberg - I've made some progress with "chaining ufuncs", which execute inner loops in sequence (see https://github.com/mhvk/chain_ufunc). I think those could solve problems like this too.

@seberg
Member

seberg commented Jan 4, 2019

@mhvk yeah, chaining is another cool addition that is probably reasonably hard to achieve even. It could achieve something similar, although not for optimization purposes (that would basically need a dedicated specialization). I wonder how far we should go down the line doing things similar to numexpr or libraries that do lazy evaluation with optimization.

@mhvk
Contributor Author

mhvk commented Jan 4, 2019

Agreed that it is not clear how far one should go. The chaining I was looking at was mostly for use in quantities, where often you need to multiply one input with a constant to get the right units for the operation. It helps less than I hoped...

6 participants