Adding np.nanmean(), nanstd(), and nanvar() #3297
@@ -61,6 +61,26 @@ def _mean(a, axis=None, dtype=None, out=None, keepdims=False):
        ret = ret / float(rcount)
    return ret

def _nanmean(a, axis=None, dtype=None, out=None, keepdims=False):
    arr = array(a, subok=True)
Was going to say that it might be better to use a base class + wrap at the end (for matrix support, but matrix support is bad anyway...), but then the non-nan code does the same. Which makes me wonder, would it be sensible to just create a where= kwarg instead, making the nan-funcs just tiny wrappers? Of course I could dream about having where= for the usual ufunc.reduce, but I think it would probably require larger additions to the nditer.
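To make the "tiny wrapper" idea concrete, here is a minimal sketch. It assumes a NumPy release in which `ufunc.reduce` accepts `where=`, which was not the case when this comment was written. The name `nanmean_via_where` is hypothetical and is not the function added in this PR.

```python
import numpy as np

def nanmean_via_where(a, axis=None, keepdims=False):
    """Hypothetical sketch: NaN-aware mean as a thin wrapper over where=."""
    arr = np.asanyarray(a, dtype=float)
    valid = ~np.isnan(arr)
    # Sum only the non-NaN elements; masked-out positions are skipped.
    total = np.add.reduce(arr, axis=axis, keepdims=keepdims, where=valid)
    # Count the non-NaN elements along the same axis.
    count = np.add.reduce(valid, axis=axis, keepdims=keepdims, dtype=np.intp)
    # An all-NaN (or empty) slice gives 0/0, which should come out as NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return total / count
```

Note that this sketch silences the 0/0 warning, whereas the thread below leans towards letting that warning through.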
I think @mwiebe did something along those lines at one point with the NA work, but it got pulled out. I seriously want a where= kwarg in the ufunc architecture so that I can "fix" masked arrays making a copy of itself whenever one does a min or a max.
IIRC, there is a where= in ufunc.__call__ now, but Mark didn't get around to implementing it for ufunc.reduce. It would be great to have, definitely.
On Thu, May 2, 2013 at 9:09 AM, Benjamin Root wrote:
> In numpy/core/_methods.py:
> @@ -61,6 +61,26 @@ def _mean(a, axis=None, dtype=None, out=None, keepdims=False):
>          ret = ret / float(rcount)
>      return ret
> +def _nanmean(a, axis=None, dtype=None, out=None, keepdims=False):
> +    arr = array(a, subok=True)
>
> I think @mwiebe did something along those lines at one point with the NA work, but it got pulled out. I seriously want a where= kwarg in the ufunc architecture so that I can "fix" masked arrays making a copy of itself whenever one does a min or a max.
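For reference, the element-wise `where=` mentioned above behaves roughly as in this small sketch. The arrays are made-up illustration data, not taken from the PR.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([10.0, 20.0, 30.0])
mask = ~np.isnan(a)

# Positions where mask is False keep whatever `out` already holds,
# so `out` is pre-filled to make the result well defined.
out = np.zeros_like(a)
np.add(a, b, out=out, where=mask)
```

The addition is only evaluated where `mask` is True, which is the behaviour the commenters would like to have in `ufunc.reduce` as well.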
Needs tests.
Yes, hopefully I will be able to get to them, possibly today.
@WeatherGod Don't forget this ;)
You know how much work it takes to get ready for the first tropical storm of the season? Conjuring those things up ain't easy, ya know? ;-) Tests added.
Are you talking about the storm front moving through NE Texas? Take care. Looks like you got hit with bad values under the masks. That's why we test :0
The divisions by zero were expected... that's how you get nans where you need them. I didn't realize that runtime warnings are flagged as errors in Travis. Is simply setting the appropriate np.seterr() settings in the tests good enough?
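As a point of comparison for the np.seterr() question, a scoped alternative is the `np.errstate` context manager, which restores the previous settings on exit. This is only a sketch of the mechanism; whether the warning should be silenced at all is what the thread goes on to discuss.

```python
import numpy as np

total = np.float64(0.0)
count = np.float64(0.0)

# Locally silence the floating-point warnings for the 0/0 division;
# the previous error settings are restored on exit from the block.
with np.errstate(divide="ignore", invalid="ignore"):
    result = total / count
```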
Hmm, I don't know, it depends on whether the warnings should be raised in normal use. If not, then they should be caught in the functions. If they should be raised, then that should be tested also with
Although we don't normally test for warnings other than deprecation warnings.
Any reason you can't just use
Take another look at the names of the tests that are "failing" (note, these don't fail when I run them myself): "test_allnans", "test_empty". There will be a division by zero as there are zero elements to calculate a mean (or std() or var()) from. This is the same behavior as np.mean([]). So, I figured it should match them in the empty and all-nan case.
Haven't read the code yet, waiting for the tests to pass. If that is what they should do, then you should catch that in the(a) test. Might want to make sure that the appropriate warning is made an error. Could be that case should be caught or avoided in the function and a more informative error message raised. I don't use the nan stuff much, so am not really familiar with it.
I added the tests for testing np.nanmean([]) and np.nanmean([nan, nan]) and they do exactly the same thing as the (untested) np.mean([]), which is to simply raise a warning message. I could do:
Either approach is fine with me, but I lean towards (1).
Sounds like the warnings serve a good purpose then. So I guess the ideal
@njsmith , that sounds like a good idea, but I don't see an example of using a context manager for such a purpose. Can you point me to an example?
There is (np.testing.)assert_warns, which is good enough for the purpose I think.
Ok, found an example in another directory on how to catch_warnings, and added that to my tests. I also made sure that np.mean(), np.std(), and np.var() are tested to return NaNs as well with empty arrays.
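A test along these lines might look like the following sketch. It assumes a released NumPy where `np.nanmean`, `np.nanstd`, and `np.nanvar` are available; the helper name `check_nan_and_warning` is hypothetical and is not the code added in this PR.

```python
import warnings
import numpy as np

def check_nan_and_warning(func, data):
    """Hypothetical helper: call func(data), return (result, warned)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        result = func(data)
    warned = any(issubclass(w.category, RuntimeWarning) for w in caught)
    return result, warned

def test_empty_and_allnan():
    # Empty input: nothing to average, so NaN plus a RuntimeWarning.
    for func in (np.mean, np.std, np.var):
        result, warned = check_nan_and_warning(func, [])
        assert np.isnan(result) and warned
    # All-NaN input: no valid elements remain, same outcome.
    for func in (np.nanmean, np.nanstd, np.nanvar):
        result, warned = check_nan_and_warning(func, [np.nan, np.nan])
        assert np.isnan(result) and warned
```

Recording the warnings keeps them from being promoted to errors by the test runner while still asserting that they are emitted.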
ping?
    arr = array(a, subok=True)
    mask = isnan(arr)

    # Upgrade bool, unsigned int, and int to float64
Cast instead of upgrade.
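The cast that the comment in the diff describes could be sketched as follows. This is an illustration under assumptions, not the PR's actual code: `as_float_for_nan_stats` is a hypothetical name, and checking `dtype.kind` is just one way to detect bool and integer input.

```python
import numpy as np

def as_float_for_nan_stats(a):
    """Hypothetical helper: cast bool/int input to float64, leave floats alone."""
    arr = np.asanyarray(a)
    if arr.dtype.kind in "bui":  # bool, unsigned int, signed int
        arr = arr.astype(np.float64)
    return arr
```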
Oh, you have got to be kidding me. Indeed,
does return False as expected. Will fix and push up my latest round of revisions.
Ok, so I addressed some of the comments so far. I also accidentally made a version of the code that modified the input array, so I created a test for that to catch it if it ever happens again. I think the only remaining issues are issues that are more inherent in the _mean() and _var() methods than _nanmean() and _nanvar().
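A regression test for the "modified the input array" mistake might look like this sketch. It assumes an implementation that zeroes out NaNs in a working copy, which may differ from the PR's code; `nanmean_no_mutation` is a hypothetical stand-in.

```python
import numpy as np

def nanmean_no_mutation(a):
    """Hypothetical sketch: NaN-aware mean that works on a copy."""
    arr = np.array(a, dtype=float, subok=True)  # np.array copies by default
    mask = np.isnan(arr)
    arr[mask] = 0.0  # zero out NaNs in the copy only
    count = arr.size - mask.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        return arr.sum() / count

def test_input_not_modified():
    data = np.array([1.0, np.nan, 3.0])
    before = data.copy()
    result = nanmean_no_mutation(data)
    assert result == 2.0
    # assert_array_equal treats NaNs in matching positions as equal.
    np.testing.assert_array_equal(data, before)
```

If the function wrote zeros into `data` itself, the final comparison would fail because the NaN would have been replaced.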
It would be nice if nanmedian were also part of this set.
I would be more than happy to accept a PR against my branch. I barely have
@WeatherGod For some reason I can't comment on the code anymore. Anything odd at your end?
@WeatherGod Also, I just noticed that you did your work in master instead of a branch. Could you resubmit this?
Pulled into proper branch in #3416.
Added these functions, but still needs some tests and verification. I did put these functions in a completely different location than where the nansum(), nanmin(), and family are because the way to do the work more closely matched those in _methods.py rather than in function_base.py.
I will update this PR with some test code soon.