Adding np.nanmean(), nanstd(), and nanvar() #3297

WeatherGod · 2013-05-02T04:58:28Z

Added these functions, but they still need some tests and verification. I put them in a completely different location from nansum(), nanmin(), and family because the way the work is done more closely matches the code in _methods.py than the code in function_base.py.

I will update this PR with some test code soon.

seberg · 2013-05-02T13:00:51Z

numpy/core/_methods.py

@@ -61,6 +61,26 @@ def _mean(a, axis=None, dtype=None, out=None, keepdims=False):
        ret = ret / float(rcount)
    return ret

+def _nanmean(a, axis=None, dtype=None, out=None, keepdims=False):
+    arr = array(a, subok=True)


I was going to say that it might be better to use a base class and wrap at the end (for matrix support, though matrix support is bad anyway...), but the non-nan code does the same thing. Which makes me wonder: would it be sensible to just create a where= kwarg instead, making the nan-funcs just tiny wrappers? Of course I could dream about having where for the usual ufunc.reduce, but I think it would probably require larger additions to the nditer.

WeatherGod · 2013-05-02

I think @mwiebe did something along those lines at one point with the NA work, but it got pulled out. I seriously want a where= kwarg in the ufunc architecture so that I can "fix" masked arrays making copies of themselves whenever one does a min or a max.

njsmith · 2013-05-02

IIRC, there is a where= in ufunc.__call__ now, but Mark didn't get around to implementing it for ufunc.reduce. It would be great to have, definitely.
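
For readers following the where= idea: later NumPy releases did add a where= argument to ufunc.reduce, which was not available when this thread was written. The following is only a rough sketch of the "tiny wrapper" seberg describes, assuming such a NumPy version; the function name nanmean_via_where is hypothetical and this is not the code from the PR.

```python
import numpy as np


def nanmean_via_where(a, axis=None, keepdims=False):
    """Hypothetical sketch: a NaN-skipping mean built on ufunc.reduce(where=...)."""
    arr = np.asarray(a, dtype=np.float64)
    valid = ~np.isnan(arr)  # True where the element should be counted
    # Sum only the valid elements; masked-out entries contribute add's identity (0).
    total = np.add.reduce(arr, axis=axis, keepdims=keepdims, where=valid)
    # Count the valid elements along the same axis.
    count = np.add.reduce(valid, axis=axis, keepdims=keepdims, dtype=np.intp)
    # A zero count gives 0/0, which is left to NumPy's normal error handling here.
    return total / count


x = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan]])
print(nanmean_via_where(x, axis=1))
```

No copy of the input with NaNs replaced is needed in this form, which is the property WeatherGod wants for masked arrays.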

charris · 2013-05-12T14:35:36Z

Needs tests.

WeatherGod · 2013-05-13T13:22:26Z

Yes, hopefully I will be able to get to them possibly today.

charris · 2013-05-15T15:29:16Z

@WeatherGod Don't forget this ;)

WeatherGod · 2013-05-16T02:13:37Z

You know how much work it takes to get ready for the first tropical storm of the season? Conjuring those things up ain't easy, ya know? ;-)

Tests added.

charris · 2013-05-16T03:28:47Z

Are you talking about the storm front moving through NE Texas? Take care.

Looks like you got hit with bad values under the masks. That's why we test :0

WeatherGod · 2013-05-17T01:49:19Z

The divisions by zero were expected... that's how you get nans where you need them. I didn't realize that runtime warnings are flagged as errors in Travis. Is simply setting the appropriate np.seterr() settings in the tests good enough?
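
As an illustration of the np.seterr() question, np.errstate is the scoped, context-manager form of the same settings. This is a minimal sketch with made-up arrays (total and count are hypothetical names, not taken from the PR). It only controls NumPy's floating-point error handling, not warnings issued through the warnings module.

```python
import numpy as np

# Hypothetical per-slice sums and element counts; the first slice is empty.
total = np.array([0.0, 6.0])
count = np.array([0, 3])

# Silence the 0/0 "invalid value" and x/0 "divide by zero" floating-point
# errors only inside this block; the previous settings are restored on exit.
with np.errstate(divide="ignore", invalid="ignore"):
    mean = total / count

print(mean)
```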

charris · 2013-05-17T03:05:23Z

Hmm, I don't know, it depends on whether the warnings should be raised in normal use. If not, then they should be caught in the functions. If they should be raised, then that should be tested also with assert_raises. Generally, the tests should not raise warnings.

charris · 2013-05-17T03:07:58Z

Although we don't normally test for warnings other than deprecation warnings.

charris · 2013-05-17T03:08:51Z

Any reason you can't just use nan?

WeatherGod · 2013-05-17T03:18:10Z

Take another look at the names of the tests that are "failing" (note, these don't fail when I run them myself): "test_allnans" and "test_empty". There will be a division by zero because there are zero elements from which to calculate a mean (or std() or var()). This is the same behavior as np.mean([]), so I figured it should match in the empty and all-nan cases.

charris · 2013-05-17T03:26:25Z

Haven't read the code yet, waiting for the tests to pass. If that is what they should do, then you should catch that in the(a) test. Might want to make sure that the appropriate warning is made an error. Could be that case should be caught or avoided in the function and a more informative error message raised.

I don't use the nan stuff much, so am not really familiar with it.

WeatherGod · 2013-05-17T03:59:28Z

I added the tests for testing np.nanmean([]) and np.nanmean([nan, nan]) and they do exactly the same thing as the (untested) np.mean([]), which is to simply raise a warning message. I could do:

1. silence the warning in the test so that I can properly test the results of np.nanmean([]) to be equal to "nan".
2. add tests of np.mean([]) and catch the warnings in _mean() and _nanmean() (which would be a change in behavior for np.mean())

Either approach is fine with me, but I lean towards (1).

njsmith · 2013-05-17T05:01:57Z

Sounds like the warnings serve a good purpose then. So I guess the ideal would be to test that mean([]), nanmean([]), and nanmean([nan]) each return nan and raise a warning. (To test for warnings you use a warnings context manager that just saves a list of warnings instead of erroring out on them, and check it after each operation.)
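
The approach described here can be sketched with the standard library's warnings.catch_warnings(record=True). The helper name call_and_record is hypothetical, and the sketch assumes a NumPy version in which np.nanmean is available; it is not the test code from the PR.

```python
import warnings

import numpy as np


def call_and_record(func, *args):
    """Hypothetical helper: return (result, list of warnings raised by the call)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning instead of printing it or turning it into an error.
        warnings.simplefilter("always")
        result = func(*args)
    return result, caught


for func, data in [(np.mean, []), (np.nanmean, []), (np.nanmean, [np.nan])]:
    result, caught = call_and_record(func, data)
    # Each call should return nan and emit at least one RuntimeWarning.
    assert np.isnan(result)
    assert any(issubclass(w.category, RuntimeWarning) for w in caught)
```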

WeatherGod · 2013-05-18T00:52:08Z

@njsmith, that sounds like a good idea, but I don't see an example of using a context manager for such a purpose. Can you point me to an example?

seberg · 2013-05-18T17:35:56Z

There is (np.testing.)assert_warns, which is good enough for the purpose I think.
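
A minimal sketch of the assert_warns route, assuming a NumPy version that provides both np.nanmean and np.testing.assert_warns. When called in this function form, assert_warns returns the value of the wrapped call, so the result can be checked as well.

```python
import numpy as np
from numpy.testing import assert_warns

# Fails with an AssertionError if the call does not emit a RuntimeWarning;
# otherwise returns whatever np.nanmean returned.
result = assert_warns(RuntimeWarning, np.nanmean, [np.nan, np.nan])
assert np.isnan(result)
```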

WeatherGod · 2013-05-19T18:49:55Z

Ok, found an example in another directory on how to catch_warnings, and added that to my tests. I also made sure that np.mean(), np.std(), and np.var() are tested to return NaNs as well with empty arrays.

WeatherGod · 2013-05-21T17:59:56Z

ping?

charris · 2013-05-22T01:53:51Z

numpy/core/_methods.py

+    arr = array(a, subok=True)
+    mask = isnan(arr)
+
+    # Upgrade bool, unsigned int, and int to float64


Cast instead of upgrade.

WeatherGod · 2013-05-31T01:58:10Z

Oh, you have got to be kidding me. Indeed,

np.issubdtype(np.float64, np.bool_)

does return False as expected. Will fix and push up my latest round of revisions.
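
For context, a dtype check of the kind discussed in the review ("bool, unsigned int, and int to float64") might look like the sketch below. The helper name should_cast_to_float64 is hypothetical and this is not the code from _methods.py.

```python
import numpy as np


def should_cast_to_float64(dtype):
    """Hypothetical helper: True for bool and for signed or unsigned integer dtypes."""
    # np.bool_ is not a subtype of np.integer, so it needs its own check.
    return bool(np.issubdtype(dtype, np.bool_) or np.issubdtype(dtype, np.integer))


for dt in (np.bool_, np.uint8, np.int32, np.float32, np.float64):
    print(np.dtype(dt).name, should_cast_to_float64(dt))
```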

WeatherGod · 2013-05-31T02:44:08Z

Ok, so I addressed some of the comments so far. I also accidentally made a version of the code that modified the input array, so I created a test for that to catch it if it ever happens again. I think the only remaining issues are issues that are more inherent in the _mean() and _var() methods than _nanmean() and _nanvar().
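
A regression test of the kind mentioned here could be sketched as follows. The test name and data are hypothetical, and the sketch assumes the public np.nanmean, np.nanstd, and np.nanvar functions are available.

```python
import numpy as np


def test_nan_functions_do_not_modify_input():
    data = np.array([1.0, np.nan, 3.0, np.nan])
    original = data.copy()

    np.nanmean(data)
    np.nanstd(data)
    np.nanvar(data)

    # assert_array_equal treats NaNs in matching positions as equal, so this
    # fails only if a NaN was overwritten or a value was changed in place.
    np.testing.assert_array_equal(data, original)


test_nan_functions_do_not_modify_input()
```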

m-d-w · 2013-06-05T14:40:11Z

It would be nice if nanmedian were also part of this set.

WeatherGod · 2013-06-05T14:43:43Z

I would be more than happy to accept a PR against my branch. I barely have
the time to complete this PR as-is (hurricane season has started).

charris · 2013-06-08T22:38:19Z

@WeatherGod For some reason I can't comment on the code anymore. Anything odd at your end?

charris · 2013-06-08T22:40:41Z

@WeatherGod Also, I just noticed that you did your work in master instead of a branch. Could you resubmit this?

charris · 2013-06-09T15:56:05Z

Pulled into proper branch in #3416.

Adding np.nanmean(), np.nanstd(), np.nanvar()

de30692

Added tests for nanmean(), nanvar(), nanstd()

a457158

Tests now checks the warning state

f15be52

Updated comments and dtype tests in _methods.py

5be45b2

charris mentioned this pull request Jun 9, 2013

Adding np.nanmean(), nanstd(), and nanvar(), gh-3297 branched. #3416

Closed

charris closed this Jun 9, 2013
