Adding np.nanmean(), nanstd(), and nanvar() #3297

Closed
wants to merge 4 commits into from

WeatherGod
Contributor

I added these functions, but they still need tests and verification. I put them in a completely different location from nansum(), nanmin(), and family because the way they do their work more closely matches the functions in _methods.py than those in function_base.py.

I will update this PR with some test code soon.

@@ -61,6 +61,26 @@ def _mean(a, axis=None, dtype=None, out=None, keepdims=False):
         ret = ret / float(rcount)
     return ret

+def _nanmean(a, axis=None, dtype=None, out=None, keepdims=False):
+    arr = array(a, subok=True)
Member

I was going to say that it might be better to convert to a base-class array and wrap at the end (for matrix support, though matrix support is bad anyway...), but the non-nan code does the same. Which makes me wonder: would it be sensible to just create a where= kwarg instead, making the nan-funcs just tiny wrappers? Of course I could dream about having where= for the usual ufunc.reduce, but I think that would probably require larger additions to nditer.

Contributor Author

I think @mwiebe did something along those lines at one point with the NA work, but it got pulled out. I seriously want a where= kwarg in the ufunc architecture so that I can "fix" masked arrays making copies of themselves whenever one does a min or a max.

Member

IIRC, there is a where= in ufunc.__call__ now, but Mark didn't get around to implementing it for ufunc.reduce. It would be great to have, definitely.
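
For illustration only, the "tiny wrapper" idea discussed above might look like the sketch below. It assumes a NumPy release in which ufunc.reduce accepts where= (that support arrived well after this thread), and nanmean_sketch is a hypothetical name, not the function proposed in this PR.

```python
import numpy as np

def nanmean_sketch(a, axis=None, keepdims=False):
    """Hypothetical NaN-ignoring mean written as a thin wrapper around where=."""
    arr = np.asarray(a, dtype=np.float64)
    valid = ~np.isnan(arr)  # True where the element should be counted
    # Sum only the valid elements; masked-out entries contribute nothing.
    total = np.add.reduce(arr, axis=axis, keepdims=keepdims, where=valid)
    # Count the valid elements along the same axis.
    count = np.add.reduce(valid, axis=axis, keepdims=keepdims, dtype=np.intp)
    # A slice with zero valid elements gives 0/0, i.e. nan plus a RuntimeWarning.
    return total / count

print(nanmean_sketch([[1.0, np.nan], [3.0, 5.0]], axis=0))
```

Because the mask is passed to the reduction itself, the input never has to be copied with the NaNs replaced, which is the copy-avoidance benefit mentioned for masked arrays.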

@charris
Member

charris commented May 12, 2013

Needs tests.

@WeatherGod
Contributor Author

Yes, hopefully I will be able to get to them today.

@charris
Member

charris commented May 15, 2013

@WeatherGod Don't forget this ;)

@WeatherGod
Contributor Author

You know how much work it takes to get ready for the first tropical storm of the season? Conjuring those things up ain't easy, ya know? ;-)

Tests added.

@charris
Member

charris commented May 16, 2013

Are you talking about the storm front moving through NE Texas? Take care.

Looks like you got hit with bad values under the masks. That's why we test :0

@WeatherGod
Contributor Author

The divisions by zero were expected... that's how you get nans where you need them. I didn't realize that runtime warnings are flagged as errors in Travis. Is simply setting the appropriate np.seterr() settings in the tests good enough?
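
As a rough illustration of the np.seterr() question, floating-point error handling can also be scoped with the np.errstate context manager, which restores the previous settings on exit. The sums and counts arrays below are made-up data, not code from this PR.

```python
import numpy as np

# Made-up per-slice sums and element counts; the first slice is empty.
sums = np.array([0.0, 6.0])
counts = np.array([0, 3])

# Ignore divide-by-zero and invalid-operation errors only inside this block.
with np.errstate(divide="ignore", invalid="ignore"):
    means = sums / counts  # 0.0 / 0 yields nan without emitting a warning

print(means)
```

Outside the with block the global error settings are unchanged, so other tests still see their usual warnings.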

@charris
Member

charris commented May 17, 2013

Hmm, I don't know, it depends on whether the warnings should be raised in normal use. If not, then they should be caught in the functions. If they should be raised, then that should be tested also with assert_raises. Generally, the tests should not raise warnings.

@charris
Member

charris commented May 17, 2013

Although we don't normally test for warnings other than deprecation warnings.

@charris
Member

charris commented May 17, 2013

Any reason you can't just use nan?

@WeatherGod
Contributor Author

Take another look at the names of the tests that are "failing" (note, these don't fail when I run them myself): "test_allnans" and "test_empty". There will be a division by zero because there are zero elements to calculate a mean (or std() or var()) from. This is the same behavior as np.mean([]), so I figured the new functions should match it in the empty and all-nan cases.

@charris
Member

charris commented May 17, 2013

Haven't read the code yet; I'm waiting for the tests to pass. If that is what they should do, then you should catch that in a test. You might want to make sure that the appropriate warning is made an error. It could be that the case should be caught or avoided in the function and a more informative error message raised.

I don't use the nan stuff much, so am not really familiar with it.

@WeatherGod
Contributor Author

I added the tests for testing np.nanmean([]) and np.nanmean([nan, nan]) and they do exactly the same thing as the (untested) np.mean([]), which is to simply raise a warning message. I could do:

  1. silence the warning in the test so that I can properly test the results of np.nanmean([]) to be equal to "nan".
  2. add tests of np.mean([]) and catch the warnings in _mean() and _nanmean() (which would be a change in behavior for np.mean())

Either approach is fine with me, but I lean towards (1).
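
A minimal sketch of option (1), assuming a NumPy release that provides np.nanmean. The test name is hypothetical and this is not the test code from the PR.

```python
import warnings
import numpy as np

def test_empty_and_allnan_return_nan():
    # Hypothetical test: silence RuntimeWarning only inside this block,
    # then check that the results are nan.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)
        assert np.isnan(np.nanmean([]))
        assert np.isnan(np.nanmean([np.nan, np.nan]))
        assert np.isnan(np.mean([]))

test_empty_and_allnan_return_nan()
```

The filter change is undone when the with block exits, so the warning stays visible in normal use.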

@njsmith
Member

njsmith commented May 17, 2013

Sounds like the warnings serve a good purpose then. So I guess the ideal would be to test that all of mean([]), nanmean([]), and nanmean([nan]) both return nan and raise a warning. (To test for warnings you use a warnings context manager that just saves a list of warnings instead of erroring out on them, and check it after each operation.)
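
The recording approach described above could be sketched with the standard library's warnings.catch_warnings(record=True). The helper name check_nan_and_warns is hypothetical, and the sketch assumes a NumPy release that provides np.nanmean.

```python
import warnings
import numpy as np

def check_nan_and_warns(func, data):
    """Hypothetical helper: assert that func(data) is nan and warned."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning instead of printing it or raising it.
        warnings.simplefilter("always")
        result = func(data)
    assert np.isnan(result)
    assert any(issubclass(w.category, RuntimeWarning) for w in caught)

for func, data in [(np.mean, []), (np.nanmean, []), (np.nanmean, [np.nan])]:
    check_nan_and_warns(func, data)
```

Checking the recorded list after each call keeps one operation's warning from masking a missing warning in the next.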

@WeatherGod
Contributor Author

@njsmith , that sounds like a good idea, but I don't see an example of using a context manager for such a purpose. Can you point me to an example?

@seberg
Member

seberg commented May 18, 2013

There is (np.testing.)assert_warns, which is good enough for the purpose I think.
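
A short sketch of that suggestion, assuming a NumPy version that ships numpy.testing.assert_warns and np.nanmean. assert_warns calls the function with the given arguments, fails if no warning of the requested class is raised, and returns the function's result.

```python
import numpy as np
from numpy.testing import assert_warns

# Fails unless np.nanmean raises a RuntimeWarning for all-nan input.
result = assert_warns(RuntimeWarning, np.nanmean, [np.nan, np.nan])

# The return value is passed through, so the nan result can be checked too.
assert np.isnan(result)
```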

@WeatherGod
Contributor Author

Ok, found an example in another directory on how to catch_warnings, and added that to my tests. I also made sure that np.mean(), np.std(), and np.var() are tested to return NaNs as well with empty arrays.

@WeatherGod
Contributor Author

ping?

    arr = array(a, subok=True)
    mask = isnan(arr)

    # Upgrade bool, unsigned int, and int to float64
Member

Cast instead of upgrade.

@WeatherGod
Contributor Author

Oh, you have got to be kidding me. Indeed,

np.issubdtype(np.float64, np.bool_)

does return False as expected. Will fix and push up my latest round of revisions.

@WeatherGod
Contributor Author

Ok, so I addressed some of the comments so far. I also accidentally made a version of the code that modified the input array, so I created a test to catch that if it ever happens again. I think the only remaining issues are ones inherent in the _mean() and _var() methods rather than specific to _nanmean() and _nanvar().
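
A regression test for the input-modification problem might look roughly like this. It is a sketch with a hypothetical test name, written against the public np.nanmean of later NumPy releases rather than the _nanmean() in this PR.

```python
import numpy as np
from numpy.testing import assert_array_equal

def test_input_not_modified():
    data = np.array([1.0, np.nan, 3.0])
    before = data.copy()  # snapshot taken before the call
    np.nanmean(data)
    # assert_array_equal treats NaNs in matching positions as equal,
    # so this fails only if the function overwrote its input.
    assert_array_equal(data, before)

test_input_not_modified()
```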

@m-d-w
Contributor

m-d-w commented Jun 5, 2013

It would be nice if nanmedian were also part of this set.

@WeatherGod
Contributor Author

I would be more than happy to accept a PR against my branch. I barely have the time to complete this PR as-is (hurricane season has started).

@charris
Member

charris commented Jun 8, 2013

@WeatherGod For some reason I can't comment on the code anymore. Anything odd at your end?

@charris
Member

charris commented Jun 8, 2013

@WeatherGod Also, I just noticed that you did your work in master instead of a branch. Could you resubmit this?

@charris
Member

charris commented Jun 9, 2013

Pulled into proper branch in #3416.

@charris charris closed this Jun 9, 2013