
Percentile function gets tripped up by NaNs when dtype is object #9044

Open
hx2A opened this issue May 3, 2017 · 1 comment
Comments

@hx2A

hx2A commented May 3, 2017

I have observed some unusual behavior in the numpy percentile function when dtype is object.

In [1]: import numpy as np
   ...: np.__version__
   ...: 
Out[1]: '1.12.1'

This works as expected, sorting the values and giving me the percentiles:

In [2]: data = np.array([105, 100, 44, 10, 14, 120])
   ...: 
   ...: np.percentile(data, np.arange(0, 101, 10))
   ...: 
Out[2]: array([  10. ,   12. ,   14. ,   29. ,   44. ,   72. ,  100. ,  102.5,  105. ,  112.5,  120. ])

If there are NaNs in there I get a warning, as I should:

In [3]: data = np.array([105, 100, np.nan, 10, 14, np.nan])
   ...: 
   ...: np.percentile(data, np.arange(0, 101, 10))
   ...: 
/home/jim/INSTALL/anaconda3/envs/numpy_bug/lib/python3.6/site-packages/numpy/lib/function_base.py:4116: RuntimeWarning: Invalid value encountered in percentile
  interpolation=interpolation)
Out[3]: array([ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan])

Now I restart Python and change the dtype to object:

In [1]: data = np.array([105, 100, np.nan, 10, 14, np.nan]).astype('object')
   ...: 
   ...: np.percentile(data, np.arange(0, 101, 10))
   ...: 
Out[1]: array([100.0, 102.5, nan, nan, nan, nan, 10.0, 12.0, nan, nan, nan], dtype=object)

No warning and bad output.
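One plausible explanation, which nothing in this report confirms: with dtype object the values are compared as ordinary Python objects, and every ordering comparison involving NaN is false. If so, any sort or partition step has no consistent position for NaN. This minimal sketch shows only the comparison behavior, not numpy's internals:

```python
nan = float("nan")

# Every ordering comparison that involves NaN evaluates to False,
# so NaN is neither smaller than, larger than, nor equal to 1.0.
comparisons = [nan < 1.0, nan > 1.0, nan == 1.0, nan == nan]

# NaN is the only float that is not equal to itself.
is_self_unequal = nan != nan

print(comparisons, is_self_unequal)
```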

I know there is the nanpercentile function, but it rejects input of dtype object.

This works as expected:

In [2]: data = np.array([105, 100, np.nan, 10, 14, np.nan])
   ...: 
   ...: np.nanpercentile(data, np.arange(0, 101, 10))
   ...: 
   ...: 
Out[2]: array([  10. ,   11.2,   12.4,   13.6,   31.2,   57. ,   82.8,  100.5,  102. ,  103.5,  105. ])

This is rejected:

In [3]: data = np.array([105, 100, np.nan, 10, 14, np.nan]).astype('object')
   ...: 
   ...: np.nanpercentile(data, np.arange(0, 101, 10))

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
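A possible workaround, assuming every element of the object array is numeric so the cast cannot fail: convert to a float dtype before calling nanpercentile. This is a sketch of the idea and not a fix in numpy itself; the name as_float is only for illustration.

```python
import numpy as np

data = np.array([105, 100, np.nan, 10, 14, np.nan]).astype('object')

# Cast back to float so that np.isnan (used by nanpercentile) is supported.
# Assumption: all elements are numbers; anything else would raise here.
as_float = data.astype(float)

result = np.nanpercentile(as_float, np.arange(0, 101, 10))
print(result)
```

With the NaNs ignored, this should match the float-dtype nanpercentile output shown above.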

I found this while using np.percentile on a Pandas DataFrame whose underlying numpy array had an object dtype. Interestingly, the Pandas quantile method (which I probably should have been using instead) handles NaNs correctly. Since that works, it's possible somebody has already thought about this and I will learn something new. In any case, the missing warning combined with the bad output makes this an easily overlooked mistake in a large DataFrame with only a couple of NaNs.
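Since np.isnan rejects object arrays, a guard like the following could catch the silent case before calling np.percentile. The helper has_nan is hypothetical (it is not part of numpy) and only recognizes float NaNs:

```python
import math
import numpy as np

def has_nan(values):
    """Return True if any element of the array is a float NaN.

    Hypothetical helper: checks element by element, so it also
    works on object arrays where np.isnan raises TypeError.
    """
    return any(isinstance(v, float) and math.isnan(v) for v in values.ravel())

data = np.array([105, 100, np.nan, 10, 14, np.nan]).astype('object')
clean = np.array([105, 100, 44, 10, 14, 120]).astype('object')

print(has_nan(data), has_nan(clean))
```

The element-wise loop is slow on large arrays, so it is only worth running when the dtype is object.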

@eric-wieser
Member

eric-wieser commented May 3, 2017

This is essentially #9009 again. It's possible that my changes in #9013 have already fixed this.
