Description
Hello,
I have two arrays of about 8000 real values each. The second array contains some NaN values.
I want to calculate Spearman's rank correlation coefficient and would like to ignore pairs containing NaNs.
According to the documentation:
nan_policy ‘omit’ performs the calculations ignoring nan values.
So I set nan_policy to 'omit'. After ten minutes I cancelled the computation.
I examined the underlying code and discovered that nan values are assigned the rank of zero.
Shouldn't they be skipped altogether?
When I removed the pairs containing NaN values myself like this:
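from numpy import ma
from scipy import stats

# Mask invalid entries (NaN, inf) in each array.
x1 = ma.masked_invalid(array1)
y1 = ma.masked_invalid(array2)
# Combine the masks so a pair is dropped if either value is invalid.
m = ma.mask_or(ma.getmask(x1), ma.getmask(y1))
# Keep only the unmasked values as plain ndarrays.
k = ma.array(x1, mask=m, copy=True).compressed()
j = ma.array(y1, mask=m, copy=True).compressed()
stats.spearmanr(k, j, nan_policy='omit')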
the computation finished in a matter of seconds. SciPy itself does something similar internally, except that it does not call .compressed().
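For anyone who wants to reproduce the workaround without masked arrays, here is a minimal sketch of the same idea using a plain boolean mask. The helper names `drop_nan_pairs` and `spearman_drop_nan_pairs` are made up for illustration and are not part of SciPy. Note that this version drops only NaN, whereas `ma.masked_invalid` also masks infinities.

```python
import numpy as np
from scipy import stats


def drop_nan_pairs(a, b):
    """Return copies of a and b without the pairs where either value is NaN.

    Hypothetical helper for illustration, not a SciPy function.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("a and b must have the same shape")
    # True where both values of the pair are usable.
    keep = ~(np.isnan(a) | np.isnan(b))
    return a[keep], b[keep]


def spearman_drop_nan_pairs(a, b):
    """Spearman correlation computed on the NaN-free pairs only."""
    a_clean, b_clean = drop_nan_pairs(a, b)
    # No NaNs are left, so the default nan_policy is sufficient here.
    return stats.spearmanr(a_clean, b_clean)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, np.nan]
    y = [2.0, 1.0, np.nan, 4.0, 5.0, 6.0]
    rho, pvalue = spearman_drop_nan_pairs(x, y)
    print(rho, pvalue)
```

Because the arrays passed to `spearmanr` are ordinary ndarrays with no NaNs, this avoids the `nan_policy='omit'` code path entirely.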
I am using Python 3.5.2, NumPy 1.11.2 and SciPy 0.18.1.
Many thanks,
Jakub