Spearman's rank correlation coefficient slow with NaN values and nan_policy='omit' #6654

Hello,

I have two arrays of about 8000 real values each. The second array contains some NaN values.
I want to calculate Spearman's rank correlation coefficient and would like to ignore any pair that contains a NaN.

According to the documentation:

>  nan_policy ‘omit’ performs the calculations ignoring nan values. 

So I set `nan_policy` to `'omit'`. After ten minutes I cancelled the computation.
I examined the underlying code and discovered that NaN values are assigned a rank of zero.
Shouldn't they be skipped altogether?
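To make clear what I mean by "skipped altogether", here is a minimal pure-Python sketch of the behaviour I would expect: drop every pair in which either value is NaN, rank only the remaining values (ties get their average rank), and take the Pearson correlation of the ranks. This is my own assumption about what `'omit'` should mean, not SciPy's implementation; the function names `average_ranks`, `pearson` and `spearman_omit_pairs` are hypothetical and chosen only for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (assumes nonzero variance)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman_omit_pairs(x, y):
    """Spearman's rho after dropping every pair where either value is NaN."""
    pairs = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    # NaNs never reach the ranking step, so they cannot receive any rank
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [5.0, float("nan"), 3.0, 2.0, 1.0]
    print(spearman_omit_pairs(x, y))
```

With pairwise deletion the NaN pair simply disappears, so the remaining four pairs are perfectly anti-monotonic and rho is -1. The sketch is quadratic-free (the ranking is a sort), so it should not be the bottleneck even for ~8000 values.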

When I removed the NaN pairs myself, like this:

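```
from numpy import ma
from scipy import stats

# array1, array2: my two ~8000-element input arrays (array2 contains NaNs)
x1 = ma.masked_invalid(array1)
y1 = ma.masked_invalid(array2)
# Combined mask: True wherever either array has an invalid (NaN/inf) value
m = ma.mask_or(ma.getmask(x1), ma.getmask(y1))
# Keep only the pairs in which both values are valid
k = ma.array(x1, mask=m, copy=True).compressed()
j = ma.array(y1, mask=m, copy=True).compressed()
stats.spearmanr(k, j, nan_policy='omit')
```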

the computation finished in a matter of seconds. SciPy itself does something similar internally, except that it never calls `.compressed()`.
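The same workaround can be written without masked arrays by using a plain boolean index. This is a minimal sketch, assuming the inputs are 1-D float arrays of equal length; `spearman_complete_pairs` is a hypothetical helper name, not a SciPy function. I use `np.isfinite` rather than `np.isnan` so that it drops exactly the values `ma.masked_invalid` masks (NaN and ±inf).

```python
import numpy as np
from scipy import stats


def spearman_complete_pairs(x, y):
    """Drop pairs where either value is NaN or inf, then call stats.spearmanr."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # True only where both values are finite, mirroring ma.masked_invalid
    keep = np.isfinite(x) & np.isfinite(y)
    # No invalid values remain, so the default nan_policy is sufficient here
    return stats.spearmanr(x[keep], y[keep])


if __name__ == "__main__":
    a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    b = np.array([2.0, np.nan, 6.0, 8.0, 10.0])
    rho, pvalue = spearman_complete_pairs(a, b)
    print(rho)
```

The result unpacks into `(correlation, pvalue)` like a normal `spearmanr` call. Since the filtering happens once, before ranking, the cost is essentially that of `spearmanr` on the cleaned arrays.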

I am using Python 3.5.2, NumPy 1.11.2 and SciPy 0.18.1.
Many thanks,
Jakub
