
Spearman's rank correlation coefficient slow with nan values and nan policy = omit #6654

@jaksmid


Hello,

I have two arrays of about 8000 real values each. The second array contains some nan values.
I want to calculate Spearman's rank correlation coefficient and would like to ignore pairs containing nans.

According to the documentation:

nan_policy ‘omit’ performs the calculations ignoring nan values.

So I set nan_policy to 'omit'. After ten minutes I cancelled the computation.
I examined the underlying code and discovered that nan values are assigned a rank of zero.
Shouldn't they be skipped altogether?
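The rank-zero behaviour can be seen in a small sketch. It assumes the code path in question is scipy.stats.mstats.rankdata, whose documented default (use_missing=False) gives masked values a rank of 0. The sample values are made up for illustration, and the result may differ between SciPy versions.

```python
import numpy as np
from numpy import ma
from scipy.stats import mstats

# Hypothetical sample data: one NaN among three valid values.
values = ma.masked_invalid([3.0, np.nan, 1.0, 2.0])

# With the default use_missing=False, the masked entry is documented to
# receive rank 0 instead of being dropped from the result.
ranks = mstats.rankdata(values)
print(ranks)

# The output keeps the same length as the input, so the masked slot is
# still present and still takes part in later array operations.
print(len(ranks) == len(values))
```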

When I removed the nan values myself like this:

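from numpy import ma
from scipy import stats

# Mask invalid entries (NaN or inf) in each array, then combine the masks
# so that a pair is dropped if either of its values is invalid.
x1 = ma.masked_invalid(array1)
y1 = ma.masked_invalid(array2)
m = ma.mask_or(ma.getmask(x1), ma.getmask(y1))
# compressed() returns plain ndarrays holding only the unmasked values.
k = ma.array(x1, mask=m, copy=True).compressed()
j = ma.array(y1, mask=m, copy=True).compressed()
stats.spearmanr(k, j, nan_policy='omit')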

the computation finished in a matter of seconds. SciPy itself does something similar internally, except that it does not call .compressed().
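The same pairwise removal can also be sketched with plain boolean indexing instead of masked arrays. The helper name spearman_drop_nan_pairs and the synthetic data below are assumptions for illustration, not part of SciPy. Unlike ma.masked_invalid, this version drops only NaN, not inf.

```python
import numpy as np
from scipy import stats


def spearman_drop_nan_pairs(x, y):
    """Spearman correlation over the pairs where neither value is NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Keep a pair only if both of its members are not NaN.
    keep = ~(np.isnan(x) | np.isnan(y))
    return stats.spearmanr(x[keep], y[keep])


# Synthetic stand-ins for the two arrays of about 8000 values.
rng = np.random.RandomState(0)
array1 = rng.normal(size=8000)
array2 = array1 + rng.normal(size=8000)
array2[::50] = np.nan  # inject NaN into the second array only

rho, pvalue = spearman_drop_nan_pairs(array1, array2)
print(rho, pvalue)
```

Because the NaN pairs are removed before the call, spearmanr only sees ordinary arrays and its default nan_policy is enough.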

I am using Python 3.5.2, NumPy 1.11.2 and SciPy 0.18.1.
Many thanks,
Jakub


Labels: defect, scipy.stats
