BUG: Fix argsort vs sort in Masked arrays #8678

Merged
merged 6 commits into from
Mar 7, 2017

@eric-wieser eric-wieser commented Feb 23, 2017

This fixes #8664, at the cost of a behavioural change in argsort -- masked values are now sorted to the end by default, whereas before they could end up being placed somewhere arbitrary in the middle. This brings the behaviour of argsort in line with sort.
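
As a rough illustration of the behaviour described above, the snippet below calls `MaskedArray.argsort` with and without `endwith`. It assumes a NumPy release that already contains this change; the array contents are made up for the example.

```python
import numpy as np

# One masked entry (index 1) in an otherwise ordinary integer array.
a = np.ma.array([3, 1, 2], mask=[False, True, False])

# Default (endwith=True): the masked entry is placed last.
order_end = a.argsort()                 # indices [2, 0, 1]

# endwith=False: the masked entry is placed first instead.
order_start = a.argsort(endwith=False)  # indices [1, 2, 0]

print(order_end, order_start)
```

Either way the masked index lands at one end of the result rather than somewhere among the valid values.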

In particular, argsort would use the default fill value, which for int8 types would be 999999 % 256 = 63. This sorted very misleadingly.

sort made the more sensible choice of using the maximum/minimum possible value.
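
To make the fill-value problem concrete, here is a small sketch in plain NumPy (no `np.ma` calls) that mimics the two strategies. The data, the mask, and the use of `np.where` to fill masked slots are assumptions for illustration, not the code in this PR.

```python
import numpy as np

data = np.array([10, 100, 50, 120], dtype=np.uint8)
mask = np.array([False, True, False, False])  # index 1 is "masked"

# Old argsort behaviour: the generic integer fill value wraps around.
old_filler = 999999 % 256                       # 63
filled_old = np.where(mask, np.uint8(old_filler), data)
order_old = np.argsort(filled_old, kind="stable")   # [0, 2, 1, 3]

# sort's approach: fill with the largest value the dtype can hold.
new_filler = np.iinfo(data.dtype).max           # 255
filled_new = np.where(mask, np.uint8(new_filler), data)
order_new = np.argsort(filled_new, kind="stable")   # [0, 2, 3, 1]

print(order_old, order_new)
```

With the wrapped fill value the masked index 1 ends up between two valid entries; with the dtype maximum it is pushed to the end.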

This also fixes code duplication between the method and function forms of these operations.

indexing='ij')
indx[axis] = sindx
return a[indx]
a.sort(axis=axis, kind=kind, order=order)
Member Author

There's a choice we need to make here. Right now (and before this patch), np.ma.sort sometimes returns an ma.array, and sometimes an ndarray.

Should we change it to always promote to ma.array, for consistency with the other functions? (Which do this as of #8665 )

self._data.flat = tmp_data
self._mask.flat = tmp_mask
return
self[...] = self[idx]
Member Author

@eric-wieser eric-wieser Feb 23, 2017

In theory we could do a little better here, by implementing the in-place sort in terms of the non-in-place one. Right now (and before this patch), the latter does a redundant copy. So either way, best left for another PR, I think.

numpy/ma/core.py Outdated
@@ -5213,7 +5213,8 @@ def round(self, decimals=0, out=None):
out.__setmask__(self._mask)
return out

-def argsort(self, axis=None, kind='quicksort', order=None, fill_value=None):
+def argsort(self, axis=-1, kind='quicksort', order=None,
+            endwith=True, fill_value=None):
Contributor

endwith needs to be after fill_value to not break the api

Contributor

changing axis default also probably breaks api

Member Author

Not changing it breaks Liskov substitution though. The default values should really match ndarray. Fair call on swapping argument order.

Contributor

It would be nice if they matched, but unfortunately that ship has sailed: this default argument is part of our API and we can't change it.

Member Author

@juliantaylor: Is there a deprecation path for changing it then?

Member Author

Regarding named argument order, perhaps we should enforce that endwith is a boolean, just to help guard against someone (stupidly) passing unnamed arguments, and assuming they match the order of sort?
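
A minimal sketch of what such a guard might look like. `argsort_sketch` and its parameter order are hypothetical stand-ins for illustration, not NumPy's implementation; the function only reports how its arguments were bound.

```python
def argsort_sketch(values, axis=None, kind="quicksort", order=None,
                   endwith=True, fill_value=None):
    """Hypothetical signature; returns the bound arguments instead of sorting."""
    # Reject anything that is not a plain bool, so that a fill value passed
    # positionally into the endwith slot fails loudly instead of silently.
    if not isinstance(endwith, bool):
        raise TypeError(
            "endwith must be a bool, got %r; pass fill_value by keyword"
            % (endwith,)
        )
    return {"endwith": endwith, "fill_value": fill_value}


# Keyword call: unambiguous.
ok = argsort_sketch([3, 1, 2], fill_value=0)

# Positional call that assumes a different parameter order: 0 was meant
# as the fill value but lands in the endwith slot, so it is rejected.
try:
    argsort_sketch([3, 1, 2], None, "quicksort", None, 0)
    rejected = False
except TypeError:
    rejected = True
```

One caveat of this particular check is that `isinstance(x, bool)` also rejects `np.bool_` values, which may or may not be desirable.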

Contributor

@juliantaylor juliantaylor Feb 24, 2017

I don't see a way we could deprecate the axis argument value besides adding an argsort2 and deprecating the old one.

sort has a different order? that is unfortunate ...
I guess we could try to get away with that change; usage of positional arguments for the ones at the end is probably not very common in the wild.

Member Author

Also, argsort is documented as having a default argument of -1, even though the actual default is None. So can we update the API to match the documentation?

Member Author

@eric-wieser eric-wieser Feb 24, 2017

Re deprecation:

if axis is np._NoValue:
    if self.ndim > 1:
        warn("Unlike np.argsort and the documentation for this function, the "
             "default axis argument is None, not -1. This only matters for 2 or "
             "higher-dimensional arrays. If this is intended, pass it explicitly to squash this warning")
    axis = None
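
For reference, a self-contained sketch of the same sentinel-default idea. It uses a private module-level sentinel instead of `np._NoValue`, a free function instead of the method, and `FutureWarning` as the category; all three are assumptions made for this example rather than anything decided in this thread.

```python
import warnings

import numpy as np

_NO_VALUE = object()  # hypothetical sentinel: "caller did not pass axis"


def argsort_axis_sketch(a, axis=_NO_VALUE):
    """Keep the legacy axis=None default, but warn when it changes the result."""
    a = np.asarray(a)
    if axis is _NO_VALUE:
        if a.ndim > 1:
            # Only 2-D and higher inputs behave differently from axis=-1.
            warnings.warn(
                "The default axis is None, not -1 as documented. "
                "Pass axis explicitly to silence this warning.",
                FutureWarning,
                stacklevel=2,
            )
        axis = None
    return np.argsort(a, axis=axis)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flat_order = argsort_axis_sketch([[2, 1], [0, 3]])             # warns
    explicit_order = argsort_axis_sketch([[2, 1], [0, 3]], axis=None)  # silent
    one_d_order = argsort_axis_sketch([2, 0, 1])                   # silent
```

Only the first call warns, because it is the only one where the implicit default affects the result.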

@charris charris changed the title from "Fix argsort vs sort in Masked arrays" to "BUG: Fix argsort vs sort in Masked arrays" Feb 24, 2017
@eric-wieser
Member Author

@juliantaylor: OK, clearly changing the axis default does not belong in this PR. I've created an issue about that at #8701. I think I agree with you that using positional arguments is unlikely in the wild for argsort, especially given how unlikely it is that users pass kind explicitly.

filler = maximum_fill_value(self)
else:
filler = fill_value
idx = list(np.ix_(*[np.arange(x) for x in self.shape]))
Contributor

I don't really like these shortcuts very much, the explicit meshgrid is easier to read imo
there isn't really an advantage to using ix_ is there?

Member Author

@eric-wieser eric-wieser Feb 26, 2017

Is this really a shortcut though? The two functions are independent, with neither calling the other. Meshgrid doesn't do what we need by default, but ix_ does.

Furthermore, ix_ is used as an indexer in its examples, whereas meshgrid is used to evaluate functions. The former use is what we want here.
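
To compare the two spellings side by side, here is a small sketch on a made-up 2-D array. It indexes with a tuple rather than a list, since list-of-arrays indexing is no longer accepted by current NumPy; otherwise it follows the pattern in the diff above.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [9, 7, 8]])
axis = 1
sindx = np.argsort(a, axis=axis)

# ix_: open (broadcastable) index arrays, one per dimension.
idx = list(np.ix_(*[np.arange(n) for n in a.shape]))
idx[axis] = sindx
via_ix = a[tuple(idx)]

# meshgrid: needs indexing='ij' to line up with the array axes, and
# builds dense index arrays unless sparse=True is also passed.
grids = list(np.meshgrid(*[np.arange(n) for n in a.shape], indexing='ij'))
grids[axis] = sindx
via_meshgrid = a[tuple(grids)]

print(via_ix)
print(via_meshgrid)
```

Both produce the array sorted along `axis`; the difference is only in which defaults have to be overridden and how large the intermediate index arrays are.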

@mhvk
Contributor

mhvk commented Mar 7, 2017

@eric-wieser - Overall, this looks good. The main possible issue I see is that in adding the additional endwith argument, you change the order. I think that is fine, since it is unlikely someone would pass in all preceding arguments as positional ones.
A question is whether you wouldn't rather use #8714 (though I guess the order is not so important).
Finally, in reply to your in-line question, yes, I think we should always return a MaskedArray instance, i.e., just do masked_array on the input and call sort.

All this means it is a bit of an API change, so definitely needs an entry in the release notes (and arguably a note to the mailing list).

@eric-wieser
Member Author

> A question is whether you wouldn't rather use #8714

I don't really want a feature needing discussion to hold up a bugfix.

> The main possible issue I see is that in adding the additional endwith argument, you change the order.

@juliantaylor seemed to think we could get away with it (hidden comment) as well

> yes, I think we should always return a MaskedArray instance

I'll update this accordingly then

> so definitely needs an entry in the release notes

Will do

@mhvk
Contributor

mhvk commented Mar 7, 2017

OK, sounds good. I do think this has gone beyond a bug fix (i.e., MAINT as well), which is why I set the milestone to 1.13. (I think a bug fix would only touch the default fill value, but do not really think this is worth implementing).

@eric-wieser eric-wieser force-pushed the fix-argsort-vs-sort branch from 06c40c3 to 37bb166 Compare March 7, 2017 16:10
@eric-wieser
Member Author

eric-wieser commented Mar 7, 2017

> yes, I think we should always return a MaskedArray instance

Actually, I'm backtracking on this. While I think that all the np.ma functions should return masked arrays, I think that that change belongs in a PR by itself, along with changes to the other functions.

Release notes added, should be good to go.

Contributor

@mhvk mhvk left a comment

Only one annoying tidbit, otherwise good to go

@eric-wieser eric-wieser force-pushed the fix-argsort-vs-sort branch from 37bb166 to ee90efc Compare March 7, 2017 17:35
@eric-wieser
Member Author

Done, I think - just waiting on tests.

@mhvk mhvk merged commit 6a3edf3 into numpy:master Mar 7, 2017
@eric-wieser
Member Author

> A question is whether you wouldn't rather use #8714

I've updated #8714 to make that fix

Successfully merging this pull request may close these issues.

Unexpected numpy.unique behavior with return_inverse=True on uint8 dtype
3 participants