-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
BUG: Fix argsort vs sort in Masked arrays #8678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
indexing='ij') | ||
indx[axis] = sindx | ||
return a[indx] | ||
a.sort(axis=axis, kind=kind, order=order) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a choice we need to make here. Right now (and before this patch), np.ma.sort
sometimes returns an ma.array
, and sometimes an ndarray
.
Should we change it to always promote to ma.array
, for consistency with the other functions? (Which do this as of #8665 )
f498c2c
to
34a02d7
Compare
self._data.flat = tmp_data | ||
self._mask.flat = tmp_mask | ||
return | ||
self[...] = self[idx] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory we could do a little better here, by implementing the in-place sort in terms of the non-in place. Right now (and before this patch), the latter does a redundant copy. So either way, best left for another PR, I think
numpy/ma/core.py
Outdated
@@ -5213,7 +5213,8 @@ def round(self, decimals=0, out=None): | |||
out.__setmask__(self._mask) | |||
return out | |||
|
|||
def argsort(self, axis=None, kind='quicksort', order=None, fill_value=None): | |||
def argsort(self, axis=-1, kind='quicksort', order=None, | |||
endwith=True, fill_value=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
endwith needs to be after fill_value to not break the api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing axis default also probably breaks api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not changing it breaks liskov substitution though. The default values should really match ndarray
. Fair call on swapping argument order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be nice if they matched but unfortunately that ship has sailed, this default argument is part of our api and we can't change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@juliantaylor: Is there a deprecation path for changing it then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding named argument order, perhaps we should enforce that endwith
is a boolean, just to help guard against someone (stupidly) passing unnamed arguments, and assuming they match the order of sort
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a way we could deprecate the axis argument value besides adding an argsort2 and deprecating the old one.
sort has a different order? that is unfortunate ...
I guess we could try to get away with that change, usage of positional arguments for the ones at the end are probably not very common in the wild.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, argsort
is documented as having a default argument of -1
, even though the actual default is None
. So can we update the API to match the documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re deprecation:
if axis is np._NoValue:
if self.ndim > 1:
warn("Unlike np.argsort and the documentation for this function, the"
"default axis argument is None, not -1. This only matters for 2 or"
"higher-dimensional arrays. If this is intended, pass it explicitly to squash this warning")
axis = None
34a02d7
to
b5c4f9a
Compare
@juliantaylor: OK, clearly changing the axis default does not belong in this PR. I've created an issue about that at #8701. I think I agree with you that using positional arguments is unlikely in the wild for |
filler = maximum_fill_value(self) | ||
else: | ||
filler = fill_value | ||
idx = list(np.ix_(*[np.arange(x) for x in self.shape])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like these shortcuts very much, the explicit meshgrid is easier to read imo
there isn't really an advantage to using ix_
is there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really a shortcut though? The two functions are independant, with neither calling the other. Meshgrid doesn't do what we need by default, but ix_
does.
Furthermore, ix_
is used as an indexer in its examples, whereas meshgrid is used to evaluate functions. The former use is what we want here.
@eric-wieser - Overall, this looks good. The main possible issue I see is that in adding the additional All this means it is a bit of an API change, so definitely needs an entry in the release notes (and arguably a note to the mailing list). |
I don't really want a feature needing discussion to hold up a bugfix.
@juliantaylor seemed to think we could get away with it (hidden comment) as well
I'll update this accordingly then
Will do |
OK, sounds good. I do think this has gone beyond a bug fix (ie., MAINT as well), which I why I set the milestone to 1.13. (I think a bug fix would only touch the default fill value, but do not really think this is worth implementing). |
06c40c3
to
37bb166
Compare
Actually, I'm backtracking on this. While I think that all the Release notes added, should be good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one annoying tidbit, otherwise good to go
Previously, these had different rules for unmasking values, and even different arguments to decide how to do so. Fixes numpy#8664
37bb166
to
ee90efc
Compare
Done, I think - just waiting on tests. |
|
This fixes #8664, at the cost of a behavioural change in
argsort
-- masked values are now sorted to the end by default, whereas before they could end up being placed somewhere arbitrary in the middle. This brings the behaviour ofargsort
inline withsort
.In particular,
argsort
would use the default fill value, which forint8
types would be999999 % 256 = 63
. This sorted very misleadingly.sort
made the more sensible choice of using the maximum/minimum possible value.This also fixes code duplication between the method and function forms of these operations