Allow using masked in set_offsets #24757
Conversation
I don't think we should be using `np.ma` namespace functions when we don't need to. The current call will force the offsets to always be a `MaskedArray` even if a normal NumPy array was passed in.
lib/matplotlib/collections.py (outdated)
-        self._offsets = np.column_stack(
+        self._offsets = np.ma.column_stack(
             (np.asarray(self.convert_xunits(offsets[:, 0]), float),
              np.asarray(self.convert_yunits(offsets[:, 1]), float)))
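For illustration (not part of the PR): a minimal check showing that `np.ma.column_stack` promotes plain ndarray inputs to a masked array, which is the behaviour being objected to here:

```python
import numpy as np

x = np.array([0.0, 1.0])
y = np.array([2.0, 3.0])

stacked = np.ma.column_stack((x, y))
# Even though both inputs are plain ndarrays, the result is a MaskedArray.
print(isinstance(stacked, np.ma.MaskedArray))  # True
```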
Suggested change:
-        self._offsets = np.ma.column_stack(
+        self._offsets = np.column_stack(
I understand your concern, but using `np.column_stack` loses all the mask information if `offsets` is masked (the reason for this PR). I guess I could put it in an if condition to separate the two cases, to make sure that masked arrays are used only when the inputs are masked and not otherwise. How does that sound?
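A minimal sketch of the conditional approach just described, using a hypothetical helper name (not the PR's actual code):

```python
import numpy as np

def _stack_offsets(x, y):
    # Hypothetical helper: only use the np.ma stacker when an input is
    # actually a masked array, so plain ndarrays stay plain ndarrays.
    if isinstance(x, np.ma.MaskedArray) or isinstance(y, np.ma.MaskedArray):
        return np.ma.column_stack((x, y))
    return np.column_stack((x, y))
```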
There is (at least) one other case where the conditional approach is used for exactly this, so I think it makes sense here as well.
Ahh, I didn't realize that `np.column_stack` didn't handle the mask properly. It is a bit of a mess in numpy that needs sorting out over there, unfortunately. See a PR where I tried to make the handling of all these functions a bit more uniform: numpy/numpy#16022, but it got reverted and I never had the time to finish it up. Maybe this will give me the motivation to revisit that.
Yes, I do think that special-casing masked vs. normal arrays is the right way to go here. Another check you can put in as a test is to verify that the type passed to `set_offsets()` matches the type returned by `get_offsets()`, so we aren't changing an input numpy array into a `MaskedArray`.
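A sketch of the kind of round-trip test being suggested, assuming the conditional fix is in place (names and structure are illustrative, not the PR's actual test):

```python
import numpy as np
import matplotlib.pyplot as plt

def test_set_offsets_preserves_array_type():
    fig, ax = plt.subplots()
    coll = ax.scatter([0, 1], [0, 1])

    plain = np.array([[0.0, 0.0], [1.0, 1.0]])
    coll.set_offsets(plain)
    # A plain ndarray in should not come back as a MaskedArray...
    assert not isinstance(coll.get_offsets(), np.ma.MaskedArray)

    masked = np.ma.masked_array(plain, mask=[[False, False], [True, False]])
    coll.set_offsets(masked)
    # ...and a masked array in should keep its mask on the way out.
    assert isinstance(coll.get_offsets(), np.ma.MaskedArray)
```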
@greglucas what is the community feeling on masked arrays? My impression is that, in practice, they are not used, in favour of NaN for missing/bad data. Xarray, which is the basis for much of my analysis, doesn't even have the concept of masked arrays.
More practically, I think some of this should wait for whatever unit refactor will happen. How we input this and represent this internally should be handled by the same machinery, and I'm pretty negative about casting everything to masked arrays. If anything I'd go the other way and strip the mask and enter NaN.
> what is the community feeling on masked arrays?

My impression is that everyone thinks there should be something better/more robust than `np.ma.MaskedArray`, but it may be tricky to make a transition over to a replacement version because of some of the baked-in assumptions that are already in the module.
> How we input this and represent this internally should be handled by the same machinery, and I'm pretty negative about casting everything to masked arrays.

I completely agree. I don't like casting things to `MaskedArray`s by default either (hence my block on this as it currently stands).
> If anything I'd go the other way and strip the mask and enter NaN.

I also don't like this approach, because we are changing the user's data type again. I'm hoping this can be solved by the Data Refactor work as well, if we just point to a user's data rather than copying it somewhere new. A reason not to do default NaN casting is that you then run into another wall: we probably use `np.mean(arr)` throughout the library rather than `np.nanmean(arr)` (and similar non-NaN functions), and I don't think we want to change all of those call sites either.
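For example (illustrative only), a NaN poisons the plain reduction, while a masked array keeps working through the same `np.mean` call site:

```python
import numpy as np

arr = np.array([1.0, 2.0, np.nan])
print(np.mean(arr))     # nan -- every such call site would need np.nanmean
print(np.nanmean(arr))  # 1.5

marr = np.ma.masked_invalid(arr)
print(np.mean(marr))    # 1.5 -- np.mean defers to the masked-array mean
```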
> More practically, I think some of this should wait for whatever unit refactor will happen. How we input this and represent this internally should be handled by the same machinery, and I'm pretty negative about casting everything to masked arrays. If anything I'd go the other way and strip the mask and enter NaN.

I understand the concerns with masked arrays, but I do think we should include this change in the library, mainly because of consistency and user expectations. Sure, we use the `np.ma` functions only if the input is masked, but the behaviour shouldn't be that we lose the mask entirely, imo (as currently happens).
@ksunden, just pinging you here for background since you're working on the new data and units model. The `MaskedArray` handling throughout the code is something you might want to look at as well.
Some minor comments, but overall I think this looks good and keeps the instance passed in.
It looks like the docs failure is real in `animation/unchained.py`
The doc failure is real, but unfortunately on the
PR Summary
`scatter` allows using masked arrays as data, and it respects the mask by not plotting the masked points (#24545, #24732, #24733). But if the offsets are updated using `set_offsets` and the input data is a masked array, the mask information is lost in `set_offsets` and the new plot contains all the points. A quick example (a reproduction sketch follows the screenshots):
Current behaviour:

Proposed behaviour:

This is a small effort along the lines of #24733.
PR Checklist
Documentation and Tests
Has pytest style unit tests (and `pytest` passes)
Release Notes
New features are marked with a `.. versionadded::` directive in the docstring and documented in `doc/users/next_whats_new/`
API changes are marked with a `.. versionchanged::` directive in the docstring and documented in `doc/api/next_api_changes/`
Release notes conform with the instructions in `next_whats_new/README.rst` or `next_api_changes/README.rst`