

Allow using masked in set_offsets #24757


Merged: 5 commits into matplotlib:main, Dec 21, 2022

Conversation

chahak13 (Contributor)

PR Summary

scatter allows masked arrays as input data and respects the mask by not plotting the masked points (#24545, #24732, #24733). But if the offsets are later updated through set_offsets and the input data is a masked array, the mask information is lost and the new plot contains all the points. A quick example:

import numpy as np
import matplotlib.pyplot as plt

x = np.ma.array([1, 2, 3, 4, 5], mask=[0, 0, 1, 1, 0])
y = np.arange(1, 6)

fig, ax = plt.subplots()
scat = ax.scatter(x, y)

x = x / 2
scat.set_offsets(np.ma.column_stack([x, y]))
ax.set_xlim(0, 6)
plt.show()

Current behaviour (screenshot): all points are drawn after set_offsets, including the masked ones.

Proposed behaviour (screenshot): the masked points stay hidden after set_offsets.

This is a small effort along the same lines as #24733.

PR Checklist

Documentation and Tests

  • Has pytest style unit tests (and pytest passes)
  • [n/a] Documentation is sphinx and numpydoc compliant (the docs should build without error).
  • [n/a] New plotting related features are documented with examples.

Release Notes

  • [n/a] New features are marked with a .. versionadded:: directive in the docstring and documented in doc/users/next_whats_new/
  • [n/a] API changes are marked with a .. versionchanged:: directive in the docstring and documented in doc/api/next_api_changes/
  • [n/a] Release notes conform with instructions in next_whats_new/README.rst or next_api_changes/README.rst

@greglucas (Contributor) left a comment:


I don't think we should be using np.ma namespace functions when we don't need to. The current call will force the offsets to always be a MaskedArray even if a plain NumPy array was passed in.

The comment is attached to this statement in set_offsets:

self._offsets = np.ma.column_stack(
    (np.asarray(self.convert_xunits(offsets[:, 0]), float),
     np.asarray(self.convert_yunits(offsets[:, 1]), float)))

Suggested change:

-    self._offsets = np.ma.column_stack(
+    self._offsets = np.column_stack(

chahak13 (Contributor Author)

I understand your concern, but using np.column_stack loses all the mask information when offsets is masked, which is the reason for this PR. I could add a condition to separate the two cases, so that masked arrays are used only when the input is masked and not otherwise. How does that sound?
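The mask-dropping behaviour described here is easy to reproduce in isolation. This is a small plain-NumPy sketch, independent of the PR's actual code, showing that np.column_stack discards the mask while np.ma.column_stack keeps it:

```python
import numpy as np

x = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
y = np.array([4.0, 5.0, 6.0])

# np.column_stack goes through the plain concatenation path, which
# ignores the mask on x: the fact that element 1 was masked is lost.
plain = np.column_stack([x, y])

# np.ma.column_stack keeps the mask attached to the result.
masked = np.ma.column_stack([x, y])

print(np.ma.is_masked(plain))            # False: the mask was dropped
print(np.ma.is_masked(masked))           # True: the mask survived
print(np.ma.getmaskarray(masked)[:, 0])  # [False  True False]
```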

Member

There is (at least) one other case where the conditional approach is used for exactly this, so I think it makes sense here as well.

Contributor

Ah, I didn't realize that np.column_stack didn't handle the mask properly. It is a bit of a mess in numpy that needs sorting out over there, unfortunately. See numpy/numpy#16022 for a PR where I tried to make the handling of all these functions a bit more uniform; it got reverted and I never had the time to finish it up. Maybe this will give me the motivation to revisit that.

Yes, I do think that special-casing masked vs. normal arrays is the right way to go here. Another check you can add as a test is to verify that the type passed to set_offsets() matches the type returned by get_offsets(), so we aren't converting an input numpy array into a MaskedArray either.

Member

@greglucas what is the community feeling on masked arrays? My impression is that practically they are not used, in favour of NaN for missing/bad data. Xarray, which is the basis for much of my analysis, doesn't even have the concept of masked arrays.

More practically I think some of this should wait for whatever unit refactor will happen. How we input this and represent this internally should be handled by the same machinery and I'm pretty negative about casting everything to masked arrays. If anything I'd go the other way and strip the mask and enter NaN.

Contributor

> what is the community feeling on masked arrays?

My impression is that everyone thinks there should be something better/more robust than np.ma.MaskedArray, but it may be tricky to make a transition over to a replacement version because of some of the baked in assumptions that are already in the module.

> How we input this and represent this internally should be handled by the same machinery and I'm pretty negative about casting everything to masked arrays.

I completely agree. I don't like casting things to MaskedArrays by default either (hence my block on this as it currently stands).

> If anything I'd go the other way and strip the mask and enter NaN.

I also don't like this approach, because we would be changing the user's data type again. I'm hoping the data refactor work can solve this as well, if we just point to a user's data rather than copying it somewhere new. Another reason not to cast to NaN by default is that we use np.mean(arr) (and similar non-nan functions) throughout the library rather than np.nanmean(arr), and I don't think we want to change all of those call sites around either.
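The np.mean point can be seen in a tiny plain-NumPy example, independent of matplotlib: masked statistics work through the ordinary method, while NaN-filled data poisons the plain function and would require switching every call site to the nan-aware variant.

```python
import numpy as np

masked = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
as_nan = np.array([1.0, np.nan, 3.0])

print(masked.mean())       # 2.0 -- the masked entry is skipped
print(np.mean(as_nan))     # nan -- plain mean is poisoned by the NaN
print(np.nanmean(as_nan))  # 2.0 -- only the nan-aware variant recovers it
```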

chahak13 (Contributor Author)

> More practically I think some of this should wait for whatever unit refactor will happen. How we input this and represent this internally should be handled by the same machinery and I'm pretty negative about casting everything to masked arrays. If anything I'd go the other way and strip the mask and enter NaN.

I understand the concerns with masked arrays, but I do think we should include this change, mainly for consistency and user expectations. We use the np.ma functions only if the input is masked, but in my opinion the behaviour shouldn't be that we lose the mask entirely, as currently happens.

Contributor

@ksunden, just pinging you here for background since you're working on the new data and units model. The MaskedArray handling throughout the code is something you might want to look at as well.

greglucas previously approved these changes Dec 19, 2022
@greglucas (Contributor) left a comment:

Some minor comments, but overall I think this looks good and keeps the instance passed in.

@greglucas (Contributor)

It looks like the docs failure is real in animation/unchained.py

greglucas dismissed their stale review on December 19, 2022 at 14:31: "doc failure is real"

@oscargus (Member)

The doc failure is real, but unfortunately on the main branch as well (so not caused by this PR).

@tacaswell tacaswell merged commit 2b5b925 into matplotlib:main Dec 21, 2022
5 participants