Allow using masked in set_offsets #24757
Conversation
I don't think we should be using `np.ma` namespace functions when we don't need to. The current call will force the offsets to always be a `MaskedArray` even if a normal NumPy array was passed in.
lib/matplotlib/collections.py (outdated)
-        self._offsets = np.column_stack(
+        self._offsets = np.ma.column_stack(
             (np.asarray(self.convert_xunits(offsets[:, 0]), float),
              np.asarray(self.convert_yunits(offsets[:, 1]), float)))
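For illustration (not part of the PR): a minimal check showing that `np.ma.column_stack` promotes plain ndarray inputs to a masked array, which is the behaviour being objected to here:

```python
import numpy as np

x = np.array([0.0, 1.0])
y = np.array([2.0, 3.0])

stacked = np.ma.column_stack((x, y))
# Even though both inputs are plain ndarrays, the result is a MaskedArray.
print(isinstance(stacked, np.ma.MaskedArray))  # True
```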
Suggested change:
-        self._offsets = np.ma.column_stack(
+        self._offsets = np.column_stack(
I understand your concern, but using `np.column_stack` loses all the mask information if `offsets` is masked (the reason for this PR). I guess I could put it in an if condition to separate the two cases, to make sure that masked arrays are used only when the inputs are masked and not otherwise. How does that sound?
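A minimal sketch of the conditional approach just described, using a hypothetical helper name (not the PR's actual code):

```python
import numpy as np

def _stack_offsets(x, y):
    # Hypothetical helper: only use the np.ma stacker when an input is
    # actually a masked array, so plain ndarrays stay plain ndarrays.
    if isinstance(x, np.ma.MaskedArray) or isinstance(y, np.ma.MaskedArray):
        return np.ma.column_stack((x, y))
    return np.column_stack((x, y))
```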
There is (at least) one other case where the conditional approach is used for exactly this, so I think it makes sense here as well.
Ahh, I didn't realize that `np.column_stack` didn't handle the mask properly. It is a bit of a mess in numpy that needs sorting out over there, unfortunately. See a PR where I tried to make the handling of all these functions a bit more uniform: numpy/numpy#16022, but it got reverted and I never had the time to finish it up. Maybe this will give me the motivation to revisit that.
Yes, I do think that special-casing masked vs. normal arrays is the right way to go here. Another check you can put in as a test is to verify that the type passed to `set_offsets()` matches the type returned by `get_offsets()`, so we aren't changing an input numpy array into a `MaskedArray`.
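A sketch of the kind of round-trip test being suggested, assuming the conditional fix is in place (names and structure are illustrative, not the PR's actual test):

```python
import numpy as np
import matplotlib.pyplot as plt

def test_set_offsets_preserves_array_type():
    fig, ax = plt.subplots()
    coll = ax.scatter([0, 1], [0, 1])

    plain = np.array([[0.0, 0.0], [1.0, 1.0]])
    coll.set_offsets(plain)
    # A plain ndarray in should not come back as a MaskedArray...
    assert not isinstance(coll.get_offsets(), np.ma.MaskedArray)

    masked = np.ma.masked_array(plain, mask=[[False, False], [True, False]])
    coll.set_offsets(masked)
    # ...and a masked array in should keep its mask on the way out.
    assert isinstance(coll.get_offsets(), np.ma.MaskedArray)
```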
@greglucas what is the community feeling on masked arrays? My impression is that, in practice, they are not used, in favour of NaN for missing/bad data. Xarray, which is the basis for much of my analysis, doesn't even have the concept of masked arrays.
More practically, I think some of this should wait for whatever unit refactor will happen. How we input this and represent this internally should be handled by the same machinery, and I'm pretty negative about casting everything to masked arrays. If anything I'd go the other way and strip the mask and enter NaN.
> what is the community feeling on masked arrays?

My impression is that everyone thinks there should be something better/more robust than `np.ma.MaskedArray`, but it may be tricky to make a transition over to a replacement version because of some of the baked-in assumptions that are already in the module.
> How we input this and represent this internally should be handled by the same machinery, and I'm pretty negative about casting everything to masked arrays.

I completely agree. I don't like casting things to `MaskedArray`s by default either (hence my block on this as it currently stands).
> If anything I'd go the other way and strip the mask and enter NaN.

I also don't like this approach, because we are changing the user's data type again. I'm hoping this can be solved by the Data Refactor work as well, if we just point to a user's data rather than copying it somewhere new. A reason not to do default NaN casting is that you then run into another wall: we probably use `np.mean(arr)` throughout the library rather than `np.nanmean(arr)` (and similar non-NaN functions), and I don't think we want to change all of those call sites either.
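For example (illustrative only), a NaN poisons the plain reduction, while a masked array keeps working through the same `np.mean` call site:

```python
import numpy as np

arr = np.array([1.0, 2.0, np.nan])
print(np.mean(arr))     # nan -- every such call site would need np.nanmean
print(np.nanmean(arr))  # 1.5

marr = np.ma.masked_invalid(arr)
print(np.mean(marr))    # 1.5 -- np.mean defers to the masked-array mean
```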
> More practically, I think some of this should wait for whatever unit refactor will happen. How we input this and represent this internally should be handled by the same machinery, and I'm pretty negative about casting everything to masked arrays. If anything I'd go the other way and strip the mask and enter NaN.

I understand the concerns with masked arrays, but I do think we should include this change in the library, mainly because of consistency and user expectations. Sure, we use the `np.ma` functions only if the input is masked, but the behaviour shouldn't be that we lose the mask entirely, imo (as currently happens).
@ksunden, just pinging you here for background since you're working on the new data and units model. The `MaskedArray` handling throughout the code is something you might want to look at as well.
Some minor comments, but overall I think this looks good and keeps the instance passed in.
It looks like the docs failure is real in `animation/unchained.py`
The doc failure is real, but unfortunately on the
PR Summary
`scatter` allows using masked arrays as data, and it respects the mask by not plotting the masked points (#24545, #24732, #24733). But if the offsets are updated using `set_offsets` and the input data is a masked array, the mask information is lost in `set_offsets` and the new plot contains all the points. A quick example (a reproduction sketch follows the screenshots):
Current behaviour:

Proposed behaviour:

This is a small effort along the lines of #24733.
PR Checklist
Documentation and Tests
Has pytest style unit tests (and `pytest` passes)
Release Notes
New features are marked with a `.. versionadded::` directive in the docstring and documented in `doc/users/next_whats_new/`
API changes are marked with a `.. versionchanged::` directive in the docstring and documented in `doc/api/next_api_changes/`
Release notes conform with the instructions in `next_whats_new/README.rst` or `next_api_changes/README.rst`