Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Use masked stack to preserve mask info #24732

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 15, 2022

Conversation

chahak13
Copy link
Contributor

PR Summary

As pointed out in #24545 , scatter was not masking datetime correctly. This issue would be true for any data with units because of the following lines

xs = self.convert_xunits(offsets[:, 0])
ys = self.convert_yunits(offsets[:, 1])
offsets = np.column_stack([xs, ys])

Whenever self.have_units() was True, it would use np.column_stack instead of np.ma.column_stack which would lead to the loss of mask information. Changing that to the masked version resolves #24545 .

PR Checklist

Documentation and Tests

  • Has pytest style unit tests (and pytest passes)
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).
  • New plotting related features are documented with examples.

Release Notes

  • New features are marked with a .. versionadded:: directive in the docstring and documented in doc/users/next_whats_new/
  • API changes are marked with a .. versionchanged:: directive in the docstring and documented in doc/api/next_api_changes/
  • Release notes conform with instructions in next_whats_new/README.rst or next_api_changes/README.rst

@oscargus
Copy link
Member

Can you please add a test (like just copy the one in #24545)?

(This seems to be the correct fix, although my knowledge of masked is very limited...)

@rcomer
Copy link
Member

rcomer commented Dec 15, 2022

Nice detective work @chahak13! Could you add a test that would have failed without this change? Maybe adapt the example from #24545 into an image test.

[Edit: cross-post with @oscargus, but it is good to see we agreed!]

@oscargus
Copy link
Member

oscargus commented Dec 15, 2022

Btw, the same thing cannot happen here as well?

paths.append(mpath.Path(np.column_stack([xs, ys]), path.codes))

But maybe paths cannot have units or masked arrays to start with?

@oscargus
Copy link
Member

oscargus commented Dec 15, 2022

It may be that this is actually better solved in set_offsets:

self._offsets = np.column_stack(
(np.asarray(self.convert_xunits(offsets[:, 0]), float),
np.asarray(self.convert_yunits(offsets[:, 1]), float)))

which would then remove the need for an additional unit conversion etc. Although one probably should move the logic of get_offsets for None values to set_offsets and call set_offsets from the constructor. (Have not checked all the consequences and that doesn't really have to be part of this PR...) Edit: no, but either set self._offset in constructor in case of None or call set_offsets if not None. But not sure if it is a good idea as it is not clear when masked is wanted or not as you point out @chahak13 . But it seems a bit inconsistent anyway.

Anyway, probably better to get this in and then spend some effort trying to make it more unified. The risk is that it will take quite a bit of time to untangle everything...

@chahak13
Copy link
Contributor Author

@oscargus good point. This might actually be a more deep-rooted issue. I checked Path and it does take in masked arrays (which it then converts to NaNs) so that should also use masked array functions. Also, as you pointed out, this is probably something that should be set directly in set_offsets. Not all functions use the masked versions, maybe it is worthwhile to put in effort to recognize and change those?

@chahak13
Copy link
Contributor Author

I've added a test and baseline images for this for now. This should resolve the current issue but I'll try to dig into the usage of masked data later. Let me know if there's something missing!

@oscargus
Copy link
Member

Thanks!

Maybe, for the sake of it, add a @check_figures_equal test where the same is used for one of the figures and the other masks the x-data instead?

(It may be enough to only have one of the format, e.g., png, but that would require a rebase with force push as the cleanliness test will become unhappy otherwise.)

@oscargus
Copy link
Member

Oh, and if there is a way to actually get a masked Path with units to test with that would be great, but shouldn't be a show stopper for now.

@chahak13
Copy link
Contributor Author

@oscargus @rcomer, need advice. I was working on the @check_figures_equal test and it wasn't working correctly if I masked the datetime array instead of y. On further digging, this goes into the maintenance issue that Oscar just opened.

TLDR: date2num converts the masked data array to a normal numpy array which is why the test was failing. What do you suggest I should do? I can't write the test to compare the two cases of masking without this.


if munits._is_natively_supported(x):
return x
if self.converter is None:
self.converter = munits.registry.get_converter(x)
if self.converter is None:
return x
try:
ret = self.converter.convert(x, self.units, self)

The reason the fix that I made worked for this case is that the mask was on a numeric field. For all the non-number types, _is_natively_supported() returns False and then asks the axis' converter to convert the data. The data contains the mask information until this point, so every converter in the registry should be able to handle the masked array by either using masked array functions or raising a warning about it to ensure that the user doesn't expect the masked information in the plot.

@oscargus
Copy link
Member

I'd say skip the equal test then for now. It may be a deep rabbit hole...

I'll add a comment to #24733 pointing this out. Probably the right thing is to support masked arrays in date2num, but that may lead to other strange things...

@chahak13
Copy link
Contributor Author

Understood, thanks. I'll resolve the 2 comments on the review then.

@chahak13 chahak13 force-pushed the 24545_scatter_mask_datetime branch from 6c59932 to 9001a20 Compare December 15, 2022 14:53
@chahak13
Copy link
Contributor Author

That should do it. Let me know if there's anything needed!

@oscargus oscargus mentioned this pull request Dec 15, 2022
6 tasks
Copy link
Member

@rcomer rcomer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chahak13, this looks good to me (assuming CI passes, but I can't see a reason it wouldn't).

@oscargus
Copy link
Member

Thanks! FYI, it wasn't that tricky to get masked dates to pass: #24734

But we can add the image equal test when digging into the other aspects later.

Copy link
Member

@oscargus oscargus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! Looks good! Anyone can merge on green.

@chahak13
Copy link
Contributor Author

Yes, it was pretty straight forward! I had the fix on my local machine but just wasn't sure if we wanted to include that in this or would that be out of scope for this particular PR. Thanks though!

@oscargus
Copy link
Member

Ahh, sorry about that then. Just didn't want you to go down a potential rabbit hole by encouraging it...

@rcomer rcomer added this to the v3.6.3 milestone Dec 15, 2022
@rcomer rcomer added PR: bugfix Pull requests that fix identified bugs and removed status: needs tests labels Dec 15, 2022
@rcomer rcomer merged commit 7e32745 into matplotlib:main Dec 15, 2022
meeseeksmachine pushed a commit to meeseeksmachine/matplotlib that referenced this pull request Dec 15, 2022
oscargus added a commit that referenced this pull request Dec 15, 2022
…732-on-v3.6.x

Backport PR #24732 on branch v3.6.x (Use masked stack to preserve mask info)
@ksunden ksunden mentioned this pull request Feb 20, 2023
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: bugfix Pull requests that fix identified bugs topic: units and array ducktypes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: matplotlib.pyplot.scatter does not respect mask rules with datetime
3 participants