Fix check 1d #22141

Merged
merged 3 commits into from
Mar 3, 2022

Conversation

jklymak
Member

@jklymak jklymak commented Jan 7, 2022

PR Summary

Closes #22125 and #22330

This was an issue with handling

x = pd.Series(x, dtype="Float32")  

in the logic of our plot argument parsing.

x.values returns a FloatingArray. But it doesn't behave like a numpy array, in that x.values[:, None] raises a ValueError: values must be a 1D array. I actually think this is a pandas bug, but I'll leave that to someone with more pandas knowledge to inform them. However, we can get around it by using x.to_numpy(), which seems to work fine.
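A minimal sketch of the behaviour described above, assuming a pandas version that provides the nullable "Float32" dtype. Whether the indexing on .values actually fails depends on the installed pandas version, so the sketch guards that call; the explicit dtype and na_value arguments are a choice made for this illustration, not something taken from the PR.

```python
import numpy as np
import pandas as pd

# A Series backed by pandas' nullable (masked) float type, as in the report.
x = pd.Series([1, 2], dtype="Float32")

masked = x.values
print(type(masked).__name__)  # a pandas extension array, not numpy.ndarray

# Adding an axis on the extension array may fail; this is version-dependent.
try:
    masked[:, None]
    print("2D indexing on .values succeeded")
except ValueError as err:
    print(f"2D indexing on .values failed: {err}")

# Converting explicitly yields a plain ndarray that supports 2D indexing.
# dtype and na_value are passed so the result is float64 regardless of version.
arr = x.to_numpy(dtype="float64", na_value=np.nan)
column = arr[:, None]
print(type(arr).__name__, column.shape)
```

The point of the conversion is that everything downstream only ever sees an ordinary ndarray, whatever the original container was.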

PR Checklist

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).

Comment on lines 1752 to 1753
    for x in [pd.Series([1, 2], dtype="float64"),
              pd.Series([1, 2], dtype="Float32")]:
Member

Is this specifically for "Float32" or would the test also do with "Float64"? If it's only the custom type and precision does not matter, I'd go with "Float64" to communicate that we're primarily testing the custom pandas type.

Side-note: It seems the capital "Float" types are not yet documented. There's only a v1.2.0 change note and the GH issues linked therein.

Member Author

As far as I can tell it's undocumented. Maybe we should hold off on support, or maybe this PR can go in, but without the test?

I actually disagree with Pandas having a new type here - it seems people want this for pedantic reasons, but I think 99.999% of the world doesn't care if NaN means a failed computation or missing data. However, if they feel strongly, they should get numpy onboard, and then everyone will have this flag rather than making a new data type.

Member

I think any of their capital F floats will work.

@tacaswell tacaswell added this to the v3.5.2 milestone Jan 11, 2022
@@ -1649,7 +1650,7 @@ def index_of(y):
        The x and y values to plot.
    """
    try:
        return y.index.values, y.values
Member

Do we need both this change and the additional exception handling above?

Member Author

I don't know, but I think we should be in the habit of using to_numpy() when possible?

Member

to_numpy came in via pandas 0.24 on Jan 25, 2019, so we can rely on it being there, and https://stackoverflow.com/a/54508052/380231 makes an argument in favor of to_numpy.

I think this change may be unrelated (or fix this by chance), but I do not think it will avoid needing the other one, and it does no harm.

tacaswell
tacaswell previously approved these changes Jan 21, 2022
@jklymak
Member Author

jklymak commented Jan 27, 2022

@tacaswell, thanks for your review, but dismissing as this approach is completely different, and attempts to remove the pandas-ness of the data right at the beginning.

@jklymak jklymak dismissed tacaswell’s stale review January 27, 2022 12:41

Approach now is completely different, so requires a re-review

@jklymak jklymak added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Jan 27, 2022
Member

@tacaswell tacaswell left a comment

I have some concerns (mostly because any time we change complicated code we find that the complexity was there to handle some weird corner case that failed to get a test), but overall am 👍 on this as I think this will also fix other data containers. I think there is a risk that something subtle will break, but we are already dealing with something subtle breaking and I am optimistic that the cure will not be worse than the malady.

If you think of a DF as a 2D array with columns, then this is consistent with how we handle 2D arrays being passed into plot.

@jklymak
Member Author

jklymak commented Jan 27, 2022

This gets used in _plot_args, usually via index_of (which adds a y if it is missing).

    if len(xy) == 2:
        x = _check_1d(xy[0])
        y = _check_1d(xy[1])
    else:
        x, y = index_of(xy[-1])
    if self.axes.xaxis is not None:
        self.axes.xaxis.update_units(x)
    if self.axes.yaxis is not None:
        self.axes.yaxis.update_units(y)

and it gets called before we check units. So I guess what I am proposing here would strip an object that has a to_numpy method but somehow was being used by somebody for units.

This does give me pause that we have maybe done the wrong thing here, and are using _check_1d to both coerce a singleton to an array and to convert pandas series and dataframes to arrays. Maybe the right thing to do is just return the DataFrame, run through the converter, and then coerce to numpy if it is not already.

Meh, it has done this conversion for a long time.
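To make the two roles discussed in this comment concrete, here is a rough sketch of a helper that first strips any object exposing to_numpy() down to an array and then coerces scalars to 1D. The names check_1d_sketch and FakeSeries are hypothetical and exist only for this illustration; this is not a copy of matplotlib's _check_1d.

```python
import numpy as np


def check_1d_sketch(x):
    """Hypothetical stand-in for a _check_1d-style helper (not matplotlib's code)."""
    # Role 1: strip container-specific behaviour if the object offers to_numpy().
    if hasattr(x, "to_numpy"):
        x = x.to_numpy()
    # Role 2: scalars and other shapeless inputs become 1D arrays;
    # anything that already has at least one dimension passes through.
    if not hasattr(x, "shape") or len(x.shape) < 1:
        return np.atleast_1d(x)
    return x


class FakeSeries:
    """Toy container exposing to_numpy(), standing in for a pandas Series."""

    def __init__(self, data):
        self._data = np.asarray(data, dtype=float)

    def to_numpy(self):
        return self._data


scalar = check_1d_sketch(3.0)
series = check_1d_sketch(FakeSeries([1, 2]))
matrix = check_1d_sketch(np.zeros((2, 3)))
print(scalar.shape, series.shape, matrix.shape)
```

Because the conversion happens before any unit handling, an object that carries unit information and also defines to_numpy() would lose that information here, which is the concern raised above.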

@jklymak
Member Author

jklymak commented Feb 21, 2022

@tacaswell, this simplified a bit more since your approval - if you wanted to double check, that would be appreciated.

@oscargus
Member

In #22560 I added cbook._unpack_pandas (so _unpack_pandas for this purpose) which basically does to_numpy(), but also with a fallback to values. I did not touch the lines you changed here though.

I guess it can make sense to have a coordinated merge of this and that PR. If this is merged first, I'll update my PR. If my PR is merged first, it can make sense to use that function here.
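The "to_numpy() with a fallback to values" idea could look roughly like the sketch below. The helper name unpack_to_ndarray and the two toy classes are hypothetical and are not the function added in #22560; the isinstance check on .values is an assumption made here so that unrelated objects with a values attribute (a dict, for instance) are left alone.

```python
import numpy as np


def unpack_to_ndarray(x):
    """Hypothetical helper: prefer to_numpy(), fall back to .values."""
    if isinstance(x, np.ndarray):
        return x
    if hasattr(x, "to_numpy"):
        return x.to_numpy()
    # Fallback for older containers; only trust .values if it is an ndarray.
    values = getattr(x, "values", None)
    if isinstance(values, np.ndarray):
        return values
    # Anything else is returned unchanged for the caller to handle.
    return x


class NewStyle:
    """Toy container that offers to_numpy()."""

    def to_numpy(self):
        return np.array([1.0, 2.0])


class OldStyle:
    """Toy container that only offers a .values attribute."""

    values = np.array([3.0, 4.0])


new_result = unpack_to_ndarray(NewStyle())
old_result = unpack_to_ndarray(OldStyle())
mapping = {"a": 1}
dict_result = unpack_to_ndarray(mapping)  # dict.values is a method, so untouched
print(new_result, old_result, dict_result)
```

Keeping the lookup order in one function means a change in preference, or support for another method name, only has to be made in a single place.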

@jklymak
Member Author

jklymak commented Feb 27, 2022

Well first it's not only pandas that has to_numpy so that is a bit of a misnomer. But also, why have a separate method at all?

@oscargus
Member

It was suggested to use a separate function. Right now, slightly different approaches are used at different locations in the code: sometimes with a fallback to values, sometimes not. A benefit is that it is enough to modify it in a single location (and that old pandas versions are still supported with this fallback). Also, if we want to support new libraries that use some other name, it will be easy to do that. Well, all the standard reasons to factor out a piece of common code...

Regarding naming, I considered that, but I do not know which other libraries support that function. However, I do not see the name as either written in stone or something that should prohibit using a single function; there will be a name that is correct enough. (And as you can see, there are explicit comments mentioning pandas at all the other locations where it was used.)

@oscargus
Member

I then assume that we merge this first, find a good name for the function and, if you want to object strongly to using the function here, you can do that in #22560.

@jklymak
Member Author

jklymak commented Feb 27, 2022

I meant why have a separate method than check_1d? Our problem is inconsistent duck typing, so if we can have it all in one spot that would be very helpful. If check_1d does more than duck type pandas, then sure, it could call the duck-type converter.

I do actually wonder if all of this should just be part of the unit conversion machinery rather than cbook calls.

@oscargus
Member

oscargus commented Mar 1, 2022

I meant why have a separate method than check_1d?

I do not have enough of an overview of the code base to see if one should/could have used check_1d (or check_2d?) instead. But if possible, that is of course even better.

@tacaswell tacaswell merged commit 0359832 into matplotlib:main Mar 3, 2022
meeseeksmachine pushed a commit to meeseeksmachine/matplotlib that referenced this pull request Mar 3, 2022
QuLogic added a commit that referenced this pull request Mar 3, 2022
…141-on-v3.5.x

Backport PR #22141 on branch v3.5.x (Fix check 1d)
@jklymak jklymak deleted the fix-check-1d branch March 4, 2022 07:41
@mhvk mhvk mentioned this pull request May 4, 2022
6 tasks
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. third-party integration: pandas
5 participants