Fix check 1d #22141

jklymak · 2022-01-07T08:11:57Z

PR Summary

This was an issue with handling

x = pd.Series(x, dtype="Float32")

in the logic of our plot argument parsing.

x.values returns a FloatingArray, which doesn't behave like a numpy array: x.values[:, None] raises ValueError: values must be a 1D array. I actually think this is a pandas bug, but I'll leave it to someone with more pandas knowledge to report it. However, we can get around it by using x.to_numpy(), which seems to work fine.
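For readers who want to see the difference described above, here is a minimal sketch. It assumes a pandas version that provides the nullable "Float32" dtype (1.2 or later); the variable names are illustrative only. It deliberately does not reproduce the failing x.values[:, None] call, since that behaviour may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# A Series backed by pandas' nullable "Float32" extension dtype.
x = pd.Series([1.0, 2.0], dtype="Float32")

# .values hands back a pandas extension array, not a numpy.ndarray.
values = x.values
print(type(values).__name__)

# .to_numpy() returns a plain ndarray, so ndarray-style indexing such as
# [:, None] (adding a trailing axis) works on the result.
arr = x.to_numpy()
col = arr[:, None]
print(type(arr).__name__, col.shape)
```

The resulting dtype of the ndarray is left unchecked here on purpose, because it can depend on the pandas version and on whether the data contains missing values.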

PR Checklist

Tests and Styling

Has pytest style unit tests (and pytest passes).
Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

New features are documented, with examples if plot related.
New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
Documentation is sphinx and numpydoc compliant (the docs should build without error).

timhoffm · 2022-01-09T22:07:05Z

lib/matplotlib/tests/test_axes.py

+    for x in [pd.Series([1, 2], dtype="float64"),
+              pd.Series([1, 2], dtype="Float32")]:


Is this specifically for "Float32" or would the test also work with "Float64"? If it's only the custom type and precision does not matter, I'd go with "Float64" to communicate that we're primarily testing the custom pandas type.

Side-note: It seems the capital "Float" types are not yet documented. There's only a v1.2.0 change note and the GH issues linked therein.

As far as I can tell it's undocumented. Maybe we should hold off on support, or maybe this PR can go in, but without the test?

I actually disagree with Pandas having a new type here - it seems people want this for pedantic reasons, but I think 99.999% of the world doesn't care if NaN means a failed computation or missing data. However, if they feel strongly, they should get numpy onboard, and then everyone will have this flag rather than making a new data type.

I think any of their capital F floats will work.

tacaswell · 2022-01-12T15:10:17Z

lib/matplotlib/cbook/__init__.py

@@ -1649,7 +1650,7 @@ def index_of(y):
       The x and y values to plot.
    """
    try:
-        return y.index.values, y.values


Do we need both this change and the additional exception handling above?

I don't know, but I think we should be in the habit of using to_numpy() when possible?

to_numpy came in via pandas 0.24 on Jan 25, 2019, so we can rely on it being there, and https://stackoverflow.com/a/54508052/380231 makes an argument in favor of to_numpy.

I think this change may be un-related (or fix this by chance) but I do not think it will avoid needing the other one and does no harm.
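As a rough illustration of the change under discussion, the sketch below shows an index_of-style helper that prefers to_numpy() for both the index and the data. The names index_of_sketch and FakeSeries are hypothetical and exist only for this example; this is not the actual matplotlib implementation.

```python
import numpy as np


def index_of_sketch(y):
    """Return (x, y) as arrays; x defaults to 0..N-1 when y has no index."""
    try:
        # pandas-like objects: take both the index and the data as ndarrays.
        return y.index.to_numpy(), y.to_numpy()
    except AttributeError:
        # Lists, tuples and ndarrays end up here (a list's .index is a
        # method, which has no .to_numpy attribute).
        pass
    y = np.atleast_1d(np.asarray(y))
    return np.arange(y.shape[0], dtype=float), y


class FakeSeries:
    """Hypothetical stand-in for a pandas Series, for illustration only."""

    class _Index:
        def __init__(self, labels):
            self._labels = labels

        def to_numpy(self):
            return np.asarray(self._labels)

    def __init__(self, data, index):
        self._data = data
        self.index = self._Index(index)

    def to_numpy(self):
        return np.asarray(self._data, dtype=float)


xs, ys = index_of_sketch(FakeSeries([1.5, 2.5], index=[10, 20]))
xl, yl = index_of_sketch([3, 4, 5])
```

With the stand-in object the x values come from its index; with a plain list they are generated from the length of the data.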

lib/matplotlib/tests/test_axes.py

jklymak · 2022-01-27T12:40:21Z

@tacaswell, thanks for your review, but dismissing as this approach is completely different, and attempts to remove the pandas-ness of the data right at the beginning.

Approach now is completely different, so requires a re-review

lib/matplotlib/cbook/__init__.py

tacaswell

I have some concerns (mostly because any time we change complicated code we find that the complexity was there to handle some weird corner case that failed to get a test), but overall am 👍 on this as I think this will also fix other data containers. I think there is a risk that something subtle will break, but we are already dealing with something subtle breaking and I am optimistic that the cure will not be worse than the malady.

If you think of a DF as a 2D array with columns, then this is consistent with how we handle 2D arrays being passed into plot.

jklymak · 2022-01-27T15:28:13Z

This gets used in _plot_args, usually via index_of (which adds an x if it is missing).

matplotlib/lib/matplotlib/axes/_base.py

Lines 486 to 495 in f0593a6

    
    if len(xy) == 2:
        x = _check_1d(xy[0])
        y = _check_1d(xy[1])
    else:
        x, y = index_of(xy[-1])

    if self.axes.xaxis is not None:
        self.axes.xaxis.update_units(x)
    if self.axes.yaxis is not None:
        self.axes.yaxis.update_units(y)

and it gets called before we check units. So I guess what I am proposing here would strip an object that has a to_numpy method but somehow was being used by somebody for units.

This does give me pause that we have maybe done the wrong thing here, and are using _check_1d to both coerce a singleton to an array and to convert pandas series and dataframes to arrays. Maybe the right thing to do is just return the DataFrame, run through the converter, and then coerce to numpy if it is not already.

Meh, it has done this conversion for a long time.
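The approach discussed in this thread (drop the container as early as possible, then make sure the result is at least 1D) could look roughly like the sketch below. The names check_1d_sketch and HasToNumpy are hypothetical; this is an assumption about the shape of the logic, not a copy of the merged code.

```python
import numpy as np


def check_1d_sketch(x):
    """Convert to_numpy()-capable objects and scalars to arrays."""
    if hasattr(x, "to_numpy"):
        # Anything that can produce an ndarray is converted right away,
        # which also discards whatever else the container carried.
        x = x.to_numpy()
    if not hasattr(x, "shape") or len(x.shape) < 1:
        # Python scalars and 0-d arrays become 1-element arrays.
        return np.atleast_1d(x)
    # Arrays that are already 1D or higher pass through unchanged.
    return x


class HasToNumpy:
    """Hypothetical container exposing only to_numpy(), for illustration."""

    def to_numpy(self):
        return np.array([1.0, 2.0, 3.0])


from_scalar = check_1d_sketch(3.0)
from_container = check_1d_sketch(HasToNumpy())
two_d = np.zeros((2, 3))
passthrough = check_1d_sketch(two_d)
```

Because the conversion happens before update_units is called, a container that relied on its own type for unit handling would be reduced to a plain array first, which is the concern raised above.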

lib/matplotlib/cbook/__init__.py

jklymak · 2022-02-21T08:12:44Z

@tacaswell, this simplified a bit more since your approval - if you wanted to double check, that would be appreciated.

oscargus · 2022-02-27T12:14:53Z

In #22560 I added cbook._unpack_pandas for this purpose, which basically calls to_numpy(), but also falls back to values. I did not touch the lines you changed here though.

I guess it can make sense to have a coordinated merge of this and that PR. If this is merged first, I'll update my PR. If my PR is merged first, it can make sense to use that function here.

jklymak · 2022-02-27T14:39:46Z

Well first it's not only pandas that has to_numpy so that is a bit of a misnomer. But also, why have a separate method at all?

oscargus · 2022-02-27T15:06:16Z

It was suggested to use a separate function. Right now, slightly different approaches are used at different locations in the code: sometimes with a fallback to values, sometimes not. A benefit is that it is enough to modify it in a single location (and that old pandas versions are still supported with this fallback). Also, if we want to support new libraries that use some other name, it will be easy to do that. Well, all the standard reasons to factor out a piece of common code...

Regarding naming, I considered that, but I do not know which other libraries support that function. However, I do not see the name as written in stone, nor as something that should prohibit using a single function; there will be a name that is correct enough. (And as you can see, there are explicit comments mentioning pandas at all the other locations where it was used.)
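To make the factoring-out argument concrete, a single conversion helper with a fallback to values might look like the sketch below. The function name unpack_to_numpy_sketch and the two stand-in classes are hypothetical, chosen for illustration; the helper actually added in #22560 may differ in name and detail.

```python
import numpy as np


def unpack_to_numpy_sketch(x):
    """Return an ndarray for array-like containers, else x unchanged."""
    if isinstance(x, np.ndarray):
        return x
    if hasattr(x, "to_numpy"):
        # Preferred path: pandas >= 0.24 and other libraries offering it.
        return x.to_numpy()
    if hasattr(x, "values"):
        values = x.values
        # Only trust .values when it really is an ndarray; for example,
        # dict.values is a method and must not be returned.
        if isinstance(values, np.ndarray):
            return values
    return x


class NewStyle:
    """Hypothetical container that offers to_numpy()."""

    def to_numpy(self):
        return np.array([1, 2])


class OldStyle:
    """Hypothetical container that only offers a .values attribute."""

    values = np.array([3, 4])


a = unpack_to_numpy_sketch(NewStyle())
b = unpack_to_numpy_sketch(OldStyle())
d = {"k": 1}
c = unpack_to_numpy_sketch(d)
```

Keeping the fallback in one place means call sites do not each have to decide whether to try values, which is the inconsistency described above.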

oscargus · 2022-02-27T15:07:51Z

I then assume that we merge this first, find a good name for the function and, if you want to strongly object using the function here, you can do that in #22560.

jklymak · 2022-02-27T15:23:19Z

I meant why have a separate method than check_1d? Our problem is inconsistent duck typing, so if we can have it all in one spot that would be very helpful. If check_1d does more than duck type pandas, then sure, it could call the duck-type converter.

I do actually wonder if all of this should just be part of the unit conversion machinery rather than cbook calls

oscargus · 2022-03-01T11:22:22Z

I meant why have a separate method than check_1d?

I do not have enough of an overview of the code base to see if one should/could have used check_1d (or check_2d?) instead. But if possible, that is of course even better.

jklymak added third-party integration third-party integration: pandas and removed third-party integration labels Jan 7, 2022

FIX: accomodate pandas type that doesn't return numpy from .values

5052af5

jklymak force-pushed the fix-check-1d branch from 2a64a0d to 5052af5 Compare January 7, 2022 09:22

jklymak marked this pull request as ready for review January 7, 2022 10:29

jklymak mentioned this pull request Jan 7, 2022

[Bug]: plt.plot thinks pandas.Series is 2-dimensional when nullable data type is used #22125

Closed

timhoffm reviewed Jan 9, 2022

tacaswell added this to the v3.5.2 milestone Jan 11, 2022

tacaswell reviewed Jan 12, 2022

tacaswell previously approved these changes Jan 21, 2022

timhoffm reviewed Jan 27, 2022

lib/matplotlib/tests/test_axes.py (outdated, resolved)

FIX: more holistic fix

509626d

jklymak linked an issue Jan 27, 2022 that may be closed by this pull request

[Bug]: possible regression with pandas 1.4 with plt.plot when using a single column dataframe as the x argument #22330

Closed

jklymak mentioned this pull request Jan 27, 2022

[Bug]: possible regression with pandas 1.4 with plt.plot when using a single column dataframe as the x argument #22330

Closed

jklymak commented Jan 27, 2022

lib/matplotlib/cbook/__init__.py (resolved)

jklymak added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Jan 27, 2022

tacaswell approved these changes Jan 27, 2022

dstansby reviewed Feb 18, 2022

lib/matplotlib/cbook/__init__.py (outdated, resolved)

FIX: simplify a bit more

6dfa93a

dstansby approved these changes Feb 19, 2022

QuLogic mentioned this pull request Feb 25, 2022

Improve pandas/xarray/... conversion #22560

Merged

2 tasks

tacaswell merged commit 0359832 into matplotlib:main Mar 3, 2022

meeseeksmachine pushed a commit to meeseeksmachine/matplotlib that referenced this pull request Mar 3, 2022

Backport PR matplotlib#22141: Fix check 1d

4715aff

meeseeksmachine mentioned this pull request Mar 3, 2022

Backport PR #22141 on branch v3.5.x (Fix check 1d) #22592

Merged

QuLogic added a commit that referenced this pull request Mar 3, 2022

Merge pull request #22592 from meeseeksmachine/auto-backport-of-pr-22…

0040b0a

…141-on-v3.5.x Backport PR #22141 on branch v3.5.x (Fix check 1d)

jklymak deleted the fix-check-1d branch March 4, 2022 07:41

saimn mentioned this pull request May 4, 2022

RTD failure due to plots involving convolution Kernels astropy/astropy#13209

Closed

WilliamJamieson mentioned this pull request May 4, 2022

[Bug]: v3.5.2 causing plot to crash when plotting object with __array__ method #22973

Closed

mhvk mentioned this pull request May 4, 2022

MNT: fix __array__ to numpy #22975

Merged

6 tasks
