Improve pandas/xarray/... conversion #22560

oscargus · 2022-02-24T22:38:24Z

PR Summary

See https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array/54508052#54508052 for motivation.

Related to #16402

PR Checklist

Tests and Styling

Has pytest style unit tests (and pytest passes).
Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

[N/A] New features are documented, with examples if plot related.
[N/A] New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
[N/A] API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
[N/A] Documentation is sphinx and numpydoc compliant (the docs should build without error).

greglucas · 2022-02-25T03:44:08Z

lib/matplotlib/axes/_axes.py

-            if hasattr(X, 'values'):  # support pandas.Series
-                X = X.values
+            if hasattr(X, 'to_numpy'):  # support pandas.Series
+                X = X.to_numpy()


Are there any libraries we need to worry about that implement X.values, but not X.to_numpy()? Maybe we also want to leave an elif with the X.values block as a fallback just in case...?

QuLogic · 2022-02-25T03:49:45Z

Perhaps should be coordinated with #22141?

oscargus · 2022-02-25T08:21:56Z

Perhaps should be coordinated with #22141?

That's where I got the idea. As seen, I have not touched cbook.

Related to the question from @greglucas, I do not know. But I saw this code in cbook (which I didn't touch):

matplotlib/lib/matplotlib/cbook/__init__.py

Lines 1369 to 1377 in 3a994d2

    
    # unpack if we have a values or to_numpy method.
    try:
        X = X.to_numpy()
    except AttributeError:
        try:
            if isinstance(X.values, np.ndarray):
                X = X.values
        except AttributeError:
            pass

This is a bit more complex: it uses a try-except approach (I'm not sure how much that affects things), falls back on values, and checks that values actually returns an ndarray. One can of course use a similar approach here (and in #22141), possibly slightly improved: I do not know whether there are cases where evaluating values is non-trivial, so it would be better not to evaluate it twice.

Edit: I have now touched this and use a similar approach, but with hasattr instead of try-except (see https://stackoverflow.com/questions/903130/hasattr-vs-try-except-block-to-deal-with-non-existent-attributes), on the assumption that most often we are not passed pandas or xarray (or whatever) data.
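A minimal sketch of the hasattr-based approach described above, assuming only NumPy is installed. The function name `unpack_to_numpy_sketch` and the `OnlyValues` stand-in class are hypothetical and used for illustration only; this is not the code merged in the PR. It prefers `to_numpy()`, falls back on `values`, reads `values` only once, and accepts it only if it is an ndarray.

```python
import numpy as np


def unpack_to_numpy_sketch(x):
    """Return an ndarray for pandas/xarray-like inputs, else *x* unchanged."""
    if hasattr(x, "to_numpy"):
        # Preferred path, e.g. pandas.Series or xarray.DataArray.
        return x.to_numpy()
    if hasattr(x, "values"):
        # Fallback: evaluate .values once, accept it only if it is an ndarray.
        values = x.values
        if isinstance(values, np.ndarray):
            return values
    # Anything else (lists, tuples, dicts, ndarrays) is passed through.
    return x


class OnlyValues:
    """Hypothetical container exposing .values but not .to_numpy()."""

    def __init__(self, data):
        self.values = np.asarray(data)


converted = unpack_to_numpy_sketch(OnlyValues([1, 2, 3]))  # via .values
untouched = unpack_to_numpy_sketch({"a": 1})  # dict.values is a method, not an ndarray
```

The ndarray check matters for inputs such as dicts, where `values` exists but is a bound method rather than array data.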

greglucas · 2022-02-25T15:40:29Z

I guess you could write a cbook._unpack_pandas(x) helper to use that in all of these areas for consistency? I would say it is safer to keep the .values fallback, and if you do want to get rid of it eventually, you could put in a deprecation warning if that branch is hit.
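One way the suggested deprecation path could look, as a rough sketch. The helper name `unpack_with_deprecated_fallback`, the `LegacyContainer` class, and the warning text are all assumptions, and the standard-library `warnings` module stands in for whatever deprecation machinery the project would actually use.

```python
import warnings

import numpy as np


def unpack_with_deprecated_fallback(x):
    """Hypothetical helper: prefer .to_numpy(), warn when .values is used."""
    if hasattr(x, "to_numpy"):
        return x.to_numpy()
    values = getattr(x, "values", None)
    if isinstance(values, np.ndarray):
        # Only this fallback branch emits the warning.
        warnings.warn(
            "Unpacking via .values is deprecated; provide .to_numpy().",
            DeprecationWarning,
            stacklevel=2,
        )
        return values
    return x


class LegacyContainer:
    """Hypothetical object that only exposes .values."""

    def __init__(self, data):
        self.values = np.asarray(data)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = unpack_with_deprecated_fallback(LegacyContainer([1.0, 2.0]))
```

Objects that already provide `to_numpy()` never reach the warning, so only libraries relying on the fallback would see it.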

oscargus · 2022-02-25T16:46:18Z

Yes, I was also thinking about a helper function (but didn't know where to place it, so great info that cbook is the place). I do not have any strong opinions as such, more that I read the link and it seemed like the right thing to do.

oscargus · 2022-02-27T15:09:14Z

Seems like the easiest way is to wait for #22141, add those conversions here as well, and then discuss the correct name for the function.

oscargus · 2022-02-27T15:10:20Z

I'm thinking that maybe _unpack_to_numpy is a general enough name?

oscargus · 2022-02-28T10:25:01Z

This is now updated:

renamed function to _unpack_to_numpy
added test for xarray ~~(now an extra test dependency)~~ (Removed extra dependency)
~~(removed duplicate test code for pandas)~~ (No duplicate code to start with...)

tacaswell · 2022-03-03T20:52:17Z

Please update the lines touched by #22141

oscargus · 2022-03-03T22:34:28Z

There is also this function, where simply replacing to_numpy with _unpack_to_numpy does not work. I am not sure why. One reason may be that there is no longer an AttributeError from to_numpy, so input that is handled well by _check_1d and has an index attribute, but no to_numpy method, is no longer caught. Plain Python lists seem to be a likely culprit.

matplotlib/lib/matplotlib/cbook/__init__.py

Lines 1607 to 1639 in 0359832

    
    def index_of(y):
        """
        A helper function to create reasonable x values for the given *y*.

        This is used for plotting (x, y) if x values are not explicitly given.

        First try ``y.index`` (assuming *y* is a `pandas.Series`), if that
        fails, use ``range(len(y))``.

        This will be extended in the future to deal with more types of
        labeled data.

        Parameters
        ----------
        y : float or array-like

        Returns
        -------
        x, y : ndarray
           The x and y values to plot.
        """
        try:
            return y.index.to_numpy(), y.to_numpy()
        except AttributeError:
            pass
        try:
            y = _check_1d(y)
        except (np.VisibleDeprecationWarning, ValueError):
            # NumPy 1.19 will warn on ragged input, and we can't actually use it.
            pass
        else:
            return np.arange(y.shape[0], dtype=float), y
        raise ValueError('Input could not be cast to an at-least-1D NumPy array')
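The suspected list problem can be reproduced in isolation. The sketch below uses a hypothetical non-raising helper, `unpack_sketch`, as a stand-in for the real function. For a plain list, `y.index` is the bound method `list.index`, so the original `y.index.to_numpy()` raises AttributeError and the function moves on to its fallback, while a non-raising helper silently hands the method back.

```python
import numpy as np


def unpack_sketch(x):
    """Hypothetical non-raising helper: convert if possible, else return *x*."""
    if hasattr(x, "to_numpy"):
        return x.to_numpy()
    values = getattr(x, "values", None)
    if isinstance(values, np.ndarray):
        return values
    return x


y = [3, 1, 2]

# Original pattern: list.index has no .to_numpy, so AttributeError is raised
# and index_of would continue to its range-based fallback.
try:
    y.index.to_numpy()
    raised = False
except AttributeError:
    raised = True

# Naive replacement: nothing raises, and the "x values" are the method itself.
x_naive = unpack_sketch(y.index)

print(raised)             # True
print(callable(x_naive))  # True
```

Under these assumptions, the first branch of `index_of` would need an explicit check (or would need to keep the raising call) so that lists still reach the `_check_1d` path.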

lib/matplotlib/cbook/__init__.py

greglucas · 2022-03-11T04:21:51Z

lib/matplotlib/tests/test_cbook.py

+    np.testing.assert_array_equal(Idx, IdxRef)
+
+
+def test_index_of_xarray(xr):


Does xarray get us more coverage here? They have a to_numpy() method the same as pandas I believe.
https://xarray.pydata.org/en/stable/generated/xarray.DataArray.to_numpy.html

So, it seems like a pretty heavy dependency to add for just this one test...

oscargus · 2022-03-11

Not so much for coverage as for actually testing with data in specific formats. Given the discussion about which formats we support, it makes sense to test them as well. Right now some of these are tested in the plots, but it may make sense to simply test them here, as these are the core functions used to get data that can be plotted.

If we claim (which we actually don't, maybe we should?) that we can plot xarray, we should probably test it as well. The same goes for other types that we may want to claim to support. Or maybe fork off a specific dependency test that is not executed on all platforms/versions, including pandas (which is 11.7 MB; xarray is 870 kB).

(There is another xarray-test above, so two.)

I can of course remove them, but I think we should discuss if we want to support more formats than pandas and numpy (and Python list/tuple), and, if so, have explicit tests for them.

greglucas · 2022-03-11

I agree, we should probably discuss what we want to support/test. To me, this doesn't seem to add enough value to justify adding a new dependency.

There was also a discussion around removing Scipy as a dependency in the docs: #22120

oscargus · 2022-03-14

I removed the dependencies but kept the tests. Hence, they will run if xarray is available.

I also opened #22645 for discussions (probably should be discussed at a dev-call as well).

tacaswell · 2022-04-13T20:20:00Z

I made an executive decision to install xarray on CI.

We already have all of its dependencies installed and it is a pure-python package.

timhoffm · 2022-04-17T09:23:14Z

lib/matplotlib/cbook/__init__.py

+
+
+def _unpack_to_numpy(x):
+    """Internal helper to extract data from e.g. pandas and xarray objects."""


Please document what we intend to support, i.e. everything with .to_numpy() or .values, and what types we expect to catch with it, e.g. values-> older pandas dataframes(?).

tacaswell · 2022-04-21T19:53:49Z

We discussed this on the call. I will write up an issue shortly about documenting the intention etc.

QuLogic · 2022-04-21T22:37:38Z

meeseeksdev backport to v3.5.x

See matplotlib/matplotlib#22973, matplotlib/matplotlib#22879, and matplotlib/matplotlib#22560. It is not clear to me which is the standard interface for unit handling (e.g., hist still doesn't handle units by default).

oscargus added the third-party integration: pandas label Feb 24, 2022

greglucas reviewed Feb 25, 2022

oscargus marked this pull request as draft February 25, 2022 16:46

oscargus force-pushed the pandasconversion branch from a0488df to 64a301e Compare February 27, 2022 12:10

oscargus marked this pull request as ready for review February 27, 2022 12:11

oscargus mentioned this pull request Feb 27, 2022

Fix check 1d #22141

Merged

6 tasks

oscargus added the status: waiting for other PR label Feb 27, 2022

oscargus force-pushed the pandasconversion branch from 64a301e to a1757d1 Compare February 28, 2022 10:23

oscargus force-pushed the pandasconversion branch from a1757d1 to 873dff2 Compare February 28, 2022 10:53

oscargus changed the title ~~Improve Pandas conversion~~ Improve pandas/xarray/... conversion Feb 28, 2022

oscargus force-pushed the pandasconversion branch from 873dff2 to 1e27b8a Compare February 28, 2022 11:06

oscargus mentioned this pull request Feb 28, 2022

review how we duck-type non-numpy inputs #16402

Open

tacaswell added this to the v3.5.2 milestone Mar 3, 2022

oscargus force-pushed the pandasconversion branch 3 times, most recently from 19e86ca to f17f3af Compare March 3, 2022 22:13

tacaswell approved these changes Mar 3, 2022

tacaswell removed the status: waiting for other PR label Mar 3, 2022

greglucas reviewed Mar 11, 2022

oscargus force-pushed the pandasconversion branch from 05e8501 to f17f3af Compare March 11, 2022 08:31

oscargus mentioned this pull request Mar 14, 2022

[ENH]: Which array libraries should Matplotlib support (and test support for)? #22645

Open

Improve pandas and xarray conversion

7b51044

oscargus force-pushed the pandasconversion branch from f17f3af to 7b51044 Compare March 14, 2022 11:29

CI: add xarray to extra dependencies

56af810

timhoffm reviewed Apr 17, 2022

jklymak merged commit 709fba8 into matplotlib:main Apr 21, 2022

meeseeksmachine mentioned this pull request Apr 21, 2022

Backport PR #22560 on branch v3.5.x (Improve pandas/xarray/... conversion) #22876

Merged

meeseeksmachine pushed a commit to meeseeksmachine/matplotlib that referenced this pull request Apr 21, 2022

Backport PR matplotlib#22560: Improve pandas/xarray/... conversion

de4fda5

QuLogic mentioned this pull request Apr 21, 2022

bot not responding to backport label? scientific-python/MeeseeksDev#84

Closed

tacaswell mentioned this pull request Apr 22, 2022

Document and test what "array like" means to Matplotlib #22879

Open

QuLogic added a commit that referenced this pull request Apr 29, 2022

Merge pull request #22876 from meeseeksmachine/auto-backport-of-pr-22…

3420565

…560-on-v3.5.x Backport PR #22560 on branch v3.5.x (Improve pandas/xarray/... conversion)
