[ENH]: Which array libraries should Matplotlib support (and test support for)? #22645


Open
oscargus opened this issue Mar 14, 2022 · 13 comments
Labels
status: needs comment/discussion needs consensus on next step

Comments

@oscargus
Member

oscargus commented Mar 14, 2022

Problem

Related to e.g. #16402 #21036 #22560

We currently explicitly support numpy and pandas. What other array libraries are of interest to support natively (as in users can just feed an array and Matplotlib handles the conversion)? Should we test for them? (There seems to be a decision that only pandas will be tested for: #19574 (comment) )

Some alternatives:

Proposed solution

I think it can make sense to have a test job that runs with "all" optional dependencies. It does not have to be executed on all platforms and all Python versions, but it would at least give some idea of whether things still work and when they break.

That is not necessarily the same as guaranteeing that these will always work.

cupy relies on GPUs, so it is not clear whether it is possible to test it in CI.

We should also probably add something to the documentation about which libraries are supported (and which are not). (Maybe there already is; I primarily looked at the code...)
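The "users can just feed an array and Matplotlib handles the conversion" path can be sketched as a small duck-typing helper. This is a sketch with hypothetical names, not Matplotlib's actual code (the real helper, and what it falls back to, may differ); a toy `FakeSeries` stands in for a pandas-like container:

```python
def unpack_to_numpy_like(x):
    """Hypothetical sketch of the conversion being discussed: accept any
    object that provides ``to_numpy()``, otherwise pass the object through."""
    if hasattr(x, "to_numpy"):
        # pandas.Series, xarray.DataArray, polars.Series, ... provide this
        return x.to_numpy()
    return x


class FakeSeries:
    """Minimal stand-in for a pandas-like container (illustration only)."""
    def __init__(self, data):
        self._data = list(data)

    def to_numpy(self):
        # a real library would return a numpy.ndarray here
        return list(self._data)


print(unpack_to_numpy_like(FakeSeries([1, 2, 3])))  # [1, 2, 3]
print(unpack_to_numpy_like([4, 5]))                 # [4, 5]
```

Anything lacking `to_numpy` simply passes through unchanged, which matches the "as a courtesy" stance discussed below in the thread.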

@oscargus oscargus added the status: needs comment/discussion needs consensus on next step label Mar 14, 2022
@story645
Member

story645 commented Mar 14, 2022

I think we should natively support anything that provides .to_numpy(), and I agree it might be worth adding xarray to the tests.

Broadly, this intersects with my dissertation work and the NASA RSE work. As far as I can tell, the idea is that if we can revamp the API enough to decouple the data bits from the artist/drawing bits, then external packages can write adapters against a data-interface API. This allows external libraries to define what support means on their terms.
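The adapter idea above could be sketched as a tiny data-interface protocol. Everything here is hypothetical (no such interface exists in Matplotlib); it only illustrates how an external package could declare support structurally, without inheriting from anything Matplotlib ships:

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class DataAdapter(Protocol):
    """Hypothetical data-interface API: external libraries would implement
    this to declare, on their own terms, how the data bits should be read."""
    def values(self) -> Sequence[float]: ...
    def label(self) -> str: ...


class XarrayLikeAdapter:
    """Toy adapter an external package might ship (illustration only)."""
    def __init__(self, data, name):
        self._data, self._name = list(data), name

    def values(self):
        return self._data

    def label(self):
        return self._name


adapter = XarrayLikeAdapter([1.5, 2.5], "temperature")
# Structural check: no inheritance needed, only the right methods
print(isinstance(adapter, DataAdapter))  # True
```

The drawing side would then consume only `DataAdapter`, decoupling artists from any particular array library.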

@QuLogic
Member

QuLogic commented Mar 15, 2022

Is there not a NumPy standard compatibility NEP we can say we follow?

@story645
Member

story645 commented Mar 15, 2022

I think a lot of the relevant stuff is still in open NEPs?

@timhoffm
Member

For completeness, we should add h5py datasets to the above list.

Their way of converting to numpy is dataset[()] (and the deprecated dataset.value), neither of which we currently use AFAIK. Still, h5py datasets work because their API is sufficiently numpy-like 😲 .
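The `dataset[()]` convention mentioned above can be mimicked in plain Python. `ToyDataset` below is a toy stand-in for `h5py.Dataset` (illustration only, not the real class): indexing with an empty tuple returns the full data, while ordinary numpy-like slicing passes through.

```python
class ToyDataset:
    """Toy stand-in for h5py.Dataset (illustration only): as in h5py,
    indexing with an empty tuple () returns the full underlying data."""
    def __init__(self, data):
        self._data = list(data)

    def __getitem__(self, key):
        if key == ():
            # h5py would return a numpy array of the whole dataset here
            return list(self._data)
        # normal indexing/slicing behaves numpy-like, which is why
        # h5py datasets often "just work" when fed to plotting code
        return self._data[key]


ds = ToyDataset([10, 20, 30])
print(ds[()])   # [10, 20, 30] -- the whole dataset, like dataset[()]
print(ds[1:])   # [20, 30]     -- numpy-like slicing
```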

@tacaswell
Member

tacaswell commented Mar 18, 2022

For h5py dataset slicing, we have a bunch of tests checking that h5py and numpy slice exactly the same, so it is very, very close to numpy.

@timhoffm
Member

Good to know. OTOH I don't think we're slicing in Matplotlib, so I assume we're relying on other numpy-like aspects.

@sa-

sa- commented Mar 20, 2022

polars

Maybe pyarrow (instead of polars specifically); then it would make Matplotlib easier to use for anyone in the Arrow ecosystem, even from other languages doing interop with Python.

@oscargus
Member Author

I didn't really get the relation between pyarrow and polars (I just did a quick search), but it seems like both support to_numpy. With #22560 this is used more consistently and should improve things.

It can make sense to add a test for both though, and I have added them to the list (as well as h5py).

@jklymak
Member

jklymak commented Mar 21, 2022

I guess I view this the other way around: we are not guaranteeing compatibility with anything except numpy. However, as a courtesy, we will take the result of your to_numpy and try to do the right thing with it. If you don't provide to_numpy, or your to_numpy is broken, that is the other library's bug. pandas falls into a grey area where we have tried to bend over backwards because it is so widely used, but that has entailed a moderately large maintenance burden. I don't think we want to add to it by testing the to_numpy implementations of more libraries?

@sa-

sa- commented Mar 21, 2022

the relation between pyarrow and polars

Good question. Apache Arrow is a memory model: it specifies what the data should look like in memory, but it doesn't provide any functions to manipulate the data. Polars uses Apache Arrow as its memory model and provides methods to transform data.

One of the motivations for using Arrow is passing data around in a zero-copy way between libraries, or even between languages. For example, calling polars_df.to_arrow() will not create a copy of the data; instead it will return an object of type pyarrow.Table that points to the same data in memory.

Hope this helps, and thank you for considering supporting these libs.
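The zero-copy idea can be illustrated with Python's own buffer protocol, using only the stdlib (pyarrow and polars apply the same principle at the Arrow level; this is an analogy, not their actual mechanism):

```python
import array

buf = array.array("d", [1.0, 2.0, 3.0])
view = memoryview(buf)   # a zero-copy view of buf's memory,
                         # analogous in spirit to polars_df.to_arrow()

buf[0] = 42.0            # mutate the original buffer...
print(view[0])           # 42.0 -- the view sees the change,
                         # so no data was copied when creating it
```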

@oscargus
Member Author

The conclusion we came to was to add tests for "all" libraries, but only install them in the weekly scheduled job that tests the newest versions of numpy etc. (And not claim compatibility beyond "if it has a to_numpy, you are quite likely to be able to plot something; if not, run the right conversion yourself".)

- name: Install the nightly dependencies
  # Only install the nightly dependencies during the scheduled event
  if: ${{ github.event_name == 'schedule' && matrix.name-suffix != '(Minimum Versions)' }}
  run: |
    python -m pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple --upgrade numpy pandas
    # Turn all warnings into errors, except ignore the distutils deprecations and the find_spec warning
    cat >> pytest.ini << EOF
    filterwarnings =
        error
        ignore:.*distutils:DeprecationWarning
        ignore:DynamicImporter.find_spec\(\) not found; falling back to find_module\(\):ImportWarning
    EOF

In this way, we can get a heads up in case something breaks.

@tacaswell
Member

We probably should put some floor on usage (polars has around 4k downloads a day from PyPI, pandas has 2-3M, xarray has 20-40k, and we have 500k-1M) before we worry about testing a library.

@timhoffm
Copy link
Member

Compatibility in the ecosystem has evolved. IMHO we should try to accept everything that follows the array API standard.
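The array API standard defines a discovery hook, `__array_namespace__`, that compliant arrays must expose. A minimal acceptance check could look like the sketch below (the helper name and `FakeStandardArray` are hypothetical; this is not Matplotlib's actual policy):

```python
def follows_array_api(x):
    """Return True if *x* advertises array API compliance via the
    __array_namespace__ protocol defined by the standard."""
    return hasattr(x, "__array_namespace__")


class FakeStandardArray:
    """Toy object claiming array API compliance (illustration only)."""
    def __array_namespace__(self, api_version=None):
        # a real array would return its namespace module here
        raise NotImplementedError


print(follows_array_api(FakeStandardArray()))  # True
print(follows_array_api([1, 2, 3]))            # False
```

Accepting everything that passes such a check would decouple Matplotlib from any specific library list.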
