Feature: Support passing DataFrames to table.table #28830

anijjar · 2024-09-17T05:17:56Z

PR summary

Resolves #28726 : feature request: support passing DataFrames to table.table
dawnwangcx requested a way to build tables from a pandas DataFrame. I implemented the proposed solution and added a unit test in test_table.py to confirm that the DataFrame is correctly converted into a table.

PR checklist

"closes #0000" is in the body of the PR description to link the related issue
new and changed code is tested
[N/A] Plotting related features are demonstrated in an example
New Features and API Changes are noted with a directive and release note
Documentation complies with general and docstring guidelines

anijjar · 2024-09-17T05:20:37Z

PyTest Results

timhoffm

Thanks for the contribution. The solution is slightly more complex because pandas is an optional dependency.
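One common way to handle an optional dependency is to never import it and instead look it up in `sys.modules`: if the caller is holding a DataFrame, pandas must already have been imported. The sketch below illustrates that idea only; the helper name `looks_like_pandas_dataframe` is hypothetical and not taken from this PR.

```python
import sys


def looks_like_pandas_dataframe(obj):
    """Return True if *obj* is a pandas DataFrame, without importing pandas.

    Hypothetical helper, for illustration only.
    """
    # If a DataFrame instance exists, the caller has already imported
    # pandas, so the module is present in sys.modules.
    pandas = sys.modules.get("pandas")
    if pandas is None:
        return False
    frame_type = getattr(pandas, "DataFrame", None)
    return isinstance(frame_type, type) and isinstance(obj, frame_type)


# A plain nested list is never mistaken for a DataFrame.
print(looks_like_pandas_dataframe([["1", "2"], ["3", "4"]]))  # prints False
```

Because nothing is imported, code that uses such a check keeps working on installations without pandas.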

Also this should get a test. Something like (unchecked - you may need to slightly adapt it to make it work):

def test_table_dataframe():
    df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
    fig, ax = plt.subplots()
    table = ax.table(df)
    assert table[0, 0] == 'a'
    assert table[0, 1] == 'b'
    assert table[1, 0] == '1'
    assert table[1, 1] == '2'
    assert table[2, 0] == '3'
    assert table[2, 1] == '4'

lib/matplotlib/table.py


anijjar · 2024-09-18T22:40:19Z

Pytest test_table.py::test_table_dataframe

Unit Test

def test_table_dataframe():
    # Test that a pandas DataFrame can be passed as cellText
    import matplotlib.pyplot as plt
    import pandas as pd

    data = {
        'Letter': ['A', 'B', 'C'],
        'Number': [100, 200, 300]
    }

    df = pd.DataFrame(data)
    fig, ax = plt.subplots()
    ax.axis('off')
    table = ax.table(df, loc='center')

    assert table[0, 0].get_text().get_text() == 'Letter'
    assert table[0, 1].get_text().get_text() == 'Number'
    assert table[1, 0].get_text().get_text() == 'A'
    assert table[1, 1].get_text().get_text() == str(100)
    assert table[2, 0].get_text().get_text() == 'B'
    assert table[2, 1].get_text().get_text() == str(200)
    assert table[3, 0].get_text().get_text() == 'C'
    assert table[3, 1].get_text().get_text() == str(300)

Result

Passed

lib/matplotlib/tests/test_table.py

lib/matplotlib/cbook.py

lib/matplotlib/table.py

lib/matplotlib/tests/test_cbook.py

timhoffm

This should get a what's new note. See https://matplotlib.org/devdocs/devel/api_changes.html#what-s-new-notes

story645

Slight concern about the wording in the what's new so gonna give you time to address, but otherwise I think this is a great addition!

doc/users/next_whats_new/pass_pandasDataFrame_into_table.rst

story645

To fix the failing tests, add pandas to the testing requirements, I think in https://github.com/matplotlib/matplotlib/blob/main/requirements/testing/minver.txt, or limit the new tests to 3.10+ like the xarray tests.

I'm honestly extra confused why those tests don't find pandas since it's in the extra testing

anijjar · 2024-09-22T19:10:16Z

To fix the failing tests, add pandas to the testing requirements, I think in https://github.com/matplotlib/matplotlib/blob/main/requirements/testing/minver.txt, or limit the new tests to 3.10+ like the xarray tests.

I'm honestly extra confused why those tests don't find pandas since it's in the extra testing

This is where I am confused too. For now, I'll modify the minimum-version file (minver.txt) to include pandas; hopefully that lets the test run correctly.

anijjar · 2024-09-23T05:34:14Z

I think I found the problem. Inside the tests.yml file, the extra-requirements line is missing from the other workflows. We could add pandas to all.txt, but I'd advise against it because pandas is not a default dependency. Instead, I'll add the extra-requirements line to the other workflows.

          - os: ubuntu-22.04
            python-version: '3.11'
            # https://www.riverbankcomputing.com/pipermail/pyqt/2023-November/045606.html
            pyqt6-ver: '!=6.6.0'
            # https://bugreports.qt.io/projects/PYSIDE/issues/PYSIDE-2346
            pyside6-ver: '!=6.5.1'
            extra-requirements: '-r requirements/testing/extra.txt'
          - os: ubuntu-22.04
            python-version: '3.12'
            # https://www.riverbankcomputing.com/pipermail/pyqt/2023-November/045606.html
            pyqt6-ver: '!=6.6.0'
            # https://bugreports.qt.io/projects/PYSIDE/issues/PYSIDE-2346
            pyside6-ver: '!=6.5.1'
          - os: ubuntu-22.04
....

For the minimum versions workflow, minver.txt doesn't add dependencies; it only defines the package versions for the all.txt file. I'll revert my changes there.

story645 · 2024-09-23T16:20:35Z

Sorry for the backtracking, but I think instead of changing the minimum version tests to run everything, you should flag this test to only run when pandas is installed.

doc/users/next_whats_new/pass_pandasDataFrame_into_table.rst

jklymak · 2024-09-23T16:48:06Z

lib/matplotlib/table.py

@@ -674,7 +676,7 @@ def table(ax,

    Parameters
    ----------
-    cellText : 2D list of str, optional
+    cellText : 2D list of str, Pandas.DataFrame, optional


I'm not sure we want to call out pandas data frames here. How do we do this elsewhere? I'd just state "object" and then below, state that
"""
If cellText is not a list of texts, we attempt to access cellText.columns.to_numpy() for the column headers,
and cellText.to_numpy() for the table's cells.
"""

We don't have any other functions that explicitly take DataFrames. Typically, a DataFrame is covered by what we call array-like (and is converted internally via to_numpy()). It's different here because we take the column headers as well. Due to this particular usage, we've chosen to explicitly check for DataFrames (_is_pandas_dataframe()). I believe that duck-typing would be a bit too vague here: An object with obj.columns.to_numpy() and obj.to_numpy() is very particular. I'm not sure if this matches anything but pandas.DataFrame, and if other such objects exist, I'm not clear that accepting them is appropriate. If the need for similar objects should arise, we can always revisit. For now, the explicit type is the simplest documentation and matches the implementation:

Suggested change

-    cellText : 2D list of str, Pandas.DataFrame, optional
+    cellText : 2D list of str or pandas.DataFrame, optional

I linked an example of another very popular package, polars, that has the same form of data frame. I'm sure there are others out there. I don't think we should be special casing pandas here.

Polars is logically equivalent, but it does not have the same API. See #28830 (comment).

There are several other dataframes out there (https://data-apis.org/dataframe-api/draft/purpose_and_scope.html#history-and-dataframe-implementations), but the problem is that their API is currently not consistent. IMHO we would therefore have to special-case in order to support several of them.

While I appreciate the attempt to duck-type instead of explicitly relying on types, I believe the current dataframe APIs are too inhomogeneous to support reasonable duck-typing. "Any object that supports obj.columns.to_numpy() and obj.to_numpy()" is too technical a description; I suspect that at least half of the pandas users would not know whether that holds for pandas.DataFrame or not. Duck-typing would work if dataframe-like was well-defined, but it currently is not (see the draft dataframe API standard).

I think np.asanyarray(df.columns) works on both pandas and polars, and is pretty straightforward.

If you really want to push for duck-typing, go for it. But I'd require reasonable type stubs (accepting Any is not good enough) and type documentation (users should at least easily see that pandas.DataFrame as the most common dataframe type is supported).

Ideally, that claimed behavior would also be tested. (It's easy for pandas, because it's a test dependency. For others, you'd likely need a mock similar to test_unpack_to_numpy_from_torch.)

I personally believe it's not worth the added effort and would advise against requiring this from a first-time contribution. When accepting the limited scope of pandas, this PR is almost ready apart from minor documentation issues and the type stub. I therefore suggest taking this PR as a limited but well-scoped improvement. We don't paint ourselves into a corner with that. You can generalize in a followup PR.

But I'd require reasonable type stubs

OK, but I don't think you will get pandas dataframes in a type stub from this PR either, will you? I'm personally pretty 👎 on losing functionality because of type stubbing issues.

jklymak · 2024-09-23T16:51:27Z

lib/matplotlib/table.py

@@ -744,6 +746,13 @@ def table(ax,
        cols = len(cellColours[0])
        cellText = [[''] * cols] * rows

+    # Check if we have a Pandas DataFrame
+    if _is_pandas_dataframe(cellText):


I'd actually make this more generic, and check if cellText has a columns attribute and a to_numpy method. That lets non-pandas objects also work; for instance polars is a pandas competitor, and there is no reason its dataframes could not be passed in here: https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.columns.html
https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.to_numpy.html

Note that polars.DataFrame.columns is of type list[str], so the current implementation using cellText.columns.to_numpy() would fail. These are exactly the subtleties I did not want to go into as part of this first-time contributor PR. Let's start with pandas, which is a clean addition. We can always generalize later, and that should likely conform to the dataframe API standard (note that this is still a draft).
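The subtlety can be illustrated without either library installed. In the sketch below, `PandasLikeFrame` and `PolarsLikeFrame` are stand-in classes (assumptions for illustration, not the real libraries) that mimic only the two members under discussion: a `columns` attribute and a `to_numpy()` method, which here returns nested lists in place of an array. Calling `.columns.to_numpy()` works for the first stand-in and fails for the second, while plain iteration over `columns` works for both. `frame_to_cell_text` is likewise a hypothetical helper.

```python
class _IndexLike:
    """Stand-in for a pandas Index: iterable and offering to_numpy()."""

    def __init__(self, labels):
        self._labels = list(labels)

    def __iter__(self):
        return iter(self._labels)

    def to_numpy(self):
        return list(self._labels)


class PandasLikeFrame:
    """Stand-in whose ``columns`` is an Index-like object."""

    def __init__(self, columns, rows):
        self.columns = _IndexLike(columns)
        self._rows = rows

    def to_numpy(self):
        return [list(row) for row in self._rows]


class PolarsLikeFrame:
    """Stand-in whose ``columns`` is a plain list of str."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self._rows = rows

    def to_numpy(self):
        return [list(row) for row in self._rows]


def frame_to_cell_text(frame):
    """Return header plus body rows as a 2D list of str (duck-typed)."""
    if not hasattr(frame, "columns") or not callable(getattr(frame, "to_numpy", None)):
        raise TypeError("expected an object with .columns and .to_numpy()")
    header = [str(label) for label in frame.columns]  # iteration works for both
    body = [[str(value) for value in row] for row in frame.to_numpy()]
    return [header] + body


rows = [[1, 2], [3, 4]]
print(frame_to_cell_text(PandasLikeFrame(["a", "b"], rows)))
print(frame_to_cell_text(PolarsLikeFrame(["a", "b"], rows)))

try:
    # A plain list has no to_numpy(), which is the failure described above.
    PolarsLikeFrame(["a", "b"], rows).columns.to_numpy()
except AttributeError as exc:
    print("columns.to_numpy() failed:", exc)
```

Whether real dataframe libraries behave like these stand-ins in every respect would still need to be checked against each library, which is the extra effort the comment above wants to avoid in this PR.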

As a side-remark: The table implementation has lots of design problems. If one was serious about tables, the whole Table would need to be rewritten/replaced. Therefore, I wouldn't spend too much effort on trying to improve implementation.

timhoffm

This needs an update to the table stub in table.pyi

anijjar · 2024-09-24T21:00:18Z

Sorry for the backtracking, but I think instead of changing the minimum version tests to run everything, you should flag this test to only run when pandas is installed.

Sure, I'll revert my changes and add a flag instead.

Thank you all for the thorough review of my commits. I'm learning a lot👍🏾. About the polars package, I don't mind repeating what I did here and making another PR for it. If it is as popular as you say, I can see the value in adding support.

anijjar · 2024-09-24T21:08:46Z

When trying to commit the table.pyi file, I get this error from pre-commit.

mypy.....................................................................Failed
- hook id: mypy
- exit code: 1

Warning: Unpack is already enabled by default
lib\matplotlib\table.pyi:71: error: Name "pandas.DataFrame" is not defined  [name-defined]
Found 1 error in 1 file (checked 107 source files)

from this change

cellText: Sequence[Sequence[str]] | 'pandas.DataFrame' | None = ...,

I'm not sure how to fix this unless I outright import the pandas module.

lib/matplotlib/tests/test_table.py

timhoffm · 2024-09-25T08:44:55Z

When trying to commit the table.pyi file, I get this error from pre-commit.

mypy.....................................................................Failed
- hook id: mypy
- exit code: 1

Warning: Unpack is already enabled by default
lib\matplotlib\table.pyi:71: error: Name "pandas.DataFrame" is not defined  [name-defined]
Found 1 error in 1 file (checked 107 source files)

from this change

cellText: Sequence[Sequence[str]] | 'pandas.DataFrame' | None = ...,

I'm not sure how to fix this unless I outright import the pandas module.

I believe the following should work:

try:
    from pandas import DataFrame
except ImportError:
    DataFrame = None

...

cellText: Sequence[Sequence[str]] | DataFrame | None = ...,

but maybe @QuLogic or @ksunden have a better solution.

story645

Bunch of hopefully small things and also this PR will need two approvals

lib/matplotlib/tests/test_table.py

lib/matplotlib/cbook.py

doc/users/next_whats_new/pass_pandasDataFrame_into_table.rst

ksunden · 2024-09-30T21:56:58Z

lib/matplotlib/table.pyi

-from typing import Any, Literal
+from typing import Any, Literal, TYPE_CHECKING
+
+if TYPE_CHECKING:


Having an if TYPE_CHECKING: in a pyi is kind of redundant, as pyi is only used for type checking.

This does introduce a type-check time requirement on pandas, which I don't fully love, but is not that bad. The if/else here does not protect against not having Pandas (and not even sure a try/except would work in a pyi... it would at least be unusual)
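For comparison, in a regular `.py` module (as opposed to a `.pyi` stub) the `TYPE_CHECKING` guard does serve a purpose: it lets an annotation mention `DataFrame` without importing pandas at runtime. A minimal sketch, with the annotation written as a string so that it is never evaluated; `describe_cell_text` is a hypothetical function used only to carry the annotation.

```python
from collections.abc import Sequence
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; never executed, so pandas stays optional.
    from pandas import DataFrame


def describe_cell_text(
    cellText: "Sequence[Sequence[str]] | DataFrame | None" = None,
) -> str:
    """Report which kind of input was received (illustrative only)."""
    if cellText is None:
        return "none"
    # The string annotation is not evaluated at runtime, so the name
    # DataFrame is never looked up here.
    return type(cellText).__name__


print(describe_cell_text([["1", "2"]]))  # prints list
```

A type checker still needs pandas (or its stubs) available to resolve the name, which is the type-check-time requirement mentioned above.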

I'll reiterate that I don't think we should be special casing pandas at all, and we should not have a pandas dataframe as a type annotation. Is there anywhere else in the library that we do this?

Should I import the DataFrame class directly?

timhoffm

Following #28726 (comment), I compared with the pandas implementation:

https://github.com/pandas-dev/pandas/blob/5829e3ea20adc978ebfb82f08d3d5347108be0f0/pandas/plotting/_matplotlib/tools.py#L72-L88

We should map index and columns onto rowLabels and colLabels: if rowLabels / colLabels are not given, use index / columns as the respective labels.

We have two options for handling these with additional explicit labels:

1) explicit labels take precedence and overwrite index/columns values
2) we error if both are given.

While 1) could be convenient for overwriting the labels, I'm inclined to go with 2): in the face of ambiguity, refuse the temptation to guess. We could always expand to 1) later if the need arises.
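The two options can be sketched as a small resolver. This illustrates the trade-off only: `resolve_labels` and its `policy` parameter are hypothetical names, not part of Matplotlib.

```python
def resolve_labels(frame_labels, explicit_labels=None, policy="error"):
    """Choose table labels when the data source already carries its own.

    frame_labels    -- labels taken from the DataFrame (index or columns)
    explicit_labels -- labels passed by the caller, or None
    policy          -- "override" (option 1) or "error" (option 2)
    """
    if explicit_labels is None:
        # Nothing passed explicitly: fall back to the DataFrame's labels.
        return list(frame_labels)
    if policy == "override":
        return list(explicit_labels)
    if policy == "error":
        raise ValueError("explicit labels cannot be combined with a DataFrame")
    raise ValueError(f"unknown policy: {policy!r}")


columns = ["Letter", "Number"]
print(resolve_labels(columns))                                 # ['Letter', 'Number']
print(resolve_labels(columns, ["L", "N"], policy="override"))  # ['L', 'N']
```

Starting with the "error" policy keeps the door open: relaxing it to "override" later would not break any call that works today.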

story645

I think this works, but I think we should absolutely support overwriting the pandas column and row labels b/c I'm pretty sure that would be my use case at least half the time.

I don't think 1. is guessing b/c they're explicitly passing in labels.

doc/users/next_whats_new/pass_pandasDataFrame_into_table.rst

lib/matplotlib/table.pyi

Cleanup.

doc/users/next_whats_new/pass_pandasDataFrame_into_table.rst

timhoffm · 2024-10-29T08:36:52Z

@story645 adding the support for row/column labels is a straightforward extension that can be added later. I thought it might be nice to get a basic version in (i) because there has already been a lot of discussion and I don't want to overburden the PR with additional tweaks, in particular since this is a first-time contribution, and (ii) because it would be nice to push this into 3.10, which we will most likely miss when doing additional changes.

timhoffm · 2024-10-29T10:00:37Z

I'll go and merge as is. Titles can be handled in a followup PR.

timhoffm · 2024-10-29T10:04:28Z

@anijjar thanks and congratulations on your first contribution to Matplotlib 🎉. We hope to see you again.

Added check for Pandas Dataframe in table.table

b307395

github-actions bot added the topic: table label Sep 17, 2024

timhoffm reviewed Sep 17, 2024

lib/matplotlib/table.py (3 review comments)

anijjar added 2 commits September 18, 2024 15:08

Created _is_pandas_dataframe(x) and modified _unpack_to_numpy(x) to handle Pandas DataFrame Objects

7cb26b1

Modified table.py to remove Pandas Dependency and use _unpack_to_numpy and _is_pandas_dataframe with associated test.

10d69f2

timhoffm reviewed Sep 19, 2024

lib/matplotlib/tests/test_table.py

lib/matplotlib/cbook.py

lib/matplotlib/table.py

lib/matplotlib/tests/test_cbook.py

Adjusted files following timhoffm review on commit 10d69f2

f73fa6a

timhoffm approved these changes Sep 20, 2024

added next_whats_new file explaining changes to table

89f79cb

story645 approved these changes Sep 22, 2024

doc/users/next_whats_new/pass_pandasDataFrame_into_table.rst

story645 requested changes Sep 22, 2024

modified miniver.txt and clarified pass_xxx.rst file

e919949

reverted minver.txt and modified tests.yml to run extra.txt

193a64f

jklymak reviewed Sep 23, 2024

doc/users/next_whats_new/pass_pandasDataFrame_into_table.rst

jklymak reviewed Sep 23, 2024

doc/users/next_whats_new/pass_pandasDataFrame_into_table.rst

jklymak reviewed Sep 23, 2024

doc/users/next_whats_new/pass_pandasDataFrame_into_table.rst

jklymak reviewed Sep 23, 2024

timhoffm reviewed Sep 24, 2024

reverted tests.yml and modified stub, .rst file, and test

e73c8ae

anijjar requested a review from story645 September 24, 2024 21:18

story645 reviewed Sep 24, 2024

lib/matplotlib/tests/test_table.py

story645 mentioned this pull request Sep 24, 2024

doc: add pandas and xarray fixtures to testing API docs #28879

Merged

5 tasks

Modified test+table_dataframe to use the pd test fixture

80a2390

modified stub to include DataFrame

f713806

anijjar requested a review from story645 September 28, 2024 04:30

story645 reviewed Sep 29, 2024

ksunden reviewed Sep 30, 2024

modified .rst file and the test file following story645 recommendations

707bc72

anijjar requested a review from story645 October 3, 2024 04:43

timhoffm requested changes Oct 6, 2024

Modified implementation to follow Pandas tools.py

dbcb91f

timhoffm approved these changes Oct 28, 2024

story645 approved these changes Oct 29, 2024

timhoffm reviewed Oct 29, 2024

doc/users/next_whats_new/pass_pandasDataFrame_into_table.rst

timhoffm reviewed Oct 29, 2024

lib/matplotlib/table.pyi

Apply suggestions from code review

1946284

Cleanup.

timhoffm reviewed Oct 29, 2024

doc/users/next_whats_new/pass_pandasDataFrame_into_table.rst

Fix code link

ef7d139

timhoffm added this to the v3.10.0 milestone Oct 29, 2024

timhoffm merged commit 8e6d6b6 into matplotlib:main Oct 29, 2024
41 of 42 checks passed

anijjar deleted the feat/28726 branch November 1, 2024 04:06
