List of lists of categorical data failing: Scatter ravel is performed before _process_unit_info() is called. #27035
That's expected behavior; xref #19139 for a request to fix that.
Thanks for taking this on, much appreciated. This definitely needs some tests before it can be merged.
plotters = [Axes.scatter, Axes.bar,
# plotters = [Axes.scatter, Axes.bar,
# pytest.param(Axes.plot, marks=pytest.mark.xfail)]
plotters = [Axes.bar,
Please either put scatter back or add an explicit test that this PR does not break the behavior of 1D categorical data.
Also, please add a test (preferably parametrized to check both categorical and numerical data) that the nested lists work as expected.
Thanks for the feedback! I intended the PR as a way of discussing the changes more hands-on.
Sorry, I didn't understand what you mean. The reason I removed the scatter from the testing is that the test will fail, as there is no more TypeError raised after the changes.
matplotlib/lib/matplotlib/tests/test_category.py
Lines 261 to 266 in 554b784
@pytest.mark.parametrize("plotter", plotters)
@pytest.mark.parametrize("xdata", fvalues, ids=fids)
def test_mixed_type_exception(self, plotter, xdata):
    ax = plt.figure().subplots()
    with pytest.raises(TypeError):
        plotter(ax, xdata, [1, 2])
This code now executes without errors. The output is the scatter plots I included in the PR message, which seem to behave as expected; I included them to make the case for removing the test. However, I am not sure why these tests were there in the first place.
The plotters list is used to parametrize multiple tests, so removing scatter from the list removes scatter from a couple of tests and not just this one. This test checks that scatter errors out when passed a list of ints and strings together, e.g. [1, 2, 'A']. We should decide if [['A', 'B'], [1, 2]] is valid input.
Hi! I hope I interpreted correctly what you suggested.
I reinserted scatter, but marked it as xfail to keep track of the change, following what was done for the plot case (L259). This only applies to the failing scatter cases (L257 of test_category.py). I believe the proper scatter tests are the ones parametrized in:
matplotlib/lib/matplotlib/tests/test_category.py
Lines 119 to 120 in 2a54864
PLOT_LIST = [Axes.scatter, Axes.plot, Axes.bar]
PLOT_IDS = ["scatter", "plot", "bar"]
I added a test for the nested lists that checks that either the offsets (for numerical values) or the text (for categorical values) match the expected result.
554b784 to 2a54864
categorical_examples = [("nested categorical", [["a", "b"], ["c", "d"]]),
                        ("nested with nan", [['0', np.nan], ["aa", "bb"]]),
                        ("nested mixed", [[1, 'a'], ['b', np.nan]])]
By mixed, I meant what happens for [[1, 2], ['a', 'b']]: they all get cast to the same type because they get raveled into one list, so what happens then?
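To make the question concrete, here is a small NumPy-only sketch (not the PR's code) of what raveling such a nested list does. It assumes the flattening goes through an ordinary NumPy array conversion, which may not match exactly what the patched scatter does:

```python
import numpy as np

# Nested input mixing a numeric row with a string row.
nested = [[1, 2], ['a', 'b']]

# np.ravel first converts the input to a single array, so NumPy has to
# pick one dtype for every element; ints and strs promote to a string dtype.
flat = np.ravel(nested)

print(flat.dtype.kind)  # 'U' is NumPy's code for a unicode string dtype
print(flat.tolist())
```

After raveling, the integers are indistinguishable from the strings '1' and '2', so a type check that runs after the flattening no longer sees mixed types.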
Hi @borgesaugusto, sorry it took so long to review this. Are you planning to pick it back up?
PR summary
Addresses issue #26743. A possible solution is to flatten the data before passing it to the _base._process_unit_info() function. This way the behaviour is consistent. However, it causes the scatter-related tests in test_category.py::TestPlotTypes to fail, because a TypeError is no longer raised.
This happens because the modification in _axes.py means that no error is raised and the plots are created (image below). I don't think the behaviour of these plots is odd, so I removed scatter from the test. These tests were added in PR #9783, but I am not sure why. The plots below show the output of:
where xdata are the test cases (shown as the title of each subplot)
The only possible discrepancy in these plots is the case of ['12', np.nan] vs [12, np.nan]. When 12 is a string, nan is also converted to a string. I don't know how I could avoid this.
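The discrepancy can be reproduced with NumPy alone. The following is only an illustration of the dtype promotion, assuming the data is converted with a plain np.array call somewhere along the way; the variable names are made up for this example:

```python
import numpy as np

# With a string present, NumPy promotes everything to a string dtype,
# so the float nan becomes the literal text 'nan'.
as_strings = np.array(['12', np.nan])

# With only numbers, the array stays floating point and nan stays a real NaN.
as_numbers = np.array([12, np.nan])

print(as_strings.dtype.kind, as_strings.tolist())
print(as_numbers.dtype.kind, np.isnan(as_numbers[1]))
```

In the string case the missing value is no longer detectable with np.isnan, which is why it ends up treated as just another category.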
If we wished to keep the tests, another possibility would be to add a check in _axes.py before the call to _base._process_unit_info(), to avoid having to edit _base.
Also, as said in the original issue (#26743 (comment)), the flattening could be deprecated. In that case, what would be the correct implementation? Add a warning and then, after a few versions, check whether the input is a list of lists and raise an exception?
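As a rough sketch of the warn-first approach, something like the following could work. The helper name flatten_with_warning is hypothetical and not part of Matplotlib, and it uses the standard library warnings module rather than whatever deprecation machinery the project would actually want here:

```python
import warnings

import numpy as np


def flatten_with_warning(data):
    """Hypothetical helper: flatten *data*, warning if it was nested."""
    arr = np.asarray(data)
    if arr.ndim > 1:
        # Step 1 of the deprecation: keep flattening, but tell the user.
        warnings.warn(
            "Passing nested sequences is deprecated; flatten the data "
            "before plotting.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Step 2, a few releases later, would replace the warning with
        # an exception, e.g. `raise ValueError(...)`.
    return arr.ravel()
```

Flat input passes through silently, so existing 1D categorical calls would not be affected during the deprecation period.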
PR checklist