List of lists of categorical data failing: Scatter ravel is performed before _process_unit_info() is called. #27035

borgesaugusto · 2023-10-08T16:56:40Z

PR summary

Addresses issue #26743. A possible solution is to flatten the data before passing it to the _base._process_unit_info() function, so that the behaviour is consistent. However, this makes the scatter-related tests in test_category.py::TestPlotTypes fail, because a TypeError is no longer raised.
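To illustrate why flattening first can matter, here is a minimal sketch using only NumPy. The helper `looks_categorical` is a hypothetical stand-in written for this example; it is not how Matplotlib detects categorical data, and it only shows that an element-wise check sees lists before flattening and strings after.

```python
import numpy as np

# Nested categorical input, similar in shape to the case in the linked issue.
xdata = [["a", "b"], ["c", "d"]]

# Flatten first, so later code sees a plain 1D sequence of strings.
flat = np.ravel(xdata)


def looks_categorical(values):
    """Hypothetical check: True if every element is a string."""
    return all(isinstance(v, str) for v in values)


print(looks_categorical(xdata))  # outer elements are lists, not strings
print(looks_categorical(flat))   # flattened elements are NumPy strings
```

The point of the sketch is only the ordering: any per-element inspection gives a different answer depending on whether it runs before or after the ravel.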

This happens because the modification in _axes.py means no error is raised and the plots are created (image below). I don't think the behaviour of these plots is odd, so I removed scatter from the test. These tests were added in PR #9783, but I am not sure why. The plots below show the output of:

ydata=[1, 2]
ax.scatter(xdata, ydata)

where xdata is each of the test cases (shown as the title of each subplot).

The only possible discrepancy in these plots is the case of ['12', np.nan] vs [12, np.nan]. When 12 is a string, nan is also taken as a string. I don't know how I could avoid this.
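One way this kind of discrepancy can arise, assuming the data passes through a NumPy array conversion at some point (an assumption for this sketch, not a statement about the exact code path in Matplotlib), is NumPy's type promotion: a string next to a float nan yields a string array.

```python
import numpy as np

# A string alongside nan: NumPy promotes everything to a string dtype.
str_case = np.array(['12', np.nan])

# A number alongside nan: NumPy keeps a float dtype and nan stays a float.
num_case = np.array([12, np.nan])

print(str_case.dtype.kind, str_case.tolist())  # 'U', ['12', 'nan']
print(num_case.dtype.kind, num_case.tolist())
```

In the string case, nan is no longer a missing value but the literal text 'nan', which would then be treated like any other category.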

If we wished to keep the tests, another possibility would be to add a check in _axes.py before the call to _base._process_unit_info(), to avoid having to edit _base.
Also, as said in the original issue (#26743 (comment)), the flattening could be deprecated. In that case, what would be the correct implementation? Add a warning and, after a few versions, check whether the input is a list of lists and raise an exception?

PR checklist

"closes [Bug]: scatter plot fails for list of lists with categorical data #26743" is in the body of the PR description to link the related issue
new and changed code is tested
Plotting related features are demonstrated in an example
New Features and API Changes are noted with a directive and release note
Documentation complies with general and docstring guidelines

story645 · 2023-10-10T01:57:03Z

When 12 is a string, nan is also taken as a string. I don't know how I could avoid this.

That's expected behavior; see #19139 for a request to change it.

story645

Thanks for taking this on, much appreciated. This definitely needs some tests before it can be merged.

story645 · 2023-10-10T01:59:02Z

lib/matplotlib/tests/test_category.py

-    plotters = [Axes.scatter, Axes.bar,
+    # plotters = [Axes.scatter, Axes.bar,
+    #             pytest.param(Axes.plot, marks=pytest.mark.xfail)]
+    plotters = [Axes.bar,


Please either put scatter back or add an explicit test that this PR does not break the behavior of 1D categorical data.

Also, please add a test (preferably parametrized to check both categorical and numerical data) that the nested lists work as expected.

Thanks for the feedback! I intended the PR as a way of discussing the changes more hands-on.

Sorry, I didn't understand what you mean. The reason I removed the scatter from the testing is that the test will fail, as there is no more TypeError raised after the changes.

matplotlib/lib/matplotlib/tests/test_category.py

Lines 261 to 266 in 554b784

@pytest.mark.parametrize("plotter", plotters)
@pytest.mark.parametrize("xdata", fvalues, ids=fids)
def test_mixed_type_exception(self, plotter, xdata):
    ax = plt.figure().subplots()
    with pytest.raises(TypeError):
        plotter(ax, xdata, [1, 2])

This code now executes without errors. The output is the set of scatter plots I included in the PR message; I showed them to make the case that they behave as expected and that the test can be removed. But I am not sure why these tests were there in the first place.

The plotters list is used to parametrize multiple tests, so removing scatter from the list removes it from a couple of tests, not just this one. This test checks that scatter errors out when passed a list that mixes ints and strings, such as [1, 2, 'A']. We should decide whether [['A', 'B'], [1, 2]] is valid input.

Hi! I hope I interpreted correctly what you suggested.

I reinserted scatter, but marked it as xfail to keep track of the change, following what was done for the plot case (L259). This applies only to the failing-case scatter tests (L257 of test_category.py). I believe the proper scatter tests are the ones parametrized in:

matplotlib/lib/matplotlib/tests/test_category.py

Lines 119 to 120 in 2a54864

PLOT_LIST = [Axes.scatter, Axes.plot, Axes.bar]
PLOT_IDS = ["scatter", "plot", "bar"]

I added a test for the nested lists that checks that either the offset (for numerical values) or the text (for categorical values) matches the expected result.

story645 · 2023-10-15T20:27:55Z

lib/matplotlib/tests/test_category.py

+
+categorical_examples = [("nested categorical", [["a", "b"], ["c", "d"]]),
+                        ("nested with nan", [['0', np.nan], ["aa", "bb"]]),
+                        ("nested mixed", [[1, 'a'], ['b', np.nan]])]


By mixed, I meant what happens for [[1, 2], ['a', 'b']]: they all get cast to the same type because they get raveled into one list, so what happens then?
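The casting described here can be seen with NumPy alone. This is a small sketch of what `np.ravel` does to such an input; it does not show what scatter ultimately draws, which is the open question in this thread.

```python
import numpy as np

# One numeric row and one string row, as in the question above.
mixed = [[1, 2], ['a', 'b']]

# Raveling converts the nested list to a single array, so all
# elements end up sharing one dtype: here, a string dtype.
flat = np.ravel(mixed)

print(flat.dtype.kind)  # 'U' (unicode string)
print(flat.tolist())    # ['1', '2', 'a', 'b']
```

After the ravel, the original integers are indistinguishable from the strings '1' and '2', so any later categorical handling would see four string values.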

story645 · 2023-11-05T19:20:00Z

Hi @borgesaugusto, sorry it took so long to review this. Are you planning to pick it back up?

borgesaugusto changed the title ~~Scatter ravel is performed before _process_unit_info() is called.~~ List of lists of categorical data failing: Scatter ravel is performed before _process_unit_info() is called. Oct 8, 2023

melissawm added topic: categorical topic: plotting methods labels Oct 9, 2023

story645 requested changes Oct 10, 2023

Scatter ravel is performed before _process_unit_info() is called.

2a54864

borgesaugusto force-pushed the iss_26743 branch from 554b784 to 2a54864 Compare October 14, 2023 13:42

borgesaugusto requested a review from story645 October 15, 2023 15:48

story645 requested changes Oct 15, 2023


Inline review comment authors and dates, in order: story645 (Oct 10, 2023), borgesaugusto (Oct 10, 2023), story645 (Oct 10, 2023), borgesaugusto (Oct 14, 2023), story645 (Oct 15, 2023).
