
List of lists of categorical data failing: Scatter ravel is performed before _process_unit_info() is called. #27035


Open: wants to merge 1 commit into main

Conversation

borgesaugusto
Contributor

PR summary

Addresses issue #26743. A possible solution is to flatten the data before passing it to the _base._process_unit_info() function; this makes the behaviour consistent. However, it causes the scatter-related tests in test_category.py::TestPlotTypes to fail, because a TypeError is no longer raised.

This happens because the modification in _axes.py means no error is raised and the plots are created (image below). I don't think the behaviour of these plots is odd, so I removed scatter from the test. These tests were added in PR #9783, but I am not sure why. The plots below show the output of:

```python
ydata = [1, 2]
ax.scatter(xdata, ydata)
```

where xdata is each of the test cases (shown as the title of each subplot).

[Image: grid of scatter subplots, one per xdata test case]

The only possible discrepancy in these plots is the case of ['12', np.nan] vs [12, np.nan]: when 12 is a string, nan is also treated as a string. I don't know how I could avoid this.
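This coercion follows from NumPy's type promotion when the data is turned into an array: if any element is a string, np.nan is promoted to the literal string 'nan'. A minimal illustration (not matplotlib code, just the underlying NumPy behaviour):

```python
import numpy as np

# With a string element present, the whole array gets a string (unicode)
# dtype, so np.nan is coerced to the literal string 'nan'.
as_strings = np.asarray(["12", np.nan])
print(as_strings.dtype.kind)   # U

# With only numeric elements, nan stays a float NaN.
as_floats = np.asarray([12, np.nan])
print(as_floats.dtype.kind)    # f
```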

If we wished to preserve the tests, another possibility would be to add a check in _axes.py before the _base._process_unit_info() call, to avoid having to edit _base.
Also, as suggested in the original issue (#26743 (comment)), the flattening could be deprecated. In that case, what would be the correct implementation? Add a warning, and after a few versions check whether the input is a list of lists and raise an exception?
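For reference, a minimal sketch of the flattening idea (the function name here is illustrative, not the actual matplotlib internals): nested input is raveled to a 1-D array before any unit/category processing, so a list of lists is handled exactly like the equivalent flat list.

```python
import numpy as np

def preprocess_for_units(data):
    # Illustrative stand-in for flattening before _process_unit_info():
    # ravel nested input so unit detection sees a flat 1-D sequence.
    return np.asarray(data).ravel()

nested = [["a", "b"], ["c", "d"]]
flat = preprocess_for_units(nested)
assert list(flat) == ["a", "b", "c", "d"]
```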

PR checklist

@borgesaugusto changed the title from "Scatter ravel is performed before _process_unit_info() is called." to "List of lists of categorical data failing: Scatter ravel is performed before _process_unit_info() is called." on Oct 8, 2023
@story645
Member

> When 12 is a string, nan is also taken as a string. I don't know how I could avoid this.

That's expected behavior; xref #19139 for a request to fix that.

Member

@story645 left a comment


Thanks for taking this on, much appreciated. This definitely needs some tests before it can be merged.

```diff
-plotters = [Axes.scatter, Axes.bar,
+# plotters = [Axes.scatter, Axes.bar,
+#             pytest.param(Axes.plot, marks=pytest.mark.xfail)]
+plotters = [Axes.bar,
```
Member


Please either put scatter back or add an explicit test that this PR does not break the behavior of 1D categorical data.

Also, please add a test (preferably parametrized to check both categorical and numerical data) that the nested lists work as expected.

Contributor Author


Thanks for the feedback! I intended the PR as a way of discussing the changes more hands-on.

Sorry, I didn't understand what you meant. The reason I removed scatter from the testing is that the test fails, since no TypeError is raised after the changes:

```python
@pytest.mark.parametrize("plotter", plotters)
@pytest.mark.parametrize("xdata", fvalues, ids=fids)
def test_mixed_type_exception(self, plotter, xdata):
    ax = plt.figure().subplots()
    with pytest.raises(TypeError):
        plotter(ax, xdata, [1, 2])
```

This code now executes without errors. The output is the set of scatter plots I included in the PR message; they seem to behave as expected, which makes the case for removing the test. But I am not sure why the tests were there in the first place.

Member


The plotters list is used to parametrize multiple tests, so removing scatter from the list removes scatter from a couple of tests, not just this one. This test checks that scatter errors out when passed a list containing ints and strings together, e.g. [1, 2, 'A']. We should decide whether [['A', 'B'], [1, 2]] is valid input.

Contributor Author


Hi! I hope I interpreted what you suggested correctly.

I reinserted scatter, but marked it as xfail to keep track of the change, following what was done for the plot case (L259). This applies only to the failing scatter cases (L257 of test_category.py). I believe the proper scatter tests are the ones parametrized in:

```python
PLOT_LIST = [Axes.scatter, Axes.plot, Axes.bar]
PLOT_IDS = ["scatter", "plot", "bar"]
```

I added the test for the nested lists to check that either the offset (for numerical values) or the text (for categorical values) is the expected result.


```python
categorical_examples = [("nested categorical", [["a", "b"], ["c", "d"]]),
                        ("nested with nan", [["0", np.nan], ["aa", "bb"]]),
                        ("nested mixed", [[1, "a"], ["b", np.nan]])]
```
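As a hedged sketch of what these examples have in common (this is only the underlying NumPy behaviour, not the actual assertions of the PR's test, which inspects offsets and tick text): every example contains at least one string, so after raveling, all elements end up as string categories.

```python
import numpy as np

categorical_examples = [("nested categorical", [["a", "b"], ["c", "d"]]),
                        ("nested with nan", [["0", np.nan], ["aa", "bb"]]),
                        ("nested mixed", [[1, "a"], ["b", np.nan]])]

for name, xdata in categorical_examples:
    flat = np.asarray(xdata).ravel()
    # each example contains at least one string, so the raveled
    # array always has a string (unicode) dtype
    assert flat.dtype.kind == "U", name
```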
Member


By mixed, I meant what happens for [[1, 2], ['a', 'b']]: they all get cast to the same type because they get raveled into one list, so what happens then?
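A quick demonstration of that cast (again just the NumPy promotion that the raveling triggers): mixing a numeric sublist with a string sublist promotes everything to strings.

```python
import numpy as np

mixed = [[1, 2], ["a", "b"]]
flat = np.asarray(mixed).ravel()
# the integers are promoted to strings by NumPy's type promotion rules
assert list(flat) == ["1", "2", "a", "b"]
```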

@story645
Member

story645 commented Nov 5, 2023

Hi @borgesaugusto, sorry it took so long to review this. Are you planning to pick it back up?

Successfully merging this pull request may close these issues.

[Bug]: scatter plot fails for list of lists with categorical data