Fix scatterplot categorical support #9705

Merged
merged 1 commit into from
Nov 7, 2017

jklymak
Member

@jklymak jklymak commented Nov 6, 2017

Fix Categorical Scatterplot support

PR Summary

Fixes #9700

import matplotlib.pyplot as plt
plt.scatter(["a", "b"], [0,2])
plt.scatter(["a","c"], [1,4])

failed. New fix adds new categories if needed when convert is called.
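A minimal sketch of that idea, for illustration only. This is not the actual matplotlib converter; `ToyCategoryConverter` and its methods are hypothetical names. It shows a `convert` step that registers unseen categories before mapping them, rather than raising on them:

```python
class ToyCategoryConverter:
    """Hypothetical stand-in for a category-to-number converter."""

    def __init__(self, categories=()):
        self._mapping = {}
        self.update(categories)

    def update(self, categories):
        # Assign the next free integer to each category not seen before.
        for cat in categories:
            if cat not in self._mapping:
                self._mapping[cat] = len(self._mapping)

    def convert(self, values):
        # Register unseen categories first, so later calls with new
        # categories extend the mapping instead of failing.
        self.update(values)
        return [float(self._mapping[v]) for v in values]


converter = ToyCategoryConverter(["a", "b"])  # set up by the first call
first = converter.convert(["a", "b"])
second = converter.convert(["a", "c"])        # "c" is new here
print(first, second)
```

In this toy version "c" simply takes the next free position after "a" and "b"; whether the real converter should order categories that way is a separate question.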

Not sure if this is right, so feel free to scratch it. It passes the tests and examples, so I guess it's OK.

PR Checklist

  • Has Pytest style unit tests (old ones)
  • Code is PEP 8 compliant

@jklymak jklymak requested a review from story645 November 6, 2017 21:22
@dstansby dstansby added this to the v2.1.1 milestone Nov 6, 2017
@jklymak
Member Author

jklymak commented Nov 6, 2017

Interestingly scatter doesn't appear to set the major tick labels?

import matplotlib
matplotlib.use('Qt5Agg')
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.scatter([1., 0.], [0., 3.])
for tt in ax.xaxis.get_majorticklabels():
    print(tt)

Returns:

Text(0,0,'')
Text(0,0,'')
Text(0,0,'')

Etc..

EDIT: Oops, I need to call fig.canvas.draw() to get the ticks updated. Learn something new every day...

Added a new test to test_categorical.py to test for this case

@jklymak
Member Author

jklymak commented Nov 6, 2017

Also fixes #9494

Almost fixed

Added test

@tacaswell
Member

Two birds with one stone is nice!

Member

@story645 story645 left a comment

Thanks! Wouldn't know how to fix this 'cause I'm still confused as to why stuff is hitting the convert path without ever being registered.

@dstansby dstansby merged commit 2a2330e into matplotlib:master Nov 7, 2017
tacaswell added a commit that referenced this pull request Nov 7, 2017
@jklymak
Member Author

jklymak commented Nov 7, 2017

@story645 I don't quite know what you mean by "registered". The converter gets initialized and attached to the xaxis object. But subsequent calls don't get sent to __init__; they get sent to convert, so convert has to be able to handle new categories. I think... If that's wrong, and there is a better solution, then by all means we should revisit it.

I'm uneasy about it all because I don't quite understand why axes.plot worked in the first place. This doesn't break axes.plot, but it's strange that it worked when the others did not.

@jklymak jklymak deleted the fixcategorical branch November 7, 2017 17:07
@jklymak
Member Author

jklymak commented Nov 7, 2017

OK, I don't like this at all:

scatter calls _process_unit_info(xdata=x, ydata=y, kwargs=kwargs), which only calls xaxis.update_units if the xaxis doesn't have units already:

in _base.py:

if xdata is not None:
    # we only need to update if there is nothing set yet.
    if not self.xaxis.have_units():
        self.xaxis.update_units(xdata)

So that's why scatter wasn't updating for categoricals: the axis already had its units assigned.

Plot, on the other hand, calls:

_process_plot_var_args():

which always calls _xy_from_xy(self, x, y) which I am pretty sure always calls: self.axes.xaxis.update_units(x) regardless of whether the units have been already set.

So, plot has significantly different units handling than other plotting functions; plot calls update_units no matter what, whereas other functions that run through _process_unit_info(xdata=x, ydata=y, kwargs=kwargs) do not.

  1. I don't like the inconsistency, and I somewhat think that plot is in the wrong here and that this could lead to issues like datetime being mixed with non-datetime trying to work and failing.
  2. I think that what we are doing in this PR for categoricals is OK, but really just in the case of categoricals where the mapping is one-to-one rather than a real scaling.
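The difference between the two code paths described above can be sketched with a toy model. This is an illustration only, not matplotlib code: `ToyAxis`, `guarded_update`, and `unconditional_update` are hypothetical names, and the "plot always updates" behaviour is taken from the description above rather than verified against the source.

```python
class ToyAxis:
    """Hypothetical axis that remembers which categories it has seen."""

    def __init__(self):
        self.categories = None  # None means "no units set yet"

    def have_units(self):
        return self.categories is not None

    def update_units(self, data):
        if self.categories is None:
            self.categories = []
        # Append any category that has not been seen before.
        for item in data:
            if item not in self.categories:
                self.categories.append(item)


def guarded_update(axis, xdata):
    # Mirrors the quoted _base.py branch: update only if no units are set.
    if xdata is not None:
        if not axis.have_units():
            axis.update_units(xdata)


def unconditional_update(axis, xdata):
    # Models the plot path as described above: always update.
    axis.update_units(xdata)


scatter_axis = ToyAxis()
guarded_update(scatter_axis, ["a", "b"])
guarded_update(scatter_axis, ["a", "c"])  # skipped: units already set

plot_axis = ToyAxis()
unconditional_update(plot_axis, ["a", "b"])
unconditional_update(plot_axis, ["a", "c"])  # "c" gets registered

print(scatter_axis.categories, plot_axis.categories)
```

In the guarded path the second call never reaches update_units, so the new category "c" is unknown when conversion happens. That is the situation this PR works around by letting convert add categories itself.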

@story645
Member

story645 commented Nov 7, 2017

I've also faced this inconsistency. When I tried to use an empty array for the conversions, this worked:

fig, ax = plt.subplots()
ax.plot(["!", "0"], [1, 2], ".")

but this didn't:

fig, ax = plt.subplots()
ax.scatter(["!", "0"], [1, 2])

and in both cases the converter yields array([ 0., 1.])

Successfully merging this pull request may close these issues.

Subsequent calls to plt.scatter with different categories raise ValueError