Discrete scatter? #6802

Closed
amueller opened this issue Jul 19, 2016 · 14 comments

@amueller
Contributor

Hey.
So I briefly talked with @tacaswell about this. I mostly use scatter to show a discrete third variable via color (say I plot weight vs height and want to color by gender). The current scatter is not entirely equipped for this, because it doesn't really allow me to create the legend that I want, and it doesn't allow me to use cyclers on the symbols or colors.
What I want can be relatively trivially implemented as a for-loop over the unique values of the discrete variable, and calling plot once for each.
I use this A LOT, and other data sciency folks, too, I think, so it would be nice to have a one-liner for this, i.e. implement a function that does that for you.
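A minimal sketch of the for-loop approach described above, under stated assumptions: the name `discrete_scatter`, its signature, and the sample data are hypothetical and not part of matplotlib. Each unique value gets its own `plot` call, so colors come from the axes' property cycle and each group gets a legend entry.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def discrete_scatter(x, y, labels, ax=None):
    """Hypothetical helper: one ax.plot call per unique label."""
    if ax is None:
        ax = plt.gca()
    x, y, labels = np.asarray(x), np.asarray(y), np.asarray(labels)
    lines = []
    for value in np.unique(labels):
        mask = labels == value
        # linestyle="" draws markers only; the color is taken from the
        # axes' property cycle, one entry per plot call.
        (line,) = ax.plot(x[mask], y[mask], marker="o", linestyle="",
                          label=str(value))
        lines.append(line)
    return lines


# Made-up illustration data: weight vs height, colored by gender.
fig, ax = plt.subplots()
height = [150, 160, 170, 180, 190]
weight = [50, 60, 65, 80, 90]
gender = ["f", "f", "m", "m", "f"]
lines = discrete_scatter(height, weight, gender, ax=ax)
ax.legend()
```

Because every group is a separate artist with its own label, the legend needs no extra work, which is the main thing the single `scatter` call does not give you.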

My question is:
Should this go into matplotlib or somewhere else?
If it should go into matplotlib, should it go into scatter or into a new function?

@story645
Member

story645 commented Jul 19, 2016

Can you please be more specific? Is it something like #6214?

'cause for that, I'm proposing a dict Colormap/Norm of the form:

cmap, norm = mcolors.DictColor({'snow': 'gray', 'rain': 'blue', 'broken detector': 'red'})

since scatter and imshow both take cmap and norm, that could probably handle the separations...and then possibly a helper legend of sorts that does legend_from_cmapnorm or something? (or hooking it directly into legend...but I expect more pushback there...)
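`DictColor` is a proposal and does not exist in matplotlib. As a rough sketch of the same idea using pieces that do exist (`ListedColormap` and `BoundaryNorm`), a dict can be turned into a cmap/norm pair plus an encoder from category names to integer codes. The helper name `dict_color` and its three-value return are assumptions made for illustration only.

```python
import numpy as np
from matplotlib.colors import BoundaryNorm, ListedColormap, to_rgba


def dict_color(mapping):
    """Hypothetical stand-in for the proposed DictColor.

    Returns (cmap, norm, encode), where encode maps category names
    to the integer codes that cmap/norm expect.
    """
    categories = list(mapping)
    cmap = ListedColormap([mapping[c] for c in categories])
    # Integer code i falls into the bin [i - 0.5, i + 0.5).
    norm = BoundaryNorm(np.arange(len(categories) + 1) - 0.5, cmap.N)
    index = {c: i for i, c in enumerate(categories)}

    def encode(values):
        return np.array([index[v] for v in values])

    return cmap, norm, encode


cmap, norm, encode = dict_color(
    {"snow": "gray", "rain": "blue", "broken detector": "red"}
)
codes = encode(["rain", "snow", "broken detector", "rain"])
rgba = cmap(norm(codes))  # one RGBA row per observation
```

The encoded array can then be passed as `c=codes, cmap=cmap, norm=norm` to `scatter`, or as the image data to `imshow`, since both accept a cmap and a norm.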

@amueller
Contributor Author

Which part is unclear?
Yes, I think it is a duplicate of #6214.

I implemented both the "legend from colormap" helper and the for-loop approach. I think the for-loop is much cleaner. And I think the right way to go about this is to use cyclers (though I'm not entirely sure yet).

The question is whether you want symbols and colors to be able to use the same variable or different variables. In my application, I'm currently using the same variable for both. But having different variables to go 4d or beyond also makes sense.

@amueller
Contributor Author

Though I guess if you support discrete variables for any of the mutable attributes you can just register one cycler for each, and you can do both things.

@story645
Member

story645 commented Jul 19, 2016

Sorry, didn't realize you'd already implemented it. (That's what was unclear...I thought you were making a feature request)

There was a discussion with @tacaswell about making a bunch of the attributes cyclers to support stuff like when an array gets passed in. Hmm, I'm wondering how you can keep them independent if they're all cyclers, 'cause wouldn't the cyclers still be acting on the same aggregations? (Like row 0 of the passed-in array all gets the first color, symbol, size in the cycler, etc.)

I'm partial to a categorical cmap 'cause it would work for both imshow and scatter, and it has the advantage of not requiring the user to group their data. Pretty sure a cycler would require some preaggregation.

ETA: this has me thinking about a function that probably doesn't belong in matplotlib but lets you define the group colors, markers, sizes, etc...and then plots each subset accordingly...but this would also probably get visually noisy really fast

@amueller
Contributor Author

Well either solution if implemented for my purposes is like 10 lines. Implementing them feature-complete in matplotlib is probably more ;)
So it is a feature-request mixed with volunteering to contribute ;)

Yeah the aggregation for the cycler is a bit of a bummer.
But you could do
scatter(x, y, c=categorical1, s=categorical2, marker=categorical3)
and different variables would get different cyclers.

@story645
Member

story645 commented Jul 19, 2016

But you could do scatter(x, y, c=categorical1, s=categorical2, marker=categorical3)

Hmm, categorical1, 2, and 3 would either be dicts, tuples, or nested lists (or I guess enumerated types)? Something with a (key: value) structure, which, granted, is the same thing as the colormap that I'm calling dict but that should likely be a CategoricalColor accepting anything that supports (key, value) enumeration (sorry, stream-of-consciousness fleshing out, 'cause better mpl categorical support is my GSoC project and I'm working through what I wanna propose next).

But any which way, I dunno if that could be supported straight or if it's the sort of thing that should be pitched to seaborn (if they don't already support it...), since I think each (c, s, m) triplet acts on a subset of the data, and I dunno if all that subsetting breaks the matplotlib rule of sorts of trying to avoid manipulating the data (yes, hist manipulates the data, but it's sort of in matplotlib for matlab/historical reasons and there's an issue/PR about factoring out the plotting there).

@amueller
Contributor Author

I was thinking of categoricalX as numpy arrays or lists of integers. That currently works with color, and I don't see why that shouldn't work with others. That is problematic for s and marker, though, because there integers already have semantics.
That is one of the questions behind "should this go in the same function".
It would also be possible to add new kwargs like c_cycle or something with a better name, to use numpy arrays of integers. Actually, they wouldn't need to be integers, they could be anything.

What would the dict look like?

Another approach would be to do something like marker=apply_cycler(categorical3, my_cycler) which uses the cycler to assign markers to the entries of categorical. That throws away the values of categorical3, though, so that is no good for generating the labels / legends.
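A minimal sketch of what such an `apply_cycler` could look like, assuming the `cycler` package that matplotlib already depends on. The function is hypothetical and its signature is made up for illustration. This variant also returns the value-to-style mapping, which is one possible way to keep the category values around for labels; a version returning only the per-point styles would lose them, as noted above.

```python
from cycler import cycler


def apply_cycler(categorical, style_cycler):
    """Hypothetical helper: assign each distinct value the next style.

    Returns (per_point_styles, assigned), where assigned maps each
    category value to its style dict so a legend can still be built.
    """
    styles = style_cycler()  # calling a Cycler gives an endless iterator
    assigned = {}
    for value in categorical:
        if value not in assigned:
            assigned[value] = next(styles)
    return [assigned[v] for v in categorical], assigned


my_cycler = cycler(marker=["o", "s"])
per_point, assigned = apply_cycler(["a", "b", "a", "c"], my_cycler)
```

With only two markers in the cycler, the third category wraps around and reuses the first marker, which illustrates why a cycler alone may not guarantee distinct styles per category.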

@story645
Member

story645 commented Jul 19, 2016

That currently works with color,

I'm pretty sure that's just 'cause then color gets visualized using a cmap. s can already take an array for the same reason: it just gets directly mapped.

Markers, though, don't have a continuous mapping, so it would require a) ensuring the input was discrete, b) coming up with a logical scheme for marker mapping (and what if np.unique(markers)<number(available(markers)))

I'm thinking the data structure in these cases is something like (category name, category style element)

{'early': 'yellow', 'middle': 'orange', 'late': 'red'}
{'early':'+', 'middle':'.', 'late':'o'}

Granted, I also feel like I'm conflating two use cases here:
grouped: user is passing in an N (groups) by M (observations) matrix for which cyclers of N elements make sense - this is the use case for cyclers for all the kwargs (and is common for line plots especially)

ungrouped: user is passing in cols C1 and C2 of a table wherein cols C3, C4, and C5 are discrete variables that the data can be grouped on, or they're passing in a matrix of classes - this is the use case for [(key, value)] based kwargs (but on the flip side, this also fails in that there'd have to be a way to grab the values in C3, C4, and C5 too...)

@amueller
Contributor Author

The solution is a bit non-obvious. Happy to provide feedback on my use-cases, but I have no time to dive deeper into a solution for now.

@Phlya
Contributor

Phlya commented Jul 20, 2016

@tacaswell tacaswell added this to the 2.1 (next point release) milestone Jul 20, 2016
@amueller
Contributor Author

I might really need to move to seaborn (like many others did).

@story645
Member

It's written on top of mpl and takes an mpl axis, so you can always mix and match as needed.

@tacaswell tacaswell modified the milestones: 2.1 (next point release), 2.2 (next next feature release) Oct 3, 2017
@story645
Member

story645 commented Feb 5, 2018

This is kind of a dupe of #6214

@amueller
Contributor Author

amueller commented Feb 5, 2018

I agree. Feel free to close in favor of #6214.

@jklymak jklymak closed this as completed Feb 5, 2018
@QuLogic QuLogic modified the milestones: needs sorting, v2.2.0 Feb 12, 2018

6 participants