Discrete scatter? #6802
Comments
Can you please be more specific? Is it something like #6214? 'cause for that, I'm proposing a dict Colormap/Norm of the form:
Since scatter and imshow both take cmap and norm, that could probably handle the separations...and then possibly a helper legend of sorts that does legend_from_cmapnorm or something? (or hooking it directly into legend...but I expect more pushback there...)
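The dict-based Colormap/Norm form proposed above is not shown in the thread, so it is not reproduced here. As a rough sketch of the same idea using classes that already exist in matplotlib, a `ListedColormap` paired with a `BoundaryNorm` can map integer category codes to fixed colors, and proxy `Line2D` handles can stand in for the "legend from cmap/norm" helper. The category names, codes, and data below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import BoundaryNorm, ListedColormap
from matplotlib.lines import Line2D

# Illustrative data (assumption): one integer code per point, indexing `categories`.
categories = ["low", "mid", "high"]
codes = np.array([0, 2, 1, 1, 0, 2])
x = np.arange(6)
y = np.array([3, 1, 4, 1, 5, 9])

# One color per category; bin edges at -0.5, 0.5, 1.5, 2.5 put each code in its own bin.
cmap = ListedColormap(["tab:blue", "tab:orange", "tab:green"])
norm = BoundaryNorm(np.arange(-0.5, len(categories)), cmap.N)

fig, ax = plt.subplots()
ax.scatter(x, y, c=codes, cmap=cmap, norm=norm)

# Build legend entries by hand: one marker-only proxy artist per category.
handles = [
    Line2D([], [], marker="o", linestyle="none", color=cmap(norm(i)), label=name)
    for i, name in enumerate(categories)
]
ax.legend(handles=handles)
```

The same `cmap` and `norm` pair could be passed to imshow for a matrix of class codes, which is the appeal of this route: the data does not have to be grouped before plotting.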
Which part is unclear? I implemented both the "legend from colormap" helper and the for-loop approach. I think the for-loop is much cleaner. And I think the right way to go about this is to use cyclers (though I'm not entirely sure yet). The question is whether you want symbols and colors to be able to use the same variable or different variables. In my application, I'm currently using the same variable for both. But having different variables to go 4d or beyond also makes sense.
Though I guess if you support discrete variables for any of the mutable attributes you can just register one cycler for each, and you can do both things.
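A minimal sketch of the "one cycler per attribute" idea, assuming two independent discrete variables (the hypothetical `group_a` drives color and `group_b` drives marker). It uses the `cycler` package that matplotlib already depends on; the data and variable names are made up, and this is only one way the idea could be wired up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler

# Illustrative data (assumption): two discrete variables per point.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
group_a = np.array(["a", "a", "b", "b", "c", "c"])  # drives color
group_b = np.array(["u", "v", "u", "v", "u", "v"])  # drives marker

color_cycle = cycler(color=["tab:blue", "tab:orange", "tab:green"])
marker_cycle = cycler(marker=["o", "s"])

# Calling a Cycler returns an endless iterator, so styles wrap around
# if there are more categories than styles.
color_for = dict(zip(np.unique(group_a), color_cycle()))
marker_for = dict(zip(np.unique(group_b), marker_cycle()))

fig, ax = plt.subplots()
for a in np.unique(group_a):
    for b in np.unique(group_b):
        mask = (group_a == a) & (group_b == b)
        if mask.any():
            # Each (a, b) subset gets its color from one cycler and its marker from the other.
            ax.plot(x[mask], y[mask], linestyle="none", label=f"{a}/{b}",
                    **color_for[a], **marker_for[b])
ax.legend()
```

Using the same variable for both attributes reduces to passing the same array as `group_a` and `group_b`. The grouping step is still explicit here, which is the preaggregation cost raised later in the thread.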
Sorry, didn't realize you'd already implemented it. (That's what was unclear...I thought you were making a feature request.) There was a discussion with @tacaswell about making a bunch of the attributes cyclers to support stuff like when an array gets passed in...hmm, I'm wondering how you can keep them independent if they're all cyclers, 'cause wouldn't the cyclers still be acting on the same aggregations? (Like row 0 of the passed-in array all gets the first color, symbol, size in the cycler, etc.) I'm partial to a categorical cmap 'cause it would work for both imshow and scatter, and it has the advantage of not requiring the user to group their data; pretty sure a cycler would require some preaggregation. ETA: this has me thinking about a function that probably doesn't belong in matplotlib but lets you define the group colors, markers, sizes, etc...and then plots each subset accordingly...but this would also probably get visually noisy really fast.
Well, either solution, if implemented for my purposes, is like 10 lines. Implementing them feature-complete in matplotlib is probably more ;) Yeah, the aggregation for the cycler is a bit of a bummer.
Hmm, categorical1, 2, and 3 would either be dicts, tuples, or nested lists (or I guess enumerated types)? Something with a ( But any which way, I dunno if that could be supported straight or if it's the sort of thing that should be pitched to seaborn (if they don't already support it...) since I think each (c,s,m) triplet acts on a subset of the data...and I dunno if all that subsetting breaks the matplotlib rule of sorts of trying to avoid manipulating the data (yes, hist manipulates the data, but it's sort of in matplotlib for matlab/historical reasons and there's an issue/PR about factoring out the plotting there).
I was thinking about What would the dict look like? Another approach would be to do something like
I'm pretty sure that's just 'cause then color gets visualized using a cmap. s can already take an array for the same reason; it just gets directly mapped. Markers, though, don't have a continuous mapping, so it would require a) ensuring the input was discrete, and b) coming up with a logical scheme for marker mapping (and what if np.unique(markers)<number(available(markers))?). I'm thinking the data structure in these cases is something like (category name, category style element).
Granted, I also feel like I'm conflating two use cases here: ungrouped: the user is passing in cols C1 and C2 of a table wherein cols C3, C4, and C5 are discrete variables that the data can be grouped on, or they're passing in a matrix of classes. This is the use case for [(key, value)] based kwargs (but on the flip side, this also fails in that there'd have to be a way to grab the values in C3, C4, and C5 too...)
The solution is a bit non-obvious. Happy to provide feedback on my use-cases, but I have no time to dive deeper into a solution for now.
You can use seaborn for this.
I might really need to move to seaborn (like many others did).
It's written on top of mpl/takes an mpl axis, so you can always mix and match as needed.
This is kind of a dupe of #6214
I agree. Feel free to close in favor of #6214.
Hey.
So I briefly talked with @tacaswell about this. I mostly use scatter to show a discrete third variable via color (say I plot weight vs height and want to color by gender). The current scatter is not entirely equipped for this, because it doesn't really allow me to create the legend that I want, and it doesn't allow me to use cyclers on the symbols or color.
What I want can be relatively trivially implemented as a for-loop over the unique values of the discrete variable, and calling plot once for each.
I use this A LOT, and other data sciency folks, too, I think, so it would be nice to have a one-liner for this, i.e. implement a function that does that for you.
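For reference, the for-loop described above might look roughly like the sketch below. The function name `discrete_scatter`, its signature, and the toy height/weight/gender values are assumptions made for illustration; nothing here is an existing matplotlib function.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def discrete_scatter(x, y, labels, ax=None, **kwargs):
    """Hypothetical helper: one ax.plot call per unique value in `labels`."""
    x, y, labels = np.asarray(x), np.asarray(y), np.asarray(labels)
    if ax is None:
        ax = plt.gca()
    # Markers only by default; caller kwargs can override either setting.
    style = {"marker": "o", "linestyle": "none", **kwargs}
    lines = []
    for value in np.unique(labels):
        mask = labels == value
        # The axes property cycle assigns a new color on each plot call,
        # and the label feeds the legend directly.
        (line,) = ax.plot(x[mask], y[mask], label=str(value), **style)
        lines.append(line)
    return lines


# Toy data, made up for illustration only.
height = np.array([160, 172, 181, 158, 190, 175])
weight = np.array([55, 68, 80, 52, 95, 70])
gender = np.array(["f", "m", "m", "f", "m", "f"])

fig, ax = plt.subplots()
lines = discrete_scatter(height, weight, gender, ax=ax)
ax.set_xlabel("height")
ax.set_ylabel("weight")
ax.legend()
```

Because each group is a separate `plot` call, the legend falls out of the labels for free, and colors come from whatever property cycle is set on the axes.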
My question is:
Should this go into matplotlib or somewhere else?
If it should go into matplotlib, should it go into scatter or into a new function?