[ENH]: plt.scatter() parameters are extremely confusing #27765
The only thing one could debate here is whether one wants to add an additional alias I see your issue but I don't see a way forward to make the API more clear while maintaining backward compatibility. |
I understand the motivations, historical and not, that you have provided me for the current situation. But this does not change the fact that the current situation is confusing. Am I supposed to think "oh ok, "It's not reasonable to add a shortcut "I claim it's not good to have multiple parameter names for the same thing." -> But you already have that: The ideal solution for this messy situation would be to add I am sorry if I sound a little rough, but this problem frustrates me so much; I think I have googled how to set the properties of a scatter plot at least 100 times in my life. |
See also #1101. |
Indeed! The inconsistency of |
Isn't the real problem |
Honestly also trips me up all the time that scatter doesn't have marker{size, facecolor, edgecolor} & I agree consistency on marker setting across the methods would be nice. |
The only reason to use |
We mention this as an optimization, but |
Sure, but it would also be misleading. The |
A) I have lots of situations where I don't want vector size but want vector color (sometimes also the reverse) |
This interpretation is also the one that the majority of new users have (me included). |
@francescoboc replying to #27765 (comment): Of course you're not supposed to think these convoluted thoughts on every plot; my explanation was rather meant to motivate where the current state comes from. I advocate thinking like this:
On a side note, but not something you have to know/remember:
I'm very cautious about general claims about what our users think - we usually don't have reliable data about that. I also think this is not a good interpretation and should not be advertised (@story645 I know you differ here). " One may argue whether the visual-based interpretation (line/marker) is simpler than the data-based interpretation. But whether we like it or not, the implementation of the functions matches the data-based interpretation, and I think we're not doing the users a service when we try to retrofit the visual-based interpretation. |
I would go further and say this interpretation is completely wrong (even though it may be common) and we should actively write the docs in a way that goes against this misunderstanding. |
@timhoffm As for the "reliable data on what other users think", ok I admit that I have not done an official survey, but at least this is my experience from talking to coworkers and colleagues in academia. |
@francescoboc there has to be a default, and
For sure, if there are places where we could differentiate better that would be welcome. I'll also point out that it is quite easy to write wrappers around our API for your own API ( |
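As a rough illustration of the wrapper idea, here is a minimal sketch. The names `plot_style_to_scatter_kwargs` and `scatter_with_plot_names` are hypothetical and not part of Matplotlib. It assumes you want the `plot()`-style marker keywords and maps them onto the existing `scatter()` keywords `s`, `facecolors`, `edgecolors` and `linewidths`. The size conversion relies on `scatter()` documenting `s` in points**2, while `markersize` is in points.

```python
def plot_style_to_scatter_kwargs(markersize=None, markerfacecolor=None,
                                 markeredgecolor=None, markeredgewidth=None):
    """Translate plot()-style marker keywords into scatter() keywords.

    Hypothetical helper for illustration; not part of Matplotlib.
    """
    kwargs = {}
    if markersize is not None:
        # plot() sizes markers in points; scatter() expects points**2.
        kwargs["s"] = markersize ** 2
    if markerfacecolor is not None:
        kwargs["facecolors"] = markerfacecolor
    if markeredgecolor is not None:
        kwargs["edgecolors"] = markeredgecolor
    if markeredgewidth is not None:
        kwargs["linewidths"] = markeredgewidth
    return kwargs


def scatter_with_plot_names(ax, x, y, marker="o", **plot_style):
    """Hypothetical wrapper: call ax.scatter() using plot()-style names."""
    return ax.scatter(x, y, marker=marker,
                      **plot_style_to_scatter_kwargs(**plot_style))
```

With an existing Axes `ax`, a call might look like `scatter_with_plot_names(ax, x, y, markersize=6, markeredgecolor="k")`. Keeping the keyword translation in its own function means the unit conversion lives in one place and can be checked without drawing anything.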
Yes, sorry I wrote it quickly and I forgot the marker parameter. What I would normally use is The main confusion for me comes from the fact that, intuitively, if I want to make a simple scatter plot (with all points having the same properties) I use |
(Perhaps I can take advantage of this discussion to try and revive #14174, by the way?) |
Ok fine it's wrong but I don't think this is possible to correct in docs b/c line and scatter have fundamentally different semantics and our defaults highlight those different semantics
So I don't see how we sell people on "so yes the default of plot is a line plot but use plot when trying to make a line plot or a very specific type of scatter plot, but use scatter when making every single other type of scatter plot and oh yeah it can make the specific type of scatter you're trying to make too." For the record, it also frustrates me that stackplot is either an areaplot or a streamgraph, but that at least is b/c they're originating from the same paper.
Yeah, I think that's really terrible for consistency and we have #25259 for that reason |
It probably doesn't help that |
Would it be worth adding a second entry for |
Friends, the relation between We only have very limited possibilities to change the API and naming due to backward-compatibility. IMHO we can help the users most by proper description in the documentation. In particular that means not primarily associating |
Cross post 😄 . Yes! See my comment above. |
Frankly, I think the API mismatch issue between I'm not trying to complain, I'm just wondering why we're insisting on what we'd normally consider bad API:
When we have the simple usability/less confusing out of recommending Like I think @rcomer 's thumbnail example only highlights this issue of heavy somewhat confusing overlap. I think adding the line example makes more sense b/c it highlights what I think is the primary purpose of |
Just to confuse things further, I recently changed some code from using |
You're right, and I think scatter should support the half-filled b/c I don't think we should have inconsistency in the markers we support - we've done the other way and allowed ETA: w/ the caveat that I think the reason we don't support it has to do w/ the technical implementation of the markers such that I recognize it may be hard/technically impossible to implement half-filled and respect scatter semantics. |
I don't think it's a misuse of
I don't think anyone is encouraging that, necessarily. If they want to move from |
Inspired from the discussion in matplotlib#27765: We should visually communicate that `plot()` covers all three variants: markers only, line+markers, line-only. They are visually distinct enough that it's not possible to infer the variants if you see only one. In particular, it's important to communicate that you can draw markers only. We don't want to automatically drive people who want markers (e.g. some discrete measurements of a dependent variable y (x)) to scatter because that's the only one showing discrete markers in the overview.
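The three variants mentioned above can be sketched with `plot()` format strings. This is only an illustration with made-up data, not the gallery example itself; the `Agg` backend is selected just so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up sample data: discrete measurements of y(x).
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

fig, axs = plt.subplots(1, 3, figsize=(9, 3))

# Markers only: a marker in the format string and no line style.
(markers_only,) = axs[0].plot(x, y, "o")
axs[0].set_title("markers only")

# Line plus markers.
(line_and_markers,) = axs[1].plot(x, y, "o-")
axs[1].set_title("line + markers")

# Line only.
(line_only,) = axs[2].plot(x, y, "-")
axs[2].set_title("line only")

fig.savefig("plot_variants.png")
```

All three calls return ordinary `Line2D` artists; only the marker and line style differ.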
While this is getting quite off-topic, I just want to note that you can have half-filled markers in
So the fundamental mechanism is in place. I suspect however that styling is limited, because the colors and linewidths are not exposed through kwargs. |
So I kinda agree here in that I think if scatter were to get a markersize keyword, it should be in the same units as |
A fundamental issue with the marker handling in I therefore recommend to make a marker-aware subclass Note however, that even then there will remain some rough edges. For example, the rcParams for markers are in the “lines” subgroup. |
Problem
Every time I have to change the marker, size or color of a scatter plot I have to google the parameters because they are impossible to remember. Why is it that:
`color` or `c` parameter, `marker`, while the shortened version `m` does not work, `s` works but not `size`!!? So confusing!
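For readers who want to reproduce the complaint, a minimal sketch follows. The data is made up, and the exact exception type and message for the unknown keywords may differ between Matplotlib versions, so the snippet catches both `AttributeError` and `TypeError`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x, y = [1, 2, 3], [4, 5, 6]
fig, ax = plt.subplots()

# Spellings reported as working: `c` or `color`, `s`, and `marker`.
ax.scatter(x, y, c="tab:red", s=40, marker="^")
ax.scatter(x, y, color="tab:blue", s=40, marker="^")

# Spellings reported as not working: `size` and `m`.
rejected = []
for name, value in (("size", 40), ("m", "^")):
    try:
        ax.scatter(x, y, **{name: value})
    except (AttributeError, TypeError) as exc:
        rejected.append(name)
        print(f"{name}= was rejected: {exc}")
```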
Proposed solution
Define parameters to customize the scatter plot more consistently.