[DOC]: improve consistency of plot types gallery #26328
Some minor suggestions.
My 2 cents: I find "tabular and linear data" quite odd and partly wrong. I don't think many of our users think in that category. This first section is on basic/fundamental/most common/simple plots. No more, no less. FWIW "Basic" is an ok description (and I cannot come up with something better). Note that plotly also uses the term "basic": https://plotly.com/python/ Related: IMHO this should focus on the plot type (either visual or semantic), not that much on the input data - even though certain input data types suggest certain visualizations.
But the input data structure determines the plot type... otherwise the structured and unstructured grid plots should all be together since they're all different types of heatmaps and contour plots, and quivers and barbs are basically specialized scatters... The reason the plots for tabular data structures get grouped together as "basic" is because those are the first plots folks tend to make/first type of data folks work w/. (ETA: Grammar of Graphics presupposes the table as the fundamental structure for viz data). I don't love that plotly calls it basic either... ETA: if we were actually gonna show just the most basic plot types, that's
I don't like basic because the current section headings are a mix of the semantic properties, computational properties, data, aesthetics and subjective assessment of the "fundamentalness" of the chart. I'm proposing here to focus 'em all on the data side; I can instead focus 'em all on the chart type side (and then I'd combine the two gridded data sections). I also really like the Python Graph Gallery's approach of just the pictures:
I conceptualize plot types by dimensionality, and that is the general division in the gallery.
where x, y, and z are 1-D arrays (though of course we often accept higher dimensions, but just as a convenience to repeat the plot N times) and Z is a 2-D array. There are 3-D analogs to many of those, but they should be in a separate subsection as they are now. Statistical plots are separate, and I don't think they will fit into any reasonable division, because they are generally composed of the fundamental plots but have a computational front end to them. Of course in the above, x and y can come from any sort of structure the user likes, be it a dictionary, xarray, pandas DataFrame. But from Matplotlib's point of view we only deal with array-like (lists, or numpy arrays or things that can convert to such).
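The dimensionality split described in this comment can be sketched in a few lines. This is only an illustration under assumptions: the dict `table` and the synthetic `Z` are made up for the example and are not part of the gallery.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# 1-D inputs: the container does not matter as long as it converts to array-like.
table = {"x": [0, 1, 2, 3], "y": [1.0, 0.5, 2.0, 1.5]}  # hypothetical data
x = np.asarray(table["x"])
y = np.asarray(table["y"])

# 2-D input: values sampled on a regular grid (3 rows, 4 columns).
Z = np.arange(12, dtype=float).reshape(3, 4)

fig, (ax_line, ax_grid) = plt.subplots(1, 2)
(line,) = ax_line.plot(x, y)   # pairwise (x, y) data
mesh = ax_grid.pcolormesh(Z)   # gridded Z data
```

A DataFrame column or an xarray variable could stand in for `table["x"]` here, since the plotting call only sees the converted array.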
The point I'm trying to make is that this is the implementation choice mpl made for plotting data with a tabular structure - rows of 0D (discrete) items or points/observations sampled on a 1D line, with each column/variable passed in separately. The arrays are 1D, but the data can be 0D or 1D and I really don't want to ID the section by data container shape even if that's the most technically correct.
Data shape is the most important constraint when plotting data - there is no point in trying to use |
Because data shape is being used as a proxy for the underlying topology (structure) of the data. The actual shape of the array (or that the array is the data container we're using to hold the data) is the wrong thing to focus on b/c the same

import matplotlib.pyplot as plt
import xarray as xr

# 'air.mon.mean.nc' is the dataset file used in the original comment
ds = xr.open_dataset('air.mon.mean.nc')
# one month over a lat/lon box: a 2D (lat, lon) map
Z = ds.sel(time='2021-01-01', lon=slice(360-126, 360-55), lat=slice(55, 20)).to_array().squeeze()
# two years over a smaller box, pivoted so rows are time and columns are (lat, lon) points
xy = ds.sel(time=slice('2020-01-01', '2022-01-01'), lon=slice(360-95, 360-55), lat=slice(45, 35)).to_dataframe()
xyt = xy.pivot_table(columns=['lat', 'lon'], index='time', values='air')
fig, axd = plt.subplot_mosaic([['map_map', 'map_time', 'map_time'], ['time_map', 'time_time', 'time_time']], figsize=(15, 5))
axd['map_map'].pcolormesh(Z[::-1], cmap='coolwarm')
axd['map_time'].plot(Z)
axd['time_map'].pcolormesh(xyt[::-1], cmap='coolwarm')
axd['time_time'].plot(xyt)
In the above you have dropped x information from the line plots. The fact it doesn't break is a convenience we allow, not fundamental to the plotting method.
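To make the "dropped x information" point concrete, here is a small sketch. The array `Z` and the coordinate vector `lon` are hypothetical stand-ins, not the dataset from the thread. When `plot` receives only a 2D array, it draws one line per column against the row index; passing the coordinate explicitly keeps it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

lon = np.array([230.0, 240.0, 250.0])                 # hypothetical coordinates
Z = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])    # shape (3, 2)

fig, (ax_implicit, ax_explicit) = plt.subplots(1, 2)
implicit = ax_implicit.plot(Z)        # x falls back to the row index 0, 1, 2
explicit = ax_explicit.plot(lon, Z)   # x is taken from lon

implicit_x = np.asarray(implicit[0].get_xdata(), dtype=float)
explicit_x = np.asarray(explicit[0].get_xdata(), dtype=float)
```

Both calls return one `Line2D` per column of `Z`; only the x values differ.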
I mean this in all earnestness - I'm so confused about how this conversation is going that I can't tell if I've been too in the weeds in this stuff and so I'm poorly explaining my baseline or if I'm dropping a lot of the middle or if I'm just being really unclear or what. We all agree that plot assumes input with 1D continuity and that imshow assumes 2D continuity - that is viz 101 stuff.
This is the point I'm trying to make - that I didn't actively drop anything but that the function implicitly dropped it due to the assumptions it makes about what an
Going one level back up, the reason I do not want to use array is b/c of term overload. Array in The image is wrong because the structure of the data stored in that |
And my point is that if you start with In your example, you have chosen to forget about the co-ordinates altogether, which is usually a very bad idea and leads to practical, and apparently conceptual, problems. |
plot and imshow got the exact same input
I passed in the exact same data to two functions. What I'd like to make clearer in this PR is what assumptions about the structure of their input these different families of functions are making, such that passing in the same data will cause it to be read differently depending on the plot type. Yes, like "line plot != heatmap", but what I'm getting at here is giving a bit more guidance on what the data structure looks like for each. The reason for that is that one of the goals of this page is to orient (especially new) users to the plots matplotlib can make & it's modeled on the Excel "chart types, here's what we recommend for what we see". The major reason for including the signatures is to highlight the assumed/implied structure of the data & I just want to push that one level up.
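A minimal sketch of the "same input, different reading" point, using a made-up 2x3 array rather than the data from the thread: `plot` treats each column as a separate 1D series, while `imshow` treats the whole array as a single 2D image.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

Z = np.arange(6.0).reshape(2, 3)   # hypothetical data: 2 rows, 3 columns

fig, (ax_plot, ax_img) = plt.subplots(1, 2)
lines = ax_plot.plot(Z)    # read as 3 series with 2 points each
image = ax_img.imshow(Z)   # read as one image with 2 rows and 3 columns

n_series = len(lines)
points_per_series = len(lines[0].get_ydata())
image_shape = image.get_array().shape
```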
@story645 At the risk of repeating myself,
The point is that pcolormesh(x, y, Z) retains the |
pie doesn't have a y but it's also not in this section..and going back to the very beginning of this thread, this x versus y is b/c of the underlying assumption of a tabular structure (x_i, y_i) for i in table index.
I don't think we're far apart here. What I'm trying to get at is basically Munzner's taxonomy of data types |
Switched to pairwise and moved tabular into framing. The stats plots are all distribution plots, which means it's the structure, but I kept stats in the title 'cause I figured otherwise I'd confuse folk. Switched coordinate x,y to i,j to differentiate it from data x,y.
I see the motivation to go by the type of the data. The section title texts cover the contents well.
I have a bit of a hard time with the longer explanations and parts of the "structural" literals.
For the literals, please adopt a consistent whitespace formatting.
The data descriptions are very subtle and it's difficult to convey precisely what we mean. I've tried to make suggestions to make it as clear as possible.
Gridded data: :math:`Z_{x, y}`
-----------------------------

Plotting for arrays of data ``Z(x, y)`` and fields ``U(x, y), V(x, y)``.
Plots of arrays and images :math:`Z_{x, y}` and fields :math:`U_{x, y}, V_{x, y}`
on `regular grids <https://en.wikipedia.org/wiki/Regular_grid>`_.
I suggest keeping the index notation :math:`Z_{i, j}` here. All this naming is a bit subtle but
- It's confusing to use Z_{x,y} both for regular and irregular grids - the i,j indices hint more at the regular structure.
I'll change it, but that's also sort of the point I'm trying to make about structure: that the irregular and regular gridded data are structurally the same (data on a continuous 2D surface), and the difference is in how that surface is broken up. That's why the plots look visually similar.
That the irregular and regular gridded data are structurally the same
You mean the visualizations look alike (and the data are semantically similar)? The data are structurally different.
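For regular grids, you have 2D structures $Z_{i,j}$ (and optionally $X_{i,j}$, $Y_{i,j}$). For irregular grids, you have 1D structures $x_i$, $y_i$ (and optionally $z_i$).
I know, it's a hassle to be precise here 😄.
Maybe actually putting all the indices helps?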
- Pairwise data: Plots of pairwise
- Gridded data: Plots of arrays and images
- Irregularly gridded data: Plots of unstructured coordinate grids
- 3D and volumetric data: Plots of three-dimensional
for regular grids, you have 2D structures $Z_{i,j}$ (and optionally $X_{i,j}$, $Y_{i,j}$). For irregular grids, you have 1D structures $x_i$, $y_i$ (and optionally $z_i$).
For regular and irregular grids, the data can be melted down to (x, y, z) -
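The two structures discussed here, and the "melting" from one to the other, can be sketched as follows. The grid sizes and the function used to fill `Z` are arbitrary assumptions for illustration. The 2D arrays go to `pcolormesh`, while the flattened 1D triples go to `tricontourf`, which builds its own triangulation from the points.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Regular grid: 2D structures X[i, j], Y[i, j], Z[i, j].
X, Y = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 4))
Z = np.sin(3 * X) + Y                        # shape (4, 5), made-up values

# Melted form: 1D structures x[i], y[i], z[i], one entry per grid point.
x, y, z = X.ravel(), Y.ravel(), Z.ravel()    # each of length 20

fig, (ax_regular, ax_melted) = plt.subplots(1, 2)
ax_regular.pcolormesh(X, Y, Z, shading="auto")  # uses the grid structure
ax_melted.tricontourf(x, y, z)                  # uses only the point triples
```

Going from the grid to triples loses nothing here because the values can be reshaped back; truly irregular data has no such `reshape`, which is the structural difference the thread is circling.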
Basically yes, they're structurally different because the topology of an irregular grid is by definition different from the topology of a regular grid, but the difference in those topologies is how a 2D surface is cut up.
So it's in the axiomatic construction of the topology rather than in the surface and this is all new to me so I can't say that w/ a ton of confidence...
I like the headers w/o math better so I will edit that when I get to a computer. Will see if the explicit index makes things more or less confusing.
I think removing the math from the header makes the explanations make more sense because now they're not tied to the model representation in the header, which I think resolves the ambiguity issue better than trying to use another layer of indexing to help clarify.
plot types Co-authored-by: Oscar Gustafsson <[email protected]> Co-authored-by: Tim Hoffmann <[email protected]>
This is a lot cleaner now. Thanks for the intense discussion!
Owee, I'm MrMeeseeks, Look at me. There seem to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon! Remember to remove the If these instructions are inaccurate, feel free to suggest an improvement. |
Thanks for the review! It was helpful not just here but in my general "wait how do I describe data?" struggles. |
There's a big conflict on gallery sorting order, so I won't backport this. |
Changed the plot type headings of the plot type gallery sections so that they'd hopefully be more informative and also generally more consistent in that for the most part they're describing the type of data assumed by each batch of functions. I also tried to have most of the section subheadings follow the same format of describing the assumptions of the structure of the data. Moved stats to be near the discrete/linear plots and unstructured next to gridded to keep the larger structure 0/1D, 2D, 3D.
I'm not completely feeling this in terms of general audience friendly (attn: @esibinga) but think this provides a more useful side panel/toc for navigation.