[DOC]: improve consistency of plot types gallery #26328

story645 · 2023-07-17T04:47:48Z

Changed the plot type headings of the plot type gallery sections so that they'd hopefully be more informative and also generally more consistent in that for the most part they're describing the type of data assumed by each batch of functions. I also tried to have most of the section subheadings follow the same format of describing the assumptions of the structure of the data. Moved stats to be near the discrete/linear plots and unstructured next to gridded to keep the larger structure 0/1D, 2D, 3D.

I'm not completely feeling this in terms of general audience friendly (attn: @esibinga) but think this provides a more useful side panel/toc for navigation.

oscargus

Some minor suggestions.

galleries/plot_types/3D/README.rst

galleries/plot_types/basic/README.rst

galleries/plot_types/stats/README.rst

timhoffm · 2023-07-19T16:18:39Z

My 2 cents: I find "tabular and linear data" quite odd and partly wrong. I don't think many of our users think in that category. This first section is on basic/fundamental/most common/simple plots. Not more not less. FWIW "Basic" is an ok description (and I cannot come up with something better). Note that plotly also uses the term "basic": https://plotly.com/python/

Related: IMHO this should focus on the plot type (either visual or semantic), not that much on the input data - even though certain input data types suggest certain visualizations.

story645 · 2023-07-19T17:21:43Z

Related: IMHO this should focus on the plot type (either visual or semantic), not that much on the input data

But the input data structure determines the plot type...otherwise the structured and unstructured grid plots should all be together since they're all different types of heatmaps, contour plots, and quivers and barbs are basically specialized scatters...

The reason the plots for tabular data structures get grouped together as "basic" is because those are the first plots folks tend to make/first type of data folks work w/. (ETA: Grammar of Graphics presupposes table as fundamental structure for viz data).

I don't love that plotly calls it basic either...

ETA: if we were actually gonna show just the most basic plot types, that's a bar chart, scatter chart, line chart, and heat map. Most everything else is kind of a derivative/specialization of those. (ETA: Semiology of Graphics, Chapter 2 is the fundamental properties of graphics and how to combine them and you basically get these 4).

story645 · 2023-07-19T17:46:55Z

I don't like basic because the current section headings are a mix of the semantic properties, computational properties, data, aesthetics and subjective assessment of the "fundamentalness" of the chart. I'm proposing here to focus 'em all to the data side, I can instead focus 'em all to the chart type side (and then I'd combine the two gridded data sections).

I also really like the python graph galleries approach of just the pictures:

jklymak · 2023-07-19T18:30:08Z

I conceptualize plot types by dimensionality, and that is the general division in the gallery.

x versus y plots (plot, scatter, bar, pie),
gridded x, y, Z plots (pcolor(mesh), imshow, contour, quiver (Z1, Z2)),
unstructured x, y, z plots (tricontour, tripcolor)

where, x, y, and z are 1-D arrays (though of course we often accept higher dimensions, but just as a convenience to repeat the plot N times) and Z is a 2-D array.
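As a rough illustration of the three categories listed above, here is a minimal sketch with made-up data. The variable names and the synthetic functions are assumptions for illustration only, not taken from the gallery.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# x versus y: two 1-D arrays of equal length
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)

# gridded x, y, Z: 1-D coordinates plus a 2-D array of values
gx = np.linspace(-2, 2, 30)
gy = np.linspace(-1, 1, 20)
Z = np.exp(-gx[np.newaxis, :] ** 2 - gy[:, np.newaxis] ** 2)  # shape (20, 30)

# unstructured x, y, z: three 1-D arrays, one entry per scattered point
ux = rng.uniform(-2, 2, 200)
uy = rng.uniform(-1, 1, 200)
uz = np.exp(-ux ** 2 - uy ** 2)

fig, (ax0, ax1, ax2) = plt.subplots(1, 3, figsize=(12, 3))
ax0.plot(x, y)                              # x versus y
ax1.pcolormesh(gx, gy, Z, shading="auto")   # gridded
ax2.tricontourf(ux, uy, uz)                 # unstructured
```

Only the gridded case needs a 2-D array; the other two take 1-D arrays throughout.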

There are 3-D analogs to many of those, but should be in a separate subsection as they are now.

Statistical plots are separate, and I don't think will fit in to any reasonable division, because they are generally composed of the fundamental plots but have a computational front end to them.

Of course in the above, x and y can come from any sort of structure the user likes, be it a dictionary, xarray, pandas DataFrame. But from Matplotlib's point of view we only deal with array-like (lists, or numpy arrays or things that can convert to such).

story645 · 2023-07-19T20:23:55Z

x versus y plots (plot, scatter, bar, pie),
where, x, y, and z are 1-D arrays (though of course we often accept higher

The point I'm trying to make is that this is the implementation choice mpl made for plotting data with a tabular structure - rows of 0D (discrete) items or points/observations sampled on a 1D line, with each column/variable passed in separately.

The arrays are 1D, but the data can be 0D or 1D and I really don't want to ID the section by data container shape even if that's the most technically correct.

jklymak · 2023-07-19T21:19:41Z

I really don't want to ID the section by data container shape even if that's the most technically correct.

Data shape is the most important constraint when plotting data - there is no point in trying to use contour on y vs x data, and if you have gridded Z vs x, y data, then you need to drop either x or y to use plot. ID-ing by data shape makes the most sense to me, but 🤷 ...

story645 · 2023-07-20T00:46:12Z

there is no point in trying to use contour on y vs x data, and if you have gridded Z vs x, y data, then you need to drop either x or y to use plot

Because data shape is being used as a proxy for the underlying topology (structure) of the data. The actual shape of the array (or that the array is the data container we're using to hold the data) is the wrong thing to focus on b/c the same $N \times M$ array can be passed into both plot and pcolormesh and neither will break:

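import matplotlib.pyplot as plt
import xarray as xr

# Dataset file from the original comment (not included here)
ds = xr.open_dataset('air.mon.mean.nc')
# One month over a lat/lon box: a 2D (lat x lon) map
Z = ds.sel(time='2021-01-01', lon=slice(360-126, 360-55), lat=slice(55, 20)).to_array().squeeze()
# Two years over a smaller box, pivoted into a 2D (time x location) table
xy = ds.sel(time=slice('2020-01-01', '2022-01-01'), lon=slice(360-95, 360-55), lat=slice(45, 35)).to_dataframe()
xyt = xy.pivot_table(columns=['lat', 'lon'], index='time', values='air')

fig, axd = plt.subplot_mosaic([['map_map', 'map_time', 'map_time'], ['time_map', 'time_time', 'time_time']], figsize=(15, 5))

# The map array, read as a grid and as a set of lines
axd['map_map'].pcolormesh(Z[::-1], cmap='coolwarm')
axd['map_time'].plot(Z)

# The time series table, read as a grid and as a set of lines
axd['time_map'].pcolormesh(xyt[::-1], cmap='coolwarm')
axd['time_time'].plot(xyt)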
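A self-contained variant of the snippet above might look like the following. It swaps the dataset for a small synthetic array, so the array contents and variable names are assumptions made purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

N, M = 6, 4
Z = np.arange(N * M, dtype=float).reshape(N, M)  # one N x M array

fig, (ax_grid, ax_lines) = plt.subplots(1, 2)
mesh = ax_grid.pcolormesh(Z)  # read as an N x M grid of values
lines = ax_lines.plot(Z)      # read as M series, each of length N

n_cells = mesh.get_array().size  # how many grid cells were coloured
n_lines = len(lines)             # how many separate lines were drawn
```

Neither call raises. The two functions just read the same container under different structural assumptions.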

jklymak · 2023-07-20T01:11:27Z

@story645

if you have gridded Z vs x, y data, then you need to drop either x or y to use plot.

In the above you have dropped x information from the line plots. The fact it doesn't break is a convenience we allow, not fundamental to the plotting method.

story645 · 2023-07-20T02:13:14Z

I mean this in all earnestness - I'm so confused about how this conversation is going that I can't tell if I've been too in the weeds in this stuff and so I'm poorly explaining my baseline or if I'm dropping a lot of middle or if I'm just being really unclear or what.

We all agree that plot assumes 1D continuity input and that imshow assumes 2D continuity - that's viz 101 stuff.

In the above you have dropped x information from the line plots.

This is the point I'm trying to make - that I didn't actively drop anything but that the function implicitly dropped it due to the assumptions it makes about what an $N\times M$ array holds:

pcolormesh assumes the $N\times M$ array is a grid of Z points
plot assumes the $N \times M$ array is a list of lists: $[y_{1} = (y_{11}, \cdots, y_{1n}),\ \cdots \ y_{n} = (y_{n1}, \cdots, y_{nn})]$

Going one level back up, the reason I do not want to use array is b/c of term overload. Array in matplotlib is our core data object/current data model and it's too easy to conflate data container with data structure. Basically, np.ndarray can hold either a list of time series or an image, and neither np.ndarray object will throw an error if passed into either plot or pcolormesh.

The image is wrong because the structure of the data stored in that np.ndarray doesn't match the assumptions of the function (plot assumes it's getting a list of ys and x is unimportant, pcolormesh assumes it's getting a grid), not because of the value in np.ndarray.shape. I figure since we're using shape as a proxy for structure, we'd be better off being more explicit by talking about structure directly.

jklymak · 2023-07-20T02:29:58Z

This is the point I'm trying to make - that I didn't actively drop anything but that the function implicitly dropped it due to the assumptions it makes about what an NxM array holds.

And my point is that if you start with x, y, and Z, and do plot(y, Z) you have actively dropped x. Your plot method no longer knows anything about x because it never saw x.

In your example, you have chosen to forget about the co-ordinates altogether, which is usually a very bad idea and leads to practical, and apparently conceptual, problems.

story645 · 2023-07-20T03:33:01Z

Your plot method no longer knows anything about x because it never saw x.

plot and imshow got the exact same input plot(Z) and imshow(Z) and it's an implementation choice which dimension plot treats as x and which it treats as distinct lines.

you have chosen to forget about the co-ordinates altogether

I passed in the exact same data to two functions. What I'd like to make clearer in this PR is what assumptions about the structure of their input these different families of functions are making such that passing in the same data will cause it to be read differently depending on the plot type.

Yes like "line plot != heatmap" but what I'm getting at here is giving a bit more guidance on what the data structure looks like for each. The reason for that is that one of the goals of this page is to orient (especially new) users to the plots Matplotlib can make & it's modeled on the Excel "chart types here's what we recommend for what we see". The major reason for including the signatures is to highlight the assumed/implied structure of the data & I just want to push that one level up.

jklymak · 2023-07-20T04:28:10Z

@story645 plot(Z) is a convenience for plot(np.arange(N), Z) and loses all co-ordinate information. (Indeed in your first line plot the x axis is flipped, and clearly does not represent degrees latitude).

At the risk of repeating myself,

x versus y plots (plot, scatter, bar, pie),

gridded x, y, Z plots (pcolor(mesh), imshow, contour, quiver (Z1, Z2)),

unstructured x, y, z plots (tricontour, tripcolor)

The point is that pcolormesh(x, y, Z) retains the x and y information. plot(y, Z) drops the x co-ordinate information. That is the fundamental difference between the first category and the second two - the first category of functions only have one coordinate (y in this case), the second and third categories have two coordinates (x and y).
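The implicit-index behaviour described above can be checked directly on the returned Line2D objects. This is a minimal sketch; the `lat` values and the random data are hypothetical stand-ins for a real coordinate and field.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

lat = np.linspace(20, 55, 8)  # hypothetical coordinate values
Z = np.random.default_rng(1).normal(size=(lat.size, 3))

fig, (ax_implicit, ax_explicit) = plt.subplots(1, 2)
implicit = ax_implicit.plot(Z)       # no coordinate passed: x falls back to the row index
explicit = ax_explicit.plot(lat, Z)  # coordinate passed: x carries the lat values

implicit_x = implicit[0].get_xdata()
explicit_x = explicit[0].get_xdata()
```

In the first call the x data is just 0 to N-1. In the second it is the coordinate array that was passed in.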

story645 · 2023-07-20T05:46:41Z

x versus y plots (plot, scatter, bar, pie)

pie doesn't have a y but it's also not in this section..and going back to the very beginning of this thread, this x versus y is b/c of the underlying assumption of a tabular structure (x_i, y_i) for i in table index.

x versus y plots (plot, scatter, bar, pie),

gridded x, y, Z plots (pcolor(mesh), imshow, contour, quiver (Z1, Z2)),

unstructured x, y, z plots (tricontour, tripcolor)

I don't think we're far apart here. What I'm trying to get at is basically Munzner's taxonomy of data types

story645 · 2023-08-01T03:39:38Z

Switched to pairwise and moved tabular into framing. The stats plots are all distribution plots, which means it's the structure, but I kept stats in the title cause I figured otherwise I'd confuse folk. Switched coordinate x,y to i,j to differentiate them from data x,y.

timhoffm

I see the motivation to go by the type of the data. The section title texts cover the contents well.

I have a bit of a hard time with the longer explanations and parts of the "structural" literals.

For the literals, please adopt a consistent whitespace formatting.

galleries/plot_types/basic/README.rst

galleries/plot_types/unstructured/README.rst

galleries/plot_types/stats/README.rst

galleries/plot_types/3D/README.rst

timhoffm

The data descriptions are very subtle and it's difficult to convey precisely what we mean. I've tried to make suggestions to make it as clear as possible.

galleries/plot_types/basic/README.rst

timhoffm · 2023-08-02T13:35:14Z

galleries/plot_types/arrays/README.rst

+Gridded data: :math:`Z_{x, y}`
+-----------------------------

-Plotting for arrays of data ``Z(x, y)`` and fields ``U(x, y), V(x, y)``.
+Plots of arrays and images :math:`Z_{x, y}` and fields :math:`U_{x, y}, V_{x, y}`
+on `regular grids <https://en.wikipedia.org/wiki/Regular_grid>`_.


I suggest to keep the index notation :math:`Z_{i, j}` here. All this naming is a bit subtle, but

It's confusing to use Z_{x,y} both for regular and irregular grids

the i,j indices hint more at the regular structure.

I'll change but that's also sort of the point I'm trying to make about structure: that the irregular and regular gridded data are structurally the same, i.e. data on a continuous 2D surface, and the difference is in how that surface is broken up. That's why the plots look visually similar.

That the irregular and regular gridded data are structurally the same

You mean the visualizations look alike (and the data are semantically similar)? The data are structurally different.

for regular grids, you have 2D structures $Z_{i,j}$ (and optionally $X_{i,j}$, $Y_{i,j}$). For irregular grids, you have 1D structures $x_i$, $y_i$ (and optionally $z_i$).

I know, it's a hassle to be precise here 😄.

Maybe actually putting all the indices helps?

Pairwise data

Plots of pairwise $(x_i, y_i)$, tabular $(v_{i,0}, \ldots, v_{i,n})$ and functional $f(x_i) = y_i$ data.

Gridded data

Plots of arrays and images $Z_{i,j}$ and fields $U_{i,j}$, $V_{i,j}$ on regular grids, optionally, with coordinates $X_{i,j}$, $Y_{i,j}$.

Irregularly gridded data

Plots of unstructured coordinate grids $(x_i, y_i)$ and data $z_i$ on such grids and 2D functions $f(x_i, y_i) = z_i$.

3D and volumetric data

Plots of three-dimensional $(x_i, y_i, z_i)$, surface $f(x_i, y_i) = z_i$, and volumetric $(x_i, y_i, z_i, v_i)$ data using the mpl_toolkits.mplot3d library.

for regular grids, you have 2D structures $Z_{i,j}$ (and optionally $X_{i,j}$, $Y_{i,j}$). For irregular grids, you have 1D structures $x_i$, $y_i$ (and optionally $z_i$).

For regular and irregular grids, the data can be melted down to (x, y, z) - $Z_{x,y}$ is used to denote that (x,y) is a point on a continuous surface and Z is a value at that point. Whether the surface is regularly or irregularly gridded determines the locations of (x,y). Here's a good break down of what I'm trying to get at https://www.bu.edu/tech/support/research/training-consulting/online-tutorials/introduction-to-scientific-visualization-tutorial/data-representation/
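To make the "melted down to (x, y, z)" idea concrete, here is a small sketch. The grid sizes and the test function are assumptions chosen only for illustration. It flattens a regular grid into per-point triples and plots both forms.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Regular grid: 2-D coordinate arrays X[i, j], Y[i, j] and values Z[i, j]
X, Y = np.meshgrid(np.linspace(-2, 2, 25), np.linspace(-1, 1, 15))
Z = np.sin(X) * np.cos(Y)

# "Melted" form: one (x, y, z) triple per grid node, all 1-D
x, y, z = X.ravel(), Y.ravel(), Z.ravel()

fig, (ax_grid, ax_tri) = plt.subplots(1, 2, figsize=(8, 3))
ax_grid.contourf(X, Y, Z)    # relies on the i, j neighbour structure of the grid
ax_tri.tricontourf(x, y, z)  # rebuilds connectivity by triangulating the points
```

The values are the same in both calls. What changes is whether the neighbour relationships come from the array layout or are reconstructed from the point locations.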

Basically yes, they're structurally different because the topology of an irregular grid is by definition different from the topology of a regular grid, but the difference in those topologies is how a 2D surface is cut up.

So it's in the axiomatic construction of the topology rather than in the surface and this is all new to me so I can't say that w/ a ton of confidence...

I like the headers w/o math better so I will edit that when I get to a computer. Will see if the explicit index makes things more or less confusing.

I think removing the math from the header makes the explanations make more sense because now they're not tied to the model representation in the header, which I think resolves the ambiguity issue better than trying to use another layer of indexing to help clarify.


timhoffm

This is a lot cleaner now. Thanks for the intense discussion!

lumberbot-app · 2023-08-03T03:26:55Z

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict, please backport manually. Here are approximate instructions:

Checkout backport branch and update it.

git checkout v3.7.x
git pull

Cherry pick the first parent branch of this PR on top of the older branch:

git cherry-pick -x -m1 54f16125df8338a1e3784bdd489cc0488ac6ffe9

You will likely have some merge/cherry-pick conflict here, fix them and commit:

git commit -am 'Backport PR #26328: [DOC]: improve consistency of plot types gallery'

Push to a named branch:

git push YOURFORK v3.7.x:auto-backport-of-pr-26328-on-v3.7.x

Create a PR against branch v3.7.x, I would have named this PR:

"Backport PR #26328 on branch v3.7.x ([DOC]: improve consistency of plot types gallery)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

story645 · 2023-08-03T03:27:37Z

Thanks for the review! It was helpful not just here but in my general "wait how do I describe data?" struggles.

QuLogic · 2023-08-19T05:19:54Z

There's a big conflict on gallery sorting order, so I won't backport this.

story645 added Documentation Documentation: plot types files in galleries/plot_types labels Jul 17, 2023

oscargus reviewed Jul 17, 2023


story645 mentioned this pull request Jul 17, 2023

moved doc root to landing page, make user landing a guide page #26332

Merged

story645 force-pushed the plot_types branch from bd79a5f to 6c052e6 Compare July 18, 2023 00:30

story645 force-pushed the plot_types branch from 6c052e6 to c5b9d06 Compare August 1, 2023 03:28

timhoffm reviewed Aug 1, 2023


timhoffm reviewed Aug 2, 2023


story645 force-pushed the plot_types branch from dacc321 to b345e57 Compare August 2, 2023 15:49

more consistent data structure oriented headers and explanations for plot types (ea51f3c)

Co-authored-by: Oscar Gustafsson <[email protected]> Co-authored-by: Tim Hoffmann <[email protected]>

story645 force-pushed the plot_types branch from b345e57 to ea51f3c Compare August 2, 2023 20:33

timhoffm approved these changes Aug 3, 2023


timhoffm added this to the v3.7.3 milestone Aug 3, 2023

timhoffm merged commit 54f1612 into matplotlib:main Aug 3, 2023

lumberbot-app bot added the Still Needs Manual Backport label Aug 3, 2023

QuLogic modified the milestones: v3.7.3, v3.8.0 Aug 19, 2023

QuLogic removed the Still Needs Manual Backport label Aug 19, 2023

story645 deleted the plot_types branch October 12, 2023 16:04

story645 mentioned this pull request May 17, 2024

[DOC] plot type heading consistency #28254

Merged
