

[DOC]: improve consistency of plot types gallery #26328



Merged: 1 commit merged on Aug 3, 2023


story645
Member

Changed the plot type headings of the plot types gallery sections so that they'd hopefully be more informative and more consistent: for the most part, they now describe the type of data assumed by each batch of functions. I also tried to have most of the section subheadings follow the same format of describing the assumed structure of the data. Moved stats next to the discrete/linear plots and unstructured next to gridded, to keep the larger 0/1D, 2D, 3D structure.

I'm not completely sold on this in terms of being friendly to a general audience (attn: @esibinga), but I think it provides a more useful side panel/TOC for navigation.

story645 added the Documentation and Documentation: plot types (files in galleries/plot_types) labels on Jul 17, 2023
Member

oscargus left a comment


Some minor suggestions.

@timhoffm
Member

My 2 cents: I find "tabular and linear data" quite odd and partly wrong. I don't think many of our users think in those categories. This first section is on basic/fundamental/most common/simple plots, no more, no less. FWIW, "Basic" is an OK description (and I cannot come up with something better). Note that plotly also uses the term "basic": https://plotly.com/python/

Related: IMHO, this should focus on the plot type (either visual or semantic), not so much on the input data, even though certain input data types suggest certain visualizations.

@story645
Member Author

story645 commented Jul 19, 2023

Related: IMHO this should focus on the plot type (either visual or semantic), not that much on the input data

But the input data structure determines the plot type. Otherwise, the structured and unstructured grid plots should all be grouped together, since they're all different types of heatmaps and contour plots, and quivers and barbs are basically specialized scatter plots.

The reason the plots for tabular data structures get grouped together as "basic" is that those are the first plots folks tend to make and the first type of data folks work with. (ETA: The Grammar of Graphics presupposes the table as the fundamental structure for visualization data.)

I don't love that plotly calls it basic either...

ETA: If we were actually going to show just the most basic plot types, that's a bar chart, scatter chart, line chart, and heat map. Most everything else is a derivative or specialization of those. (ETA: Chapter 2 of Semiology of Graphics covers the fundamental properties of graphics and how to combine them, and you basically get these four.)
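A minimal sketch of those four chart types as Matplotlib calls. The data values, random seed, and figure layout are illustrative assumptions, not anything taken from the gallery:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # illustrative random data

fig, axs = plt.subplots(1, 4, figsize=(12, 3))

# Bar chart: one value per discrete category
bars = axs[0].bar(["a", "b", "c"], [3, 5, 2])

# Scatter chart: unconnected (x_i, y_i) pairs
points = axs[1].scatter(rng.random(20), rng.random(20))

# Line chart: y sampled along a continuous x
x = np.linspace(0, 2 * np.pi, 50)
(line,) = axs[2].plot(x, np.sin(x))

# Heat map: a value Z[i, j] for every cell of a regular grid
image = axs[3].imshow(rng.random((5, 8)))
```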

@story645
Member Author

story645 commented Jul 19, 2023

I don't like "basic" because the current section headings are a mix of semantic properties, computational properties, data, aesthetics, and a subjective assessment of the "fundamentalness" of the chart. I'm proposing here to focus them all on the data side; alternatively, I can focus them all on the chart-type side (and then I'd combine the two gridded data sections).

I also really like the Python Graph Gallery's approach of just showing the pictures:
image

@jklymak
Member

jklymak commented Jul 19, 2023

I conceptualize plot types by dimensionality, and that is the general division in the gallery.

  • x versus y plots (plot, scatter, bar, pie),
  • gridded x, y, Z plots (pcolor(mesh), imshow, contour, quiver (Z1, Z2)),
  • unstructured x, y, z plots (tricontour, tripcolor)

where x, y, and z are 1-D arrays (we often accept higher dimensions, but only as a convenience to repeat the plot N times) and Z is a 2-D array.
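The three categories can be sketched with synthetic arrays to show the shapes each family expects. The Gaussian bump and the grid sizes are arbitrary choices for illustration:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))

# 1. x versus y: two 1-D arrays of the same length
x = np.linspace(0, 10, 100)
y = np.sin(x)
ax1.plot(x, y)

# 2. gridded x, y, Z: 1-D coordinates and a 2-D Z of shape (len(yg), len(xg))
xg = np.linspace(-2, 2, 40)
yg = np.linspace(-1, 1, 30)
Z = np.exp(-(xg[np.newaxis, :] ** 2 + yg[:, np.newaxis] ** 2))
mesh = ax2.pcolormesh(xg, yg, Z, shading="auto")

# 3. unstructured x, y, z: three 1-D arrays of scattered points
xu = rng.uniform(-2, 2, 200)
yu = rng.uniform(-1, 1, 200)
zu = np.exp(-(xu ** 2 + yu ** 2))
tri = ax3.tripcolor(xu, yu, zu)  # Matplotlib triangulates the points itself
```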

There are 3-D analogs to many of those, but they should be in a separate subsection, as they are now.

Statistical plots are separate, and I don't think they fit into any reasonable division, because they are generally composed of the fundamental plots but have a computational front end.
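One way to see the "computational front end" point is to split `hist` into a binning step (`np.histogram`) followed by a plain `stairs` plot. This is a sketch with random sample data; it does not claim that `hist` draws with `stairs`, only that the counts and bin edges agree:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)  # illustrative sample

fig, (ax1, ax2) = plt.subplots(1, 2)

# Statistical plot: hist bins the data and draws the result in one call
counts, edges, _ = ax1.hist(data, bins=20)

# The same result split into its two parts:
# a computational front end (binning) ...
counts2, edges2 = np.histogram(data, bins=20)
# ... followed by a fundamental plot (a step outline over the bin edges)
ax2.stairs(counts2, edges2)
```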

Of course, in the above, x and y can come from any sort of structure the user likes, be it a dictionary, an xarray object, or a pandas DataFrame. But from Matplotlib's point of view, we only deal with array-likes (lists, NumPy arrays, or anything that can be converted to one).
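A short sketch of the same hypothetical data held in a list, a NumPy array, and a dict. With the dict, Matplotlib's `data=` keyword looks up the string arguments as keys; the key names `day` and `temp` are made up for illustration. In every case the resulting line holds the same numbers:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical example data held in three different containers
as_list = ([1, 2, 3, 4], [10, 12, 9, 14])
as_array = (np.array([1, 2, 3, 4]), np.array([10, 12, 9, 14]))
as_dict = {"day": [1, 2, 3, 4], "temp": [10, 12, 9, 14]}

fig, ax = plt.subplots()
(l1,) = ax.plot(*as_list)                     # plain Python lists
(l2,) = ax.plot(*as_array)                    # NumPy arrays
(l3,) = ax.plot("day", "temp", data=as_dict)  # keys looked up in the mapping
```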

@story645
Member Author

x versus y plots (plot, scatter, bar, pie),
where, x, y, and z are 1-D arrays (though of course we often accept higher

The point I'm trying to make is that this is the implementation choice Matplotlib made for plotting data with a tabular structure: rows of 0D (discrete) items, or of points/observations sampled on a 1D line, with each column/variable passed in separately.
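A sketch of that tabular reading, using a small hypothetical table (the columns and values are made up): the rows are split into columns, and each column is passed in as its own 1-D sequence:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical table: each row is one observation (city, month, rainfall)
rows = [
    ("A", 1, 30.0),
    ("A", 2, 42.5),
    ("B", 1, 18.0),
    ("B", 2, 25.5),
]

# Matplotlib does not take the table itself; the columns are pulled out
cities, months, rain = zip(*rows)

fig, (ax1, ax2) = plt.subplots(1, 2)
# 0-D (discrete) items: one bar per row, labelled by the categorical columns
bars = ax1.bar([f"{c}-{m}" for c, m in zip(cities, months)], rain)
# Observations sampled along a 1-D line: one line per city over the months
for city in sorted(set(cities)):
    sel = [(m, r) for c, m, r in rows if c == city]
    ax2.plot([m for m, _ in sel], [r for _, r in sel], label=city)
```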

The arrays are 1D, but the data can be 0D or 1D and I really don't want to ID the section by data container shape even if that's the most technically correct.

@jklymak
Copy link
Member

jklymak commented Jul 19, 2023

I really don't want to ID the section by data container shape even if that's the most technically correct.

Data shape is the most important constraint when plotting data - there is no point in trying to use contour on y vs x data, and if you have gridded Z vs x, y data, then you need to drop either x or y to use plot. ID-ing by data shape makes the most sense to me, but 🤷 ...

@story645
Member Author

story645 commented Jul 20, 2023

there is no point in trying to use contour on y vs x data, and if you have gridded Z vs x, y data, then you need to drop either x or y to use plot

That's because data shape is being used as a proxy for the underlying topology (structure) of the data. The actual shape of the array (or the fact that an array is the container we're using to hold the data) is the wrong thing to focus on, because the same $N \times M$ array can be passed into plot and, here, pcolormesh, and neither will break:

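import matplotlib.pyplot as plt
import xarray as xr

# Assumes a local copy of the monthly-mean air temperature file (not included here)
ds = xr.open_dataset('air.mon.mean.nc')
# Map: a single month on a lat/lon grid
Z = ds.sel(time='2021-01-01', lon=slice(360-126, 360-55), lat=slice(55, 20)).to_array().squeeze()
# Time series: two years at a smaller set of grid points, one column per (lat, lon)
xy = ds.sel(time=slice('2020-01-01', '2022-01-01'), lon=slice(360-95, 360-55), lat=slice(45, 35)).to_dataframe()
xyt = xy.pivot_table(columns=['lat', 'lon'], index='time', values='air')

fig, axd = plt.subplot_mosaic([['map_map', 'map_time', 'map_time'],
                               ['time_map', 'time_time', 'time_time']], figsize=(15, 5))

# The same N x M arrays go to pcolormesh (left column) and plot (right column)
axd['map_map'].pcolormesh(Z[::-1], cmap='coolwarm')
axd['map_time'].plot(Z)

axd['time_map'].pcolormesh(xyt[::-1], cmap='coolwarm')
axd['time_time'].plot(xyt)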

[Image: the resulting figure, with the pcolormesh panels on the left and the line plots on the right]

@jklymak
Member

jklymak commented Jul 20, 2023

@story645

if you have gridded Z vs x, y data, then you need to drop either x or y to use plot.

In the above, you have dropped the x information from the line plots. The fact that it doesn't break is a convenience we allow, not something fundamental to the plotting method.

@story645
Member Author

story645 commented Jul 20, 2023

I mean this in all earnestness: I'm so confused about how this conversation is going that I can't tell whether I've been too in the weeds on this stuff and am poorly explaining my baseline, whether I'm skipping a lot of the middle, or whether I'm just being really unclear.

We all agree that plot assumes input with 1D continuity and that imshow assumes 2D continuity; that's viz 101 stuff.

In the above you have dropped x information from the line plots.

This is the point I'm trying to make: I didn't actively drop anything; the function implicitly dropped it because of the assumptions it makes about what an $N\times M$ array holds:

  • pcolormesh assumes the $N\times M$ array is a grid of Z points
  • plot assumes the $N \times M$ array is a list of $M$ lines, one per column, each holding $N$ y-values: $[y_{1} = (y_{11}, \ldots, y_{N1}),\ \ldots,\ y_{M} = (y_{1M}, \ldots, y_{NM})]$
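A small synthetic check of those two readings (the values and the sizes `N` and `M` are arbitrary). Per Matplotlib's documented behavior, `plot` with a 2-D argument draws one line per column against `x = 0..N-1`, while `pcolormesh` draws an N by M grid of cells:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

N, M = 4, 3
Z = np.arange(N * M, dtype=float).reshape(N, M)  # one N x M array

fig, (ax1, ax2) = plt.subplots(1, 2)
lines = ax1.plot(Z)       # read as M separate lines, one per column
mesh = ax2.pcolormesh(Z)  # read as an N x M grid of cells
```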

Going one level back up, the reason I do not want to use "array" is term overloading. The array is Matplotlib's core data object/current data model, and it's too easy to conflate data container with data structure. Basically, an np.ndarray can hold either a list of time series or an image, and neither ndarray will throw an error if passed into either plot or pcolormesh.

The image is wrong because the structure of the data stored in that ndarray doesn't match the assumptions of the function (plot assumes it's getting a list of ys and that x is unimportant; pcolormesh assumes it's getting a grid), not because of the value of ndarray.shape. Since we're using shape as a proxy for structure, I figure we'd be better off being explicit and talking about structure directly.

@jklymak
Member

jklymak commented Jul 20, 2023

This is the point I'm trying to make - that I didn't actively drop anything but that the function implicitly dropped it due to the assumptions it makes about what an NxM array holds.

And my point is that if you start with x, y, and Z, and do plot(y, Z) you have actively dropped x. Your plot method no longer knows anything about x because it never saw x.

In your example, you have chosen to forget about the co-ordinates altogether, which is usually a very bad idea and leads to practical, and apparently conceptual, problems.

@story645
Member Author

Your plot method no longer knows anything about x because it never saw x.

plot and imshow got the exact same input, plot(Z) and imshow(Z), and it's an implementation choice which dimension plot treats as x and which it treats as distinct lines.

you have chosen to forget about the co-ordinates altogether

I passed in the exact same data to two functions. What I'd like to make clearer in this PR is what assumptions about the structure of their input these different families of functions are making such that passing in the same data will cause it to be read differently depending on the plot type.

Yes, "line plot != heatmap", but what I'm getting at here is giving a bit more guidance on what the data structure looks like for each. The reason is that one of the goals of this page is to orient (especially new) users to the plots Matplotlib can make, and it's modeled on Excel's "chart types: here's what we recommend for what we see". The major reason for including the signatures is to highlight the assumed/implied structure of the data, and I just want to push that one level up.

@jklymak
Copy link
Member

jklymak commented Jul 20, 2023

@story645 plot(Z) is a convenience for plot(np.arange(N), Z) and loses all co-ordinate information. (Indeed, in your first line plot the x axis is flipped and clearly does not represent degrees latitude.)

At the risk of repeating myself,

  • x versus y plots (plot, scatter, bar, pie),
  • gridded x, y, Z plots (pcolor(mesh), imshow, contour, quiver (Z1, Z2)),
  • unstructured x, y, z plots (tricontour, tripcolor)

The point is that pcolormesh(x, y, Z) retains the x and y information. plot(y, Z) drops the x co-ordinate information. That is the fundamental difference between the first category and the second two - the first category of functions only have one coordinate (y in this case), the second and third categories have two coordinates (x and y).
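The difference can be sketched with hypothetical lon/lat-like coordinates `x` and `y`; the numbers only stand in for the air temperature example. `plot(Z)` matches `plot(np.arange(N), Z)`, `plot(y, Z)` never sees `x`, and `pcolormesh(x, y, Z)` places the cells at the given coordinates:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical coordinates: Z has one row per y value and one column per x value
x = np.linspace(-120.0, -60.0, 5)  # e.g. longitudes, spacing 15
y = np.linspace(20.0, 50.0, 4)     # e.g. latitudes
Z = np.add.outer(y, x)             # Z[i, j] = y[i] + x[j], shape (4, 5)

fig, axs = plt.subplots(1, 4, figsize=(14, 3))
implicit = axs[0].plot(Z)                      # x replaced by the row index 0..N-1
explicit = axs[1].plot(np.arange(len(y)), Z)   # what plot(Z) is shorthand for
dropped = axs[2].plot(y, Z)                    # y is the abscissa; x is never passed in
axs[3].pcolormesh(x, y, Z, shading="nearest")  # both coordinates are kept
```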

@story645
Member Author

story645 commented Jul 20, 2023

x versus y plots (plot, scatter, bar, pie)

pie doesn't have a y, but it's also not in this section. And going back to the very beginning of this thread, this x-versus-y framing comes from the underlying assumption of a tabular structure, (x_i, y_i) for i in the table index.

  • x versus y plots (plot, scatter, bar, pie),
  • gridded x, y, Z plots (pcolor(mesh), imshow, contour, quiver (Z1, Z2)),
  • unstructured x, y, z plots (tricontour, tripcolor)

I don't think we're far apart here. What I'm trying to get at is basically Munzner's taxonomy of data types
[Screenshot: Munzner's taxonomy of data types]

@story645
Member Author

story645 commented Aug 1, 2023

Switched to "pairwise" and moved "tabular" into the framing text. The stats plots are all distribution plots, which means that's their structure, but I kept "stats" in the title because I figured otherwise I'd confuse folks. Switched the coordinates x, y to i, j to differentiate them from the data x, y.

Member

timhoffm left a comment


I see the motivation to go by the type of the data. The section title texts cover the contents well.

I have a bit of a hard time with the longer explanations and parts of the "structural" literals.

For the literals, please adopt a consistent whitespace formatting.

Member

timhoffm left a comment


The data descriptions are very subtle and it's difficult to convey precisely what we mean. I've tried to make suggestions to make it as clear as possible.

Comment on lines 3 to 7
Gridded data: :math:`Z_{x, y}`
-----------------------------

Plotting for arrays of data ``Z(x, y)`` and fields ``U(x, y), V(x, y)``.
Plots of arrays and images :math:`Z_{x, y}` and fields :math:`U_{x, y}, V_{x, y}`
on `regular grids <https://en.wikipedia.org/wiki/Regular_grid>`_.
Member


I suggest keeping the index notation :math:`Z_{i, j}` here. All this naming is a bit subtle, but

  1. It's confusing to use Z_{x,y} for both regular and irregular grids.
  2. The i, j indices hint more at the regular structure.

Member Author


I'll change it, but that's also sort of the point I'm trying to make about structure: the irregular and regular gridded data are structurally the same (data on a continuous 2D surface), and the difference is in how that surface is broken up. That's why the plots look visually similar.

Member


That the irregular and regular gridded data are structurally the same

You mean the visualizations look alike (and the data are semantically similar)? The data are structurally different.

For regular grids, you have 2D structures $Z_{i,j}$ (and optionally $X_{i,j}$, $Y_{i,j}$). For irregular grids, you have 1D structures $x_i$, $y_i$ (and optionally $z_i$).

I know, it's a hassle to be precise here 😄.
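The structural difference described here can be sketched with array shapes. The data are synthetic, and `meshgrid` and `Triangulation` are just one way to build each structure:

```python
import numpy as np
from matplotlib.tri import Triangulation

# Regular grid: 2-D structures indexed by (i, j)
x1d = np.linspace(0.0, 1.0, 6)
y1d = np.linspace(0.0, 1.0, 4)
X, Y = np.meshgrid(x1d, y1d)  # X[i, j] and Y[i, j], each of shape (4, 6)
Z = np.sin(X) * np.cos(Y)     # Z[i, j], same shape; neighbours follow from i, j

# Irregular grid: 1-D structures indexed by i
rng = np.random.default_rng(0)
x = rng.random(24)            # x_i
y = rng.random(24)            # y_i
z = np.sin(x) * np.cos(y)     # z_i
# Connectivity is not implied by the indexing, so it has to be computed
tri = Triangulation(x, y)
```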


Maybe actually putting in all the indices helps?

Pairwise data

Plots of pairwise $(x_i, y_i)$, tabular $(v_{i,0}, \ldots, v_{i,n})$ and functional $f(x_i) = y_i$ data.

Gridded data

Plots of arrays and images $Z_{i,j}$ and fields $U_{i,j}$, $V_{i,j}$ on regular grids, optionally, with coordinates $X_{i,j}$, $Y_{i,j}$.

Irregularly gridded data

Plots of unstructured coordinate grids $(x_i, y_i)$ and data $z_i$ on such grids and 2D functions $f(x_i, y_i) = z_i$.

3D and volumetric data

Plots of three-dimensional $(x_i, y_i, z_i)$, surface $f(x_i, y_i) = z_i$, and volumetric $(x_i, y_i, z_i, v_i)$ data using the mpl_toolkits.mplot3d library.
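For the volumetric case $(x_i, y_i, z_i, v_i)$, a minimal sketch with random illustrative data is a 3-D scatter whose color encodes $v_i$:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.random((3, 50))     # 50 points in 3-D space
v = np.sqrt(x**2 + y**2 + z**2)   # an illustrative value at each point

fig = plt.figure()
ax = fig.add_subplot(projection="3d")  # 3-D axes provided by mpl_toolkits.mplot3d
pts = ax.scatter(x, y, z, c=v)         # position (x_i, y_i, z_i), color v_i
```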

Member Author


for regular grids, you have 2D structures $Z_{i,j}$ (and optionally $X_{i,j}$, $Y_{i,j}$). For irregular grids, you have 1D structures $x_i$, $y_i$ (and optionally $z_i$).

For both regular and irregular grids, the data can be melted down to (x, y, z); $Z_{x,y}$ denotes that (x, y) is a point on a continuous surface and Z is the value at that point. Whether the surface is regularly or irregularly gridded determines the locations of (x, y). Here's a good breakdown of what I'm trying to get at: https://www.bu.edu/tech/support/research/training-consulting/online-tutorials/introduction-to-scientific-visualization-tutorial/data-representation/
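A sketch of the "melting" idea with a synthetic grid: the same values are plotted once through the $(i, j)$ grid structure and once as a flat list of $(x, y, z)$ triples that Matplotlib triangulates. This only illustrates that both renderings describe the same surface; it is not an argument about the topology:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

X, Y = np.meshgrid(np.linspace(0, 1, 6), np.linspace(0, 1, 4))
Z = np.sin(3 * X) * np.cos(3 * Y)

# "Melt" the grid into (x, y, z) triples: one entry per grid point
x, y, z = X.ravel(), Y.ravel(), Z.ravel()

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.pcolormesh(X, Y, Z, shading="gouraud")  # uses the (i, j) grid structure
ax2.tripcolor(x, y, z, shading="gouraud")   # triangulates the melted points
```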

Member Author

story645 commented Aug 2, 2023


Basically yes, they're structurally different because the topology of an irregular grid is by definition different from the topology of a regular grid, but the difference between those topologies lies in how a 2D surface is cut up.

So the difference is in the axiomatic construction of the topology rather than in the surface, but this is all new to me, so I can't say that with a ton of confidence.

Member Author


I like the headers without math better, so I will edit that when I get to a computer. Will see if the explicit indices make things more or less confusing.

Member Author


I think removing the math from the headers makes the explanations clearer, because they're no longer tied to the model representation in the header. I think that resolves the ambiguity better than adding another layer of indexing to clarify things.

plot types

Co-authored-by: Oscar Gustafsson <[email protected]>
Co-authored-by: Tim Hoffmann <[email protected]>
Member

timhoffm left a comment


This is a lot cleaner now. Thanks for the intense discussion!

timhoffm added this to the v3.7.3 milestone on Aug 3, 2023
timhoffm merged commit 54f1612 into matplotlib:main on Aug 3, 2023
@lumberbot-app

lumberbot-app bot commented Aug 3, 2023

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it:
       git checkout v3.7.x
       git pull
  2. Cherry-pick the first parent branch of this PR on top of the older branch:
       git cherry-pick -x -m1 54f16125df8338a1e3784bdd489cc0488ac6ffe9
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
       git commit -am 'Backport PR #26328: [DOC]: improve consistency of plot types gallery'
  4. Push to a named branch:
       git push YOURFORK v3.7.x:auto-backport-of-pr-26328-on-v3.7.x
  5. Create a PR against branch v3.7.x. I would have named this PR:

"Backport PR #26328 on branch v3.7.x ([DOC]: improve consistency of plot types gallery)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

@story645
Member Author

story645 commented Aug 3, 2023

Thanks for the review! It was helpful not just here but in my general "wait how do I describe data?" struggles.

@QuLogic
Member

QuLogic commented Aug 19, 2023

There's a big conflict on gallery sorting order, so I won't backport this.

QuLogic modified the milestones: v3.7.3, v3.8.0 on Aug 19, 2023
story645 deleted the plot_types branch on October 12, 2023 16:04
6 participants