Commit b1b2d7c

wide form docs draft
1 parent 122b3ca commit b1b2d7c

2 files changed

+264
-4
lines changed

doc/python/px-arguments.md

Lines changed: 4 additions & 4 deletions
@@ -22,7 +22,7 @@ jupyter:
2222
pygments_lexer: ipython3
2323
version: 3.7.7
2424
plotly:
25-
description: Arguments accepted by Plotly Express functions
25+
description: Input data arguments accepted by Plotly Express functions
2626
display_as: file_settings
2727
language: python
2828
layout: base
@@ -43,7 +43,7 @@ Plotly Express provides functions to visualize a variety of types of data. Most
4343

4444
*Until version 4.8, Plotly Express only operated on long-form (previously called "tidy") data, but [now accepts wide-form and mixed-form data](/python/wide-form/) as well.*
4545

46-
There are three common conventions for storing data in a data frame:
46+
There are three common conventions for storing column-oriented data, usually in a data frame with column names:
4747

4848
* **long-form data** is suitable for storing multivariate data (i.e. dimensions greater than 2), with one row per observation, and one column per variable.
4949
* **wide-form data** is suitable for storing 2-dimensional data, with one row per value of the first variable, and one column per value of the second variable.
@@ -83,14 +83,14 @@ fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], title="Wide-Fo
8383
fig.show()
8484
```
8585

86-
You might notice that y-axis and legend labels are slightly different for the second plot: they are "value" and "variable", respectively. This is because Plotly Express performed an [internal Pandas `melt()` operation](https://pandas.pydata.org/docs/reference/api/pandas.melt.html) to convert the wide-form data into long-form for plotting, and used the Pandas convention for assign column names to the intermediate long-form data. Note that the labels "medal" and "count" do not appear in the wide-form data frame, so in this case, you must supply these yourself, or [you can use a data frame with named row- and column-indexes](/python/wide-form/). You can [rename these labels with the `labels` argument](/python/styling-plotly-express/):
86+
You might notice that the y-axis and legend labels are slightly different for the second plot: they are "value" and "variable", respectively, and this is also reflected in the hover label text. This is because Plotly Express performed an [internal Pandas `melt()` operation](https://pandas.pydata.org/docs/reference/api/pandas.melt.html) to convert the wide-form data into long-form for plotting, and used the Pandas convention for assigning column names to the intermediate long-form data. Note that the labels "medal" and "count" do not appear in the wide-form data frame, so in this case you must supply them yourself, or [you can use a data frame with named row- and column-indexes](/python/wide-form/). You can [rename these labels with the `labels` argument](/python/styling-plotly-express/):
8787

8888
```python
8989
import plotly.express as px
9090
wide_df = px.data.short_track_wide()
9191

9292
fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], title="Wide-Form Input, relabelled",
93-
labels={"value": "count", "variable": "medals"})
93+
labels={"value": "count", "variable": "medal"})
9494
fig.show()
9595
```
9696

doc/python/wide-form.md

Lines changed: 260 additions & 0 deletions
@@ -0,0 +1,260 @@
1+
---
2+
jupyter:
3+
jupytext:
4+
notebook_metadata_filter: all
5+
text_representation:
6+
extension: .md
7+
format_name: markdown
8+
format_version: '1.2'
9+
jupytext_version: 1.4.2
10+
kernelspec:
11+
display_name: Python 3
12+
language: python
13+
name: python3
14+
language_info:
15+
codemirror_mode:
16+
name: ipython
17+
version: 3
18+
file_extension: .py
19+
mimetype: text/x-python
20+
name: python
21+
nbconvert_exporter: python
22+
pygments_lexer: ipython3
23+
version: 3.7.7
24+
plotly:
25+
description: Plotly Express' 2D-Cartesian functions accept data in long-, wide-,
26+
and mixed-form.
27+
display_as: file_settings
28+
language: python
29+
layout: base
30+
name: Plotly Express Wide-Form Support
31+
order: 33
32+
page_type: u-guide
33+
permalink: python/wide-form/
34+
thumbnail: thumbnail/plotly-express.png
35+
---
36+
37+
### Column-oriented, Matrix or Geographic Data
38+
39+
Plotly Express provides functions to visualize a variety of types of data. Most functions such as `px.bar` or `px.scatter` expect to operate on column-oriented data of the type you might store in a Pandas `DataFrame` (in either "long" or "wide" format, see below). [`px.imshow` operates on matrix-like data](/python/imshow/) you might store in a `numpy` or `xarray` array and functions like [`px.choropleth` and `px.choropleth_mapbox` can operate on geographic data](/python/maps/) of the kind you might store in a GeoPandas `GeoDataFrame`. This page details how to provide a specific form of column-oriented data to 2D-Cartesian Plotly Express functions, but you can also check out our [detailed column-input-format documentation](/python/px-arguments/).
40+
41+
### Long-, Wide-, and Mixed-Form Data
42+
43+
*Until version 4.8, Plotly Express only operated on long-form (previously called "tidy") data, but now accepts wide-form and mixed-form data as well.*
44+
45+
There are three common conventions for storing column-oriented data, usually in a data frame with column names:
46+
47+
* **long-form data** is suitable for storing multivariate data (i.e. dimensions greater than 2), with one row per observation, and one column per variable.
48+
* **wide-form data** is suitable for storing 2-dimensional data, with one row per value of the first variable, and one column per value of the second variable.
49+
* **mixed-form data** is a hybrid of long-form and wide-form data, with one row per value of one variable, some columns representing values of another variable, and some columns representing further variables.
50+
51+
All Plotly Express functions can operate on long-form data, and the following 2D-Cartesian functions can operate on wide-form data as well: `px.scatter`, `px.line`, `px.area`, `px.bar`, `px.histogram`, `px.violin`, `px.box`, `px.strip`, `px.funnel`, `px.density_heatmap` and `px.density_contour`.
52+
53+
By way of example, here is the same data, represented first in long-form and then in wide-form:
54+
55+
```python
56+
import plotly.express as px
57+
long_df = px.data.short_track_long()
58+
long_df
59+
```
60+
61+
```python
62+
import plotly.express as px
63+
wide_df = px.data.short_track_wide()
64+
wide_df
65+
```
66+
67+
Plotly Express can produce the same plot from either form:
68+
69+
```python
70+
import plotly.express as px
71+
long_df = px.data.short_track_long()
72+
73+
fig = px.bar(long_df, x="nation", y="count", color="medal", title="Long-Form Input")
74+
fig.show()
75+
```
76+
77+
```python
78+
import plotly.express as px
79+
wide_df = px.data.short_track_wide()
80+
81+
fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], title="Wide-Form Input")
82+
fig.show()
83+
```
84+
85+
### Labeling axes, legends and hover text
86+
87+
You might notice that the y-axis and legend labels are slightly different for the second plot: they are "value" and "variable", respectively, and this is also reflected in the hover label text. This is because Plotly Express performed an [internal Pandas `melt()` operation](https://pandas.pydata.org/docs/reference/api/pandas.melt.html) to convert the wide-form data into long-form for plotting, and used the Pandas convention for assigning column names to the intermediate long-form data. Note that the labels "medal" and "count" do not appear in the wide-form data frame, so in this case you must supply them yourself (or see below regarding using a data frame with named row- and column-indexes). You can [rename these labels with the `labels` argument](/python/styling-plotly-express/):
88+
89+
```python
90+
import plotly.express as px
91+
wide_df = px.data.short_track_wide()
92+
93+
fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], title="Wide-Form Input, relabelled",
94+
labels={"value": "count", "variable": "medal"})
95+
fig.show()
96+
```
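
To see where the "variable" and "value" names come from, here is a minimal sketch of the Pandas `melt()` convention, using a small hand-typed data frame as a stand-in for the sample data (the values are illustrative only). It shows the Pandas default naming, not the exact internal call that Plotly Express makes:

```python
import pandas as pd

# Hand-typed stand-in for a wide-form medals table (illustrative values only).
wide_df = pd.DataFrame({
    "nation": ["South Korea", "China", "Canada"],
    "gold": [24, 10, 9],
    "silver": [13, 15, 12],
    "bronze": [11, 8, 12],
})

# Keep "nation" as an identifier column and unpivot the three medal columns.
# With no var_name/value_name given, Pandas falls back to "variable" and "value".
long_df = wide_df.melt(id_vars="nation", value_vars=["gold", "silver", "bronze"])

print(long_df.columns.tolist())  # ['nation', 'variable', 'value']
print(len(long_df))              # 9 rows: 3 nations x 3 medal types
```

These default column names are the ones that the `labels` argument remaps in the example above.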
97+
98+
Plotly Express figures created using wide-form data can be [styled just like any other Plotly Express figure](/python/styling-plotly-express/):
99+
100+
```python
101+
import plotly.express as px
102+
wide_df = px.data.short_track_wide()
103+
104+
fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"],
105+
title="Wide-Form Input, styled",
106+
labels={"value": "Medal Count", "variable": "Medal", "nation": "Olympic Nation"},
107+
color_discrete_map={"gold":"gold", "silver": "silver", "bronze": "#c96"},
108+
template="simple_white"
109+
)
110+
fig.update_layout(font_family="Rockwell", showlegend=False)
111+
fig.show()
112+
```
113+
114+
### Data Frames with Named Indexes
115+
116+
Pandas `DataFrame`s support not only column names and "row names" via the value of `index`, but the indexes themselves can be named. Here is how to assign one column of the wide sample data frame above as the index, and to name the column index. The resulting "indexed" sample data frame can also be obtained by calling `px.data.short_track_wide(indexed=True)`.
117+
118+
```python
119+
import plotly.express as px
120+
wide_df = px.data.short_track_wide()
121+
wide_df = wide_df.set_index("nation")
122+
wide_df.columns.name = "medal"
123+
wide_df
124+
```
125+
126+
When working with a data frame like the one above, you can pass the index references directly as arguments to benefit from automatic labelling for everything except the y-axis label. That label defaults to "value", but it can be overridden with the `labels` argument as above:
127+
128+
```python
129+
import plotly.express as px
130+
wide_df = px.data.short_track_wide(indexed=True)
131+
132+
fig = px.bar(wide_df, x=wide_df.index, y=wide_df.columns)
133+
fig.show()
134+
```
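
The automatic labelling is consistent with how Pandas `melt()` chooses its variable column name: when the column index is named, that name is used instead of "variable". The sketch below illustrates this with a hand-typed data frame (illustrative values only). It assumes the internal conversion follows the same Pandas convention described earlier, and is not a copy of the Plotly Express implementation:

```python
import pandas as pd

# Hand-typed wide-form frame with a named row index and a named column index.
wide_df = pd.DataFrame(
    {"gold": [24, 10, 9], "silver": [13, 15, 12], "bronze": [11, 8, 12]},
    index=pd.Index(["South Korea", "China", "Canada"], name="nation"),
)
wide_df.columns.name = "medal"

# Move the row index into a regular column, then unpivot the remaining columns.
# melt() picks up the column index name ("medal") for the variable column,
# while the value column still falls back to the default name "value".
long_df = wide_df.reset_index().melt(id_vars="nation")

print(long_df.columns.tolist())  # ['nation', 'medal', 'value']
```

Only the value column has no name to inherit, which matches the y-axis label being the one that still needs to be supplied via `labels`.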
135+
136+
If you transpose `x` and `y`, thereby assigning the columns to `x`, the orientation will be switched to horizontal:
137+
138+
```python
139+
import plotly.express as px
140+
wide_df = px.data.short_track_wide(indexed=True)
141+
142+
fig = px.bar(wide_df, x=wide_df.columns, y=wide_df.index)
143+
fig.show()
144+
```
145+
146+
### Wide-Form Defaults
147+
148+
For bar, scatter, line and area charts, this pattern of assigning `x=df.index` and `y=df.columns` is so common that it is the default behaviour if you provide neither `x` nor `y`:
149+
150+
```python
151+
import plotly.express as px
152+
wide_df = px.data.short_track_wide(indexed=True)
153+
154+
fig = px.bar(wide_df)
155+
fig.show()
156+
157+
fig = px.area(wide_df)
158+
fig.show()
159+
160+
fig = px.line(wide_df)
161+
fig.show()
162+
163+
fig = px.scatter(wide_df)
164+
fig.show()
165+
```
166+
167+
### Orientation Control When Using Defaults
168+
169+
If you specify neither `x` nor `y`, you can use `orientation` to control whether the index is assigned to the x axis or the y axis:
170+
171+
```python
172+
import plotly.express as px
173+
wide_df = px.data.short_track_wide(indexed=True)
174+
175+
fig = px.bar(wide_df, orientation="h")
176+
fig.show()
177+
```
178+
179+
### Assigning Columns to Non-Color Arguments
180+
181+
182+
In the examples above, the columns of the wide data frame are always assigned to the `color` argument, but this is not a hard constraint. The columns can be assigned to any Plotly Express argument, for example to accomplish faceting, and `color` can be reassigned to any other value. When plotting with a data frame without named indexes, you can reassign the inferred columns named `"variable"` and `"value"` to any argument:
183+
184+
```python
185+
import plotly.express as px
186+
wide_df = px.data.short_track_wide(indexed=False)
187+
188+
fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], facet_col="variable", color="nation")
189+
fig.show()
190+
```
191+
192+
If using a data frame's named indexes, either explicitly or relying on the defaults, the index references or names must be used:
193+
194+
```python
195+
import plotly.express as px
196+
wide_df = px.data.short_track_wide(indexed=True)
197+
198+
fig = px.bar(wide_df, facet_col="medal", color=wide_df.index)
199+
fig.show()
200+
```
201+
202+
### Mixed-Form Data
203+
204+
In some cases, a data frame is neither clearly long-form nor wide-form, and we can call this "mixed-form". For example, in the data frame below, if it contained only the `experiment` columns, the data could be described as wide-form, and if it contained only `gender` and `group` it could be described as long-form, but it contains both:
205+
206+
```python
207+
import plotly.express as px
208+
mixed_df = px.data.experiment(indexed=True)
209+
mixed_df.head()
210+
```
211+
212+
We can visualize just the wide-form portion of the data frame easily with a [violin chart](/python/violin/). As a special note, we'll assign the index, which is the participant ID, to the `hover_data` argument, so that hovering over outlier points will identify their row.
213+
214+
```python
215+
import plotly.express as px
216+
mixed_df = px.data.experiment(indexed=True)
217+
218+
fig = px.violin(mixed_df, y=["experiment_1", "experiment_2", "experiment_3"], hover_data=[mixed_df.index])
219+
fig.show()
220+
```
221+
222+
223+
224+
225+
We can also leverage the long-form portion of the data frame, for example to color by `gender` and facet by `group`:
226+
227+
```python
228+
import plotly.express as px
229+
mixed_df = px.data.experiment(indexed=True)
230+
231+
fig = px.violin(mixed_df, y=["experiment_1", "experiment_2", "experiment_3"],
232+
color="gender", facet_col="group", hover_data=[mixed_df.index])
233+
fig.show()
234+
```
235+
236+
And of course, we can reassign `variable` to another argument as well. In this case, we'll assign `group` to `x` and facet by the wide variable, and we'll switch to a [box plot](/python/box-plots/) for variety.
237+
238+
```python
239+
import plotly.express as px
240+
mixed_df = px.data.experiment(indexed=True)
241+
242+
fig = px.box(mixed_df, x="group", y=["experiment_1", "experiment_2", "experiment_3"],
243+
color="gender", facet_col="variable", hover_data=[mixed_df.index])
244+
fig.show()
245+
```
246+
247+
One interesting thing about a mixed-form data frame like this is that it remains easy to plot, say, one experiment against another, which would require some preliminary data wrangling if the data were represented as a pure long-form dataset:
248+
249+
```python
250+
import plotly.express as px
251+
mixed_df = px.data.experiment(indexed=True)
252+
253+
fig = px.scatter(mixed_df, x="experiment_1", y="experiment_2",
254+
color="group", facet_col="gender", hover_data=[mixed_df.index])
255+
fig.show()
256+
```
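
For comparison, here is a rough sketch of the kind of wrangling a pure long-form layout would need before one experiment could be plotted against another. The data frame, its column names (`participant`, `experiment`, `score`) and its values are hypothetical and are not the layout of `px.data.experiment()`:

```python
import pandas as pd

# Hypothetical pure long-form layout: one row per participant per experiment.
long_df = pd.DataFrame({
    "participant": [0, 0, 1, 1, 2, 2],
    "experiment": ["experiment_1", "experiment_2"] * 3,
    "score": [96.9, 93.4, 87.3, 97.6, 97.7, 114.3],
})

# Pivot so that each experiment becomes its own column, which is the shape
# needed to put one experiment on x and another on y.
paired = long_df.pivot(index="participant", columns="experiment", values="score")

print(paired.columns.tolist())  # ['experiment_1', 'experiment_2']
print(paired.shape)             # (3, 2)
```

With the mixed-form data frame, the experiment columns already exist side by side, so no such pivot is needed.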
257+
258+
```python
259+
260+
```
