|
| 1 | +--- |
| 2 | +jupyter: |
| 3 | + jupytext: |
| 4 | + notebook_metadata_filter: all |
| 5 | + text_representation: |
| 6 | + extension: .md |
| 7 | + format_name: markdown |
| 8 | + format_version: '1.2' |
| 9 | + jupytext_version: 1.4.2 |
| 10 | + kernelspec: |
| 11 | + display_name: Python 3 |
| 12 | + language: python |
| 13 | + name: python3 |
| 14 | + language_info: |
| 15 | + codemirror_mode: |
| 16 | + name: ipython |
| 17 | + version: 3 |
| 18 | + file_extension: .py |
| 19 | + mimetype: text/x-python |
| 20 | + name: python |
| 21 | + nbconvert_exporter: python |
| 22 | + pygments_lexer: ipython3 |
| 23 | + version: 3.7.7 |
| 24 | + plotly: |
| 25 | + description: Plotly Express' 2D-Cartesian functions accept data in long-, wide-, |
| 26 | + and mixed-form. |
| 27 | + display_as: file_settings |
| 28 | + language: python |
| 29 | + layout: base |
| 30 | + name: Plotly Express Wide-Form Support |
| 31 | + order: 33 |
| 32 | + page_type: u-guide |
| 33 | + permalink: python/wide-form/ |
| 34 | + thumbnail: thumbnail/plotly-express.png |
| 35 | +--- |
| 36 | + |
| 37 | +### Column-oriented, Matrix or Geographic Data |
| 38 | + |
| 39 | +Plotly Express provides functions to visualize a variety of types of data. Most functions, such as `px.bar` or `px.scatter`, expect to operate on column-oriented data of the type you might store in a Pandas `DataFrame` (in either "long" or "wide" format, see below). [`px.imshow` operates on matrix-like data](/python/imshow/) you might store in a `numpy` or `xarray` array, and functions like [`px.choropleth` and `px.choropleth_mapbox` can operate on geographic data](/python/maps/) of the kind you might store in a GeoPandas `GeoDataFrame`. This page details how to provide a specific form of column-oriented data to 2D-Cartesian Plotly Express functions, but you can also check out our [detailed column-input-format documentation](/python/px-arguments/). |
| 40 | + |
| 41 | +### Long-, Wide-, and Mixed-Form Data |
| 42 | + |
| 43 | +*Until version 4.8, Plotly Express only operated on long-form (previously called "tidy") data, but now accepts wide-form and mixed-form data as well.* |
| 44 | + |
| 45 | +There are three common conventions for storing column-oriented data, usually in a data frame with column names: |
| 46 | + |
| 47 | +* **long-form data** is suitable for storing multivariate data (i.e. data with more than 2 dimensions), with one row per observation and one column per variable. |
| 48 | +* **wide-form data** is suitable for storing 2-dimensional data, with one row per value of the first variable and one column per value of the second variable. |
| 49 | +* **mixed-form data** is a hybrid of long-form and wide-form data, with one row per value of one variable, some columns representing values of a second variable, and other columns representing additional variables. |
| 50 | + |
| 51 | +All Plotly Express functions can operate on long-form data, and the following 2D-Cartesian functions can operate on wide-form data as well: `px.scatter`, `px.line`, `px.area`, `px.bar`, `px.histogram`, `px.violin`, `px.box`, `px.strip`, `px.funnel`, `px.density_heatmap` and `px.density_contour`. |
| 52 | + |
| 53 | +By way of example, here is the same data represented first in long-form and then in wide-form: |
| 54 | + |
| 55 | +```python |
| 56 | +import plotly.express as px |
| 57 | +long_df = px.data.short_track_long() |
| 58 | +long_df |
| 59 | +``` |
| 60 | + |
| 61 | +```python |
| 62 | +import plotly.express as px |
| 63 | +wide_df = px.data.short_track_wide() |
| 64 | +wide_df |
| 65 | +``` |
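
The relationship between the two forms can also be sketched directly in Pandas. The block below is a minimal illustration, assuming a small hand-built frame with illustrative counts in place of the bundled sample data; the column names `nation`, `medal` and `count` follow the long-form example above.

```python
import pandas as pd

# Hand-built stand-in for the long-form sample (illustrative counts only).
long_df = pd.DataFrame({
    "nation": ["South Korea", "South Korea", "South Korea", "China", "China", "China"],
    "medal": ["gold", "silver", "bronze", "gold", "silver", "bronze"],
    "count": [24, 13, 11, 10, 15, 8],
})

# Long -> wide: one row per nation, one column per medal.
wide_df = (
    long_df.pivot(index="nation", columns="medal", values="count")
    .reindex(columns=["gold", "silver", "bronze"])
    .reset_index()
)
wide_df.columns.name = None  # drop the "medal" label left on the column index

# Wide -> long again: one row per (nation, medal) observation.
round_trip = wide_df.melt(id_vars="nation", var_name="medal", value_name="count")

print(wide_df)
print(round_trip)
```

Both frames hold the same six counts; only the layout differs.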
| 66 | + |
| 67 | +Plotly Express can produce the same plot from either form: |
| 68 | + |
| 69 | +```python |
| 70 | +import plotly.express as px |
| 71 | +long_df = px.data.short_track_long() |
| 72 | + |
| 73 | +fig = px.bar(long_df, x="nation", y="count", color="medal", title="Long-Form Input") |
| 74 | +fig.show() |
| 75 | +``` |
| 76 | + |
| 77 | +```python |
| 78 | +import plotly.express as px |
| 79 | +wide_df = px.data.short_track_wide() |
| 80 | + |
| 81 | +fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], title="Wide-Form Input") |
| 82 | +fig.show() |
| 83 | +``` |
| 84 | + |
| 85 | +### Labeling axes, legends and hover text |
| 86 | + |
| 87 | +You might notice that the y-axis and legend labels are slightly different for the second plot: they are "value" and "variable", respectively, and this is also reflected in the hover label text. This is because Plotly Express performed an [internal Pandas `melt()` operation](https://pandas.pydata.org/docs/reference/api/pandas.melt.html) to convert the wide-form data into long-form for plotting, and used the Pandas convention for assigning column names to the intermediate long-form data. Note that the labels "medal" and "count" do not appear in the wide-form data frame, so in this case you must supply them yourself (or see below regarding using a data frame with named row- and column-indexes). You can [rename these labels with the `labels` argument](/python/styling-plotly-express/): |
| 88 | + |
| 89 | +```python |
| 90 | +import plotly.express as px |
| 91 | +wide_df = px.data.short_track_wide() |
| 92 | + |
| 93 | +fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], title="Wide-Form Input, relabelled", |
| 94 | + labels={"value": "count", "variable": "medal"}) |
| 95 | +fig.show() |
| 96 | +``` |
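
To see where "variable" and "value" come from, the melt step can be approximated by hand. This is only a sketch of the Pandas convention, not Plotly Express's actual internals, and the frame is a hand-built stand-in with illustrative counts.

```python
import pandas as pd

# Hand-built stand-in for the wide-form sample (illustrative counts only).
wide_df = pd.DataFrame({
    "nation": ["South Korea", "China", "Canada"],
    "gold": [24, 10, 9],
    "silver": [13, 15, 12],
    "bronze": [11, 8, 12],
})

# With no names supplied, melt() labels the new columns "variable" and "value".
melted = wide_df.melt(id_vars="nation", value_vars=["gold", "silver", "bronze"])
print(melted.columns.tolist())  # ['nation', 'variable', 'value']

# Renaming the columns is the data-side analogue of relabelling the figure text.
relabelled = melted.rename(columns={"variable": "medal", "value": "count"})
print(relabelled.head())
```

Note that the `labels` argument only changes the text shown in the figure; unlike the `rename` call here, it does not modify the data frame.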
| 97 | + |
| 98 | +Plotly Express figures created using wide-form data can be [styled just like any other Plotly Express figure](/python/styling-plotly-express/): |
| 99 | + |
| 100 | +```python |
| 101 | +import plotly.express as px |
| 102 | +wide_df = px.data.short_track_wide() |
| 103 | + |
| 104 | +fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], |
| 105 | + title="Wide-Form Input, styled", |
| 106 | + labels={"value": "Medal Count", "variable": "Medal", "nation": "Olympic Nation"}, |
| 107 | + color_discrete_map={"gold":"gold", "silver": "silver", "bronze": "#c96"}, |
| 108 | + template="simple_white" |
| 109 | + ) |
| 110 | +fig.update_layout(font_family="Rockwell", showlegend=False) |
| 111 | +fig.show() |
| 112 | +``` |
| 113 | + |
| 114 | +### Data Frames with Named Indexes |
| 115 | + |
| 116 | +Pandas `DataFrame`s support not only column names and "row names" via the value of `index`, but the indexes themselves can also be named. Here is how to assign one column of the wide sample data frame above as the index, and how to name the column index. The resulting "indexed" sample data frame can also be obtained by calling `px.data.short_track_wide(indexed=True)`. |
| 117 | + |
| 118 | +```python |
| 119 | +import plotly.express as px |
| 120 | +wide_df = px.data.short_track_wide() |
| 121 | +wide_df = wide_df.set_index("nation") |
| 122 | +wide_df.columns.name = "medal" |
| 123 | +wide_df |
| 124 | +``` |
| 125 | + |
| 126 | +When working with a data frame like the one above, you can pass the index references directly as arguments to benefit from automatic labelling of everything except the y-axis label. That label defaults to "value", but it can be overridden with the `labels` argument as above: |
| 127 | + |
| 128 | +```python |
| 129 | +import plotly.express as px |
| 130 | +wide_df = px.data.short_track_wide(indexed=True) |
| 131 | + |
| 132 | +fig = px.bar(wide_df, x=wide_df.index, y=wide_df.columns) |
| 133 | +fig.show() |
| 134 | +``` |
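
The automatic labels come from the index names. A rough Pandas-only sketch shows the same effect: when no `var_name` is given, `melt()` falls back to the name of the column index. The frame below is a hand-built stand-in with illustrative counts, and naming the column index `medal` is an assumption made for this example.

```python
import pandas as pd

# Hand-built stand-in for the indexed wide-form sample (illustrative counts only).
wide_df = pd.DataFrame(
    {"gold": [24, 10, 9], "silver": [13, 15, 12], "bronze": [11, 8, 12]},
    index=pd.Index(["South Korea", "China", "Canada"], name="nation"),
)
wide_df.columns.name = "medal"  # assumed name for the column index

# reset_index() turns the named row index into a "nation" column, and melt()
# uses the column-index name instead of "variable" when var_name is omitted.
melted = wide_df.reset_index().melt(id_vars="nation")
print(melted.columns.tolist())  # ['nation', 'medal', 'value']
```

Only the value column keeps a generic name, which matches the y-axis label being the one that is not inferred.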
| 135 | + |
| 136 | +If you transpose `x` and `y`, thereby assigning the columns to `x`, the orientation will be switched to horizontal: |
| 137 | + |
| 138 | +```python |
| 139 | +import plotly.express as px |
| 140 | +wide_df = px.data.short_track_wide(indexed=True) |
| 141 | + |
| 142 | +fig = px.bar(wide_df, x=wide_df.columns, y=wide_df.index) |
| 143 | +fig.show() |
| 144 | +``` |
| 145 | + |
| 146 | +### Wide-Form Defaults |
| 147 | + |
| 148 | +For bar, scatter, line and area charts, this pattern of assigning `x=df.index` and `y=df.columns` is so common that it is the default behaviour if you provide neither `x` nor `y`: |
| 149 | + |
| 150 | +```python |
| 151 | +import plotly.express as px |
| 152 | +wide_df = px.data.short_track_wide(indexed=True) |
| 153 | + |
| 154 | +fig = px.bar(wide_df) |
| 155 | +fig.show() |
| 156 | + |
| 157 | +fig = px.area(wide_df) |
| 158 | +fig.show() |
| 159 | + |
| 160 | +fig = px.line(wide_df) |
| 161 | +fig.show() |
| 162 | + |
| 163 | +fig = px.scatter(wide_df) |
| 164 | +fig.show() |
| 165 | +``` |
| 166 | + |
| 167 | +### Orientation Control When Using Defaults |
| 168 | + |
| 169 | +If you specify neither `x` nor `y`, you can use `orientation` to specify whether the index is assigned to the X or the Y axis: |
| 170 | + |
| 171 | +```python |
| 172 | +import plotly.express as px |
| 173 | +wide_df = px.data.short_track_wide(indexed=True) |
| 174 | + |
| 175 | +fig = px.bar(wide_df, orientation="h") |
| 176 | +fig.show() |
| 177 | +``` |
| 178 | + |
| 179 | +### Assigning Columns to Non-Color Arguments |
| 180 | + |
| 181 | + |
| 182 | +In the examples above, the columns of the wide data frame are always assigned to the `color` argument, but this is not a hard constraint. The columns can be assigned to any Plotly Express argument, for example to accomplish faceting, and `color` can be reassigned to any other value. When plotting with a data frame without named indexes, you can reassign the inferred columns named `"variable"` and `"value"` to any argument: |
| 183 | + |
| 184 | +```python |
| 185 | +import plotly.express as px |
| 186 | +wide_df = px.data.short_track_wide(indexed=False) |
| 187 | + |
| 188 | +fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], facet_col="variable", color="nation") |
| 189 | +fig.show() |
| 190 | +``` |
| 191 | + |
| 192 | +If using a data frame's named indexes, either explicitly or relying on the defaults, the index references or names must be used: |
| 193 | + |
| 194 | +```python |
| 195 | +import plotly.express as px |
| 196 | +wide_df = px.data.short_track_wide(indexed=True) |
| 197 | + |
| 198 | +fig = px.bar(wide_df, facet_col="medal", color=wide_df.index) |
| 199 | +fig.show() |
| 200 | +``` |
| 201 | + |
| 202 | +### Mixed-Form Data |
| 203 | + |
| 204 | +In some cases, a data frame is neither clearly long-form nor wide-form, and we can call this "mixed-form". For example, in the data frame below, if it contained only the `experiment` columns, the data could be described as wide-form, and if it contained only `gender` and `group` it could be described as long-form, but it contains both: |
| 205 | + |
| 206 | +```python |
| 207 | +import plotly.express as px |
| 208 | +mixed_df = px.data.experiment(indexed=True) |
| 209 | +mixed_df.head() |
| 210 | +``` |
| 211 | + |
| 212 | +We can visualize just the wide-form portion of the data frame easily with a [violin chart](/python/violin/). As a special note, we'll assign the index, which is the participant ID, to `hover_data`, so that hovering over outlier points will identify their row. |
| 213 | + |
| 214 | +```python |
| 215 | +import plotly.express as px |
| 216 | +mixed_df = px.data.experiment(indexed=True) |
| 217 | + |
| 218 | +fig = px.violin(mixed_df, y=["experiment_1", "experiment_2", "experiment_3"], hover_data=[mixed_df.index]) |
| 219 | +fig.show() |
| 220 | +``` |
| 221 | + |
| 222 | + |
| 223 | + |
| 224 | + |
| 225 | +We can also leverage the long-form portion of the data frame, for example to color by `gender` and facet by `group`: |
| 226 | + |
| 227 | +```python |
| 228 | +import plotly.express as px |
| 229 | +mixed_df = px.data.experiment(indexed=True) |
| 230 | + |
| 231 | +fig = px.violin(mixed_df, y=["experiment_1", "experiment_2", "experiment_3"], |
| 232 | + color="gender", facet_col="group", hover_data=[mixed_df.index]) |
| 233 | +fig.show() |
| 234 | +``` |
| 235 | + |
| 236 | +And of course, we can reassign `variable` to another argument as well. In this case, we'll assign `group` to `x` and facet by the wide variable, and we'll switch to a [box plot](/python/box-plots/) for variety. |
| 237 | + |
| 238 | +```python |
| 239 | +import plotly.express as px |
| 240 | +mixed_df = px.data.experiment(indexed=True) |
| 241 | + |
| 242 | +fig = px.box(mixed_df, x="group", y=["experiment_1", "experiment_2", "experiment_3"], |
| 243 | + color="gender", facet_col="variable", hover_data=[mixed_df.index]) |
| 244 | +fig.show() |
| 245 | +``` |
| 246 | + |
| 247 | +One interesting thing about a mixed-form data frame like this is that it remains easy to plot, say, one experiment against another, which would require some preliminary data wrangling if the data were represented as a pure long-form dataset: |
| 248 | + |
| 249 | +```python |
| 250 | +import plotly.express as px |
| 251 | +mixed_df = px.data.experiment(indexed=True) |
| 252 | + |
| 253 | +fig = px.scatter(mixed_df, x="experiment_1", y="experiment_2", |
| 254 | + color="group", facet_col="gender", hover_data=[mixed_df.index]) |
| 255 | +fig.show() |
| 256 | +``` |
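
For comparison, here is a rough sketch of the wrangling a pure long-form dataset would need before one experiment could be plotted against another. The miniature frame, its values, the index name `participant` and the column names `experiment` and `score` are all hypothetical, chosen only to mirror the shape of the sample data.

```python
import pandas as pd

# Hypothetical miniature of the mixed-form frame; the values are made up.
mixed_df = pd.DataFrame(
    {
        "experiment_1": [96.9, 87.3, 97.7],
        "experiment_2": [93.4, 94.0, 98.8],
        "experiment_3": [73.0, 150.3, 138.3],
        "gender": ["male", "female", "male"],
        "group": ["control", "control", "treatment"],
    },
    index=pd.Index([0, 1, 2], name="participant"),
)

# Pure long-form: one row per (participant, experiment) measurement.
long_df = mixed_df.reset_index().melt(
    id_vars=["participant", "gender", "group"],
    var_name="experiment",
    value_name="score",
)

# To compare experiment_1 with experiment_2, the long-form data has to be
# pivoted back so that each experiment is its own column again.
paired = long_df.pivot(index="participant", columns="experiment", values="score")
print(paired[["experiment_1", "experiment_2"]])
```

With the mixed-form frame, this pivot step is unnecessary because the experiment columns are already side by side.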
| 257 | + |
| 258 | +```python |
| 259 | + |
| 260 | +``` |