Replies: 12 comments
-
I don't think this is a great fit for seaborn. It's already in pandas (as you note) and also
-
@mwaskom Coincidentally, I might have an interesting use case for this where it would be beneficial to have an easy way to add additional axes (or at least a second one similar to

I want to visualize the result of a grid search on a regression model while tracking two metrics/scores. The catch is that one metric (max_error) is absolute, and the other (MAPE) is a percentage. For me, both metrics are useful because they give me an estimate of both overall performance and worst-case performance. One way I can currently do this is by using a facet over metrics:

    (
        so.Plot(grid_result, x="max_depth", y="score")
        .facet(col="metric")
        .add(so.Line(), so.Agg())
        .add(so.Band())
        .share(y=False)
    )

This is nice, but a bit hard to read, because I need to go back and forth between figures. With base matplotlib, I can use

    fig, ax1 = plt.subplots()
    ax2 = ax1.twinx()
    sns.lineplot(grid_result.query("metric == 'mape'"), x="max_depth", y="score", color="tab:blue", ax=ax1)
    sns.lineplot(grid_result.query("metric == 'max_error'"), x="max_depth", y="score", color="tab:red", ax=ax2)
    ax1.set_ylabel("mape (blue)")
    ax2.set_ylabel("max_error (red)")

It would be nice if we could get this done in seaborn without having to drop down to matplotlib, especially because this would free up the dimensions used by a facet for other variables, e.g., grid search parameters.
-
I think a
-
Isn't
-
I'm having trouble seeing it that way. In a parallel coordinates plot there isn't a separate

    (
        sns.load_dataset("iris")
        .rename_axis("example")
        .reset_index()
        .melt(["example", "species"])
        .pipe(so.Plot, x="variable", y="value", color="species")
        .add(so.Lines(alpha=.5), group="example")
    )

BTW

This seems to work for me? (Of course it has the same limitations of not playing nicely with faceting, etc., as the function interface)

    f, ax1 = plt.subplots()
    ax2 = ax1.twinx()
    p = so.Plot(healthexp, x="Year", group="Country")
    p.add(so.Line(), so.Agg(), y="Spending_USD").on(ax1).plot()
    p.add(so.Line(color="r"), so.Agg(), y="Life_Expectancy").on(ax2).plot()
-
In the first plot above, would it be possible to (min-max) normalise the data on the Y-axis?
-
Right! I have indeed misunderstood the parallel coordinates plot and they are separate things; sorry about that. @mwaskom Should I create a new issue/feature request to track

Cool! Then this was user error on my side. I didn't call

    healthexp = sns.load_dataset("healthexp")
    fig, ax1 = plt.subplots()
    ax2 = ax1.twinx()
    (
        so.Plot(healthexp, x="Year", group="Country", y="Spending_USD")
        .add(so.Line(color="tab:blue"), so.Agg())
        .on(ax1)
    )
    (
        so.Plot(healthexp, x="Year", group="Country", y="Life_Expectancy")
        .add(so.Line(color="tab:red"), so.Agg())
        .on(ax2)
    )

@EwoutH Absolutely. Just transform your data before handing it over to the plot :)

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import seaborn.objects as so

    iris: pd.DataFrame = sns.load_dataset("iris")

    def normalize(df, columns):
        normalized = df.loc[:, columns].apply(
            # min/max normalization of a column
            lambda data: (data - np.min(data)) / np.ptp(data)
        )
        # overwrite only the normalized columns; all others pass through
        return df.assign(**{col: normalized[col] for col in normalized})

    (
        iris.rename_axis("example")
        .reset_index()
        .pipe(
            normalize,
            columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
        )
        .melt(["example", "species"])
        .pipe(so.Plot, x="variable", y="value", color="species")
        .add(so.Lines(alpha=0.5), group="example")
    )
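To check what the normalization step does without loading a dataset, here is a minimal sketch on a made-up frame. The helper name `minmax`, the column names, and the values are all hypothetical, and the guard for constant columns is an assumption that is not part of the snippet above.

```python
import pandas as pd


def minmax(df, columns):
    """Rescale each listed column to [0, 1]; other columns pass through."""
    sub = df[columns]
    # A constant column has zero range; replace it with 1 so the column
    # maps to 0 instead of dividing by zero (assumed behaviour).
    span = (sub.max() - sub.min()).replace(0, 1)
    scaled = (sub - sub.min()) / span
    return df.assign(**{col: scaled[col] for col in columns})


# Hypothetical toy data: one varying column, one constant, one label.
toy = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [10.0, 10.0, 10.0], "label": ["x", "y", "z"]}
)
scaled = minmax(toy, ["a", "b"])
```

Because the helper takes and returns a DataFrame, it can be dropped into a method chain with `.pipe(minmax, columns=[...])` before `.melt(...)`.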
-
This isn't good enough tracking for you? :) Line 602 in 021a20f
You don't need to invoke The key thing is explicitly calling
-
You could also do this with a move transform:

    class NormByOrient(so.Move):
        def __call__(self, df, groupby, orient, scales):
            other = {"x": "y", "y": "x"}[orient]
            return df.assign(**{
                other: df.groupby(orient)[other]
                .transform(lambda x: (x - x.min()) / (x.max() - x.min()))
            })

    (
        iris
        .rename_axis("example")
        .reset_index()
        .melt(["example", "species"])
        .pipe(so.Plot, x="variable", y="value", color="species", group="example")
        .add(so.Lines(alpha=.5), NormByOrient())
    )

I'm 👎 on adding a move transform that does this specifically but open to having it work within a more general operation. The existing

But also I suspect that in most cases where you're doing a parallel coordinates plot your data are going to be in "wide form" as that's how you'd hand them to an ML library so the
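For reference, the reshaping from "wide form" to the long form that the snippets in this thread hand to `so.Plot` can be sketched with pandas alone. The frame below is made up (two rows, two measurement columns); only the `rename_axis` / `reset_index` / `melt` chain comes from the examples above.

```python
import pandas as pd

# Hypothetical wide-form data: one row per example, one column per variable.
wide = pd.DataFrame(
    {
        "species": ["a", "b"],
        "sepal_length": [5.1, 6.3],
        "petal_length": [1.4, 4.9],
    }
)

# Give every row an explicit id, then stack the measurement columns so each
# (example, variable) pair becomes one row with "variable" and "value" columns.
long = (
    wide.rename_axis("example")
    .reset_index()
    .melt(["example", "species"])
)
```

The `example` column is what keeps one line per original row once the data are long, which is why the plots above pass `group="example"`.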
-
Indeed that's the crux. I actually think the documentation is fine as is; it's just a bit easy to miss because it is part of the detailed explanation of

If you are willing to accept a PR for this, I can look into that.
-
Duplication of the information doesn't sound like a great idea, but maybe "notes" would be a better section. Then again, the numpydoc standard says:
Of course, the docs don't really adhere to that standard religiously...
-
Revisiting the discussion, without reopening #3879 (comment)
There are two reasons I kind of expected seaborn to have it as an option. The first is that it is a powerful visual data analysis method, which is exactly the kind of thing seaborn excels at supporting. I only use this kind of plot for examining the result of a multi-parameter regression, often of mixed numeric and categorical factors. I plot the best 10% or 1% of the results as lines across the categories. I'm looking to see if the lines follow similar patterns. If there's a lot of scatter on one category, that means it probably isn't useful. If it's very tight on another category, then I should focus my search area more. The main thing I'm looking for is the emergence of clusters of similarly high fitness, but different routes. The second is that a one-click quick analysis method isn't available anywhere else yet. There is an existing parallel coordinates plot in pandas, but it is very basic. It does not normalise the scale between columns, and this makes it totally unsuitable for general analysis.
This uses @jannikmi's code. It is closer. A better fit is a lower score. I may need to re-sort to get the plot order of the lines correct so that the best one shows consistently, and I may need to extend the search area on a few variables.
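A rough sketch of the selection and re-sorting step described above, assuming the results live in a DataFrame with a `score` column where lower is better. The helper name `best_fraction`, the column names, and the data are hypothetical. Rows are returned worst first, so if lines are drawn in row order the best result is drawn last.

```python
import pandas as pd


def best_fraction(results, score="score", frac=0.1):
    """Keep the best `frac` of rows (lowest score), ordered worst to best."""
    n = max(1, int(len(results) * frac))  # always keep at least one row
    top = results.nsmallest(n, score)
    # Descending order puts the lowest (best) score in the last row.
    return top.sort_values(score, ascending=False)


# Hypothetical search results: 20 runs with distinct scores.
results = pd.DataFrame(
    {"score": [7, 3, 19, 0, 12, 5, 1, 16, 9, 14, 2, 18, 6, 11, 4, 17, 8, 13, 10, 15]}
)
results["run"] = range(len(results))

top = best_fraction(results, frac=0.1)
```

Whether the last row actually ends up on top depends on how the plotting layer orders its groups, so that part is worth checking against the plot itself.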
-
When visualizing high-dimensional datasets, parallel coordinates plots are sometimes very useful. I would love for seaborn to have a built-in function to do this!
Resources