plot_precision_recall_curve and plot_roc_curve don't allow picking positive class #15573

Closed
amueller opened this issue Nov 8, 2019 · 9 comments · Fixed by #17651

@amueller
Member

amueller commented Nov 8, 2019

In #15555 we removed the inconsistent pos_label attribute.

It would be good to allow users to specify which class is the semantically positive class and slice predict_proba or decision_function accordingly.
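A minimal sketch of what "slicing predict_proba or decision_function accordingly" could mean for a binary classifier. The helper name `select_positive_scores` is hypothetical and is not part of scikit-learn; the sketch only assumes the usual convention that `predict_proba` columns follow the order of `classes_` and that positive `decision_function` values favour `classes_[1]`.

```python
import numpy as np


def select_positive_scores(classes, scores, pos_label):
    """Return 1-D scores where larger values mean "more likely pos_label".

    Hypothetical helper, for illustration only.

    classes   : the two class labels in the order the classifier stores
                them (e.g. estimator.classes_)
    scores    : output of predict_proba (shape (n, 2)) or of
                decision_function (shape (n,)) for a binary problem
    pos_label : the class the user considers positive
    """
    classes = np.asarray(classes)
    scores = np.asarray(scores)
    if classes.shape[0] != 2:
        raise ValueError("expected exactly two classes")
    matches = np.flatnonzero(classes == pos_label)
    if matches.size != 1:
        raise ValueError(f"pos_label={pos_label!r} is not in {classes.tolist()}")
    col = int(matches[0])
    if scores.ndim == 2:
        # predict_proba: one column per class, in the order of `classes`
        return scores[:, col]
    # decision_function: positive values favour classes[1], so flip the
    # sign when the requested positive class is classes[0]
    return scores if col == 1 else -scores


classes = ["donated", "not donated"]
proba = np.array([[0.8, 0.2], [0.3, 0.7]])
decision = np.array([-1.5, 0.5])

proba_donated = select_positive_scores(classes, proba, "donated")
decision_donated = select_positive_scores(classes, decision, "donated")
```

With string labels the classes are sorted alphabetically, so "donated" ends up in the first column here, which is exactly the case where taking the last column by default picks the wrong class.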

I'm not sure if that parameter should be called pos_label as it is somewhat semantically different. Maybe positive_label? positive_class? but it's not clear that that's different. We could do use_as_positive= but that's a bit obscure maybe?

Also see #15405 (comment)

@amueller amueller added this to the 0.22 milestone Nov 8, 2019
@ogrisel
Member

ogrisel commented Nov 30, 2019

Do you mean as a generic constructor parameter for all (binary) classifiers?

@jnothman jnothman modified the milestones: 0.22, 0.23 Dec 5, 2019
@thomasjpfan thomasjpfan modified the milestones: 0.23, 0.24 Apr 20, 2020
@claramatos
Contributor

Can I start working on this issue?

@ogrisel
Member

ogrisel commented Jun 11, 2020

@claramatos we first need to agree on what needs to be done. I am not sure what @amueller has in mind.

@glemaitre
Member

glemaitre commented Jun 11, 2020

Initial post in #17565

I think that we should expose pos_label as one of the parameters of plot_precision_recall_curve. I even think that we should issue a warning when the classes are imbalanced and the class treated as positive is the one with the most samples. In most such cases, you are reporting the wrong part of your results, yet that may be what the defaults give you.

import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import plot_precision_recall_curve

X, y = fetch_openml(
    name="blood-transfusion-service-center",
    as_frame=True, return_X_y=True,
)
# Make columns and classes more human-readable
X.columns = ["Recency", "Frequency", "Monetary", "Time"]
y = y.apply(
    lambda x: "donated" if x == "2" else "not donated"
).astype("category")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, shuffle=True, random_state=0, test_size=0.5
)

classifier = LogisticRegression().fit(X_train, y_train)
plot_precision_recall_curve(classifier, X_test, y_test)

[Figure: pr_curve, the precision-recall curve produced by the code above]

One would have expected the following instead:

[Figure: the expected precision-recall curve]
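The warning suggested at the start of this comment could look roughly like the sketch below. The function name `warn_if_positive_is_majority`, the message text, and the rule "any difference in class counts counts as imbalance" are all assumptions made for illustration; a real implementation would probably want a threshold.

```python
import warnings

import numpy as np


def warn_if_positive_is_majority(y, pos_label):
    """Warn when the class treated as positive is the majority class.

    Hypothetical helper, for illustration only. Returns True if a
    warning was issued and False otherwise.
    """
    labels, counts = np.unique(np.asarray(y), return_counts=True)
    majority = labels[np.argmax(counts)]
    # Assumption: treat any difference in class counts as an imbalance
    is_imbalanced = counts.max() != counts.min()
    if is_imbalanced and majority == pos_label:
        warnings.warn(
            f"pos_label={pos_label!r} is the majority class; the curve may "
            "describe the class you are less interested in.",
            UserWarning,
        )
        return True
    return False


y = ["not donated", "not donated", "not donated", "donated"]

# The minority class as positive: no warning
minority_warned = warn_if_positive_is_majority(y, "donated")

# The majority class as positive: a UserWarning is issued
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    majority_warned = warn_if_positive_is_majority(y, "not donated")
```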

@claramatos
Contributor

@glemaitre I was thinking about implementing it as you suggest, exposing pos_label (with 1 as the default).
If that is OK with you (and @amueller), I'll start working on it.

@glemaitre
Member

@claramatos You can open a PR. IMO, pos_label is fine, though the name might change during the review process; either way, I really think that we should solve this issue. In the meantime, I will address #17572.

I had already opened #17569 before finding this issue, so you can focus on the plot_roc_curve function.
You can use a test similar to the one in #17569 to ensure that we plot the proper curve.
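One possible shape for such a check, as a sketch only: it is not the test from #17569, and it exercises `roc_curve` directly rather than `plot_roc_curve`, since the pos_label parameter of the plotting function is what is being proposed here. The synthetic data and the helper `roc_auc_for` are assumptions made for illustration. The idea is that the ROC AUC must be the same whichever class is called positive, as long as the matching predict_proba column is used, and must flip to 1 - AUC when the wrong column is used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve

# Imbalanced synthetic problem with string labels (assumed stand-in data)
X, y_int = make_classification(n_samples=400, weights=[0.8, 0.2], random_state=0)
y = np.array(["not donated", "donated"])[y_int]  # 0 -> "not donated", 1 -> "donated"

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)
classes = list(clf.classes_)  # sorted alphabetically: "donated" comes first


def roc_auc_for(pos_label, column):
    """ROC AUC when `pos_label` is positive and scores come from `column`."""
    fpr, tpr, _ = roc_curve(y, proba[:, column], pos_label=pos_label)
    return auc(fpr, tpr)


# Matching label and column: both choices describe the same classifier
auc_donated = roc_auc_for("donated", classes.index("donated"))
auc_not_donated = roc_auc_for("not donated", classes.index("not donated"))

# Mismatched label and column: the curve is mirrored
auc_mismatched = roc_auc_for("donated", classes.index("not donated"))

assert np.isclose(auc_donated, auc_not_donated)
assert np.isclose(auc_mismatched, 1 - auc_donated)
```

A test on the plotting function itself could compare the values stored on the returned display object against the ones computed this way.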

@glemaitre
Member

@claramatos did you have time to work on the issue related to plot_roc_curve?

@claramatos
Contributor

@glemaitre I'm planning to work on it over the weekend

@claramatos
Contributor

I was looking into plot_precision_recall_curve and plot_roc_curve and there are some pieces that are common. Should I take the opportunity and move them to base.py?
