Consolidation of the naming of y_pred_proba, y_score vs probas_pred #27994

Closed
adam2392 opened this issue Dec 21, 2023 · 3 comments · Fixed by #28092

Comments

@adam2392
Member

Describe the issue linked to the documentation

I am trying to leverage the classification metrics that rely on a posterior probability (i.e. P(Y | X=x)). This is commonly named y_pred_proba in the sklearn API.

However, I noticed a discrepancy in the naming of the argument for this in various metrics. For example:

Searching the glossary (ctrl+f), only y_score has a related entry.

Suggest a potential alternative/fix

Perhaps we can name them all y_score to be consistent? E.g. the following two metrics

@adam2392 adam2392 added Documentation Needs Triage Issue requires triage labels Dec 21, 2023
@adrinjalali
Member

If I'm not mistaken, @glemaitre had touched on this, but not sure.

@glemaitre
Member

Nope, I did not, but I would indeed like us to have consistency there.

When it comes to estimators, I would like to have y_proba or y_pred_proba for the output of predict_proba, to acknowledge that y_proba.sum(axis=1) == 1.

y_score would be reserved for the output of decision_function. I would consider y_score to have fewer constraints than y_proba.
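To make the distinction concrete, here is a minimal sketch, assuming a binary LogisticRegression fitted on synthetic data (the dataset and the variable names are illustrative, not taken from the issue). It checks that each row of the predict_proba output sums to 1, while the decision_function output is not confined to [0, 1].

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem; the data itself is an assumption for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)

y_proba = clf.predict_proba(X)      # shape (n_samples, n_classes)
y_score = clf.decision_function(X)  # shape (n_samples,) in the binary case

# Probabilities are constrained: each row sums to 1 and values lie in [0, 1].
rows_sum_to_one = np.allclose(y_proba.sum(axis=1), 1.0)
proba_in_unit_interval = bool(((y_proba >= 0) & (y_proba <= 1)).all())

# Decision scores carry no such constraint: some fall outside [0, 1].
score_leaves_unit_interval = bool(((y_score < 0) | (y_score > 1)).any())

print(rows_sum_to_one, proba_in_unit_interval, score_leaves_unit_interval)
```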

So when it comes to the metrics, I would make them consistent and choose either y_proba or y_score depending on the type of prediction each function takes. If a function takes both the output of decision_function and predict_proba, then I would favor y_score.
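As an illustration of a metric that takes either kind of prediction, the sketch below passes both outputs to roc_auc_score, whose second parameter is already named y_score. The classifier and data are assumptions chosen for the example. For a logistic regression, the positive-class probability is a monotonic transform of the decision score, so the ranking, and hence the AUC, should match.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic binary problem; the data itself is an assumption for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)

# Unbounded scores from decision_function.
auc_from_decision = roc_auc_score(y, clf.decision_function(X))

# Probability of the positive class from predict_proba.
auc_from_proba = roc_auc_score(y, clf.predict_proba(X)[:, 1])

print(auc_from_decision, auc_from_proba)
```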

In terms of the deprecation cycle, I assume that those parameters are not keyword-only. Therefore, we will need to rename the parameter but still have a deprecation cycle for the old name (which we can put at the end of the signature).
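A rough sketch of what such a rename could look like, using only the standard library. The function toy_accuracy, the sentinel _DEPRECATED, and the warning text are hypothetical and are not scikit-learn's actual implementation; the point is only that the new name takes the positional slot while the old name moves to the end of the signature and emits a FutureWarning when used.

```python
import warnings

_DEPRECATED = object()  # hypothetical sentinel meaning "old name not passed"


def toy_accuracy(y_true, y_score=None, *, probas_pred=_DEPRECATED):
    """Toy metric: fraction of samples where (score >= 0.5) matches the label."""
    if probas_pred is not _DEPRECATED:
        if y_score is not None:
            raise ValueError("Pass only `y_score`; `probas_pred` is deprecated.")
        warnings.warn(
            "`probas_pred` was renamed to `y_score`; use `y_score` instead.",
            FutureWarning,
        )
        y_score = probas_pred
    if y_score is None:
        raise TypeError("toy_accuracy() missing required argument: 'y_score'")
    hits = sum((s >= 0.5) == bool(t) for t, s in zip(y_true, y_score))
    return hits / len(y_true)


y_true = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.1]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    new_result = toy_accuracy(y_true, y_score=scores)      # new name: no warning
    old_result = toy_accuracy(y_true, probas_pred=scores)  # old name: warns

print(new_result, old_result, len(caught))
```

Positional calls such as toy_accuracy(y_true, scores) keep working unchanged, which is why only callers who pass the old name by keyword need the warning.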

@amueller
Member

amueller commented Feb 9, 2024

I think we had a discussion on this about a decade ago, but I think proba instead of prob was a French-ism, so I wonder whether we should make it the standard. Though it would be consistent with predict_proba...


4 participants