Description
Describe the workflow you want to enable
Let's take a probabilistic classifier like sklearn.linear_model.LogisticRegression.
We have two methods to get the predictions:
- predict_proba() outputs a probability for each class
- predict() outputs a class (the one with the highest probability)
I would like to let the user specify the decision threshold, i.e. how high the predicted probability must be before a class is assigned. This is of course not the default scenario when using LogisticRegression, but it may be useful for anomaly/opportunity detection.
Real life example:
A user trained a LogisticRegression model to detect trading opportunities that are highly likely to be profitable.
There are two possible classes: 1 for "opportunity", 0 for "no opportunity".
The user would like to keep only the most confident predictions (probability > 90%) before opening a position on the market.
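To make the scenario concrete, here is a minimal sketch of what the user has to write today. The synthetic dataset and the 0.9 cutoff are assumptions for illustration only; the real features would come from market data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data standing in for the trading features (assumption).
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

# The columns of predict_proba follow the order of model.classes_.
proba = model.predict_proba(X)
pos_col = list(model.classes_).index(1)
p_opportunity = proba[:, pos_col]

threshold = 0.9  # user-chosen cutoff (illustrative value)
y_thresholded = (p_opportunity > threshold).astype(int)

# Default behaviour: the class with the highest probability.
y_default = model.predict(X)
```

With the stricter cutoff, every sample flagged as an opportunity is also flagged by predict(), but not the other way around.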
Describe your proposed solution
My proposed solution is to add an optional parameter threshold (default: 1/NB_CLASSES) to the predict() method.
We could also update other probabilistic models if needed.
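As a rough sketch of the intended behaviour, the hypothetical helper below applies a threshold to the output of predict_proba(). The function name, its signature, and the restriction to the binary case are my assumptions; this is not existing scikit-learn API, and the multiclass semantics would still need to be discussed.

```python
import numpy as np

def predict_with_threshold(proba, classes, threshold=None):
    """Hypothetical helper: binary decision from predict_proba output.

    proba   : array of shape (n_samples, 2), columns ordered like `classes`
    classes : array of the two class labels, e.g. model.classes_
    """
    proba = np.asarray(proba, dtype=float)
    classes = np.asarray(classes)
    if proba.ndim != 2 or proba.shape[1] != 2 or classes.shape[0] != 2:
        raise ValueError("this sketch only handles the binary case")
    if threshold is None:
        # Proposed default: 1 / NB_CLASSES, i.e. 0.5 for two classes.
        threshold = 1.0 / classes.shape[0]
    # Predict the second class only when its probability exceeds the threshold.
    positive = proba[:, 1] > threshold
    return np.where(positive, classes[1], classes[0])

proba = np.array([[0.05, 0.95], [0.30, 0.70], [0.80, 0.20]])
classes = np.array([0, 1])
default_pred = predict_with_threshold(proba, classes)                # [1, 1, 0]
strict_pred = predict_with_threshold(proba, classes, threshold=0.9)  # [1, 0, 0]
```

In the binary case the default of 0.5 gives the same result as picking the most probable class (ties aside), so existing behaviour would be preserved.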
Additional context
I would like to work on this feature if possible.
I know this kind of thing can be done as a post-processing step on the user side.
However, since we provide a predict() method that chooses a class, I think we should give the user a way to define the threshold.