ENH Add multi-threshold classification to FixedThresholdClassifier #31544
Tests failing. Once CI is green feel free to ping me back.
Branch updated from 72c5623 to 05bc405.
Hi @adrinjalali. "sklearn/model_selection/_classification_threshold.py:docstring of sklearn.model_selection._classification_threshold.FixedThresholdClassifier:105: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]"
This should fix the issue:
diff --git a/sklearn/model_selection/_classification_threshold.py b/sklearn/model_selection/_classification_threshold.py
index ab8ab266f4..37c1380cd8 100644
--- a/sklearn/model_selection/_classification_threshold.py
+++ b/sklearn/model_selection/_classification_threshold.py
@@ -293,7 +293,7 @@ class FixedThresholdClassifier(BaseThresholdClassifier):
     >>> from sklearn.metrics import confusion_matrix
     >>> from sklearn.model_selection import FixedThresholdClassifier, train_test_split

-    # Binary classification with custom threshold
+    >>> # Binary classification with custom threshold
     >>> X, y = make_classification(
     ...     n_samples=1_000, weights=[0.9, 0.1], class_sep=0.8, random_state=42
     ... )
@@ -311,13 +311,13 @@ class FixedThresholdClassifier(BaseThresholdClassifier):
     [[184  40]
      [  6  20]]

-    # Multi-threshold classification using discretized scores
+    >>> # Multi-threshold classification using discretized scores
+    >>> import numpy as np
     >>> base_clf = LogisticRegression(random_state=0)
     >>> clf_multi = FixedThresholdClassifier(
     ...     base_clf, threshold=[0.3, 0.6], labels=["Low", "Medium", "High"]
     ... ).fit(X_train, y_train)
     >>> y_pred = clf_multi.predict(X_test)
-    >>> import numpy as np
     >>> labels, counts = np.unique(y_pred, return_counts=True)
     >>> result = [(str(label), int(count)) for label, count in zip(labels, counts)]
     >>> print(result)
@@ -327,7 +327,6 @@ class FixedThresholdClassifier(BaseThresholdClassifier):
     High: 14
     Low: 218
     Medium: 18
-    <BLANKLINE>
     """

     _parameter_constraints: dict = {
Branch updated from 05bc405 to f2ef469.
Hello @adrinjalali.
Yes, this requires an enhancement or a feature changelog entry. cc @glemaitre for a review.
…-learn#30452

This commit adds support for multi-threshold classification to FixedThresholdClassifier, enabling discretization of continuous decision scores into multiple classes. The classifier now accepts a list of thresholds and optional labels for each bin.

Includes:
- Unit tests covering the new functionality
- Documentation and usage example

Co-authored-by: Pedro Lopes <[email protected]>
Branch updated from f2ef469 to a29bef0.
Hi @glemaitre,
Reference Issues/PRs
#30452
Multiple thresholds in FixedThresholdClassifier
What does this implement/fix? Explain your changes.
This PR extends FixedThresholdClassifier to support discretization of continuous decision scores into multiple classes, by allowing the threshold parameter to be a list of values. This enables users to bin decision function outputs (such as probabilities or scores) into more than two risk classes or categories, a common requirement in real-world applications such as credit scoring, risk modeling, and regulatory compliance.
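For readers unfamiliar with the idea, here is a minimal sketch of what binning scores against several thresholds could look like. It is not the code from this PR: `discretize_scores` is a hypothetical helper, it relies on `np.digitize`, and the boundary convention (a score equal to a threshold falls into the upper bin) is an assumption that may differ from what the PR implements.

```python
import numpy as np


def discretize_scores(scores, thresholds, labels=None):
    """Hypothetical helper: map continuous scores to bins defined by thresholds.

    Assumption: a score equal to a threshold is assigned to the upper bin.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    if np.any(np.diff(thresholds) <= 0):
        raise ValueError("thresholds must be strictly increasing")

    # np.digitize returns i such that thresholds[i-1] <= score < thresholds[i],
    # so n thresholds produce bin indices 0..n.
    bins = np.digitize(scores, thresholds)
    if labels is None:
        return bins

    labels = np.asarray(labels)
    if labels.shape[0] != thresholds.shape[0] + 1:
        raise ValueError("expected len(thresholds) + 1 labels")
    # Index the label array with the bin indices to get one label per score.
    return labels[bins]


scores = np.array([0.05, 0.30, 0.45, 0.60, 0.95])
print(discretize_scores(scores, [0.3, 0.6], ["Low", "Medium", "High"]))
# ['Low' 'Medium' 'Medium' 'High' 'High']
```

With two thresholds there are three bins, so three labels are expected; without labels the helper returns the integer bin indices.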
Key additions: