
TunedThresholdClassifierCV: add other metrics #29061

Closed
@koaning

Description


Describe the workflow you want to enable

I figured that I might use the new tuned thresholder to turn code like this into something that's a bit more like grid search, with all the parallelism benefits.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import FixedThresholdClassifier, train_test_split
from tqdm import trange


# Imbalanced binary classification problem.
X, y = make_classification(
    n_samples=10_000, weights=[0.9, 0.1], class_sep=0.8, random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

classifier = LogisticRegression(random_state=0).fit(X_train, y_train)

# Sweep 199 candidate thresholds and record several metrics at each one.
n_steps = 200
metrics = []
for i in trange(1, n_steps):
    classifier_other_threshold = FixedThresholdClassifier(
        classifier, threshold=i / n_steps, response_method="predict_proba"
    ).fit(X_train, y_train)

    y_pred = classifier_other_threshold.predict(X_train)
    metrics.append({
        'threshold': i / n_steps,
        'f1': f1_score(y_train, y_pred),
        'precision': precision_score(y_train, y_pred),
        'recall': recall_score(y_train, y_pred),
        'accuracy': accuracy_score(y_train, y_pred)
    })
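
For context, the sweep above can already be parallelized by hand with joblib (which scikit-learn depends on). This is only a sketch of the boilerplate I would like the tuned thresholder to absorb; the evaluate_threshold helper name is made up for illustration:

from joblib import Parallel, delayed


def evaluate_threshold(threshold):
    # Fit a fixed-threshold wrapper and score it on the training data.
    clf = FixedThresholdClassifier(
        classifier, threshold=threshold, response_method="predict_proba"
    ).fit(X_train, y_train)
    y_pred = clf.predict(X_train)
    return {
        'threshold': threshold,
        'f1': f1_score(y_train, y_pred),
        'precision': precision_score(y_train, y_pred),
        'recall': recall_score(y_train, y_pred),
        'accuracy': accuracy_score(y_train, y_pred),
    }


metrics = Parallel(n_jobs=-1)(
    delayed(evaluate_threshold)(i / n_steps) for i in range(1, n_steps)
)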

This data can give me a very pretty plot with a lot of information.

[screenshot: f1, precision, recall, and accuracy plotted against the decision threshold]
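
A minimal sketch of how that plot is drawn from the metrics list, assuming pandas and matplotlib (neither is imported in the snippet above):

import matplotlib.pyplot as plt
import pandas as pd

# One curve per metric, with the candidate threshold on the x-axis.
df = pd.DataFrame(metrics)
df.plot(x="threshold", y=["f1", "precision", "recall", "accuracy"])
plt.xlabel("threshold")
plt.ylabel("metric value")
plt.show()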

But as far as I can tell, I can't make this chart with the new tuned thresholder in the 1.5 release candidate.

I can do this:

from sklearn.model_selection import TunedThresholdClassifierCV
from sklearn.metrics import make_scorer

classifier_other_threshold = TunedThresholdClassifierCV(
    classifier,
    scoring=make_scorer(f1_score),  # only this one metric can be tracked
    response_method="predict_proba",
    thresholds=200,
    n_jobs=-1,
    store_cv_results=True
)
classifier_other_threshold.fit(X_train, y_train)

And this gives me data for a pretty plot as well, but it only contains the f1 score. There is no way to add extra metrics in the current implementation.

[screenshot: the single f1 curve plotted against the decision threshold]
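
With store_cv_results=True the fitted estimator exposes cv_results_, a dict that only holds the candidate thresholds and the scores of the single scoring metric. A sketch of the best you can currently plot (assuming matplotlib):

import matplotlib.pyplot as plt

results = classifier_other_threshold.cv_results_
# Only two keys are available here: "thresholds" and "scores".
plt.plot(results["thresholds"], results["scores"], label="f1")
plt.axvline(classifier_other_threshold.best_threshold_, linestyle="--", label="best threshold")
plt.xlabel("threshold")
plt.ylabel("f1 score")
plt.legend()
plt.show()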

Describe your proposed solution

Maybe it makes sense to also allow a metrics input on the tuned CV estimator, as sketched below. That way, it can still collect any extra metrics that we might be interested in.
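
Something like this purely hypothetical sketch (the metrics parameter does not exist in scikit-learn today; its name and shape are my assumption):

classifier_other_threshold = TunedThresholdClassifierCV(
    classifier,
    scoring=make_scorer(f1_score),  # still the metric that picks the threshold
    # Hypothetical: extra metrics that are only recorded, never optimized.
    metrics={
        "precision": make_scorer(precision_score),
        "recall": make_scorer(recall_score),
        "accuracy": make_scorer(accuracy_score),
    },
    response_method="predict_proba",
    thresholds=200,
    n_jobs=-1,
    store_cv_results=True
)
# cv_results_ could then also carry "precision", "recall", and "accuracy" arrays.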

Describe alternatives you've considered, if relevant

The aforementioned code shows the chart that I am mainly interested in. A single metric never tells me the full story, and the extra metrics help prevent me from overfitting on a single variable. Reality tends to be more complex than what a single metric can capture, so I like to add extra context with some (custom) metrics.

Additional context

No response
