
Function to get scorers for task #12385

Open
@jnothman

Description

I would like to see a utility which would construct a set of applicable scorers for a particular task, returning a Mapping from string to callable scorer. It will be hard to design this API right the first time. [Maybe this should initially be developed outside this project and contributed to scikit-learn-contrib, but I think it reduces the risk of mis-specifying scorers, so it's of benefit to this project.]

The user will be able to select a subset of the scorers, either with a dict comprehension (see the usage sketch after the example below) or with some specialised methods or function parameters. Running all of these scorers would not be efficient at first, but hopefully we can do something to fix #10802 :|.

Let's take, for instance, a binary classification task. For binary y, the function get_applicable_scorers(y, pos_label='yes') might produce something like:

{
    'accuracy': make_scorer(accuracy_score),
    'balanced_accuracy': make_scorer(balanced_accuracy_score),
    'matthews_corrcoef': make_scorer(matthews_corrcoef),
    'cohens_kappa': make_scorer(cohen_kappa_score),
    'precision': make_scorer(precision_score, pos_label='yes'),
    'recall': make_scorer(recall_score, pos_label='yes'),
    'f1': make_scorer(f1_score, pos_label='yes'),
    'f0.5': make_scorer(fbeta_score, pos_label='yes', beta=0.5),
    'f2': make_scorer(fbeta_score, pos_label='yes', beta=2),
    'specificity': ...,
    'miss_rate': ...,
    ...
    'roc_auc': make_scorer(roc_auc_score, needs_threshold=True),
    'average_precision': make_scorer(average_precision_score, needs_threshold=True),
    'neg_log_loss': make_scorer(log_loss, needs_proba=True, greater_is_better=False),
    'neg_brier_score_loss': make_scorer(brier_score_loss, needs_proba=True, greater_is_better=False, pos_label='yes'),
}
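
To illustrate the subsetting mentioned above (a hedged sketch: get_applicable_scorers is the proposed function, not an existing API, and estimator, X and y stand in for the user's own objects), a dict comprehension over the returned mapping composes directly with cross_validate's existing support for dict scoring:

from sklearn.model_selection import cross_validate

scorers = get_applicable_scorers(y, pos_label='yes')
# keep only the threshold-free metrics we care about
selected = {name: scorer for name, scorer in scorers.items()
            if name in ('accuracy', 'balanced_accuracy', 'f1')}
results = cross_validate(estimator, X, y, scoring=selected)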

Doing the same for multiclass classification would pass labels as appropriate, and would optionally produce per-class binary metrics as well as overall multiclass metrics.
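
As a rough sketch of the per-class part (relying only on the existing labels/average parameters of the multiclass metrics; per_class_scorers is a hypothetical helper name):

import numpy as np
from sklearn.metrics import make_scorer, f1_score, recall_score

def per_class_scorers(y):
    # restricting to labels=[label] with average='macro' reduces the
    # multiclass metric to the binary score for that single class
    scorers = {}
    for label in np.unique(y):
        scorers['recall_%s' % label] = make_scorer(
            recall_score, labels=[label], average='macro')
        scorers['f1_%s' % label] = make_scorer(
            f1_score, labels=[label], average='macro')
    return scorers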

I'm not sure how sample_weight fits in here, but ha! we still don't support weighted scoring in cross validation (#1574), so let's not worry about that.
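
For concreteness, a minimal sketch of the top-level dispatch (type_of_target is the existing helper in sklearn.utils.multiclass; binary_scorers and multiclass_scorers are hypothetical helpers returning dicts like the one above):

from sklearn.metrics import make_scorer, mean_squared_error, r2_score
from sklearn.utils.multiclass import type_of_target

def get_applicable_scorers(y, **kwargs):
    # dispatch on the task implied by y and return {name: scorer}
    target_type = type_of_target(y)
    if target_type == 'binary':
        return binary_scorers(y, **kwargs)      # e.g. the dict above
    if target_type == 'multiclass':
        return multiclass_scorers(y, **kwargs)  # including per-class scorers
    if target_type == 'continuous':
        return {'r2': make_scorer(r2_score),
                'neg_mean_squared_error': make_scorer(
                    mean_squared_error, greater_is_better=False)}
    raise ValueError('no applicable scorers for target type %r' % target_type)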

Metadata

Labels: API, Moderate, New Feature
