make_scorer needs_threshold makes wrong assumptions #2588
Comments
I think threshold was intended to mean "aggregate classification metric" …
Well, here's what the documentation about needs_threshold says: […]

So it does mention continuous outputs. I actually think …
I think this is an issue we should fix before the next release.
@mblondel I agree that we should fix this before the release. Do you have a suggestion for fixing the first problem (without relying on the class structure ;)? The basic issue is identical to #1404, right? I agree that the documentation is confusing with your application in mind (which I hadn't, and which @larsmans might not have had when he rewrote it). So we should either have a …

Is it ok to just remove the check for the second problem?
Exactly. A few people opposed having decision_function in all regressors. I think we could just do a try / except.
I think so (the check is already done by roc_auc_score anyway).
Ok, so you propose to use

    try:
        y_pred = clf.decision_function(X)
        [...]
    except (NotImplementedError, AttributeError):
        try:
            y_pred = clf.predict_proba(X)
            [...]
        except (NotImplementedError, AttributeError):
            y_pred = clf.predict(X)

This might give some weird results when people pass something that doesn't have a decision function but is not a regressor, for example k-means, or if there is a classifier that doesn't have a predict_proba either.
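For illustration, a minimal runnable sketch of that fallback; the helper name continuous_scores and the toy data are my own additions, not something from the thread:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier


def continuous_scores(clf, X):
    """Proposed fallback: decision_function, then predict_proba, then predict."""
    try:
        return clf.decision_function(X)
    except (NotImplementedError, AttributeError):
        try:
            # keep only the score of the positive class
            return clf.predict_proba(X)[:, 1]
        except (NotImplementedError, AttributeError):
            return clf.predict(X)


rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = (X[:, 0] > 0).astype(int)

# LogisticRegression exposes decision_function; KNeighborsClassifier does not,
# so the second call falls back to predict_proba.
print(continuous_scores(LogisticRegression().fit(X, y), X)[:3])
print(continuous_scores(KNeighborsClassifier().fit(X, y), X)[:3])
```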
I think (correct me if I'm wrong) most of our classifiers (if not all) have at least decision_function or predict_proba. For clustering, we can't be blamed if people do things that do not make sense: "garbage in, garbage out".
Another idea that comes to mind is to have a utility function or class for turning any regressor into a binary classifier.
The created classifier can be passed without ambiguity to OneVsRestClassifier or GridSearchCV.
I think this solution is less convenient to use. Also, unless we dynamically create classes, when doing parameter tuning we will need to use parameters like …
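For concreteness, a rough sketch of what such a wrapper could look like; the class name, the regressor/threshold parameter names, and the 0.5 threshold are assumptions for the example, not anything specified in the thread:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone


class RegressorAsBinaryClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical wrapper: expose a regressor's continuous output as a
    decision_function and threshold it to get binary predictions."""

    def __init__(self, regressor, threshold=0.5):
        self.regressor = regressor
        self.threshold = threshold

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.regressor_ = clone(self.regressor).fit(X, y)
        return self

    def decision_function(self, X):
        # The regressor's prediction plays the role of a continuous score.
        return self.regressor_.predict(X)

    def predict(self, X):
        indices = (self.decision_function(X) >= self.threshold).astype(int)
        return self.classes_[indices]
```

With such a wrapper, tuning the underlying regressor in a grid search requires nested parameter names (e.g. regressor__alpha for a wrapped Ridge), which is the inconvenience mentioned above.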
I think all classifiers do have a decision_function or predict_proba.

First I thought we wanted a robust way to find out if something is a regressor. But then I noticed that is not what your solution implements. You are dividing "classifiers and regressors with decision_function or predict_proba" from the rest. Maybe it would then be easier to just add decision_function to regressors.

Somehow I feel this blurs the line between doing classification and regression, and in general I feel it is easier if one is clear about what one is doing. Can you maybe explain the motivation again? You have three relevance levels that you want to regress? It would be nice to see the exact example, as I don't entirely understand the use case. If we are building a ranking API, that would be good to know ;)
As an alternative solution to try / except, I suggested a helper utility / class to turn any regressor into a binary classifier. We could implement this in two ways: by a meta-estimator or by dynamically building the class. In the former case, one would have to use …
In retrospect, I often feel that we should have had …
This would be my preferred solution too, if people who opposed could reconsider.
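A sketch of what the "add decision_function to RegressorMixin" option amounts to (this is not scikit-learn's actual code, just an illustration): regressors would expose their continuous predictions under the method name that the threshold scorer already looks for.

```python
from sklearn.base import RegressorMixin
from sklearn.linear_model import Ridge


class RegressorMixinWithDecisionFunction(RegressorMixin):
    """Sketch only: every regressor gains a decision_function that simply
    returns its continuous predictions."""

    def decision_function(self, X):
        return self.predict(X)


class RankingRidge(RegressorMixinWithDecisionFunction, Ridge):
    """Hypothetical Ridge variant usable with threshold-based scorers."""
```

A scorer could then call decision_function uniformly, whether the estimator is a classifier or a regressor.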
Regressors are a perfectly valid way to do ranking (they are called pointwise methods in the learning-to-rank literature). If you have only two relevance levels (0 and 1), binary classifiers can also be used. So I would like to be able to use either regressors or binary classifiers. Since we need continuous scores, we need to call decision_function (or predict, in the regressor case). (As a side note, in my experience so far, pointwise methods are just as good as pairwise or listwise methods if you optimize hyper-parameters.) Likewise, as I argued in #1404, regressors are a perfectly valid way to do binary classification (the squared loss has good properties).
We can already do ranking with our current API. What our API doesn't allow yet is to specify groups (query ids in the information retrieval context). In my research, I do ranking with a single group (the entire training set). Thus, our current API is fine.
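A small illustration of this pointwise setup (the synthetic data and the choice of Ridge are mine): fit a regressor on 0/1 relevance labels and evaluate the ranking it induces with roc_auc_score, using the continuous predictions as scores.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import roc_auc_score

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
relevance = (X[:, 0] + 0.5 * rng.randn(200) > 0).astype(int)  # two relevance levels

reg = Ridge(alpha=1.0).fit(X, relevance)
scores = reg.predict(X)  # continuous scores that induce the ranking
print(roc_auc_score(relevance, scores))
```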
Thank you for your explanations. When thinking about the problem, I also thought that …

Do you think it makes sense in your setting to differentiate between regressors and classifiers? If we say squared loss is a valid loss for classification (I would claim it is a bad old habit), then the only difference between regression and classification is the semantics of predict, and the tie with the loss function becomes a bit odd. As we won't rename …
I guess it's not necessary if you write your own estimator, but I'd like to reuse existing regressors and classifiers in scikit-learn.
Sorry for insisting, but can we agree on a solution? This is blocking me. Can you vote between adding decision_function to RegressorMixin and doing a try/except here? So far we have +2 for the former. CC @ogrisel @larsmans @jnothman @GaelVaroquaux
OMG this has been a year. That is horrible. |
I found two issues with make_scorer's needs_threshold option.

First, it doesn't work if the base estimator is a regressor. Using a regressor is a perfectly valid use case, e.g., if you want to use a regressor but still want to optimize your hyper-parameters against AUC (pointwise ranking with two relevance levels). It does work with Ridge, but this is because decision_function and predict are aliases of each other. See also the discussion in issue #1404.

Second, _ThresholdScorer checks that the number of unique values in y_true is 2, but this need not be the case. For example, I can use a regressor and optimize my hyper-parameters against NDCG with more than 2 relevance levels.
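For context, a sketch of the kind of usage this report has in mind (the toy data and parameter grid are my own, and the module path and scorer behaviour are those of scikit-learn around the time of the report): tuning a regressor's hyper-parameters against AUC via a threshold-based scorer. As noted above, this happens to work with Ridge only because its decision_function is an alias of predict.

```python
import numpy as np
from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in later versions
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, roc_auc_score

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = (X[:, 0] > 0).astype(int)  # two relevance levels

# needs_threshold=True: the scorer asks the estimator for continuous scores.
auc = make_scorer(roc_auc_score, needs_threshold=True)

search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, scoring=auc)
search.fit(X, y)
print(search.best_params_)
```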