LinearRegression has decision_function #1404

Closed
amueller opened this issue Nov 24, 2012 · 24 comments
Labels
Easy Well-defined and straightforward way to resolve

Comments

@amueller
Member

As mentioned by @tjanez in #1393.
This doesn't make sense.
We should check if other linear models for regression also have it.

@mblondel
Member

I'm the one who added this. The motivation was to be able to use regressors (e.g. Lasso or ElasticNet) as base estimator in the multiclass module. I don't really see the problem: for me, in the regression case, decision_function and predict should be aliases of each other. We can modify the multiclass module to directly call predict when the base estimator is a regressor (that seems like a good idea in any case, to support regressors that don't implement decision_function).

@GaelVaroquaux
Member

> I'm the one who added this. The motivation was to be able to use regressors
> (e.g. Lasso or ElasticNet) as base estimator in the multiclass module.

I think that this is a good motivation. Using squared loss (regression
models) for classification is something that can work well in certain
cases (mainly if the classes are well-separable).

@mblondel
Member

Yes and the squared loss is a proper loss function so it is suitable for class probability estimation (but I'm not very familiar with this subject...).

@amueller
Member Author

@mblondel Sorry, I didn't know that was on purpose.
It seems to be a bit confusing at least. Maybe the docstring should be adjusted? ("Identical to predict"?)
From a user perspective, it is a bit odd to have two methods that do the same thing.

@mblondel
Member

@amueller I'm fine with removing decision_function. In the multiclass module, we just need to make sure that predict is called whenever the base estimator is a regressor. Maybe other people can give their opinion. CC @agramfort @ogrisel @pprett

@amueller
Member Author

That sounds like a good solution to me, if @GaelVaroquaux is ok with using the class structure to find out if something is a regressor.

@GaelVaroquaux
Member

> That sounds like a good solution to me, if @GaelVaroquaux is ok with using the
> class structure to find out if something is a regressor.

I wanted to avoid it. I don't believe at all in inheritance and class
structure as a way to design a library. That said, we can also see this
as a stopgap solution, while we are waiting for duck-typing for
regression/classification.

@amueller
Member Author

We could add attributes _is_regressor, _is_classifier in the mixins?
So we can check getattr(estimator, "_is_regressor", False)?
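
A minimal sketch of what such a flag-based check could look like, in plain Python with no scikit-learn dependency. The names `RegressorFlagMixin`, `MeanRegressor`, `ThirdPartyRegressor`, `SignClassifier` and `looks_like_regressor` are hypothetical and only illustrate the idea; none of them is an existing scikit-learn API.

```python
class RegressorFlagMixin:
    """Hypothetical mixin: marks an estimator as a regressor via an attribute."""
    _is_regressor = True


class MeanRegressor(RegressorFlagMixin):
    """Toy regressor that inherits the flag from the mixin."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class ThirdPartyRegressor:
    """Sets the flag directly, without inheriting from any library class."""
    _is_regressor = True

    def predict(self, X):
        return [0.0 for _ in X]


class SignClassifier:
    """Defines no flag at all, so it is not treated as a regressor."""

    def predict(self, X):
        return [x > 0 for x in X]


def looks_like_regressor(estimator):
    # getattr takes the object first; the default covers estimators
    # that never define the flag.
    return getattr(estimator, "_is_regressor", False)
```

The point of the attribute is visible in `ThirdPartyRegressor`: it is recognised as a regressor without importing or subclassing anything from the library.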

@agramfort
Member

> We could add attributes _is_regressor, _is_classifier in the mixins?
> So we can check getattr(estimator, "_is_regressor", False)?

I like this better than relying on class inheritance.

@mblondel
Member

@amueller @agramfort Why is it better?

I would prefer to implement a helper function similar to base.is_classifier.

@amueller
Member Author

@mblondel it is "better" in the sense that a user can make an estimator a Regressor without having to inherit from an sklearn class. The argument (which is @GaelVaroquaux's, not mine) is that people should be able to take full advantage of all sklearn features without having a dependency on sklearn.

@mblondel
Member

@amueller I see. The user still needs to be aware of the _is_regressor convention though.

@larsmans
Member

I do believe that if someone wants to use a regression model as a classifier, they should do some coding themselves. It's extremely simple:

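from sklearn.base import ClassifierMixin
from sklearn.linear_model import LinearRegression

class LRClassifier(LinearRegression, ClassifierMixin):
    def decision_function(self, X):
        # The regression output doubles as the classification score.
        return super(LRClassifier, self).predict(X)
    def predict(self, X):
        # Threshold the score at zero to get a boolean class label.
        return self.decision_function(X) > 0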

Currently, linear models are mixing up classification and regression in some places, which is understandable mathematically but quite confusing. I'd rather not support undocumented features in this way.

@GaelVaroquaux
Member

Agreed.

@larsmans
Member

See #2444 for a fix.

@mblondel
Member

Just make sure that regressors work with OneVsRest and friends. I think I added unit tests for that.

larsmans added a commit to larsmans/scikit-learn that referenced this issue Sep 16, 2013
This method only existed to support an undocumented feature in
sklearn.multiclass. Fixes scikit-learn#1404.
@larsmans
Member

Saw it. Do you suggest adapting the test with the class I wrote above?

@arjoly
Member

arjoly commented Sep 16, 2013

You mean that the OneVsRest and friends should rely on the predict function, if there is no predict_proba and no decision_function?

@mblondel
Member

@larsmans No, I want to call predict whenever the base estimator is a regressor. But currently we don't have a utility function similar to is_classifier. See #1404 (comment) and replies.

@arjoly Calling predict without checking if the base estimator is a regressor could work but I wonder about undefined behavior with clustering algorithms (most have a predict method).

@mblondel
Member

I don't want users to have to wrap a regressor in a class. I would expect that multiclass estimators can take regressors directly. This was my original motivation for adding decision_function as an alias to predict.

@mblondel
Member

BTW the multiclass doc does mention regressors:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/multiclass.py#L12

The docstring for estimator should be clearer though.

@larsmans
Member

I understand the use case, but it's adding a hack to a whole set of estimators to add a feature to a few others. I'd rather have a constructor attribute (say, predict_method) on OneVsRestClassifier. Explicit is better than implicit and all that.
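
As a rough illustration of the explicit-parameter idea, here is a toy one-vs-rest wrapper in plain Python. `ToyOneVsRest`, `LineRegressor`, and the `make_estimator` factory argument are all hypothetical; this is not scikit-learn's OneVsRestClassifier, and `predict_method` is only the name suggested above, not an existing constructor argument.

```python
class LineRegressor:
    """Toy 1-D least-squares regressor that exposes only predict."""

    def fit(self, X, y):
        n = len(X)
        mean_x, mean_y = sum(X) / n, sum(y) / n
        sxx = sum((x - mean_x) ** 2 for x in X)
        sxy = sum((x - mean_x) * (t - mean_y) for x, t in zip(X, y))
        self.slope_ = sxy / sxx
        self.intercept_ = mean_y - self.slope_ * mean_x
        return self

    def predict(self, X):
        return [self.slope_ * x + self.intercept_ for x in X]


class ToyOneVsRest:
    """Hypothetical stand-in for a one-vs-rest wrapper."""

    def __init__(self, make_estimator, predict_method="decision_function"):
        self.make_estimator = make_estimator  # factory for fresh base estimators
        self.predict_method = predict_method  # name of the scoring method to call

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.estimators_ = []
        for label in self.classes_:
            # One binary problem per class: +1 for this class, -1 for the rest.
            targets = [1.0 if yi == label else -1.0 for yi in y]
            self.estimators_.append(self.make_estimator().fit(X, targets))
        return self

    def predict(self, X):
        # Score each sample with every per-class estimator, then pick the
        # class whose estimator gives the highest score.
        scores = [getattr(est, self.predict_method)(X) for est in self.estimators_]
        indices = range(len(self.classes_))
        return [
            self.classes_[max(indices, key=lambda k: scores[k][i])]
            for i in range(len(X))
        ]


X = [0.0, 1.0, 2.0, 3.0]
y = ["a", "a", "b", "b"]
clf = ToyOneVsRest(LineRegressor, predict_method="predict").fit(X, y)
labels = clf.predict([0.0, 3.0])
```

With this design the user has to ask for `predict` explicitly; leaving the default in place with a regressor that lacks `decision_function` fails with an AttributeError instead of silently doing something else.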

@mblondel
Member

I'd be fine with such a constructor parameter if it is set to "auto" by default and automatically chooses "predict" for regressors (but we don't have a way to know that right now, and @GaelVaroquaux apparently doesn't want to add an is_regressor utility...)

We could also call predict as a last resort if neither decision_function nor predict_proba is available (without even checking whether the estimator is a regressor). For classifier estimators with only a predict method, this will amount to some kind of voting. If the user passes a clustering estimator, then we can just say "garbage in, garbage out".
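
A sketch of that fallback, assuming an "auto" setting that tries the scoring methods in a fixed order. The helper name `resolve_score_method` and the toy classes are hypothetical, and trying decision_function before predict_proba is an assumption made here, not something settled in this thread.

```python
def resolve_score_method(estimator, predict_method="auto"):
    """Return the bound method used to score samples (hypothetical helper)."""
    if predict_method != "auto":
        # An explicit choice is used as-is; a missing method raises AttributeError.
        return getattr(estimator, predict_method)
    # Assumed order: real-valued scores first, predict only as a last resort.
    for name in ("decision_function", "predict_proba", "predict"):
        method = getattr(estimator, name, None)
        if callable(method):
            return method
    raise AttributeError(
        "estimator has none of decision_function, predict_proba, predict"
    )


class WithDecisionFunction:
    def decision_function(self, X):
        return [0.5 for _ in X]

    def predict(self, X):
        return [1 for _ in X]


class OnlyPredict:
    """Could be a regressor, a plain classifier, or a clustering estimator."""

    def predict(self, X):
        return [0 for _ in X]


class NoMethods:
    pass
```

Note that `OnlyPredict` is accepted whatever kind of estimator it really is, which is exactly the "garbage in, garbage out" trade-off described above.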

@amueller
Member Author

amueller commented Jan 5, 2014

Closing in light of #2588.
