OneVsRestClassifier's fit method does not accept kwargs #10882
Comments
Here's a workaround:

```python
import warnings

import numpy as np
from sklearn import clone
from sklearn.externals.joblib import Parallel, delayed  # old sklearn; on newer versions import from joblib directly
from sklearn.multiclass import OneVsRestClassifier, _ConstantPredictor
from sklearn.preprocessing import LabelBinarizer


# Alternative to sklearn.multiclass._fit_binary that forwards **kwargs
# (e.g. sample_weight) to the underlying estimator's fit method.
def _fit_binary(estimator, X, y, classes=None, **kwargs):
    unique_y = np.unique(y)
    if len(unique_y) == 1:
        # Only one class present in this binary subproblem: fall back to a
        # constant predictor, as the stock implementation does.
        if classes is not None:
            if y[0] == -1:
                c = 0
            else:
                c = y[0]
            warnings.warn("Label %s is present in all training examples."
                          % str(classes[c]))
        estimator = _ConstantPredictor().fit(X, unique_y)
    else:
        estimator = clone(estimator)
        estimator.fit(X, y, **kwargs)
    return estimator


class OneVsRestClassifierPatched(OneVsRestClassifier):

    def fit(self, X, y, **kwargs):
        # Same as OneVsRestClassifier.fit, except that **kwargs are passed
        # through to each per-class binary fit.
        self.label_binarizer_ = LabelBinarizer(sparse_output=True)
        Y = self.label_binarizer_.fit_transform(y)
        Y = Y.tocsc()
        self.classes_ = self.label_binarizer_.classes_
        columns = (col.toarray().ravel() for col in Y.T)
        self.estimators_ = Parallel(n_jobs=self.n_jobs)(
            delayed(_fit_binary)(
                self.estimator, X, column,
                classes=["not %s" % self.label_binarizer_.classes_[i],
                         self.label_binarizer_.classes_[i]],
                **kwargs)
            for i, column in enumerate(columns))
        return self
```

Code based on https://stackoverflow.com/a/49535681/599739
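A quick usage sketch of the patched class (toy data made up for illustration):

```python
from sklearn.linear_model import SGDClassifier

# Toy multilabel problem: 4 samples, 2 features, 3 labels.
X = np.array([[0., 1.], [1., 0.], [1., 1.], [0., 0.]])
Y = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
w = np.array([1.0, 0.5, 2.0, 1.0])  # per-sample weights

clf = OneVsRestClassifierPatched(SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3))
clf.fit(X, Y, sample_weight=w)  # the kwarg now reaches SGDClassifier.fit
print(clf.predict(X))
```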
You can use `SGDClassifier` (and all classifiers in scikit-learn) for multiclass. It has inbuilt OvR. But I agree that `OneVsOneClassifier.fit` should support kwargs.
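For the single-label multiclass case this indeed works out of the box; a minimal sketch with made-up data:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

X = np.array([[0., 1.], [1., 0.], [1., 1.], [0., 0.]])
y = np.array([0, 1, 2, 0])          # three classes -> built-in OvR
w = np.array([1.0, 0.5, 2.0, 1.0])

# sample_weight is accepted directly, no meta-estimator involved.
SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3).fit(X, y, sample_weight=w)
```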
@jnothman Multi-class isn't an issue here. `SGDClassifier` doesn't support a multi-label indicator matrix for `y` in `fit`.
Sorry, misread. OvR is the wrong name for the method in the multilabel case, but not to worry about that... I note that `MultiOutputClassifier` should support the multilabel case with `sample_weight`.
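For reference, `MultiOutputClassifier.fit` does expose `sample_weight` explicitly; a minimal sketch with made-up toy data:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.multioutput import MultiOutputClassifier

X = np.array([[0., 1.], [1., 0.], [1., 1.], [0., 0.]])
Y = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])  # multilabel indicator matrix
w = np.array([1.0, 0.5, 2.0, 1.0])

# fit() has an explicit sample_weight parameter and forwards it to each
# per-output clone of the base estimator.
clf = MultiOutputClassifier(SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3))
clf.fit(X, Y, sample_weight=w)
```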
This doesn't look too hard, actually. I'm considering contributing a patch for this. Here's what I would do: let `fit` accept `**kwargs` and pass them on to the wrapped estimator's `fit`. We could also adjust other meta-estimator classes in the same way.
I've implemented most of the changes that I laid out in my last comment, but while I was writing unit tests I ran into an issue that I'm not sure how to solve. It's about parameter validation.

Many meta-estimators check the parameters of the underlying estimator's `fit` method before forwarding anything to it. I've added support for sample weights to pretty much all meta-estimators, but not by explicitly adding a `sample_weight` parameter; the weights travel through `**kwargs` instead. However, this kind of validation doesn't work when we use kwargs instead of explicit parameter names, so now we have to decide how to handle that.

Any thoughts on this?
I don't think, in general, that `has_fit_parameter` is a good idea. There are rare exceptions. But usually, you should just be able to filter among the kwargs as being None or not None. Removing the Nones may suffice.
I agree, `has_fit_parameter` is not a sufficient solution for these cases, but I'm not sure what solution you have in mind. Should we get rid of the check for `sample_weight` and test if the `fit` function accepts arbitrary kwargs instead? Also not sure what you mean by "Removing the Nones may suffice", could you elaborate on that?
I may have misunderstood the context you're talking about... but if we're passing some keyword arguments on, then `{k: v for k, v in kwargs.items() if v is not None}` might help avoid a TypeError?
Yeah, wrong context. I was talking about function inspection, which is used by `has_fit_parameter` to decide whether an estimator's `fit` method can accept sample weights in the first place. Currently, there are checks in several places that look if the `fit` method has a `sample_weight` parameter. I'm just testing a solution where I replace the calls to `has_fit_parameter`.
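To make the inspection concrete: `has_fit_parameter` lives in `sklearn.utils.validation` and simply looks at the signature of `fit`, which is exactly why it breaks down for `**kwargs`-style forwarding. A small sketch:

```python
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.validation import has_fit_parameter

# Signature inspection works for a plain estimator...
print(has_fit_parameter(SGDClassifier(), "sample_weight"))  # True

# ...but a meta-estimator whose fit takes **fit_params looks unsupported,
# even though the final step would happily accept the weights.
pipe = Pipeline([("scale", StandardScaler()), ("clf", SGDClassifier())])
print(has_fit_parameter(pipe, "sample_weight"))  # False
```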
OK, I'm going to stop working on this until we find a solution for the problem of deciding whether an estimator supports sample weights. Here's the issue in short:

In my opinion it is generally a bad idea to have meta-estimators reason about underlying code by inspecting highly implementation-specific stuff like function arguments. Accepting sample weights is an inherent property of an estimator, and we should have a well-defined way to check whether an estimator has it. Then, meta-estimators could forward the question whether sample weights are supported to their underlying estimator, through as many layers as there may be (e.g. a `Pipeline` would ask its final estimator).

If we want to do this properly, we'll have to touch a lot of code... Maybe we should split this issue: have a simple solution for OvR and OvO (which will have the same problems as the other implementations, but it'll probably suffice for 80% of the use cases) and have a separate effort to find a proper solution for the sample-weight detection problem.
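To make that proposal concrete, a purely hypothetical sketch (none of these names exist in scikit-learn):

```python
# Hypothetical API sketch: estimators declare sample-weight support as a
# property, and meta-estimators forward the question instead of inspecting
# fit signatures. All names here are invented, not scikit-learn code.

class SupportsSampleWeight:
    supports_sample_weight = True  # inherent property of the estimator

class HypotheticalMetaEstimator:
    def __init__(self, estimator):
        self.estimator = estimator

    @property
    def supports_sample_weight(self):
        # Forward the question through as many layers as there may be.
        return getattr(self.estimator, "supports_sample_weight", False)
```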
IIRC, `has_fit_parameter` is not so widely used. I would be in favour of allowing `has_fit_parameter`, or a replacement thereof, to handle the "maybe" case. In the example of bagging, this would use trial and error.

I strongly dislike its use in `VotingClassifier`, for instance, where it is just used for early validation. But if we returned "maybe" it would just work.

Would you be willing to contribute that change? Testing would be the annoying part.
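A minimal sketch of what such a three-valued check could look like (name and semantics invented here, not scikit-learn API):

```python
from inspect import signature

def has_fit_parameter_or_maybe(estimator, parameter):
    """Return True, False, or "maybe" for **kwargs-style fit signatures."""
    params = signature(estimator.fit).parameters
    if parameter in params:
        return True
    # A **kwargs catch-all means the parameter *might* be forwarded.
    if any(p.kind == p.VAR_KEYWORD for p in params.values()):
        return "maybe"
    return False
```

A caller doing early validation would then treat "maybe" as "try it and let the underlying `fit` raise".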
Hi, I've been dealing with something related lately and came across this issue; I read through the earlier discussions. I agree that in general we really should avoid having meta-models "inspecting implementation-specific stuff like function arguments". However, if we limit this kind of inspection to `has_fit_parameter`, it seems like a tolerable compromise. That said, we probably should remove the use of `has_fit_parameter` where it only serves early validation. BTW it seems that the use of `has_fit_parameter` is not consistent across meta-estimators. @jnothman Any thoughts?
#24027 should fix this.
Now fixed with metadata routing. |
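With metadata routing (scikit-learn 1.4+, where it is still an opt-in feature), the original request works without any patching; a minimal sketch:

```python
import numpy as np
import sklearn
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

sklearn.set_config(enable_metadata_routing=True)

X = np.array([[0., 1.], [1., 0.], [1., 1.], [0., 0.]])
Y = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])  # multilabel indicator
w = np.array([1.0, 0.5, 2.0, 1.0])

# The inner estimator explicitly requests sample_weight in fit ...
inner = SGDClassifier(loss="hinge").set_fit_request(sample_weight=True)

# ... and OneVsRestClassifier routes it through to every binary fit.
clf = OneVsRestClassifier(inner).fit(X, Y, sample_weight=w)
```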
Description

The `fit` method of `OneVsRestClassifier` and other meta-estimators (e.g. `OneVsOneClassifier`) only accepts `X` and `y` as parameters. Underlying estimators like `SGDClassifier` accept more optional keyword args to this function, which are essential for some tasks (e.g. the `sample_weight` parameter is the only way to add weights to training samples in multi-label classification problems).

Steps/Code to Reproduce

Here's how I solve a multi-label classification task with a linear SVM; see also this related question on stackoverflow.
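A minimal sketch of the failing call, with toy data standing in for the real task:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

X = np.array([[0., 1.], [1., 0.], [1., 1.], [0., 0.]])
Y = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])  # multilabel indicator
w = np.array([1.0, 0.5, 2.0, 1.0])

clf = OneVsRestClassifier(SGDClassifier(loss="hinge"))
clf.fit(X, Y)                   # fine, but the samples cannot be weighted
clf.fit(X, Y, sample_weight=w)  # TypeError: fit() got an unexpected keyword argument
```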
Expected Results

For regular (single-label) classification tasks, I can pass the `sample_weight` kwarg directly to `SGDClassifier.fit`, but with `OneVsRestClassifier` this is not possible.

Actual Results
I cannot add weights to my training samples when `OneVsRestClassifier` (or a similar meta-estimator) is used.

Feature Request
Please let the `fit` method of `OneVsRestClassifier` (and similar meta-estimators) accept arbitrary kwargs and pass them on to the `fit` method of the wrapped estimator. The same may be useful for `partial_fit` and `score`, though I'm not sure about that.