SGDClassifier -- class_weights & sample_weights #3928

Closed
trevorstephens opened this issue Dec 2, 2014 · 6 comments
@trevorstephens
Contributor

Easy one first: there is an unused class_weight parameter in the fit method signature. The class_weight that actually takes effect flows in through the constructor:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/stochastic_gradient.py#L527

Just to prove it:

from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_classification
from sklearn.utils import compute_class_weight
import numpy as np

X, y = make_classification(n_features=5, weights=[0.7, 0.3],
                           n_clusters_per_class=1, random_state=415)

# Baseline
clf = SGDClassifier()
clf.fit(X, y)
print(clf.coef_)
# [[ 2.13434174  1.8734288   2.12685039  5.08116123  2.89369872]]

# With the unused fit class_weight parameter
clf = SGDClassifier()
clf.fit(X, y, class_weight='auto')
print(clf.coef_)
# [[ 2.13434174  1.8734288   2.12685039  5.08116123  2.89369872]]

Now weighting the samples in different (equivalent) ways:

# With auto-weights
clf = SGDClassifier(class_weight='auto')
clf.fit(X, y)
print(clf.coef_)
# [[ 10.10838607  -4.29529238  13.14026606  -5.99728163   7.65887541]]

# Expand the per-class 'auto' weights into one weight per sample
weights = compute_class_weight('auto', clf.classes_, y)
weights = dict(zip(clf.classes_, weights))
mapper = np.vectorize(lambda c: weights[c])
weights = mapper(y)

# With manual auto-weights
clf = SGDClassifier()
clf.fit(X, y, sample_weight=weights)
print(clf.coef_)
# [[ 10.10838607  -4.29529238  13.14026606  -5.99728163   7.65887541]]

# With manual auto-weights & the unused fit class_weight parameter
clf = SGDClassifier()
clf.fit(X, y, sample_weight=weights, class_weight='auto')
print(clf.coef_)
# [[ 10.10838607  -4.29529238  13.14026606  -5.99728163   7.65887541]]

All fine so far, but if you pass both class_weight in the constructor and sample_weight in fit, the two sets of weights appear to be multiplied together.

# With manual auto-weights, squared
clf = SGDClassifier()
clf.fit(X, y, sample_weight=weights**2)
print(clf.coef_)
# [[  3.22495438  14.11510502   0.58504094   6.38631993   9.55338404]]

# With auto-weights and manual auto-weights: multiplicative
clf = SGDClassifier(class_weight='auto')
clf.fit(X, y, sample_weight=weights)
print(clf.coef_)
# [[  3.22495438  14.11510502   0.58504094   6.38631993   9.55338404]]

Whether this is desirable is one question, but it does not appear to be documented anywhere: the class_weight and sample_weight docstrings do not refer to one another. I feel like perhaps a warning or error should be raised, or the interaction should at least be mentioned in the docstrings.
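
To make the multiplicative interaction concrete, here is a minimal plain-Python sketch of the behaviour observed above. It is not scikit-learn's implementation: effective_weights is a hypothetical helper, and the class weights are made-up numbers, not the output of the 'auto' heuristic. It only shows why "class_weight in the constructor plus expanded weights in fit" matches "squared weights in fit".

```python
# Illustrative only: a plain-Python model of how the two weightings combine.
# effective_weights() is a hypothetical helper, not a scikit-learn function.

def effective_weights(y, class_weight, sample_weight):
    """Return one weight per sample: class_weight[label] * sample_weight."""
    return [class_weight[label] * w for label, w in zip(y, sample_weight)]

y = [0, 0, 0, 1]                   # toy labels with a 3:1 imbalance
class_weight = {0: 0.5, 1: 1.5}    # made-up class weights for the example

# Expand the class weights to one weight per sample, as the mapper above does.
per_sample = [class_weight[label] for label in y]

# Case A: class weights AND the expanded per-sample weights are both applied.
both = effective_weights(y, class_weight, per_sample)

# Case B: no class weighting, but the per-sample weights are squared.
uniform = {0: 1.0, 1: 1.0}
squared = effective_weights(y, uniform, [w ** 2 for w in per_sample])

print(both == squared)
```

Under this model the two cases give identical per-sample weights, which is consistent with the identical coef_ values in the last two runs.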

amueller added the Bug label Dec 2, 2014
@amueller
Member

amueller commented Dec 2, 2014

I think multiplicative weights are fine, but should be documented.
The unused parameter is a bug. Should we just remove it or raise a warning for a release?

@agramfort
Member

@dsullivan7 any insight?

@dsullivan7
Contributor

It is probably a good idea to raise a deprecation warning for a release and then remove the unused parameter after that. Just so I understand: the use case for sample_weight is that the user wants modified weights for particular samples in the data, not just for a particular class? If that's the case (and I suspect it is), then I think it makes sense to keep the weights multiplicative and add documentation.

I can go ahead and add the warning.
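
As a rough sketch of what such a warning could look like, using only the standard warnings module: ToyClassifier below is a hypothetical stand-in, not SGDClassifier, and the message wording is an assumption rather than the text of any actual patch.

```python
import warnings


class ToyClassifier(object):
    """Hypothetical estimator used only to illustrate the warning pattern."""

    def __init__(self, class_weight=None):
        self.class_weight = class_weight

    def fit(self, X, y, sample_weight=None, class_weight=None):
        # The fit-time argument is accepted but ignored, so tell the caller.
        if class_weight is not None:
            warnings.warn(
                "class_weight passed to fit is ignored; "
                "set it in the constructor instead.",
                DeprecationWarning,
            )
        # ... actual fitting would happen here, using self.class_weight ...
        return self


# Record warnings so the example shows that exactly one is raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clf = ToyClassifier(class_weight={0: 1.0, 1: 2.0})
    clf.fit([[0.0], [1.0]], [0, 1], class_weight="auto")

print(len(caught), caught[0].category.__name__)
```

The constructor value is left untouched, so existing behaviour does not change while callers are told that the fit argument has no effect.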

@dsullivan7
Contributor

Actually, I'm looking at it again. What do you think about overriding the class_weight passed into the constructor if class_weight is specified in the fit method?
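
A minimal sketch of the precedence rule being floated here, assuming "specified" means "not None". resolve_class_weight is a hypothetical helper written for illustration; it does not describe what scikit-learn does.

```python
def resolve_class_weight(constructor_value, fit_value=None):
    """Hypothetical helper: the fit-time value wins whenever it is given."""
    return constructor_value if fit_value is None else fit_value


from_constructor = resolve_class_weight({0: 1.0, 1: 2.0})
overridden = resolve_class_weight({0: 1.0, 1: 2.0}, "auto")

print(from_constructor)
print(overridden)
```

One consequence of this design is that the weighting actually used can differ from what the estimator's constructor parameters report.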

@dsullivan7
Contributor

Take a look at this PR. It adds support for passing class_weight into the fit method. I can make it a warning instead though if that's what we want to do.

@amueller
Member

amueller commented Dec 4, 2014

We like to avoid fit parameters as much as possible.
