RidgeCV and Ridge produce different results when fitted with sample_weight #5364
I get
on master. At first I thought it was related to #4781, but maybe it isn't.
ping @eickenberg?
FYI I am using 0.16.1. Could this be a dup of #4490?
I will take a look at this this afternoon.
Looking into this. If you pass in cv=N (so that it does standard cross-validation), it works. The issue is down in the _RidgeGCV class, and only shows up with weights. Simplified code to show issue:

import numpy as np
from sklearn.linear_model import RidgeCV, Ridge
from sklearn.datasets import load_boston
from sklearn.preprocessing import scale

boston = scale(load_boston().data)
target = load_boston().target
alphas = [.1]

# Fit once with uniform weights and once with non-uniform weights
for desc, weights in (('UNWEIGHTED', np.linspace(1, 1, len(target))),
                      ('WEIGHTED', np.linspace(1, 2, len(target)))):
    print('\n\n==== {} ====='.format(desc))
    print("---With CV---")
    ridge_cv = (RidgeCV(alphas=alphas, gcv_mode='eigen')
                .fit(boston, target, sample_weight=weights))
    print("alpha:", ridge_cv.alpha_)
    print("coef:", ridge_cv.coef_[:5])
    print("---Without CV---")
    # Refit a plain Ridge at the alpha selected by RidgeCV
    ridge = (Ridge(ridge_cv.alpha_)
             .fit(boston, target, sample_weight=weights))
    print("alpha:", ridge.alpha)
    print("coef:", ridge.coef_[:5])
    diff = np.mean(np.abs(ridge_cv.predict(boston) - ridge.predict(boston)))
    print("\nMEAN DIFF: ", diff)

Output
sample_weights for RidgeGCV has been broken since the beginning.
#4490 fixes this:
There is one difference between Ridge and RidgeCV: cross-validation. Plain Ridge performs no cross-validation, whereas RidgeCV performs leave-one-out cross-validation when cv=None (None is the default). Maybe this is why they produce different results.
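
For reference on the leave-one-out step: for ridge without an intercept, the leave-one-out residuals can be obtained from a single fit through the hat matrix, so cross-validation only scores each candidate alpha. The following is a minimal NumPy sketch of that identity, not scikit-learn's _RidgeGCV code; the helper names `ridge_coef`, `loo_residuals_fast`, `loo_residuals_brute` and the random data are made up for illustration.

```python
import numpy as np


def ridge_coef(X, y, alpha):
    """Closed-form ridge solution without intercept (illustrative helper)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def loo_residuals_fast(X, y, alpha):
    """Leave-one-out residuals from one fit: e_i / (1 - H_ii)."""
    n_features = X.shape[1]
    # Hat matrix H = X (X^T X + alpha I)^{-1} X^T
    H = X @ np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T)
    return (y - H @ y) / (1.0 - np.diag(H))


def loo_residuals_brute(X, y, alpha):
    """Leave-one-out residuals by refitting n times (reference)."""
    n_samples = X.shape[0]
    out = np.empty(n_samples)
    for i in range(n_samples):
        mask = np.arange(n_samples) != i
        coef = ridge_coef(X[mask], y[mask], alpha)
        out[i] = y[i] - X[i] @ coef
    return out


# Synthetic data, assumed purely for the demonstration
rng = np.random.RandomState(0)
X = rng.randn(30, 4)
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.randn(30)

fast = loo_residuals_fast(X, y, alpha=0.1)
brute = loo_residuals_brute(X, y, alpha=0.1)
print(np.allclose(fast, brute))
```

Under these assumptions, a single-element `alphas` list leaves nothing to select, so one would expect the refit coefficients to match Ridge at that alpha, with or without weights.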
gives:
If sample_weight is None, all three models give the same coefficients.
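
As a point of comparison, here is a minimal NumPy sketch of what a weighted ridge fit is usually taken to mean: minimising the weighted squared error plus the penalty, which is the same as rescaling each row by the square root of its weight. It assumes no intercept and is not scikit-learn's implementation; `weighted_ridge` and the random data are hypothetical.

```python
import numpy as np


def weighted_ridge(X, y, alpha, sample_weight=None):
    """Minimise sum_i w_i * (y_i - x_i . b)**2 + alpha * ||b||**2 (no intercept)."""
    if sample_weight is None:
        sample_weight = np.ones(X.shape[0])
    # Rescale rows by sqrt(w_i), then solve an ordinary ridge problem
    sw = np.sqrt(sample_weight)
    Xw = X * sw[:, None]
    yw = y * sw
    return np.linalg.solve(Xw.T @ Xw + alpha * np.eye(X.shape[1]), Xw.T @ yw)


# Synthetic data, assumed purely for the demonstration
rng = np.random.RandomState(0)
X = rng.randn(30, 4)
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.randn(30)
weights = np.linspace(1, 2, len(y))

coef_none = weighted_ridge(X, y, 0.1)
coef_ones = weighted_ridge(X, y, 0.1, np.ones(len(y)))
coef_weighted = weighted_ridge(X, y, 0.1, weights)

# Same weighted problem written directly as weighted normal equations
W = np.diag(weights)
coef_direct = np.linalg.solve(X.T @ W @ X + 0.1 * np.eye(X.shape[1]), X.T @ W @ y)

print(np.allclose(coef_none, coef_ones))
print(np.allclose(coef_weighted, coef_direct))
```

Passing no weights and passing all-ones weights give the same coefficients here, which is consistent with the models only disagreeing once non-uniform weights are supplied.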