RidgeCV and Ridge produce different results when fitted with sample_weight #5364

mtrbean · 2015-10-07T22:09:35Z

from __future__ import print_function  # keeps the script runnable on Python 2 and 3

import numpy as np
from sklearn.linear_model import RidgeCV, Ridge
from sklearn.datasets import load_boston
from sklearn.preprocessing import scale

boston = scale(load_boston().data)
target = load_boston().target
alphas = np.linspace(0.1, 200)
# Sample weights decay from 1 down to 0.01 across the dataset.
weight = np.logspace(0, -2, len(target))

print("RidgeCV(eigen)")
fit0 = RidgeCV(alphas=alphas, store_cv_values=True, gcv_mode='eigen').fit(boston, target, sample_weight=weight)
print("alpha:", fit0.alpha_)
print("cv:", fit0.cv_values_[0, 0:5])
print("coef:", fit0.coef_[:5])

print("RidgeCV(svd)")
fit1 = RidgeCV(alphas=alphas, store_cv_values=True, gcv_mode='svd').fit(boston, target, sample_weight=weight)
print("alpha:", fit1.alpha_)
print("cv:", fit1.cv_values_[0, 0:5])
print("coef:", fit1.coef_[:5])

# Refit a plain Ridge with the alpha that RidgeCV selected.
print("Ridge")
fit2 = Ridge(fit0.alpha_).fit(boston, target, sample_weight=weight)
print("coef:", fit2.coef_[:5])

gives:

RidgeCV(eigen)
alpha: 167.363265306
cv: [ 1.63311277  1.58236816  1.53267447  1.48401844  1.43638695]
coef: [-0.90546722  1.0438252   0.09577594  0.68459321 -2.00971137]

RidgeCV(svd)
alpha: 167.363265306
cv: [ 1.63311277  1.58236816  1.53267447  1.48401844  1.43638695]
coef: [-0.90546722  1.0438252   0.09577594  0.68459321 -2.00971137]

Ridge
coef: [-0.11587241  0.37139676 -0.35468307  0.26534304 -0.29640185]

If sample_weight is None, all three models give the same coefficients.

amueller · 2015-10-12T21:59:46Z

I get

RidgeCV(eigen)
alpha: 200.0
cv: [ 78.16301385  78.56061227  78.95707779  79.35240503  79.7465887 ]
coef: [-0.90268438  1.03901964  0.0881258   0.68583855 -2.00008704]
RidgeCV(svd)
alpha: 200.0
cv: [ 78.16301385  78.56061227  78.95707779  79.35240503  79.7465887 ]
coef: [-0.90268438  1.03901964  0.0881258   0.68583855 -2.00008704]
Ridge
coef: [-0.10848951  0.36202508 -0.34585975  0.23836357 -0.28916756]

on master.
I'm not sure why my numbers differ from yours, but the mismatch between RidgeCV and Ridge is still there, which is not good.

At first I thought it was related to #4781, but maybe it isn't.

amueller · 2015-10-12T22:01:09Z

ping @eickenberg?

mtrbean · 2015-10-13T00:02:17Z

FYI, I am using 0.16.1. Could this be a duplicate of #4490?

eickenberg · 2015-10-19T11:12:50Z

I will take a look at this this afternoon.

justinvf-zz · 2015-11-16T21:18:08Z

Looking into this. If you pass in cv=N (so that it does standard cross-validation), it works. The issue is down in the _RidgeGCV class, and it only shows up with weights.

Simplified code to show issue:

import numpy as np
from sklearn.linear_model import RidgeCV, Ridge
from sklearn.datasets import load_boston
from sklearn.preprocessing import scale

boston = scale(load_boston().data)
target = load_boston().target
alphas = [.1]

for desc, weights in (('UNWEIGHTED', np.linspace(1, 1, len(target))),
                      ('WEIGHTED', np.linspace(1, 2, len(target)))):
    print('\n\n==== {} ====='.format(desc))
    print("---With CV---")
    ridge_cv = (RidgeCV(alphas=alphas, gcv_mode='eigen')
                .fit(boston, target, sample_weight=weights))
    print("alpha:", ridge_cv.alpha_)
    print("coef:", ridge_cv.coef_[:5])

    print("---Without CV---")
    ridge = (Ridge(ridge_cv.alpha_)
             .fit(boston, target, sample_weight=weights))
    print("alpha:", ridge.alpha)
    print("coef:", ridge.coef_[:5])

    diff = np.mean(np.abs(ridge_cv.predict(boston) - ridge.predict(boston)))
    print("\nMEAN DIFF: ", diff)

Output

==== UNWEIGHTED =====
---With CV---
nonezo
alpha: 0.1
coef: [-0.91956882  1.07943945  0.14054342  0.6825521  -2.05727602]
---Without CV---
alpha: 0.1
coef: [-0.91956882  1.07943945  0.14054342  0.6825521  -2.05727602]

MEAN DIFF:  6.41791407019e-12


==== WEIGHTED =====
---With CV---
nonezo
alpha: 0.1
coef: [-0.91863371  1.07728128  0.13826842  0.68280656 -2.05499068]
---Without CV---
alpha: 0.1
coef: [-0.91956882  1.07943945  0.14054342  0.6825521  -2.05727602]
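
The cv=N observation above can be tried with a minimal sketch like the following. It is not the script used in this thread: the data come from `make_regression` as a synthetic stand-in for the Boston set, and the choice of `cv=5` and the variable names are assumptions made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, RidgeCV

# Synthetic stand-in data (an assumption, not the Boston data used above).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
weights = np.linspace(1, 2, len(y))

# With an integer cv, RidgeCV runs ordinary K-fold cross-validation
# instead of going through the _RidgeGCV code path.
ridge_cv = RidgeCV(alphas=[0.1], cv=5).fit(X, y, sample_weight=weights)

# Plain Ridge with the same alpha and the same sample weights.
ridge = Ridge(alpha=ridge_cv.alpha_).fit(X, y, sample_weight=weights)

max_coef_diff = np.max(np.abs(ridge_cv.coef_ - ridge.coef_))
print("alpha:", ridge_cv.alpha_)
print("max coef diff:", max_coef_diff)
```

With only one candidate alpha, the cross-validation has nothing to choose between, so any difference in the coefficients would have to come from how the weights are applied in the final fit.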

eickenberg · 2015-11-16T21:29:25Z

sample_weight for RidgeGCV has been broken since the beginning. The weights end up weighting the eigenspaces of the Gram matrix. I made a PR to fix it, but I don't have access to GitHub right now (I'm on my phone) to find the number.
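
For contrast with weighting eigenspaces, here is a minimal NumPy sketch of what per-sample weighting means in ridge regression. It assumes the standard weighted objective, sum_i w_i (y_i - x_i·b)^2 + alpha·||b||^2, with no intercept; it is not the code from the PR mentioned above, and all names and data are made up for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features = 50, 4
X = rng.randn(n_samples, n_features)
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.randn(n_samples)
w = np.linspace(1, 2, n_samples)  # one weight per sample
alpha = 0.1
eye = np.eye(n_features)

# Weighted normal equations: (X^T W X + alpha I) b = X^T W y
W = np.diag(w)
coef_direct = np.linalg.solve(X.T @ W @ X + alpha * eye, X.T @ W @ y)

# Equivalent formulation: scale each row of X and y by sqrt(w_i),
# then solve an ordinary (unweighted) ridge problem.
sqrt_w = np.sqrt(w)
Xs, ys = X * sqrt_w[:, None], y * sqrt_w
coef_rescaled = np.linalg.solve(Xs.T @ Xs + alpha * eye, Xs.T @ ys)

# Ignoring the weights gives a different solution.
coef_unweighted = np.linalg.solve(X.T @ X + alpha * eye, X.T @ y)

print("direct vs rescaled:", np.max(np.abs(coef_direct - coef_rescaled)))
print("direct vs unweighted:", np.max(np.abs(coef_direct - coef_unweighted)))
```

The row-rescaling form is handy because any unweighted ridge solver can then be reused, as long as the weights are applied to the rows (samples) before the Gram matrix is formed.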


justinvf-zz · 2015-11-16T21:39:46Z

#4490 fixes this:

==== UNWEIGHTED =====
---With CV---
alpha: 0.1
coef: [-0.91956882  1.07943945  0.14054342  0.6825521  -2.05727602]
---Without CV---
alpha: 0.1
coef: [-0.91956882  1.07943945  0.14054342  0.6825521  -2.05727602]

MEAN DIFF:  6.41791407019e-12


==== WEIGHTED =====
---With CV---
alpha: 0.1
coef: [-0.9281295   1.17051021  0.08689879  0.77151052 -2.31879632]
---Without CV---
alpha: 0.1
coef: [-0.9281295   1.17051021  0.08689879  0.77151052 -2.31879632]

MEAN DIFF:  1.21976561482e-11

tejamaripuri · 2017-04-23T20:55:29Z

There is one difference between Ridge and RidgeCV: cross-validation. Plain Ridge doesn't perform cross-validation, whereas RidgeCV performs leave-one-out cross-validation even if you pass cv=None (None is the default). Maybe this is why they produce different results.
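
As far as I understand it, the leave-one-out step is only used to score each candidate alpha; the final coefficients still come from a fit on all the data, which is consistent with the matching unweighted coefficients shown earlier in the thread. The sketch below illustrates the leave-one-out idea for unweighted ridge without an intercept, using the standard hat-matrix identity. It is a NumPy illustration under those assumptions, not the _RidgeGCV implementation, and `ridge_coef` is a hypothetical helper.

```python
import numpy as np

rng = np.random.RandomState(0)
n, p = 30, 3
X = rng.randn(n, p)
y = X @ np.array([1.0, -1.0, 2.0]) + 0.5 * rng.randn(n)
alpha = 1.0


def ridge_coef(X, y, alpha):
    """Closed-form ridge solution without intercept (hypothetical helper)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


# Brute force: refit n times, each time leaving one sample out.
loo_brute = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    coef_i = ridge_coef(X[keep], y[keep], alpha)
    loo_brute[i] = y[i] - X[i] @ coef_i

# Shortcut: a single fit on all data, then divide each residual by 1 - H_ii,
# where H = X (X^T X + alpha I)^{-1} X^T is the ridge hat matrix.
H = X @ np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T)
residuals = y - H @ y
loo_fast = residuals / (1.0 - np.diag(H))

print("max abs difference:", np.max(np.abs(loo_brute - loo_fast)))
```

Both routes give the same leave-one-out errors, so the shortcut changes how alpha is scored, not which coefficients a given alpha produces.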

amueller added the Bug label Oct 12, 2015

amueller added this to the 0.17 milestone Oct 12, 2015

amueller modified the milestones: 0.17, 0.18 Nov 2, 2015

amueller modified the milestones: 0.18, 0.19 Sep 22, 2016

jeromedockes mentioned this issue May 2, 2019

[MRG] handle sparse x and intercept in _RidgeGCV #13350

Merged

thomasjpfan closed this as completed in #13350 May 8, 2019
