RidgeCV cv_values_ are for preprocessed data: centered and scaled by sample weights. #13998

Closed
@jeromedockes

When store_cv_values=True, RidgeCV stores in its cv_values_ attribute either
the leave-one-out squared errors (when scoring=None) or the leave-one-out
predictions (when scoring is provided by the user).

However, when scoring is not None, the stored predictions are those for the
preprocessed data, i.e. with the (weighted) mean of y removed and then
multiplied by the square roots of the sample weights:

import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.datasets import make_regression

x, y = make_regression(n_samples=6, n_features=2, random_state=0)
squared_error = RidgeCV(
    store_cv_values=True, alphas=[10.]).fit(x, y).cv_values_.ravel()
custom_scoring = RidgeCV(
    store_cv_values=True, scoring='neg_mean_squared_error',
    alphas=[10.]).fit(x, y)
# to get the actual predictions we need to add the y mean
custom = (y - (custom_scoring.cv_values_.ravel() + y.mean()))**2
assert np.allclose(squared_error, custom)

sw = np.arange(6) + 1
squared_error = RidgeCV(store_cv_values=True, alphas=[10.]).fit(
    x, y, sample_weight=sw).cv_values_.ravel()
custom_scoring = RidgeCV(
    store_cv_values=True, scoring='neg_mean_squared_error',
    alphas=[10.]).fit(x, y, sample_weight=sw)
# to get the actual predictions we need to divide by the square roots of the
# sample weights and add the weighted y mean
custom = sw * (y
               - (custom_scoring.cv_values_.ravel() / np.sqrt(sw)
                  + np.average(y, weights=sw)))**2
assert np.allclose(squared_error, custom)

I think it would be easier for users to get the predictions directly in the
original space, without having to post-process cv_values_.

Should we rescale the cv values and add the intercept during fit?
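
For reference, the post-processing used in the two snippets above can be written as a small NumPy-only helper. This is only a sketch of the workaround, assuming the behaviour described in this issue (predictions stored centered by the weighted mean of y and multiplied by the square roots of the sample weights) and a 1-D y; the function name is hypothetical and not part of scikit-learn.

```python
import numpy as np


def loo_predictions_in_original_space(cv_values, y, sample_weight=None):
    """Hypothetical helper: undo the preprocessing described in this issue.

    cv_values : leave-one-out predictions in the preprocessed space,
        shape (n_samples,) or (n_samples, n_alphas).
    y : 1-D array of targets used for fitting, shape (n_samples,).
    sample_weight : optional 1-D array of weights, shape (n_samples,).
    """
    cv_values = np.asarray(cv_values, dtype=float)
    y = np.asarray(y, dtype=float)
    if sample_weight is None:
        # Uniform weights: the weighted mean reduces to the plain mean.
        sample_weight = np.ones_like(y)
    sample_weight = np.asarray(sample_weight, dtype=float)

    sqrt_sw = np.sqrt(sample_weight)
    y_offset = np.average(y, weights=sample_weight)
    if cv_values.ndim == 2:
        # One column per alpha: broadcast the per-sample scaling over columns.
        sqrt_sw = sqrt_sw[:, np.newaxis]
    # Undo the rescaling first, then add back the (weighted) mean of y.
    return cv_values / sqrt_sw + y_offset
```

With the fitted custom_scoring estimator from the weighted example, the call would be loo_predictions_in_original_space(custom_scoring.cv_values_.ravel(), y, sw). If the rescaling and intercept were handled during fit, this step would no longer be needed.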
