When `store_cv_values=True`, `RidgeCV` stores the leave-one-out squared errors (when `scoring=None`) or the leave-one-out predictions (when `scoring` is provided by the user) in its `cv_values_` attribute. However, when `scoring` is not `None`, it stores the predictions for the preprocessed data, i.e. rescaled by the square roots of the sample weights and with the mean of `y` removed:
```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.datasets import make_regression

x, y = make_regression(n_samples=6, n_features=2, random_state=0)

# without sample weights: the stored predictions match the squared errors
# once we add back the mean of y
squared_error = RidgeCV(
    store_cv_values=True, alphas=[10.]).fit(x, y).cv_values_.ravel()
custom_scoring = RidgeCV(
    store_cv_values=True, scoring='neg_mean_squared_error',
    alphas=[10.]).fit(x, y)
# to get the actual predictions we need to add the y mean
custom = (y - (custom_scoring.cv_values_.ravel() + y.mean()))**2
assert np.allclose(squared_error, custom)

# with sample weights: the stored predictions additionally need rescaling
sw = np.arange(6) + 1
squared_error = RidgeCV(store_cv_values=True, alphas=[10.]).fit(
    x, y, sample_weight=sw).cv_values_.ravel()
custom_scoring = RidgeCV(
    store_cv_values=True, scoring='neg_mean_squared_error',
    alphas=[10.]).fit(x, y, sample_weight=sw)
# to get the actual predictions we need to rescale by the inverse square
# root of the sample weights and add the (weighted) y mean
custom = sw * (y
               - (custom_scoring.cv_values_.ravel() / np.sqrt(sw)
                  + np.average(y, weights=sw)))**2
assert np.allclose(squared_error, custom)
```
I think it would be easier for a user to get the predictions directly in the original space, without having to do this post-processing of `cv_values_`. Should we rescale the cv values and add the intercept during `fit`?
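
For reference, a minimal sketch of the post-processing that `fit` could apply before storing `cv_values_`. The helper name `postprocess_cv_values` is hypothetical, not part of scikit-learn, and it assumes the stored array has shape `(n_samples, n_alphas)`:

```python
import numpy as np

def postprocess_cv_values(cv_values, y, sample_weight=None):
    # hypothetical helper, not scikit-learn API: map leave-one-out
    # predictions from the preprocessed space back to the original y space
    if sample_weight is None:
        return cv_values + y.mean()
    sw = np.asarray(sample_weight, dtype=float)
    # undo the square-root sample-weight rescaling (broadcasting over the
    # alphas axis) and add back the weighted mean of y
    return cv_values / np.sqrt(sw)[:, None] + np.average(y, weights=sw)
```

Under that assumption, users could compare `cv_values_` to `y` directly, without any of the manual rescaling shown above.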