
ENH add store_cv_models option to ElasticNetCV #28726 #31545


Open · wants to merge 1 commit into main

Conversation

henriquessss

Reference Issues/PRs

#28726

What does this implement/fix? Explain your changes.

This PR introduces a new optional parameter `store_cv_models` to `ElasticNetCV`.
When `store_cv_models=True`, the fitted estimator retains all models trained during cross-validation, not just the best one. This gives users access to:

  • Coefficients (`cv_coefs_`)
  • Intercepts (`cv_intercepts_`)
  • Mean squared errors (`cv_mse_`)

...for every combination of fold, l1_ratio, and alpha (see the sketch below).
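
For illustration, a minimal sketch of the proposed usage. Note that `store_cv_models` and the `cv_*_` attributes exist only in this PR (they are not in any released scikit-learn version); the shapes follow the test added here:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

X, y = make_regression(n_samples=100, n_features=10, random_state=0)

# `store_cv_models=True` is the option proposed in this PR.
clf = ElasticNetCV(
    l1_ratio=[0.1, 0.5, 0.9],
    alphas=[0.01, 0.1, 1.0],
    cv=5,
    store_cv_models=True,
)
clf.fit(X, y)

# Proposed attribute shapes, per the test added in this PR:
# cv_coefs_      -> (n_folds, n_l1_ratio, n_alphas, n_targets, n_features)
# cv_intercepts_ -> (n_folds, n_l1_ratio, n_alphas, n_targets)
# cv_mse_        -> (n_folds, n_l1_ratio, n_alphas)
print(clf.cv_coefs_.shape)  # (5, 3, 3, 1, 10)
print(clf.cv_mse_.shape)    # (5, 3, 3)
```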

This is useful for:

  • Analyzing how model weights evolve across folds
  • Creating advanced visualizations (e.g., regularization paths)
  • Performing custom diagnostics and validation studies

Default behavior remains unchanged (store_cv_models=False), preserving backward compatibility and avoiding unnecessary memory usage for most users.

Any other comments?

This addition offers deeper access to the training process for power users without affecting default performance.


❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


ruff check

ruff detected issues. Please run ruff check --fix --output-format=full locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


sklearn/linear_model/_coordinate_descent.py:1824:89: E501 Line too long (105 > 88)
     |
1822 |         n_targets = y.shape[1] if y.ndim == 2 else 1
1823 |         if store_cv:
1824 |             cv_coefs = np.empty((n_folds, n_l1_ratio, n_alphas, n_targets, n_features), dtype=np.float64)
     |                                                                                         ^^^^^^^^^^^^^^^^^ E501
1825 |             cv_intercepts = np.empty((n_folds, n_l1_ratio, n_alphas, n_targets), dtype=np.float64)
1826 |             cv_alphas = np.empty((n_folds, n_l1_ratio, n_alphas), dtype=np.float64)
     |

sklearn/linear_model/_coordinate_descent.py:1825:89: E501 Line too long (98 > 88)
     |
1823 |         if store_cv:
1824 |             cv_coefs = np.empty((n_folds, n_l1_ratio, n_alphas, n_targets, n_features), dtype=np.float64)
1825 |             cv_intercepts = np.empty((n_folds, n_l1_ratio, n_alphas, n_targets), dtype=np.float64)
     |                                                                                         ^^^^^^^^^^ E501
1826 |             cv_alphas = np.empty((n_folds, n_l1_ratio, n_alphas), dtype=np.float64)
1827 |             cv_mse = np.empty((n_folds, n_l1_ratio, n_alphas), dtype=np.float64)
     |

sklearn/linear_model/_coordinate_descent.py:1851:89: E501 Line too long (97 > 88)
     |
1849 |                     )
1850 |                 )
1851 |                 fold_l1_pairs.append((fold_idx, l1_idx, train, test, this_l1_ratio, this_alphas))
     |                                                                                         ^^^^^^^^^ E501
1852 |
1853 |         # If storing CV models, we need to also fit and store all model params for each fold/l1/alpha
     |

sklearn/linear_model/_coordinate_descent.py:1853:89: E501 Line too long (101 > 88)
     |
1851 |                 fold_l1_pairs.append((fold_idx, l1_idx, train, test, this_l1_ratio, this_alphas))
1852 |
1853 |         # If storing CV models, we need to also fit and store all model params for each fold/l1/alpha
     |                                                                                         ^^^^^^^^^^^^^ E501
1854 |         if store_cv:
1855 |             for idx, (fold_idx, l1_idx, train, test, this_l1_ratio, this_alphas) in enumerate(fold_l1_pairs):
     |

sklearn/linear_model/_coordinate_descent.py:1855:89: E501 Line too long (109 > 88)
     |
1853 |         # If storing CV models, we need to also fit and store all model params for each fold/l1/alpha
1854 |         if store_cv:
1855 |             for idx, (fold_idx, l1_idx, train, test, this_l1_ratio, this_alphas) in enumerate(fold_l1_pairs):
     |                                                                                         ^^^^^^^^^^^^^^^^^^^^^ E501
1856 |                 # Prepare path params
1857 |                 path_params_fold = path_params.copy()
     |

sklearn/linear_model/_coordinate_descent.py:1884:89: E501 Line too long (121 > 88)
     |
1882 |                 intercepts = np.moveaxis(intercepts, -1, 1)  # (n_targets, n_alphas)
1883 |                 # Store
1884 |                 cv_coefs[fold_idx, l1_idx, :, :, :] = np.transpose(coefs, (2, 0, 1))  # (n_alphas, n_targets, n_features)
     |                                                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E501
1885 |                 cv_intercepts[fold_idx, l1_idx, :, :] = intercepts.T  # (n_alphas, n_targets)
1886 |                 cv_alphas[fold_idx, l1_idx, :] = alphas_out
     |

sklearn/linear_model/_coordinate_descent.py:1885:89: E501 Line too long (93 > 88)
     |
1883 |                 # Store
1884 |                 cv_coefs[fold_idx, l1_idx, :, :, :] = np.transpose(coefs, (2, 0, 1))  # (n_alphas, n_targets, n_features)
1885 |                 cv_intercepts[fold_idx, l1_idx, :, :] = intercepts.T  # (n_alphas, n_targets)
     |                                                                                         ^^^^^ E501
1886 |                 cv_alphas[fold_idx, l1_idx, :] = alphas_out
1887 |                 # Compute test MSE for each alpha
     |

sklearn/linear_model/_coordinate_descent.py:1890:89: E501 Line too long (96 > 88)
     |
1888 |                 # y_pred shape: (n_samples_test, n_targets, n_alphas)
1889 |                 y_pred = np.stack([
1890 |                     safe_sparse_dot(X_test, coefs[target_idx, :, :]) + intercepts[target_idx, :]
     |                                                                                         ^^^^^^^^ E501
1891 |                     for target_idx in range(n_targets)
1892 |                 ], axis=1)
     |

sklearn/linear_model/_coordinate_descent.py:1894:89: E501 Line too long (90 > 88)
     |
1892 |                 ], axis=1)
1893 |                 if y.ndim == 1:
1894 |                     mse = np.mean((y_pred.squeeze() - y_test[:, np.newaxis]) ** 2, axis=0)
     |                                                                                         ^^ E501
1895 |                 else:
1896 |                     mse = np.mean((y_pred - y_test[:, :, np.newaxis]) ** 2, axis=(0, 1))
     |

sklearn/linear_model/_coordinate_descent.py:2416:89: E501 Line too long (108 > 88)
     |
2415 |     cv_coefs_ : ndarray, optional
2416 |         Coefficient values for all models along the regularization path, for each fold, l1_ratio, and alpha.
     |                                                                                         ^^^^^^^^^^^^^^^^^^^^ E501
2417 |         Only available if ``store_cv_models=True``.
     |

sklearn/linear_model/_coordinate_descent.py:2420:89: E501 Line too long (106 > 88)
     |
2419 |     cv_intercepts_ : ndarray, optional
2420 |         Intercept values for all models along the regularization path, for each fold, l1_ratio, and alpha.
     |                                                                                         ^^^^^^^^^^^^^^^^^^ E501
2421 |         Only available if ``store_cv_models=True``.
     |

sklearn/linear_model/_coordinate_descent.py:2424:89: E501 Line too long (94 > 88)
     |
2423 |     cv_alphas_ : ndarray, optional
2424 |         Alpha values for all models along the regularization path, for each fold and l1_ratio.
     |                                                                                         ^^^^^^ E501
2425 |         Only available if ``store_cv_models=True``.
     |

sklearn/linear_model/_coordinate_descent.py:2428:89: E501 Line too long (100 > 88)
     |
2427 |     cv_mse_ : ndarray, optional
2428 |         MSE values for all models along the regularization path, for each fold, l1_ratio, and alpha.
     |                                                                                         ^^^^^^^^^^^^ E501
2429 |         Only available if ``store_cv_models=True``.
     |

sklearn/linear_model/tests/test_coordinate_descent.py:1486:1: W293 [*] Blank line contains whitespace
     |
1484 |     )
1485 |     clf.fit(X, y)
1486 |     
     | ^^^^ W293
1487 |     # Check attributes exist
1488 |     assert hasattr(clf, "cv_coefs_")
     |
     = help: Remove whitespace from blank line

sklearn/linear_model/tests/test_coordinate_descent.py:1495:89: E501 Line too long (94 > 88)
     |
1493 |     # Check shapes
1494 |     n_targets = 1
1495 |     assert clf.cv_coefs_.shape == (n_folds, len(l1_ratio), len(alphas), n_targets, X.shape[1])
     |                                                                                         ^^^^^^ E501
1496 |     assert clf.cv_intercepts_.shape == (n_folds, len(l1_ratio), len(alphas), n_targets)
1497 |     assert clf.cv_alphas_.shape == (n_folds, len(l1_ratio), len(alphas))
     |

Found 15 errors.
[*] 1 fixable with the `--fix` option.

ruff format

ruff detected issues. Please run ruff format locally and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


--- sklearn/linear_model/_coordinate_descent.py
+++ sklearn/linear_model/_coordinate_descent.py
@@ -1821,8 +1821,12 @@
         n_features = X.shape[1]
         n_targets = y.shape[1] if y.ndim == 2 else 1
         if store_cv:
-            cv_coefs = np.empty((n_folds, n_l1_ratio, n_alphas, n_targets, n_features), dtype=np.float64)
-            cv_intercepts = np.empty((n_folds, n_l1_ratio, n_alphas, n_targets), dtype=np.float64)
+            cv_coefs = np.empty(
+                (n_folds, n_l1_ratio, n_alphas, n_targets, n_features), dtype=np.float64
+            )
+            cv_intercepts = np.empty(
+                (n_folds, n_l1_ratio, n_alphas, n_targets), dtype=np.float64
+            )
             cv_alphas = np.empty((n_folds, n_l1_ratio, n_alphas), dtype=np.float64)
             cv_mse = np.empty((n_folds, n_l1_ratio, n_alphas), dtype=np.float64)
 
@@ -1848,11 +1852,20 @@
                         dtype=X.dtype.type,
                     )
                 )
-                fold_l1_pairs.append((fold_idx, l1_idx, train, test, this_l1_ratio, this_alphas))
+                fold_l1_pairs.append(
+                    (fold_idx, l1_idx, train, test, this_l1_ratio, this_alphas)
+                )
 
         # If storing CV models, we need to also fit and store all model params for each fold/l1/alpha
         if store_cv:
-            for idx, (fold_idx, l1_idx, train, test, this_l1_ratio, this_alphas) in enumerate(fold_l1_pairs):
+            for idx, (
+                fold_idx,
+                l1_idx,
+                train,
+                test,
+                this_l1_ratio,
+                this_alphas,
+            ) in enumerate(fold_l1_pairs):
                 # Prepare path params
                 path_params_fold = path_params.copy()
                 path_params_fold["alphas"] = this_alphas
@@ -1881,17 +1894,27 @@
                 coefs = np.moveaxis(coefs, -1, 2)  # (n_targets, n_features, n_alphas)
                 intercepts = np.moveaxis(intercepts, -1, 1)  # (n_targets, n_alphas)
                 # Store
-                cv_coefs[fold_idx, l1_idx, :, :, :] = np.transpose(coefs, (2, 0, 1))  # (n_alphas, n_targets, n_features)
-                cv_intercepts[fold_idx, l1_idx, :, :] = intercepts.T  # (n_alphas, n_targets)
+                cv_coefs[fold_idx, l1_idx, :, :, :] = np.transpose(
+                    coefs, (2, 0, 1)
+                )  # (n_alphas, n_targets, n_features)
+                cv_intercepts[fold_idx, l1_idx, :, :] = (
+                    intercepts.T
+                )  # (n_alphas, n_targets)
                 cv_alphas[fold_idx, l1_idx, :] = alphas_out
                 # Compute test MSE for each alpha
                 # y_pred shape: (n_samples_test, n_targets, n_alphas)
-                y_pred = np.stack([
-                    safe_sparse_dot(X_test, coefs[target_idx, :, :]) + intercepts[target_idx, :]
-                    for target_idx in range(n_targets)
-                ], axis=1)
+                y_pred = np.stack(
+                    [
+                        safe_sparse_dot(X_test, coefs[target_idx, :, :])
+                        + intercepts[target_idx, :]
+                        for target_idx in range(n_targets)
+                    ],
+                    axis=1,
+                )
                 if y.ndim == 1:
-                    mse = np.mean((y_pred.squeeze() - y_test[:, np.newaxis]) ** 2, axis=0)
+                    mse = np.mean(
+                        (y_pred.squeeze() - y_test[:, np.newaxis]) ** 2, axis=0
+                    )
                 else:
                     mse = np.mean((y_pred - y_test[:, :, np.newaxis]) ** 2, axis=(0, 1))
                 cv_mse[fold_idx, l1_idx, :] = mse

--- sklearn/linear_model/tests/test_coordinate_descent.py
+++ sklearn/linear_model/tests/test_coordinate_descent.py
@@ -1483,7 +1483,7 @@
         random_state=0,
     )
     clf.fit(X, y)
-    
+
     # Check attributes exist
     assert hasattr(clf, "cv_coefs_")
     assert hasattr(clf, "cv_intercepts_")
@@ -1492,7 +1492,13 @@
 
     # Check shapes
     n_targets = 1
-    assert clf.cv_coefs_.shape == (n_folds, len(l1_ratio), len(alphas), n_targets, X.shape[1])
+    assert clf.cv_coefs_.shape == (
+        n_folds,
+        len(l1_ratio),
+        len(alphas),
+        n_targets,
+        X.shape[1],
+    )
     assert clf.cv_intercepts_.shape == (n_folds, len(l1_ratio), len(alphas), n_targets)
     assert clf.cv_alphas_.shape == (n_folds, len(l1_ratio), len(alphas))
     assert clf.cv_mse_.shape == (n_folds, len(l1_ratio), len(alphas))

2 files would be reformatted, 921 files already formatted

Generated for commit: 87d5235. Link to the linter CI: here

adrinjalali (Member) left a comment

I'm not really sure if this is a good idea. It's adding quite a bit of computation to store the attributes, and to me it seems the user is better off doing a normal GridSearchCV on ElasticNet instead, to have all required attributes at this point.
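
For context, a minimal sketch of that alternative, using only existing scikit-learn APIs (`GridSearchCV` records per-fold test scores in `cv_results_`, though not per-fold coefficients):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=100, n_features=10, random_state=0)

# Grid-search over ElasticNet directly instead of using ElasticNetCV.
search = GridSearchCV(
    ElasticNet(max_iter=10_000),
    param_grid={"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.1, 0.5, 0.9]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

# Per-candidate test scores on fold 0 (negated MSE).
print(search.cv_results_["split0_test_score"])
# The refitted best model and its coefficients.
print(search.best_params_, search.best_estimator_.coef_.shape)
```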

Member left a comment

The changelog and the docstring should be enough; we probably don't need to make the user guide longer here.
