
ENH add store_cv_models option to ElasticNetCV #28726 #31545


Open · wants to merge 1 commit into main

Conversation

henriquessss

Reference Issues/PRs

#28726

What does this implement/fix? Explain your changes.

This PR introduces a new optional parameter `store_cv_models` to `ElasticNetCV`.
When `store_cv_models=True`, the fitted estimator retains all models trained during cross-validation, not just the best one. This gives users access to:

  • Coefficients (`cv_coefs_`)
  • Intercepts (`cv_intercepts_`)
  • Mean squared errors (`cv_mse_`)

...for every combination of fold, l1_ratio, and alpha (see the sketch below).
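
For illustration, a minimal sketch of the proposed usage. Note that `store_cv_models` and the `cv_*_` attributes exist only in this PR (they are not in any released scikit-learn version); the shapes follow the test added here:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

X, y = make_regression(n_samples=100, n_features=10, random_state=0)

# `store_cv_models=True` is the option proposed in this PR.
clf = ElasticNetCV(
    l1_ratio=[0.1, 0.5, 0.9],
    alphas=[0.01, 0.1, 1.0],
    cv=5,
    store_cv_models=True,
)
clf.fit(X, y)

# Proposed attribute shapes, per the test added in this PR:
# cv_coefs_      -> (n_folds, n_l1_ratio, n_alphas, n_targets, n_features)
# cv_intercepts_ -> (n_folds, n_l1_ratio, n_alphas, n_targets)
# cv_mse_        -> (n_folds, n_l1_ratio, n_alphas)
print(clf.cv_coefs_.shape)  # (5, 3, 3, 1, 10)
print(clf.cv_mse_.shape)    # (5, 3, 3)
```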

This is useful for:

  • Analyzing how model weights evolve across folds
  • Creating advanced visualizations (e.g., regularization paths)
  • Performing custom diagnostics and validation studies

Default behavior remains unchanged (store_cv_models=False), preserving backward compatibility and avoiding unnecessary memory usage for most users.

Any other comments?

This addition offers deeper access to the training process for power users without affecting default performance.


❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


ruff check

ruff detected issues. Please run ruff check --fix --output-format=full locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


sklearn/linear_model/_coordinate_descent.py:1824:89: E501 Line too long (105 > 88)
     |
1822 |         n_targets = y.shape[1] if y.ndim == 2 else 1
1823 |         if store_cv:
1824 |             cv_coefs = np.empty((n_folds, n_l1_ratio, n_alphas, n_targets, n_features), dtype=np.float64)
     |                                                                                         ^^^^^^^^^^^^^^^^^ E501
1825 |             cv_intercepts = np.empty((n_folds, n_l1_ratio, n_alphas, n_targets), dtype=np.float64)
1826 |             cv_alphas = np.empty((n_folds, n_l1_ratio, n_alphas), dtype=np.float64)
     |

sklearn/linear_model/_coordinate_descent.py:1825:89: E501 Line too long (98 > 88)
     |
1823 |         if store_cv:
1824 |             cv_coefs = np.empty((n_folds, n_l1_ratio, n_alphas, n_targets, n_features), dtype=np.float64)
1825 |             cv_intercepts = np.empty((n_folds, n_l1_ratio, n_alphas, n_targets), dtype=np.float64)
     |                                                                                         ^^^^^^^^^^ E501
1826 |             cv_alphas = np.empty((n_folds, n_l1_ratio, n_alphas), dtype=np.float64)
1827 |             cv_mse = np.empty((n_folds, n_l1_ratio, n_alphas), dtype=np.float64)
     |

sklearn/linear_model/_coordinate_descent.py:1851:89: E501 Line too long (97 > 88)
     |
1849 |                     )
1850 |                 )
1851 |                 fold_l1_pairs.append((fold_idx, l1_idx, train, test, this_l1_ratio, this_alphas))
     |                                                                                         ^^^^^^^^^ E501
1852 |
1853 |         # If storing CV models, we need to also fit and store all model params for each fold/l1/alpha
     |

sklearn/linear_model/_coordinate_descent.py:1853:89: E501 Line too long (101 > 88)
     |
1851 |                 fold_l1_pairs.append((fold_idx, l1_idx, train, test, this_l1_ratio, this_alphas))
1852 |
1853 |         # If storing CV models, we need to also fit and store all model params for each fold/l1/alpha
     |                                                                                         ^^^^^^^^^^^^^ E501
1854 |         if store_cv:
1855 |             for idx, (fold_idx, l1_idx, train, test, this_l1_ratio, this_alphas) in enumerate(fold_l1_pairs):
     |

sklearn/linear_model/_coordinate_descent.py:1855:89: E501 Line too long (109 > 88)
     |
1853 |         # If storing CV models, we need to also fit and store all model params for each fold/l1/alpha
1854 |         if store_cv:
1855 |             for idx, (fold_idx, l1_idx, train, test, this_l1_ratio, this_alphas) in enumerate(fold_l1_pairs):
     |                                                                                         ^^^^^^^^^^^^^^^^^^^^^ E501
1856 |                 # Prepare path params
1857 |                 path_params_fold = path_params.copy()
     |

sklearn/linear_model/_coordinate_descent.py:1884:89: E501 Line too long (121 > 88)
     |
1882 |                 intercepts = np.moveaxis(intercepts, -1, 1)  # (n_targets, n_alphas)
1883 |                 # Store
1884 |                 cv_coefs[fold_idx, l1_idx, :, :, :] = np.transpose(coefs, (2, 0, 1))  # (n_alphas, n_targets, n_features)
     |                                                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E501
1885 |                 cv_intercepts[fold_idx, l1_idx, :, :] = intercepts.T  # (n_alphas, n_targets)
1886 |                 cv_alphas[fold_idx, l1_idx, :] = alphas_out
     |

sklearn/linear_model/_coordinate_descent.py:1885:89: E501 Line too long (93 > 88)
     |
1883 |                 # Store
1884 |                 cv_coefs[fold_idx, l1_idx, :, :, :] = np.transpose(coefs, (2, 0, 1))  # (n_alphas, n_targets, n_features)
1885 |                 cv_intercepts[fold_idx, l1_idx, :, :] = intercepts.T  # (n_alphas, n_targets)
     |                                                                                         ^^^^^ E501
1886 |                 cv_alphas[fold_idx, l1_idx, :] = alphas_out
1887 |                 # Compute test MSE for each alpha
     |

sklearn/linear_model/_coordinate_descent.py:1890:89: E501 Line too long (96 > 88)
     |
1888 |                 # y_pred shape: (n_samples_test, n_targets, n_alphas)
1889 |                 y_pred = np.stack([
1890 |                     safe_sparse_dot(X_test, coefs[target_idx, :, :]) + intercepts[target_idx, :]
     |                                                                                         ^^^^^^^^ E501
1891 |                     for target_idx in range(n_targets)
1892 |                 ], axis=1)
     |

sklearn/linear_model/_coordinate_descent.py:1894:89: E501 Line too long (90 > 88)
     |
1892 |                 ], axis=1)
1893 |                 if y.ndim == 1:
1894 |                     mse = np.mean((y_pred.squeeze() - y_test[:, np.newaxis]) ** 2, axis=0)
     |                                                                                         ^^ E501
1895 |                 else:
1896 |                     mse = np.mean((y_pred - y_test[:, :, np.newaxis]) ** 2, axis=(0, 1))
     |

sklearn/linear_model/_coordinate_descent.py:2416:89: E501 Line too long (108 > 88)
     |
2415 |     cv_coefs_ : ndarray, optional
2416 |         Coefficient values for all models along the regularization path, for each fold, l1_ratio, and alpha.
     |                                                                                         ^^^^^^^^^^^^^^^^^^^^ E501
2417 |         Only available if ``store_cv_models=True``.
     |

sklearn/linear_model/_coordinate_descent.py:2420:89: E501 Line too long (106 > 88)
     |
2419 |     cv_intercepts_ : ndarray, optional
2420 |         Intercept values for all models along the regularization path, for each fold, l1_ratio, and alpha.
     |                                                                                         ^^^^^^^^^^^^^^^^^^ E501
2421 |         Only available if ``store_cv_models=True``.
     |

sklearn/linear_model/_coordinate_descent.py:2424:89: E501 Line too long (94 > 88)
     |
2423 |     cv_alphas_ : ndarray, optional
2424 |         Alpha values for all models along the regularization path, for each fold and l1_ratio.
     |                                                                                         ^^^^^^ E501
2425 |         Only available if ``store_cv_models=True``.
     |

sklearn/linear_model/_coordinate_descent.py:2428:89: E501 Line too long (100 > 88)
     |
2427 |     cv_mse_ : ndarray, optional
2428 |         MSE values for all models along the regularization path, for each fold, l1_ratio, and alpha.
     |                                                                                         ^^^^^^^^^^^^ E501
2429 |         Only available if ``store_cv_models=True``.
     |

sklearn/linear_model/tests/test_coordinate_descent.py:1486:1: W293 [*] Blank line contains whitespace
     |
1484 |     )
1485 |     clf.fit(X, y)
1486 |     
     | ^^^^ W293
1487 |     # Check attributes exist
1488 |     assert hasattr(clf, "cv_coefs_")
     |
     = help: Remove whitespace from blank line

sklearn/linear_model/tests/test_coordinate_descent.py:1495:89: E501 Line too long (94 > 88)
     |
1493 |     # Check shapes
1494 |     n_targets = 1
1495 |     assert clf.cv_coefs_.shape == (n_folds, len(l1_ratio), len(alphas), n_targets, X.shape[1])
     |                                                                                         ^^^^^^ E501
1496 |     assert clf.cv_intercepts_.shape == (n_folds, len(l1_ratio), len(alphas), n_targets)
1497 |     assert clf.cv_alphas_.shape == (n_folds, len(l1_ratio), len(alphas))
     |

Found 15 errors.
[*] 1 fixable with the `--fix` option.

ruff format

ruff detected issues. Please run ruff format locally and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


--- sklearn/linear_model/_coordinate_descent.py
+++ sklearn/linear_model/_coordinate_descent.py
@@ -1821,8 +1821,12 @@
         n_features = X.shape[1]
         n_targets = y.shape[1] if y.ndim == 2 else 1
         if store_cv:
-            cv_coefs = np.empty((n_folds, n_l1_ratio, n_alphas, n_targets, n_features), dtype=np.float64)
-            cv_intercepts = np.empty((n_folds, n_l1_ratio, n_alphas, n_targets), dtype=np.float64)
+            cv_coefs = np.empty(
+                (n_folds, n_l1_ratio, n_alphas, n_targets, n_features), dtype=np.float64
+            )
+            cv_intercepts = np.empty(
+                (n_folds, n_l1_ratio, n_alphas, n_targets), dtype=np.float64
+            )
             cv_alphas = np.empty((n_folds, n_l1_ratio, n_alphas), dtype=np.float64)
             cv_mse = np.empty((n_folds, n_l1_ratio, n_alphas), dtype=np.float64)
 
@@ -1848,11 +1852,20 @@
                         dtype=X.dtype.type,
                     )
                 )
-                fold_l1_pairs.append((fold_idx, l1_idx, train, test, this_l1_ratio, this_alphas))
+                fold_l1_pairs.append(
+                    (fold_idx, l1_idx, train, test, this_l1_ratio, this_alphas)
+                )
 
         # If storing CV models, we need to also fit and store all model params for each fold/l1/alpha
         if store_cv:
-            for idx, (fold_idx, l1_idx, train, test, this_l1_ratio, this_alphas) in enumerate(fold_l1_pairs):
+            for idx, (
+                fold_idx,
+                l1_idx,
+                train,
+                test,
+                this_l1_ratio,
+                this_alphas,
+            ) in enumerate(fold_l1_pairs):
                 # Prepare path params
                 path_params_fold = path_params.copy()
                 path_params_fold["alphas"] = this_alphas
@@ -1881,17 +1894,27 @@
                 coefs = np.moveaxis(coefs, -1, 2)  # (n_targets, n_features, n_alphas)
                 intercepts = np.moveaxis(intercepts, -1, 1)  # (n_targets, n_alphas)
                 # Store
-                cv_coefs[fold_idx, l1_idx, :, :, :] = np.transpose(coefs, (2, 0, 1))  # (n_alphas, n_targets, n_features)
-                cv_intercepts[fold_idx, l1_idx, :, :] = intercepts.T  # (n_alphas, n_targets)
+                cv_coefs[fold_idx, l1_idx, :, :, :] = np.transpose(
+                    coefs, (2, 0, 1)
+                )  # (n_alphas, n_targets, n_features)
+                cv_intercepts[fold_idx, l1_idx, :, :] = (
+                    intercepts.T
+                )  # (n_alphas, n_targets)
                 cv_alphas[fold_idx, l1_idx, :] = alphas_out
                 # Compute test MSE for each alpha
                 # y_pred shape: (n_samples_test, n_targets, n_alphas)
-                y_pred = np.stack([
-                    safe_sparse_dot(X_test, coefs[target_idx, :, :]) + intercepts[target_idx, :]
-                    for target_idx in range(n_targets)
-                ], axis=1)
+                y_pred = np.stack(
+                    [
+                        safe_sparse_dot(X_test, coefs[target_idx, :, :])
+                        + intercepts[target_idx, :]
+                        for target_idx in range(n_targets)
+                    ],
+                    axis=1,
+                )
                 if y.ndim == 1:
-                    mse = np.mean((y_pred.squeeze() - y_test[:, np.newaxis]) ** 2, axis=0)
+                    mse = np.mean(
+                        (y_pred.squeeze() - y_test[:, np.newaxis]) ** 2, axis=0
+                    )
                 else:
                     mse = np.mean((y_pred - y_test[:, :, np.newaxis]) ** 2, axis=(0, 1))
                 cv_mse[fold_idx, l1_idx, :] = mse

--- sklearn/linear_model/tests/test_coordinate_descent.py
+++ sklearn/linear_model/tests/test_coordinate_descent.py
@@ -1483,7 +1483,7 @@
         random_state=0,
     )
     clf.fit(X, y)
-    
+
     # Check attributes exist
     assert hasattr(clf, "cv_coefs_")
     assert hasattr(clf, "cv_intercepts_")
@@ -1492,7 +1492,13 @@
 
     # Check shapes
     n_targets = 1
-    assert clf.cv_coefs_.shape == (n_folds, len(l1_ratio), len(alphas), n_targets, X.shape[1])
+    assert clf.cv_coefs_.shape == (
+        n_folds,
+        len(l1_ratio),
+        len(alphas),
+        n_targets,
+        X.shape[1],
+    )
     assert clf.cv_intercepts_.shape == (n_folds, len(l1_ratio), len(alphas), n_targets)
     assert clf.cv_alphas_.shape == (n_folds, len(l1_ratio), len(alphas))
     assert clf.cv_mse_.shape == (n_folds, len(l1_ratio), len(alphas))

2 files would be reformatted, 921 files already formatted

Generated for commit: 87d5235. Link to the linter CI: here

adrinjalali (Member) left a comment

I'm not really sure if this is a good idea. It's adding quite a bit of computation to store the attributes, and to me it seems the user is better off doing a normal GridSearchCV on ElasticNet instead, to have all required attributes at this point.
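
For context, a minimal sketch of that alternative, using only existing scikit-learn APIs (`GridSearchCV` records per-fold test scores in `cv_results_`, though not per-fold coefficients):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=100, n_features=10, random_state=0)

# Grid-search over ElasticNet directly instead of using ElasticNetCV.
search = GridSearchCV(
    ElasticNet(max_iter=10_000),
    param_grid={"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.1, 0.5, 0.9]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

# Per-candidate test scores on fold 0 (negated MSE).
print(search.cv_results_["split0_test_score"])
# The refitted best model and its coefficients.
print(search.best_params_, search.best_estimator_.coef_.shape)
```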

Member left a comment

The changelog and the docstring should be enough; we probably don't need to make the user guide longer here.
