ElasticNetCV does not handle sample weights as expected #29248

Closed
snath-xoc opened this issue Jun 13, 2024 · 4 comments

@snath-xoc
Contributor

Describe the bug

The _alpha_grid computation appears to ignore sample weights. As a result, fitting on two equivalent versions of the same data, one with weighted samples and the other with repeated samples, produces different alpha grids, and the resulting models do not match.
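To illustrate what a weight-aware grid start could look like, here is a minimal NumPy sketch of the largest useful alpha, assuming the usual elastic net objective with the squared error averaged over the total sample weight and an intercept handled by weighted centering. The function name alpha_max and its signature are hypothetical and are not scikit-learn's internal _alpha_grid code.

```python
import numpy as np


def alpha_max(X, y, sample_weight=None, l1_ratio=0.5):
    """Smallest alpha for which all coefficients are zero (illustrative sketch).

    Assumes the loss is sum_i w_i * (y_i - x_i @ coef - intercept)**2 / (2 * sum(w)),
    so the data are centered with weighted means before computing X.T @ W @ y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if sample_weight is None:
        sample_weight = np.ones(X.shape[0])
    w = np.asarray(sample_weight, dtype=float)
    w_sum = w.sum()

    # Weighted centering, equivalent to fitting an intercept.
    X_centered = X - (w @ X) / w_sum
    y_centered = y - (w @ y) / w_sum

    # Weighted correlation of each feature with the target.
    Xy = X_centered.T @ (w * y_centered)
    return np.abs(Xy).max() / (w_sum * l1_ratio)


rng = np.random.RandomState(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
weights = rng.randint(0, 5, size=20)

# Repeating rows by integer weight should give the same value as weighting.
X_rep = np.repeat(X, weights, axis=0)
y_rep = np.repeat(y, weights, axis=0)
print(alpha_max(X, y, sample_weight=weights), alpha_max(X_rep, y_rep))
```

Under these assumptions the weighted and repeated datasets share the same starting alpha, which is the property the reproducer below checks on alphas_.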

Steps/Code to Reproduce

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, ElasticNetCV

rng = np.random.RandomState(0)

X, y = make_regression(n_samples=100, n_features=5, random_state=10)

# Integer weights in [0, 5): repeating each row `weight` times should be
# equivalent to passing the weights directly to fit.
sample_weight = rng.randint(0, 5, size=X.shape[0])
X_resampled_by_weights = np.repeat(X, sample_weight, axis=0)
y_resampled_by_weights = np.repeat(y, sample_weight, axis=0)

# Plain ElasticNet: weighted and repeated fits agree.
est_weighted = ElasticNet(selection="cyclic").fit(X, y, sample_weight=sample_weight)
est_repeated = ElasticNet(selection="cyclic").fit(
    X_resampled_by_weights, y_resampled_by_weights
)

np.testing.assert_allclose(est_weighted.coef_, est_repeated.coef_)

# ElasticNetCV: the automatically generated alpha grids differ.
est_weighted = ElasticNetCV(selection="cyclic").fit(X, y, sample_weight=sample_weight)
est_repeated = ElasticNetCV(selection="cyclic").fit(
    X_resampled_by_weights, y_resampled_by_weights
)

np.testing.assert_allclose(est_weighted.alphas_, est_repeated.alphas_)

Expected Results

No error is thrown

Actual Results

Assertion on the coef_ for ElasticNet (without CV) is fine
Assertion on the ElasticNetCV alphas_ fails with the following error message

AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0

Mismatched elements: 100 / 100 (100%)
Max absolute difference: 28.39466113
Max relative difference: 0.20225973
 x: array([111.992461, 104.444544,  97.405331,  90.840538,  84.71819 ,
        79.008467,  73.683561,  68.717536,  64.086204,  59.767008,
        55.738912,  51.982296,  48.478863,  45.21155 ,  42.164443,...
 y: array([140.387122, 130.9255  , 122.10156 , 113.872323, 106.19771 ,
        99.04034 ,  92.365352,  86.140237,  80.334673,  74.920385,
        69.871002,  65.16193 ,  60.770234,  56.674524,  52.85485 ,...

Versions

System:
    python: 3.12.0 (main, Nov 17 2023, 17:04:21) [Clang 15.0.0 (clang-1500.0.40.1)]
executable: /Users/shrutinath/.pyenv/versions/3.12.0/envs/tf_2023/bin/python
   machine: macOS-14.1.1-arm64-arm-64bit

Python dependencies:
      sklearn: 1.5.dev0
          pip: 23.3.1
   setuptools: 68.2.2
        numpy: 1.26.2
        scipy: 1.11.3
       Cython: 3.0.9
       pandas: 2.1.3
   matplotlib: 3.8.1
       joblib: 1.3.2
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/shrutinath/.pyenv/versions/3.12.0/envs/tf_2023/lib/python3.12/site-packages/numpy/.dylibs/libopenblas64_.0.dylib
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: armv8

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: /opt/homebrew/Cellar/libomp/18.1.2/lib/libomp.dylib
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/shrutinath/.pyenv/versions/3.12.0/envs/tf_2023/lib/python3.12/site-packages/scipy/.dylibs/libopenblas.0.dylib
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: armv8
snath-xoc added the Bug and Needs Triage labels Jun 13, 2024
ogrisel removed the Needs Triage label Jun 13, 2024

@ogrisel
Member

ogrisel commented Jun 13, 2024

Thanks for the report @snath-xoc. Please feel free to open a PR with the reproducer as a non-regression test.
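A non-regression test along those lines might look like the following sketch. The helper resample_by_weights and the function name check_enet_cv_alphas_match are hypothetical and are not taken from the scikit-learn test suite. The check only compares alphas_, because the cross-validation splits differ between the weighted and the repeated dataset.

```python
import numpy as np


def resample_by_weights(X, y, sample_weight):
    """Repeat each row of X and y as many times as its integer weight."""
    sample_weight = np.asarray(sample_weight)
    return np.repeat(X, sample_weight, axis=0), np.repeat(y, sample_weight, axis=0)


def check_enet_cv_alphas_match():
    # Hypothetical check; scikit-learn is imported lazily so the helper
    # above stays usable without it.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import ElasticNetCV

    rng = np.random.RandomState(0)
    X, y = make_regression(n_samples=100, n_features=5, random_state=10)
    sample_weight = rng.randint(0, 5, size=X.shape[0])
    X_rep, y_rep = resample_by_weights(X, y, sample_weight)

    est_weighted = ElasticNetCV(selection="cyclic").fit(
        X, y, sample_weight=sample_weight
    )
    est_repeated = ElasticNetCV(selection="cyclic").fit(X_rep, y_rep)

    # The alpha grid is computed on the full data, so it should not depend
    # on whether weights or repeated rows are used.
    np.testing.assert_allclose(est_weighted.alphas_, est_repeated.alphas_)
```

On a scikit-learn version affected by this issue, calling check_enet_cv_alphas_match() is expected to raise the AssertionError shown in the report.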

@jeremiedbb
Member

Duplicate of #22914. It seems there were several PRs attempting to fix the issue, but they are all closed.

@ogrisel
Member

ogrisel commented Jun 17, 2024

Indeed. The code has changed a bit in main because of the removal of the normalize=True case, but it's likely that we could revive and adapt the last attempt to fix the problem.

@Tialo
Contributor

Tialo commented Sep 17, 2024

It was probably resolved by #29442.
