LinearRegression on sparse matrices is not sample weight consistent #30131

Closed
@antoinebaker

Description

Part of #16298.

Describe the bug

When X is a sparse container such as csr_array, LinearRegression does not give the same coefficients with unit sample weights as with no sample weights, and more generally fails the test_linear_regression_sample_weight_consistency checks. In that setting, the underlying solver is scipy.sparse.linalg.lsqr.
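For context, lsqr is an iterative solver, so its result depends on its stopping criteria (atol, btol, iter_lim). The sketch below is only an illustration on a small synthetic problem: it is not the data from this report and not scikit-learn's internal code. It compares lsqr at its default and at tight tolerances against a dense np.linalg.lstsq reference. Whether solver tolerance is the actual cause of the discrepancy reported here is an assumption, not something established in this report.

```python
import numpy as np
from scipy.sparse import csr_array
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)

# Small, well-conditioned synthetic problem (illustrative only).
X_dense = rng.normal(size=(200, 10))
y = X_dense @ rng.normal(size=10) + 0.1 * rng.normal(size=200)
X = csr_array(X_dense)

# Direct dense least-squares solution, used as the reference.
coef_ref = np.linalg.lstsq(X_dense, y, rcond=None)[0]

# lsqr stops once its atol/btol criteria are met or iter_lim is reached.
coef_default = lsqr(X, y)[0]
coef_tight = lsqr(X, y, atol=1e-12, btol=1e-12, iter_lim=1000)[0]

err_default = np.max(np.abs(coef_default - coef_ref))
err_tight = np.max(np.abs(coef_tight - coef_ref))
print(f"default tolerances: max abs error = {err_default:.2e}")
print(f"tight tolerances:   max abs error = {err_tight:.2e}")
```

If the two errors differ noticeably on a given problem, coefficients obtained through lsqr can only be expected to agree up to roughly the solver tolerance, which may be looser than the rtol=1e-7 used in the check.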

Steps/Code to Reproduce

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.utils._testing import assert_allclose
from sklearn.utils.fixes import csr_array

X, y = make_regression(100, 100, random_state=42)
X = csr_array(X)  # sparse input, so the lsqr solver is used
reg = LinearRegression(fit_intercept=True)

# Fit without sample weights.
reg.fit(X, y)
coef1 = reg.coef_

# Fit with unit sample weights; the coefficients should be identical.
reg.fit(X, y, sample_weight=np.ones_like(y))
coef2 = reg.coef_

assert_allclose(coef1, coef2, rtol=1e-7, atol=1e-9)

Expected Results

The assert_allclose should pass.

Actual Results

AssertionError: 
Not equal to tolerance rtol=1e-07, atol=1e-09

Mismatched elements: 100 / 100 (100%)
Max absolute difference among violations: 0.00165048
Max relative difference among violations: 0.02621317
 ACTUAL: array([-2.450778e-01,  2.917985e+01,  1.678916e+00,  7.534454e+01,
        1.241587e+01,  1.076716e+00, -4.975206e-01, -9.262295e-01,
       -1.373931e+00, -1.624112e-01, -8.644422e-01, -5.986218e-01,...
 DESIRED: array([-2.452359e-01,  2.918078e+01,  1.678681e+00,  7.534410e+01,
        1.241459e+01,  1.076624e+00, -4.962305e-01, -9.257701e-01,
       -1.373862e+00, -1.622824e-01, -8.652183e-01, -5.981715e-01,...

The test also fails for fit_intercept=False. Note that this test and other sample weight consistency checks pass if we do not wrap X in a sparse container.
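To spell out what "sample weight consistency" is taken to mean here, the following is a minimal NumPy-only sketch of two properties: unit weights should behave like no weights, and integer weights should behave like repeating rows. The helper weighted_lstsq is hypothetical and written just for this illustration (no intercept, dense direct solver); it is not how LinearRegression is implemented.

```python
import numpy as np


def weighted_lstsq(X, y, sample_weight=None):
    """Least squares without intercept; weights enter via sqrt(w) row scaling."""
    if sample_weight is not None:
        sw_sqrt = np.sqrt(sample_weight)
        X = X * sw_sqrt[:, None]
        y = y * sw_sqrt
    return np.linalg.lstsq(X, y, rcond=None)[0]


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=50)

# Property 1: unit weights behave like no weights.
coef_none = weighted_lstsq(X, y)
coef_ones = weighted_lstsq(X, y, sample_weight=np.ones_like(y))
np.testing.assert_allclose(coef_none, coef_ones, rtol=1e-7, atol=1e-9)

# Property 2: integer weights behave like repeating the corresponding rows.
w = rng.integers(1, 4, size=50)
coef_weighted = weighted_lstsq(X, y, sample_weight=w)
coef_repeated = weighted_lstsq(np.repeat(X, w, axis=0), np.repeat(y, w))
np.testing.assert_allclose(coef_weighted, coef_repeated, rtol=1e-7, atol=1e-9)
```

With a direct dense solver both properties hold to tight tolerances, which matches the observation that the checks pass for dense inputs.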

Versions

System:
    python: 3.12.5 | packaged by conda-forge | (main, Aug  8 2024, 18:32:50) [Clang 16.0.6 ]
executable: /Users/abaker/miniforge3/envs/sklearn-dev/bin/python
   machine: macOS-14.5-arm64-arm-64bit

Python dependencies:
      sklearn: 1.6.dev0
          pip: 24.2
   setuptools: 73.0.1
        numpy: 2.1.0
        scipy: 1.14.1
       Cython: 3.0.11
       pandas: 2.2.2
   matplotlib: 3.9.2
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/abaker/miniforge3/envs/sklearn-dev/lib/libopenblas.0.dylib
        version: 0.3.27
threading_layer: openmp
   architecture: VORTEX

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: /Users/abaker/miniforge3/envs/sklearn-dev/lib/libomp.dylib
        version: None

EDIT: discovered while working on #30040 for the case of dense inputs.
