LinearRegression on sparse matrices is not sample weight consistent #30131
Comments
Note that this is not the case for the following `Ridge` setup (`solver="lsqr"`, `tol=1e-12`):
>>> import numpy as np
... from sklearn.utils.fixes import csr_array
... from sklearn.datasets import make_regression
... from sklearn.linear_model import Ridge
... from sklearn.utils._testing import assert_allclose
...
... X, y = make_regression(100, 100, random_state=42)
... X = csr_array(X)
... reg = Ridge(solver="lsqr", alpha=0, fit_intercept=True, tol=1e-12)
... reg.fit(X, y)
... coef1 = reg.coef_
... reg.fit(X, y, sample_weight=np.ones_like(y))
... coef2 = reg.coef_
... assert_allclose(coef1, coef2, rtol=1e-7, atol=1e-9)
But the same check fails with a larger `tol`:
>>> reg = Ridge(solver="lsqr", alpha=0, fit_intercept=True, tol=1e-4)
... reg.fit(X, y)
... coef1 = reg.coef_
... reg.fit(X, y, sample_weight=np.ones_like(y))
... coef2 = reg.coef_
... assert_allclose(coef1, coef2, rtol=1e-7, atol=1e-9)
Traceback (most recent call last):
Cell In[9], line 13
assert_allclose(coef1, coef2, rtol=1e-7, atol=1e-9)
File ~/code/scikit-learn/sklearn/utils/_testing.py:232 in assert_allclose
np_assert_allclose(
File ~/miniforge3/envs/dev/lib/python3.12/site-packages/numpy/testing/_private/utils.py:1688 in assert_allclose
assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
File ~/miniforge3/envs/dev/lib/python3.12/contextlib.py:81 in inner
return func(*args, **kwds)
File ~/miniforge3/envs/dev/lib/python3.12/site-packages/numpy/testing/_private/utils.py:889 in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=1e-09
Mismatched elements: 100 / 100 (100%)
Max absolute difference among violations: 0.00143505
Max relative difference among violations: 0.03345462
ACTUAL: array([ 5.135931e-01, 3.037594e+01, 2.145667e+00, 7.394602e+01,
1.324545e+01, -4.438229e-02, -1.202226e+00, 7.553014e-01,
-1.661572e+00, 1.483041e-01, -3.624484e-01, -1.915631e+00,...
DESIRED: array([ 5.125267e-01, 3.037609e+01, 2.145875e+00, 7.394601e+01,
1.324581e+01, -4.334871e-02, -1.202492e+00, 7.555202e-01,
-1.661108e+00, 1.495079e-01, -3.613935e-01, -1.916570e+00,...
I think we need to pass |
I think we should expose |
Part of #16298.
Describe the bug
When using a sparse container like `csr_array` for `X`, `LinearRegression` even fails to give the same coefficients for unit or no sample weight, and more generally fails the `test_linear_regression_sample_weight_consistency` checks. In that setting, the underlying solver is `scipy.sparse.linalg.lsqr`.
Steps/Code to Reproduce
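A minimal sketch of such a check, assuming the same `make_regression` data as the `Ridge` example in the comments. The helper name `max_coef_gap` is hypothetical, `csr_array` is taken from `scipy.sparse`, and the size of the reported gap may depend on the installed scikit-learn version.

```python
import numpy as np
from scipy.sparse import csr_array
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.utils._testing import assert_allclose


def max_coef_gap(fit_intercept=True):
    """Fit LinearRegression on sparse X with no sample weight and with
    unit sample weights; return both coefficient vectors and their
    largest absolute difference. Hypothetical helper for illustration.
    """
    X, y = make_regression(100, 100, random_state=42)
    X = csr_array(X)  # sparse input is the setting described in this issue
    reg = LinearRegression(fit_intercept=fit_intercept)
    coef1 = reg.fit(X, y).coef_.copy()
    coef2 = reg.fit(X, y, sample_weight=np.ones_like(y)).coef_.copy()
    return coef1, coef2, float(np.max(np.abs(coef1 - coef2)))


coef1, coef2, gap = max_coef_gap()
print(f"max |coef1 - coef2| = {gap:.3e}")
try:
    # Same tolerances as the Ridge comparison in the comments.
    assert_allclose(coef1, coef2, rtol=1e-7, atol=1e-9)
except AssertionError:
    print("coefficients differ between no weights and unit weights")
```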
Expected Results
The `assert_allclose` should pass.
Actual Results
The test also fails for `fit_intercept=False`. Note that this test and other sample weight consistency checks pass if we do not wrap `X` in a sparse container.
Versions
EDIT: discovered while working on #30040 for the case of dense inputs.