Add tolerance tol to LinearRegression for sparse matrix solver lsqr #24601
Comments
Further tests indicate that the problem has to do with tolerances in scipy.sparse.linalg.lsqr. Take the following code and try different values for the tolerances:
import pandas as pd
import numpy as np
import scipy.linalg
import scipy.sparse.linalg
from sklearn.utils._testing import assert_allclose
atol=1e-6
btol=1e-6
rng = np.random.RandomState(0)
n_samples = 200
n_features = 14
X = rng.randn(n_samples, n_features)
X[X < 0.1] = 0.0
Xcsr = scipy.sparse.csr_matrix(X)
y = rng.rand(n_samples)
g1 = scipy.linalg.lstsq(X, y)
g2 = scipy.sparse.linalg.lsqr(Xcsr, y, atol=atol, btol=btol)
assert_allclose(g1[0], g2[0])
The problem, however, is to determine
Edit: The documentation of
As you noticed, it's not a bug but a matter of precision/tolerance. The real problem is that there is no parameter
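A minimal sketch of the tolerance dependence discussed in the comments above, reusing the same synthetic data. The helper name max_abs_diff and the three tolerance values are illustrative assumptions, not part of the original report; the sketch only prints how far the sparse LSQR solution is from the dense least-squares solution for each setting.

```python
import numpy as np
import scipy.linalg
import scipy.sparse
import scipy.sparse.linalg

# Same synthetic data as in the comment above.
rng = np.random.RandomState(0)
n_samples, n_features = 200, 14
X = rng.randn(n_samples, n_features)
X[X < 0.1] = 0.0
Xcsr = scipy.sparse.csr_matrix(X)
y = rng.rand(n_samples)

# Dense reference solution.
coef_dense = scipy.linalg.lstsq(X, y)[0]


def max_abs_diff(tol):
    """Hypothetical helper: largest coefficient gap for atol = btol = tol."""
    coef_sparse = scipy.sparse.linalg.lsqr(Xcsr, y, atol=tol, btol=tol)[0]
    return float(np.max(np.abs(coef_dense - coef_sparse)))


# Illustrative tolerance values (an assumption, chosen to span loose to tight).
diffs = {tol: max_abs_diff(tol) for tol in (1e-2, 1e-6, 1e-12)}
for tol, diff in diffs.items():
    print(f"atol=btol={tol:g}: max |dense - sparse| = {diff:.3e}")
```

Tightening atol and btol should shrink the gap, which is consistent with reading the discrepancy as a tolerance effect rather than a bug.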
This is a bit related to #14268.
===BEGIN EDIT===
Description
LinearRegression should have a parameter tol that is passed to the LSQR routine for solving with sparse matrices. This way, it should be (more or less) equal to Ridge(alpha=0, solver="lsqr", tol=..).
===END EDIT===
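As a rough illustration of the Ridge comparison mentioned in the edit above, the sketch below fits Ridge with alpha=0 and solver="lsqr" on sparse input for two values of tol and measures the gap to a dense LinearRegression fit. The helper name ridge_gap, the two tol values, and the choice of fit_intercept=False (made here only to keep the sketch independent of how intercepts are handled for sparse input) are assumptions for illustration.

```python
import numpy as np
import scipy.sparse
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data following the setup used elsewhere in this thread.
rng = np.random.RandomState(0)
X = rng.randn(200, 14)
X[X < 0.1] = 0.0
Xcsr = scipy.sparse.csr_matrix(X)
y = rng.rand(200)

# Dense reference fit (no intercept; an assumption made for simplicity).
dense = LinearRegression(fit_intercept=False).fit(X, y)


def ridge_gap(tol):
    """Hypothetical helper: coefficient gap of unpenalized Ridge/LSQR at a given tol."""
    ridge = Ridge(alpha=0.0, solver="lsqr", tol=tol, fit_intercept=False)
    ridge.fit(Xcsr, y)
    return float(np.max(np.abs(ridge.coef_ - dense.coef_)))


# Illustrative tol values, one loose and one tight.
gaps = {tol: ridge_gap(tol) for tol in (1e-3, 1e-12)}
for tol, gap in gaps.items():
    print(f"Ridge(alpha=0, solver='lsqr', tol={tol:g}): max gap = {gap:.3e}")
```

With a tight tol the unpenalized Ridge fit should agree closely with the dense solution, which is the behavior the proposed tol parameter on LinearRegression is meant to make reachable.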
Description
linear_model.LinearRegression performs differently on sparse matrices than on numpy arrays. The built-in unit test test_linear_regression_sparse_equal_dense works well with two features, but not with other feature counts, e.g. n_features=14. Other combinations of n_samples and n_features lead to even higher discrepancies.
There was a similar issue (#13460) in 2019 that was fixed (#13279) and complemented by the mentioned unit test.
Steps/Code to Reproduce
Original code from scikit-learn/sklearn/linear_model/tests/test_base.py, line 213 in c674e58.
Only modification: n_features = 14.
Expected Results
Coefficients from both regressions should be equal. The test shouldn't throw any error.
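The dense-versus-sparse comparison described above can be sketched as follows. This is not the original test code; it reuses the data setup from the comment in this thread with n_features = 14 and reports the gaps instead of asserting on them, since the size of the gap may depend on the installed scikit-learn and SciPy versions.

```python
import numpy as np
import scipy.sparse
from sklearn.linear_model import LinearRegression

# Data setup taken from the comment in this thread, with n_features = 14.
rng = np.random.RandomState(0)
n_samples, n_features = 200, 14
X = rng.randn(n_samples, n_features)
X[X < 0.1] = 0.0
Xcsr = scipy.sparse.csr_matrix(X)
y = rng.rand(n_samples)

# Fit the same estimator on dense and on sparse input.
reg_dense = LinearRegression().fit(X, y)
reg_sparse = LinearRegression().fit(Xcsr, y)

# Report the discrepancy rather than asserting a specific tolerance.
coef_gap = float(np.max(np.abs(reg_dense.coef_ - reg_sparse.coef_)))
intercept_gap = float(abs(reg_dense.intercept_ - reg_sparse.intercept_))
print(f"max coefficient gap: {coef_gap:.3e}")
print(f"intercept gap:       {intercept_gap:.3e}")
```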
Actual Results
Test throws an AssertionError as follows:
Versions