Add option to use different solver to LinearRegression #14268


Open
ogrisel opened this issue Jul 5, 2019 · 8 comments

@ogrisel
Member

ogrisel commented Jul 5, 2019

As reported in #13923, the currently used scipy.linalg.lstsq can be significantly slower than Ridge(solver="cholesky", alpha=0) for tall, dense X.
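
For context, a minimal timing sketch contrasting the two code paths on a tall dense design matrix; the data shape and exact timings are illustrative, not from the issue:

```python
# Illustrative benchmark only: compares LinearRegression (which calls
# scipy.linalg.lstsq) against Ridge with a Cholesky solver and no penalty.
from time import perf_counter

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100_000, 100)  # tall: n_samples >> n_features
y = X @ rng.randn(100) + rng.randn(100_000)

for name, est in [
    ("LinearRegression (lstsq)", LinearRegression()),
    ('Ridge(solver="cholesky", alpha=0)', Ridge(solver="cholesky", alpha=0)),
]:
    tic = perf_counter()
    est.fit(X, y)
    print(f"{name}: {perf_counter() - tic:.3f}s")
```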

We should also call scipy.linalg.lstsq with check_finite=False as we already do input validation in fit.

Also, the scipy.linalg.lstsq function has an optional lapack_driver parameter that accepts the following options: 'gelsd', 'gelsy', 'gelss'. The default is 'gelsd'. Maybe we should expose the others in our API and benchmark them to see if it would make sense to change the default.
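
A micro-benchmark sketch of the three drivers, also passing check_finite=False as suggested above; the array sizes here are assumptions for illustration:

```python
# Sketch: time each LAPACK driver exposed by scipy.linalg.lstsq.
# check_finite=False skips the finiteness check that fit already performs.
from time import perf_counter

import numpy as np
from scipy import linalg

rng = np.random.RandomState(0)
X = rng.randn(50_000, 200)
y = X @ rng.randn(200) + rng.randn(50_000)

for driver in ("gelsd", "gelsy", "gelss"):
    tic = perf_counter()
    coef, residues, rank, singular_values = linalg.lstsq(
        X, y, check_finite=False, lapack_driver=driver
    )
    print(f"{driver}: {perf_counter() - tic:.3f}s")
```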

Related issue for Ridge: #14269

@rth
Member

rth commented Jul 5, 2019

Can't we use Ridge(solver="cholesky", alpha=0) underneath in LinearRegression, to avoid many similar but not identical implementations? Though last time I checked, there may have been an issue with alpha=0 exactly.

@ogrisel
Member Author

ogrisel commented Jul 5, 2019

I would be in favor of factoring the common code into a private helper function instead of having public estimators call into one another.
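
For illustration, a minimal sketch of that idea; the helper name and signature are assumptions for this example, not scikit-learn's actual private API:

```python
# Hypothetical private helper that both LinearRegression and Ridge could
# share, instead of one public estimator instantiating the other.
from scipy import linalg


def _solve_normal_equation(X, y, alpha=0.0):
    """Solve (X.T @ X + alpha * I) w = X.T @ y via Cholesky factorization."""
    A = X.T @ X
    A.flat[:: A.shape[0] + 1] += alpha  # add alpha to the diagonal in place
    return linalg.solve(A, X.T @ y, assume_a="pos")
```

LinearRegression would call it with alpha=0 and Ridge with its penalty, so there is a single implementation to maintain and benchmark.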

@agramfort
Member

agramfort commented Jul 5, 2019 via email

@rithvikrao
Contributor

To summarize, this would involve the following tasks?

  1. changing the scipy.linalg.lstsq call in LinearRegression to include the check_finite=False parameter
  2. benchmarking lapack_driver choices for scipy.linalg.lstsq (should there also be an optional user parameter to choose lapack_driver?)
  3. factoring out some Ridge code such that something like Ridge(solver="cholesky", alpha=0) can be used in LinearRegression without explicitly calling another estimator
  4. determining when to use scipy.linalg.lstsq vs. Cholesky; would this involve running experiments to find a good heuristic for tall/dense X? (see the sketch after this list)
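
On point 4, a hypothetical dispatch heuristic; the tall_ratio threshold is a placeholder that the benchmarks would have to determine:

```python
# Illustrative only: route tall dense problems to the Cholesky path and
# everything else to lstsq. The threshold of 10 is an arbitrary placeholder.
import scipy.sparse as sp


def _choose_solver(X, tall_ratio=10):
    n_samples, n_features = X.shape
    if not sp.issparse(X) and n_samples >= tall_ratio * n_features:
        return "cholesky"
    return "lstsq"
```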

@amueller
Member

Hi @rithvikrao, yes, I think that sounds right, apart from 1 maybe? Where did you get that from? We already have a global option to control that, and in general we want to check for finiteness. Only if there are redundant checks would we want to remove them.

It's not entirely clear how easy 4 is; maybe starting with 3 would be good?

@rithvikrao
Contributor

rithvikrao commented Jun 10, 2020

Hi @amueller, sounds good, I'll work on 3 first. 1 came from the first comment in this issue: I believe that fit in LinearRegression calls check_array, which is defined in utils/validation.py and by default raises an error on np.inf, np.nan, or pd.NA values in the array passed in. So I think the finiteness check is redundant, but I may be wrong there.
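
A quick illustration of that default behavior (a sketch, assuming check_array's default settings):

```python
# Demonstrates that check_array rejects non-finite input by default,
# which is why a second check inside lstsq would be redundant.
import numpy as np
from sklearn.utils import check_array

X = np.array([[1.0, 2.0], [np.nan, 4.0]])
try:
    check_array(X)
except ValueError as exc:
    print(exc)  # e.g. "Input contains NaN, ..."
```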

@amueller
Member

@rithvikrao sorry, I didn't see that in @ogrisel's original comment. You're right, we can remove that check; that might even be the easiest first step.

@ogrisel
Member Author

ogrisel commented Mar 16, 2022

This is related to #22855.
