Benchmark linear models in higher dimensions #7

Closed
ogrisel opened this issue Jun 13, 2019 · 4 comments

Comments

@ogrisel

ogrisel commented Jun 13, 2019

The current benchmarks only use 50 features for 1e6 samples. I would argue that this is not a case where one would use a linear model, as it would under-fit, and the same test accuracy could probably be reached much faster with 1e3 data points instead of 1e6, yielding a speed-up on the order of 1000x.

It would therefore be more interesting to benchmark linear regression, ridge regression and logistic regression in regimes with on the order of 1e3 to 1e5 features.

In particular, Ridge regression is likely to be most useful in cases where num_features >> n_samples; otherwise, Linear regression (no penalty) is likely to give the same result.
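To illustrate the point about Ridge versus unpenalized linear regression, here is a minimal NumPy sketch on synthetic Gaussian data. The helper names (`ridge_closed_form`, `relative_gap`), the problem sizes and `alpha=1.0` are assumptions chosen for illustration and are not part of the benchmark suite. It compares the closed-form ridge coefficients with the least-squares coefficients in a tall regime (many more samples than features) and a wide one (more features than samples, where `np.linalg.lstsq` returns the minimum-norm solution because the problem is underdetermined).

```python
import numpy as np


def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y for w (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def relative_gap(n_samples, n_features, alpha=1.0, seed=0):
    """Relative distance between ridge and least-squares coefficients."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    w_true = rng.standard_normal(n_features)
    y = X @ w_true + 0.1 * rng.standard_normal(n_samples)

    # Unpenalized fit; for n_features > n_samples this is the minimum-norm solution.
    w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    w_ridge = ridge_closed_form(X, y, alpha)
    return np.linalg.norm(w_ridge - w_ols) / np.linalg.norm(w_ols)


tall_gap = relative_gap(n_samples=2000, n_features=20)   # n_samples >> n_features
wide_gap = relative_gap(n_samples=50, n_features=200)    # n_features > n_samples

print(f"tall regime, relative coefficient gap: {tall_gap:.2e}")
print(f"wide regime, relative coefficient gap: {wide_gap:.2e}")
```

On data like this the penalty barely moves the coefficients in the tall regime, while in the wide regime the unpenalized problem has no unique solution and the penalty has a larger effect. Real datasets with correlated features may behave differently.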

@oleksandr-pavlyk
Contributor

@ogrisel So your suggestion is to keep the ratio n/p (samples to features) between 10 and 1000 for the purposes of benchmarking?
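For reference, the arithmetic behind that range can be sketched as follows, assuming the sample count stays at the current 1e6 and the feature count sweeps the proposed 1e3 to 1e5 (the intermediate value 1e4 is an assumption added for illustration):

```python
n_samples = 1_000_000
feature_counts = [1_000, 10_000, 100_000]

# Ratio of samples to features (n/p) for each proposed feature count.
ratios = [n_samples // n_features for n_features in feature_counts]

for n_features, ratio in zip(feature_counts, ratios):
    print(f"n_features={n_features:>7}  n/p={ratio}")
```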

@oleksandr-pavlyk
Contributor

I do not think DAAL allows num_features > n_samples for Ridge regression, but I see your point.

@amueller

amueller commented Jul 5, 2019

Also, could you compare against Ridge(alpha=1e-9, solver="cholesky", copy_X=False) instead of LinearRegression? It should give the same result but much faster.
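A minimal sketch of that comparison with scikit-learn, on synthetic data whose sizes are assumptions chosen so that it runs quickly (they are not the benchmark's actual configuration). It fits both estimators, checks that the coefficients agree, and records wall-clock fit times. Timings depend on the machine and the BLAS/LAPACK build, so no expected numbers are given.

```python
import time

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n_samples, n_features = 5000, 50  # illustrative sizes only
X = rng.standard_normal((n_samples, n_features))
y = X @ rng.standard_normal(n_features) + 0.1 * rng.standard_normal(n_samples)

# Unpenalized least squares.
t0 = time.perf_counter()
lr = LinearRegression().fit(X, y)
lr_seconds = time.perf_counter() - t0

# Ridge with a negligible penalty and the Cholesky solver.
# copy_X=False allows Ridge to overwrite its input, so fit on a copy here.
X_ridge = X.copy()
t0 = time.perf_counter()
ridge = Ridge(alpha=1e-9, solver="cholesky", copy_X=False).fit(X_ridge, y)
ridge_seconds = time.perf_counter() - t0

max_coef_gap = np.max(np.abs(lr.coef_ - ridge.coef_))
print(f"LinearRegression fit: {lr_seconds:.4f} s")
print(f"Ridge (cholesky) fit: {ridge_seconds:.4f} s")
print(f"max |coef difference|: {max_coef_gap:.2e}")
```

The agreement between the two fits relies on the design matrix being well conditioned. With nearly collinear features, a penalty as small as 1e-9 may not be enough to keep the Cholesky solve numerically stable.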

@ogrisel
Author

ogrisel commented Jul 5, 2019

> I do not think DAAL allows num_features > n_samples for Ridge regression but I see your point.

OK, let's close this issue then, as there is nothing to do on DAAL's side. One may argue that users still want to run linear regression in those regimes.

For reference, a user also reported a related performance problem in scikit-learn: scikit-learn/scikit-learn#13923

I opened scikit-learn/scikit-learn#14268 and scikit-learn/scikit-learn#14269 on scikit-learn's side.

@ogrisel ogrisel closed this as completed Jul 5, 2019
razdoburdin pushed a commit to razdoburdin/scikit-learn_bench that referenced this issue Jun 13, 2023
adding logistic regression and fixing indents in generated code