[MRG] MNT Cblas to scipy cython blas in liblinear and remove bundled cblas #13203

Merged

jeremiedbb
Member

Fixes #11638
This PR includes #13084 which needs to be merged first for easier reviews.

After #13084, the only remaining calls to cblas are from C++ code in liblinear, which is tricky because we want to use the Cython functions from scipy. There are 2 solutions to that, AFAIK:

  • The first one, implemented in this PR, is to pass pointers to the cython blas functions (grouped in a struct) in each function call down to where blas is called (i.e. in tron.cpp). It's fine because there aren't too many.

  • The other one would be to have such a struct as a global variable initialized when liblinear is imported, in order to avoid passing it through many functions.
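The two options above can be sketched as follows. This is only a pure-Python analogy of the C-level pattern, not the code from this PR: the `BlasFunctions` bundle, the helper names, and the toy `solve` loop are all hypothetical, and plain Python callables stand in for the function pointers.

```python
import math
from typing import Callable, List, NamedTuple, Optional

Vector = List[float]


class BlasFunctions(NamedTuple):
    """Hypothetical bundle of BLAS level-1 routines (illustrative names)."""
    dot: Callable[[Vector, Vector], float]
    axpy: Callable[[float, Vector, Vector], None]  # y += alpha * x, in place
    nrm2: Callable[[Vector], float]


def py_dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def py_axpy(alpha: float, x: Vector, y: Vector) -> None:
    for i, xi in enumerate(x):
        y[i] += alpha * xi


def py_nrm2(x: Vector) -> float:
    return math.sqrt(py_dot(x, x))


# Stand-in for the struct that would hold the scipy Cython BLAS pointers.
PURE_PYTHON_BLAS = BlasFunctions(dot=py_dot, axpy=py_axpy, nrm2=py_nrm2)


# Option 1: the bundle is an explicit argument at every level of the call chain.
def inner_step(blas: BlasFunctions, w: Vector, g: Vector, step: float) -> float:
    blas.axpy(-step, g, w)  # w <- w - step * g
    return blas.nrm2(w)


def solve(blas: BlasFunctions, w: Vector, g: Vector, step: float, n_iter: int) -> float:
    norm = blas.nrm2(w)
    for _ in range(n_iter):
        norm = inner_step(blas, w, g, step)
    return norm


# Option 2: the bundle lives in a module-level global set once at import time.
_GLOBAL_BLAS: Optional[BlasFunctions] = None


def init_blas(blas: BlasFunctions) -> None:
    global _GLOBAL_BLAS
    _GLOBAL_BLAS = blas


def solve_with_global(w: Vector, g: Vector, step: float, n_iter: int) -> float:
    if _GLOBAL_BLAS is None:
        raise RuntimeError("init_blas() must be called before solving")
    return solve(_GLOBAL_BLAS, w, g, step, n_iter)
```

With option 1 a caller writes `solve(PURE_PYTHON_BLAS, w, g, 0.5, 1)`, so the dependency is visible in every signature. With option 2 the signatures stay shorter, at the cost of hidden state that has to be initialized before the first call.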

I'd like to have some opinions on which solution would be preferable.

All this finally allows us to remove the whole bundled CBLAS, resulting in a -12000 diff!!

ping @ogrisel

@ogrisel
Member

ogrisel commented Feb 20, 2019

I think I'm fine with both options. The one that is implemented in this PR has the advantage of being already implemented ;)

I would be curious as to what other people think.

Anyway I am really glad that we can finally get rid of the embedded cblas source code 🎆

@ogrisel
Member

ogrisel commented Feb 20, 2019

BTW, I merged #13084 so you can rebase this on top of master.

@jnothman
Member

jnothman commented Feb 21, 2019 via email

@jnothman
Member

get_blas_info seems to be failing on one Travis instance :(

@ogrisel
Member

ogrisel commented Feb 21, 2019

I also just noticed the following slowdown:

========================== slowest 20 test durations ===========================
25.10s call     metrics/tests/test_pairwise.py::test_pairwise_distances_data_derived_params[Y is not X-pairwise_distances_chunked-seuclidean-2]
25.00s call     metrics/tests/test_pairwise.py::test_pairwise_distances_data_derived_params[Y is not X-pairwise_distances_chunked-mahalanobis-2]
24.57s call     metrics/tests/test_pairwise.py::test_pairwise_distances_data_derived_params[Y is X-pairwise_distances_chunked-mahalanobis-2]
24.41s call     metrics/tests/test_pairwise.py::test_pairwise_distances_data_derived_params[Y is X-pairwise_distances_chunked-seuclidean-2]

I am pretty sure that those tests were running much faster before. However, the slowdown was already there in some Travis builds that date back more than 24h, that is, prior to merging #13084.

@ogrisel
Member

ogrisel commented Feb 21, 2019

BTW could you please run a quick benchmark to check that the new TRON solver code that uses function pointers to call BLAS is not significantly slower than calling the embedded ATLAS cblas routines directly.

For instance, you can try with LinearSVC or LogisticRegression on a make_classification problem large enough to last a couple of seconds.

@jeremiedbb
Member Author

jeremiedbb commented Feb 21, 2019

I also just noticed the following slowdown:

========================== slowest 20 test durations ===========================
25.10s call metrics/tests/test_pairwise.py::test_pairwise_distances_data_derived_params[Y is not X-pairwise_distances_chunked-seuclidean-2]
25.00s call metrics/tests/test_pairwise.py::test_pairwise_distances_data_derived_params[Y is not X-pairwise_distances_chunked-mahalanobis-2]
24.57s call metrics/tests/test_pairwise.py::test_pairwise_distances_data_derived_params[Y is X-pairwise_distances_chunked-mahalanobis-2]
24.41s call metrics/tests/test_pairwise.py::test_pairwise_distances_data_derived_params[Y is X-pairwise_distances_chunked-seuclidean-2]

I don't see how it would be related to cblas, since for those metrics we directly call scipy cdist/pdist.
However, there's definitely something going on. First, the slowdown only appears on the distrib=ubuntu job on travis. Then, it only affects pairwise_distances_chunked (and not pairwise_distances) when n_jobs=2. I think this requires a dedicated issue (opened #13208).

I am pretty sure that those tests were running much faster before. However, the slowdown was already there in some Travis builds that date back more than 24h, that is, prior to merging #13084.

I just looked at travis on master and it's there since the test was added.

@jeremiedbb jeremiedbb force-pushed the cblas-to-scipy-cython-blas-liblinear branch from 8443070 to b014885 Compare February 21, 2019 12:56
@jeremiedbb
Member Author

quick question:
should _cython_blas_helpers.h go into utils, utils/src or svm/src/liblinear (where it's only used) ?

@ogrisel
Member

ogrisel commented Feb 21, 2019

I just looked at travis on master and it's there since the test was added.

Alright. I thought I had run the full test suite on scikit-learn master locally yesterday and I did not recall those slow tests, which is why I was misled. This should be ignored for this PR. Those tests should be sped up but this is unrelated.

@ogrisel
Member

ogrisel commented Feb 21, 2019

should _cython_blas_helpers.h go into utils, utils/src or svm/src/liblinear (where it's only used) ?

I would vote for svm/src/liblinear for now. If we ever need to reuse it somewhere else, we will move it to utils/src while making the BLAS API coverage exposed in this .h file more exhaustive (that is, covering the full BLAS API, which is not needed now).

@ogrisel
Member

ogrisel commented Feb 21, 2019

Weird failure in circle ci:

generating gallery for auto_examples/linear_model... [ 93%] plot_lasso_model_selection.py
generating gallery for auto_examples/linear_model... [ 96%] plot_sparse_logistic_regression_20newsgroups.py
generating gallery for auto_examples/linear_model... [100%] plot_sgd_early_stopping.py
Killed
Makefile:43: recipe for target 'html' failed
make: *** [html] Error 137
Exited with code 2

probably unrelated but annoying. I cannot find any other issue that references this failure.

Member

@ogrisel ogrisel left a comment

+1 for merge after a quick benchmark to check that the new code does not introduce any significant performance regression.

@jeremiedbb
Member Author

Here are the results of the small benchmark you proposed.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100000, n_features=100, n_informative=50, n_classes=5)
lr = LogisticRegression(solver='liblinear', random_state=0, multi_class='auto')
%timeit lr.fit(X, y)

First, on a machine with no blas installed, forcing the use of bundled cblas, on master:
23.5 s ± 110 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
and on this branch, hence using blas shipped with scipy (pip installed):
22.6 s ± 133 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So clearly no regression here :)

Then, on a machine with mkl installed, on master:
23.4 s ± 246 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
and on this branch (scipy uses the installed mkl):
24 s ± 216 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
There might be a small overhead in using scipy cython bindings instead of direct blas call but it seems quite negligible.
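To put the four timings above in relative terms, here is a small sketch of the arithmetic. It uses only the mean values reported in this comment. The dictionary labels and the `relative_change` helper are names chosen for illustration.

```python
# Mean fit times in seconds, copied from the %timeit results reported above.
timings = {
    "no system BLAS": {"master": 23.5, "branch": 22.6},
    "MKL installed": {"master": 23.4, "branch": 24.0},
}


def relative_change(master: float, branch: float) -> float:
    """Return the branch-vs-master change as a percentage of the master time."""
    return 100.0 * (branch - master) / master


changes = {
    name: relative_change(t["master"], t["branch"]) for name, t in timings.items()
}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

This works out to roughly 3.8% faster without a system BLAS and roughly 2.6% slower with MKL, both small compared with the ~23 s fit time.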

@jeremiedbb jeremiedbb changed the title [WIP] MNT Cblas to scipy cython blas in liblinear and remove bundled cblas [MRG] MNT Cblas to scipy cython blas in liblinear and remove bundled cblas Feb 21, 2019
@jeremiedbb
Member Author

I switched to [MRG] but there's still the circle ci failure which I don't understand. Does not seem to appear on master...

@jeremiedbb
Member Author

green \o/

@ogrisel ogrisel merged commit 1ffd2d3 into scikit-learn:master Feb 21, 2019
@ogrisel
Member

ogrisel commented Feb 21, 2019

Merged! Thank you very much for this work @jeremiedbb!

@ogrisel
Member

ogrisel commented Feb 21, 2019

Hum, circle CI has failed with the same error on master: https://circleci.com/gh/scikit-learn/scikit-learn/47352

So this is probably caused by the change in this PR, unfortunately. Maybe this is causing some extra memory usage when running the linear model examples?

@jeremiedbb
Member Author

I don't think so. The same failure appeared on master earlier today:
https://circleci.com/gh/scikit-learn/scikit-learn/47309

@ogrisel
Member

ogrisel commented Feb 21, 2019

Alright then.

@jakirkham
Contributor

Thanks so much for tackling this @jeremiedbb ! 😄

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
jackmitch pushed a commit to jackmitch/scikit-learn that referenced this pull request Jul 2, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
@jeremiedbb jeremiedbb deleted the cblas-to-scipy-cython-blas-liblinear branch July 20, 2020 14:59