
PERF Use joblib.Parallel backend='threading' for sparse_encode when called with algorithm='lasso_cd' #4896


Closed
ogrisel opened this issue Jun 25, 2015 · 8 comments

Comments

@ogrisel
Member

ogrisel commented Jun 25, 2015

Since we released the GIL in most of the Cython coordinate descent solver (#3102), we could now sparse encode in parallel efficiently with threads when using that solver.

Making this change in the code of sparse_encode should be straightforward, and the tests should stay the same. However, accepting a PR for this will require running some benchmarks to check that switching to the threading backend improves memory usage and reduces scheduling overhead, and therefore should slightly improve overall sparse encoding speed.

Note: the LARS solver might not be efficiently parallelizable with threads, since it is primarily written in Python / NumPy, although we should check, as NumPy releases the GIL often.
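A minimal sketch of the idea, assuming a batch-wise split of the data and using plain Lasso (whose coordinate descent inner loop is the Cython code that releases the GIL) as a stand-in for the internals of sparse_encode; the names and sizes here are illustrative, not the actual sparse_encode implementation:

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
D = rng.randn(15, 30)   # dictionary: n_components x n_features
X = rng.randn(100, 30)  # signals to encode

def encode_batch(batch, dictionary, alpha=0.1):
    # Lasso's coordinate descent solver releases the GIL in its Cython
    # inner loop, so threads can make real progress in parallel.
    lasso = Lasso(alpha=alpha, fit_intercept=False)
    lasso.fit(dictionary.T, batch.T)  # multi-output fit, one target per sample
    return lasso.coef_                # shape: (batch_size, n_components)

# backend='threading' avoids copying the data to worker processes.
batches = np.array_split(X, 4)
codes = Parallel(n_jobs=4, backend="threading")(
    delayed(encode_batch)(b, D) for b in batches
)
code = np.vstack(codes)  # shape: (100, 15)
```

With the threading backend, the dictionary and data are shared in memory across workers, which is where the expected memory savings over a process-based pool come from.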

@ogrisel ogrisel added Easy Well-defined and straightforward way to resolve Enhancement labels Jun 25, 2015
@ogrisel
Member Author

ogrisel commented Jun 25, 2015

@arthurmensch you might be interested in this.

@arthurmensch
Contributor

Profiling sparse_encode with lasso_cd, using joblib.Parallel and multiprocessing.pool.ThreadPool, we observe that 80% of computation time is still spent in acquire... It might be because _sparse_encode acquires the GIL; maybe we could directly implement an n_jobs option within Lasso.
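An illustrative reconstruction of this profiling setup (the data shapes and alpha are arbitrary, not the ones actually benchmarked): encode batches with lasso_cd on a ThreadPool and profile the main thread; time reported under `acquire` is where the GIL / lock contention shows up.

```python
import cProfile
import numpy as np
from multiprocessing.pool import ThreadPool
from sklearn.decomposition import sparse_encode

rng = np.random.RandomState(0)
D = rng.randn(20, 64)   # dictionary: n_components x n_features
X = rng.randn(400, 64)  # signals to encode

pool = ThreadPool(2)
batches = np.array_split(X, 2)

prof = cProfile.Profile()
prof.enable()
codes = pool.map(
    lambda b: sparse_encode(b, D, algorithm='lasso_cd', alpha=0.1),
    batches,
)
prof.disable()
pool.close()

# Look for 'acquire' near the top of the tottime ranking.
prof.print_stats('tottime')

code = np.vstack(codes)  # shape: (400, 20)
```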

@ogrisel
Member Author

ogrisel commented Jun 25, 2015

If this is the case it means that the Cython code is actually not in the critical part. Upon further inspection of some profiling results with %lprun -f Lasso.path sparse_encode(some_data...) it seems that redundant input validation checks are dominating the sparse encoding time. We should move those calls to check_array out of the inner loop by adding a check_input=True optional argument to the Lasso.path method and pass check_input=False when it's called in a sparse_encode context.

@ogrisel
Member Author

ogrisel commented Jun 25, 2015

Removing the "Easy" tag, as it's not such an easy issue anymore...

@ogrisel ogrisel removed the Easy Well-defined and straightforward way to resolve label Jun 25, 2015
@arthurmensch
Contributor

On plot_faces_decomposition, I get a 2.5x improvement in dict_learning_online performance using cd and bypassing checks. Using a parallel pool becomes useful for reasonably large numbers of features and batch sizes. More to come.

@arthurmensch
Contributor

This is outdated and should be closed.

@amueller
Member

@ogrisel close?

@rth
Member

rth commented Jun 20, 2019

This is outdated and should be closed.

Closing.

@rth rth closed this as completed Jun 20, 2019