Try running examples in parallel during doc build #29570

Open
lesteve opened this issue Jul 26, 2024 · 2 comments
Comments

@lesteve
Member

lesteve commented Jul 26, 2024

We are already using sphinx-gallery 0.17, which added the ability to run examples in parallel (see sphinx-gallery/sphinx-gallery#877). See the sphinx-gallery doc for how to configure it.

matplotlib is currently trying it, and it seems to show interesting improvements in their CI; see matplotlib/matplotlib#28617 (comment).

I expect that for scikit-learn the speed-up may be a little smaller than for matplotlib, since some examples already use multiple cores (e.g. with n_jobs=2). I had a quick look during the sphinx-gallery PR, and it made the doc build a bit quicker locally: sphinx-gallery/sphinx-gallery#877 (comment).

General directions:

  • configure sphinx-gallery to use 2 cores in doc/conf.py:
      sphinx_gallery_conf = {
          ...
          'parallel': 2,
      }
  • open a PR with a [doc build] commit to trigger a full build
  • also build the doc locally, e.g. with spin docs clean followed by spin docs html, and measure how much difference the sphinx-gallery parallel setting makes
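
To compare several worker counts locally and in CI without editing doc/conf.py each time, the 'parallel' value could be read from an environment variable. This is only a sketch: the variable name SKLEARN_DOC_PARALLEL and the helper gallery_parallel_setting are hypothetical and not part of scikit-learn or sphinx-gallery; only the 'parallel' key itself comes from the configuration above.

```python
import os


def gallery_parallel_setting(environ=os.environ, default=2):
    """Return the worker count to pass to sphinx-gallery's 'parallel' option.

    SKLEARN_DOC_PARALLEL is a hypothetical variable name used for illustration.
    """
    raw = environ.get("SKLEARN_DOC_PARALLEL")
    if raw is None:
        return default
    # Clamp to at least one worker so a stray "0" does not produce
    # a nonsensical configuration in this sketch.
    return max(1, int(raw))


sphinx_gallery_conf = {
    # ... other existing options are elided here ...
    "parallel": gallery_parallel_setting(),
}
```

With something like this in place, a timing comparison would only need the variable changed between two otherwise identical builds.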
github-actions bot added the "Needs Triage" label Jul 26, 2024
lesteve added the "Build / CI" label and removed the "Needs Triage" label Jul 26, 2024
lesteve changed the title from "Try running examples in parralel during doc build" to "Try running examples in parallel during doc build" Jul 26, 2024
@thomasjpfan
Member

thomasjpfan commented Jul 26, 2024

With parallel=12, this seems to hang (we probably need to set OPENBLAS_NUM_THREADS and other environment variables).

From your comment, I suspect there is oversubscription when parallelizing sphinx-gallery. With n_jobs=2, joblib will set OPENBLAS_NUM_THREADS to joblib.cpu_count() // 2.

Since CircleCI has 2 cores, I suspect we could set OPENBLAS_NUM_THREADS=1 and use 2 cores for sphinx-gallery. This should work for the majority of examples. The only issue would be examples that set n_jobs=-1 or n_jobs>=2, which would oversubscribe.
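
The arithmetic behind this can be sketched as follows. It is a deliberately simplified model, not joblib's or sphinx-gallery's actual logic: it assumes the thread demand is the product of the gallery workers, the n_jobs used inside an example, and the BLAS threads per process. The helper names are hypothetical, and os.cpu_count() stands in for joblib.cpu_count() to keep the snippet standard-library only.

```python
import os


def threads_per_worker(n_jobs, n_cpus=None):
    """Thread budget per worker, following the cpu_count // n_jobs rule above."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    return max(1, n_cpus // n_jobs)


def is_oversubscribed(n_cores, gallery_workers, blas_threads=1, example_n_jobs=1):
    """Return True when the modelled thread demand exceeds the available cores."""
    demand = gallery_workers * example_n_jobs * blas_threads
    return demand > n_cores


# 2-core runner, 2 gallery workers, single-threaded BLAS: fits.
fits = not is_oversubscribed(n_cores=2, gallery_workers=2)
# Same setup, but the example itself uses n_jobs=2: oversubscribed.
too_many = is_oversubscribed(n_cores=2, gallery_workers=2, example_n_jobs=2)
```

Under this model, parallel=12 on a 2-core machine is oversubscribed even before any example asks for extra jobs.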

@lesteve
Member Author

lesteve commented Jul 26, 2024

Yep, this is a good point to bear in mind. While we are at it, setting OMP_NUM_THREADS=1 is probably a good idea as well.
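
One possible way to apply both limits is to set them near the top of doc/conf.py, before any numerical library is imported, so that worker processes inherit them. This is a sketch under that assumption; the helper limit_native_threads is hypothetical, and exporting the variables in the CI configuration instead would work just as well.

```python
import os

# The two variables mentioned in this thread.
THREAD_LIMIT_VARS = ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS")


def limit_native_threads(environ=os.environ, n_threads=1):
    """Set the thread-limit variables unless the caller already set them."""
    for name in THREAD_LIMIT_VARS:
        # setdefault keeps any value exported explicitly by the user or CI.
        environ.setdefault(name, str(n_threads))
    return {name: environ[name] for name in THREAD_LIMIT_VARS}
```

Using setdefault rather than plain assignment means a developer who exports a different value locally is not silently overridden.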

Side-comment: matplotlib uses a CircleCI large runner for its doc build (4 cores and 8GB RAM, rather than 2 cores and 4GB RAM), so I guess this may be something to look at for scikit-learn, but there may be some caveats.
