[libc] Honour LIBC_GPU_TEST_JOBS in lit test runs #193797

Open
kaladron wants to merge 1 commit into llvm:main from kaladron:lit-gpu-jobs

Conversation

@kaladron
Contributor

Under CTest, LIBC_GPU_TEST_JOBS controlled a ninja job pool that limited concurrent GPU test processes. The AMD GPU buildbot sets this to 4 to avoid overloading the GPU driver.

When running tests via lit, this constraint was lost because lit uses its own -j flag (defaulting to nproc, or set to 64 on the AMD bot via LLVM_LIT_ARGS). All GPU loader processes launched simultaneously, leading to hangs from GPU resource exhaustion.

Propagated LIBC_GPU_TEST_JOBS into the lit site config as a parallelism group so lit throttles GPU test concurrency independently of the global -j setting.
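
As a rough illustration of the throttling described above, the following sketch models a parallelism group as a semaphore that caps one class of tests while the overall worker pool stays large. It is a simplified model under stated assumptions, not lit's actual implementation: the `simulate` function, its parameters, and the sleep standing in for a GPU loader launch are all hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def simulate(global_jobs: int, group_limit: int, num_tests: int) -> int:
    """Return the peak number of simulated GPU tests that ran concurrently.

    global_jobs stands in for lit's -j value and group_limit for
    LIBC_GPU_TEST_JOBS; both names are illustrative only.
    """
    group_slots = threading.Semaphore(group_limit)  # per-group cap
    counter_lock = threading.Lock()
    running = 0
    peak = 0

    def run_test(_test_id: int) -> None:
        nonlocal running, peak
        with group_slots:  # block until the group has a free slot
            with counter_lock:
                running += 1
                peak = max(peak, running)
            time.sleep(0.005)  # placeholder for launching the GPU loader
            with counter_lock:
                running -= 1

    # The pool may hold many workers, but the semaphore bounds GPU tests.
    with ThreadPoolExecutor(max_workers=global_jobs) as pool:
        list(pool.map(run_test, range(num_tests)))
    return peak
```

Calling `simulate(global_jobs=64, group_limit=4, num_tests=128)` should never report a peak above 4, whatever the worker count, which is the property the patch is after.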

@kaladron requested a review from jhuber6 on April 23, 2026 17:07
@kaladron marked this pull request as ready for review on April 23, 2026 17:09
@llvmbot added the libc label on Apr 23, 2026
@llvmbot
Member

llvmbot commented Apr 23, 2026

@llvm/pr-subscribers-libc

Author: Jeff Bailey (kaladron)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/193797.diff

2 Files Affected:

  • (modified) libc/cmake/modules/prepare_libc_gpu_build.cmake (+1)
  • (modified) libc/test/lit.site.cfg.py.in (+5)
diff --git a/libc/cmake/modules/prepare_libc_gpu_build.cmake b/libc/cmake/modules/prepare_libc_gpu_build.cmake
index c87a1df926c85..554c6c49b0435 100644
--- a/libc/cmake/modules/prepare_libc_gpu_build.cmake
+++ b/libc/cmake/modules/prepare_libc_gpu_build.cmake
@@ -29,6 +29,7 @@ if(LIBC_GPU_TEST_JOBS)
   set_property(GLOBAL PROPERTY JOB_POOLS LIBC_GPU_TEST_POOL=${LIBC_GPU_TEST_JOBS})
   set(LIBC_HERMETIC_TEST_JOB_POOL JOB_POOL LIBC_GPU_TEST_POOL)
 else()
+  set(LIBC_GPU_TEST_JOBS 1)
   set_property(GLOBAL PROPERTY JOB_POOLS LIBC_GPU_TEST_POOL=1)
   set(LIBC_HERMETIC_TEST_JOB_POOL JOB_POOL LIBC_GPU_TEST_POOL)
 endif()
diff --git a/libc/test/lit.site.cfg.py.in b/libc/test/lit.site.cfg.py.in
index 3668a491cd05c..bc8d0e3e31713 100644
--- a/libc/test/lit.site.cfg.py.in
+++ b/libc/test/lit.site.cfg.py.in
@@ -40,3 +40,8 @@ if hasattr(config, "llvm_tools_dir") and config.llvm_tools_dir:
         [config.llvm_tools_dir, config.environment.get("PATH", "")]
     )
 
+# Limit concurrent GPU tests to avoid overloading the GPU driver.
+libc_gpu_test_jobs = "@LIBC_GPU_TEST_JOBS@"
+if libc_gpu_test_jobs:
+    lit_config.parallelism_groups["libc-gpu"] = int(libc_gpu_test_jobs)
+    config.parallelism_group = "libc-gpu"
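
The lit-side check in the diff above depends on what CMake substitutes for `@LIBC_GPU_TEST_JOBS@` at configure time: a number such as `4`, or an empty string when the variable is not defined. The sketch below isolates that interpretation step. The helper name `gpu_job_limit` is hypothetical and not part of the patch, and the rejection of values below 1 is an added assumption rather than something the patch does.

```python
from typing import Optional


def gpu_job_limit(substituted: str) -> Optional[int]:
    """Interpret the text that replaced "@LIBC_GPU_TEST_JOBS@".

    Returns None when no limit should be applied.
    """
    value = substituted.strip()
    if not value:
        return None  # variable was undefined, so the placeholder is empty
    limit = int(value)  # raises ValueError for non-numeric text
    if limit < 1:
        # Assumption for this sketch only: treat zero or negative as an error.
        raise ValueError(f"LIBC_GPU_TEST_JOBS must be positive, got {limit}")
    return limit
```

With a helper like this, a site config would only register the `libc-gpu` parallelism group when the result is not `None`, matching the `if libc_gpu_test_jobs:` guard in the patch.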

@michaelrj-google left a comment
Contributor

Approving to avoid blocking lit switchover

@jhuber6 left a comment
Contributor

I'll need to reevaluate this, it's a bit better than it was in the past, but still possible to exhaust scratch.

@kaladron
Contributor Author

> I'll need to reevaluate this, it's a bit better than it was in the past, but still possible to exhaust scratch.

My best guess is that what was taking out the AMD fmul test was running 64 GPU tests simultaneously. I think it did remarkably well. =) The NV tests on my machine happily handled 32.
