[github] pre-cache llvm-project in Linux CI Docker images #133137

gburgessiv · 2025-03-26T18:10:17Z

Meta: I'm not familiar with the actual deployment process here, but I'd ideally like to be able to test this PR prior to submitting it. Not sure whether that'll be worth the effort compared to landing it (or splitting it, waiting for the new Docker image to appear, then messing with the premerge.yaml part of this) though?

Any guidance on how to test any part of this out-of-production is appreciated. :)

=====

Linux CI bots take a bit over a minute to fully set their git repos up, always starting from scratch. This can end up being a significant percentage of their overall runtime.

If llvm-project is built into the images and used as a cache, we could potentially speed that up quite a bit. This PR tries it out to see.

If this is successful, it should be easy to port to Windows CI, too. I'd prefer to start small, though.

llvmbot · 2025-03-26T18:10:58Z

@llvm/pr-subscribers-github-workflow

Author: George Burgess IV (gburgessiv)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/133137.diff

2 Files Affected:

(modified) .github/workflows/containers/github-action-ci/Dockerfile (+11)
(modified) .github/workflows/premerge.yaml (+6-1)

diff --git a/.github/workflows/containers/github-action-ci/Dockerfile b/.github/workflows/containers/github-action-ci/Dockerfile
index bd3720017b7f7..8774c3f8b384f 100644
--- a/.github/workflows/containers/github-action-ci/Dockerfile
+++ b/.github/workflows/containers/github-action-ci/Dockerfile
@@ -103,3 +103,14 @@ RUN mkdir actions-runner && \
     tar xzf ./actions-runner-linux-x64-$GITHUB_RUNNER_VERSION.tar.gz && \
     rm ./actions-runner-linux-x64-$GITHUB_RUNNER_VERSION.tar.gz
 
+# Pre-cache llvm-project in these images; this works with the premerge action
+# to speed up runs; cloning LLVM fresh takes >1min otherwise.
+#
+# A local experiment showed an explicit `git gc --aggressive` reduced this
+# layer's size by 20% (~800MB). Disable GC'ing afterward, so we don't waste time
+# trying to do that in CI.
+RUN \
+  git clone https://github.com/llvm/llvm-project --tags && \
+  cd llvm-project && \
+  git gc --aggressive && \
+  git config gc.auto 0
diff --git a/.github/workflows/premerge.yaml b/.github/workflows/premerge.yaml
index c488421d37450..951c3273d3f4f 100644
--- a/.github/workflows/premerge.yaml
+++ b/.github/workflows/premerge.yaml
@@ -26,6 +26,11 @@ concurrency:
 jobs:
   premerge-checks-linux:
     name: Linux Premerge Checks (Test Only - Please Ignore Results)
+    defaults:
+      run:
+        # The premerge Linux docker instance keeps a pristine clone of LLVM in
+        # /home/gha/llvm-project; that should be used as our main workdir.
+        working-directory: /home/gha/llvm-project
     if: >-
         github.repository_owner == 'llvm' &&
         (github.event_name != 'pull_request' || github.event.action != 'closed')
@@ -34,7 +39,7 @@ jobs:
       - name: Checkout LLVM
         uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
         with:
-          fetch-depth: 2
+          path: /home/gha/llvm-project
       - name: Setup ccache
         uses: hendrikmuhs/ccache-action@a1209f81afb8c005c13b4296c32e363431bffea5 # v1.2.17
         with:

boomanaiden154

> Meta: I'm not familiar with the actual deployment process here, but I'd ideally like to be able to test this PR prior to submitting it. Not sure whether that'll be worth the effort compared to landing it (or splitting it, waiting for the new Docker image to appear, then messing with the premerge.yaml part of this) though?

This definitely needs to be split. The deployment isn't really atomic: the new container needs a little while to build and then roll out. So if you land the change in premerge.yaml at the same time, you're likely to end up with a bunch of build failures, assuming the patch has a hard dependency. In this case it looks like it might be fine, given that it should just clone to the newly specified directory, but keeping them separate is probably best practice.

In order to test it, you need to build the image locally, push it to a container registry somewhere (GHCR on your personal GitHub account should work, I think), and then point premerge.yaml at that container.
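The test flow described above could be sketched roughly as follows. This is only an illustration: the image name, owner, and tag are hypothetical placeholders, and the sketch assumes the build context is the directory containing the Dockerfile touched by this PR and that `docker login ghcr.io` has already been done. By default it only prints the commands; set `RUN=1` to actually execute them.

```shell
#!/bin/sh
set -eu

# Hypothetical placeholders: substitute your own GHCR namespace and tag.
OWNER="${OWNER:-your-github-user}"
TAG="${TAG:-precache-test}"
IMAGE="ghcr.io/${OWNER}/ci-image-test:${TAG}"

# Directory holding the Dockerfile modified by this PR.
CONTEXT=".github/workflows/containers/github-action-ci"

# Print each command instead of running it unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# Build the image locally, then push it to the personal registry.
# Pushing assumes a prior `docker login ghcr.io`.
run docker build -t "$IMAGE" "$CONTEXT"
run docker push "$IMAGE"

echo "Next: point the container image in premerge.yaml at $IMAGE"
```

The last step is manual: on a test branch, edit the container reference in premerge.yaml to the pushed image and let the workflow run against it.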

boomanaiden154 · 2025-03-26T19:19:16Z

.github/workflows/containers/github-action-ci/Dockerfile

+# to speed up runs; cloning LLVM fresh takes >1min otherwise.
+#
+# A local experiment showed an explicit `git gc --aggressive` reduced this
+# layer's size by 20% (~800MB). Disable GC'ing afterward, so we don't waste time


Is the layer size here 800MB, or was the savings 800MB? I'm assuming the former, but it might help to make that a bit clearer.

gburgessiv · 2025-05-02

Good point. The savings were 800MB; this layer is 3.29GB on my most recent build. I fixed up the wording as requested - PTAL
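As a quick consistency check on those figures, assuming the 3.29GB layer size is the post-gc size and comes from a build comparable to the one where the ~800MB savings was measured (the thread does not say so explicitly):

```shell
#!/bin/sh
# Figures quoted in the thread (assumed here to describe comparable builds).
after_gb=3.29   # layer size after `git gc --aggressive`
saved_gb=0.8    # approximate savings attributed to the gc

# Implied size before gc, and the savings as a percentage of that size.
before_gb=$(awk -v a="$after_gb" -v s="$saved_gb" 'BEGIN { printf "%.2f", a + s }')
pct=$(awk -v a="$after_gb" -v s="$saved_gb" 'BEGIN { printf "%.1f", 100 * s / (a + s) }')

echo "before gc: ${before_gb} GB, after gc: ${after_gb} GB, reduction: ${pct}%"
```

Under that assumption the implied pre-gc size is about 4.09GB, which lines up with the "20%" stated in the Dockerfile comment.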

tstellar · 2025-03-26T19:26:47Z

How often would we update the git cache? Building the container is very expensive, and I don't think we want to rebuild the LLVM toolchain every time the git cache is updated.

boomanaiden154 · 2025-03-26T19:31:59Z

> How often would we update the git cache?

I was thinking about as often as we currently build the container (whenever patches land). In my experience, running a git fetch even on a checkout that is a month or two old only takes something like ten seconds, and it should always be much quicker than the full (shallow) checkout that we do currently.

If it becomes an issue and we want to update it more frequently, it should be trivial to split building the main container from building the agent container. We could then build the agent container on the free GitHub-hosted runners weekly or something, depending upon what ends up being best.

gburgessiv · 2025-05-02T15:29:15Z

Hey, apologies for the latency here. It was unintentional, but it did give me a signal on the implicit "what happens if this gets to be a month old" question; I posted stats from experimenting in a comment on #133359. The tl;dr from quick experimentation is that syncing goes from ~75s to ~15s when the container has a fresh checkout. When the checkout is 5 weeks old, syncing took closer to 30-35s.

> This definitely needs to be split

Done; if we're cool with this PR, I'll send #133359 for review once CI checks on it pass.

Linux CI bots take a bit over 70 seconds to fully set their git repos up, always starting from scratch. This can end up being a significant percentage of their overall runtime, depending on the change they're vetting. Having a fresh llvm-project in the docker image can bring this down to about 15 seconds; if the llvm-project checkout is 5 weeks old, the number seems to be closer to 30-35 seconds. In any case, that's a meaningful savings per CI bot run, so include a pinned LLVM here.
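For anyone wanting to reproduce this kind of measurement, a minimal sketch might look like the following. It is not the method used for the numbers above; the `time_cmd` helper is hypothetical, and the sketch only assumes that a cached checkout lives at the path used in the premerge.yaml change. It times an incremental `git fetch` in whole seconds and does nothing if no checkout is present.

```shell
#!/bin/sh

# Hypothetical helper: run a command, discard its output, and print the
# elapsed wall-clock time in whole seconds. Returns the command's status.
time_cmd() {
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  status=$?
  end=$(date +%s)
  echo "$((end - start))"
  return "$status"
}

# Path of the pre-cached clone, as used in the premerge.yaml change.
REPO="${REPO:-/home/gha/llvm-project}"

if [ -d "$REPO/.git" ]; then
  # Incremental sync of an existing checkout.
  echo "incremental fetch: $(time_cmd git -C "$REPO" fetch origin)s"
else
  echo "no cached checkout at $REPO; nothing to measure"
fi
```

Running the same helper around a from-scratch clone gives the baseline to compare against.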

boomanaiden154

This needs to at least be split into a separate build stage in the container so that it can be done in a separate job.

It's probably easiest to create a new Dockerfile that pulls the agent image (it might be a good idea to pull the agent bits out too), adds the cloned repo, and then uploads the result again. We don't want to have to rebuild the whole toolchain on multiple arches every time we want to refresh the repo, which we probably want to do with a cron job.
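One possible shape for that refresh job, as a rough sketch only: both image names are hypothetical placeholders, and the generated Dockerfile simply reuses the clone-and-gc steps from this PR on top of an already-built agent image. By default the script prints the Docker commands; set `RUN=1` to execute them.

```shell
#!/bin/sh
set -eu

# Hypothetical image names: substitute the real agent image and output tag.
BASE_IMAGE="${BASE_IMAGE:-ghcr.io/example/ci-agent:latest}"
OUT_IMAGE="${OUT_IMAGE:-ghcr.io/example/ci-agent-with-repo:latest}"

# Print each command instead of running it unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# A small Dockerfile layered on the existing agent image, so the toolchain
# is not rebuilt when only the cached repository needs refreshing.
cat > "$workdir/Dockerfile" <<'EOF'
ARG BASE_IMAGE
FROM ${BASE_IMAGE}
RUN git clone https://github.com/llvm/llvm-project --tags && \
    cd llvm-project && \
    git gc --aggressive && \
    git config gc.auto 0
EOF

run docker pull "$BASE_IMAGE"
run docker build --build-arg "BASE_IMAGE=$BASE_IMAGE" -t "$OUT_IMAGE" "$workdir"
run docker push "$OUT_IMAGE"
```

A scheduled workflow could invoke something like this weekly; the cadence and registry layout are open questions in the thread rather than settled choices.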

gburgessiv requested a review from boomanaiden154 March 26, 2025 18:10

llvmbot added the github:workflow label Mar 26, 2025

tstellar self-requested a review March 26, 2025 18:20

boomanaiden154 reviewed Mar 26, 2025

gburgessiv mentioned this pull request Mar 28, 2025

Point workflow yaml at new docker image #133359

Draft

gburgessiv force-pushed the github-pre-cache-llvm-project-in-Linux-CI-Docker-images branch from 5c95234 to 141d5fb Compare May 2, 2025 15:29

gburgessiv force-pushed the github-pre-cache-llvm-project-in-Linux-CI-Docker-images branch from 141d5fb to f53f0ad Compare May 2, 2025 15:31

boomanaiden154 requested changes May 2, 2025
