[github] pre-cache llvm-project in Linux CI Docker images #133137


Open: wants to merge 1 commit into base main
gburgessiv (Member)

Meta: I'm not familiar with the actual deployment process here, but I'd ideally like to be able to test this PR prior to submitting it. Not sure whether that'll be worth the effort compared to landing it (or splitting it, waiting for the new Docker image to appear, then messing with the premerge.yaml part of this) though?

Any guidance on how to test any part of this out-of-production is appreciated. :)

=====

Linux CI bots take a bit over a minute to set up their git repos, and they always start from scratch. This can end up being a significant percentage of their overall runtime.

If llvm-project is cloned into the images and used as a cache, we could potentially speed that up quite a bit. Let's try it out and see.

If this is successful, it should be easy to port to Windows CI, too. I'd prefer to start small, though.

llvmbot (Member) commented Mar 26, 2025

@llvm/pr-subscribers-github-workflow

Author: George Burgess IV (gburgessiv)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/133137.diff

2 Files Affected:

  • (modified) .github/workflows/containers/github-action-ci/Dockerfile (+11)
  • (modified) .github/workflows/premerge.yaml (+6-1)
diff --git a/.github/workflows/containers/github-action-ci/Dockerfile b/.github/workflows/containers/github-action-ci/Dockerfile
index bd3720017b7f7..8774c3f8b384f 100644
--- a/.github/workflows/containers/github-action-ci/Dockerfile
+++ b/.github/workflows/containers/github-action-ci/Dockerfile
@@ -103,3 +103,14 @@ RUN mkdir actions-runner && \
     tar xzf ./actions-runner-linux-x64-$GITHUB_RUNNER_VERSION.tar.gz && \
     rm ./actions-runner-linux-x64-$GITHUB_RUNNER_VERSION.tar.gz
 
+# Pre-cache llvm-project in these images; this works with the premerge action
+# to speed up runs; cloning LLVM fresh takes >1min otherwise.
+#
+# A local experiment showed an explicit `git gc --aggressive` reduced this
+# layer's size by 20% (~800MB). Disable GC'ing afterward, so we don't waste time
+# trying to do that in CI.
+RUN \
+  git clone https://github.com/llvm/llvm-project --tags && \
+  cd llvm-project && \
+  git gc --aggressive && \
+  git config gc.auto 0
diff --git a/.github/workflows/premerge.yaml b/.github/workflows/premerge.yaml
index c488421d37450..951c3273d3f4f 100644
--- a/.github/workflows/premerge.yaml
+++ b/.github/workflows/premerge.yaml
@@ -26,6 +26,11 @@ concurrency:
 jobs:
   premerge-checks-linux:
     name: Linux Premerge Checks (Test Only - Please Ignore Results)
+    defaults:
+      run:
+        # The premerge Linux docker instance keeps a pristine clone of LLVM in
+        # /home/gha/llvm-project; that should be used as our main workdir.
+        working-directory: /home/gha/llvm-project
     if: >-
         github.repository_owner == 'llvm' &&
         (github.event_name != 'pull_request' || github.event.action != 'closed')
@@ -34,7 +39,7 @@ jobs:
       - name: Checkout LLVM
         uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
         with:
-          fetch-depth: 2
+          path: /home/gha/llvm-project
       - name: Setup ccache
         uses: hendrikmuhs/ccache-action@a1209f81afb8c005c13b4296c32e363431bffea5 # v1.2.17
         with:
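The Dockerfile step above repacks the fresh clone with `git gc --aggressive` and then sets `gc.auto` to 0, so CI jobs never stop to run automatic garbage collection. As a minimal sketch, assuming only a local git install, the same two commands can be tried on a small throwaway repository to see what they change. The repository and its commits are hypothetical stand-ins for llvm-project, so the sizes reported will be tiny compared to the ~800MB figure in the Dockerfile comment.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the Dockerfile's "repack, then disable auto-GC" step on a
# small throwaway repository (a hypothetical stand-in for llvm-project) and
# report object/pack counts before and after.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

git init -q demo-repo
cd demo-repo
for i in 1 2 3; do
  echo "revision $i" > file.txt
  git add file.txt
  git -c user.name=ci -c user.email=ci@example.com -c commit.gpgsign=false \
    commit -q -m "commit $i"
done

echo "before gc:"
git count-objects -vH

# The same two commands the Dockerfile runs after cloning.
git gc --aggressive --quiet
git config gc.auto 0

echo "after gc:"
git count-objects -vH
```

After the repack, all loose objects end up in a single pack, and `gc.auto 0` keeps later git commands from triggering an automatic `git gc` inside CI.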

tstellar self-requested a review on March 26, 2025 18:20
boomanaiden154 (Contributor) left a comment

Meta: I'm not familiar with the actual deployment process here, but I'd ideally like to be able to test this PR prior to submitting it. Not sure whether that'll be worth the effort compared to landing it (or splitting it, waiting for the new Docker image to appear, then messing with the premerge.yaml part of this) though?

This definitely needs to be split. The deployment isn't atomic: the new container takes a while to build and then roll out. So if you land the premerge.yaml change at the same time and it has a hard dependency on the new image, you're likely to end up with a bunch of build failures. In this case it looks like it might be fine, since checkout should just clone into the newly specified directory, but keeping the two changes separate is probably best practice.

To test it, you need to build the image locally, push it to a container registry somewhere (GHCR on your personal GitHub account should work, I think), and then point premerge.yaml at that container.
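The build-and-push part of that loop might look roughly like the following shell sketch. `GHCR_USER` and the `llvm-ci-test:precache` tag are placeholders, and the build context is the directory from the diff. `docker build` builds the Dockerfile's final stage unless `--target` is given, so a specific stage may need to be selected. By default the script only prints the commands (`DRY_RUN=1`). How premerge.yaml selects its image isn't shown in this diff, so that last step is not covered here.

```shell
#!/usr/bin/env bash
# Sketch of the out-of-production test loop: build the CI image locally and
# push it to a personal GHCR namespace. GHCR_USER and the tag are placeholders.
set -euo pipefail

GHCR_USER="${GHCR_USER:-your-github-user}"            # hypothetical
IMAGE="ghcr.io/${GHCR_USER}/llvm-ci-test:precache"    # hypothetical tag
CONTEXT=".github/workflows/containers/github-action-ci"

# With DRY_RUN=1 (the default here) commands are printed instead of executed,
# so the script can be reviewed before touching a registry.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run docker build -t "$IMAGE" "$CONTEXT"
# Requires a prior `docker login ghcr.io` with a token that can write packages.
run docker push "$IMAGE"
```

Running it with `DRY_RUN=0` from the repository root performs the real build and push.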

# to speed up runs; cloning LLVM fresh takes >1min otherwise.
#
# A local experiment showed an explicit `git gc --aggressive` reduced this
# layer's size by 20% (~800MB). Disable GC'ing afterward, so we don't waste time
boomanaiden154 (Contributor)

Is the layer size here 800MB, or was the savings 800MB? I'm assuming the former, but it might help to make that a bit clearer.

gburgessiv (Member, Author)

Good point. The savings was 800MB; this layer is 3.29GB on my most recent build. Fixed the wording as requested, PTAL.
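Taking the two figures together (800MB saved, 3.29GB remaining, and assuming 1GB = 1000MB for simplicity), the pre-GC layer would have been about 4.09GB. That puts the savings just under 20%, consistent with the original comment. A quick `awk` check:

```shell
# Figures from the thread: ~800 MB saved by `git gc --aggressive` and a
# 3.29 GB layer afterwards (1 GB taken as 1000 MB here).
saved_mb=800
after_mb=3290
summary=$(awk -v s="$saved_mb" -v a="$after_mb" 'BEGIN {
  before = a + s
  printf "before gc: %d MB, reduction: %.1f%%\n", before, 100 * s / before
}')
echo "$summary"   # before gc: 4090 MB, reduction: 19.6%
```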

tstellar (Collaborator)

How often would we update the git cache? Building the container is very expensive; I don't think we want to rebuild the LLVM toolchain every time the git cache is updated.

boomanaiden154 (Contributor)

How often would we update the git cache?

I was thinking about as often as we currently build the container (whenever patches land). In my experience, running a git fetch even on a checkout that is a month or two old only takes something like ten seconds, and it should always be much quicker than the from-scratch (shallow) checkout we do currently.
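The cache idea can be sketched offline with two throwaway local repositories: `upstream` stands in for llvm-project on GitHub and `cache` for the clone baked into the image. The later `git fetch` only has to transfer commits made after the cache was taken, which is why even a stale checkout syncs quickly. This is a hypothetical illustration, not the actual CI setup. Reproducing the timings would mean wrapping `git fetch` with `time` against a real llvm-project clone.

```shell
#!/usr/bin/env bash
# Sketch: "upstream" stands in for github.com/llvm/llvm-project and "cache"
# for the clone baked into the image. Only commits made after the cache was
# taken are transferred by the later fetch.
set -euo pipefail

commit() {  # helper: make one commit in the current repo
  echo "$1" > file.txt
  git add file.txt
  git -c user.name=ci -c user.email=ci@example.com -c commit.gpgsign=false \
    commit -q -m "$1"
}

root=$(mktemp -d)
git init -q "$root/upstream"
(cd "$root/upstream" && commit "old-1" && commit "old-2")

# Image build time: full clone, as in the Dockerfile's RUN step.
git clone -q "$root/upstream" "$root/cache"

# Time passes: upstream moves on.
(cd "$root/upstream" && commit "new-1" && commit "new-2")
target=$(git -C "$root/upstream" rev-parse HEAD)

# CI time: reuse the stale cache and fetch only what's missing.
git -C "$root/cache" fetch -q origin
git -C "$root/cache" checkout -q --detach "$target"
echo "cache now at: $(git -C "$root/cache" log -1 --format=%s)"
```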

If it becomes an issue and we want to update it more frequently, it should be trivial to split building the main container from building the agent container. We could then build the agent container on the free GitHub-hosted runners weekly, or on whatever schedule ends up being best.

gburgessiv force-pushed the github-pre-cache-llvm-project-in-Linux-CI-Docker-images branch from 5c95234 to 141d5fb on May 2, 2025 15:29
gburgessiv (Member, Author)

Hey, apologies for the latency here. It was unintentional, but I got signal on the implicit "what happens if this gets to be a month old?" question; stats from my experiments are posted here: #133359 (comment). tl;dr: from quick experimentation, syncing goes from ~75s to ~15s when the container has a fresh checkout. When the checkout is 5 weeks old, syncing took closer to 30-35s.
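For a sense of scale, the per-run savings implied by those numbers (~75s from scratch versus ~15s with a fresh cache and 30-35s with a 5-week-old one) can be worked out with a quick `awk` calculation. The case labels are illustrative; the inputs are the figures quoted above.

```shell
# Sync times reported in the thread, in seconds, relative to a from-scratch sync.
scratch=75
savings=$(awk -v base="$scratch" 'BEGIN {
  n = split("fresh:15 5wk-old-best:30 5wk-old-worst:35", cases, " ")
  for (i = 1; i <= n; i++) {
    split(cases[i], kv, ":")
    printf "%s: saves %ds (%.0f%%)\n", kv[1], base - kv[2], 100 * (base - kv[2]) / base
  }
}')
echo "$savings"
# fresh: saves 60s (80%)
# 5wk-old-best: saves 45s (60%)
# 5wk-old-worst: saves 40s (53%)
```

Even the stale case cuts the sync time by more than half.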

This definitely needs to be split

Done; if we're cool with this PR, I'll send #133359 for review once CI checks on it pass.

Linux CI bots take a bit over 70 seconds to fully set up their git
repos, always starting from scratch. This can end up being a
significant percentage of their overall runtime, depending on the
change they're vetting.

Having a fresh llvm-project in the Docker image can bring this down to
about 15 seconds; if the llvm-project checkout is 5 weeks old, the
number seems to be closer to 30-35 seconds.

In any case, that's a meaningful savings per CI bot run, so include a
pinned LLVM here.
gburgessiv force-pushed the github-pre-cache-llvm-project-in-Linux-CI-Docker-images branch from 141d5fb to f53f0ad on May 2, 2025 15:31
boomanaiden154 (Contributor) left a comment

This needs to at least be split into a separate build stage in the container so that it can be done in a separate job.

It's probably easiest to create a new Dockerfile that pulls the agent image (it might be a good idea to pull the agent bits out too), adds the cloned repo, and then pushes the result again. We don't want to have to rebuild the whole toolchain on multiple architectures every time we want to refresh the repo, which we probably want to do with a cron job.
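A minimal sketch of that layering, assuming the agent image is already published: a tiny Dockerfile, generated via a heredoc and fed to `docker build -` on stdin, starts `FROM` the agent image and only adds the clone, mirroring the RUN step from the diff. `AGENT_IMAGE`, `CACHE_IMAGE`, and the `/home/gha` working directory (taken from the premerge.yaml comment) are assumptions. The actual refresh would run from a scheduled job (for example a workflow triggered by `on: schedule`) with registry credentials.

```shell
#!/usr/bin/env bash
# Sketch of a standalone "cache layer" image: start from the already-built CI
# agent image and add only a fresh llvm-project clone, so a scheduled job can
# refresh the clone without rebuilding the toolchain. Image names are
# placeholders; /home/gha matches the path the premerge.yaml comment expects.
set -euo pipefail

AGENT_IMAGE="${AGENT_IMAGE:-ghcr.io/example/ci-agent:latest}"        # hypothetical
CACHE_IMAGE="${CACHE_IMAGE:-ghcr.io/example/ci-agent-cached:latest}" # hypothetical

# Print the Dockerfile for the cache layer; mirrors the RUN step from the diff.
cache_dockerfile() {
  cat <<EOF
FROM ${AGENT_IMAGE}
WORKDIR /home/gha
RUN git clone https://github.com/llvm/llvm-project --tags && \\
    cd llvm-project && \\
    git gc --aggressive && \\
    git config gc.auto 0
EOF
}

# "docker build -" reads the Dockerfile from stdin (no build context needed);
# --pull makes sure the latest agent image is used as the base.
refresh_cache_image() {
  cache_dockerfile | docker build --pull -t "$CACHE_IMAGE" -
  docker push "$CACHE_IMAGE"
}

cache_dockerfile
# refresh_cache_image   # run from a scheduled (cron) job with registry access
```

Keeping the clone in its own thin image means a refresh costs one clone plus a push, rather than a multi-arch toolchain rebuild.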


4 participants