feat: use git2 and improve HTTP caching #1257

woodruffw · 2025-10-15T22:40:58Z

Signed-off-by: William Woodruff <[email protected]>

woodruffw · 2025-10-15T23:58:41Z

crates/zizmor/src/github_api.rs

+        .with(ChainedCache(
+            Cache(HttpCache {
+                mode: CacheMode::Default,
+                manager: CACacheManager {
+                    path: cache_dir.into(),
+                    remove_opts: Default::default(),
+                },
+                options: http_cache_options.clone(),
+            }),
+            CacheType::File,
+        ))
+        .with(ChainedCache(
+            Cache(HttpCache {
+                mode: CacheMode::ForceCache,
+                manager: MokaManager::new(MokaCache::new(1000)),
+                options: http_cache_options,
+            }),
+            CacheType::Memory,
+        ))


@Bo98 just making sure I understand the intended behavior here -- is the idea here that we'll hit the in-memory cache first, and then fall back to the file cache if the former misses? In other words, we'll end up priming the memory cache with the contents of the file cache, if it has a hit?

It'll read from the file cache first. It took me a while to understand how this chaining works but basically each middleware is responsible for firing the next one and then it pops up from there.

File cache HIT

Logging middleware (always does next.run) -> File cache (no next.run on a hit) -> Logging post-response

File cache MISS/Memory cache HIT

Logging -> File cache -> Memory -> File cache post-response (will save to cache if policy allows) -> Logging post-response

(In practice, this scenario is for things the file cache rejected anyway)

Both cache MISS

Logging -> File Cache -> Memory -> Default (i.e. fetch remote) -> Memory post-response (save) -> File post-response (save) -> Logging post-response

I initially thought the order was important, but it probably isn't, so you can probably adjust the order if you want. Ultimately, speed isn't really what matters here: the main goal is to avoid making the same request twice in a single run regardless of what Cache-Control says, while still respecting it across different runs.
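To make the three traversals above concrete, here is a minimal toy model in plain Rust. It is only a sketch of the call order described in this comment, not the reqwest-middleware or http-cache API: the `Scenario` enum, the `trace` function, and the layer functions are hypothetical names invented for illustration, and "post-response" stands for whatever a layer does after the inner layer returns (for the caches, saving the response if policy allows).

```rust
/// Which layer (if any) can answer the request from its own store.
/// Hypothetical type, used only for this illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Scenario {
    FileHit,
    MemoryHit,
    BothMiss,
}

/// Record the order in which layers run for one request.
/// The logging layer always calls the next layer, then runs its
/// post-response step once that layer returns.
fn trace(scenario: Scenario) -> Vec<&'static str> {
    let mut steps = vec!["logging"];
    file_layer(scenario, &mut steps);
    steps.push("logging post-response");
    steps
}

/// File cache layer: on a hit it answers directly and never calls the
/// next layer; on a miss it delegates and then gets a post-response step.
fn file_layer(scenario: Scenario, steps: &mut Vec<&'static str>) {
    steps.push("file cache");
    if scenario == Scenario::FileHit {
        return;
    }
    memory_layer(scenario, steps);
    steps.push("file cache post-response");
}

/// Memory cache layer: on a hit it answers directly; on a miss it falls
/// through to the remote fetch and then gets a post-response step.
fn memory_layer(scenario: Scenario, steps: &mut Vec<&'static str>) {
    steps.push("memory cache");
    if scenario == Scenario::MemoryHit {
        return;
    }
    steps.push("remote fetch");
    steps.push("memory cache post-response");
}
```

Calling `trace(Scenario::BothMiss)` yields the full seven-step path (logging, file cache, memory cache, remote fetch, then the three post-response steps in reverse order), while `trace(Scenario::FileHit)` stops at the file cache and goes straight back to logging.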

Thanks, that's helpful!

Ah I think I remember one thing about the order: for non-200 codes I found the default cache policy (i.e. the file cache) reported cache hits while still querying the remote: https://github.com/06chaynes/http-cache/blob/12dde469fa2519d0fbb13b826db0b9a6706aafd5/http-cache/src/lib.rs#L1709. It's technically correct but was confusing when trying to debug the API.

So behaviour-wise, in terms of the number of API calls, it won't (or shouldn't) make a difference, but changing the order might make the debug logging more confusing, given that the file cache can report a Hit while still making a remote call. The logging always takes the bottom of the stack, so any form of pass back to the memory cache will ignore what the file cache attempts to claim.

woodruffw · 2025-10-16T00:44:46Z

Noting for myself, this is a baseline from main via #1258:

  Time (mean ± σ):     84.253 s ±  2.481 s    [User: 1.362 s, System: 1.608 s]
  Range (min … max):   80.210 s … 87.656 s    10 runs

woodruffw · 2025-10-16T01:11:19Z

The initial results here are super promising, although the SD/range is huge:

  Time (mean ± σ):     24.903 s ± 19.422 s    [User: 0.616 s, System: 0.843 s]
  Range (min … max):    9.224 s … 48.308 s    10 runs

However, even at the worst end, that's a lot better than main.

(This is with gha-hazmat, which is pretty much the worst possible case in terms of API lookups due to impostor commits.)

Similar results on re-run:

  Time (mean ± σ):     25.593 s ± 19.844 s    [User: 0.622 s, System: 0.728 s]
  Range (min … max):    9.296 s … 49.699 s    10 runs

Interestingly, the same initial outlier behavior occurs:

  Warning: The first benchmarking run for this command was significantly slower than the rest (48.377 s). This could be caused by (filesystem) caches that were not filled until after the first run. You are already using the '--warmup' option which helps to fill these caches before the actual benchmark. You can either try to increase the warmup count further or re-run this benchmark on a quiet system in case it was a random outlier. Alternatively, consider using the '--prepare' option to clear the caches before each timing run.

I don't understand why that is, since we're post-warmup.

woodruffw · 2025-10-16T01:15:52Z

The benchmark for this PR is also slightly pessimized, since I needed to change gha-hazmat to remove a nonexistent reusable workflow (which now causes a 404 that was previously masked as an empty set). That's technically a breaking change, but it's a contractually correct one (zizmor doesn't intentionally let 404s slide; that one was an oversight).


Bo98 · 2025-10-16T02:05:07Z

Most GitHub API calls are 60 second expiries. I wonder if that is expiring just as the warmup ends. Might be interesting to see what the results are with the file cache commented out.

Other than that I don't really have much of an idea. git2 code does have a cache but it should be all in-memory.

woodruffw · 2025-10-16T02:36:47Z


Yeah, that's a good theory. I guess that would also explain the large SD in the warm benchmarks -- the cache gets expired somewhat unevenly across the runs.


woodruffw · 2025-10-16T03:15:01Z

crates/zizmor/src/github_api.rs

+                let annotated_tags: HashSet<_> = tags
+                    .iter()
+                    .filter_map(|tag| tag.name.strip_suffix("^{}").map(|n| n.to_string()))
+                    .collect();
+                tags.retain_mut(|tag| {
+                    if let Some(stripped_name) = tag.name.strip_suffix("^{}") {
+                        tag.name = stripped_name.to_string();
+                        true
+                    } else {
+                        !annotated_tags.contains(&tag.name)
+                    }
+                });


I haven't fully thought through this yet, but I think there might be an edge case here -- tags can point to other tags (not just commits), so to fully flatten a tag to its underlying commit we probably need to recurse here.

(OTOH, we have tests for ref-version-mismatch that test this exact situation and they're passing, so I guess this works as is? I'm tired.)

Oh, I'm realizing that test might have changed underneath us, and no longer reflects the repo's state. I'm going to contrive a test for this.

Yeah, that test seems to be a regular annotated tag:

$ git show-ref -d v2.7.8
aa7c1c80a07a27a84c0aa76d0cef0aad3830e330 refs/tags/v2.7.8
9d47c6ad4b02e050fd481d890b2ea34778fd09d6 refs/tags/v2.7.8^{}

That is still a tag pointing to a tag as described, so it's still a good/correct test to have (and is the scenario already covered here), but I guess you're talking about a theoretical tag to a tag to a tag.

If a third level is possible I'd be interested to see how git ls-remote behaves.

Yeah, I was thinking about the fully general case. But it looks like git ls-remote does handle this for us:

https://github.com/woodruffw-experiments/zizmor-recursive-tags

has:

v1.0 (annotated) -> v1 (annotated) -> v1.0.0 (annotated) -> 3fdd4fc

which shows up in ls-remote as:

From [email protected]:woodruffw-experiments/zizmor-recursive-tags.git
3fdd4fca8fc76b254cefefca92381c41b28d1f0d HEAD
3fdd4fca8fc76b254cefefca92381c41b28d1f0d refs/heads/main
1accca34bff60347d96faaf713d328ca1250d37b refs/tags/v1
3fdd4fca8fc76b254cefefca92381c41b28d1f0d refs/tags/v1^{}
bcb36f3d551340e11b88c376e74e8ae77fc6cf0b refs/tags/v1.0
3fdd4fca8fc76b254cefefca92381c41b28d1f0d refs/tags/v1.0^{}
06f9d47abf340b709b412900a7b3ce33557d32b5 refs/tags/v1.0.0
3fdd4fca8fc76b254cefefca92381c41b28d1f0d refs/tags/v1.0.0^{}

So the ^{} entry always fully dereferences to the commit object, which is nice!
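For reference, here is a standalone sketch of the same flattening logic run against the ls-remote output above. The `Tag` struct, `parse_tags`, `peel_tags`, and the `SAMPLE` constant are hypothetical stand-ins for illustration (the real types in github_api.rs may differ), and the parsing assumes whitespace-separated `<oid> <refname>` lines.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the tag type in the diff: a name plus an object ID.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Tag {
    name: String,
    oid: String,
}

/// The tag and branch lines from the ls-remote output quoted in this thread.
const SAMPLE: &str = "\
3fdd4fca8fc76b254cefefca92381c41b28d1f0d\trefs/heads/main
1accca34bff60347d96faaf713d328ca1250d37b\trefs/tags/v1
3fdd4fca8fc76b254cefefca92381c41b28d1f0d\trefs/tags/v1^{}
bcb36f3d551340e11b88c376e74e8ae77fc6cf0b\trefs/tags/v1.0
3fdd4fca8fc76b254cefefca92381c41b28d1f0d\trefs/tags/v1.0^{}
06f9d47abf340b709b412900a7b3ce33557d32b5\trefs/tags/v1.0.0
3fdd4fca8fc76b254cefefca92381c41b28d1f0d\trefs/tags/v1.0.0^{}";

/// Parse `<oid> refs/tags/<name>` lines; anything that is not a tag ref is skipped.
fn parse_tags(ls_remote: &str) -> Vec<Tag> {
    ls_remote
        .lines()
        .filter_map(|line| {
            let (oid, refname) = line.trim().split_once(char::is_whitespace)?;
            let name = refname.trim().strip_prefix("refs/tags/")?;
            Some(Tag {
                name: name.to_string(),
                oid: oid.to_string(),
            })
        })
        .collect()
}

/// Keep one entry per tag, preferring the peeled (`^{}`) entry when present.
/// Lightweight tags have no peeled entry and are kept unchanged.
fn peel_tags(mut tags: Vec<Tag>) -> Vec<Tag> {
    // Names of tags that have a peeled entry, i.e. annotated tags.
    let annotated: HashSet<String> = tags
        .iter()
        .filter_map(|t| t.name.strip_suffix("^{}").map(str::to_string))
        .collect();
    tags.retain_mut(|t| {
        if let Some(stripped) = t.name.strip_suffix("^{}") {
            // Peeled entry: keep it under the plain tag name.
            t.name = stripped.to_string();
            true
        } else {
            // Unpeeled entry: drop it if a peeled twin exists.
            !annotated.contains(&t.name)
        }
    });
    tags
}
```

With this input, `peel_tags(parse_tags(SAMPLE))` leaves three tags (v1, v1.0, v1.0.0), each carrying the commit ID 3fdd4fca8fc76b254cefefca92381c41b28d1f0d, so no recursion is needed even for the nested case.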


github-actions · 2025-10-17T01:25:32Z

Bencher Report

Branch	ww/git2
Testbed	ubuntu-latest

⚠️ WARNING: No Threshold found!
Without a Threshold, no Alerts will ever be generated.
Latency (nanoseconds (ns))
To only post results if a Threshold exists, set the --ci-only-thresholds flag.


Benchmark	Latency	milliseconds (ms)
zizmor::cpython-48f88310044c	📈 view plot ⚠️ NO THRESHOLD	152.27 ms



Bo98 added 2 commits October 15, 2025 21:17

feat: chainable cache

01321a1

feat: use git protocol for branch and tag information

08089d8

woodruffw self-assigned this Oct 15, 2025

woodruffw added the performance label Oct 15, 2025

woodruffw temporarily deployed to bencher October 15, 2025 22:41 — with GitHub Actions Inactive

add a semver helper API

345156f

Signed-off-by: William Woodruff <[email protected]>

woodruffw temporarily deployed to bencher October 15, 2025 23:49 — with GitHub Actions Inactive

woodruffw commented Oct 15, 2025


Merge branch 'main' into ww/git2

b2f5c82

bump gha-hazmat online bench

5083f41

Signed-off-by: William Woodruff <[email protected]>

woodruffw temporarily deployed to bencher October 16, 2025 01:24 — with GitHub Actions Inactive

woodruffw added this to the 1.16.0 milestone Oct 16, 2025

remove old warning

f8ebc41

This is probably from before 1.0. The finding conveys everything relevant here. Signed-off-by: William Woodruff <[email protected]>

woodruffw temporarily deployed to bencher October 16, 2025 02:39 — with GitHub Actions Inactive

add a link

d3cf1f8

Signed-off-by: William Woodruff <[email protected]>

woodruffw temporarily deployed to bencher October 16, 2025 03:03 — with GitHub Actions Inactive

woodruffw commented Oct 16, 2025


bump snapshot

4108152

Unrelated to these changed, just removes a pedantic finding. Signed-off-by: William Woodruff <[email protected]>

woodruffw temporarily deployed to bencher October 16, 2025 03:17 — with GitHub Actions Inactive

add an explicit nested tag testcase

fdbbf24

Signed-off-by: William Woodruff <[email protected]>

woodruffw temporarily deployed to bencher October 16, 2025 04:00 — with GitHub Actions Inactive

Merge branch 'main' into ww/git2

4e437bb

woodruffw temporarily deployed to bencher October 17, 2025 01:19 — with GitHub Actions Inactive

record changes

76a835f

Signed-off-by: William Woodruff <[email protected]>

language

f9faf11

Signed-off-by: William Woodruff <[email protected]>

woodruffw temporarily deployed to bencher October 17, 2025 01:48 — with GitHub Actions Inactive

update docs

997d52f

Signed-off-by: William Woodruff <[email protected]>

woodruffw temporarily deployed to bencher October 17, 2025 01:55 — with GitHub Actions Inactive

woodruffw merged commit e450a59 into main Oct 17, 2025
9 of 10 checks passed

woodruffw deleted the ww/git2 branch October 17, 2025 02:00

This was referenced Oct 17, 2025

Reduce our GitHub API usage #1255

Closed

feat: use git database to fetch references and content instead of github api #801

Closed

Fix wheel builds #1269

Closed

nijel mentioned this pull request Oct 23, 2025

Error: API rate limit exceeded astral-sh/setup-uv#325

Open

BrewTestBot mentioned this pull request Oct 24, 2025

zizmor 1.16.0 Homebrew/homebrew-core#250908

Merged

3 participants
