
Conversation

@woodruffw
Member

See #1255.

Signed-off-by: William Woodruff <[email protected]>
Comment on lines +285 to +303
.with(ChainedCache(
    Cache(HttpCache {
        mode: CacheMode::Default,
        manager: CACacheManager {
            path: cache_dir.into(),
            remove_opts: Default::default(),
        },
        options: http_cache_options.clone(),
    }),
    CacheType::File,
))
.with(ChainedCache(
    Cache(HttpCache {
        mode: CacheMode::ForceCache,
        manager: MokaManager::new(MokaCache::new(1000)),
        options: http_cache_options,
    }),
    CacheType::Memory,
))
Member Author

@Bo98 just making sure I understand the intended behavior here -- is the idea here that we'll hit the in-memory cache first, and then fall back to the file cache if the former misses? In other words, we'll end up priming the memory cache with the contents of the file cache, if it has a hit?

Contributor

@Bo98 Bo98 Oct 16, 2025

It'll read from the file cache first. It took me a while to understand how this chaining works, but basically each middleware is responsible for calling the next one, and the response then bubbles back up from there.

File cache HIT

Logging middleware (always does next.run) -> File cache (no next.run on a hit) -> Logging post-response

File cache MISS/Memory cache HIT

Logging -> File cache -> Memory -> File cache post-response (will save to cache if policy allows) -> Logging post-response

(In practice, this scenario is for things the file cache rejected anyway)

Both cache MISS

Logging -> File Cache -> Memory -> Default (i.e. fetch remote) -> Memory post-response (save) -> File post-response (save) -> Logging post-response
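
The flows above can be modelled with a toy "onion" stack. This is only a sketch under stated assumptions: `Layer`, `run`, and the URL are hypothetical names for illustration, not the real middleware or http-cache API, and the toy caches save every response unconditionally instead of consulting a cache policy.

```rust
use std::collections::HashMap;

/// A layer in the toy stack (hypothetical; not the real middleware types).
enum Layer {
    Logging,
    Cache { name: &'static str, store: HashMap<String, String> },
}

/// Runs the first layer, which decides whether to call the rest of the stack.
fn run(layers: &mut [Layer], url: &str, trace: &mut Vec<String>) -> String {
    let Some((first, rest)) = layers.split_first_mut() else {
        // Bottom of the stack: the default handler that fetches the remote.
        trace.push("remote fetch".to_string());
        return format!("body of {url}");
    };
    match first {
        Layer::Logging => {
            trace.push("logging".to_string());
            let body = run(rest, url, trace); // always calls the next layer
            trace.push("logging post-response".to_string());
            body
        }
        Layer::Cache { name, store } => {
            if let Some(hit) = store.get(url) {
                trace.push(format!("{name} HIT"));
                return hit.clone(); // a hit never calls the next layer
            }
            trace.push(format!("{name} MISS"));
            let body = run(rest, url, trace);
            // Post-response step: save on the way back up the stack.
            store.insert(url.to_string(), body.clone());
            trace.push(format!("{name} post-response (save)"));
            body
        }
    }
}

fn main() {
    let mut stack = vec![
        Layer::Logging,
        Layer::Cache { name: "file", store: HashMap::new() },
        Layer::Cache { name: "memory", store: HashMap::new() },
    ];
    // The same request twice: first both caches miss, then the file cache hits.
    for _ in 0..2 {
        let mut trace = Vec::new();
        run(&mut stack, "https://example.invalid/tags", &mut trace);
        println!("{}", trace.join(" -> "));
    }
}
```

The first request walks the whole stack and is saved by both caches on the way back; the second stops at the file cache, so only the logging layer sees the response afterwards.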

Contributor

@Bo98 Bo98 Oct 16, 2025

I initially thought the order was important, but it probably isn't, so you can adjust the order if you want. Ultimately speed isn't really what matters here: it's mainly so that we avoid making the same request twice in a single run regardless of what Cache-Control says, but still respect it across different runs.

Member Author

Thanks, that's helpful!

Contributor

@Bo98 Bo98 Oct 16, 2025

Ah I think I remember one thing about the order: for non-200 codes I found the default cache policy (i.e. the file cache) reported cache hits while still querying the remote: https://github.com/06chaynes/http-cache/blob/12dde469fa2519d0fbb13b826db0b9a6706aafd5/http-cache/src/lib.rs#L1709. It's technically correct but was confusing when trying to debug the API.

So behaviour wise in terms of # of API calls it won't (or shouldn't) make a difference, but changing the order might make the debug logging more confusing given a Hit can still do a remote call for the file cache. The logging always takes the bottom of the stack so any form of pass back to the memory cache will ignore what the file cache attempts to claim.

@woodruffw
Member Author

Noting for myself, this is a baseline from main via #1258:

  Time (mean ± σ):     84.253 s ±  2.481 s    [User: 1.362 s, System: 1.608 s]
  Range (min … max):   80.210 s … 87.656 s    10 runs

@woodruffw
Member Author

woodruffw commented Oct 16, 2025

The initial results here are super promising, although the SD/range is huge:

  Time (mean ± σ):     24.903 s ± 19.422 s    [User: 0.616 s, System: 0.843 s]
  Range (min … max):    9.224 s … 48.308 s    10 runs

However, even at the worst end, that's a lot better than main.

(This is with gha-hazmat, which is pretty much the worst possible case in terms of API lookups due to impostor commits.)

Similar results on re-run:

  Time (mean ± σ):     25.593 s ± 19.844 s    [User: 0.622 s, System: 0.728 s]
  Range (min … max):    9.296 s … 49.699 s    10 runs

Interestingly, the same initial outlier behavior occurs:

  Warning: The first benchmarking run for this command was significantly slower than the rest (48.377 s). This could be caused by (filesystem) caches that were not filled until after the first run. You are already using the '--warmup' option which helps to fill these caches before the actual benchmark. You can either try to increase the warmup count further or re-run this benchmark on a quiet system in case it was a random outlier. Alternatively, consider using the '--prepare' option to clear the caches before each timing run.

I don't understand why that is, since we're post-warmup.

@woodruffw
Member Author

woodruffw commented Oct 16, 2025

The benchmark for this PR is also slightly pessimized, since I needed to change gha-hazmat to remove a nonexistent reusable workflow (which now causes a 404 that was previously masked as an empty set). That's technically a breaking change but it's a contractually correct one (zizmor doesn't intentionally let 404s slide, that one was an oversight.)

Signed-off-by: William Woodruff <[email protected]>
@woodruffw woodruffw added this to the 1.16.0 milestone Oct 16, 2025
@Bo98
Contributor

Bo98 commented Oct 16, 2025

Most GitHub API calls are 60 second expiries. I wonder if that is expiring just as the warmup ends. Might be interesting to see what the results are with the file cache commented out.

Other than that I don't really have much of an idea. git2 code does have a cache but it should be all in-memory.

@woodruffw
Member Author

> Most GitHub API calls are 60 second expiries. I wonder if that is expiring just as the warmup ends. Might be interesting to see what the results are with the file cache commented out.

Yeah, that's a good theory. I guess that would also explain the large SD in the warm benchmarks -- the cache gets expired somewhat unevenly across the runs.

This is probably from before 1.0. The finding
conveys everything relevant here.

Signed-off-by: William Woodruff <[email protected]>
Signed-off-by: William Woodruff <[email protected]>
Comment on lines +408 to +419
let annotated_tags: HashSet<_> = tags
    .iter()
    .filter_map(|tag| tag.name.strip_suffix("^{}").map(|n| n.to_string()))
    .collect();
tags.retain_mut(|tag| {
    if let Some(stripped_name) = tag.name.strip_suffix("^{}") {
        tag.name = stripped_name.to_string();
        true
    } else {
        !annotated_tags.contains(&tag.name)
    }
});
Member Author

I haven't fully thought through this yet, but I think there might be an edge case here -- tags can point to other tags (not just commits), so to fully flatten a tag to its underlying commit we probably need to recurse here.

(OTOH, we have tests for ref-version-mismatch that test this exact situation and they're passing, so I guess this works as is? I'm tired.)

Member Author

Oh, I'm realizing that test might have changed underneath us, and is no longer the repo's state. I'm going to contrive a test for this.

Contributor

@Bo98 Bo98 Oct 16, 2025

Yeah, that test seems to be a regular annotated tag:

$ git show-ref -d v2.7.8
aa7c1c80a07a27a84c0aa76d0cef0aad3830e330 refs/tags/v2.7.8
9d47c6ad4b02e050fd481d890b2ea34778fd09d6 refs/tags/v2.7.8^{}

Which is still a tag pointing to a tag as described, so it's still a good/correct test to have (and is the scenario already covered here), but I guess you're talking about a theoretical tag to a tag to a tag.

If a third level is possible I'd be interested to see how git ls-remote behaves.

Member Author

Yeah, I was thinking about the fully general case. But it looks like git ls-remote does handle this for us:

https://github.com/woodruffw-experiments/zizmor-recursive-tags

has:

v1.0 (annotated) -> v1 (annotated) -> v1.0.0 (annotated) -> 3fdd4fc

which shows up in ls-remote as:

From git@github.com:woodruffw-experiments/zizmor-recursive-tags.git
3fdd4fca8fc76b254cefefca92381c41b28d1f0d        HEAD
3fdd4fca8fc76b254cefefca92381c41b28d1f0d        refs/heads/main
1accca34bff60347d96faaf713d328ca1250d37b        refs/tags/v1
3fdd4fca8fc76b254cefefca92381c41b28d1f0d        refs/tags/v1^{}
bcb36f3d551340e11b88c376e74e8ae77fc6cf0b        refs/tags/v1.0
3fdd4fca8fc76b254cefefca92381c41b28d1f0d        refs/tags/v1.0^{}
06f9d47abf340b709b412900a7b3ce33557d32b5        refs/tags/v1.0.0
3fdd4fca8fc76b254cefefca92381c41b28d1f0d        refs/tags/v1.0.0^{}

So the ^{} entry always fully dereferences to the underlying commit object, which is nice!
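
To illustrate why no recursion is needed, here is a sketch that runs the same `^{}` filter from the diff over the `ls-remote` output above. `Tag`, `parse_tags`, and `flatten` are hypothetical stand-ins written for this example, not zizmor's actual types or functions.

```rust
use std::collections::HashSet;

/// The `git ls-remote` output quoted above, minus the "From" line.
const SAMPLE: &str = "\
3fdd4fca8fc76b254cefefca92381c41b28d1f0d        HEAD
3fdd4fca8fc76b254cefefca92381c41b28d1f0d        refs/heads/main
1accca34bff60347d96faaf713d328ca1250d37b        refs/tags/v1
3fdd4fca8fc76b254cefefca92381c41b28d1f0d        refs/tags/v1^{}
bcb36f3d551340e11b88c376e74e8ae77fc6cf0b        refs/tags/v1.0
3fdd4fca8fc76b254cefefca92381c41b28d1f0d        refs/tags/v1.0^{}
06f9d47abf340b709b412900a7b3ce33557d32b5        refs/tags/v1.0.0
3fdd4fca8fc76b254cefefca92381c41b28d1f0d        refs/tags/v1.0.0^{}
";

/// Hypothetical tag record: a name plus the object ID it was listed with.
#[derive(Debug, Clone, PartialEq)]
struct Tag {
    name: String,
    oid: String,
}

/// Keeps only `refs/tags/*` lines and splits each into (oid, name).
fn parse_tags(ls_remote: &str) -> Vec<Tag> {
    ls_remote
        .lines()
        .filter_map(|line| {
            let (oid, refname) = line.split_once(char::is_whitespace)?;
            let name = refname.trim().strip_prefix("refs/tags/")?;
            Some(Tag { name: name.to_string(), oid: oid.to_string() })
        })
        .collect()
}

/// Same shape as the filter in the diff: prefer the peeled `^{}` entry
/// and drop the unpeeled entry for any annotated tag.
fn flatten(mut tags: Vec<Tag>) -> Vec<Tag> {
    let annotated: HashSet<String> = tags
        .iter()
        .filter_map(|t| t.name.strip_suffix("^{}").map(str::to_string))
        .collect();
    tags.retain_mut(|t| {
        if let Some(stripped) = t.name.strip_suffix("^{}") {
            t.name = stripped.to_string();
            true
        } else {
            !annotated.contains(&t.name)
        }
    });
    tags
}

fn main() {
    for tag in flatten(parse_tags(SAMPLE)) {
        println!("{} -> {}", tag.name, tag.oid);
    }
}
```

With this input, all three tags end up mapped to the 3fdd4fc commit even though v1.0 and v1 point at other tags, because each peeled entry already carries the final commit ID.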

Unrelated to these changes, just removes a pedantic finding.

Signed-off-by: William Woodruff <[email protected]>
Signed-off-by: William Woodruff <[email protected]>
@github-actions
Contributor

github-actions bot commented Oct 17, 2025

🐰 Bencher Report

Branch: ww/git2
Testbed: ubuntu-latest

⚠️ WARNING: No Threshold found!

Without a Threshold, no Alerts will ever be generated.
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds flag.

Benchmark                      Latency (ms)
zizmor::cpython-48f88310044c   152.27 ms (no threshold)

Signed-off-by: William Woodruff <[email protected]>
Signed-off-by: William Woodruff <[email protected]>
Signed-off-by: William Woodruff <[email protected]>
@woodruffw woodruffw merged commit e450a59 into main Oct 17, 2025
9 of 10 checks passed
@woodruffw woodruffw deleted the ww/git2 branch October 17, 2025 02:00