feat: use git2 and improve HTTP caching #1257
Signed-off-by: William Woodruff <[email protected]>
    .with(ChainedCache(
        Cache(HttpCache {
            mode: CacheMode::Default,
            manager: CACacheManager {
                path: cache_dir.into(),
                remove_opts: Default::default(),
            },
            options: http_cache_options.clone(),
        }),
        CacheType::File,
    ))
    .with(ChainedCache(
        Cache(HttpCache {
            mode: CacheMode::ForceCache,
            manager: MokaManager::new(MokaCache::new(1000)),
            options: http_cache_options,
        }),
        CacheType::Memory,
    ))
@Bo98 just making sure I understand the intended behavior here -- is the idea here that we'll hit the in-memory cache first, and then fall back to the file cache if the former misses? In other words, we'll end up priming the memory cache with the contents of the file cache, if it has a hit?
It'll read from the file cache first. It took me a while to understand how this chaining works but basically each middleware is responsible for firing the next one and then it pops up from there.
File cache HIT
Logging middleware (always does next.run) -> File cache (no next.run on a hit) -> Logging post-response
File cache MISS/Memory cache HIT
Logging -> File cache -> Memory -> File cache post-response (will save to cache if policy allows) -> Logging post-response
(In practice, this scenario is for things the file cache rejected anyway)
Both cache MISS
Logging -> File Cache -> Memory -> Default (i.e. fetch remote) -> Memory post-response (save) -> File post-response (save) -> Logging post-response
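The three scenarios above can be reproduced with a toy model of the chain. This is only a sketch of the "each middleware fires the next one, then control pops back up" behaviour; `Middleware`, `run`, and `trace_for` are hypothetical names for illustration and are not part of `reqwest-middleware` or `http-cache`. It also assumes a cache hit never calls the next layer, which (per the later comments about non-200 responses) is a simplification.

```rust
/// Toy middleware kinds (hypothetical; not the real crate types).
#[derive(Clone, Copy)]
pub enum Middleware {
    /// Always calls the next layer, then records a post-response step.
    Logging,
    /// On a hit, answers directly; on a miss, calls the next layer and
    /// then gets a post-response step where it may save the result.
    Cache { name: &'static str, hit: bool },
}

/// Runs the stack top to bottom, recording each step in `trace`.
/// An empty stack stands for the default behaviour: fetch from the remote.
pub fn run(stack: &[Middleware], trace: &mut Vec<String>) {
    match stack.split_first() {
        None => trace.push("fetch remote".to_string()),
        Some((Middleware::Logging, rest)) => {
            trace.push("logging".to_string());
            run(rest, trace);
            trace.push("logging post-response".to_string());
        }
        Some((Middleware::Cache { name, hit }, rest)) => {
            trace.push(format!("{name} cache"));
            if !*hit {
                // Miss: defer to the next layer, then run our own
                // post-response step on the way back up.
                run(rest, trace);
                trace.push(format!("{name} cache post-response"));
            }
        }
    }
}

/// Builds the order used in this PR (logging, file cache, memory cache)
/// and returns the resulting trace.
pub fn trace_for(file_hit: bool, memory_hit: bool) -> Vec<String> {
    let stack = [
        Middleware::Logging,
        Middleware::Cache { name: "file", hit: file_hit },
        Middleware::Cache { name: "memory", hit: memory_hit },
    ];
    let mut trace = Vec::new();
    run(&stack, &mut trace);
    trace
}
```

Calling `trace_for(false, false)` walks all the way down to the remote fetch and back up through both caches, matching the "both cache MISS" line above.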
I initially thought the order was important, but it probably isn't, so you can probably adjust the order if you want. Ultimately speed isn't really what matters here - it's mainly so that we avoid making the same request twice in a single run regardless of what Cache-Control says, but still respect it across different runs.
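A minimal sketch of that goal, under loud assumptions: `PersistentCache` and `Run` are hypothetical stand-ins (not `http-cache` types), the `cacheable` flag simulates whether Cache-Control permits storing a response, and expiry is ignored entirely. The per-run in-memory map stores everything, while the persistent store only keeps what the policy allows.

```rust
use std::collections::HashMap;

/// Stand-in for the on-disk cache: survives across runs, but only
/// stores responses whose (simulated) policy allows it.
#[derive(Default)]
pub struct PersistentCache {
    entries: HashMap<String, String>,
}

/// One run of the tool: a fresh in-memory cache layered over a shared
/// persistent cache.
pub struct Run<'a> {
    persistent: &'a mut PersistentCache,
    memory: HashMap<String, String>,
    /// Number of simulated remote fetches made during this run.
    pub remote_calls: usize,
}

impl<'a> Run<'a> {
    pub fn new(persistent: &'a mut PersistentCache) -> Self {
        Run { persistent, memory: HashMap::new(), remote_calls: 0 }
    }

    /// Fetches `url`, consulting the persistent cache, then the in-memory
    /// cache, and only then the (simulated) remote.
    pub fn get(&mut self, url: &str, cacheable: bool) -> String {
        if let Some(body) = self.persistent.entries.get(url) {
            return body.clone();
        }
        if let Some(body) = self.memory.get(url) {
            return body.clone();
        }
        self.remote_calls += 1;
        let body = format!("response for {url}");
        // The in-memory cache always stores, regardless of policy.
        self.memory.insert(url.to_string(), body.clone());
        // The persistent cache stores only when the policy allows it.
        if cacheable {
            self.persistent.entries.insert(url.to_string(), body.clone());
        }
        body
    }
}
```

Within one `Run`, a second `get` for a non-cacheable URL is served from memory; a later `Run` over the same `PersistentCache` has to fetch it again, while cacheable URLs are served from the persistent store.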
Thanks, that's helpful!
Ah I think I remember one thing about the order: for non-200 codes I found the default cache policy (i.e. the file cache) reported cache hits while still querying the remote: https://github.com/06chaynes/http-cache/blob/12dde469fa2519d0fbb13b826db0b9a6706aafd5/http-cache/src/lib.rs#L1709. It's technically correct but was confusing when trying to debug the API.
So behaviour wise in terms of # of API calls it won't (or shouldn't) make a difference, but changing the order might make the debug logging more confusing given a Hit can still do a remote call for the file cache. The logging always takes the bottom of the stack so any form of pass back to the memory cache will ignore what the file cache attempts to claim.
Noting for myself, this is a baseline from
The initial results here are super promising, although the SD/range is huge: However, even at the worst end, that's a lot better than (This is with Similar results on re-run: Interestingly, the same initial outlier behavior occurs: I don't understand why that is, since we're post-warmup.
The benchmark for this PR is also slightly pessimized, since I needed to change
Signed-off-by: William Woodruff <[email protected]>
Most GitHub API responses have 60-second expiries. I wonder if the cache is expiring just as the warmup ends. It might be interesting to see what the results are with the file cache commented out. Other than that I don't really have much of an idea. The git2 code does have a cache, but it should be all in-memory.
Yeah, that's a good theory. I guess that would also explain the large SD in the warm benchmarks -- the cache gets expired somewhat unevenly across the runs.
This is probably from before 1.0. The finding conveys everything relevant here. Signed-off-by: William Woodruff <[email protected]>
Signed-off-by: William Woodruff <[email protected]>
    let annotated_tags: HashSet<_> = tags
        .iter()
        .filter_map(|tag| tag.name.strip_suffix("^{}").map(|n| n.to_string()))
        .collect();
    tags.retain_mut(|tag| {
        if let Some(stripped_name) = tag.name.strip_suffix("^{}") {
            tag.name = stripped_name.to_string();
            true
        } else {
            !annotated_tags.contains(&tag.name)
        }
    });
I haven't fully thought through this yet, but I think there might be an edge case here -- tags can point to other tags (not just commits), so to fully flatten a tag to its underlying commit we probably need to recurse here.
(OTOH, we have tests for ref-version-mismatch that test this exact situation and they're passing, so I guess this works as is? I'm tired.)
Oh, I'm realizing that test might have changed underneath us, and is no longer the repo's state. I'm going to contrive a test for this.
Yeah, that test seems to be a regular annotated tag:
$ git show-ref -d v2.7.8
aa7c1c80a07a27a84c0aa76d0cef0aad3830e330 refs/tags/v2.7.8
9d47c6ad4b02e050fd481d890b2ea34778fd09d6 refs/tags/v2.7.8^{}
Which is still a tag pointing to tag as described so still a good/correct test to have (and is the scenario already covered here), but I guess you're talking about a theoretical tag to a tag to a tag.
If a third level is possible I'd be interested to see how git ls-remote behaves.
Yeah, I was thinking about the fully general case. But it looks like git ls-remote does handle this for us:
https://github.com/woodruffw-experiments/zizmor-recursive-tags
has:
v1.0 (annotated) -> v1 (annotated) -> v1.0.0 (annotated) -> 3fdd4fc
which shows up in ls-remote as:
From git@github.com:woodruffw-experiments/zizmor-recursive-tags.git
3fdd4fca8fc76b254cefefca92381c41b28d1f0d HEAD
3fdd4fca8fc76b254cefefca92381c41b28d1f0d refs/heads/main
1accca34bff60347d96faaf713d328ca1250d37b refs/tags/v1
3fdd4fca8fc76b254cefefca92381c41b28d1f0d refs/tags/v1^{}
bcb36f3d551340e11b88c376e74e8ae77fc6cf0b refs/tags/v1.0
3fdd4fca8fc76b254cefefca92381c41b28d1f0d refs/tags/v1.0^{}
06f9d47abf340b709b412900a7b3ce33557d32b5 refs/tags/v1.0.0
3fdd4fca8fc76b254cefefca92381c41b28d1f0d refs/tags/v1.0.0^{}
So the ^{} entry does always fully dereference the commit object, which is nice!
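To check this against the filtering logic under review, here is a self-contained sketch that runs the same `retain_mut` approach over the tag entries from the ls-remote listing above. `RemoteTag`, `peel_tags`, and `sample_tags` are hypothetical names for illustration; the PR's actual tag type and the form of its `name` field (full ref versus short name) may differ.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the PR's tag type.
#[derive(Debug, Clone, PartialEq)]
pub struct RemoteTag {
    pub name: String,
    pub oid: String,
}

/// Keeps one entry per tag: the peeled `^{}` entry (renamed to the plain
/// tag name) for annotated tags, or the plain entry for lightweight tags.
pub fn peel_tags(mut tags: Vec<RemoteTag>) -> Vec<RemoteTag> {
    // Names of all tags that have a peeled `^{}` counterpart.
    let annotated_tags: HashSet<String> = tags
        .iter()
        .filter_map(|tag| tag.name.strip_suffix("^{}").map(|n| n.to_string()))
        .collect();
    tags.retain_mut(|tag| {
        if let Some(stripped_name) = tag.name.strip_suffix("^{}") {
            // Peeled entry: keep it under the plain tag name.
            tag.name = stripped_name.to_string();
            true
        } else {
            // Plain entry: drop it if a peeled entry supersedes it.
            !annotated_tags.contains(&tag.name)
        }
    });
    tags
}

/// The tag entries from the ls-remote output quoted in this thread.
pub fn sample_tags() -> Vec<RemoteTag> {
    const COMMIT: &str = "3fdd4fca8fc76b254cefefca92381c41b28d1f0d";
    let raw = [
        ("refs/tags/v1", "1accca34bff60347d96faaf713d328ca1250d37b"),
        ("refs/tags/v1^{}", COMMIT),
        ("refs/tags/v1.0", "bcb36f3d551340e11b88c376e74e8ae77fc6cf0b"),
        ("refs/tags/v1.0^{}", COMMIT),
        ("refs/tags/v1.0.0", "06f9d47abf340b709b412900a7b3ce33557d32b5"),
        ("refs/tags/v1.0.0^{}", COMMIT),
    ];
    raw.iter()
        .map(|(name, oid)| RemoteTag {
            name: name.to_string(),
            oid: oid.to_string(),
        })
        .collect()
}
```

Because every `^{}` entry in the listing already points at the commit, `peel_tags(sample_tags())` leaves three tags that all resolve to `3fdd4fc...` without any recursion on the client side.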
Unrelated to these changes, just removes a pedantic finding. Signed-off-by: William Woodruff <[email protected]>
Signed-off-by: William Woodruff <[email protected]>
| Branch | ww/git2 |
| Testbed | ubuntu-latest |
| Benchmark | Latency | milliseconds (ms) |
|---|---|---|
| zizmor::cpython-48f88310044c | 📈 view plot | 152.27 ms |
Signed-off-by: William Woodruff <[email protected]>
Signed-off-by: William Woodruff <[email protected]>
Signed-off-by: William Woodruff <[email protected]>
See #1255.