Cache ACR tokens in deployment state to avoid redundant authentication #12792
- Add IDeploymentStateManager dependency to AcrLoginService
- Cache refresh tokens with expiration timestamps in deployment state
- Check for valid cached tokens before performing fresh login
- Store tokens in "AcrTokens" section keyed by registry+tenant
- Add 5-minute safety margin to expiration checks
- Default to 3-hour expiration if not provided by ACR
- Gracefully handle caching failures without breaking login
- Add comprehensive test coverage for caching behavior

Co-authored-by: davidfowl <[email protected]>
- Acquire deployment state section once per login operation
- Use registry URL as section name (not combined with tenant)
- Store tokens keyed by tenantId within each registry section
- Only save section when acquiring fresh token (not when using cached)
- Handle 401/403 errors from cached tokens and retry with fresh token
- Add test for 401 retry behavior
- Update all existing tests to match new section naming scheme

Co-authored-by: davidfowl <[email protected]>
@copilot Use TimeProvider so we can test the time without sleeping. Use FakeTimeProvider to test it.
- Inject TimeProvider instead of using DateTime.UtcNow directly
- Update all tests to use FakeTimeProvider for time-based testing
- Add ToJsonNode method on CachedToken that manually constructs JSON
- Remove skipped test; it now uses FakeTimeProvider.Advance() instead of Task.Delay
- All 8 tests now passing without requiring long delays

Co-authored-by: davidfowl <[email protected]>
Done. Updated to use |
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 12792
Or
iex "& { $(irm https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 12792"
Pull Request Overview
This PR adds token caching functionality to the AcrLoginService to improve performance and reduce unnecessary authentication requests when logging into Azure Container Registry. The caching mechanism uses the IDeploymentStateManager to persist tokens with expiration tracking, and includes retry logic for rejected cached tokens.
Key changes:
- Added token caching using IDeploymentStateManager to store ACR refresh tokens with expiration times
- Implemented expiration-aware token reuse with a 5-minute safety margin for clock skew
- Added retry logic to fetch fresh tokens when cached tokens are rejected with 401/403 errors
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/Aspire.Hosting.Azure/AcrLoginService.cs | Added token caching infrastructure with IDeploymentStateManager and TimeProvider dependencies; implemented cache lookup, validation, and storage logic with retry on authentication failures |
| tests/Aspire.Hosting.Azure.Tests/AcrLoginServiceTests.cs | Comprehensive test suite covering token caching scenarios including cache hits/misses, expiration, multi-tenant/multi-registry support, retry on 401/403, and graceful degradation on cache failures |
    return new System.Text.Json.Nodes.JsonObject
    {
        ["refresh_token"] = RefreshToken,
        ["expires_at_utc"] = ExpiresAtUtc
    };
The `ToJsonNode()` method uses `System.Text.Json.Nodes.JsonObject` with a fully qualified name, which is inconsistent with the style elsewhere in the file. Consider adding `using System.Text.Json.Nodes;` at the top of the file and using just `JsonObject` for consistency with the rest of the codebase.
Fixed. Added `using System.Text.Json.Nodes;` and now use `JsonObject` directly (commit d313451).
    catch (InvalidOperationException ex) when (ex.Message.Contains("401") || ex.Message.Contains("403") ||
        ex.Message.Contains("Unauthorized") || ex.Message.Contains("Forbidden"))
Catching exceptions and checking their message strings is fragile and error-prone. Consider documenting which container runtimes throw InvalidOperationException with these specific messages, or refactor the container runtime interface to throw more specific exception types that can be caught directly.
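One way to read the second suggestion above is a dedicated exception that carries the HTTP status, so the retry path can filter on a typed value rather than on message text. The sketch below is illustrative only: `RegistryAuthenticationException` and `LoginRetry` are hypothetical names that do not appear in this PR, and whether the container runtime can surface a status code at all is an assumption.

```csharp
using System;
using System.Net;

// Hypothetical exception type, introduced here for illustration only.
public sealed class RegistryAuthenticationException : Exception
{
    public RegistryAuthenticationException(HttpStatusCode statusCode, string message)
        : base(message) => StatusCode = statusCode;

    public HttpStatusCode StatusCode { get; }

    // True only for the two statuses the PR treats as "cached token rejected".
    public bool IsAuthRejection =>
        StatusCode is HttpStatusCode.Unauthorized or HttpStatusCode.Forbidden;
}

// Hypothetical helper showing the retry shape; not the PR's actual method.
public static class LoginRetry
{
    // Tries the cached token first and falls back to a fresh token on 401/403.
    // Any other failure propagates to the caller unchanged.
    public static string LoginWithFallback(Func<string> loginWithCached, Func<string> loginWithFresh)
    {
        try
        {
            return loginWithCached();
        }
        catch (RegistryAuthenticationException ex) when (ex.IsAuthRejection)
        {
            return loginWithFresh();
        }
    }
}
```

With this shape the exception filter no longer depends on how a particular runtime words its error output, at the cost of changing what the runtime abstraction throws.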
    public static Task<bool> InspectImageAsync(string _, CancellationToken __ = default) => Task.FromResult(true);
    public static Task PullImageAsync(string _, CancellationToken __ = default) => Task.CompletedTask;
These methods are declared as static but they're part of a class implementing IContainerRuntime interface. If the interface declares these as instance methods, this will cause a compilation error. These should be instance methods, not static.
Suggested change:

    - public static Task<bool> InspectImageAsync(string _, CancellationToken __ = default) => Task.FromResult(true);
    - public static Task PullImageAsync(string _, CancellationToken __ = default) => Task.CompletedTask;
    + public Task<bool> InspectImageAsync(string _, CancellationToken __ = default) => Task.FromResult(true);
    + public Task PullImageAsync(string _, CancellationToken __ = default) => Task.CompletedTask;
Fixed. Removed static modifier from both methods (commit d313451).
    // Use the registry endpoint as the section name
    // Replace dots and other characters that might not be suitable for section names
    return $"AcrTokens:{registryEndpoint.Replace('.', '_')}";
The comment mentions 'dots and other characters' but the code only replaces dots. Either update the comment to accurately reflect what's being replaced (just dots), or add handling for other potentially problematic characters if they exist.
Updated comment to say "Replace dots with underscores" instead of "dots and other characters" (commit d313451).
    // Default to 3 hours (10800 seconds) if not provided by ACR
    var expiresIn = tokenResponse.ExpiresIn ?? 10800;

    return (tokenResponse.RefreshToken, expiresIn);
@copilot - This is a bad assumption. Change the code to return null for the expiresIn in this case, and handle it in the callers by not caching it.
Changed to return null for expiresIn when not provided by ACR. Tokens without expiration are now not cached at all. Updated test to verify this behavior (commit d313451).
    var newCachedToken = new CachedToken
    {
        RefreshToken = refreshToken,
        ExpiresAtUtc = _timeProvider.GetUtcNow().AddSeconds(expiresIn).UtcDateTime
@copilot - we should do this calculation as soon as possible after getting the expiresIn value from the server. We shouldn't be doing operations in between.
Moved the expiration calculation to immediately after receiving the expiresIn value from the server, before any logging or section acquisition operations (commit 583d794).
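The two review requests in this thread (no default lifetime, and computing the timestamp right away) can be sketched together as follows. This is a minimal illustration under stated assumptions, not the PR's code: the `CachedToken` record shape, `TokenExpiry`, `TryCreate`, and `IsUsable` are hypothetical names, and applying the 5-minute margin at read time is one possible reading of the PR overview.

```csharp
#nullable enable
using System;

// Hypothetical shape of the cached entry; the PR's CachedToken may differ.
public sealed record CachedToken(string RefreshToken, DateTime ExpiresAtUtc);

public static class TokenExpiry
{
    // Safety margin taken from the PR overview (5 minutes).
    private static readonly TimeSpan SafetyMargin = TimeSpan.FromMinutes(5);

    // Converts the server's relative lifetime into an absolute UTC timestamp
    // in the same step that reads it, so no other work happens in between.
    // Returns null when no lifetime was provided, signalling "do not cache".
    public static CachedToken? TryCreate(string refreshToken, int? expiresInSeconds, TimeProvider timeProvider)
    {
        if (expiresInSeconds is not int seconds)
        {
            return null;
        }

        var expiresAtUtc = timeProvider.GetUtcNow().AddSeconds(seconds).UtcDateTime;
        return new CachedToken(refreshToken, expiresAtUtc);
    }

    // A cached token is reused only if it is still valid after the margin.
    public static bool IsUsable(CachedToken token, TimeProvider timeProvider) =>
        timeProvider.GetUtcNow().UtcDateTime + SafetyMargin < token.ExpiresAtUtc;
}
```

Because both methods take a `TimeProvider`, a test can pass a fake clock and advance it past the expiration instead of waiting in real time.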
- Add using System.Text.Json.Nodes and simplify JsonObject usage
- Fix static modifier on interface methods in test class
- Update comment to accurately reflect only dots are replaced
- Change expiresIn handling to return null instead of defaulting to 3 hours
- Skip caching when ACR doesn't provide expiration time
- Update test to verify tokens aren't cached without expiration
- Add CultureInfo.InvariantCulture to ToString() call

Co-authored-by: eerhardt <[email protected]>
Move the ExpiresAtUtc calculation to immediately after getting the expiresIn value from the server, before any logging or other operations. This minimizes time drift between when the server provides the expiration time and when we calculate the actual timestamp. Co-authored-by: eerhardt <[email protected]>
Description
ACR login performs OAuth2 token exchange on every deployment, causing unnecessary latency and potential rate limiting. This caches refresh tokens in deployment state with expiration tracking.
Implementation:
- `AcrTokens:{registry}` sections store tokens keyed by `tenantId`
- `TimeProvider` for time operations, enabling deterministic testing with `FakeTimeProvider`
- `CachedToken.ToJsonNode()` manually constructs JSON for optimal performance

Example state structure:

    {
      "AcrTokens:myregistry_azurecr_io": {
        "tenant-1": {
          "refresh_token": "...",
          "expires_at_utc": "2025-11-07T10:30:00Z"
        }
      }
    }

Testing improvements:

- `FakeTimeProvider.Advance()` instead of `Task.Delay`

Fixes #(issue)
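To make the state layout described above concrete, the following sketch builds one registry section by hand with `JsonObject`. The section-name rule and the two property names come from the code quoted in this conversation; the `AcrTokenState` class, the `BuildSection` helper, and the timestamp format string are assumptions made for illustration.

```csharp
using System;
using System.Globalization;
using System.Text.Json.Nodes;

// Hypothetical helper class; the PR keeps this logic inside AcrLoginService.
public static class AcrTokenState
{
    // Section name per registry: dots are replaced with underscores.
    public static string GetSectionName(string registryEndpoint) =>
        $"AcrTokens:{registryEndpoint.Replace('.', '_')}";

    // Manually constructed JSON for a single cached token.
    // The ISO-8601 format string is an assumption, not taken from the PR.
    public static JsonObject ToJsonNode(string refreshToken, DateTime expiresAtUtc) => new JsonObject
    {
        ["refresh_token"] = refreshToken,
        ["expires_at_utc"] = expiresAtUtc.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture)
    };

    // One section holds tokens keyed by tenant ID.
    public static JsonObject BuildSection(string tenantId, string refreshToken, DateTime expiresAtUtc) => new JsonObject
    {
        [tenantId] = ToJsonNode(refreshToken, expiresAtUtc)
    };
}
```

For example, `AcrTokenState.GetSectionName("myregistry.azurecr.io")` yields `AcrTokens:myregistry_azurecr_io`, which matches the key in the example state structure.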