# Cache list_models across all SDK languages to prevent rate limiting under concurrency #300
## Conversation
Co-authored-by: friggeri <[email protected]>
@copilot shouldn't you do this across all languages?
**Cross-SDK Consistency Review**

This PR adds caching for `list_models()` in the Python SDK. However, I've identified a cross-SDK consistency issue.

**Current Status Across SDKs**

Only the Python SDK gains caching in this PR; the Node.js, Go, and .NET SDKs are unchanged.

**Analysis**

All three other SDKs have the same pattern as the original Python code: every call to list models issues a fresh backend request. This means Node.js, Go, and .NET users will encounter the same 429 rate limiting errors when running concurrent sessions with model specifications, just as described in issue #299.

**Recommendation**

To maintain feature parity and provide a consistent developer experience, consider applying the same caching strategy to the other SDKs (Node.js, Go, and .NET). Each implementation should follow language-appropriate patterns (e.g., Node.js promises, Go mutexes, .NET locks) but maintain the same semantic behavior: cache after the first successful fetch, serialize concurrent callers, and clear the cache on disconnect.

**This PR**

The Python implementation looks solid and can serve as a reference for the other SDKs. No changes needed here; this is ready to merge! The follow-up work for the other SDKs can be tracked separately.
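For reference, here is a minimal sketch of the per-client caching pattern being recommended, written in Python with an `asyncio.Lock`. The class and method names are illustrative, not the SDK's actual internals:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ModelInfo:  # illustrative stand-in for the SDK's model type
    id: str
    name: str


class Client:
    def __init__(self) -> None:
        self._models_cache: list[ModelInfo] | None = None
        self._models_cache_lock = asyncio.Lock()

    async def _request_models(self) -> list[ModelInfo]:
        # Placeholder for the real RPC call (a "models.list" request).
        raise NotImplementedError

    async def list_models(self) -> list[ModelInfo]:
        # Serialize cache checks so concurrent callers trigger at most one fetch.
        async with self._models_cache_lock:
            if self._models_cache is None:
                self._models_cache = await self._request_models()
            # Return a shallow copy so callers cannot mutate the cached list itself.
            return list(self._models_cache)

    async def stop(self) -> None:
        # Invalidate the cache on disconnect so a reconnect fetches fresh data.
        async with self._models_cache_lock:
            self._models_cache = None
```

Concurrent callers serialize on the lock, so the first one populates the cache and the rest return the cached copy without touching the backend.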
Co-authored-by: friggeri <[email protected]>
Good point! I've now implemented the same caching mechanism across all SDK languages:

- **Node.js** (`nodejs/src/client.ts`)
- **.NET** (`dotnet/src/Client.cs`)
- **Go** (`go/client.go`)

All implementations follow the same pattern as the Python SDK:

- Cache the model list after the first successful call
- Guard the cache with a language-appropriate lock so concurrent callers trigger only one fetch
- Clear the cache on `stop()` / `force_stop()` so reconnects get fresh data

This ensures consistent behavior across all SDK languages and prevents rate limiting for concurrent sessions in any environment.
**✅ Cross-SDK Consistency Review**

I've reviewed this PR for consistency across all four SDK implementations (Python, Node.js, Go, and .NET). The caching feature is well-implemented with excellent consistency! 🎉

**✅ What's Consistent**

**Implementation Pattern** - All SDKs follow the same design: a per-client cache, a lock guarding it, and invalidation on disconnect.

**API Naming** - Correctly follows language conventions: `list_models()` in Python, `listModels()` in Node.js, `ListModelsAsync()` in .NET, and `ListModels()` in Go.
**Cross-SDK Consistency Review ✅**

This PR implements `list_models` caching consistently across all four SDKs.

**✅ Implementation Consistency**

Core behavior is identical across all SDKs: cache after the first successful fetch, lock-guarded access, defensive copies returned to callers, and cache invalidation on disconnect.

Language-appropriate patterns: an async lock in Python, promise-based locking in Node.js, `SemaphoreSlim` in .NET, and `sync.Mutex` in Go.

Naming conventions properly follow language idioms: `list_models`, `listModels`, `ListModelsAsync`, and `ListModels`.
**Pull request overview**

Implements per-client caching for `list_models` across the Python, Node.js, .NET, and Go SDKs to reduce redundant `models.list` calls under concurrency and mitigate 429 rate limiting.

Changes:
- Added per-client model list caches with language-appropriate synchronization to prevent concurrent duplicate fetches.
- Added cache invalidation on disconnect (`stop`/`force_stop` equivalents) to ensure fresh data after reconnect.
- Added a Python E2E test intended to validate caching behavior.
**Reviewed changes**
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| python/copilot/client.py | Adds an async-locked model list cache and clears it on stop/force_stop. |
| python/e2e/test_client.py | Adds an E2E test intended to validate list_models caching. |
| nodejs/src/client.ts | Adds cached listModels() behavior and clears cache on stop/forceStop. |
| dotnet/src/Client.cs | Adds a SemaphoreSlim-guarded cache for ListModelsAsync() and clears it on disconnect cleanup. |
| go/client.go | Adds a mutex-guarded cache for ListModels() and clears it on Stop/ForceStop. |
Comments suppressed due to low confidence (1)
nodejs/src/client.ts:322
`stop()` clears `modelsCache` without coordinating with an in-flight `listModels()` call. If `listModels()` is awaiting the RPC when `stop()` runs, it can still set `this.modelsCache = models` afterwards, leaving a non-null cache even though the client disconnected. Consider invalidating via the same locking/single-flight mechanism (or a connection "generation" token check) so in-flight fetches can't repopulate the cache after disconnect.

```typescript
// Clear models cache
this.modelsCache = null;
```
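One way to realize the reviewer's "generation token" idea, sketched in Python rather than TypeScript; the attribute names are hypothetical:

```python
import asyncio


class Client:
    """Sketch: a fetch started before a disconnect cannot repopulate the
    cache afterwards (names are illustrative, not the SDK's)."""

    def __init__(self) -> None:
        self._models_cache: list | None = None
        self._models_cache_lock = asyncio.Lock()
        self._generation = 0  # bumped on every disconnect

    async def _request_models(self) -> list:
        raise NotImplementedError  # placeholder for the real "models.list" RPC

    async def list_models(self) -> list:
        async with self._models_cache_lock:
            if self._models_cache is not None:
                return list(self._models_cache)
            generation = self._generation
            models = await self._request_models()
            # Only cache if no disconnect happened while the request was in flight.
            if generation == self._generation:
                self._models_cache = models
            return list(models)

    def stop(self) -> None:
        # Disconnect may run while a fetch is in flight; bumping the generation
        # ensures that fetch's result is discarded instead of cached.
        self._generation += 1
        self._models_cache = None
```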
```python
# Check cache (already inside lock)
if self._models_cache is not None:
    return list(self._models_cache)  # Return a copy to prevent cache mutation
```
**Copilot AI** commented on Jan 30, 2026
Returning `list(self._models_cache)` only protects against callers mutating the list container; `ModelInfo` is a mutable dataclass, so callers can still mutate cached objects and affect subsequent `list_models()` results. If you want true defensive copies, consider copying/cloning the `ModelInfo` instances (or making them immutable); otherwise clarify in the comment/docstring that this is a shallow copy of the list.
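If deep defensive copies are desired, one option (sketched in Python, assuming `ModelInfo` is a plain mutable dataclass as described) is to clone the instances rather than just the list, or to freeze the dataclass:

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ModelInfo:  # illustrative stand-in for the SDK's dataclass
    id: str
    name: str
    capabilities: list[str] = field(default_factory=list)


def snapshot(models: list[ModelInfo]) -> list[ModelInfo]:
    # list(models) would copy only the outer container; the ModelInfo objects
    # would still be shared with the cache. deepcopy clones them as well, so
    # callers cannot mutate cached state.
    return copy.deepcopy(models)


# Alternative: declare the dataclass with @dataclass(frozen=True); instances
# become immutable and a shallow list copy is then sufficient.
```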
```python
# First call should fetch from backend
models1 = await client.list_models()
assert isinstance(models1, list)

# Second call should return from cache (different list object but same content)
models2 = await client.list_models()
assert models2 is not models1, "Should return a copy, not the same object"
assert len(models2) == len(models1), "Cached results should have same content"
if len(models1) > 0:
    assert models1[0].id == models2[0].id, "Cached models should match"
```
**Copilot AI** commented on Jan 30, 2026
This test doesn't currently validate that caching is happening: even without caching, two `list_models()` calls will return different list objects with similar contents, so the assertions would still pass. Consider instrumenting/spying on the underlying RPC request (e.g., wrap `client._client.request` to count `models.list` calls) and/or running concurrent `list_models()` calls to assert only one backend request occurs, plus explicitly asserting the cache is cleared after `stop()`.
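A sketch of what such a test could look like, assuming a pytest-asyncio style async `client` fixture and that the transport's request method is reachable as `client._client.request`, as the comment suggests (the exact attribute may differ in the real SDK):

```python
import asyncio

import pytest


@pytest.mark.asyncio  # assumes pytest-asyncio, as the existing async e2e tests would need
async def test_list_models_caching_counts_requests(client):
    calls = 0
    original_request = client._client.request  # assumed transport hook

    async def counting_request(method, *args, **kwargs):
        nonlocal calls
        if method == "models.list":
            calls += 1
        return await original_request(method, *args, **kwargs)

    client._client.request = counting_request

    # Concurrent callers should coalesce into a single backend fetch.
    results = await asyncio.gather(*(client.list_models() for _ in range(10)))
    assert calls == 1, "caching should collapse concurrent calls into one request"
    assert all(len(r) == len(results[0]) for r in results)

    # After stop(), the cache should be invalidated.
    await client.stop()
    assert client._models_cache is None  # internal attribute from the PR's Python changes
```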
```typescript
// Use promise-based locking to prevent race condition with concurrent calls
await this.modelsCacheLock;

let resolveLock: () => void;
this.modelsCacheLock = new Promise((resolve) => {
  resolveLock = resolve;
});

try {
  // Check cache (already inside lock)
  if (this.modelsCache !== null) {
    return [...this.modelsCache]; // Return a copy to prevent cache mutation
  }

  // Cache miss - fetch from backend while holding lock
  const result = await this.connection.sendRequest("models.list", {});
  const response = result as { models: ModelInfo[] };
  const models = response.models;

  // Update cache before releasing lock
  this.modelsCache = models;

  return [...models]; // Return a copy to prevent cache mutation
} finally {
  resolveLock!();
}
```
**Copilot AI** commented on Jan 30, 2026
The promise-based "lock" here isn't actually mutual exclusion: if multiple callers enter while `modelsCacheLock` is already resolved, they can all pass `await this.modelsCacheLock` and then race to overwrite `this.modelsCacheLock`, leading to multiple concurrent `models.list` requests (the exact rate-limit issue this change is trying to fix). Consider using an atomic chaining approach (e.g., chaining onto the previous lock promise) or a dedicated async mutex / single-flight `inFlightModelsPromise` that all callers await.
Suggested change:

```diff
-// Use promise-based locking to prevent race condition with concurrent calls
-await this.modelsCacheLock;
-let resolveLock: () => void;
-this.modelsCacheLock = new Promise((resolve) => {
-  resolveLock = resolve;
-});
-try {
-  // Check cache (already inside lock)
-  if (this.modelsCache !== null) {
-    return [...this.modelsCache]; // Return a copy to prevent cache mutation
-  }
-  // Cache miss - fetch from backend while holding lock
-  const result = await this.connection.sendRequest("models.list", {});
-  const response = result as { models: ModelInfo[] };
-  const models = response.models;
-  // Update cache before releasing lock
-  this.modelsCache = models;
-  return [...models]; // Return a copy to prevent cache mutation
-} finally {
-  resolveLock!();
-}
+// Chain onto the existing lock promise to ensure mutual exclusion and single-flight behavior.
+// Any previous rejection is caught so it does not poison future lock acquisitions.
+const lockPromise = this.modelsCacheLock = this.modelsCacheLock
+  .catch(() => {})
+  .then(async () => {
+    // Inside the critical section: check and populate the cache if needed.
+    if (this.modelsCache === null) {
+      const result = await this.connection!.sendRequest("models.list", {});
+      const response = result as { models: ModelInfo[] };
+      const models = response.models;
+      this.modelsCache = models;
+    }
+  });
+// Wait for the critical section (including any in-flight fetch) to complete.
+await lockPromise;
+// At this point the cache should be populated unless the request failed,
+// in which case the error will have been thrown by await lockPromise above.
+if (this.modelsCache === null) {
+  throw new Error("Failed to load models");
+}
+// Return a copy to prevent callers from mutating the cache.
+return [...this.modelsCache];
```
```csharp
catch (Exception ex) { errors?.Add(ex); }

// Clear models cache
_modelsCache = null;
```
**Copilot AI** commented on Jan 30, 2026
`CleanupConnectionAsync` clears `_modelsCache` without taking `_modelsCacheLock`. If `ListModelsAsync` is in-flight, it can repopulate `_modelsCache` after the disconnect cleanup runs, which breaks the "cache cleared on disconnect" guarantee. Consider acquiring `_modelsCacheLock` (or otherwise synchronizing) when clearing the cache during cleanup.
Suggested change:

```diff
-_modelsCache = null;
+lock (_modelsCacheLock)
+{
+    _modelsCache = null;
+}
```
```csharp
if (_modelsCache is not null)
{
    return new List<ModelInfo>(_modelsCache); // Return a copy to prevent cache mutation
}
```
**Copilot AI** commented on Jan 30, 2026
`new List<ModelInfo>(_modelsCache)` only copies the list container; `ModelInfo` is a mutable reference type (settable properties), so callers can still mutate the cached objects and affect subsequent results. If the goal is to prevent cached data mutation, consider cloning the `ModelInfo` objects (deep-ish copy) or making the model types immutable; otherwise, please adjust the comment/remarks to clarify this is only a shallow collection copy.
Concurrent `send_and_wait` calls trigger redundant `list_models` API requests, causing 429 errors at ~100 concurrent sessions. This fix implements caching across all SDK languages (Python, Node.js, .NET, and Go) to ensure consistent behavior.

**Changes**

- `list_models()` now caches results after the first successful call across all SDKs
- The cache is cleared in `stop()` and `force_stop()`/`forceStop()`/`ForceStop()` to ensure fresh data after reconnect

**Implementation by Language**

- **Python**: async-lock-guarded cache in `client.py`, cleared on `stop`/`force_stop`
- **Node.js**: promise-based locking around the cache in `client.ts`, cleared on `stop`/`forceStop`
- **.NET**: `SemaphoreSlim` for async locking, returns `List<ModelInfo>` copies
- **Go**: `sync.Mutex` for thread-safe access, returns copies via the `copy()` function

Models are now fetched once per client connection regardless of concurrent session count across all SDK implementations.
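To illustrate the claimed behavior, here is a small self-contained toy demo (not the SDK's code; the class and model names are placeholders) showing 100 concurrent callers producing a single backend fetch:

```python
import asyncio


class FakeClient:
    def __init__(self) -> None:
        self.backend_calls = 0
        self._cache: list[str] | None = None
        self._lock = asyncio.Lock()

    async def _fetch_models(self) -> list[str]:
        self.backend_calls += 1
        await asyncio.sleep(0.05)  # simulate RPC latency
        return ["model-a", "model-b"]  # placeholder model ids

    async def list_models(self) -> list[str]:
        # Same pattern as the SDK changes: check-and-fetch under a lock.
        async with self._lock:
            if self._cache is None:
                self._cache = await self._fetch_models()
            return list(self._cache)


async def main() -> None:
    client = FakeClient()
    await asyncio.gather(*(client.list_models() for _ in range(100)))
    print(f"backend calls: {client.backend_calls}")  # prints 1


asyncio.run(main())
```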
Original prompt
`send_and_wait` calls with model specified in session triggers `Failed to list models: 429` rate limiting (#299)