# Cache list_models across all SDK languages to prevent rate limiting under concurrency #300
## Conversation
Co-authored-by: friggeri <[email protected]>
@copilot shouldn't you do this across all languages?
**Cross-SDK Consistency Review**

This PR adds caching for `list_models()` in the Python SDK. However, I've identified a cross-SDK consistency issue.

**Current Status Across SDKs**

Only the Python SDK gains caching in this PR; the Node.js, Go, and .NET SDKs are unchanged.

**Analysis**

All three other SDKs have the same pattern as the original Python code: every call to list models issues a fresh backend request. This means Node.js, Go, and .NET users will encounter the same 429 rate limiting errors when running concurrent sessions with model specifications, just as described in issue #299.

**Recommendation**

To maintain feature parity and provide a consistent developer experience, consider applying the same caching strategy to the other SDKs (Node.js, Go, and .NET). Each implementation should follow language-appropriate patterns (e.g., Node.js promises, Go mutexes, .NET locks) but maintain the same semantic behavior: cache after the first successful fetch, serialize concurrent callers, and clear the cache on disconnect.

**This PR**

The Python implementation looks solid and can serve as a reference for the other SDKs. No changes needed here; this is ready to merge! The follow-up work for the other SDKs can be tracked separately.
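For reference, here is a minimal sketch of the per-client caching pattern being recommended, written in Python with an `asyncio.Lock`. The class and method names are illustrative, not the SDK's actual internals:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ModelInfo:  # illustrative stand-in for the SDK's model type
    id: str
    name: str


class Client:
    def __init__(self) -> None:
        self._models_cache: list[ModelInfo] | None = None
        self._models_cache_lock = asyncio.Lock()

    async def _request_models(self) -> list[ModelInfo]:
        # Placeholder for the real RPC call (a "models.list" request).
        raise NotImplementedError

    async def list_models(self) -> list[ModelInfo]:
        # Serialize cache checks so concurrent callers trigger at most one fetch.
        async with self._models_cache_lock:
            if self._models_cache is None:
                self._models_cache = await self._request_models()
            # Return a shallow copy so callers cannot mutate the cached list itself.
            return list(self._models_cache)

    async def stop(self) -> None:
        # Invalidate the cache on disconnect so a reconnect fetches fresh data.
        async with self._models_cache_lock:
            self._models_cache = None
```

Concurrent callers serialize on the lock, so the first one populates the cache and the rest return the cached copy without touching the backend.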
Co-authored-by: friggeri <[email protected]>
Good point! I've now implemented the same caching mechanism across all SDK languages:

- **Node.js** (`nodejs/src/client.ts`)
- **.NET** (`dotnet/src/Client.cs`)
- **Go** (`go/client.go`)

All implementations follow the same pattern as the Python SDK:

- Cache the model list after the first successful call
- Guard the cache with a language-appropriate lock so concurrent callers trigger only one fetch
- Clear the cache on `stop()` / `force_stop()` so reconnects get fresh data

This ensures consistent behavior across all SDK languages and prevents rate limiting for concurrent sessions in any environment.
**✅ Cross-SDK Consistency Review**

I've reviewed this PR for consistency across all four SDK implementations (Python, Node.js, Go, and .NET). The caching feature is well-implemented with excellent consistency! 🎉

**✅ What's Consistent**

**Implementation Pattern** - All SDKs follow the same design: a per-client cache, a lock guarding it, and invalidation on disconnect.

**API Naming** - Correctly follows language conventions: `list_models()` in Python, `listModels()` in Node.js, `ListModelsAsync()` in .NET, and `ListModels()` in Go.
**Cross-SDK Consistency Review ✅**

This PR implements `list_models` caching consistently across all four SDKs.

**✅ Implementation Consistency**

Core behavior is identical across all SDKs: cache after the first successful fetch, lock-guarded access, defensive copies returned to callers, and cache invalidation on disconnect.

Language-appropriate patterns: an async lock in Python, promise-based locking in Node.js, `SemaphoreSlim` in .NET, and `sync.Mutex` in Go.

Naming conventions properly follow language idioms: `list_models`, `listModels`, `ListModelsAsync`, and `ListModels`.
**Pull request overview**

Implements per-client caching for `list_models` across the Python, Node.js, .NET, and Go SDKs to reduce redundant `models.list` calls under concurrency and mitigate 429 rate limiting.

Changes:
- Added per-client model list caches with language-appropriate synchronization to prevent concurrent duplicate fetches.
- Added cache invalidation on disconnect (`stop`/`force_stop` equivalents) to ensure fresh data after reconnect.
- Added a Python E2E test intended to validate caching behavior.
**Reviewed changes**
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| python/copilot/client.py | Adds an async-locked model list cache and clears it on stop/force_stop. |
| python/e2e/test_client.py | Adds an E2E test intended to validate list_models caching. |
| nodejs/src/client.ts | Adds cached listModels() behavior and clears cache on stop/forceStop. |
| dotnet/src/Client.cs | Adds a SemaphoreSlim-guarded cache for ListModelsAsync() and clears it on disconnect cleanup. |
| go/client.go | Adds a mutex-guarded cache for ListModels() and clears it on Stop/ForceStop. |
Comments suppressed due to low confidence (1)
nodejs/src/client.ts:322
`stop()` clears `modelsCache` without coordinating with an in-flight `listModels()` call. If `listModels()` is awaiting the RPC when `stop()` runs, it can still set `this.modelsCache = models` afterwards, leaving a non-null cache even though the client disconnected. Consider invalidating via the same locking/single-flight mechanism (or a connection "generation" token check) so in-flight fetches can't repopulate the cache after disconnect.

```typescript
// Clear models cache
this.modelsCache = null;
```
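One way to realize the reviewer's "generation token" idea, sketched in Python rather than TypeScript; the attribute names are hypothetical:

```python
import asyncio


class Client:
    """Sketch: a fetch started before a disconnect cannot repopulate the
    cache afterwards (names are illustrative, not the SDK's)."""

    def __init__(self) -> None:
        self._models_cache: list | None = None
        self._models_cache_lock = asyncio.Lock()
        self._generation = 0  # bumped on every disconnect

    async def _request_models(self) -> list:
        raise NotImplementedError  # placeholder for the real "models.list" RPC

    async def list_models(self) -> list:
        async with self._models_cache_lock:
            if self._models_cache is not None:
                return list(self._models_cache)
            generation = self._generation
            models = await self._request_models()
            # Only cache if no disconnect happened while the request was in flight.
            if generation == self._generation:
                self._models_cache = models
            return list(models)

    def stop(self) -> None:
        # Disconnect may run while a fetch is in flight; bumping the generation
        # ensures that fetch's result is discarded instead of cached.
        self._generation += 1
        self._models_cache = None
```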
```python
# Check cache (already inside lock)
if self._models_cache is not None:
    return list(self._models_cache)  # Return a copy to prevent cache mutation
```
**Copilot AI** commented on Jan 30, 2026
Returning `list(self._models_cache)` only protects against callers mutating the list container; `ModelInfo` is a mutable dataclass, so callers can still mutate cached objects and affect subsequent `list_models()` results. If you want true defensive copies, consider copying/cloning the `ModelInfo` instances (or making them immutable); otherwise clarify in the comment/docstring that this is a shallow copy of the list.
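If deep defensive copies are desired, one option (sketched in Python, assuming `ModelInfo` is a plain mutable dataclass as described) is to clone the instances rather than just the list, or to freeze the dataclass:

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ModelInfo:  # illustrative stand-in for the SDK's dataclass
    id: str
    name: str
    capabilities: list[str] = field(default_factory=list)


def snapshot(models: list[ModelInfo]) -> list[ModelInfo]:
    # list(models) would copy only the outer container; the ModelInfo objects
    # would still be shared with the cache. deepcopy clones them as well, so
    # callers cannot mutate cached state.
    return copy.deepcopy(models)


# Alternative: declare the dataclass with @dataclass(frozen=True); instances
# become immutable and a shallow list copy is then sufficient.
```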
```python
# First call should fetch from backend
models1 = await client.list_models()
assert isinstance(models1, list)

# Second call should return from cache (different list object but same content)
models2 = await client.list_models()
assert models2 is not models1, "Should return a copy, not the same object"
assert len(models2) == len(models1), "Cached results should have same content"
if len(models1) > 0:
    assert models1[0].id == models2[0].id, "Cached models should match"
```
**Copilot AI** commented on Jan 30, 2026
This test doesn't currently validate that caching is happening: even without caching, two `list_models()` calls will return different list objects with similar contents, so the assertions would still pass. Consider instrumenting/spying on the underlying RPC request (e.g., wrap `client._client.request` to count `models.list` calls) and/or running concurrent `list_models()` calls to assert only one backend request occurs, plus explicitly asserting the cache is cleared after `stop()`.
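A sketch of what such a test could look like, assuming a pytest-asyncio style async `client` fixture and that the transport's request method is reachable as `client._client.request`, as the comment suggests (the exact attribute may differ in the real SDK):

```python
import asyncio

import pytest


@pytest.mark.asyncio  # assumes pytest-asyncio, as the existing async e2e tests would need
async def test_list_models_caching_counts_requests(client):
    calls = 0
    original_request = client._client.request  # assumed transport hook

    async def counting_request(method, *args, **kwargs):
        nonlocal calls
        if method == "models.list":
            calls += 1
        return await original_request(method, *args, **kwargs)

    client._client.request = counting_request

    # Concurrent callers should coalesce into a single backend fetch.
    results = await asyncio.gather(*(client.list_models() for _ in range(10)))
    assert calls == 1, "caching should collapse concurrent calls into one request"
    assert all(len(r) == len(results[0]) for r in results)

    # After stop(), the cache should be invalidated.
    await client.stop()
    assert client._models_cache is None  # internal attribute from the PR's Python changes
```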
```typescript
// Use promise-based locking to prevent race condition with concurrent calls
await this.modelsCacheLock;

let resolveLock: () => void;
this.modelsCacheLock = new Promise((resolve) => {
  resolveLock = resolve;
});

try {
  // Check cache (already inside lock)
  if (this.modelsCache !== null) {
    return [...this.modelsCache]; // Return a copy to prevent cache mutation
  }

  // Cache miss - fetch from backend while holding lock
  const result = await this.connection.sendRequest("models.list", {});
  const response = result as { models: ModelInfo[] };
  const models = response.models;

  // Update cache before releasing lock
  this.modelsCache = models;

  return [...models]; // Return a copy to prevent cache mutation
} finally {
  resolveLock!();
}
```
**Copilot AI** commented on Jan 30, 2026
The promise-based "lock" here isn't actually mutual exclusion: if multiple callers enter while `modelsCacheLock` is already resolved, they can all pass `await this.modelsCacheLock` and then race to overwrite `this.modelsCacheLock`, leading to multiple concurrent `models.list` requests (the exact rate-limit issue this change is trying to fix). Consider using an atomic chaining approach (e.g., chaining onto the previous lock promise) or a dedicated async mutex / single-flight `inFlightModelsPromise` that all callers await.
Suggested change:

```diff
-// Use promise-based locking to prevent race condition with concurrent calls
-await this.modelsCacheLock;
-let resolveLock: () => void;
-this.modelsCacheLock = new Promise((resolve) => {
-  resolveLock = resolve;
-});
-try {
-  // Check cache (already inside lock)
-  if (this.modelsCache !== null) {
-    return [...this.modelsCache]; // Return a copy to prevent cache mutation
-  }
-  // Cache miss - fetch from backend while holding lock
-  const result = await this.connection.sendRequest("models.list", {});
-  const response = result as { models: ModelInfo[] };
-  const models = response.models;
-  // Update cache before releasing lock
-  this.modelsCache = models;
-  return [...models]; // Return a copy to prevent cache mutation
-} finally {
-  resolveLock!();
-}
+// Chain onto the existing lock promise to ensure mutual exclusion and single-flight behavior.
+// Any previous rejection is caught so it does not poison future lock acquisitions.
+const lockPromise = this.modelsCacheLock = this.modelsCacheLock
+  .catch(() => {})
+  .then(async () => {
+    // Inside the critical section: check and populate the cache if needed.
+    if (this.modelsCache === null) {
+      const result = await this.connection!.sendRequest("models.list", {});
+      const response = result as { models: ModelInfo[] };
+      const models = response.models;
+      this.modelsCache = models;
+    }
+  });
+// Wait for the critical section (including any in-flight fetch) to complete.
+await lockPromise;
+// At this point the cache should be populated unless the request failed,
+// in which case the error will have been thrown by await lockPromise above.
+if (this.modelsCache === null) {
+  throw new Error("Failed to load models");
+}
+// Return a copy to prevent callers from mutating the cache.
+return [...this.modelsCache];
```
```csharp
catch (Exception ex) { errors?.Add(ex); }

// Clear models cache
_modelsCache = null;
```
**Copilot AI** commented on Jan 30, 2026
`CleanupConnectionAsync` clears `_modelsCache` without taking `_modelsCacheLock`. If `ListModelsAsync` is in-flight, it can repopulate `_modelsCache` after the disconnect cleanup runs, which breaks the "cache cleared on disconnect" guarantee. Consider acquiring `_modelsCacheLock` (or otherwise synchronizing) when clearing the cache during cleanup.
Suggested change:

```diff
-_modelsCache = null;
+lock (_modelsCacheLock)
+{
+    _modelsCache = null;
+}
```
```csharp
if (_modelsCache is not null)
{
    return new List<ModelInfo>(_modelsCache); // Return a copy to prevent cache mutation
}
```
**Copilot AI** commented on Jan 30, 2026
`new List<ModelInfo>(_modelsCache)` only copies the list container; `ModelInfo` is a mutable reference type (settable properties), so callers can still mutate the cached objects and affect subsequent results. If the goal is to prevent cached data mutation, consider cloning the `ModelInfo` objects (deep-ish copy) or making the model types immutable; otherwise, please adjust the comment/remarks to clarify this is only a shallow collection copy.
Concurrent `send_and_wait` calls trigger redundant `list_models` API requests, causing 429 errors at ~100 concurrent sessions. This fix implements caching across all SDK languages (Python, Node.js, .NET, and Go) to ensure consistent behavior.

**Changes**

- `list_models()` now caches results after the first successful call across all SDKs
- The cache is cleared in `stop()` and `force_stop()`/`forceStop()`/`ForceStop()` to ensure fresh data after reconnect

**Implementation by Language**

- **Python**: async-lock-guarded cache in `client.py`, cleared on `stop`/`force_stop`
- **Node.js**: promise-based locking around the cache in `client.ts`, cleared on `stop`/`forceStop`
- **.NET**: `SemaphoreSlim` for async locking, returns `List<ModelInfo>` copies
- **Go**: `sync.Mutex` for thread-safe access, returns copies via the `copy()` function

Models are now fetched once per client connection regardless of concurrent session count across all SDK implementations.
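To illustrate the claimed behavior, here is a small self-contained toy demo (not the SDK's code; the class and model names are placeholders) showing 100 concurrent callers producing a single backend fetch:

```python
import asyncio


class FakeClient:
    def __init__(self) -> None:
        self.backend_calls = 0
        self._cache: list[str] | None = None
        self._lock = asyncio.Lock()

    async def _fetch_models(self) -> list[str]:
        self.backend_calls += 1
        await asyncio.sleep(0.05)  # simulate RPC latency
        return ["model-a", "model-b"]  # placeholder model ids

    async def list_models(self) -> list[str]:
        # Same pattern as the SDK changes: check-and-fetch under a lock.
        async with self._lock:
            if self._cache is None:
                self._cache = await self._fetch_models()
            return list(self._cache)


async def main() -> None:
    client = FakeClient()
    await asyncio.gather(*(client.list_models() for _ in range(100)))
    print(f"backend calls: {client.backend_calls}")  # prints 1


asyncio.run(main())
```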
Original prompt
`send_and_wait` calls with model specified in session triggers `Failed to list models: 429` rate limiting (#299)