Commit 91f3fde

Update README and documentation to clarify model selection and expert delegation. Enhance model fetching functions to support filtering by parameters, ensuring only tool-capable models are returned. Adjust limitations documentation to reflect changes in expert model routing and improve clarity on known issues. Update local models documentation to emphasize the impact of model selection on tool-calling quality. Modify tests to validate new model handling and ensure accurate representation in the frontend model list.
1 parent 866caa0 commit 91f3fde

8 files changed

Lines changed: 3438 additions & 262 deletions

README.md

Lines changed: 3 additions & 2 deletions
@@ -16,7 +16,7 @@ It is built for scientists, analysts, and curious people who want a powerful AI
 
 - **Answer questions and take on tasks.** Chat with Kady like any AI assistant. For bigger work, Kady delegates to a specialist "expert" agent that runs with a full Python environment and scientific tools.
 - **Run up to 10 chats in parallel.** Open a new tab for each thread of work — every tab keeps its own message history, model, attached files, and cost meter, but all tabs share the project's sandbox so files written in one tab are immediately available in the others. Tabs keep streaming in the background while you switch between them.
-- **Pick any AI model, any time.** Choose from 30+ models across 10 providers (OpenAI, Anthropic, Google, xAI, Qwen, and more) with a simple dropdown. Switch models message to message. You can also use free local models through [Ollama](./docs/local-models-ollama.md).
+- **Pick any tool-capable AI model, any time.** Choose from the full set of OpenRouter models that support tool calling (OpenAI, Anthropic, Google, xAI, Qwen, and more) with a simple dropdown. Switch the orchestrator and expert models per chat tab. You can also use free local models through [Ollama](./docs/local-models-ollama.md).
 - **170+ scientific skills, pre-installed.** Covers genomics, proteomics, drug discovery, materials science, and more. Kady passes the right skills to the expert automatically for each task.
 - **326 ready-to-run workflow templates.** Browse a built-in library across 22 disciplines - genomics, drug discovery, finance, astrophysics, and more. Pick one, fill in the blanks, and launch.
 - **229 scientific and financial databases.** Connect to databases in 18 categories - Biomedical & Health, Chemistry & Materials, Scholarly Publications, Stock Market, Earth & Climate, Astronomy & Space, and more.
@@ -76,7 +76,7 @@ Press **Ctrl+C** in the terminal.
 
 - **Send a message.** Type a question or task and hit enter. Kady will either answer directly or hand off to an expert for bigger work.
 - **Open multiple chats.** Click `+` in the chat tab strip to start a new chat in the same project (up to 10). Double-click a tab title or use the pencil icon to rename it. Closing a tab cancels any turn it had running. The cost pill in the header shows both the active tab's session cost (`sess`) and the project total across every tab (`proj`).
-- **Switch models.** Use the model dropdown in the input bar - any message can use any model. Each tab keeps its own model selection.
+- **Switch models.** Use the model dropdown in the input bar - any message can use any supported model. Each tab keeps its own orchestrator and expert model selections.
 - **Upload files.** Drag files into the file browser or directly onto the input bar. Use `@filename` in your message to reference files.
 - **Launch a workflow.** Open the workflows panel, pick one, fill in the blanks, and click Launch. Workflows run in whichever chat tab is currently active.
 - **Open Settings** (the gear icon in the top-right) for API keys, MCP servers, browser automation, and appearance.
@@ -87,6 +87,7 @@ Press **Ctrl+C** in the terminal.
 These guides live in the [`docs/`](./docs) folder:
 
 - **[Architecture](./docs/architecture.md)** - how the three local services fit together and what each folder in the project is for.
+- **[Model selection](./docs/model-selection.md)** - how Kady builds the OpenRouter model list and routes orchestrator vs expert calls.
 - **[Custom MCP servers](./docs/custom-mcp-servers.md)** - add your own tools to Kady's expert agents.
 - **[Browser automation](./docs/browser-automation.md)** - let Kady drive a real browser.
 - **[Local models with Ollama](./docs/local-models-ollama.md)** - run everything with local models, no API keys required.

docs/architecture.md

Lines changed: 5 additions & 3 deletions
@@ -90,8 +90,10 @@ k-dense-byok/
 └── sessions.db ← Chat history (SQLite, one session per chat tab)
 ```
 
-## A note on the expert model
+## Model selection and routing
 
-The model you select in Kady's dropdown only applies to Kady (the main agent). When Kady delegates a task, the expert runs through the **Gemini CLI**, which always uses a Gemini model on [OpenRouter](https://openrouter.ai/) regardless of your dropdown choice.
+Kady keeps separate model choices for the orchestrator (the main agent) and the delegated expert in each chat tab. Both OpenRouter-hosted choices are routed through the local LiteLLM proxy, which accepts the `openrouter/*` model ids shown in the picker.
 
-The one exception is local Ollama models - if you pick an Ollama model, both Kady and the expert run through your local daemon. See [Local models with Ollama](./local-models-ollama.md).
+The expert still runs inside the **Gemini CLI** process, but K-Dense routes that CLI through the same LiteLLM proxy, so it can target any OpenRouter model in the picker that supports tool calling. The recommended expert default is Gemini 3.1 Pro Preview because it has strong native tool use and a large context window, but users can override it per tab.
+
+Local Ollama models are the main exception - if you pick an Ollama model, both Kady and the expert run through your local daemon. See [Local models with Ollama](./local-models-ollama.md) and [Model selection](./model-selection.md).
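
The routing described in this hunk comes down to a prefix match on the picker's model id. A minimal sketch, assuming hypothetical helper and label names (only the `openrouter/` and `ollama/` id prefixes come from the docs above):

```python
# Hypothetical sketch of the routing rule: model ids from the picker are
# matched by prefix against the proxy's wildcard routes. Illustrative only,
# not code from the K-Dense repo.
def route_for(model_id: str) -> str:
    if model_id.startswith("ollama/"):
        return "local Ollama daemon"
    if model_id.startswith("openrouter/"):
        return "LiteLLM openrouter/* wildcard"
    raise ValueError(f"unrecognized model id: {model_id}")

print(route_for("openrouter/google/gemini-3.1-pro-preview"))  # LiteLLM openrouter/* wildcard
print(route_for("ollama/llama3.2:3b"))                        # local Ollama daemon
```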

docs/limitations.md

Lines changed: 8 additions & 8 deletions
@@ -1,22 +1,22 @@
 # Known Limitations
 
-K-Dense BYOK is in beta. The most important rough edges today are all on the expert-delegation path, which relies on the Gemini CLI and Gemini models.
+K-Dense BYOK is in beta. The most important rough edges today are on the expert-delegation path, which runs through the Gemini CLI even when the selected expert model is routed through OpenRouter or Ollama.
 
-## Gemini models and the Gemini CLI with Skills
+## Expert models and the Gemini CLI with Skills
 
-The expert delegation system relies on the Gemini CLI, which uses Gemini models to execute tasks with our scientific skills. While this works well for many workflows, there are some rough edges to be aware of:
+The expert delegation system relies on the Gemini CLI to execute tasks with our scientific skills. K-Dense routes that CLI through the local LiteLLM proxy, so the expert can use any model in the OpenRouter picker that supports tool calling. Gemini 3.1 Pro Preview remains the recommended expert default because it tends to be strongest for tool-heavy work, but other supported models can be selected per chat tab. While this works well for many workflows, there are some rough edges to be aware of:
 
-- **Skill activation is not always reliable.** Gemini models sometimes skip a relevant skill, use it partially, or misinterpret the skill's instructions. This is especially noticeable with complex multi-step skills that require strict adherence to a procedure.
+- **Skill activation is not always reliable.** Models sometimes skip a relevant skill, use it partially, or misinterpret the skill's instructions. This is especially noticeable with complex multi-step skills that require strict adherence to a procedure.
 - **Tool-calling consistency varies.** The Gemini CLI occasionally drops tool calls mid-execution or calls tools with incorrect arguments, which can cause expert tasks to stall or produce incomplete results.
-- **Long-context degradation.** When a skill injects a large amount of context (detailed protocols, multiple reference databases), Gemini models may lose track of earlier instructions or produce less focused output.
-- **Structured output can drift.** For skills that require specific output formats (tables, JSON, citations), Gemini models sometimes deviate from the requested structure.
+- **Long-context degradation.** When a skill injects a large amount of context (detailed protocols, multiple reference databases), models may lose track of earlier instructions or produce less focused output.
+- **Structured output can drift.** For skills that require specific output formats (tables, JSON, citations), models sometimes deviate from the requested structure.
 
-These are upstream limitations of the Gemini model family and the Gemini CLI tooling, not bugs in K-Dense BYOK itself. Google is actively improving both, and we see meaningful progress with every new model release and CLI update. As these improve, the expert delegation experience will get better automatically without any changes on your end.
+These are upstream limitations of the selected model and the Gemini CLI tooling, not bugs in K-Dense BYOK itself. As model tool calling and CLI support improve, the expert delegation experience will get better automatically without any changes on your end.
 
 **Workarounds:**
 
 - If a skill isn't behaving as expected, try **re-running the task** - results can vary between runs.
-- You can switch Kady's main model (via the dropdown) to a non-Gemini model for the orchestration layer while experts continue to use Gemini under the hood.
+- Try a different expert model in the dropdown. The model list is limited to OpenRouter models that advertise `tools` support, but tool-calling quality still varies across providers.
 
 ## Ollama / small local models
 
docs/local-models-ollama.md

Lines changed: 2 additions & 2 deletions
@@ -21,13 +21,13 @@ You can run Kady and the expert agent entirely against local models served by [O
 
 3. **(Optional) Custom Ollama host.** If your Ollama server lives somewhere other than `http://localhost:11434`, set `OLLAMA_BASE_URL` in `kady_agent/.env`.
 
-4. **Pick the model in the app.** Open the model dropdown in the chat input. Pulled models appear under the **Local (Ollama)** section at the bottom. Picking one routes both Kady and the Gemini-CLI-backed expert through your local daemon.
+4. **Pick the model in the app.** Open the model dropdown in the chat input. Pulled models appear under the **Local (Ollama)** section at the bottom. Picking one routes Kady, and optionally the Gemini-CLI-backed expert, through your local daemon.
 
 The list is populated live from Ollama's `GET /api/tags` endpoint, so pulling a new model and re-opening the dropdown is enough - no app restart needed.
 
 ## Caveats
 
-Local models amplify the limitations of the Gemini CLI tooling (see [Known limitations](./limitations.md)):
+Local models amplify the limitations of the Gemini CLI tooling and model tool-calling quality (see [Known limitations](./limitations.md)):
 
 - **Tool-calling fidelity is noticeably weaker** on sub-frontier models.
 - **Skills that rely on multi-tool choreography** (browsing, running scripts, producing structured output) are the most fragile.
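
The live discovery this doc describes can be illustrated against Ollama's documented `GET /api/tags` response shape. The helper below is a hypothetical sketch, not code from the repo; only the endpoint name and payload shape are Ollama's:

```python
import json

def ollama_model_names(tags_payload: dict) -> list[str]:
    """Extract pulled model names from a GET /api/tags response payload."""
    return [m["name"] for m in tags_payload.get("models", [])]

# Example payload in the shape Ollama returns (model names are illustrative):
sample = json.loads('{"models": [{"name": "llama3.2:3b"}, {"name": "qwen2.5:7b"}]}')
print(ollama_model_names(sample))  # ['llama3.2:3b', 'qwen2.5:7b']
```

In the app the payload would come from an HTTP request to the Ollama daemon; re-running the request is what makes newly pulled models appear without a restart.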

docs/model-selection.md

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# Model Selection
+
+Kady uses two model choices in each chat tab:
+
+- **Orchestrator model:** the main Kady agent that reads your message, decides what to do, and streams the response.
+- **Expert model:** the model used by delegated expert tasks that run inside the Gemini CLI process.
+
+Both choices are stored per tab, so different chats in the same project can use different orchestrator and expert models.
+
+## OpenRouter models
+
+The OpenRouter model picker is generated from models that advertise tool-calling support. Kady sends tool definitions to the orchestrator and expert, so models that do not support the `tools` parameter are excluded from the dropdown.
+
+The generator calls the OpenRouter SDK with:
+
+```python
+client.models.list(supported_parameters="tools")
+```
+
+The resulting entries are written to `web/src/data/models.json` with ids prefixed as `openrouter/<provider>/<model>`. The LiteLLM proxy has an `openrouter/*` wildcard route, so both the orchestrator and the Gemini CLI-backed expert can use any generated OpenRouter id.
+
+To refresh the checked-in model list:
+
+```bash
+uv run python - <<'PY'
+from dotenv import load_dotenv
+load_dotenv("kady_agent/.env")
+
+from kady_agent.utils import update_models_json
+update_models_json()
+PY
+```
+
+By default, this includes all OpenRouter models with `tools` support, preserves the orchestrator and expert recommended defaults, and omits retired GPT-5.4 base/pro entries.
+
+## Defaults
+
+- The orchestrator default is `openrouter/anthropic/claude-opus-4.7`.
+- The expert default is `openrouter/google/gemini-3.1-pro-preview`.
+
+Gemini 3.1 Pro Preview is recommended for expert tasks because expert delegation is tool-heavy and often benefits from a large context window. You can still choose a different tool-capable OpenRouter model per tab.
+
+## Local Ollama models
+
+Pulled Ollama models are discovered live from the local Ollama daemon and appear under the **Local (Ollama)** section in the picker. Selecting an Ollama model routes through the local LiteLLM `ollama/*` wildcard instead of OpenRouter.
+
+Local models are useful for privacy and cost control, but tool-calling quality varies widely. For complex delegated expert tasks, frontier OpenRouter models are usually more reliable.
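
A minimal sketch of how a consumer of the generated `models.json` could resolve the two defaults. Only the entry fields named in this doc (`id`, `default`, `expertDefault`) are assumed; the helper itself is hypothetical:

```python
# Illustrative helper: pick the orchestrator and expert defaults out of the
# models.json entry list. Not part of the K-Dense codebase.
def resolve_defaults(entries: list[dict]) -> tuple[str, str]:
    orchestrator = next(e["id"] for e in entries if e.get("default"))
    expert = next(e["id"] for e in entries if e.get("expertDefault"))
    return orchestrator, expert

entries = [
    {"id": "openrouter/anthropic/claude-opus-4.7", "default": True},
    {"id": "openrouter/google/gemini-3.1-pro-preview", "expertDefault": True},
]
print(resolve_defaults(entries))
# ('openrouter/anthropic/claude-opus-4.7', 'openrouter/google/gemini-3.1-pro-preview')
```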

kady_agent/utils.py

Lines changed: 31 additions & 5 deletions
@@ -161,13 +161,16 @@ def download_scientific_skills(
 def fetch_openrouter_models(
     api_key: str | None = None,
     max_age_days: int | None = None,
+    supported_parameters: str | None = None,
 ) -> list[dict]:
     """
     Fetch all available models from OpenRouter using the official SDK.
 
     Args:
         api_key: OpenRouter API key (falls back to OPENROUTER_API_KEY env var).
         max_age_days: If set, only return models created within this many days.
+        supported_parameters: Comma-separated OpenRouter parameters to require
+            (for example, "tools" to return only tool-calling models).
 
     Returns a list of dicts, each with:
         id, name, provider, context_length, modality, created,
@@ -183,7 +186,7 @@ def fetch_openrouter_models(
     )
 
     with OpenRouter(api_key=key) as client:
-        res = client.models.list()
+        res = client.models.list(supported_parameters=supported_parameters)
 
     if not res or not res.data:
         return []
@@ -235,6 +238,7 @@ def search_openrouter_models(
     max_prompt_price: float | None = None,
     modality: str | None = None,
     max_age_days: int | None = None,
+    supported_parameters: str | None = None,
     api_key: str | None = None,
 ) -> list[dict]:
     """
@@ -247,9 +251,15 @@ def search_openrouter_models(
         max_prompt_price: Maximum prompt price per 1M tokens.
         modality: Filter by modality string (e.g. "text->text").
         max_age_days: Only include models added within this many days (e.g. 90).
+        supported_parameters: Comma-separated OpenRouter parameters to require
+            (for example, "tools" to return only tool-calling models).
         api_key: OpenRouter API key (falls back to OPENROUTER_API_KEY env var).
     """
-    all_models = fetch_openrouter_models(api_key=api_key, max_age_days=max_age_days)
+    all_models = fetch_openrouter_models(
+        api_key=api_key,
+        max_age_days=max_age_days,
+        supported_parameters=supported_parameters,
+    )
     results = all_models
 
     if query:
@@ -342,26 +352,40 @@ def _pricing_tier(prompt_price: float) -> str:
 def update_models_json(
     output_path: str = "web/src/data/models.json",
     default_model_id: str = "anthropic/claude-opus-4.7",
-    max_age_days: int = 90,
+    expert_default_model_id: str = "google/gemini-3.1-pro-preview",
+    max_age_days: int | None = None,
+    supported_parameters: str | None = "tools",
+    excluded_model_ids: set[str] | None = None,
     api_key: str | None = None,
 ) -> None:
     """Fetch models from OpenRouter and overwrite the frontend models.json.
 
     Args:
         output_path: Path to the output JSON file.
         default_model_id: The OpenRouter model ID to mark as the default.
+        expert_default_model_id: The OpenRouter model ID to mark as the expert
+            default in the frontend picker.
         max_age_days: Only include models added within this many days.
-            Pass 0 or None to include all models.
+            Pass None to include all models.
+        supported_parameters: Comma-separated OpenRouter parameters to require.
+            Defaults to "tools" because Kady sends tool definitions.
+        excluded_model_ids: OpenRouter model IDs to omit even if the API
+            returns them.
         api_key: OpenRouter API key (falls back to OPENROUTER_API_KEY env var).
     """
+    excluded_model_ids = excluded_model_ids or {"openai/gpt-5.4", "openai/gpt-5.4-pro"}
     raw_models = fetch_openrouter_models(
         api_key=api_key,
-        max_age_days=max_age_days or None,
+        max_age_days=max_age_days,
+        supported_parameters=supported_parameters,
    )
     out = Path(output_path)
 
     entries = []
     for m in raw_models:
+        if m["id"] in excluded_model_ids:
+            continue
+
         p_in = m["pricing"]["prompt_per_1m"]
         p_out = m["pricing"]["completion_per_1m"]
         if p_in < 0 or p_out < 0:
@@ -380,6 +404,8 @@ def update_models_json(
         }
         if m["id"] == default_model_id:
             entry["default"] = True
+        if m["id"] == expert_default_model_id:
+            entry["expertDefault"] = True
         entries.append(entry)
 
     tier_order = {"flagship": 0, "high": 1, "mid": 2, "budget": 3}
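
The new exclusion and expert-default marking in `update_models_json` boils down to a few lines. This condensed, hypothetical sketch mirrors the diff's logic (the excluded ids and default ids come from the diff) while omitting the pricing and tier handling:

```python
# Condensed sketch of the filtering/marking behaviour in update_models_json.
# Not the full function; pricing, sorting, and file I/O are omitted.
EXCLUDED = {"openai/gpt-5.4", "openai/gpt-5.4-pro"}

def build_entries(
    raw_models: list[dict],
    default_id: str = "anthropic/claude-opus-4.7",
    expert_default_id: str = "google/gemini-3.1-pro-preview",
) -> list[dict]:
    entries = []
    for m in raw_models:
        if m["id"] in EXCLUDED:
            continue  # retired models are dropped even if the API returns them
        entry = {"id": f"openrouter/{m['id']}"}
        if m["id"] == default_id:
            entry["default"] = True
        if m["id"] == expert_default_id:
            entry["expertDefault"] = True
        entries.append(entry)
    return entries

raw = [{"id": "openai/gpt-5.4"}, {"id": "google/gemini-3.1-pro-preview"}]
print(build_entries(raw))
# [{'id': 'openrouter/google/gemini-3.1-pro-preview', 'expertDefault': True}]
```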

tests/test_utils_models.py

Lines changed: 35 additions & 3 deletions
@@ -45,14 +45,41 @@ def test_search_and_update_models_json(tmp_path, monkeypatch) -> None:
             "pricing": {"prompt_per_1m": 0.25, "completion_per_1m": 1.0},
             "description": "Fast budget model",
         },
+        {
+            "id": "google/gemini-3.1-pro-preview",
+            "name": "Google: Gemini 3.1 Pro Preview",
+            "provider": "google",
+            "created": "2026-01-01",
+            "context_length": 1_000_000,
+            "modality": "text->text",
+            "pricing": {"prompt_per_1m": 2.0, "completion_per_1m": 12.0},
+            "description": "Expert default model",
+        },
+        {
+            "id": "openai/gpt-5.4",
+            "name": "OpenAI: GPT-5.4",
+            "provider": "openai",
+            "created": "2026-01-01",
+            "context_length": 1_000_000,
+            "modality": "text->text",
+            "pricing": {"prompt_per_1m": 2.5, "completion_per_1m": 15.0},
+            "description": "Retired model",
+        },
     ]
-    monkeypatch.setattr(utils, "fetch_openrouter_models", lambda **kwargs: models)
+    fetch_kwargs = {}
+
+    def fake_fetch_openrouter_models(**kwargs):
+        fetch_kwargs.update(kwargs)
+        return models
+
+    monkeypatch.setattr(utils, "fetch_openrouter_models", fake_fetch_openrouter_models)
 
     assert [m["id"] for m in utils.search_openrouter_models(query="flash")] == [
         "google/gemini-flash"
     ]
     assert [m["provider"] for m in utils.search_openrouter_models(providers=["google"])] == [
-        "google"
+        "google",
+        "google",
     ]
     assert utils.search_openrouter_models(min_context=500_000)[0]["id"] == "google/gemini-flash"
     assert utils.search_openrouter_models(max_prompt_price=1.0)[0]["id"] == "google/gemini-flash"
@@ -62,7 +89,12 @@ def test_search_and_update_models_json(tmp_path, monkeypatch) -> None:
     data = json.loads(output.read_text(encoding="utf-8"))
     assert data[0]["id"] == "openrouter/anthropic/claude-opus-4.7"
     assert data[0]["default"] is True
-    assert data[1]["tier"] == "budget"
+    assert "openrouter/openai/gpt-5.4" not in {m["id"] for m in data}
+    assert fetch_kwargs["supported_parameters"] == "tools"
+    assert any(
+        m["id"] == "openrouter/google/gemini-3.1-pro-preview" and m["expertDefault"]
+        for m in data
+    )
 
 
 def test_fetch_openrouter_models_requires_key(monkeypatch) -> None:
