Commit 91f3fde

Update README and documentation to clarify model selection and expert delegation. Enhance model fetching functions to support filtering by parameters, ensuring only tool-capable models are returned. Adjust limitations documentation to reflect changes in expert model routing and improve clarity on known issues. Update local models documentation to emphasize the impact of model selection on tool-calling quality. Modify tests to validate new model handling and ensure accurate representation in the frontend model list.
1 parent 866caa0 commit 91f3fde

8 files changed

Lines changed: 3438 additions & 262 deletions

README.md

Lines changed: 3 additions & 2 deletions
@@ -16,7 +16,7 @@ It is built for scientists, analysts, and curious people who want a powerful AI
 
 - **Answer questions and take on tasks.** Chat with Kady like any AI assistant. For bigger work, Kady delegates to a specialist "expert" agent that runs with a full Python environment and scientific tools.
 - **Run up to 10 chats in parallel.** Open a new tab for each thread of work — every tab keeps its own message history, model, attached files, and cost meter, but all tabs share the project's sandbox so files written in one tab are immediately available in the others. Tabs keep streaming in the background while you switch between them.
-- **Pick any AI model, any time.** Choose from 30+ models across 10 providers (OpenAI, Anthropic, Google, xAI, Qwen, and more) with a simple dropdown. Switch models message to message. You can also use free local models through [Ollama](./docs/local-models-ollama.md).
+- **Pick any tool-capable AI model, any time.** Choose from the full set of OpenRouter models that support tool calling (OpenAI, Anthropic, Google, xAI, Qwen, and more) with a simple dropdown. Switch the orchestrator and expert models per chat tab. You can also use free local models through [Ollama](./docs/local-models-ollama.md).
 - **170+ scientific skills, pre-installed.** Covers genomics, proteomics, drug discovery, materials science, and more. Kady passes the right skills to the expert automatically for each task.
 - **326 ready-to-run workflow templates.** Browse a built-in library across 22 disciplines - genomics, drug discovery, finance, astrophysics, and more. Pick one, fill in the blanks, and launch.
 - **229 scientific and financial databases.** Connect to databases in 18 categories - Biomedical & Health, Chemistry & Materials, Scholarly Publications, Stock Market, Earth & Climate, Astronomy & Space, and more.
@@ -76,7 +76,7 @@ Press **Ctrl+C** in the terminal.
 
 - **Send a message.** Type a question or task and hit enter. Kady will either answer directly or hand off to an expert for bigger work.
 - **Open multiple chats.** Click `+` in the chat tab strip to start a new chat in the same project (up to 10). Double-click a tab title or use the pencil icon to rename it. Closing a tab cancels any turn it had running. The cost pill in the header shows both the active tab's session cost (`sess`) and the project total across every tab (`proj`).
-- **Switch models.** Use the model dropdown in the input bar - any message can use any model. Each tab keeps its own model selection.
+- **Switch models.** Use the model dropdown in the input bar - any message can use any supported model. Each tab keeps its own orchestrator and expert model selections.
 - **Upload files.** Drag files into the file browser or directly onto the input bar. Use `@filename` in your message to reference files.
 - **Launch a workflow.** Open the workflows panel, pick one, fill in the blanks, and click Launch. Workflows run in whichever chat tab is currently active.
 - **Open Settings** (the gear icon in the top-right) for API keys, MCP servers, browser automation, and appearance.
@@ -87,6 +87,7 @@ Press **Ctrl+C** in the terminal.
 These guides live in the [`docs/`](./docs) folder:
 
 - **[Architecture](./docs/architecture.md)** - how the three local services fit together and what each folder in the project is for.
+- **[Model selection](./docs/model-selection.md)** - how Kady builds the OpenRouter model list and routes orchestrator vs expert calls.
 - **[Custom MCP servers](./docs/custom-mcp-servers.md)** - add your own tools to Kady's expert agents.
 - **[Browser automation](./docs/browser-automation.md)** - let Kady drive a real browser.
 - **[Local models with Ollama](./docs/local-models-ollama.md)** - run everything with local models, no API keys required.

docs/architecture.md

Lines changed: 5 additions & 3 deletions
@@ -90,8 +90,10 @@ k-dense-byok/
 └── sessions.db ← Chat history (SQLite, one session per chat tab)
 ```
 
-## A note on the expert model
+## Model selection and routing
 
-The model you select in Kady's dropdown only applies to Kady (the main agent). When Kady delegates a task, the expert runs through the **Gemini CLI**, which always uses a Gemini model on [OpenRouter](https://openrouter.ai/) regardless of your dropdown choice.
+Kady keeps separate model choices for the orchestrator (the main agent) and the delegated expert in each chat tab. Both OpenRouter-hosted choices are routed through the local LiteLLM proxy, which accepts the `openrouter/*` model ids shown in the picker.
 
-The one exception is local Ollama models - if you pick an Ollama model, both Kady and the expert run through your local daemon. See [Local models with Ollama](./local-models-ollama.md).
+The expert still runs inside the **Gemini CLI** process, but K-Dense routes that CLI through the same LiteLLM proxy, so it can target any OpenRouter model in the picker that supports tool calling. The recommended expert default is Gemini 3.1 Pro Preview because it has strong native tool use and a large context window, but users can override it per tab.
+
+Local Ollama models are the main exception - if you pick an Ollama model, both Kady and the expert run through your local daemon. See [Local models with Ollama](./local-models-ollama.md) and [Model selection](./model-selection.md).
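
The routing described in this hunk comes down to a prefix match on the picker's model id. A minimal sketch, assuming hypothetical helper and label names (only the `openrouter/` and `ollama/` id prefixes come from the docs above):

```python
# Hypothetical sketch of the routing rule: model ids from the picker are
# matched by prefix against the proxy's wildcard routes. Illustrative only,
# not code from the K-Dense repo.
def route_for(model_id: str) -> str:
    if model_id.startswith("ollama/"):
        return "local Ollama daemon"
    if model_id.startswith("openrouter/"):
        return "LiteLLM openrouter/* wildcard"
    raise ValueError(f"unrecognized model id: {model_id}")

print(route_for("openrouter/google/gemini-3.1-pro-preview"))  # LiteLLM openrouter/* wildcard
print(route_for("ollama/llama3.2:3b"))                        # local Ollama daemon
```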

docs/limitations.md

Lines changed: 8 additions & 8 deletions
@@ -1,22 +1,22 @@
 # Known Limitations
 
-K-Dense BYOK is in beta. The most important rough edges today are all on the expert-delegation path, which relies on the Gemini CLI and Gemini models.
+K-Dense BYOK is in beta. The most important rough edges today are on the expert-delegation path, which runs through the Gemini CLI even when the selected expert model is routed through OpenRouter or Ollama.
 
-## Gemini models and the Gemini CLI with Skills
+## Expert models and the Gemini CLI with Skills
 
-The expert delegation system relies on the Gemini CLI, which uses Gemini models to execute tasks with our scientific skills. While this works well for many workflows, there are some rough edges to be aware of:
+The expert delegation system relies on the Gemini CLI to execute tasks with our scientific skills. K-Dense routes that CLI through the local LiteLLM proxy, so the expert can use any model in the OpenRouter picker that supports tool calling. Gemini 3.1 Pro Preview remains the recommended expert default because it tends to be strongest for tool-heavy work, but other supported models can be selected per chat tab. While this works well for many workflows, there are some rough edges to be aware of:
 
-- **Skill activation is not always reliable.** Gemini models sometimes skip a relevant skill, use it partially, or misinterpret the skill's instructions. This is especially noticeable with complex multi-step skills that require strict adherence to a procedure.
+- **Skill activation is not always reliable.** Models sometimes skip a relevant skill, use it partially, or misinterpret the skill's instructions. This is especially noticeable with complex multi-step skills that require strict adherence to a procedure.
 - **Tool-calling consistency varies.** The Gemini CLI occasionally drops tool calls mid-execution or calls tools with incorrect arguments, which can cause expert tasks to stall or produce incomplete results.
-- **Long-context degradation.** When a skill injects a large amount of context (detailed protocols, multiple reference databases), Gemini models may lose track of earlier instructions or produce less focused output.
-- **Structured output can drift.** For skills that require specific output formats (tables, JSON, citations), Gemini models sometimes deviate from the requested structure.
+- **Long-context degradation.** When a skill injects a large amount of context (detailed protocols, multiple reference databases), models may lose track of earlier instructions or produce less focused output.
+- **Structured output can drift.** For skills that require specific output formats (tables, JSON, citations), models sometimes deviate from the requested structure.
 
-These are upstream limitations of the Gemini model family and the Gemini CLI tooling, not bugs in K-Dense BYOK itself. Google is actively improving both, and we see meaningful progress with every new model release and CLI update. As these improve, the expert delegation experience will get better automatically without any changes on your end.
+These are upstream limitations of the selected model and the Gemini CLI tooling, not bugs in K-Dense BYOK itself. As model tool calling and CLI support improve, the expert delegation experience will get better automatically without any changes on your end.
 
 **Workarounds:**
 
 - If a skill isn't behaving as expected, try **re-running the task** - results can vary between runs.
-- You can switch Kady's main model (via the dropdown) to a non-Gemini model for the orchestration layer while experts continue to use Gemini under the hood.
+- Try a different expert model in the dropdown. The model list is limited to OpenRouter models that advertise `tools` support, but tool-calling quality still varies across providers.
 
 ## Ollama / small local models
 
docs/local-models-ollama.md

Lines changed: 2 additions & 2 deletions
@@ -21,13 +21,13 @@ You can run Kady and the expert agent entirely against local models served by [O
 
 3. **(Optional) Custom Ollama host.** If your Ollama server lives somewhere other than `http://localhost:11434`, set `OLLAMA_BASE_URL` in `kady_agent/.env`.
 
-4. **Pick the model in the app.** Open the model dropdown in the chat input. Pulled models appear under the **Local (Ollama)** section at the bottom. Picking one routes both Kady and the Gemini-CLI-backed expert through your local daemon.
+4. **Pick the model in the app.** Open the model dropdown in the chat input. Pulled models appear under the **Local (Ollama)** section at the bottom. Picking one routes Kady, and optionally the Gemini-CLI-backed expert, through your local daemon.
 
 The list is populated live from Ollama's `GET /api/tags` endpoint, so pulling a new model and re-opening the dropdown is enough - no app restart needed.
 
 ## Caveats
 
-Local models amplify the limitations of the Gemini CLI tooling (see [Known limitations](./limitations.md)):
+Local models amplify the limitations of the Gemini CLI tooling and model tool-calling quality (see [Known limitations](./limitations.md)):
 
 - **Tool-calling fidelity is noticeably weaker** on sub-frontier models.
 - **Skills that rely on multi-tool choreography** (browsing, running scripts, producing structured output) are the most fragile.
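
The live discovery this doc describes can be illustrated against Ollama's documented `GET /api/tags` response shape. The helper below is a hypothetical sketch, not code from the repo; only the endpoint name and payload shape are Ollama's:

```python
import json

def ollama_model_names(tags_payload: dict) -> list[str]:
    """Extract pulled model names from a GET /api/tags response payload."""
    return [m["name"] for m in tags_payload.get("models", [])]

# Example payload in the shape Ollama returns (model names are illustrative):
sample = json.loads('{"models": [{"name": "llama3.2:3b"}, {"name": "qwen2.5:7b"}]}')
print(ollama_model_names(sample))  # ['llama3.2:3b', 'qwen2.5:7b']
```

In the app the payload would come from an HTTP request to the Ollama daemon; re-running the request is what makes newly pulled models appear without a restart.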

docs/model-selection.md

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# Model Selection
+
+Kady uses two model choices in each chat tab:
+
+- **Orchestrator model:** the main Kady agent that reads your message, decides what to do, and streams the response.
+- **Expert model:** the model used by delegated expert tasks that run inside the Gemini CLI process.
+
+Both choices are stored per tab, so different chats in the same project can use different orchestrator and expert models.
+
+## OpenRouter models
+
+The OpenRouter model picker is generated from models that advertise tool-calling support. Kady sends tool definitions to the orchestrator and expert, so models that do not support the `tools` parameter are excluded from the dropdown.
+
+The generator calls the OpenRouter SDK with:
+
+```python
+client.models.list(supported_parameters="tools")
+```
+
+The resulting entries are written to `web/src/data/models.json` with ids prefixed as `openrouter/<provider>/<model>`. The LiteLLM proxy has an `openrouter/*` wildcard route, so both the orchestrator and the Gemini CLI-backed expert can use any generated OpenRouter id.
+
+To refresh the checked-in model list:
+
+```bash
+uv run python - <<'PY'
+from dotenv import load_dotenv
+load_dotenv("kady_agent/.env")
+
+from kady_agent.utils import update_models_json
+update_models_json()
+PY
+```
+
+By default, this includes all OpenRouter models with `tools` support, preserves the orchestrator and expert recommended defaults, and omits retired GPT-5.4 base/pro entries.
+
+## Defaults
+
+- The orchestrator default is `openrouter/anthropic/claude-opus-4.7`.
+- The expert default is `openrouter/google/gemini-3.1-pro-preview`.
+
+Gemini 3.1 Pro Preview is recommended for expert tasks because expert delegation is tool-heavy and often benefits from a large context window. You can still choose a different tool-capable OpenRouter model per tab.
+
+## Local Ollama models
+
+Pulled Ollama models are discovered live from the local Ollama daemon and appear under the **Local (Ollama)** section in the picker. Selecting an Ollama model routes through the local LiteLLM `ollama/*` wildcard instead of OpenRouter.
+
+Local models are useful for privacy and cost control, but tool-calling quality varies widely. For complex delegated expert tasks, frontier OpenRouter models are usually more reliable.
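
A minimal sketch of how a consumer of the generated `models.json` could resolve the two defaults. Only the entry fields named in this doc (`id`, `default`, `expertDefault`) are assumed; the helper itself is hypothetical:

```python
# Illustrative helper: pick the orchestrator and expert defaults out of the
# models.json entry list. Not part of the K-Dense codebase.
def resolve_defaults(entries: list[dict]) -> tuple[str, str]:
    orchestrator = next(e["id"] for e in entries if e.get("default"))
    expert = next(e["id"] for e in entries if e.get("expertDefault"))
    return orchestrator, expert

entries = [
    {"id": "openrouter/anthropic/claude-opus-4.7", "default": True},
    {"id": "openrouter/google/gemini-3.1-pro-preview", "expertDefault": True},
]
print(resolve_defaults(entries))
# ('openrouter/anthropic/claude-opus-4.7', 'openrouter/google/gemini-3.1-pro-preview')
```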

kady_agent/utils.py

Lines changed: 31 additions & 5 deletions
@@ -161,13 +161,16 @@ def download_scientific_skills(
 def fetch_openrouter_models(
     api_key: str | None = None,
     max_age_days: int | None = None,
+    supported_parameters: str | None = None,
 ) -> list[dict]:
     """
     Fetch all available models from OpenRouter using the official SDK.
 
     Args:
         api_key: OpenRouter API key (falls back to OPENROUTER_API_KEY env var).
         max_age_days: If set, only return models created within this many days.
+        supported_parameters: Comma-separated OpenRouter parameters to require
+            (for example, "tools" to return only tool-calling models).
 
     Returns a list of dicts, each with:
         id, name, provider, context_length, modality, created,
@@ -183,7 +186,7 @@ def fetch_openrouter_models(
     )
 
     with OpenRouter(api_key=key) as client:
-        res = client.models.list()
+        res = client.models.list(supported_parameters=supported_parameters)
 
     if not res or not res.data:
         return []
@@ -235,6 +238,7 @@ def search_openrouter_models(
     max_prompt_price: float | None = None,
     modality: str | None = None,
     max_age_days: int | None = None,
+    supported_parameters: str | None = None,
     api_key: str | None = None,
 ) -> list[dict]:
     """
@@ -247,9 +251,15 @@ def search_openrouter_models(
         max_prompt_price: Maximum prompt price per 1M tokens.
         modality: Filter by modality string (e.g. "text->text").
         max_age_days: Only include models added within this many days (e.g. 90).
+        supported_parameters: Comma-separated OpenRouter parameters to require
+            (for example, "tools" to return only tool-calling models).
         api_key: OpenRouter API key (falls back to OPENROUTER_API_KEY env var).
     """
-    all_models = fetch_openrouter_models(api_key=api_key, max_age_days=max_age_days)
+    all_models = fetch_openrouter_models(
+        api_key=api_key,
+        max_age_days=max_age_days,
+        supported_parameters=supported_parameters,
+    )
     results = all_models
 
     if query:
@@ -342,26 +352,40 @@ def _pricing_tier(prompt_price: float) -> str:
 def update_models_json(
     output_path: str = "web/src/data/models.json",
     default_model_id: str = "anthropic/claude-opus-4.7",
-    max_age_days: int = 90,
+    expert_default_model_id: str = "google/gemini-3.1-pro-preview",
+    max_age_days: int | None = None,
+    supported_parameters: str | None = "tools",
+    excluded_model_ids: set[str] | None = None,
     api_key: str | None = None,
 ) -> None:
     """Fetch models from OpenRouter and overwrite the frontend models.json.
 
     Args:
         output_path: Path to the output JSON file.
         default_model_id: The OpenRouter model ID to mark as the default.
+        expert_default_model_id: The OpenRouter model ID to mark as the expert
+            default in the frontend picker.
         max_age_days: Only include models added within this many days.
-            Pass 0 or None to include all models.
+            Pass None to include all models.
+        supported_parameters: Comma-separated OpenRouter parameters to require.
+            Defaults to "tools" because Kady sends tool definitions.
+        excluded_model_ids: OpenRouter model IDs to omit even if the API
+            returns them.
         api_key: OpenRouter API key (falls back to OPENROUTER_API_KEY env var).
     """
+    excluded_model_ids = excluded_model_ids or {"openai/gpt-5.4", "openai/gpt-5.4-pro"}
     raw_models = fetch_openrouter_models(
         api_key=api_key,
-        max_age_days=max_age_days or None,
+        max_age_days=max_age_days,
+        supported_parameters=supported_parameters,
    )
     out = Path(output_path)
 
     entries = []
     for m in raw_models:
+        if m["id"] in excluded_model_ids:
+            continue
+
         p_in = m["pricing"]["prompt_per_1m"]
         p_out = m["pricing"]["completion_per_1m"]
         if p_in < 0 or p_out < 0:
@@ -380,6 +404,8 @@ def update_models_json(
         }
         if m["id"] == default_model_id:
             entry["default"] = True
+        if m["id"] == expert_default_model_id:
+            entry["expertDefault"] = True
         entries.append(entry)
 
     tier_order = {"flagship": 0, "high": 1, "mid": 2, "budget": 3}
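
The new exclusion and expert-default marking in `update_models_json` boils down to a few lines. This condensed, hypothetical sketch mirrors the diff's logic (the excluded ids and default ids come from the diff) while omitting the pricing and tier handling:

```python
# Condensed sketch of the filtering/marking behaviour in update_models_json.
# Not the full function; pricing, sorting, and file I/O are omitted.
EXCLUDED = {"openai/gpt-5.4", "openai/gpt-5.4-pro"}

def build_entries(
    raw_models: list[dict],
    default_id: str = "anthropic/claude-opus-4.7",
    expert_default_id: str = "google/gemini-3.1-pro-preview",
) -> list[dict]:
    entries = []
    for m in raw_models:
        if m["id"] in EXCLUDED:
            continue  # retired models are dropped even if the API returns them
        entry = {"id": f"openrouter/{m['id']}"}
        if m["id"] == default_id:
            entry["default"] = True
        if m["id"] == expert_default_id:
            entry["expertDefault"] = True
        entries.append(entry)
    return entries

raw = [{"id": "openai/gpt-5.4"}, {"id": "google/gemini-3.1-pro-preview"}]
print(build_entries(raw))
# [{'id': 'openrouter/google/gemini-3.1-pro-preview', 'expertDefault': True}]
```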

tests/test_utils_models.py

Lines changed: 35 additions & 3 deletions
@@ -45,14 +45,41 @@ def test_search_and_update_models_json(tmp_path, monkeypatch) -> None:
             "pricing": {"prompt_per_1m": 0.25, "completion_per_1m": 1.0},
             "description": "Fast budget model",
         },
+        {
+            "id": "google/gemini-3.1-pro-preview",
+            "name": "Google: Gemini 3.1 Pro Preview",
+            "provider": "google",
+            "created": "2026-01-01",
+            "context_length": 1_000_000,
+            "modality": "text->text",
+            "pricing": {"prompt_per_1m": 2.0, "completion_per_1m": 12.0},
+            "description": "Expert default model",
+        },
+        {
+            "id": "openai/gpt-5.4",
+            "name": "OpenAI: GPT-5.4",
+            "provider": "openai",
+            "created": "2026-01-01",
+            "context_length": 1_000_000,
+            "modality": "text->text",
+            "pricing": {"prompt_per_1m": 2.5, "completion_per_1m": 15.0},
+            "description": "Retired model",
+        },
     ]
-    monkeypatch.setattr(utils, "fetch_openrouter_models", lambda **kwargs: models)
+    fetch_kwargs = {}
+
+    def fake_fetch_openrouter_models(**kwargs):
+        fetch_kwargs.update(kwargs)
+        return models
+
+    monkeypatch.setattr(utils, "fetch_openrouter_models", fake_fetch_openrouter_models)
 
     assert [m["id"] for m in utils.search_openrouter_models(query="flash")] == [
         "google/gemini-flash"
     ]
     assert [m["provider"] for m in utils.search_openrouter_models(providers=["google"])] == [
-        "google"
+        "google",
+        "google",
     ]
     assert utils.search_openrouter_models(min_context=500_000)[0]["id"] == "google/gemini-flash"
     assert utils.search_openrouter_models(max_prompt_price=1.0)[0]["id"] == "google/gemini-flash"
@@ -62,7 +89,12 @@ def test_search_and_update_models_json(tmp_path, monkeypatch) -> None:
     data = json.loads(output.read_text(encoding="utf-8"))
     assert data[0]["id"] == "openrouter/anthropic/claude-opus-4.7"
     assert data[0]["default"] is True
-    assert data[1]["tier"] == "budget"
+    assert "openrouter/openai/gpt-5.4" not in {m["id"] for m in data}
+    assert fetch_kwargs["supported_parameters"] == "tools"
+    assert any(
+        m["id"] == "openrouter/google/gemini-3.1-pro-preview" and m["expertDefault"]
+        for m in data
+    )
 
 
 def test_fetch_openrouter_models_requires_key(monkeypatch) -> None:
