feat(ollama): report model capabilities + details on /api/tags and /api/show #9766

Merged
mudler merged 1 commit into master from fix/ollama-api-model-attributes on May 11, 2026


Conversation

@localai-bot
Collaborator

Summary

Ollama-compatible clients (Open WebUI, Enchanted, ollama-grid-search, etc.) rely on the capabilities list and details.{parameter_size,quantization_level,families} fields returned by /api/tags and /api/show to filter which models are eligible for a given task. LocalAI's compat layer was leaving these empty, so clients silently rejected embedding/rerank models for chat and refused to surface them as embedding models.

This PR wires up the existing signals already present in ModelConfig:

  • capabilities:
    • embedding ← FLAG_EMBEDDINGS
    • completion ← FLAG_CHAT / FLAG_COMPLETION
    • vision — explicit KnownUsecases bit, MMProj, multimodal template, or backend-reported media marker
    • tools — auto-detected ToolFormatMarkers, JSONRegexMatch/ResponseRegex, XMLFormat[Preset], grammar triggers, or schema type
    • thinking ← ReasoningConfig with reasoning not disabled, or explicit thinking markers
    • insert — presence of a completion template
  • details.families — populated from the backend name (was previously empty)
  • details.parameter_size / details.quantization_level — parsed from the GGUF filename via conservative regex (won't match Qwen3 as 3B)
  • model_info.general.architecture / model_info.general.context_length — exposed in the new model_info map on ShowResponse
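The capability mapping listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `sketchConfig` and `capabilitiesFor` are hypothetical names, and each boolean stands in for a signal that the real ModelConfig derives in its own way. Only the capability strings come from the description.

```go
package main

import "fmt"

// sketchConfig is a hypothetical, flattened stand-in for LocalAI's ModelConfig.
// The real type looks different; each field represents one signal from the list above.
type sketchConfig struct {
	Embeddings         bool // stands in for FLAG_EMBEDDINGS
	ChatOrCompletion   bool // stands in for FLAG_CHAT / FLAG_COMPLETION
	Vision             bool // explicit usecase bit, MMProj, multimodal template, or media marker
	Tools              bool // tool-format markers, regexes, XML format, grammar triggers, or schema type
	Thinking           bool // reasoning not disabled, or explicit thinking markers
	CompletionTemplate bool // a completion template is present
}

// capabilitiesFor returns the Ollama capability strings for a config.
// A model with none of the signals (for example a rerank-only backend)
// gets an empty, non-nil slice.
func capabilitiesFor(c sketchConfig) []string {
	caps := []string{}
	if c.Embeddings {
		caps = append(caps, "embedding")
	}
	if c.ChatOrCompletion {
		caps = append(caps, "completion")
	}
	if c.Vision {
		caps = append(caps, "vision")
	}
	if c.Tools {
		caps = append(caps, "tools")
	}
	if c.Thinking {
		caps = append(caps, "thinking")
	}
	if c.CompletionTemplate {
		caps = append(caps, "insert")
	}
	return caps
}

func main() {
	fmt.Println(capabilitiesFor(sketchConfig{Embeddings: true}))                    // [embedding]
	fmt.Println(capabilitiesFor(sketchConfig{ChatOrCompletion: true, Tools: true})) // [completion tools]
	fmt.Println(capabilitiesFor(sketchConfig{}))                                    // []
}
```

Returning an empty slice rather than nil is a choice made for this sketch so that the JSON encoding would be `[]` instead of `null`; whether the PR does the same is not stated.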

A small gotcha worth flagging: cfg.HasUsecases(FLAG_VISION) can't be used directly because GuessUsecases has no FLAG_VISION case and ends up returning true for every chat model. hasVisionSupport() instead checks KnownUsecases bits explicitly plus the multimodal projector / template / media-marker signals. This is documented inline.
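The explicit check described above might look something like this. It is a sketch under assumptions: the `usecase` type, the flag constants, the `visionConfig` fields, and `hasVisionSketch` are all hypothetical and only mirror the signals named in the paragraph (an explicit KnownUsecases bit, a multimodal projector, a multimodal template, a backend media marker).

```go
package main

import "fmt"

// usecase is a hypothetical bitmask type standing in for LocalAI's usecase flags.
type usecase uint32

const (
	flagChat usecase = 1 << iota
	flagVision
)

// visionConfig is a hypothetical stand-in holding only the signals
// mentioned in the PR description.
type visionConfig struct {
	KnownUsecases      *usecase // nil when no usecases were declared explicitly
	MMProj             string   // multimodal projector file, if any
	MultimodalTemplate string   // multimodal prompt template, if any
	MediaMarker        string   // media marker reported by the backend, if any
}

// hasVisionSketch only trusts explicit signals. It never falls back to a
// guessing path, which is what makes a guess-based check report true for
// every chat model.
func hasVisionSketch(c visionConfig) bool {
	if c.KnownUsecases != nil && *c.KnownUsecases&flagVision != 0 {
		return true
	}
	return c.MMProj != "" || c.MultimodalTemplate != "" || c.MediaMarker != ""
}

func main() {
	chatOnly := flagChat
	fmt.Println(hasVisionSketch(visionConfig{KnownUsecases: &chatOnly}))  // false
	fmt.Println(hasVisionSketch(visionConfig{MMProj: "mmproj-f16.gguf"})) // true
}
```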

Fixes #9760.

Test plan

Built with TDD using Ginkgo/Gomega — 11 new specs in capabilities_test.go + models_test.go:

  • DescribeTable for modelCapabilities: embedding-only, plain chat, vision-capable chat, thinking, tools via auto-detected markers, tools via explicit JSON regex, pure backend-only model (rerank) returns no caps
  • modelDetailsFromModelConfig populates format=gguf, family, families, parses Qwen3-4B-Instruct-Q4_K_M.gguf → parameter_size=4B, quantization_level=Q4_K_M
  • End-to-end /api/show JSON tests using a real ModelConfigLoader populated from a temp YAML file — round-trips through ShowModelEndpoint and verifies Capabilities + Details.{ParameterSize,QuantizationLevel,Format,Families}
$ go test ./core/http/endpoints/ollama/...
ok  	github.com/mudler/LocalAI/core/http/endpoints/ollama	0.037s
$ go test ./core/schema/... ./core/config/...
ok  	github.com/mudler/LocalAI/core/schema	0.006s
ok  	github.com/mudler/LocalAI/core/config	0.349s
ok  	github.com/mudler/LocalAI/core/config/meta	0.017s
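
The filename parsing exercised by the modelDetailsFromModelConfig spec can be approximated as below. The regular expressions are a guess at what "conservative" means here (a token counts only when it sits between separators); they are not copied from the PR, and `parseGGUFName`, `sizeRe`, and `quantRe` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// A size token is digits (optionally with a decimal part) followed by B or M,
	// and must be delimited by '-', '_', '.' or the string boundaries.
	sizeRe = regexp.MustCompile(`(?i)(?:^|[-_.])(\d+(?:\.\d+)?[BM])(?:[-_.]|$)`)
	// A quantization token is Q*/IQ* (with optional _SUFFIX parts), BF16, F16 or F32,
	// delimited the same way.
	quantRe = regexp.MustCompile(`(?i)(?:^|[-_.])((?:IQ|Q)\d+(?:_[A-Z0-9]+)*|BF16|F16|F32)(?:[-_.]|$)`)
)

// parseGGUFName extracts the parameter size and quantization level from a
// GGUF filename, returning empty strings for tokens it cannot find.
func parseGGUFName(name string) (size, quant string) {
	if m := sizeRe.FindStringSubmatch(name); m != nil {
		size = strings.ToUpper(m[1])
	}
	if m := quantRe.FindStringSubmatch(name); m != nil {
		quant = strings.ToUpper(m[1])
	}
	return size, quant
}

func main() {
	for _, name := range []string{
		"Qwen3-4B-Instruct-Q4_K_M.gguf",
		"Qwen3-Embedding-Q8_0.gguf", // no size token: the "3" in "Qwen3" is not delimited
	} {
		size, quant := parseGGUFName(name)
		fmt.Printf("%s -> size=%q quant=%q\n", name, size, quant)
	}
}
```

Requiring a separator on both sides is what keeps the digit in a model family name such as "Qwen3" from being read as a size. Go's regexp package has no lookaround, so the separators are matched with non-capturing groups instead.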

Manual verification for the reviewer

To check the fix for #9760 against the patched build, with any embedding model configured:

$ curl -s http://localhost:8080/api/show -d '{"name":"qwen3-embedding"}' | jq '.capabilities, .details'
[
  "embedding"
]
{
  "format": "gguf",
  "family": "llama-cpp",
  "families": ["llama-cpp"],
  "parameter_size": "4B",
  "quantization_level": "Q4_K_M"
}

Open WebUI / Enchanted / ollama-grid-search should now correctly surface embedding models in their embedding-model picker, and stop offering them as chat models.
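
For reference, a client-side check along the lines of what these tools do might look like the sketch below. The JSON field names are taken from the output above; the `showResponse` type, `parseShow`, `hasCapability`, and `fetchShow` are hypothetical helpers, and the base URL is whatever your LocalAI instance listens on.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// showResponse covers only the fields shown in the manual verification output.
type showResponse struct {
	Capabilities []string `json:"capabilities"`
	Details      struct {
		Format            string   `json:"format"`
		Family            string   `json:"family"`
		Families          []string `json:"families"`
		ParameterSize     string   `json:"parameter_size"`
		QuantizationLevel string   `json:"quantization_level"`
	} `json:"details"`
}

// parseShow decodes an /api/show response body.
func parseShow(data []byte) (*showResponse, error) {
	var r showResponse
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, err
	}
	return &r, nil
}

// hasCapability reports whether the model advertises the given capability.
func hasCapability(r *showResponse, want string) bool {
	for _, c := range r.Capabilities {
		if c == want {
			return true
		}
	}
	return false
}

// fetchShow posts {"name": model} to baseURL + "/api/show", like the curl call above.
func fetchShow(baseURL, model string) (*showResponse, error) {
	body, err := json.Marshal(map[string]string{"name": model})
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(baseURL+"/api/show", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseShow(data)
}

func main() {
	// Offline demo using a payload shaped like the output above;
	// call fetchShow("http://localhost:8080", "qwen3-embedding") against a live server.
	sample := []byte(`{"capabilities":["embedding"],"details":{"format":"gguf","parameter_size":"4B"}}`)
	r, err := parseShow(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("embedding picker:", hasCapability(r, "embedding"))
	fmt.Println("chat picker:", hasCapability(r, "completion"))
}
```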

…pi/show

Ollama-compatible clients (Open WebUI, Enchanted, ollama-grid-search,
etc.) rely on the `capabilities` list and `details.{parameter_size,
quantization_level,families}` fields returned by /api/tags and
/api/show to decide which models are eligible for a given task --
for example to filter the "embedding model" picker. Upstream Ollama
returns these; LocalAI's compat layer was leaving them empty, so
embedding models were silently rejected by clients that only allow
chat models for chat and only allow embedding models for embeddings.

This wires up the existing config signals already present in
ModelConfig:

- modelCapabilities() derives the Ollama capability strings from the
  config: "embedding" (FLAG_EMBEDDINGS), "completion" (FLAG_CHAT /
  FLAG_COMPLETION), "vision" (explicit KnownUsecases bit or MMProj /
  multimodal template / backend media marker), "tools" (auto-detected
  ToolFormatMarkers, JSON/Response regex, XML format, grammar
  triggers), "thinking" (ReasoningConfig with reasoning not disabled)
  and "insert" (presence of a completion template).
- modelDetailsFromModelConfig() now fills families, parameter_size
  and quantization_level. The latter two are parsed from the GGUF
  filename via regex -- conservative tokens only (Q*/IQ*/F16/F32/BF16
  and \d+(\.\d+)?[BM] surrounded by separators) so we don't accidentally
  match "Qwen3" as "3B".
- modelInfoFromModelConfig() exposes general.architecture and
  general.context_length in the new ShowResponse.model_info map.

Note: HasUsecases(FLAG_VISION) cannot be used directly -- GuessUsecases
has no FLAG_VISION case and returns true at the end for any chat model.
hasVisionSupport() instead reads KnownUsecases explicitly plus MMProj /
template / media-marker signals.

Tests are written first (TDD) using Ginkgo/Gomega -- DescribeTable for
the capability mapping (embedding-only, chat, vision, thinking, tools
via markers, tools via JSON regex, no-capability rerank) plus
integration tests against ShowModelEndpoint that round-trip JSON
through a real ModelConfigLoader populated from a temp YAML file.

Fixes #9760.

Signed-off-by: Ettore Di Giacinto <[email protected]>
Assisted-by: Claude Code:claude-opus-4-7
@mudler merged commit bc3fb16 into master on May 11, 2026
56 checks passed
@mudler deleted the fix/ollama-api-model-attributes branch on May 11, 2026 22:16
Development

Successfully merging this pull request may close these issues.

Ollama API is not reporting model attributes needed for proper operation
