feat(ollama): report model capabilities + details on /api/tags and /api/show #9766
Merged
Ollama-compatible clients (Open WebUI, Enchanted, ollama-grid-search,
etc.) rely on the `capabilities` list and `details.{parameter_size,
quantization_level,families}` fields returned by /api/tags and
/api/show to decide which models are eligible for a given task --
for example to filter the "embedding model" picker. Upstream Ollama
returns these; LocalAI's compat layer was leaving them empty, so
clients that only allow chat models for chat and only allow embedding
models for embeddings silently rejected LocalAI's embedding models.
This wires up the config signals already present in
ModelConfig:
- modelCapabilities() derives the Ollama capability strings from the
config: "embedding" (FLAG_EMBEDDINGS), "completion" (FLAG_CHAT /
FLAG_COMPLETION), "vision" (explicit KnownUsecases bit or MMProj /
multimodal template / backend media marker), "tools" (auto-detected
ToolFormatMarkers, JSON/Response regex, XML format, grammar
triggers), "thinking" (ReasoningConfig with reasoning not disabled)
and "insert" (presence of a completion template).
- modelDetailsFromModelConfig() now fills families, parameter_size
and quantization_level. The latter two are parsed from the GGUF
filename via regex -- conservative tokens only (Q*/IQ*/F16/F32/BF16
and \d+(\.\d+)?[BM] surrounded by separators) so we don't accidentally
match "Qwen3" as "3B".
- modelInfoFromModelConfig() exposes general.architecture and
general.context_length in the new ShowResponse.model_info map.
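The filename parsing for `parameter_size` and `quantization_level` can be illustrated with a small, self-contained sketch. It is not the code from this PR: `parseGGUFName`, `quantRe` and `sizeRe` are hypothetical names, and the exact token set is an assumption based on the description above (Q*/IQ*/F16/F32/BF16 and `\d+(\.\d+)?[BM]`, each bounded by separators). Because a size token must follow a `-`, `_`, `.` or the start of the name, the `3` in `Qwen3` is never read as a parameter count.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// quantRe matches conservative quantization tokens (Q4_K_M, Q8_0, IQ3_XS,
// F16, F32, BF16). The token must start at the beginning of the name or
// right after a '-', '_' or '.', and be followed by one of those or the end.
var quantRe = regexp.MustCompile(`(?i)(?:^|[-_.])(I?Q\d+(?:_[A-Z0-9]+)*|BF16|F16|F32)(?:[-_.]|$)`)

// sizeRe matches parameter counts such as 4B, 7b, 0.5B or 135M, again only
// between separators, so "Qwen3" does not produce "3B".
var sizeRe = regexp.MustCompile(`(?i)(?:^|[-_.])(\d+(?:\.\d+)?[BM])(?:[-_.]|$)`)

// parseGGUFName (hypothetical helper) returns empty strings for anything it
// does not recognise instead of guessing.
func parseGGUFName(path string) (paramSize, quant string) {
	name := strings.TrimSuffix(filepath.Base(path), ".gguf")
	if m := sizeRe.FindStringSubmatch(name); m != nil {
		paramSize = strings.ToUpper(m[1])
	}
	if m := quantRe.FindStringSubmatch(name); m != nil {
		quant = strings.ToUpper(m[1])
	}
	return paramSize, quant
}

func main() {
	for _, f := range []string{
		"Qwen3-4B-Instruct-Q4_K_M.gguf",
		"models/qwen2.5-0.5b-instruct-q8_0.gguf",
		"Qwen3-Embedding.gguf",
	} {
		size, quant := parseGGUFName(f)
		fmt.Printf("%-40s size=%q quant=%q\n", f, size, quant)
	}
}
```

Returning empty strings on a miss keeps the fields optional: a client sees "unknown" rather than a wrong size.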
Note: HasUsecases(FLAG_VISION) cannot be used directly -- GuessUsecases
has no FLAG_VISION case and returns true at the end for any chat model.
hasVisionSupport() instead reads KnownUsecases explicitly plus MMProj /
template / media-marker signals.
Tests are written first (TDD) using Ginkgo/Gomega -- DescribeTable for
the capability mapping (embedding-only, chat, vision, thinking, tools
via markers, tools via JSON regex, no-capability rerank) plus
integration tests against ShowModelEndpoint that round-trip JSON
through a real ModelConfigLoader populated from a temp YAML file.
Fixes #9760.
Signed-off-by: Ettore Di Giacinto <[email protected]>
Assisted-by: Claude Code:claude-opus-4-7
Summary
Ollama-compatible clients (Open WebUI, Enchanted, ollama-grid-search, etc.) rely on the `capabilities` list and `details.{parameter_size,quantization_level,families}` fields returned by `/api/tags` and `/api/show` to filter which models are eligible for a given task. LocalAI's compat layer was leaving these empty, so clients silently rejected embedding/rerank models for chat and refused to surface them as embedding models.

This PR wires up the signals already present in `ModelConfig`:

- `capabilities`:
  - `embedding`: `FLAG_EMBEDDINGS`
  - `completion`: `FLAG_CHAT` / `FLAG_COMPLETION`
  - `vision`: explicit `KnownUsecases` bit, `MMProj`, multimodal template, or backend-reported media marker
  - `tools`: auto-detected `ToolFormatMarkers`, `JSONRegexMatch` / `ResponseRegex`, `XMLFormat[Preset]`, grammar triggers, or schema type
  - `thinking`: `ReasoningConfig` with reasoning not disabled, or explicit thinking markers
  - `insert`: presence of a completion template
- `details.families`: populated from the backend name (previously empty)
- `details.parameter_size` / `details.quantization_level`: parsed from the GGUF filename via a conservative regex (won't match `Qwen3` as `3B`)
- `model_info.general.architecture` / `model_info.general.context_length`: exposed in the new `model_info` map on `ShowResponse`

A small gotcha worth flagging: `cfg.HasUsecases(FLAG_VISION)` can't be used directly because `GuessUsecases` has no `FLAG_VISION` case and ends up returning `true` for every chat model. `hasVisionSupport()` instead checks `KnownUsecases` bits explicitly plus the multimodal projector / template / media-marker signals. This is documented inline.

Fixes #9760.
Test plan
Built with TDD using Ginkgo/Gomega, with 11 new specs in `capabilities_test.go` + `models_test.go`:

- `DescribeTable` for `modelCapabilities`: embedding-only, plain chat, vision-capable chat, thinking, tools via auto-detected markers, tools via explicit JSON regex, pure backend-only model (rerank) returns no caps
- `modelDetailsFromModelConfig` populates `format=gguf`, `family`, `families`, and parses `Qwen3-4B-Instruct-Q4_K_M.gguf` → `parameter_size=4B`, `quantization_level=Q4_K_M`
- `/api/show` JSON tests using a real `ModelConfigLoader` populated from a temp YAML file: round-trips through `ShowModelEndpoint` and verifies `Capabilities` + `Details.{ParameterSize,QuantizationLevel,Format,Families}`

Manual verification for the reviewer
To reproduce the issue from #9760 against the patched build, with any embedding model configured:
Open WebUI / Enchanted / ollama-grid-search should now correctly surface embedding models in their embedding-model picker, and stop offering them as chat models.
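On the client side, the gating these tools perform can be sketched as decoding an `/api/show` response and checking `capabilities`. This is an illustrative sketch only: `showResponse`, `parseShow`, `hasCapability` and `samplePayload` are hypothetical names, the struct decodes only the fields discussed in this PR, and every value in the sample payload is made up rather than captured from LocalAI.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// showResponse decodes only the fields discussed in this PR. It is an
// illustrative client-side type, not LocalAI's or Ollama's Go struct.
type showResponse struct {
	Capabilities []string `json:"capabilities"`
	Details      struct {
		Format            string   `json:"format"`
		Family            string   `json:"family"`
		Families          []string `json:"families"`
		ParameterSize     string   `json:"parameter_size"`
		QuantizationLevel string   `json:"quantization_level"`
	} `json:"details"`
	ModelInfo map[string]any `json:"model_info"`
}

// samplePayload is a made-up /api/show body for an embedding model.
var samplePayload = []byte(`{
	"capabilities": ["embedding"],
	"details": {"format": "gguf", "family": "llama-cpp", "families": ["llama-cpp"],
	            "parameter_size": "4B", "quantization_level": "Q4_K_M"},
	"model_info": {"general.architecture": "qwen3", "general.context_length": 8192}
}`)

func parseShow(body []byte) (showResponse, error) {
	var r showResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

// hasCapability mirrors how a picker might gate models: a model without the
// capability simply isn't offered.
func hasCapability(r showResponse, want string) bool {
	for _, c := range r.Capabilities {
		if c == want {
			return true
		}
	}
	return false
}

func main() {
	r, err := parseShow(samplePayload)
	if err != nil {
		panic(err)
	}
	fmt.Println("offer in embedding picker:", hasCapability(r, "embedding"))
	fmt.Println("offer in chat picker:", hasCapability(r, "completion"))
	fmt.Println("size/quant:", r.Details.ParameterSize, r.Details.QuantizationLevel)
	// encoding/json decodes JSON numbers stored in `any` as float64.
	fmt.Println("context length:", r.ModelInfo["general.context_length"])
}
```

With empty `capabilities` (the pre-patch behaviour), `hasCapability` returns false for every task, which is why such models disappeared from the pickers.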