Comparing changes

Bumps node from 25-slim to 26-slim. --- updated-dependencies: - dependency-name: node dependency-version: 26-slim dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](actions/upload-artifact@v4...v7) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 4 to 8. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](actions/download-artifact@v4...v8) --- updated-dependencies: - dependency-name: actions/download-artifact dependency-version: '8' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…to 1.42.0 (#9772) chore(deps): bump github.com/anthropics/anthropic-sdk-go Bumps [github.com/anthropics/anthropic-sdk-go](https://github.com/anthropics/anthropic-sdk-go) from 1.27.0 to 1.42.0. - [Release notes](https://github.com/anthropics/anthropic-sdk-go/releases) - [Changelog](https://github.com/anthropics/anthropic-sdk-go/blob/main/CHANGELOG.md) - [Commits](anthropics/anthropic-sdk-go@v1.27.0...v1.42.0) --- updated-dependencies: - dependency-name: github.com/anthropics/anthropic-sdk-go dependency-version: 1.42.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.39.1 to 1.40.0. - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.39.1...v1.40.0) --- updated-dependencies: - dependency-name: github.com/onsi/gomega dependency-version: 1.40.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…in /backend/python/transformers (#9775) chore(deps): update transformers requirement Updates the requirements on [transformers](https://github.com/huggingface/transformers) to permit the latest version. - [Release notes](https://github.com/huggingface/transformers/releases) - [Commits](huggingface/transformers@v5.0.0...v5.8.0) --- updated-dependencies: - dependency-name: transformers dependency-version: 5.8.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…9778) Bumps [github.com/fsnotify/fsnotify](https://github.com/fsnotify/fsnotify) from 1.9.0 to 1.10.1. - [Release notes](https://github.com/fsnotify/fsnotify/releases) - [Changelog](https://github.com/fsnotify/fsnotify/blob/main/CHANGELOG.md) - [Commits](fsnotify/fsnotify@v1.9.0...v1.10.1) --- updated-dependencies: - dependency-name: github.com/fsnotify/fsnotify dependency-version: 1.10.1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…3.4.7 in /backend/python/vllm (#9779) chore(deps): update charset-normalizer requirement Updates the requirements on [charset-normalizer](https://github.com/jawah/charset_normalizer) to permit the latest version. - [Release notes](https://github.com/jawah/charset_normalizer/releases) - [Changelog](https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md) - [Commits](jawah/charset_normalizer@3.4.0...3.4.7) --- updated-dependencies: - dependency-name: charset-normalizer dependency-version: 3.4.7 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

) Bumps [github.com/mudler/edgevpn](https://github.com/mudler/edgevpn) from 0.31.1 to 0.32.2. - [Release notes](https://github.com/mudler/edgevpn/releases) - [Commits](mudler/edgevpn@v0.31.1...v0.32.2) --- updated-dependencies: - dependency-name: github.com/mudler/edgevpn dependency-version: 0.32.2 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix: parse vulkan VRAM from text Assisted-by: opencode:gpt-5.5 Signed-off-by: Andreas Egli <[email protected]> * fix: replace string.split with streaming iteration Assisted-by: Opencode:Gemma4 Signed-off-by: Andreas Egli <[email protected]> --------- Signed-off-by: Andreas Egli <[email protected]>

…dates (#9728) Bumps the npm_and_yarn group with 3 updates in the /core/http/react-ui directory: [fast-uri](https://github.com/fastify/fast-uri), [hono](https://github.com/honojs/hono) and [ip-address](https://github.com/beaugunderson/ip-address). Updates `fast-uri` from 3.1.0 to 3.1.2 - [Release notes](https://github.com/fastify/fast-uri/releases) - [Commits](fastify/fast-uri@v3.1.0...v3.1.2) Updates `hono` from 4.12.14 to 4.12.18 - [Release notes](https://github.com/honojs/hono/releases) - [Commits](honojs/hono@v4.12.14...v4.12.18) Updates `ip-address` from 10.1.0 to 10.2.0 - [Commits](https://github.com/beaugunderson/ip-address/commits) --- updated-dependencies: - dependency-name: fast-uri dependency-version: 3.1.2 dependency-type: indirect dependency-group: npm_and_yarn - dependency-name: hono dependency-version: 4.12.18 dependency-type: indirect dependency-group: npm_and_yarn - dependency-name: ip-address dependency-version: 10.2.0 dependency-type: indirect dependency-group: npm_and_yarn ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…9780) Ollama's embedding endpoint accepts both `input` and `prompt` as the input string value (see ollama/ollama docs/api.md#generate-embeddings). LocalAI only accepted `input`, which broke client libraries that send the `prompt` form. Add `Prompt` to OllamaEmbedRequest and have GetInputStrings fall back to it when Input is unset. Input still wins when both are provided. Fixes #9767. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <[email protected]> Co-authored-by: Ettore Di Giacinto <[email protected]>

* ci: close the GC race + cascade-skip + darwin grpc gaps from v4.2.1 v4.2.1's backend.yml run (#25701862853) exposed three independent issues on top of the singletons fix shipped in ea00199. Address all three plus two related cleanups: 1. quay GC race in backend-merge-jobs-multiarch (12/37 merges failed with "manifest not found"). Even after PR #9746 split multi/single-arch merges, the multiarch matrix itself takes ~2h to drain at max-parallel: 8, and the earliest per-arch digests (push-by-digest, no tag) get reaped by quay's GC before the merge runs. The split bounded the race for multiarch; it doesn't eliminate it. Anchor each per-arch digest immediately to a tag in the internal ci-cache image (`keepalive-<run_id><tag-suffix>-<platform-tag>`). Quay won't GC tagged manifests. backend_merge.yml deletes the keepalive tags via quay REST API after publishing the user-facing manifest list. Cleanup is best-effort: if the quay token is not OAuth-scoped the merge does NOT fail, the orphan tags just persist. 2. cascade-skip on backend-merge-jobs-singlearch. v4.2.1 had 2 failed and 2 cancelled singlearch builds (out of 199); GHA's default `needs:` semantics cascade-skipped the entire singlearch merge matrix, so zero singleton tags were applied even though 197 singletons built successfully. Wrap the merge `if:` in `!cancelled() && ...` for both multi and single arch in backend.yml and backend_pr.yml so partial build failures publish the successful tag-suffixes. 3. Darwin llama-cpp grpc-server build fails with `find_package(absl)` not found. Same shape as the ccache/blake3/fmt/hiredis/xxhash/zstd fix already in `Dependencies`: a brew cache hit restores `/opt/homebrew/Cellar/grpc` so `brew install grpc` no-ops, but abseil isn't in our Cellar cache list and never gets installed alongside, leaving grpc's CMake unable to resolve it. Mirror the `brew reinstall ccache` line with `brew reinstall grpc` to re-validate grpc's full transitive dep closure on every cache-hit run. 4. Move the four heaviest CUDA cpp builds back to bigger-runner. v4.2.1 wall-clock: -gpu-nvidia-cuda-12-llama-cpp 5h36m, -gpu-nvidia-cuda-12-turboquant 6h05m, -gpu-nvidia-cuda-13-llama-cpp 5h37m, -gpu-nvidia-cuda-13-turboquant 6h05m. The cuda-12 turboquant and cuda-13 turboquant entries are over GHA's 6h job timeout. Phase 5.3 of the free-tier migration (PR #9730) had explicitly flagged this batch as 'highest-risk' with a per-entry revert path. All other matrix entries (vulkan-llama-cpp ~47m, ROCm hipblas-llama-cpp ~2h, intel sycl-f32 ~1h49m) stay on free-tier ubuntu-latest. Verified locally: all six edited workflow YAMLs parse cleanly. Real verification has to come from the next tag release run. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <[email protected]> * ci: extract keepalive anchor + cleanup into .github/scripts/ The two inline shell blocks from the previous commit are long enough to hurt readability of the workflow YAML and benefit from their own files with self-contained docs. Move them to .github/scripts/: anchor-digest-in-cache.sh backend_build.yml's keepalive anchor cleanup-keepalive-tags.sh backend_merge.yml's best-effort cleanup Workflow steps reduce to a single `run:` invocation each, with all the parameter plumbing handled by env vars on the step. backend_merge.yml also gains a sparse `actions/checkout@v6` step (sparse to .github/scripts only) so the cleanup script is available on the runner — backend_build already checks out for the docker build. Net workflow diff: -36 lines across the two files. Script logic and behavior are byte-identical to the inline version. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <[email protected]> --------- Signed-off-by: Ettore Di Giacinto <[email protected]> Co-authored-by: Ettore Di Giacinto <[email protected]>

…ec-decoding options (#9765) * chore(llama.cpp): bump to 1ec7ba0c14f33f17e980daeeda5f35b225d41994 Picks up the upstream `spec : parallel drafting support` change (ggml-org/llama.cpp#22838) which reshapes the speculative-decoding API and `server_context_impl`. Adapt the grpc-server wrapper accordingly: * `common_params_speculative::type` (single enum) became `types` (`std::vector<common_speculative_type>`). Update both the "default to draft when a draft model is set" branch and the `spec_type`/`speculative_type` option parser. The parser now also tolerates comma-separated lists, mirroring the upstream `common_speculative_types_from_names` semantics. * `common_params_speculative_draft::n_ctx` is gone (draft now shares the target context size). Keep the `draft_ctx_size` option name for backward compatibility and ignore the value rather than failing. * `server_context_impl::model` was renamed to `model_tgt`; update the two reranker / model-metadata call sites. Replaces #9763. Builds cleanly under the linux/amd64 cpu-llama-cpp target locally. Signed-off-by: Ettore Di Giacinto <[email protected]> * feat(llama-cpp): expose new speculative-decoding option keys Upstream `spec : parallel drafting support` (ggml-org/llama.cpp#22838) adds the `ngram_mod`, `ngram_map_k`, and `ngram_map_k4v` speculative families and beefs up the draft-model knobs. The previous bump only adapted the API; this exposes the new fields through the grpc-server options dictionary so model configs can drive them. New `options:` keys (all under `backend: llama-cpp`): ngram_mod (`ngram_mod` type): spec_ngram_mod_n_min / spec_ngram_mod_n_max / spec_ngram_mod_n_match ngram_map_k (`ngram_map_k` type): spec_ngram_map_k_size_n / spec_ngram_map_k_size_m / spec_ngram_map_k_min_hits ngram_map_k4v (`ngram_map_k4v` type): spec_ngram_map_k4v_size_n / spec_ngram_map_k4v_size_m / spec_ngram_map_k4v_min_hits ngram lookup caches (`ngram_cache` type): spec_lookup_cache_static / lookup_cache_static spec_lookup_cache_dynamic / lookup_cache_dynamic Draft-model tuning (active when `spec_type` is `draft`): draft_cache_type_k / spec_draft_cache_type_k draft_cache_type_v / spec_draft_cache_type_v draft_threads / spec_draft_threads draft_threads_batch / spec_draft_threads_batch draft_cpu_moe / spec_draft_cpu_moe (bool flag) draft_n_cpu_moe / spec_draft_n_cpu_moe (first N MoE layers on CPU) draft_override_tensor / spec_draft_override_tensor (comma-separated <tensor regex>=<buffer type>; re-implements upstream's static parse_tensor_buffer_overrides since it isn't exported) `spec_type` already accepted comma-separated lists after the previous commit, matching upstream's `common_speculative_types_from_names`. Docs: refresh `docs/content/advanced/model-configuration.md` with per-family tables and a note about multi-type chaining. Builds locally with `make docker-build-llama-cpp` (linux/amd64 cpu-llama-cpp AVX variant). Signed-off-by: Ettore Di Giacinto <[email protected]> * fix(turboquant): bridge new llama.cpp spec API to the legacy fork layout The previous commits in this series adapted backend/cpp/llama-cpp/grpc-server.cpp to the post-#22838 (parallel drafting) llama.cpp API. The turboquant build reuses the same grpc-server.cpp through backend/cpp/turboquant/Makefile, which copies it into turboquant-<flavor>-build/ and runs patch-grpc-server.sh on the copy. The fork branched before the API refactor, so it errors out on: * `ctx_server.impl->model_tgt` (fork still has `model`) * `params.speculative.{ngram_mod,ngram_map_k,ngram_map_k4v,ngram_cache}.*` (none of these sub-structs exist in the fork) * `params.speculative.draft.{cache_type_k/v, cpuparams[, _batch].n_threads, tensor_buft_overrides}` (fork uses the pre-#22397 flat layout) * `params.speculative.types` vector / `common_speculative_types_from_names` (fork has a scalar `type` and only the singular helper) Approach: 1. backend/cpp/llama-cpp/grpc-server.cpp: introduce a single feature switch `LOCALAI_LEGACY_LLAMA_CPP_SPEC`. When defined, the two `speculative.type[s]` discriminations (the "default to draft when a draft model is set" branch and the `spec_type` / `speculative_type` option parser) fall back to the singular scalar form, and the entire new-option block (ngram_mod / map_k / map_k4v / ngram_cache / draft.{cache_type_*, cpuparams*, tensor_buft_overrides}) is preprocessed out. The macro is *not* defined in the source tree — stock llama-cpp builds get the full new API. 2. backend/cpp/turboquant/patch-grpc-server.sh: two new patch steps applied to the per-flavor build copy at turboquant-<flavor>-build/grpc-server.cpp: - substitute `ctx_server.impl->model_tgt` -> `ctx_server.impl->model` - inject `#define LOCALAI_LEGACY_LLAMA_CPP_SPEC 1` before the first `#include`, so the guarded blocks above drop out for the fork build. Both patches are idempotent and follow the existing sed/awk pattern in this script (KV cache types, `get_media_marker`, flat speculative renames). Stock llama-cpp's `grpc-server.cpp` is never touched. Drop both legacy patches once the turboquant fork rebases past ggml-org/llama.cpp#22397 / #22838. Signed-off-by: Ettore Di Giacinto <[email protected]> * fix(turboquant): close draft_ctx_size brace inside legacy guard The previous turboquant fix wrapped the new option-handler blocks in `#ifndef LOCALAI_LEGACY_LLAMA_CPP_SPEC ... #endif` but placed the guard in the middle of an `else if` chain — the `} else if` openings of the new blocks were responsible for closing the previous block's brace. With the macro defined the new blocks vanish, draft_ctx_size's `{` loses its closer, the for-loop's `}` is consumed instead, and the file ends with a stray opening brace — clang reports it as `function-definition is not allowed here before '{'` on the next top-level `int main(...)` and `expected '}' at end of input`. Move the chain split inside the draft_ctx_size branch: } else if (... "draft_ctx_size") { // ... #ifdef LOCALAI_LEGACY_LLAMA_CPP_SPEC } // legacy: chain ends here #else } else if (... "spec_ngram_mod_n_min") { // modern: chain continues ... } else if (... "draft_override_tensor") { ... } // closes last branch #endif } // closes for-loop Brace count is now balanced under both preprocessor branches (verified with `tr -cd '{' | wc -c` against the patched and unpatched outputs). Local `make docker-build-turboquant` builds the linux/amd64 cpu-llama-cpp `turboquant-avx` variant cleanly. Signed-off-by: Ettore Di Giacinto <[email protected]> * fix(ci): forward AMDGPU_TARGETS into Dockerfile.turboquant builder-prebuilt Dockerfile.turboquant's `builder-prebuilt` stage was missing the `ARG AMDGPU_TARGETS` / `ENV AMDGPU_TARGETS=${AMDGPU_TARGETS}` pair that `builder-fromsource` already has (and that `Dockerfile.llama-cpp` mirrors across both stages). When CI uses the prebuilt base image (quay.io/go-skynet/ci-cache:base-grpc-*, the common path) the build-arg passed by the workflow never reaches the env inside the compile stage. backend/cpp/llama-cpp/Makefile:38 (introduced by #9626) errors out on hipblas builds when AMDGPU_TARGETS is empty, and the turboquant Makefile reuses backend/cpp/llama-cpp via a sibling build dir, so the same check fires from turboquant-fallback under BUILD_TYPE=hipblas: Makefile:38: *** AMDGPU_TARGETS is empty — set it to a comma-separated list of gfx targets e.g. gfx1100,gfx1101. Stop. make: *** [Makefile:66: turboquant-fallback] Error 2 The bug is latent on master because the docker layer cache stays warm across builds — the compile step rarely re-runs from scratch. The llama.cpp bump in this PR invalidates the cache, so the missing env var becomes load-bearing and the hipblas turboquant CI job fails. Mirror the existing pattern from Dockerfile.llama-cpp. Signed-off-by: Ettore Di Giacinto <[email protected]> --------- Signed-off-by: Ettore Di Giacinto <[email protected]> Co-authored-by: Ettore Di Giacinto <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Comparing changes

Open a pull request

Commits on May 12, 2026

This comparison is taking too long to generate.

Uh oh!