Comparing changes

base repository: mudler/LocalAI
base: v4.2.3
head repository: mudler/LocalAI
compare: v4.2.4
  • 14 commits
  • 74 files changed
  • 7 contributors

Commits on May 13, 2026

  1. chore: ⬆️ Update ggml-org/llama.cpp to `a9883db8ee021cf16783016a60996d41820b5195` (#9796)
    
    ⬆️ Update ggml-org/llama.cpp
    
    Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    Co-authored-by: mudler <[email protected]>
    localai-bot and mudler authored May 13, 2026
    Commit a645c1f
  2. feat: also parse VRAM budget/usage from vulkaninfo (#9800)

    Signed-off-by: Andreas Egli <[email protected]>
    eglia authored May 13, 2026
    Commit a2940e5
  3. feat(realtime): Add Liquid Audio s2s model and assistant mode on talk page (#9801)
    
    * feat(liquid-audio): add LFM2.5-Audio any-to-any backend + realtime_audio usecase
    
    Wires LiquidAI's LFM2.5-Audio-1.5B as a self-contained Realtime API model:
    single engine handles VAD, transcription, LLM, and TTS in one bidirectional
    stream — drop-in alternative to a VAD+STT+LLM+TTS pipeline.
    
    Backend
    - backend/python/liquid-audio/ — new Python gRPC backend wrapping the
      `liquid-audio` package. Modes: chat / asr / tts / s2s, voice presets,
      Load/Predict/PredictStream/AudioTranscription/TTS/VAD/AudioToAudioStream/
      Free and StartFineTune/FineTuneProgress/StopFineTune. Runtime monkey-patch
      on `liquid_audio.utils.snapshot_download` so absolute local paths from
      LocalAI's gallery resolve without a HF round-trip. soundfile in place of
      torchaudio.load/save (torchcodec drags NVIDIA NPP we don't bundle).
    - backend/backend.proto + pkg/grpc/{backend,client,server,base,embed,
      interface}.go — new AudioToAudioStream RPC mirroring AudioTransformStream
      (config/frame/control oneof in; typed event+pcm+meta out).
    - core/services/nodes/{health_mock,inflight}_test.go — add stubs for the
      new RPC to the test fakes.
    
    Config + capabilities
    - core/config/backend_capabilities.go — UsecaseRealtimeAudio,
      MethodAudioToAudioStream, UsecaseInfoMap entry, liquid-audio
      BackendCapability row.
    - core/config/model_config.go — FLAG_REALTIME_AUDIO bitmask, ModalityGroups
      membership in both speech-input and audio-output groups so a lone flag
      still reads as multimodal, GetAllModelConfigUsecases entry, GuessUsecases
      branch.
    
    Realtime endpoint
    - core/http/endpoints/openai/realtime.go — extract prepareRealtimeConfig()
      so the gate is unit-testable; accept realtime_audio models and self-fill
      empty pipeline slots with the model's own name (user-pinned slots win).
    - core/http/endpoints/openai/realtime_gate_test.go — six specs covering nil
      cfg, empty pipeline, legacy pipeline, self-contained realtime_audio,
      user-pinned VAD slot, and partial legacy pipeline.
    
    UI + endpoints
    - core/http/routes/ui.go — /api/pipeline-models accepts either a legacy
      VAD+STT+LLM+TTS pipeline or a realtime_audio model; surfaces a
      self_contained flag so the Talk page can collapse the four cards.
    - core/http/routes/ui_api.go — realtime_audio in usecaseFilters.
    - core/http/routes/ui_pipeline_models_test.go — covers both code paths.
    - core/http/react-ui/src/pages/Talk.jsx — self-contained badge instead of
      the four-slot grid; rename Edit Pipeline → Edit Model Config; less
      pipeline-specific wording.
    - core/http/react-ui/src/pages/Models.jsx + locales/en/models.json — new
      realtime_audio filter button + i18n.
    - core/http/react-ui/src/utils/capabilities.js — CAP_REALTIME_AUDIO.
    - core/http/react-ui/src/pages/FineTune.jsx — voice + validation-dataset
      fields, surfaced when backend === liquid-audio, plumbed via
      extra_options on submit/export/import.
    
    Gallery + importer
    - gallery/liquid-audio.yaml — config template with known_usecases:
      [realtime_audio, chat, tts, transcript, vad].
    - gallery/index.yaml — four model entries (realtime/chat/asr/tts) keyed by
      mode option. Fixed pre-existing `transcribe` typo on the asr entry
      (loader silently dropped the unknown string → entry never surfaced as a
      transcript model).
    - gallery/lfm.yaml — function block for the LFM2 Pythonic tool-call format
      `<|tool_call_start|>[name(k="v")]<|tool_call_end|>` matching
      common_chat_params_init_lfm2 in vendored llama.cpp.
    - core/gallery/importers/{liquid-audio,liquid-audio_test}.go — detector
      matches LFM2-Audio HF repos (excludes -gguf mirrors); mode/voice
      preferences plumbed through to options.
    - core/gallery/importers/importers.go — register LiquidAudioImporter
      before LlamaCPPImporter.
    - pkg/functions/parse_lfm2_test.go — seven specs for the response/argument
      regex pair on the LFM2 pythonic format.
    
    Build matrix
    - .github/backend-matrix.yml — seven liquid-audio targets (cuda12, cuda13,
      l4t-cuda-13, hipblas, intel, cpu amd64, cpu arm64). Jetpack r36 cuda-12
      is skipped (Ubuntu 22.04 / Python 3.10 incompatible with liquid-audio's
      3.12 floor).
    - backend/index.yaml — anchor + 13 image entries.
    - Makefile — .NOTPARALLEL, prepare-test-extra, test-extra,
      docker-build-liquid-audio.
    
    Docs
    - .agents/plans/liquid-audio-integration.md — phased plan; PR-D (real
      any-to-any wiring via AudioToAudioStream), PR-E (mid-audio tool-call
      detector), PR-G (GGUF entries once upstream llama.cpp PR #18641 lands)
      remain.
    - .agents/api-endpoints-and-auth.md — expand the capability-surface
      checklist with every place a new FLAG_* needs to be registered.
    
    Assisted-by: claude-code:claude-opus-4-7-1m [Claude Code]
    Signed-off-by: Richard Palethorpe <[email protected]>
    
    * feat(realtime): function calling + history cap for any-to-any models
    
    Three pieces, all on the realtime_audio path that just landed:
    
    1. liquid-audio backend (backend/python/liquid-audio/backend.py):
       - _build_chat_state grows a `tools_prelude` arg.
       - new _render_tools_prelude parses request.Tools (the OpenAI Chat
         Completions function array realtime.go already serialises) and
         emits an LFM2 `<|tool_list_start|>…<|tool_list_end|>` system turn
         ahead of the user history. Mirrors gallery/lfm.yaml's `function:`
         template so the model sees the same prompt shape whether served
         via llama-cpp or here. Without this the backend silently dropped
         tools — function calling was wired end-to-end on the Go side but
         the model never saw a tool list.
    
    2. Realtime history cap (core/http/endpoints/openai/realtime.go):
       - Session grows MaxHistoryItems int; default picked by new
         defaultMaxHistoryItems(cfg) — 6 for realtime_audio models (LFM2.5
         1.5B degrades quickly past a handful of turns), 0/unlimited for
         legacy pipelines composing larger LLMs.
       - triggerResponse runs conv.Items through trimRealtimeItems before
         building conversationHistory. Helper walks the cut left if it
         would orphan a function_call_output, so tool result + call pairs
         stay intact.
       - realtime_gate_test.go: specs for defaultMaxHistoryItems and
         trimRealtimeItems (zero cap, under cap, over cap, tool-call pair
         preservation).
    
    3. Talk page (core/http/react-ui/src/pages/Talk.jsx):
       - Reuses the chat page's MCP plumbing — useMCPClient hook,
         ClientMCPDropdown component, same auto-connect/disconnect effect
         pattern. No bespoke tool registry, no new REST endpoints; tools
         come from whichever MCP servers the user toggles on, exactly as
         on the chat page.
       - sendSessionUpdate now passes session.tools=getToolsForLLM(); the
         update re-fires when the active server set changes mid-session.
       - New response.function_call_arguments.done handler executes via
         the hook's executeTool (which round-trips through the MCP client
         SDK), then replies with conversation.item.create
         {type:function_call_output} + response.create so the model
         completes its turn with the tool output. Mirrors chat's
         client-side agentic loop, translated to the realtime wire shape.
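
The history cap in item 2 can be pictured with a minimal Go sketch. The `Item` type and the name `trimItems` are hypothetical stand-ins, since the commit message does not show the real `trimRealtimeItems` signature; only the behaviour (zero cap means unlimited, the cut moves left rather than orphaning a `function_call_output`) is taken from the text above.

```go
package main

// Item is a hypothetical stand-in for a realtime conversation item; the real
// type used in realtime.go is not shown in the commit message.
type Item struct {
	Type string // e.g. "message", "function_call", "function_call_output"
	ID   string
}

// trimItems keeps at most maxItems trailing items. A cap of 0 (or less) means
// unlimited. If the cut would start on a function_call_output, the cut moves
// left so the preceding function_call stays together with its output.
func trimItems(items []Item, maxItems int) []Item {
	if maxItems <= 0 || len(items) <= maxItems {
		return items
	}
	cut := len(items) - maxItems
	for cut > 0 && items[cut].Type == "function_call_output" {
		cut--
	}
	return items[cut:]
}
```

Note that under this sketch the result can exceed the cap by the size of the preserved tool-call pair; whether the real helper behaves the same way at that boundary is an assumption.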
    
    UI changes require a LocalAI image rebuild (Dockerfile:308-313 bakes
    react-ui/dist into the runtime image). Backend.py changes can be
    swapped live in /backends/<id>/backend.py + /backend/shutdown.
    
    Assisted-by: claude-code:claude-opus-4-7-1m [Claude Code]
    Signed-off-by: Richard Palethorpe <[email protected]>
    
    * feat(realtime): LocalAI Assistant ("Manage Mode") for the Talk page
    
    Mirrors the chat-page metadata.localai_assistant flow so users can ask the
    realtime model what's loaded / installed / configured. Tools are run
    server-side via the same in-process MCP holder that powers the chat
    modality — no transport switch, no proxy, no new wire protocol.
    
    Wire:
    - core/http/endpoints/openai/realtime.go:
      - RealtimeSessionOptions{LocalAIAssistant,IsAdmin}; isCurrentUserAdmin
        helper mirrors chat.go's requireAssistantAccess (no-op when auth
        disabled, else requires auth.RoleAdmin).
      - Session grows AssistantExecutor mcpTools.ToolExecutor.
      - runRealtimeSession, when opts.LocalAIAssistant is set: gate on admin,
        fail closed if DisableLocalAIAssistant or the holder has no tools,
        DiscoverTools and inject into session.Tools, prepend
        holder.SystemPrompt() to instructions.
      - Tool-call dispatch loop: when AssistantExecutor.IsTool(name), run
        ExecuteTool inproc, append a FunctionCallOutput to conv.Items, skip
        the function_call_arguments client emit (the client can't execute
        these — it doesn't know about them). After the loop, if any
        assistant tool ran, trigger another response so the model speaks the
        result. Mirrors chat's agentic loop, driven server-side rather than
        via client round-trip.
    
    - core/http/endpoints/openai/realtime_webrtc.go: RealtimeCallRequest
      gains `localai_assistant` (JSON omitempty). Handshake calls
      isCurrentUserAdmin and builds RealtimeSessionOptions.
    
    - core/http/react-ui/src/pages/Talk.jsx: admin-only "Manage Mode"
      checkbox under the Tools dropdown; passes localai_assistant: true to
      realtimeApi.call's body, captured in the connect callback's deps.
    
    Mirroring chat's pattern means the in-process MCP tools surface "just
    works" for the Talk page without exposing a Streamable-HTTP MCP endpoint
    (which was the alternative). Clients with their own MCP servers can
    still use the existing ClientMCPDropdown path in parallel; the realtime
    handler distinguishes them by AssistantExecutor.IsTool() at dispatch
    time.
    
    Assisted-by: claude-code:claude-opus-4-7-1m [Claude Code]
    Signed-off-by: Richard Palethorpe <[email protected]>
    
    * feat(realtime): render Manage Mode tool calls in the Talk transcript
    
    Previously the realtime endpoint only emitted response.output_item.added
    for the FunctionCall item, and Talk.jsx's switch ignored the event — so
    server-side tool runs were invisible in the UI. The model would speak
    the result but the user had no way to see what tool was actually
    called.
    
    realtime.go: after executing an assistant tool inproc, emit a second
    output_item.added/.done pair for the FunctionCallOutput item. Mirrors
    the way the chat page displays tool_call + tool_result blocks.
    
    Talk.jsx: handle both response.output_item.added and .done. Render
    FunctionCall (with arguments) and FunctionCallOutput (pretty-printed
    JSON when possible) as two transcript entries — `tool_call` with the
    wrench icon, `tool_result` with the clipboard icon, both in mono-space
    secondary-colour. Resets streamingRef after the result so the next
    assistant text delta starts a fresh transcript entry instead of
    appending to the previous turn.
    
    Assisted-by: claude-code:claude-opus-4-7-1m [Claude Code]
    Signed-off-by: Richard Palethorpe <[email protected]>
    
    * refactor(realtime): bound the Manage Mode tool-loop + preserve assistant tools
    
    Fallout from a review pass on the Manage Mode patches:
    
    - Bound the server-side agentic loop. triggerResponse used to recurse on
      executedAssistantTool with no cap — a model that kept calling tools
      would blow the goroutine stack. New maxAssistantToolTurns = 10 (mirrors
      useChat.js's maxToolTurns). Public triggerResponse is now a thin shim
      over triggerResponseAtTurn(toolTurn int); recursion increments the
      counter and stops at the cap with an xlog.Warn.
    
    - Preserve Manage Mode tools across client session.update. The handler
      used to blindly overwrite session.Tools, so toggling a client MCP
      server mid-session silently wiped the in-process admin tools. Session
      now caches the original AssistantTools slice at session creation and
      the session.update handler merges them back in (client names win on
      collision — the client is explicit).
    
    - strconv.ParseBool for the localai_assistant query param instead of
      hand-rolled "1" || "true". Mirrors LocalAIAssistantFromMetadata.
    
    - Talk.jsx: render both tool_call and tool_result on
      response.output_item.done instead of splitting them across .added and
      .done. The server's event pairing (added → done) stays correct; the
      UI just doesn't need to inspect both phases of the same item. One
      switch case instead of two, no behavioural change.
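
As a rough illustration of the first two bullets, the sketch below shows a bounded tool loop and a merge where client tool names win on collision. The `Tool` type, `mergeTools`, and `runTurns` are hypothetical names introduced here; only the cap of 10 and the collision rule come from the commit message, and the exact off-by-one behaviour at the cap is an assumption.

```go
package main

// Tool is a hypothetical, minimal tool descriptor keyed by name.
type Tool struct {
	Name   string
	Source string // "client" or "assistant"; for illustration only
}

// mergeTools returns the client tools followed by any assistant tools whose
// names the client did not redefine, so client names win on collision.
func mergeTools(client, assistant []Tool) []Tool {
	seen := make(map[string]bool, len(client))
	merged := make([]Tool, 0, len(client)+len(assistant))
	for _, t := range client {
		seen[t.Name] = true
		merged = append(merged, t)
	}
	for _, t := range assistant {
		if !seen[t.Name] {
			merged = append(merged, t)
		}
	}
	return merged
}

// maxToolTurns uses the value quoted in the commit message.
const maxToolTurns = 10

// runTurns models the bounded loop iteratively: respond is called once per
// turn and reports whether a tool ran. The loop ends when no tool ran or when
// the cap is reached, and the number of tool turns taken is returned.
func runTurns(respond func(turn int) bool) int {
	turn := 0
	for turn < maxToolTurns && respond(turn) {
		turn++
	}
	return turn
}
```

An explicit loop is used here instead of recursion purely to keep the sketch short; the commit describes a recursive `triggerResponseAtTurn` with a counter.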
    
    Out of scope (noted for follow-ups): extract a shared assistant-tools
    helper between chat.go and realtime.go (duplication is small enough
    that two parallel implementations stay readable for now), and an i18n
    key for the Manage Mode helper text (Talk.jsx doesn't use i18n
    anywhere else yet).
    
    Assisted-by: claude-code:claude-opus-4-7-1m [Claude Code]
    Signed-off-by: Richard Palethorpe <[email protected]>
    
    * ci(test-extra): wire liquid-audio backend smoke test
    
    The backend ships test.py + a `make test` target and is listed in
    backend-matrix.yml, so scripts/changed-backends.js already writes a
    `liquid-audio=true|false` output when files under backend/python/liquid-audio/
    change. The workflow just wasn't reading it.
    
    - Expose the `liquid-audio` output on the detect-changes job
    - Add a tests-liquid-audio job that runs `make` + `make test` in
      backend/python/liquid-audio, gated on the per-backend detect flag
    
    The smoke covers Health() and LoadModel(mode:finetune); fine-tune mode
    short-circuits before any HuggingFace download (backend.py:192), so the
    job needs neither weights nor a GPU. The full-inference path remains
    gated on LIQUID_AUDIO_MODEL_ID, which CI doesn't set.
    
    The four new Go test files (core/gallery/importers/liquid-audio_test.go,
    core/http/endpoints/openai/realtime_gate_test.go,
    core/http/routes/ui_pipeline_models_test.go, pkg/functions/parse_lfm2_test.go)
    are already picked up by the existing test.yml workflow via `make test` →
    `ginkgo -r ./pkg/... ./core/...`; their packages all carry RunSpecs entries.
    
    Assisted-by: Claude:claude-opus-4-7
    Signed-off-by: Richard Palethorpe <[email protected]>
    
    ---------
    
    Signed-off-by: Richard Palethorpe <[email protected]>
    richiejp authored May 13, 2026
    Commit 0245b33
  4. fix(distributed): cascade-clean stale node_models rows + filter routing by healthy status (#9754)
    
    * fix(distributed): cascade-clean stale node_models on drain and filter routing by healthy status
    
    Stale node_models rows (state="loaded") were surviving past the healthy
    state of their owning node, causing /embeddings (and other inference
    paths) to dispatch to a backend whose process was gone or drained. The
    downstream symptom in a live cluster was pgvector rejecting inserts
    with "vector cannot have more than 16000 dimensions (SQLSTATE 54000)"
    because the misbehaving backend silently returned a malformed
    (oversized) tensor; the Models page showed the model as "running"
    without an associated node, like a stale entry, even though the node
    was no longer visible in the Nodes view.
    
    Two changes here, plus a third in a follow-up commit:
    
    - MarkDraining now cascade-deletes node_models rows for the affected
      node, mirroring MarkOffline. Drains are explicit operator actions —
      the box has been intentionally taken out of rotation — so clearing
      the rows stops the Models UI from misreporting and prevents the
      routing layer from picking those rows if scheduling logic is ever
      relaxed. In-flight requests already hold their gRPC client through
      Route() and finish normally; the only observable effect is a
      non-fatal IncrementInFlight warning, acceptable for a drain.
    
      MarkUnhealthy is deliberately left status-only: it fires from
      managers_distributed / reconciler on a single nats.ErrNoResponders
      with no retry, so a transient NATS hiccup must not nuke every loaded
      model and force a full reload on recovery.
    
    - FindAndLockNodeWithModel's inner JOIN now filters on
      backend_nodes.status = healthy in addition to node_models.state =
      loaded. The previous version relied on the second node-fetch step to
      reject non-healthy nodes, but a concurrent reader could still pick
      the same stale row in the same window. Belt-and-braces.
    
    - DistributedConfig.PerModelHealthCheck renamed to
      DisablePerModelHealthCheck and inverted at the call site so
      per-model gRPC probing is on by default. The probe (now made
      consecutive-miss aware in a follow-up commit) independently health-
      checks each model's gRPC address and removes stale node_models rows
      when the backend has crashed even though the worker's node-level
      heartbeat is still arriving.
    
      Migration: the field had no CLI flag, env var binding, or YAML key
      in tree (only the bare struct field), so there is no user-facing
      migration. Anything constructing DistributedConfig in code needs to
      drop the assignment (default now does the right thing) or invert it.
    
    Assisted-by: Claude:claude-opus-4-7 go-vet go-test golangci-lint
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    
    * fix(distributed): require consecutive misses before per-model probe removes a row
    
    The per-model gRPC probe used to remove a node_models row on a single
    failed health check. With the per-model probe now on by default, that
    made any 5-second gRPC blip (network jitter, a long-running request
    hogging the worker's gRPC server thread, brief GC pause) trigger a
    full reload of the affected model — too eager for production.
    
    Require perModelMissThreshold (3) consecutive failed probes before
    removal. At the default 15s tick a model must be unreachable for ~45s
    before reap; a single successful probe in between resets the streak.
    Per-(node, model, replica) state tracked under a mutex on the monitor.
    
    If the removal call itself fails, the miss counter is left in place
    so the next tick retries rather than starting the streak over.
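
The consecutive-miss rule can be sketched as follows. This is an illustration under stated assumptions, not the monitor's actual code: `probeKey`, `missTracker`, and `observe` are hypothetical names, and the removal callback stands in for whatever deletes the node_models row. The threshold of 3, the reset on success, and keeping the counter when removal fails are taken from the description above.

```go
package main

import "sync"

// missThreshold uses the perModelMissThreshold value from the commit message.
const missThreshold = 3

// probeKey identifies one model replica on one node (field names are assumed).
type probeKey struct {
	Node    string
	Model   string
	Replica int
}

// missTracker counts consecutive failed probes per key under a mutex.
type missTracker struct {
	mu     sync.Mutex
	misses map[probeKey]int
}

func newMissTracker() *missTracker {
	return &missTracker{misses: make(map[probeKey]int)}
}

// observe records one probe result and reports whether the row was removed.
// A healthy probe resets the streak. Once the streak reaches the threshold,
// remove is called; if it fails, the counter is kept so the next tick retries.
// For brevity the lock is held across remove, which real code may avoid.
func (t *missTracker) observe(k probeKey, healthy bool, remove func(probeKey) error) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if healthy {
		delete(t.misses, k)
		return false
	}
	t.misses[k]++
	if t.misses[k] < missThreshold {
		return false
	}
	if err := remove(k); err != nil {
		return false
	}
	delete(t.misses, k)
	return true
}
```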
    
    Tests:
    - removes stale model via per-model health check after consecutive
      failures (replaces the single-shot expectation)
    - preserves model row when an intermittent failure is followed by a
      success (covers the reset-on-success path and verifies the counter
      reset by failing twice more without crossing threshold)
    - newTestHealthMonitor initializes the misses map so direct-construct
      test helpers don't nil-map-panic in the probe path
    
    Assisted-by: Claude:claude-opus-4-7 go-vet go-test golangci-lint
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    
    ---------
    
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    Co-authored-by: Ettore Di Giacinto <[email protected]>
    localai-bot and mudler authored May 13, 2026
    Commit b4fdb41
  5. chore: ⬆️ Update TheTom/llama-cpp-turboquant to `5aeb2fdbe26cd4c534c6fa15de73cb5749bd0403` (#9740)
    
    ⬆️ Update TheTom/llama-cpp-turboquant
    
    Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    Co-authored-by: mudler <[email protected]>
    localai-bot and mudler authored May 13, 2026
    Commit ddbbdf4
  6. fix(http): honor X-Forwarded-Prefix when proxy strips the prefix (#9614)

    * fix(http): honor X-Forwarded-Prefix when proxy strips the prefix
    
    Closes #9145.
    
    Two related issues kept the React UI from loading when a reverse proxy
    rewrites a sub-path with prefix-stripping (e.g. Caddy `handle_path`):
    
    1. `BaseURL` only computed a prefix from the path StripPathPrefix had
       removed, so when the proxy strips the prefix before forwarding, the
       request arrives without it and the base URL was returned without a
       prefix. Extract a `BasePathPrefix` helper and add an
       `X-Forwarded-Prefix` header fallback so the prefix is recovered.
    2. `<base href>` only changes how relative URLs resolve; the build
       emits path-absolute references like `/assets/...` and
       `/favicon.svg`, which still resolve against the origin and bypass
       the proxy prefix. Rewrite those references in the served
       `index.html` so the browser requests them through the proxy.
    
    Adds unit coverage for `BaseURL` with a pre-stripped path and an
    end-to-end test for the proxy-stripped scenario.
    
    Assisted-by: Claude:claude-opus-4-7
    
    * fix(http): gate X-Forwarded-Prefix through SafeForwardedPrefix in BasePathPrefix
    
    BasePathPrefix consumed X-Forwarded-Prefix directly, so a value the
    codebase elsewhere rejects (e.g. "//evil.com") slipped through and was
    interpolated into the SPA index.html — both into the path-absolute asset
    URL rewrite in serveIndex (turning "/assets/..." into "//evil.com/assets/...",
    a protocol-relative URL that loads JS from a foreign origin) and into
    <base href>. Route the header through the existing SafeForwardedPrefix
    validator that StripPathPrefix and prefixRedirect already use, and
    HTML-escape the prefix before injecting it into the asset rewrite as
    defense in depth against attribute breakout.
    
    Tests cover //evil.com, backslashes, control chars, CR/LF and a missing
    leading slash; the integration test asserts an unsafe prefix can't poison
    asset URLs.
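
The validation and rewrite described in these two commits might look roughly like the sketch below. `safePrefix` and `rewriteAssets` are hypothetical functions written for illustration; they are not LocalAI's `SafeForwardedPrefix` or `serveIndex`, whose exact rules are not shown here. The rejected shapes (protocol-relative `//`, backslashes, control characters, missing leading slash) follow the test list in the commit message.

```go
package main

import (
	"html"
	"strings"
)

// safePrefix is a hypothetical validator. It accepts only path-absolute
// prefixes and rejects the unsafe shapes listed in the commit message.
func safePrefix(raw string) (string, bool) {
	if !strings.HasPrefix(raw, "/") || strings.HasPrefix(raw, "//") {
		return "", false // missing leading slash, or protocol-relative URL
	}
	for _, r := range raw {
		if r == '\\' || r < 0x20 || r == 0x7f {
			return "", false // backslashes and control characters, incl. CR/LF
		}
	}
	return strings.TrimRight(raw, "/"), true
}

// rewriteAssets prefixes path-absolute /assets/ references in an index.html
// body. The prefix is HTML-escaped before interpolation as defense in depth.
// An unsafe or empty prefix leaves the page untouched.
func rewriteAssets(indexHTML, rawPrefix string) string {
	prefix, ok := safePrefix(rawPrefix)
	if !ok || prefix == "" {
		return indexHTML
	}
	p := html.EscapeString(prefix)
	return strings.ReplaceAll(indexHTML, `="/assets/`, `="`+p+`/assets/`)
}
```

With this sketch, a header value of `/localai` turns `src="/assets/app.js"` into `src="/localai/assets/app.js"`, while `//evil.com` is rejected and the page is served unchanged.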
    
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    Assisted-by: claude-code:claude-opus-4-7-1m [Read] [Edit] [Bash]
    
    ---------
    
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    Co-authored-by: Ettore Di Giacinto <[email protected]>
    Dennisadira and mudler authored May 13, 2026
    Commit c2fe0a6
  7. docs: ⬆️ update docs version mudler/LocalAI (#9805)

    ⬆️ Update docs version mudler/LocalAI
    
    Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    Co-authored-by: mudler <[email protected]>
    localai-bot and mudler authored May 13, 2026
    Commit 5a42dbf
  8. ci(image-merge): apply the keepalive+ci-cache source fix to image_merge

    Mirror of 8521af1 (which fixed backend_merge.yml) for image_merge.yml.
    Today's master-push run 25823024353 failed the gpu-vulkan-image-merge job
    with the exact same error pattern the backend merge had on v4.2.2:
    
      ERROR: quay.io/go-skynet/local-ai@sha256:68b22611...: not found
    
    Same root cause: image_build.yml pushes the per-arch manifest to
    quay.io/go-skynet/local-ai with push-by-digest=true (no tag), then the
    merge runs minutes-to-hours later, by which time quay's per-repo manifest
    GC has reaped the untagged digest from local-ai. The blob still lives in
    quay's storage but local-ai@<digest> no longer resolves.
    
    Three matching edits:
    
    1. image_build.yml: anchor each per-arch digest into ci-cache immediately
       after the push, reusing .github/scripts/anchor-digest-in-cache.sh with
       SOURCE_IMAGE=quay.io/go-skynet/local-ai and TAG_SUFFIX defaulting to
       "-core" for the core image (matches the artifact-name convention).
    2. image_merge.yml: change the quay merge source from local-ai@<digest>
       to ci-cache@<digest>. Same correctness argument as backend_merge.yml —
       the manifest content is alive in ci-cache; buildx imagetools create
       republishes it into local-ai and writes the user-facing manifest list
       pointing at it. End state in local-ai is self-contained.
    3. image_merge.yml: add a sparse `actions/checkout@v6` (only
       .github/scripts) so cleanup-keepalive-tags.sh is available, plus the
       cleanup step itself with TAG_SUFFIX matching the anchor's "-core"
       placeholder.
    
    v4.2.3's image.yml run completed successfully (~50 min between push and
    merge — beat quay's GC). This commit closes the race for future releases
    and master pushes regardless of run length.
    
    Assisted-by: Claude:claude-opus-4-7
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    mudler committed May 13, 2026
    Commit 6bfe7f8
  9. fix(agentpool): close truncate-then-read race in agent_jobs.json persistence (#9811)
    
    * fix(agentpool): close truncate-then-read race in agent_jobs.json persistence
    
    Three call sites wrote and read agent_jobs.json (and agent_tasks.json)
    through three independent mutexes:
    
      - AgentJobService.ExecuteJob spawns go saveJobs(job) -> fileJobPersister
        holding p.mu
      - AgentJobService.SaveJobsToFile holding service.fileMutex
      - AgentJobService.LoadJobsFromFile on a separate service instance holding
        a different service.fileMutex
    
    Nothing serialized those mutexes, and both writers used os.WriteFile, which
    opens O_TRUNC. A reader landing between the truncate and the write saw a
    zero-byte file and surfaced as `unexpected end of JSON input` at offset 0.
    The macOS tests-apple job started hitting this consistently once the path
    filter was removed from .github/workflows/test.yml and the file-mode race
    test ran on every push (run 25823124797 was the first observed failure).
    
    Two changes close the window:
    
    1. fileJobPersister.saveTasksToFile / saveJobsToFile now write to a
       same-directory temp file and os.Rename to the final path. rename(2) is
       atomic on POSIX, so concurrent readers see either the prior contents or
       the new contents and never a zero-byte window. The helper Syncs before
       close so a crash mid-write leaves either the old file intact or the temp
       behind (cleaned up on next save).
    
    2. AgentJobService.{Load,Save}{Tasks,Jobs}{FromFile,ToFile} are collapsed
       to thin wrappers around fileJobPersister, removing the duplicate write
       path and the redundant service.fileMutex / service.tasksFile /
       service.jobsFile fields. Within a single service all task/job I/O now
       serializes on the persister's mutex; the atomic rename handles the
       cross-instance case the tests exercise.
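
The temp-file-plus-rename pattern from item 1 can be sketched in a few lines of standard-library Go. `writeFileAtomic` and `roundTrip` are hypothetical names, not the persister's actual helpers. One deliberate difference from the description: this sketch removes the temp file on error, whereas the commit says a leftover temp file is cleaned up on the next save.

```go
package main

import (
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temp file in the same directory as path,
// syncs it, and renames it over path. Readers see either the old contents or
// the new contents, never a truncated file.
func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // fails harmlessly after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Chmod(tmpName, perm); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

// roundTrip overwrites a file twice in a scratch directory and returns the
// final contents; it exists only to exercise writeFileAtomic.
func roundTrip() (string, error) {
	dir, err := os.MkdirTemp("", "atomic-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "agent_jobs.json")
	for _, body := range []string{`[]`, `[{"id":"job-1"}]`} {
		if err := writeFileAtomic(path, []byte(body), 0o644); err != nil {
			return "", err
		}
	}
	out, err := os.ReadFile(path)
	return string(out), err
}
```

The temp file is created in the destination directory because a rename is only atomic within a single filesystem.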
    
    Adds a regression test that hammers SaveJobsToFile and LoadJobsFromFile
    concurrently for 500ms across two service instances on the same paths.
    On master this reproduces `unexpected end of JSON input` on Linux within
    ~500ms; with the fix the suite ran -until-it-fails for 30s (54 attempts,
    all green).
    
    Assisted-by: Claude:claude-opus-4-7 [Claude Code]
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    
    * refactor(agentpool): route service flush/load through JobPersister interface
    
    The first cut of the race fix made AgentJobService.{Save,Load}{Tasks,Jobs}*
    type-assert s.persister to *fileJobPersister so they could reach the
    unexported saveTasksToFile / saveJobsToFile helpers. That defeats the
    JobPersister interface: the service is back to reasoning about a concrete
    implementation instead of an abstraction.
    
    Promote the bulk-flush operations to the interface as FlushTasks / FlushJobs:
    
      - fileJobPersister.FlushTasks/FlushJobs call the existing private helpers
        (atomic temp+rename writes from the prior commit).
      - dbJobPersister.FlushTasks/FlushJobs are no-ops because SaveTask/SaveJob
        are already write-through to the database.
    
    The service's four file-named methods now talk only to the interface:
    LoadTasks/LoadJobs read through s.persister.LoadTasks/LoadJobs, and the
    Save side calls FlushTasks/FlushJobs. The "FromFile"/"ToFile" suffixes
    stay for backward compat with user_services.go and the existing tests,
    but they no longer claim a file-only contract.
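
A reduced sketch of the interface shape described above, assuming simplified signatures: `Job`, `memFilePersister`, `writeThroughPersister`, and `saveJobsToFile` are hypothetical stand-ins, and the buffering behaviour of the file-backed stand-in is an assumption made only to make the flush visible. The point is that the service-side wrapper calls `FlushJobs` on the interface and never type-asserts to a concrete persister.

```go
package main

// Job is a hypothetical record; the real types are not shown in the commit.
type Job struct{ ID string }

// JobPersister is a reduced sketch of the interface; signatures are assumed.
type JobPersister interface {
	SaveJob(j Job) error
	LoadJobs() ([]Job, error)
	FlushJobs() error
}

// memFilePersister stands in for the file-backed persister. Saves are buffered
// and FlushJobs performs the bulk write (here, a copy into disk).
type memFilePersister struct{ pending, disk []Job }

func (p *memFilePersister) SaveJob(j Job) error {
	p.pending = append(p.pending, j)
	return nil
}

func (p *memFilePersister) LoadJobs() ([]Job, error) { return p.disk, nil }

func (p *memFilePersister) FlushJobs() error {
	p.disk = append([]Job(nil), p.pending...)
	return nil
}

// writeThroughPersister stands in for the database persister: SaveJob already
// persists, so FlushJobs is a no-op.
type writeThroughPersister struct{ rows []Job }

func (p *writeThroughPersister) SaveJob(j Job) error {
	p.rows = append(p.rows, j)
	return nil
}

func (p *writeThroughPersister) LoadJobs() ([]Job, error) { return p.rows, nil }

func (p *writeThroughPersister) FlushJobs() error { return nil }

// saveJobsToFile mirrors the service wrapper: despite its name it only talks
// to the interface.
func saveJobsToFile(p JobPersister) error { return p.FlushJobs() }
```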
    
    Assisted-by: Claude:claude-opus-4-7 [Claude Code]
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    
    ---------
    
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    Co-authored-by: Ettore Di Giacinto <[email protected]>
    localai-bot and mudler authored May 13, 2026
    Commit ab01ed1
  10. chore: ⬆️ Update antirez/ds4 to `0cba357ca1bc0e7510421cc26888e420ea942123` (#9806)
    
    ⬆️ Update antirez/ds4
    
    Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    Co-authored-by: mudler <[email protected]>
    localai-bot and mudler authored May 13, 2026
    Commit 4430fae
  11. fix(middleware): parse OpenAI-spec tool_choice in /v1/chat/completions (#9559)
    
    * fix(middleware): parse OpenAI-spec tool_choice in /v1/chat/completions
    
    Follows up on #9526 (the 3-site setter fix) by addressing the remaining
    clause in #9508 — string mode and OpenAI-spec specific-function shape both
    silently failed in the /v1/chat/completions parsing path.
    
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    
    * fix(middleware): restore LF endings and cover tool_choice parsing with specs
    
    The previous commit on this branch saved core/http/middleware/request.go
    with CRLF line endings, ballooning the diff against master to 684 / 651
    for what is in reality a ~50-line parsing change. Restore LF (matches
    .editorconfig end_of_line = lf).
    
    Add 11 Ginkgo specs under "SetModelAndConfig tool_choice parsing
    (chat completions)" that parallel the existing MergeOpenResponsesConfig
    specs from #9509. They drive the full middleware chain (SetModelAndConfig
    + SetOpenAIRequest) and assert:
    
      * "required"  -> ShouldUseFunctions=true, no specific name
      * "none"      -> ShouldUseFunctions=false (tools disabled per OpenAI spec)
      * "auto"      -> default, tools available, no specific name
      * {type:function, function:{name:X}}  (spec)    -> X is forced
      * {type:function, name:X}             (legacy)  -> X is forced
      * nested wins when both forms are present
      * malformed shapes (no type, wrong type, no name, empty name) are no-ops
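The object-shaped cases in the list above can be sketched as a small name extractor. This is an illustration only: `extractName` is a hypothetical stand-in, not the real middleware helper, and the choice to fall back to the flat `name` when the nested one is empty is an assumption of this sketch.

```go
package main

import "fmt"

// extractName returns the forced function name from a decoded tool_choice
// object, or "" when the shape is unusable (treated as a no-op by the caller).
func extractName(tc map[string]any) string {
	// Missing or wrong "type" means the object is ignored.
	if t, _ := tc["type"].(string); t != "function" {
		return ""
	}
	// OpenAI-spec nested shape: {type:function, function:{name:X}}.
	// Checked first so that it wins when both forms are present.
	if fn, ok := tc["function"].(map[string]any); ok {
		if name, _ := fn["name"].(string); name != "" {
			return name
		}
	}
	// Legacy flat shape: {type:function, name:X}.
	name, _ := tc["name"].(string)
	return name
}

func main() {
	nested := map[string]any{"type": "function", "function": map[string]any{"name": "X"}}
	legacy := map[string]any{"type": "function", "name": "X"}
	both := map[string]any{
		"type":     "function",
		"name":     "flat",
		"function": map[string]any{"name": "nested"},
	}
	noType := map[string]any{"name": "X"}

	fmt.Printf("%q %q %q %q\n",
		extractName(nested), extractName(legacy), extractName(both), extractName(noType))
}
```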
    
    Update the inline comment on the string case to describe the actual
    mechanism: "none" reaches SetFunctionCallString("none") downstream and
    is then honored by ShouldUseFunctions() returning false. Before this PR
    json.Unmarshal([]byte("none"), &functions.Tool{}) failed silently, so
    "none" was ignored - making "none" actually work is a real behavior fix
    this PR brings.
    
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    Assisted-by: Claude:opus-4-7 [Claude Code]
    
    * fix(middleware): preserve pre-#9559 support for JSON-string-encoded tool_choice
    
    Some non-spec clients send tool_choice as a JSON-encoded string of an
    object form, e.g. "{\"type\":\"function\",\"function\":{\"name\":\"X\"}}".
    The pre-#9559 code accepted this by accident: its case string: branch
    ran json.Unmarshal([]byte(content), &functions.Tool{}), which succeeded
    for that double-encoded shape even though it failed for the legitimate
    plain string modes "auto" / "none" / "required".
    
    The first version of this PR routed every string straight to
    SetFunctionCallString as a mode, which fixed the plain-string cases but
    silently regressed the double-encoded one (funcs.Select("{...}") returns
    nothing). Restore the fallback: when a string looks like a JSON object,
    try parsing it as a tool_choice map first; fall through to mode-string
    handling only when no usable name comes out.
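A minimal sketch of that fallback order, under stated assumptions: `resolveStringToolChoice` and the `toolChoice` struct are hypothetical names for this example, and "looks like a JSON object" is approximated by a leading `{` after trimming whitespace.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// toolChoice covers both the nested (spec) and the flat (legacy) shape.
type toolChoice struct {
	Type     string `json:"type"`
	Name     string `json:"name"`
	Function struct {
		Name string `json:"name"`
	} `json:"function"`
}

// resolveStringToolChoice returns either a forced function name or a mode
// string. Exactly one of the two results is non-empty for non-empty input.
func resolveStringToolChoice(s string) (name, mode string) {
	if strings.HasPrefix(strings.TrimSpace(s), "{") {
		var tc toolChoice
		if err := json.Unmarshal([]byte(s), &tc); err == nil && tc.Type == "function" {
			if tc.Function.Name != "" {
				return tc.Function.Name, ""
			}
			if tc.Name != "" {
				return tc.Name, ""
			}
		}
	}
	// No usable name came out: treat the string as a mode ("auto", "none", ...).
	return "", s
}

func main() {
	for _, in := range []string{
		`{"type":"function","function":{"name":"X"}}`,
		`{"type":"function","name":"X"}`,
		"none",
		`{"type":"function"}`,
	} {
		name, mode := resolveStringToolChoice(in)
		fmt.Printf("input=%s name=%q mode=%q\n", in, name, mode)
	}
}
```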
    
    Factor the map-name extraction into a small helper
    (extractToolChoiceFunctionName) so the string-fallback and the regular
    map case go through identical code, and accept both the OpenAI-spec
    nested shape and the legacy/Anthropic flat shape from either entry
    point.
    
    Add 3 Ginkgo specs covering the double-encoded case (nested form, legacy
    form, and the fall-through when the JSON has no usable name).
    
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    Assisted-by: Claude:opus-4-7 [Claude Code]
    
    * test(middleware): silence errcheck on AfterEach os.RemoveAll
    
    The new tool_choice parsing tests added a second AfterEach that calls
    os.RemoveAll(modelDir) without checking the error; errcheck flagged it.
    Suppress with the standard _ = idiom. The pre-existing AfterEach on the
    earlier Describe still elides the check the same way it did before -
    leaving that untouched to keep this commit minimal.
    
    Assisted-by: Claude:opus-4-7 [Claude Code]
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    
    ---------
    
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    Co-authored-by: Ettore Di Giacinto <[email protected]>
    Anai-Guo and mudler authored May 13, 2026
    67c34bb
  12. chore: ⬆️ Update ikawrakow/ik_llama.cpp to `949bb8f1d660fc1264c137a6f3dbd619375f6134` (#9807)
    
    ⬆️ Update ikawrakow/ik_llama.cpp
    
    Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    Co-authored-by: mudler <[email protected]>
    localai-bot and mudler authored May 13, 2026
    ec49995
  13. chore: ⬆️ Update ggml-org/whisper.cpp to `3e9b7d0fef3528ee2208da3cdb873a2c53d2ae2f` (#9808)
    
    ⬆️ Update ggml-org/whisper.cpp
    
    Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    Co-authored-by: mudler <[email protected]>
    localai-bot and mudler authored May 13, 2026
    0353d3b
  14. ci(image): publish missing :latest-* and :v<X>-* singleton image tags (#9812)
    
    * ci(image): wire singleton merges + `--` artifact separator
    
    Closes the same singletons gap on the LocalAI server image workflow that
    PR #9781 closed for backends. The user observed it as missing
    :latest-gpu-nvidia-cuda-12 etc. on quay.io/go-skynet/local-ai — the
    build matrix has six single-arch entries with no corresponding merge
    step, so their per-arch digests push (push-by-digest=true) and never
    get tagged:
    
      - -gpu-hipblas              (hipblas-jobs)
      - -gpu-nvidia-cuda-12       (core-image-build)
      - -gpu-nvidia-cuda-13       (core-image-build)
      - -gpu-intel                (core-image-build)
      - -nvidia-l4t-arm64         (gh-runner)
      - -nvidia-l4t-arm64-cuda-13 (gh-runner)
    
    Only :latest, :v<X>, :latest-gpu-vulkan and :v<X>-gpu-vulkan were
    actually being published before this commit (the two multiarch suffixes
    that had merge jobs).
    
    Changes:
    
    1. image.yml: add six new merge jobs, one per single-arch entry. Each
       `needs:` only its parent build job (matching the existing pattern
       for core-image-merge / gpu-vulkan-image-merge).
    2. image_build.yml: switch artifact name to
       `digests-localai<suffix>--<platform-tag-or-"single">`. The `--`
       separator anchors the merge-side glob so a singleton tag-suffix
       doesn't over-match a longer suffix that shares its prefix
       (-nvidia-l4t-arm64 vs -nvidia-l4t-arm64-cuda-13). Same convention
       as backend_build.yml's fix.
    3. image_merge.yml: update the download pattern to match.
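The over-matching problem that the `--` separator avoids can be reproduced with a plain glob matcher. The sketch below uses Go's `path.Match`, which is not the matcher GitHub Actions uses for artifact downloads, and the single-dash "before" names are an assumption about the old scheme; only the `--` form is taken from the commit message.

```go
package main

import (
	"fmt"
	"path"
)

// matching returns the names that satisfy the glob pattern.
func matching(pattern string, names []string) []string {
	var out []string
	for _, n := range names {
		if ok, err := path.Match(pattern, n); err == nil && ok {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// Assumed old scheme: suffix and platform tag joined by a single dash.
	before := []string{
		"digests-localai-nvidia-l4t-arm64-single",
		"digests-localai-nvidia-l4t-arm64-cuda-13-single",
	}
	// New scheme from the commit: suffix and platform tag joined by "--".
	after := []string{
		"digests-localai-nvidia-l4t-arm64--single",
		"digests-localai-nvidia-l4t-arm64-cuda-13--single",
	}

	// The shorter suffix's glob also picks up the longer suffix's artifact.
	fmt.Println(matching("digests-localai-nvidia-l4t-arm64-*", before))
	// With "--" in the pattern, only the intended artifact matches.
	fmt.Println(matching("digests-localai-nvidia-l4t-arm64--*", after))
}
```

Because no tag suffix contains a double dash, the literal `--` in the pattern can only line up with the separator itself.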
    
    Next master push or tag release should produce :latest-gpu-hipblas,
    :latest-gpu-nvidia-cuda-12, :latest-gpu-nvidia-cuda-13, :latest-gpu-intel,
    :latest-nvidia-l4t-arm64, :latest-nvidia-l4t-arm64-cuda-13 (and their
    :v<X>-* equivalents) for the first time on the post-#9781 workflow.
    
    Assisted-by: Claude:claude-opus-4-7
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    
    * ci(image): add !cancelled() guard to all 8 image merge jobs
    
    Parity pass with backend.yml's merge jobs (8521af1). Without
    !cancelled(), GHA's default `needs:` cascade skips the merge when ANY
    matrix cell of the parent build job fails or is cancelled — so a single
    flaky leg would suppress publication of every other tag-suffix's
    manifest list. Same fix the backend got after v4.2.1 showed 2 failed
    singlearch builds cascade-skip 199 singlearch merge entries.
    
    Applied to all 8 image merges:
    
      - core-image-merge
      - gpu-vulkan-image-merge
      - gpu-nvidia-cuda-12-image-merge       (added in e5300f1)
      - gpu-nvidia-cuda-13-image-merge       (added in e5300f1)
      - gpu-intel-image-merge                (added in e5300f1)
      - gpu-hipblas-image-merge              (added in e5300f1)
      - nvidia-l4t-arm64-image-merge         (added in e5300f1)
      - nvidia-l4t-arm64-cuda-13-image-merge (added in e5300f1)
    
    Build jobs (hipblas-jobs, core-image-build, gh-runner) are
    intentionally NOT changed — they have no upstream `needs:` to cascade-
    skip from.
    
    Assisted-by: Claude:claude-opus-4-7
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    
    ---------
    
    Signed-off-by: Ettore Di Giacinto <[email protected]>
    Co-authored-by: Ettore Di Giacinto <[email protected]>
    localai-bot and mudler authored May 13, 2026
    42a8db3