ds4go is a zero-CGO Go wrapper for the ds4 inference engine. Applications using ds4go loads a pre-built libds4 shared library at runtime with github.com/ebitengine/purego. The shared library owns hardware acceleration. Use a Metal, CUDA, or CPU build of ds4 that matches your machine and model.
ds4 itself is an inference engine focused on the DeepSeek v4 Flash model targeting machines with 96G or more of GPU-accessible RAM.
We try to maintain parity with the upstream ds4 library, wrapping its C API. We build slightly-opinionated tools to facilitate using ds4.
C is a wonderful language for low-level, high-performance, portable code; a clean C API can be wrapped and used by other laguages. Golang is a wonderful language for systems and tools development, and generally more friendly for developers, esepecially when creating networked applications. LLMs are great at programming both. We take the high-performance C engine of ds4 and allow Golang to directly utilize it, simplifying local LLM application development.
Install the ds4go CLI with Homebrew or the Go toolchain:
# Homebrew (macOS/Linux)
brew install nimblemarkets/tap/ds4go
# or with the Go toolchain
go install github.com/NimbleMarkets/ds4go/cmd/ds4go@latestTo use ds4go as a library:
go get github.com/NimbleMarkets/ds4goOnce the CLI is installed, fetch a prebuilt native libds4 from GitHub Releases:
ds4go install --backend autoThe installer downloads from github.com/NimbleMarkets/ds4 by default. Use
--repo, --version, --backend, or --url to select a fork, release, build,
or direct archive. It installs into $DS4_DIR/lib, defaulting to ~/.ds4/lib.
--backend auto selects metal on macOS arm64, cuda on Linux, and cpu elsewhere.
If the library is already installed and up-to-date, the installer exits successfully
without re-downloading. If a different version is present, it will prompt to replace it
(or require --force in non-interactive environments).
DS4_DIR is the ds4 home directory used by ds4go tooling:
$DS4_DIR/lib/ native shared libraries
$DS4_DIR/models/ GGUF model files
Manage curated DeepSeek V4 Flash models with:
ds4go model list
ds4go model download q2-imatrix
ds4go model set q2-imatrixThe default model path for commands and examples is
$DS4_DIR/models/ds4flash.gguf.
Place the shared library in ~/.ds4/lib/, $DS4_DIR/lib/, next to your
executable, or in a lib/ directory next to your executable. You can also point
at it explicitly. The current working directory and the repository root are not
searched, to avoid loading a planted library:
export DS4_LIB=/absolute/path/to/libds4.dylib
# or
export DS4_DIR=/opt/ds4Platform defaults are:
| Platform | Library |
|---|---|
| macOS | libds4.dylib |
| Linux | libds4.so |
| Windows | libds4.dll |
import ds4 "github.com/NimbleMarkets/ds4go"
engine, err := ds4.NewEngine(ds4.EngineOptions{
ModelPath: "/models/ds4flash.gguf",
Backend: ds4.BackendMetal,
})
if err != nil {
panic(err)
}
defer engine.Close()
session, err := engine.NewSession(32768)
if err != nil {
panic(err)
}
defer session.Close()
prompt, err := engine.EncodeChatPrompt("", "Explain Redis streams briefly.", ds4.ThinkHigh)
if err != nil {
panic(err)
}
defer prompt.Free()
_, err = ds4.Generator{Engine: engine, Session: session}.GenerateTokens(prompt, ds4.GenerateOptions{
MaxTokens: 128,
StopOnEOS: true,
OnToken: func(token int) {
text, _ := engine.TokenText(token)
fmt.Print(text)
},
})go run ./cmd/ds4go prompt --model ./ds4flash.gguf -p "Explain Redis streams in one paragraph."
go run ./cmd/ds4go prompt --model ./ds4flash.ggufcmd/ds4go prompt and the examples accept the same arguments as the upstream ds4 C programs, parsed with pflag so options take the --option form. cmd/ds4go prompt, examples/simple, and examples/chat mirror the ds4 CLI (ds4_cli.c); examples/openai-compatible mirrors ds4-server (ds4_server.c). Run any of them with --help for the full list.
The only addition with no C equivalent is --lib, which points at the libds4 shared library the pure-Go wrapper loads at runtime. When empty, ds4go searches DS4_LIB, $DS4_DIR/lib (or ~/.ds4/lib), executable-local paths, and then the platform loader path.
$ ds4go help cheat
ds4go — command cheat sheet
├── completion Generate the autocompletion script for the specified shell
│ ├── bash Generate the autocompletion script for bash
│ ├── fish Generate the autocompletion script for fish
│ ├── powershell Generate the autocompletion script for powershell
│ └── zsh Generate the autocompletion script for zsh
│
├── install Download a prebuilt libds4 shared library
│ └── validate Validate the installed libds4 shared library
│
├── uninstall Uninstall the installed libds4 shared library
│
├── model Browse, download, and manage curated ds4 models
│ ├── delete Delete a downloaded model from disk
│ ├── download Download a curated model from Hugging Face
│ ├── info Show details for a curated model
│ ├── list List installed and available models
│ └── set Set the default chat model
│
└── prompt Run prompt or interactive chat inference
Run 'ds4go help <command>' for detailed usage.
go run ./examples/simple --model ./ds4flash.gguf
go run ./examples/chat --model ./ds4flash.gguf
go run ./examples/toolloop --mock
go run ./examples/toolloop --model ./ds4flash.gguf --nothink --tokens 512
go run ./examples/openai-compatible --model ./ds4flash.gguf --host 127.0.0.1 --port 8000The toolloop example registers a Go add tool and exercises DSML tool-call parsing, tool dispatch, tool-result rendering, and exact replay. Use --mock for a no-model smoke test. The OpenAI-compatible example exposes POST /v1/chat/completions for a minimal local test server.
Most users should import the root package ds4 from github.com/NimbleMarkets/ds4go. It provides Go-native runtime policy and convenience helpers on top of the raw API.
The strict binding layer lives in package ds4api, imported as github.com/NimbleMarkets/ds4go/ds4api. It mirrors the public ds4.h API: engines, sessions, token vectors, chat prompt rendering, tokenization, logprob helpers, MTP metadata, directional steering options, snapshot/payload save-load, and DS4 context-memory helpers. APIs that take FILE * use the package's opaque ds4api.File wrapper around a C FILE*.
ds4_log is exposed as LogString, which safely calls it with a fixed "%s" format. Arbitrary C varargs are intentionally not surfaced as a Go variadic API. SetLogFunc redirects libds4 diagnostics that flow through ds4_log_set into a Go callback, including Metal/CUDA backend messages routed through ds4_gpu_log. SetAbortFunc exposes libds4's fatal-invariant hook, which fires immediately before libds4 aborts the process.
Recent libds4 builds expose ds4_log_set, and ds4go wraps it as
SetLogFunc. The root package also provides SetLogOutput for the common
io.Writer case and DiscardLogs for quiet embedders. Use them to route libds4
diagnostics, including Metal/CUDA backend diagnostics, into your application's
logger or to discard them:
err := ds4.SetLogOutput(log.Writer())
err = ds4.DiscardLogs()The logger is process-global inside libds4, not per engine, so install it once
during startup. The callback may be invoked from native worker threads; keep it
concurrency-safe and quick. Install it before NewEngine, or immediately after
an explicit Load and before Library.NewEngine, if you want to capture
structured model-load and metadata-validation failures from libds4.
Most engine and GPU backend diagnostics now route through the callback. Some
native code paths may still write directly to stderr until upstream ds4
converts them to ds4_log or ds4_gpu_log. For CLI use, redirect stderr with
your shell:
ds4go prompt ... 2>ds4.log
ds4go prompt ... 2>/dev/nullFor Go applications, assigning os.Stderr only affects Go code that writes
through os.Stderr; it does not reliably capture C fprintf(stderr, ...) from
the loaded shared library. Capturing direct native stderr inside one process
requires process-wide file-descriptor redirection, which can interfere with
other goroutines, libraries, and concurrent engines. Prefer SetLogFunc for
routed libds4 diagnostics, shell redirection for CLI runs, or running the model
worker as a subprocess with exec.Cmd.Stderr.
Recent libds4 builds expose ds4_abort_set, and ds4go wraps it as
SetAbortFunc. This is a last-chance fatal-invariant hook: libds4 calls it
after logging the fatal message at LogError and immediately before native
abort().
err := ds4.SetAbortFunc(func(msg string) {
crashReporter.Record("libds4 fatal invariant", msg)
})Returning from the callback does not recover the engine. The native library
still calls abort() because the invariant is already broken. Use the hook for
crash telemetry, flushing logs, or deliberate process termination. Do not call
back into ds4go/libds4 from the callback; it can run from native worker threads
while an FFI call is active.
Do not use signal.NotifyContext around C FFI calls. SIGINT (Ctrl+C) can be delivered to any OS thread, including C worker threads inside libds4 (Metal, CUDA, or CPU). When that happens the C runtime aborts and the process segfaults.
Safe cancellation is programmatic only — pass a context.Context to GenerateOptions.Context and cancel it from Go code. The generator checks ctx.Done() between tokens, so cancellation never interrupts an active FFI call:
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
_, err = ds4.Generator{Engine: engine, Session: session}.GenerateTokens(prompt, ds4.GenerateOptions{
MaxTokens: 128,
Context: ctx,
OnToken: func(token int) {
text, _ := engine.TokenText(token)
fmt.Print(text)
},
})This is exactly how examples/openai-compatible handles client disconnects — it wires r.Context() into generation so the engine stops cleanly when the HTTP connection drops.
Bindings are generated by hand against the public ds4 header at https://github.com/antirez/ds4/blob/main/ds4.h.
Inference runs in-process. The Golang wrapper adds FFI calls but does not proxy tokens through a server or copy model weights. Prefill, generation, Metal/CUDA/CPU execution, MTP, KV reuse, and disk KV payload serialization are all handled by the loaded ds4 shared library.
We welcome contributions and feedback. Please adhere to our Code of Conduct when engaging our community.
Thanks to @antirez for his work on ds4 and for his local-LLM advocacy. Thanks to DeepSeek for their public contributions.
Released under the MIT License, see LICENSE.txt.
Copyright (c) 2026 Neomantra Corp.
Made with ❤️ and 🔥 by the team behind Nimble.Markets.