AGENTS.md

PROJECT KNOWLEDGE BASE

Generated: 2026-06-03
Commit: d9fb076 (master)
Workspace: Local-first LLM inference in Rust with Go and Python ports

OVERVIEW

This workspace contains the core Rust LLM inference engine (oxidize-core) and multiple frontends/bindings (CLI, server, Python bindings), plus parallel language ports in Go and pure Python for cross-platform deployment.

STRUCTURE

.
├── oxidize-core/      # Rust core: GGUF, tensors, quantization, generation, backends
│   ├── src/backends/  # CUDA, Metal, Vulkan, MLX, WebGPU (see backends/AGENTS.md)
│   ├── src/compute/   # Tensor ops, KV cache, flash attention, quantization (see compute/AGENTS.md)
│   ├── src/format/    # GGUF, SafeTensors, tokenizer (see format/AGENTS.md)
│   ├── src/mesh/      # Distributed inference (see mesh/AGENTS.md)
│   ├── src/model/     # Inference engine, sampling, DFlash (see model/AGENTS.md)
│   ├── src/paged_attention/ # vLLM-style paging scheduler (see paged_attention/AGENTS.md)
│   └── src/vision/    # Vision encoder / multimodal
├── oxidize-cli/       # Prompt/chat CLI, profiling, pipeline modes (see AGENTS.md)
├── oxidize-server/    # OpenAI-compatible HTTP API (axum) (see src/AGENTS.md)
├── oxidize-quantize/  # Offline weight conversion (see AGENTS.md)
├── oxidize-py/        # Python bindings (pyo3 + maturin) (see AGENTS.md)
├── oxidize-train/     # CSV classifier training
├── oxidize-golang/    # Go port of oxidize-core (see AGENTS.md)
├── oxidize-python/    # Pure-Python port (see AGENTS.md)
└── scripts/           # CI benchmark regression + dashboard

SUBDIRECTORY AGENTS.md MAP

Directory	File	Domain
`oxidize-core/src/compute/`	`compute/AGENTS.md`	CPU tensor ops, quantization, KV cache, flash attention
`oxidize-core/src/model/`	`model/AGENTS.md`	Inference engine, model loading, speculative decoding
`oxidize-core/src/mesh/`	`mesh/AGENTS.md`	Distributed inference (libp2p mesh)
`oxidize-core/src/backends/`	`backends/AGENTS.md`	Hardware compute backends
`oxidize-core/src/format/`	`format/AGENTS.md`	GGUF, SafeTensors, tokenizer
`oxidize-core/src/paged_attention/`	`paged_attention/AGENTS.md`	vLLM-style PagedAttention scheduler
`oxidize-server/src/`	`src/AGENTS.md`	OpenAI-compatible HTTP API (Axum)
`oxidize-cli/`	`AGENTS.md`	CLI for prompt/chat, benchmarking
`oxidize-quantize/`	`AGENTS.md`	Offline weight quantization utility
`oxidize-py/`	`AGENTS.md`	PyO3 Python bindings
`oxidize-golang/`	`AGENTS.md`	Go port of oxidize-core
`oxidize-python/`	`AGENTS.md`	Pure-Python port

CODE MAP

Symbol	Type	Location	Role
`ComputeBackend`	trait	`oxidize-core/src/backend.rs`	Abstraction all backends implement
`Model`	trait	`oxidize-core/src/model.rs`	Implemented by 5 structs (Inference, Llama, LayerWise, MLX, DFlash)
`GgufQuantizationType`	enum	`oxidize-core/src/format/gguf.rs`	Central type hub; 20+ cross-module refs
`tensor.rs`	module	`oxidize-core/src/compute/`	5,153 lines; 135 unsafe blocks; SIMD kernels
`scheduler.rs`	module	`oxidize-core/src/paged_attention/`	vLLM-style request scheduling
`app.rs`	module	`oxidize-server/src/`	Axum route assembly

WHERE TO LOOK (High-Level)

Task	Location	Notes
Add model architecture	`oxidize-core/src/model/inference.rs`	Extend `ModelArchitecture` enum
Add backend	`oxidize-core/src/backends/`	Implement `ComputeBackend` trait, add `XxxBuildInfo`
Add quantization type	`oxidize-core/src/compute/quantization.rs`	Also update `GgufQuantizationType` in `format/gguf.rs`
Tokenizer change	`oxidize-core/src/format/tokenizer.rs`	4 formats: SP, WordPiece, BPE, Tiktoken
Server route	`oxidize-server/src/routes/`	OpenAI-compatible endpoints
CLI subcommand	`oxidize-cli/src/main.rs`	Also check `src/bin/` for aux tools
Distributed logic	`oxidize-core/src/mesh/`	Has a real `mod.rs`; only dir that also enforces privacy boundaries
Port to Go	`oxidize-golang/`	Mirror Rust structure; see `oxidize-golang/AGENTS.md`
Port to Python	`oxidize-python/`	Mirror Go structure; see `oxidize-python/AGENTS.md`
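
The "Add backend" row can be sketched as follows. This is a minimal illustration, not the project's API: the trait methods shown here, `DemoBackend`, `DemoBuildInfo`, and `demo_build_info()` are all hypothetical names that only follow the `XxxBuildInfo` + `xxx_build_info()` naming. The real `ComputeBackend` trait in `oxidize-core/src/backend.rs` will have different methods.

```rust
// Hypothetical trait shape; the real ComputeBackend is not shown in this file.
trait ComputeBackend {
    /// Short backend identifier.
    fn name(&self) -> &'static str;
    /// Element-wise addition of two equal-length slices into `out`.
    fn add(&self, a: &[f32], b: &[f32], out: &mut [f32]) -> Result<(), String>;
}

/// Facts fixed at compile time for the (hypothetical) demo backend.
#[derive(Debug, Clone, Copy, PartialEq)]
struct DemoBuildInfo {
    name: &'static str,
    debug_assertions: bool,
}

/// Mirrors the `xxx_build_info()` naming; `cfg!` is evaluated at compile time.
fn demo_build_info() -> DemoBuildInfo {
    DemoBuildInfo {
        name: "demo",
        debug_assertions: cfg!(debug_assertions),
    }
}

struct DemoBackend;

impl ComputeBackend for DemoBackend {
    fn name(&self) -> &'static str {
        demo_build_info().name
    }

    fn add(&self, a: &[f32], b: &[f32], out: &mut [f32]) -> Result<(), String> {
        if a.len() != b.len() || a.len() != out.len() {
            return Err(format!(
                "length mismatch: {} vs {} vs {}",
                a.len(),
                b.len(),
                out.len()
            ));
        }
        for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
            *o = x + y;
        }
        Ok(())
    }
}

fn main() -> Result<(), String> {
    let backend = DemoBackend;
    let mut out = [0.0f32; 3];
    backend.add(&[1.0, 2.0, 3.0], &[0.5, 0.5, 0.5], &mut out)?;
    let info = demo_build_info();
    println!(
        "{}: {:?} (debug_assertions={})",
        backend.name(),
        out,
        info.debug_assertions
    );
    Ok(())
}
```

Returning `Result` from `add` rather than panicking on a length mismatch keeps the sketch in line with the `unwrap()`/`expect()` concern listed under ANTI-PATTERNS.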

CONVENTIONS

Flat module system: lib.rs uses #[path = "..."] to flatten all modules into crate root. Only mesh/, paged_attention/, vision/ have real mod.rs files.
Config + Error + Trait trinity: Every subsystem has XxxConfig, XxxError, and core trait/struct.
Error chaining: All errors wrap lower-level errors via From impls.
Backend dual-file: vulkan.rs + vulkan_stub.rs pair (only backend with this pattern).
Build info micro-pattern: Every backend exposes XxxBuildInfo + xxx_build_info() for compile-time detection.
Test co-location: Every .rs file has #[cfg(test)] module at bottom; no separate tests/ inside src/.
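
A minimal sketch of the Config + Error + Trait trinity combined with `From`-based error chaining. The `Sampler` subsystem and every name in it (`SamplerConfig`, `SamplerError`, `TopKSampler`, `from_str_config`) are hypothetical and chosen only for illustration. They are not taken from oxidize-core.

```rust
use std::fmt;
use std::num::ParseIntError;

/// XxxConfig: plain settings for the (hypothetical) subsystem.
#[derive(Debug, Clone, PartialEq)]
struct SamplerConfig {
    top_k: usize,
}

/// XxxError: wraps the lower-level error instead of stringifying it.
#[derive(Debug)]
enum SamplerError {
    Parse(ParseIntError),
    InvalidTopK(usize),
}

impl fmt::Display for SamplerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SamplerError::Parse(e) => write!(f, "cannot parse top_k: {e}"),
            SamplerError::InvalidTopK(k) => write!(f, "top_k must be > 0, got {k}"),
        }
    }
}

impl std::error::Error for SamplerError {
    /// Exposes the wrapped error so callers can walk the chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            SamplerError::Parse(e) => Some(e),
            SamplerError::InvalidTopK(_) => None,
        }
    }
}

/// The From impl is what lets `?` convert the lower-level error.
impl From<ParseIntError> for SamplerError {
    fn from(e: ParseIntError) -> Self {
        SamplerError::Parse(e)
    }
}

/// Core trait of the subsystem.
trait Sampler {
    fn pick(&self, logits: &[f32]) -> Vec<usize>;
}

struct TopKSampler {
    config: SamplerConfig,
}

impl TopKSampler {
    fn from_str_config(raw: &str) -> Result<Self, SamplerError> {
        let top_k: usize = raw.trim().parse()?; // ParseIntError -> SamplerError
        if top_k == 0 {
            return Err(SamplerError::InvalidTopK(top_k));
        }
        Ok(Self { config: SamplerConfig { top_k } })
    }
}

impl Sampler for TopKSampler {
    /// Returns the indices of the `top_k` largest logits, largest first.
    fn pick(&self, logits: &[f32]) -> Vec<usize> {
        let mut idx: Vec<usize> = (0..logits.len()).collect();
        idx.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));
        idx.truncate(self.config.top_k);
        idx
    }
}

fn main() -> Result<(), SamplerError> {
    let sampler = TopKSampler::from_str_config("2")?;
    println!("{:?}", sampler.pick(&[0.1, 2.0, 1.5]));
    Ok(())
}
```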

ANTI-PATTERNS (THIS PROJECT)

StdMutex in async context (oxidize-server/src/runtime/paged.rs) — should be tokio::sync::Mutex.
tensor.rs monolith — 5,153 lines mixing kernels, types, and ops. Refactor candidate.
Quantization constants shadowed in tensor.rs and cuda.rs — should be shared.
unwrap()/expect() proliferation — 1000+ instances in non-test code.
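
One possible shape for the shared-constants fix, as a sketch only. This file does not say which constants are duplicated between tensor.rs and cuda.rs, so the example uses the Q4_0 block layout from GGUF (32 weights per block, 18 bytes per block) purely as an illustration. The module name `quant_consts` and both helper functions are hypothetical.

```rust
/// Single source of truth for the Q4_0 block layout as defined by GGUF/ggml:
/// 32 weights per block, stored as a 2-byte f16 scale plus 16 bytes of
/// packed 4-bit values.
mod quant_consts {
    pub const Q4_0_BLOCK_ELEMS: usize = 32;
    pub const Q4_0_BLOCK_BYTES: usize = 2 + Q4_0_BLOCK_ELEMS / 2;
}

/// Hypothetical CPU-side helper: bytes needed to store `n_elems` weights.
/// Returns `None` when `n_elems` is not a whole number of blocks.
fn cpu_q4_0_storage_bytes(n_elems: usize) -> Option<usize> {
    if n_elems % quant_consts::Q4_0_BLOCK_ELEMS != 0 {
        return None;
    }
    Some(n_elems / quant_consts::Q4_0_BLOCK_ELEMS * quant_consts::Q4_0_BLOCK_BYTES)
}

/// Hypothetical GPU-side helper: number of blocks a kernel launch must cover.
fn gpu_q4_0_block_count(n_elems: usize) -> usize {
    n_elems.div_ceil(quant_consts::Q4_0_BLOCK_ELEMS)
}

fn main() {
    println!("{:?} bytes", cpu_q4_0_storage_bytes(4096));
    println!("{} blocks", gpu_q4_0_block_count(4096));
}
```

Because both helpers read the same constants, a layout change cannot leave the CPU and GPU paths disagreeing about block size.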

UNIQUE STYLES

Bottom-up file organization (tensor.rs): constants → errors → low-level kernels → high-level functions → Tensor struct (inverse of typical Rust).
WASM worker type embedding: util/web_worker.rs embeds complete TypeScript interface contracts as 60+ line string literals.
MLX macOS fortress: mlx.rs and mlx_inference.rs are heavily #[cfg(target_os = "macos")] gated.
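
The platform gating described for the MLX files can be sketched with a pair of same-signature functions, one compiled on macOS and a stub compiled everywhere else. The function names and the `"mlx"`/`"cpu"` return values are hypothetical. The actual items in mlx.rs and mlx_inference.rs are not shown in this file.

```rust
/// Real implementation: compiled only on macOS.
#[cfg(target_os = "macos")]
fn accelerated_backend_name() -> Option<&'static str> {
    Some("mlx")
}

/// Stub with the same signature for every other target, so callers
/// never need their own #[cfg] attributes.
#[cfg(not(target_os = "macos"))]
fn accelerated_backend_name() -> Option<&'static str> {
    None
}

/// Falls back to a CPU path when no accelerated backend was compiled in.
fn select_backend() -> &'static str {
    accelerated_backend_name().unwrap_or("cpu")
}

fn main() {
    println!("selected backend: {}", select_backend());
}
```

The same real-plus-stub split is what the `vulkan.rs` + `vulkan_stub.rs` pair expresses at file granularity.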

COMMANDS

# Build / test / lint
make build    # release build
make test     # workspace tests
make lint     # clippy -D warnings
make fmt      # format check
make ci       # full CI equivalent

# Run
sfw cargo run -p oxidize-cli -- --prompt "hello"
sfw cargo run -p oxidize-server -- --host 127.0.0.1 --port 8080
sfw cargo run -p oxidize-quantize -- --input in.bin --output out.bin --source F32 --target Q4_0

# WASM
make wasm     # outputs to dist/wasm

NOTES

Rust edition 2024, resolver "3".
Release profile: lto = true, panic = "abort".
cargo-deny audits licenses + security (see deny.toml).
.cargo/config.toml sets custom linker for aarch64-unknown-linux-gnu and WASM runner.
oxidize-core/fuzz/ exists but is NOT in workspace members/exclude.
models/ is gitignored but contains tracked files.
GGUF/SafeTensors draft-model loading + speculative generation summarizing is an active development area.

Learned User Preferences

When adding oxidize-python or expanding oxidize-golang, keep all Rust crates and features; do not delete or replace the Rust workspace.
Parallel language ports should reach feature parity with oxidize-core (user asked for every Rust feature in Python/Go, with Python targeting similar CLOC to Rust).
Keep oxidize-py (PyO3/maturin bindings) alongside the pure-Python oxidize-python package.
When syncing ports, bring new master Rust features into oxidize-golang (and follow-on Python work) rather than leaving ports stale.
On feature branches, stage and commit only files related to the task; exclude unrelated workspace changes.
oxidize run <model> should start the OpenAI-compatible HTTP/WebSocket server by default; use --no-api for local inference only.
Contributions should keep tests passing and use clear, ethical PR/markdown descriptions; include benchmarks when claiming performance changes.

Learned Workspace Facts

oxidize-golang/ is the active Go port of oxidize-core; CLI lives in internal/cli/ (run, chat, bench, inspect, list, serve); HF GGUF resolver in hf/.
oxidize-python/ is a pure-Python implementation (oxidize_python, pyproject.toml, uv/pytest); CLI mirrors Go subcommands; HF resolver in oxidize_python/hf/hub.py with cache ~/.cache/oxidize/hf.
Do not modify Rust crates when extending oxidize-python; port from oxidize-golang or Rust sources.
oxidize-py/ is the PyO3 bindings crate, separate from oxidize-python.
Go and Python port tests reuse GGUF fixtures under oxidize-core/tests/fixtures/ (e.g. valid-v3.gguf).
DFlash speculative decoding in oxidize-core/src/model/dflash.rs is an active port target for oxidize-golang (and downstream Python).
Rust oxidize run rewrites to --serve-api by default (background in-process server on --api-host/--api-port); realtime WebSocket at ws://HOST:PORT/v1/realtime (oxidize-server/tests/realtime_ws.rs).
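
The default-to-server behaviour of `oxidize run` could look roughly like the sketch below. Only the flag names `--serve-api` and `--no-api` come from this file. The helpers `rewrite_run_args` and `to_args` are hypothetical, and the real CLI may rewrite its arguments differently.

```rust
/// Hypothetical helper: rewrites the arguments that follow `oxidize run`.
/// Adds `--serve-api` unless the user opted out with `--no-api` or already
/// passed `--serve-api` explicitly.
fn rewrite_run_args(args: &[String]) -> Vec<String> {
    let opted_out = args.iter().any(|a| a.as_str() == "--no-api");
    let already_serving = args.iter().any(|a| a.as_str() == "--serve-api");
    let mut out = args.to_vec();
    if !opted_out && !already_serving {
        out.push("--serve-api".to_string());
    }
    out
}

/// Small convenience for building owned argument lists in examples.
fn to_args(xs: &[&str]) -> Vec<String> {
    xs.iter().map(|s| s.to_string()).collect()
}

fn main() {
    // Default: the server flag is injected.
    println!("{:?}", rewrite_run_args(&to_args(&["model.gguf"])));
    // Opt-out: arguments pass through unchanged.
    println!("{:?}", rewrite_run_args(&to_args(&["model.gguf", "--no-api"])));
}
```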
