We make large language models faster on real hardware.
Zapdev-labs builds quantization, inference runtimes, and benchmarks — Rust and Python tooling that cuts memory, raises tokens-per-second, and ships OpenAI-compatible APIs you can run locally.
- Weight & KV-cache quantization
- Local inference (Rust + Python)
- Cross-backend TPS benchmarks
- Low-VRAM and edge deployment
12
Open-source projects
3-bit
KV cache compression
>
TPS vs stock llama.cpp
zapdev-labs / oxidize — bench session a91c
$ turboforge hardware probe
→ Ryzen 7 PRO 6850H · 28GB RAM · llama.cpp + vLLM candidates
$ turboquant bench --bits 3 --target kv-cache
→ FastVQ: 6x KV memory reduction · recall within baseline
$ oxidize serve --model qwen3.5-4b-q4
ok OpenAI /v1 ready · oxidize runtime (not llama-server)
$ python benchmark.py --backends llama_cpp,oxidize,miniforge
→ same GGUF · llama.cpp baseline · oxidize + miniforge ahead on TPS
ok fastest this run: oxidize (higher tok/s than llama.cpp)
$ ▊
Flagship work
Open source for faster LLMs
These are the repos we actively push on GitHub — inference engines, quantizers, runtimes, and the benchmarks that prove the wins.
More from the org
Benchmarks, memory, and agents
How we ship performance
From probe to production API
Our repos chain together: measure the machine, compress weights and KV cache, benchmark backends, then serve through OpenAI-compatible endpoints.
- 01
Probe hardware
turboforge hardware probe · miniforge config doctor
- 02
Compress
turboquant / oxidize-quantize · 3-bit KV and weight paths
- 03
Benchmark
ollama-performance-benchmark · tokens/sec per backend
- 04
Serve
oxidize-server · turboforge runtime · miniforge streaming
- 05
Ship
OpenAI-compatible APIs and reproducible manifests
People
Who builds here
The team behind Zapdev-labs — inference, quantization, and the benchmarks that back our claims.
Proof, not promises
Benchmarks you can rerun
ollama-performance-benchmark runs each backend sequentially on the same GGUF and prompt. On our hardware, oxidize and miniforge beat stock llama.cpp tokens-per-second; results land in results.csv. turboforge persists benchmark history on its runtime path.
Reference