Thanks to visit codestin.com
Credit goes to zapdev.link

Machine learning lab

We make large language models faster on real hardware.

Zapdev-labs builds quantization, inference runtimes, and benchmarks — Rust and Python tooling that cuts memory, raises tokens-per-second, and ships OpenAI-compatible APIs you can run locally.

  • Weight & KV-cache quantization
  • Local inference (Rust + Python)
  • Cross-backend TPS benchmarks
  • Low-VRAM and edge deployment

12

Open-source projects

3-bit

KV cache compression

>

TPS vs stock llama.cpp

workbench — zapdev-labs

zapdev-labs / oxidize — bench session a91c

$ turboforge hardware probe

→ Ryzen 7 PRO 6850H · 28GB RAM · llama.cpp + vLLM candidates

$ turboquant bench --bits 3 --target kv-cache

→ FastVQ: 6x KV memory reduction · recall within baseline

$ oxidize serve --model qwen3.5-4b-q4

ok OpenAI /v1 ready · oxidize runtime (not llama-server)

$ python benchmark.py --backends llama_cpp,oxidize,miniforge

→ same GGUF · llama.cpp baseline · oxidize + miniforge ahead on TPS

ok fastest this run: oxidize (higher tok/s than llama.cpp)

$

How we ship performance

From probe to production API

Our repos chain together: measure the machine, compress weights and KV cache, benchmark backends, then serve through OpenAI-compatible endpoints.

  1. 01

    Probe hardware

    turboforge hardware probe · miniforge config doctor

  2. 02

    Compress

    turboquant / oxidize-quantize · 3-bit KV and weight paths

  3. 03

    Benchmark

    ollama-performance-benchmark · tokens/sec per backend

  4. 04

    Serve

    oxidize-server · turboforge runtime · miniforge streaming

  5. 05

    Ship

    OpenAI-compatible APIs and reproducible manifests

People

Who builds here

The team behind Zapdev-labs — inference, quantization, and the benchmarks that back our claims.

Proof, not promises

Benchmarks you can rerun

ollama-performance-benchmark runs each backend sequentially on the same GGUF and prompt. On our hardware, oxidize and miniforge beat stock llama.cpp tokens-per-second; results land in results.csv. turboforge persists benchmark history on its runtime path.

turboforge hardware probeoxidize servepython benchmark.py
View benchmark repo
Abstract hardware bench visual for Zapdev-labs

Reference

Common questions

We are a machine learning lab focused on LLM performance: quantization (turboquant), local inference stacks (oxidize, miniforge, turboforge), and honest cross-backend benchmarks. We also ship agent infrastructure like openagentmail and product tooling like zapdev.