Machine learning lab

We make large language models faster on real hardware.

Zapdev-labs builds quantization, inference runtimes, and benchmarks — Rust and Python tooling that cuts memory, raises tokens-per-second, and ships OpenAI-compatible APIs you can run locally.

Weight & KV-cache quantization
Local inference (Rust + Python)
Cross-backend TPS benchmarks
Low-VRAM and edge deployment

github.com/Zapdev-labs See our projects zapdev.link

Open-source projects

3-bit

KV cache compression

TPS vs stock llama.cpp

workbench — zapdev-labs

zapdev-labs / oxidize — bench session a91c

$ turboforge hardware probe

→ Ryzen 7 PRO 6850H · 28GB RAM · llama.cpp + vLLM candidates

$ turboquant bench --bits 3 --target kv-cache

→ FastVQ: 6x KV memory reduction · recall within baseline

$ oxidize serve --model qwen3.5-4b-q4

ok OpenAI /v1 ready · oxidize runtime (not llama-server)

$ python benchmark.py --backends llama_cpp,oxidize,miniforge

→ same GGUF · llama.cpp baseline · oxidize + miniforge ahead on TPS

ok fastest this run: oxidize (higher tok/s than llama.cpp)

$ ▊

Flagship work

Open source for faster LLMs

These are the repos we actively push on GitHub — inference engines, quantizers, runtimes, and the benchmarks that prove the wins.

InferenceRust

oxidize

Fast, local-first LLM inference in Rust — CLI, OpenAI-compatible server, Python bindings, and quantization. Beats stock llama.cpp on the same GGUF in our back-to-back TPS benchmarks.

oxidize

QuantizationPython

turboquant

FastVQ on PyPI — PolarQuant, QJL, and TurboQuant for ~3-bit compression, ~6× smaller KV caches, and optimized GPU kernels with near-zero accuracy loss.

turboquant

RuntimePython

turboforge

Hardware-aware enterprise runtime: planner hints, llama.cpp and vLLM backends, benchmark telemetry, and an OpenAI-compatible API with tokens-per-second in the response.

turboforge

ServingPython

miniforge

High-performance MiniMax M2.7 inference with GGUF quantization, TurboQuant KV cache, and hardware-tuned presets — higher TPS than stock llama-server on the same weights in our benchmarks.

miniforge

Benchmarks, memory, and agents

ollama-performance-benchmarkPython

Back-to-back TPS benchmark on one GGUF and prompt — proves oxidize and miniforge post higher tokens/sec than stock llama-server, alongside Ollama and vLLM.

openagentmailTypeScript

Self-hosted AgentMail-compatible email for AI agents — inboxes, threads, webhooks, and REST APIs.

zapdevTypeScript

AI app builder with live sandboxes, streaming agents, Convex persistence, and E2B isolation.

self-learning-aiPython

Local self-improving coding agent with LoRA fine-tuning, vector memory, and sandboxed evaluation loops.

How we ship performance

From probe to production API

Our repos chain together: measure the machine, compress weights and KV cache, benchmark backends, then serve through OpenAI-compatible endpoints.

Probe hardware

turboforge hardware probe · miniforge config doctor

Compress

turboquant / oxidize-quantize · 3-bit KV and weight paths

Benchmark

ollama-performance-benchmark · tokens/sec per backend

Serve

oxidize-server · turboforge runtime · miniforge streaming

Ship

OpenAI-compatible APIs and reproducible manifests

People

Who builds here

The team behind Zapdev-labs — inference, quantization, and the benchmarks that back our claims.

Jackson Wheeler

Proof, not promises

Benchmarks you can rerun

ollama-performance-benchmark runs each backend sequentially on the same GGUF and prompt. On our hardware, oxidize and miniforge beat stock llama.cpp tokens-per-second; results land in results.csv. turboforge persists benchmark history on its runtime path.

turboforge hardware probeoxidize servepython benchmark.py

View benchmark repo

Abstract hardware bench visual for Zapdev-labs

Reference

Common questions

We are a machine learning lab focused on LLM performance: quantization (turboquant), local inference stacks (oxidize, miniforge, turboforge), and honest cross-backend benchmarks. We also ship agent infrastructure like openagentmail and product tooling like zapdev.

We make large language models faster on real hardware.

Open source for faster LLMs

oxidize

turboquant

turboforge

miniforge

Benchmarks, memory, and agents

From probe to production API

Who builds here

Jackson Wheeler

Benchmarks you can rerun

Common questions

What does Zapdev-labs work on?

Where should I start?

How does TurboQuant relate to Miniforge?

How do I reach the team?