Open-source studio, UK

I build software in the open.

Hi, I am Sarma. I run an open-source studio from a desk in the UK. Nineteen MIT-licensed projects, eighty-seven long-form essays, and three flagships in active development. Everything I build is on GitHub with a whitepaper, an architecture diagram, and a quick-start guide.

My story

I am Sarma, a software engineer based in the UK. I work on LLM infrastructure, coding agents, inference servers, storage engines, consensus protocols and the platform tools that hold all of it together.

What pulls me back to the desk every weekend is the same thing that pulled me into the industry: the quiet thrill of building something from scratch. A blank repository, a problem worth solving, a system that did not exist yesterday and ships today. Hours go past unnoticed.

Everything I build lives in the open at github.com/sarmakska, MIT licensed, with a whitepaper, an architecture diagram and a quick-start guide per project.

19 open-source repos

87 long-form essays

MIT, all projects free to fork

Connect on LinkedIn

Sarma Kaza

Three flagships in the open

Built. Documented. Shipped. All MIT.

Sarmalink-AI

Multi-provider AI gateway

OpenAI-compatible. Thirty-six engines across seven providers. When the primary returns 429, the next engine fires in under fifty milliseconds. Intent auto-router, MCP-shape tool catalog, persistent memory, FLUX image generation, TTS and STT cascades.

v2 in production
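To illustrate the failover idea, here is a minimal sketch of trying engines in order and moving on when one is rate limited. It is not SarmaLink-AI's actual code: the `Engine` type, the `RateLimited` exception and `complete_with_failover` are hypothetical names, and a real gateway would also handle timeouts, streaming and per-provider authentication.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


class RateLimited(Exception):
    """Hypothetical error an engine call raises when the provider answers HTTP 429."""


@dataclass
class Engine:
    name: str
    call: Callable[[str], str]  # hypothetical signature: prompt in, completion text out


def complete_with_failover(prompt: str, engines: Sequence[Engine]) -> Tuple[str, str]:
    """Try each engine in priority order; fall through to the next on a 429."""
    skipped: List[str] = []
    for engine in engines:
        try:
            # First engine that answers wins; report which one served the request.
            return engine.name, engine.call(prompt)
        except RateLimited:
            skipped.append(engine.name)
    raise RuntimeError(f"all engines rate limited: {skipped}")
```

The caller passes engines in priority order, so the primary always gets the first attempt and the fallback order stays explicit rather than hidden in retry logic.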

slipstream

Claude Code plugin + cross-IDE MCP toolkit

Fourteen sp_* tools replace whole-file reads with scoped symbol pulls, for roughly 95% per-read savings, reproducible via pnpm benchmark. React + Vite + d3 dashboard with nine routed views. Cross-tab agent bus. 75-skill methodology library. Six editor install paths.

v1.0 shipped 6 Jun 2026

echo

The open Jarvis you actually own

Bring-your-own-subscription. Never an API key. Dispatches each prompt to whichever subscription-backed CLI you already pay for, claude / codex / gemini. Voice in, voice out, vision when it helps. Multi-monitor HUD. One Rust core. Local-first. MIT.

Phase 0 + 1 in, real audio next

Plus sixteen more open-source projects across systems software, platform engineering and AI applications. See the full portfolio →

Lab notebook, live

What I am shipping right now.

I write in public and ship in public. This is what is on the desk this week, as of the end of the day you opened this page.

Live, this week

8 Jun 2026

echo, Phase 1 voice loop landing

The brain router across claude, codex and gemini is wired and tested. Memory store with PreSession digests is live. MCP skills bus runs weather, web search and files. Real Porcupine wake word, whisper.cpp speech to text, and Piper text to speech are the next pieces in.

tests green

See the canonical plan

Shipped, this month

6 Jun 2026

slipstream v1.0, first major release

Cross-IDE MCP toolkit with fourteen sp_* tools that replace whole-file reads with scoped symbol pulls. React + Vite + d3 dashboard with nine routed views and an interactive code dependency graph. Cross-tab agent bus, cold-start knowledge feed, seventy-five-skill methodology library. Reproducible ninety-five percent per-read savings via pnpm benchmark.

321 tests, six editors

Install in six editors

Live on the lab notebook

9 Jun 2026

Samsung Galaxy S26 Ultra, ninety days in

I bought the S26 Ultra on the 11th of March, the day it went on sale. Ninety days, three OTA updates, and the One UI 9.0 beta later, this is the honest review. Verified Geekbench numbers, real comparison charts against the iPhone 17 Pro Max, and my own shots taken on the phone.

13 min read, all data sourced

Read the review

Always shipping

9 Jun 2026

Eighty-seven essays, written in public

Long-form notes on LLM infrastructure, agent orchestration, storage engines, consensus, Rust inference, WebAssembly sandboxes, and the indie SaaS stack. Real numbers, cited sources, no content marketing.

87 long-form essays

Open the blog

Watch the commits land at github.com/sarmakska

How I think about engineering

Code that survives the six-month test

Code that works in a demo and dies in production is technical theatre. I optimise for the moment six months after launch, when somebody else has to read it, change it, and own the on-call page when something breaks.

Operating principles

Boring tech, surgical complexity. Postgres before Mongo. Server-rendered HTML before another SPA framework. Reach for the exotic only when the boring option genuinely runs out.

Open source by default. Nineteen public repositories under MIT, covering coding agents, gateways, inference, storage engines, consensus, and sandboxes. If a piece of work is generally useful, I publish it.

Numbers over narratives. Every blog post that claims a benchmark cites the source. Every chart marks whether the row is from a public benchmark or my own. A transparency footer on every post invites readers to flag bad numbers.

Ship the smallest thing that proves the next thing. Small commits, frequent deploys, observability before features. Big-bang releases are how products get cancelled mid-flight.

Defaults that respect the user. No silent analytics. No cookie banners I would hate. Real auth on day one, real row-level security on every table. The defaults you would want if you cloned my code at midnight.

What I actually build

Twelve lanes, nineteen repositories

Not abstract capability lists. Each card below maps to repositories you can clone, read, and run today.

Multi-provider AI gateways

SarmaLink-AI: multi-engine failover across fourteen providers, OpenAI-compatible proxy, persistent memory, image generation, live tools. Zero-cost frontier tier as the default route.

Agent orchestration

Durable multi-agent workflows with deterministic replay, journaled state in Postgres, hard tool and token budgets, BullMQ queue, Inspector UI. Workflows that survive restarts and pass audit.

Real-time voice loops

Sub-second WebRTC voice agent with mediasoup SFU, pluggable STT, LLM and TTS adapters, explicit turn-state machine, barge-in cancellation tested across the awkward cases.

Evals as code

Datasets as files, scorers as functions, traces in DuckDB, viewer in HTMX. Six built-in scorers including LLM-as-judge. Regression mode fails CI when a release loses ground against the baseline.
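A regression gate of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: `find_regressions`, `gate`, the scorer names and the `tolerance` parameter are all hypothetical, and scores are assumed to be per-scorer averages where higher is better.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return the scorers where the candidate run lost ground against the baseline."""
    regressed = []
    for name, base_score in baseline.items():
        new_score = candidate.get(name)
        # A scorer missing from the candidate run counts as a regression.
        if new_score is None or new_score < base_score - tolerance:
            regressed.append(name)
    return sorted(regressed)


def gate(baseline: Dict[str, float], candidate: Dict[str, float], tolerance: float = 0.0) -> int:
    """Process exit code for CI: 1 when any scorer regressed, otherwise 0."""
    return 1 if find_regressions(baseline, candidate, tolerance) else 0
```

In a CI job the result of `gate(...)` would be passed to `sys.exit`, so a release that scores below the baseline fails the build instead of shipping quietly.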

Production infrastructure

Helm charts for Next.js with the full observability stack (Prometheus, Grafana, Loki, Alertmanager) preconfigured. Terraform stack composing Vercel, Supabase, Cloudflare, DigitalOcean.

RAG and document intelligence

A clean end-to-end RAG starter you can clone, run, and ship in ten minutes. PDF chunking, embeddings, cosine retrieval, streaming answers with citations. Receipt OCR with Zod-validated JSON output.
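The cosine retrieval step can be illustrated with a small, dependency-free sketch. It assumes chunks arrive as `(text, embedding)` pairs that were already embedded elsewhere; `cosine` and `top_k` are hypothetical helper names, not the starter's API.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[str]:
    """Return the texts of the k chunks whose embeddings are closest to the query."""
    ranked = sorted(chunks, key=lambda chunk: cosine(query, chunk[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The retrieved texts would then be placed into the prompt, with their source positions kept alongside so the streamed answer can cite them.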

Inference internals

forge-infer: a minimal LLM inference server in Python with paged KV-cache, continuous batching, and speculative decoding. Built to understand the layer that the SDKs hide.

Storage and consensus

lsmdb: a log-structured merge-tree engine in Go with WAL, SSTables, bloom filters, MVCC snapshots. raftkv: a Raft key-value store with a fault-injection harness proving linearizability under partitions.

Sandboxing and isolation

sandboxd: a WebAssembly sandbox in Rust with a deny-by-default host ABI and strict CPU, wall-clock, and memory limits. Built for the moment somebody hands an agent untrusted code.

Observability that pays its rent

Structured logs, RED metrics, exemplar-linked traces, dashboards that diagnose rather than decorate. If a graph never gets opened during an incident, it does not deserve to exist.

Multi-tenant SaaS plumbing

shipyard: tenant isolation by row and by schema, RBAC, audit log, billing hooks, rate limits. The boring foundation under every B2B product, ready to clone.

Developer-grade tooling

slipstream: a token-efficient coding agent with persistent memory and a live local dashboard. The tool I use on myself before I ship it to anybody else.