Sarma Linux sarmakska

sarmalink · the studio's first hosted product, live now

Live at ai.sarmalinux.com. A bring-your-own-keys AI chat workspace. Seventeen providers (Groq, Cerebras, SambaNova, Gemini, OpenRouter, Mistral, DeepSeek and more), with free tiers first and automatic failover. Zero-knowledge encryption done client-side: AES-256-GCM, PBKDF2 at 600,000 iterations, so keys and chats are encrypted before they leave your device. Chats live in your browser, your GitHub Gists or your Cloudflare R2, never in an operator database. Coder mode builds and runs complete apps from a description in a live sandboxed preview, then saves them, as single files or full multi-file projects, straight to a folder on your own machine: ai.sarmalinux.com/build. Installable PWA. Free during beta. Built solo in London.
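
A client-side scheme like the one described can be sketched with the browser's Web Crypto API. This is a minimal illustration, not sarmalink's actual code: the names `deriveKey`, `seal` and `unseal`, the 16-byte salt, the 12-byte IV, SHA-256 as the PBKDF2 hash and the blob layout are all assumptions. Only AES-256-GCM and the 600,000 iterations come from the description.

```typescript
// Sketch only. Names, salt/IV sizes, the SHA-256 hash and the blob layout are assumptions.
const PBKDF2_ITERATIONS = 600_000; // iteration count quoted in the description

interface SealedBlob {
  salt: BufferSource;
  iv: BufferSource;
  ciphertext: BufferSource;
}

// Stretch a passphrase into a non-extractable AES-256-GCM key.
async function deriveKey(passphrase: string, salt: BufferSource): Promise<CryptoKey> {
  const material = await crypto.subtle.importKey(
    "raw",
    new TextEncoder().encode(passphrase),
    "PBKDF2",
    false,
    ["deriveKey"],
  );
  return crypto.subtle.deriveKey(
    { name: "PBKDF2", salt, iterations: PBKDF2_ITERATIONS, hash: "SHA-256" },
    material,
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}

// Encrypt on the device; only salt, IV and ciphertext would ever be stored remotely.
async function seal(passphrase: string, plaintext: string) {
  const salt = crypto.getRandomValues(new Uint8Array(16));
  const iv = crypto.getRandomValues(new Uint8Array(12)); // fresh 96-bit nonce per message
  const key = await deriveKey(passphrase, salt);
  const encrypted = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(plaintext),
  );
  return { salt, iv, ciphertext: new Uint8Array(encrypted) };
}

// Decrypt; AES-GCM rejects a wrong passphrase or tampered ciphertext by throwing.
async function unseal(passphrase: string, blob: SealedBlob): Promise<string> {
  const key = await deriveKey(passphrase, blob.salt);
  const decrypted = await crypto.subtle.decrypt(
    { name: "AES-GCM", iv: blob.iv },
    key,
    blob.ciphertext,
  );
  return new TextDecoder().decode(decrypted);
}
```

Because the passphrase never leaves the device and the derived key is non-extractable, whoever stores the blob (a Gist, an R2 bucket) holds only ciphertext in this sketch.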

This one is a commercial hosted product, not an open-source repo, and it is separate from Sarmalink-AI, the MIT-licensed gateway below.

Sarmalink-AI · one endpoint, thirty-six engines, zero surprise bills

Drop-in OpenAI-compatible gateway. Every request can fail over across 36 engines from 7 providers. When the primary returns 429 or 5xx, the next engine fires in under 50 milliseconds. Round-robin key rotation, six specialised modes (Smart, Reasoner, Live, Fast, Coder, Vision), an MCP-shape tool catalog, persistent user memory, FLUX image generation with key rotation, plus TTS / STT cascades. Built so an internal AI product never sees an outage the way a single-provider wrapper does.
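
The failover and key-rotation behaviour described above can be sketched as a small loop. This is a hedged illustration rather than the gateway's real code: the `Engine`, `Reply` and `Send` types, the `chatWithFailover` name and the per-engine cursor map are hypothetical, and each engine is assumed to have at least one key. Only the trigger (429 or 5xx) and round-robin rotation come from the description.

```typescript
// Hypothetical shapes: the engine list, key pool and send() signature are assumptions.
interface Engine {
  id: string;
  keys: string[]; // assumed non-empty
}
interface Reply {
  status: number;
  body: string;
}
type Send = (engine: Engine, apiKey: string) => Promise<Reply>;

// Failover triggers named in the text: 429 or any 5xx.
function shouldFailover(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// One round-robin cursor per engine, so successive requests use successive keys.
const cursors = new Map<string, number>();
function nextKey(engine: Engine): string {
  const i = cursors.get(engine.id) ?? 0;
  cursors.set(engine.id, (i + 1) % engine.keys.length);
  return engine.keys[i];
}

// Walk the chain in order and return the first reply that is not a failover status.
async function chatWithFailover(
  chain: Engine[],
  send: Send,
): Promise<{ engine: string; reply: Reply }> {
  let last: Reply | undefined;
  for (const engine of chain) {
    const reply = await send(engine, nextKey(engine));
    if (!shouldFailover(reply.status)) return { engine: engine.id, reply };
    last = reply; // remember the failure and try the next engine
  }
  throw new Error(`all ${chain.length} engines failed, last status ${last?.status ?? "n/a"}`);
}
```

Passing `send` in as a parameter keeps the loop independent of any one provider's HTTP client, which also makes it easy to exercise with a fake.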

How a request flows

%%{init: {'theme':'dark','themeVariables':{'primaryColor':'#0d2e4f','primaryTextColor':'#e6f5ff','lineColor':'#22d3ee','primaryBorderColor':'#22d3ee','actorBkg':'#1e3a5f','actorBorder':'#22d3ee','actorTextColor':'#ffffff'}}}%%
sequenceDiagram
    autonumber
    participant Client
    participant Router as Intent Router
    participant PA as Primary Engine
    participant PB as Failover Engine
    participant Mem as Memory + Tools
    Client->>Router: POST /api/v1/chat
    Router->>Router: classify intent (Smart / Live / Coder / ...)
    Router->>PA: dispatch primary
    PA-->>Router: 429 Too Many Requests
    Note over Router,PB: handoff in under 50ms
    Router->>PB: retry on next engine
    PB->>Mem: recall facts + tools
    Mem-->>PB: context window
    PB-->>Router: 200 streaming
    Router-->>Client: SSE first token ~120ms

Seven providers, thirty-six engines, six modes

5 engines · GPT-OSS 120B + 20B
4 engines · DeepSeek V3.2
3 engines · Qwen 3 235B
4 engines · 2.5 Flash + 3
17 engines · Nemotron + GLM
images · klein 9B + 4B
live · weather + FX

$  cp | ctx 12%* ok | mem 4 | obs 37 | opt 71% | skill scoped-read    (~12 steps)

slipstream · Claude Code plugin + cross-IDE MCP toolkit

First major release. Fourteen `sp_*` tools replace whole-file reads with scoped symbol pulls, and `pnpm benchmark` reproduces the ~95% per-read savings. A React + Vite + d3 dashboard with nine routed views including an interactive code dependency graph. A cross-tab agent bus that lets multiple Claude Code tabs on one project coordinate at turn boundaries. A cold-start knowledge feed on every SessionStart so no session begins blank. Also in the release: dollar cost of tokens saved, downloadable session reports, a memory doctor, the insights band, the project knowledge brief, and a 75-skill methodology library.

Six editor install paths · 321 tests · MIT
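
To make the scoped-read idea concrete, here is a deliberately naive sketch of pulling one symbol instead of a whole file and measuring what was avoided. It is not how the `sp_*` tools work internally: `scopedRead` and `savings` are hypothetical names, the brace matching ignores strings, comments and braces inside parameter lists, and the saving is counted in characters rather than tokens.

```typescript
// Naive illustration: a real tool would use a parser. This brace-matches a
// `function <name>(` declaration and does not understand strings or comments.
function scopedRead(source: string, symbol: string): string | null {
  const start = source.indexOf(`function ${symbol}(`);
  if (start === -1) return null;
  const bodyStart = source.indexOf("{", start);
  if (bodyStart === -1) return null;
  let depth = 0;
  for (let i = bodyStart; i < source.length; i++) {
    if (source[i] === "{") {
      depth++;
    } else if (source[i] === "}") {
      depth--;
      if (depth === 0) return source.slice(start, i + 1);
    }
  }
  return null; // unbalanced braces
}

// Fraction of characters avoided by returning the symbol instead of the file.
function savings(source: string, scoped: string): number {
  return 1 - scoped.length / source.length;
}
```

The larger the file relative to the symbol, the closer the saving gets to 1, which is the effect a whole-file versus scoped-read benchmark measures.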

echo · the open Jarvis you actually own

Bring-your-own-subscription. Echo never asks for an API key. It dispatches each prompt to whichever subscription-backed CLI you already pay for, claude, codex or gemini, picked by a router that scores capability, quota remaining and freshness. Voice in. Voice out. Vision when it helps. Memory across years. Translucent multi-monitor HUD planned. Cross-platform from one Rust core. MIT. Local-first.
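
A router that scores capability, quota remaining and freshness could look roughly like the sketch below. It is written in TypeScript for illustration only (echo's core is Rust), and everything beyond the three signal names is an assumption: the `BrainStatus` shape, the weights, the ten-minute half-life, and reading "freshness" as time since the CLI last answered successfully.

```typescript
// Hypothetical shapes and weights: the text names the three signals, not how they combine.
interface BrainStatus {
  cli: "claude" | "codex" | "gemini";
  capability: number; // 0..1, assumed fit for the current prompt
  quotaRemaining: number; // 0..1, assumed share of subscription quota left
  lastOkMsAgo: number; // assumed freshness signal: ms since last successful reply
}

const WEIGHTS = { capability: 0.5, quota: 0.3, freshness: 0.2 };
const FRESHNESS_HALF_LIFE_MS = 10 * 60 * 1000;

// Weighted sum of the three signals; freshness decays exponentially with age.
function scoreBrain(b: BrainStatus): number {
  const freshness = Math.pow(0.5, b.lastOkMsAgo / FRESHNESS_HALF_LIFE_MS);
  return (
    WEIGHTS.capability * b.capability +
    WEIGHTS.quota * b.quotaRemaining +
    WEIGHTS.freshness * freshness
  );
}

// Pick the highest-scoring CLI that still has quota; undefined if none does.
function pickBrain(candidates: BrainStatus[]): BrainStatus | undefined {
  let best: BrainStatus | undefined;
  for (const c of candidates) {
    if (c.quotaRemaining <= 0) continue; // exhausted subscriptions are never chosen
    if (best === undefined || scoreBrain(c) > scoreBrain(best)) best = c;
  }
  return best;
}
```

Treating exhausted quota as a hard filter rather than a low score means a highly capable but rate-limited CLI can never win on capability alone.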

Where it is now: Foundation + the orchestration layer are in and tested, 64 tests green. The brain router across claude/codex/gemini is wired and proven against a fake CLI; the file-based memory store with PreSession digests is live; an MCP skills bus runs weather / web search / files; the voice traits are defined and the macOS TTS adapter is real.

What is still landing: real Porcupine wake word, real cpal mic capture, real whisper.cpp speech-to-text, real Piper TTS as the cross-platform default, the wired end-to-end voice loop, the setup wizard, sqlite-vss vector memory.

Then: HUD polish + multi-monitor, calendar + mail over one-click OAuth, the senses, a proactive engine, autonomous workflows, signed installers.

About me

I am Sarma. I build open-source software from a desk in the UK.

LLM infrastructure, coding agents, inference servers, storage engines, consensus protocols, WebAssembly sandboxes, platform tools. Every project lives on GitHub with a whitepaper, an architecture diagram and a quick-start guide on sarmalinux.com/products.

What pulls me back to the desk every weekend is the same thing that pulled me into the industry: the quiet thrill of building something from scratch. A blank repository, a problem worth solving, a system that did not exist yesterday and ships today.

When I am not at the desk, I write long-form essays about what I am learning, contribute to the open-source projects I rely on, and run a small weekend charity where I build free websites for local businesses in Hemel Hempstead.

Recent ships

Date	What
2 Jul 2026	sarmalink launched: the studio's first hosted product. BYOK AI chat workspace, seventeen providers with free tiers first and automatic failover, client-side zero-knowledge encryption, chats in your browser / your Gists / your R2, Coder mode with live sandboxed preview, installable PWA, free during beta. Launch post.
8 Jun 2026	echo Phase 0 + brain-router scaffolding in: `Brain` trait + Claude/Codex/Gemini subprocess wrappers, capability-and-quota router, file-based memory with PreSession digests, MCP skills bus with weather/web-search/files, voice traits + macOS TTS. 64 tests green. Real wake word, mic, whisper.cpp and Piper are next. v1.0 now aimed at 1 September 2026.
6 Jun 2026	slipstream v1.0.0: first major release. React dashboard with nine views, interactive code graph, cross-tab agent bus, cold-start knowledge feed, reproducible `pnpm benchmark` hitting ~95% per-read, dollar cost of tokens saved, memory doctor, 75-skill library, 321 tests.
6 Jun 2026	slipstream v0.27.0: production React dashboard (Vite + TypeScript + d3) with grouped sidebar (Now / History / Knowledge), typed JSON client and interactive knowledge graph.
6 Jun 2026	slipstream v0.24.0: reproducible token-savings benchmark. `pnpm benchmark` measures whole-file vs scoped reads on real files and prints a Markdown table.
6 Jun 2026	slipstream v0.8.0: dashboard insights band. Every data tab opens with a natural-language paragraph plus bullets, deterministic templates, zero LLM.
4 Jun 2026	slipstream v0.7.0: tabbed dashboard (Live, Project, Journal, Sessions, Memory) with 365-day heatmap, file leaderboard, kinds donut, distilled lessons.
4 Jun 2026	slipstream v0.6.0: cross-IDE parity (`sp_digest` + `sp_resume` + auto-mode-detect + `slipstream-setup`), nine backend features, redesigned glass-on-dark dashboard.
3 Jun 2026	NVIDIA Computex 2026 recap: Vera Rubin NVL72 in production, RTX Spark, Cosmos 3, Nemotron 3 Ultra.
1 Jun 2026	AI Engineer World's Fair 2026 recap: MCP took the year. Six themes that defined where AI engineering is going.
31 May 2026	echo repo opened, public launch scheduled 1 September 2026.
3 May 2026	Sarmalink-AI v2: intent auto-routing, MCP-shape tool catalog, TTS/STT cascades, image generation rotation.

The portfolio · nineteen MIT-licensed projects

Flagships

Sarmalink-AI · Multi-provider OpenAI-compatible AI gateway with 36-engine failover across 7 providers, intent-based plugin auto-routing, MCP-shape tool catalog and Manus webhook persistence.
slipstream · v1.0 shipped. Claude Code plugin and cross-IDE MCP toolkit. Fourteen sp_* tools, self-building memory, lossless compaction, React dashboard with nine views and an interactive code dependency graph, cross-tab agent bus, cold-start knowledge feed, 75-skill methodology library. 321 tests, MIT.

Coming next

echo · An open Jarvis. Brain-agnostic across Claude Code, Codex CLI, Gemini CLI, Ollama and LM Studio. Translucent multi-monitor HUD planned. Phase 0 + Phase 1 orchestration scaffolding in, 64 tests; real audio I/O and the setup wizard ship next. Public v1.0 on 1 September 2026.

AI infrastructure

agent-orchestrator · Durable multi-agent workflows in TypeScript, deterministic replay, journaled Postgres state, BullMQ step queue, Inspector UI.
voice-agent-starter · Sub-second full-duplex WebRTC voice loop, mediasoup SFU, Fastify model worker, pluggable STT, LLM, TTS adapters.
ai-eval-runner · Evals as code. Python 3.12, Typer CLI, DuckDB store, FastAPI + HTMX viewer.
forge-infer · Minimal LLM inference server in Rust with paged KV-cache, continuous batching and speculative decoding.
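
The deterministic-replay idea behind agent-orchestrator can be shown in a few lines. This is a generic sketch of the journaling pattern, not the project's code: the `Journal` class, its `step` method and the in-memory `Map` (standing in for the journaled Postgres state) are all assumptions.

```typescript
// In-memory stand-in for the journal. The project description names Postgres;
// this Map and every name here are assumptions for illustration.
class Journal {
  private entries = new Map<string, unknown>();
  public executed = 0; // counts real executions, to make replay visible

  // Run fn once per step key; on replay, return the recorded result instead.
  async step<T>(key: string, fn: () => Promise<T>): Promise<T> {
    if (this.entries.has(key)) return this.entries.get(key) as T;
    const result = await fn();
    this.executed++;
    this.entries.set(key, result);
    return result;
  }
}

// A workflow with a non-deterministic step. Replaying it against the same
// journal yields the same answer without re-running either step.
async function workflow(journal: Journal): Promise<number> {
  const sample = await journal.step("sample", async () => Math.random());
  const doubled = await journal.step("double", async () => sample * 2);
  return doubled;
}
```

Because every side effect goes through `step`, a crashed run can be restarted from the top and will skip straight past whatever was already recorded.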

MCP and AI applications

mcp-server-toolkit · Production Model Context Protocol server starter in Python and FastAPI.
local-llm-router · OpenAI-compatible proxy routing between Ollama and cloud LLMs by policy.
rag-over-pdf · A minimal, production-shaped RAG starter with cited streaming answers.
receipt-scanner · Vision OCR receipts to Zod-validated JSON.

Systems software

lsmdb · Log-structured merge-tree storage engine in Go. WAL, SSTables, bloom filters, MVCC snapshots.
raftkv · Raft KV store in Go with a fault-injection harness proving linearizability under partitions.
sandboxd · WebAssembly sandbox in Rust with a deny-by-default host ABI and strict CPU, wall-clock and memory bounds.

Platform engineering

terraform-stack · Vercel, Supabase, Cloudflare and DigitalOcean modules in one Terraform repo.
k8s-ops-toolkit · Helm chart for shipping Next.js to Kubernetes with full observability pre-wired.
shipyard · Multi-tenant SaaS scaffold in TypeScript. Tenant isolation, RBAC, billing, audit log, rate limits.

Tools

webhook-to-email · Webhook receiver that forwards events to email via Resend.
staff-portal · Open-source HR and ops portal. Leave, attendance, expenses, kiosk mode.

Every repo has a bespoke product trio on sarmalinux.com/products: whitepaper, architecture diagram, quick-start. All MIT.

Stack

The full eight-tier stack with every choice and why it earned a place lives at sarmalinux.com/technology. Boring tech, surgical complexity. No AWS, no Azure.

Writing

A handful of good entry points into the eighty-nine long-form engineering essays:

sarmalink is live, a BYOK AI workspace where nobody else can read your chats, the launch post, why BYOK, the free-tier economics, the zero-knowledge architecture
NVIDIA Computex 2026, what AI engineers need to know, Vera Rubin NVL72, RTX Spark, Cosmos 3, Nemotron 3 Ultra
AI Engineer World's Fair 2026, what mattered, six themes that defined the year
Sarmalink-AI failover deep dive, how multi-engine fallback actually works in production
Building Agent Orchestrator, the journaled-Postgres pattern behind deterministic replay
Why I open-sourced 12 repos, the reasoning, the trade-offs
Terraform Stack vs Pulumi vs SST, an honest comparison
F1 2026 mid-season after the cancellation, because not everything is code

Hiring

I am open to permanent, full-time PAYE software engineering roles across the United Kingdom. Remote, hybrid or on-site. Senior or mid-level individual contributor in AI infrastructure, AI engineering, platform engineering, backend or full-stack development. Not taking contract, consulting or agency subcontract work.

The full pitch with a capability matrix, recent ships and selected open-source work lives at sarmalinux.com/hire-me.

Built by sarmalinux · UK · All projects MIT licensed · Updated daily
