sarmalink is live at ai.sarmalinux.com: a bring-your-own-keys (BYOK) AI chat workspace. Seventeen providers, including Groq, Cerebras, SambaNova, Gemini, OpenRouter, Mistral and DeepSeek, with free tiers first and automatic failover. Zero-knowledge encryption is done client-side: AES-256-GCM, PBKDF2 at 600,000 iterations, keys and chats encrypted before they leave your device. Chats live in your browser, your GitHub Gists or your Cloudflare R2, never in an operator database. Coder mode builds and runs complete apps from a description in a live sandboxed preview, then saves them (single files or full multi-file projects) straight to a folder on your own machine: ai.sarmalinux.com/build. Installable PWA. Free during beta. Built solo in London.
This one is a commercial hosted product, not an open-source repo, and it is separate from Sarmalink-AI, the MIT-licensed gateway below.
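The key-derivation half of the client-side scheme described above can be sketched in a few lines. This is an illustration only, not sarmalink's code: the product runs in the browser, while this sketch uses Python's standard library. The hash (SHA-256), the 16-byte salt and the names `new_salt` and `derive_key` are assumptions; only the 600,000 iterations and the 256-bit key size come from the description. The AES-256-GCM step itself is left out because it needs a third-party library.

```python
import hashlib
import os

ITERATIONS = 600_000  # iteration count stated in the description
KEY_BYTES = 32        # 256 bits, the key size AES-256-GCM expects
SALT_BYTES = 16       # assumed salt length; the source does not state one


def new_salt() -> bytes:
    """Return a fresh random salt to store next to the ciphertext."""
    return os.urandom(SALT_BYTES)


def derive_key(passphrase: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Stretch a passphrase into a 256-bit key with PBKDF2-HMAC-SHA256.

    SHA-256 is an assumption here; the description names only PBKDF2.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=KEY_BYTES
    )


if __name__ == "__main__":
    salt = new_salt()
    key = derive_key("correct horse battery staple", salt)
    print(len(key))  # 32
```

Because the key is derived on the device from something only the user knows, a storage backend that holds the salt and ciphertext still cannot read the chats.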
Sarmalink-AI is a drop-in OpenAI-compatible gateway. Every request can fall through 36 engines from 7 providers: when the primary returns 429 or 5xx, the next engine fires in under 50 milliseconds. Round-robin key rotation, six specialised modes (Smart, Reasoner, Live, Fast, Coder, Vision), an MCP-shape tool catalog, persistent user memory, FLUX image generation with key rotation, plus TTS / STT cascades. Built so an internal AI product never sees an outage the way a single-provider wrapper does.
```mermaid
%%{init: {'theme':'dark','themeVariables':{'primaryColor':'#0d2e4f','primaryTextColor':'#e6f5ff','lineColor':'#22d3ee','primaryBorderColor':'#22d3ee','actorBkg':'#1e3a5f','actorBorder':'#22d3ee','actorTextColor':'#ffffff'}}}%%
sequenceDiagram
    autonumber
    participant Client
    participant Router as Intent Router
    participant PA as Primary Engine
    participant PB as Failover Engine
    participant Mem as Memory + Tools
    Client->>Router: POST /api/v1/chat
    Router->>Router: classify intent (Smart / Live / Coder / ...)
    Router->>PA: dispatch primary
    PA-->>Router: 429 Too Many Requests
    Note over Router,PB: handoff in under 50ms
    Router->>PB: retry on next engine
    PB->>Mem: recall facts + tools
    Mem-->>PB: context window
    PB-->>Router: 200 streaming
    Router-->>Client: SSE first token ~120ms
```
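The failover path in the sequence above, combined with round-robin key rotation, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the gateway's actual code: `Engine`, `is_retryable`, `chat_with_failover` and the `send` callback are hypothetical names, and the network call is injected so the sketch runs without any provider.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Callable, Iterator

# Hypothetical transport: (engine name, api key, prompt) -> (HTTP status, body)
Send = Callable[[str, str, str], tuple[int, str]]


@dataclass
class Engine:
    name: str
    keys: list[str]
    _rotation: Iterator[str] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        if not self.keys:
            raise ValueError(f"engine {self.name!r} needs at least one key")
        self._rotation = cycle(self.keys)

    def next_key(self) -> str:
        """Round-robin: each call hands out the next key, wrapping around."""
        return next(self._rotation)


def is_retryable(status: int) -> bool:
    """Fail over on 429 (rate limited) or any 5xx, as described above."""
    return status == 429 or 500 <= status <= 599


def chat_with_failover(engines: list[Engine], send: Send, prompt: str) -> tuple[str, int, str]:
    """Try engines in order; return the first non-retryable response."""
    tried: list[tuple[str, int]] = []
    for engine in engines:
        status, body = send(engine.name, engine.next_key(), prompt)
        if not is_retryable(status):
            return engine.name, status, body
        tried.append((engine.name, status))
    raise RuntimeError(f"all engines failed: {tried}")


if __name__ == "__main__":
    def fake_send(engine: str, key: str, prompt: str) -> tuple[int, str]:
        # Stand-in for a real provider call: the primary is rate limited.
        return (429, "") if engine == "primary" else (200, f"echo: {prompt}")

    engines = [Engine("primary", ["key-a", "key-b"]), Engine("backup", ["key-c"])]
    print(chat_with_failover(engines, fake_send, "hello"))
```

Note that a non-retryable error such as 401 is returned to the caller rather than triggering failover, since only 429 and 5xx are named as handoff conditions.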
- 5 engines: GPT-OSS 120B + 20B
- 4 engines: DeepSeek V3.2
- 3 engines: Qwen 3 235B
- 4 engines: 2.5 Flash + 3
- 17 engines: Nemotron + GLM
- images: klein 9B + 4B
- live: weather + FX
slipstream v1.0.0 is the first major release. Fourteen sp_* tools replace whole-file reads with scoped symbol pulls, with a reproducible ~95% per-read token saving measured by pnpm benchmark. A React + Vite + d3 dashboard has nine routed views, including an interactive code dependency graph. A cross-tab agent bus lets multiple Claude Code tabs on one project coordinate at turn boundaries. A cold-start knowledge feed runs on every SessionStart so no session begins blank. Also included: the dollar cost of tokens saved, downloadable session reports, a memory doctor, the insights band, the project knowledge brief, and a 75-skill methodology library.
Six editor install paths · 321 tests · MIT
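To make the idea of a scoped symbol pull concrete, here is a small sketch that returns one function's source instead of the whole file and reports the fraction saved. It is not how the sp_* tools are implemented: it uses Python's `ast` module on Python source, counts characters as a crude stand-in for tokens, and `read_symbol` and `savings` are hypothetical names.

```python
import ast


def read_symbol(source: str, name: str) -> str:
    """Return only the source text of the named function or class."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_definition = isinstance(
            node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
        )
        if is_definition and node.name == name:
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                return segment
    raise KeyError(name)


def savings(whole: str, scoped: str) -> float:
    """Fraction of characters avoided by reading the scoped text only."""
    return 1 - len(scoped) / len(whole)


if __name__ == "__main__":
    # A stand-in "file": fifty helpers plus the one function we care about.
    helpers = "\n\n".join(f"def helper_{i}():\n    return {i}" for i in range(50))
    source = helpers + "\n\ndef target(x):\n    return x * 2\n"
    scoped = read_symbol(source, "target")
    print(scoped)
    print(f"saved {savings(source, scoped):.0%} of the read")
```

The saving grows with the size of the file relative to the symbol, which is why a per-read figure depends on the files being measured.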
Bring-your-own-subscription. Echo never asks for an API key. It dispatches each prompt to whichever subscription-backed CLI you already pay for (claude, codex or gemini), picked by a router that scores capability, remaining quota and freshness. Voice in. Voice out. Vision when it helps. Memory across years. Translucent multi-monitor HUD planned. Cross-platform from one Rust core. MIT. Local-first.
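A router that scores capability, quota and freshness could look something like the sketch below. Everything specific here is an assumption for illustration: Echo's core is Rust, not Python, and the weights, the half-life decay, the reading of "freshness" as time since the CLI last answered successfully, and the names `Backend`, `score` and `pick` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    capability: float             # 0..1, fit for this prompt (assumed scale)
    quota_remaining: float        # 0..1, share of the subscription quota left
    seconds_since_success: float  # assumed meaning of "freshness"


def score(b: Backend, half_life: float = 3600.0) -> float:
    """Weighted sum of the three signals; the weights are illustrative only."""
    freshness = 0.5 ** (b.seconds_since_success / half_life)
    return 0.5 * b.capability + 0.3 * b.quota_remaining + 0.2 * freshness


def pick(backends: list[Backend]) -> Backend:
    """Choose the best-scoring backend that still has quota left."""
    candidates = [b for b in backends if b.quota_remaining > 0]
    if not candidates:
        raise RuntimeError("no backend has quota remaining")
    return max(candidates, key=score)


if __name__ == "__main__":
    backends = [
        Backend("claude", capability=0.9, quota_remaining=0.1, seconds_since_success=0),
        Backend("codex", capability=0.7, quota_remaining=0.9, seconds_since_success=0),
        Backend("gemini", capability=0.8, quota_remaining=0.0, seconds_since_success=0),
    ]
    print(pick(backends).name)  # codex
```

In this example the most capable backend loses because its quota is nearly spent, which is the kind of trade-off a scoring router exists to make.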
Where it is now: Foundation + the orchestration layer are in and tested, 64 tests green. The brain router across claude/codex/gemini is wired and proven against a fake CLI; the file-based memory store with PreSession digests is live; an MCP skills bus runs weather / web search / files; the voice traits are defined and the macOS TTS adapter is real.
What is still landing: real Porcupine wake word, real cpal mic capture, real whisper.cpp speech-to-text, real Piper TTS as the cross-platform default, the wired end-to-end voice loop, the setup wizard, sqlite-vss vector memory.
Then: HUD polish + multi-monitor, calendar + mail over one-click OAuth, the senses, a proactive engine, autonomous workflows, signed installers.
I am Sarma. I build open-source software from a desk in the UK.
LLM infrastructure, coding agents, inference servers, storage engines, consensus protocols, WebAssembly sandboxes, platform tools. Every project lives on GitHub with a whitepaper, an architecture diagram and a quick-start guide on sarmalinux.com/products.
What pulls me back to the desk every weekend is the same thing that pulled me into the industry: the quiet thrill of building something from scratch. A blank repository, a problem worth solving, a system that did not exist yesterday and ships today.
When I am not at the desk, I write long-form essays about what I am learning, contribute to the open-source projects I rely on, and run a small weekend charity where I build free websites for local businesses in Hemel Hempstead.
| Date | What |
|---|---|
| 2 Jul 2026 | sarmalink launched: the studio's first hosted product. BYOK AI chat workspace, seventeen providers with free tiers first and automatic failover, client-side zero-knowledge encryption, chats in your browser / your Gists / your R2, Coder mode with live sandboxed preview, installable PWA, free during beta. Launch post. |
| 8 Jun 2026 | echo Phase 0 + brain-router scaffolding in: Brain trait + Claude/Codex/Gemini subprocess wrappers, capability-and-quota router, file-based memory with PreSession digests, MCP skills bus with weather/web-search/files, voice traits + macOS TTS. 64 tests green. Real wake word, mic, whisper.cpp and Piper are next. v1.0 now aimed at 1 September 2026. |
| 6 Jun 2026 | slipstream v1.0.0: first major release. React dashboard with nine views, interactive code graph, cross-tab agent bus, cold-start knowledge feed, reproducible pnpm benchmark showing ~95% per-read savings, dollar cost of tokens saved, memory doctor, 75-skill library, 321 tests. |
| 6 Jun 2026 | slipstream v0.27.0: production React dashboard (Vite + TypeScript + d3) with grouped sidebar (Now / History / Knowledge), typed JSON client and interactive knowledge graph. |
| 6 Jun 2026 | slipstream v0.24.0: reproducible token-savings benchmark. pnpm benchmark measures whole-file vs scoped reads on real files and prints a Markdown table. |
| 6 Jun 2026 | slipstream v0.8.0: dashboard insights band. Every data tab opens with a natural-language paragraph plus bullets, deterministic templates, zero LLM. |
| 4 Jun 2026 | slipstream v0.7.0: tabbed dashboard (Live, Project, Journal, Sessions, Memory) with 365-day heatmap, file leaderboard, kinds donut, distilled lessons. |
| 4 Jun 2026 | slipstream v0.6.0: cross-IDE parity (sp_digest + sp_resume + auto-mode-detect + slipstream-setup), nine backend features, redesigned glass-on-dark dashboard. |
| 3 Jun 2026 | NVIDIA Computex 2026 recap: Vera Rubin NVL72 in production, RTX Spark, Cosmos 3, Nemotron 3 Ultra. |
| 1 Jun 2026 | AI Engineer World's Fair 2026 recap: MCP took the year. Six themes that defined where AI engineering is going. |
| 31 May 2026 | echo repo opened, public launch scheduled 1 September 2026. |
| 3 May 2026 | Sarmalink-AI v2: intent auto-routing, MCP-shape tool catalog, TTS/STT cascades, image generation rotation. |
Every repo has a bespoke product trio on sarmalinux.com/products: whitepaper, architecture diagram, quick-start. All MIT.
The full eight-tier stack with every choice and why it earned a place lives at sarmalinux.com/technology. Boring tech, surgical complexity. No AWS, no Azure.
A handful of good entry points into the eighty-nine long-form engineering essays:
- sarmalink is live, a BYOK AI workspace where nobody else can read your chats: the launch post, why BYOK, the free-tier economics, the zero-knowledge architecture
- NVIDIA Computex 2026, what AI engineers need to know: Vera Rubin NVL72, RTX Spark, Cosmos 3, Nemotron 3 Ultra
- AI Engineer World's Fair 2026, what mattered: six themes that defined the year
- SarmaLink-AI failover deep dive: how multi-engine fallback actually works in production
- Building Agent Orchestrator: the journaled-Postgres pattern behind deterministic replay
- Why I open-sourced 12 repos: the reasoning, the trade-offs
- Terraform Stack vs Pulumi vs SST: an honest comparison
- F1 2026 mid-season after the cancellation: because not everything is code
I am open to permanent, full-time PAYE software engineering roles across the United Kingdom. Remote, hybrid or on-site. Senior or mid-level individual contributor in AI infrastructure, AI engineering, platform engineering, backend or full-stack development. Not taking contract, consulting or agency subcontract work.
The full pitch with a capability matrix, recent ships and selected open-source work lives at sarmalinux.com/hire-me.
Built by sarmalinux · UK · All projects MIT licensed · Updated daily


