
echokit_pty: Giving Claude Code a Remote Control

· 6 min read

Claude Code is amazing. It writes code, fixes bugs, runs tests, explains errors. But it lives in your terminal. To use it, you type commands, get responses. It's a CLI tool, designed for terminal workflows.

But what if you want to build something on top of Claude Code? What if you want a web app? A mobile interface? A physical device? Voice control?

That's why we built echokit_pty.

What Is echokit_pty?

echokit_pty is the web version of Claude Code with a superpower: a WebSocket interface.

It handles Claude Code's input and output, turning its CLI into a WebSocket service. Suddenly, Claude Code isn't just a terminal app—it's a service that any application can talk to.

What makes it special?

  • Full Claude Code capabilities (file editing, command execution, tool use)
  • Clean JSON API for integrations. Check out the full API documentation here.
  • Open and extensible - unlike Claude Code's official Remote Control, you have full control over the protocol and can customize every aspect
  • No subscription required - free and open source vs. Max-only Remote Control
  • Bidirectional streaming (real-time responses)
  • Runs locally
  • Built for developers building custom solutions

The Problem: Claude Code Is Trapped in the Terminal

Claude Code was designed as a CLI tool. You run it in your terminal, type commands, get responses. This works great for terminal workflows.

But what if you want to:

  • Build a web app that uses Claude Code?
  • Create a physical device that talks to Claude Code?
  • Integrate Claude Code into another tool?
  • Build a custom application that needs programmatic access?

Yes, Claude Code now has an official Remote Control feature for mobile and web access. But it's designed as a user-facing feature, not a developer platform. It requires a Max subscription, uses a closed protocol, and can't be integrated into custom applications.

That's where echokit_pty comes in.

The Solution: PTY + WebSocket

The bridge is echokit_pty.

What does "pty" mean?

PTY stands for "pseudo-terminal"—a Unix concept that allows a program to control a terminal as if a user were typing.

echokit_pty uses this technology to create a bridge between:

  • WebSocket clients → send JSON commands
  • Claude Code CLI → executes the commands
  • Response streaming → sends results back

How It Works

echokit_pty is built with Rust. Here's the architecture:

┌─────────────────┐      WebSocket      ┌──────────────┐
│   Any Client    │ ◄─────────────────► │ echokit_pty  │
│  (Web, Mobile,  │  (ws://localhost)   │    (Rust)    │
│  Device, etc.)  │                     └──────┬───────┘
└─────────────────┘                            │
                                               │ PTY
                                               ▼
                                       ┌──────────────┐
                                       │  Claude Code │
                                       │     CLI      │
                                       └──────────────┘

The flow:

  1. echokit_pty starts a WebSocket server (default: ws://localhost:3000/ws)
  2. Clients connect via WebSocket and send JSON commands
  3. echokit_pty forwards commands to Claude Code CLI through a pseudo-terminal
  4. Claude Code executes the command
  5. Results stream back through the WebSocket in real-time

Example:

// Client sends
{"type": "command", "content": "run the tests"}

// echokit_pty forwards to Claude Code

// Results stream back
{"type": "response", "content": "Running tests...\n142 passed, 3 failed"}
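
These JSON frames are easy to produce and consume from any language. As a minimal sketch — assuming the `command`/`response` message shapes shown above (the actual protocol may include additional fields; see the API documentation) — here are small Python helpers that build an outgoing command and parse an incoming response. Any WebSocket client library (e.g. `websockets`) can carry these frames:

```python
import json

def make_command(content: str) -> str:
    """Serialize a command frame for echokit_pty (assumed message shape)."""
    return json.dumps({"type": "command", "content": content})

def parse_frame(raw: str) -> tuple:
    """Parse an incoming frame into a (type, content) pair."""
    msg = json.loads(raw)
    return msg.get("type", ""), msg.get("content", "")

# Round-trip using the frames from the example above:
out = make_command("run the tests")
kind, content = parse_frame('{"type": "response", "content": "142 passed, 3 failed"}')
```

Keeping the framing logic separate from the transport like this makes it trivial to reuse the same helpers in a web app, a CLI client, or a device firmware bridge.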

Getting Started

Installation:

First, clone and build echokit_pty:

git clone https://github.com/second-state/echokit_pty.git
cd echokit_pty
cargo build --release --bin echokit_cc

Running echokit_pty:

Set your workspace directory and start the server:

ECHOKIT_WORKING_PATH="/path/to/your/workspace" \
./target/release/echokit_cc -c ./run_cc.sh -b "localhost:3000"

The WebSocket server will start on ws://localhost:3000/ws.
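
Before wiring up a full client, a quick sanity check that the server is actually listening can save debugging time. This is a generic TCP probe (not part of echokit_pty), sketched in Python:

```python
import socket

def port_open(host: str = "localhost", port: int = 3000, timeout: float = 1.0) -> bool:
    """Return True if a TCP listener accepts connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False

if __name__ == "__main__":
    print("echokit_pty reachable:", port_open())
```

If this prints `False`, check that `echokit_cc` is running and bound to the address you expect before debugging the WebSocket layer.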

echokit_pty vs. Claude Code Remote Control

You might be wondering: Should I use echokit_pty or Claude Code's official Remote Control?

Use Claude Code Remote Control if:

  • You want to control Claude Code from your phone or browser
  • You have a Max subscription and don't need customization
  • You just need remote access, not programmatic control

Use echokit_pty if:

  • You're building a custom application or service
  • You need full control over the protocol and behavior
  • You want to integrate Claude Code into hardware (voice devices, custom interfaces)
  • You need a free, open-source solution
  • You're building something beyond simple remote control

They're complementary, not competing. Remote Control is perfect for individual developers who want mobile access. echokit_pty is for builders who want to create entirely new experiences on top of Claude Code.

Use Cases: What Can You Build?

The beauty of echokit_pty is that it turns Claude Code into a platform. Here's what you can build:

Voice-Controlled Coding: Speak commands, Claude Code executes, hear results. Perfect for hands-free workflows. This is what EchoKit + echokit_pty enables.

Why echokit_pty for voice control? While Claude Code's Remote Control works great for mobile/web access, it doesn't support voice interfaces or custom hardware. echokit_pty's open API lets you build exactly the voice experience you need—whether that's a custom device, specialized voice commands, or integration with other speech services.

But it's just one implementation. For more details, check out EchoKit's full integration documentation.

Web Apps: Build a web interface for Claude Code. No terminal required. Just open a browser, connect to the WebSocket, and start coding. Great for presentations, teaching, or developers who prefer GUIs.

Mobile Apps: Control Claude Code from your phone. Run tests while walking. Check build status from the couch. Deploy from anywhere. Your coding environment fits in your pocket.

Pair Programming Platforms: Create a web app where multiple people can interact with Claude Code simultaneously. Real-time collaboration, shared context, better than screen sharing.

Custom Developer Tools: Build your own tools on top of Claude Code. Automations, dashboards, integrations—anything you can imagine. The WebSocket interface makes Claude Code a building block.

IDE Integrations: Embed Claude Code directly into your IDE. VS Code extension, JetBrains plugin, custom editor—give Claude Code a proper home in your development environment.

The Vision

Claude Code is the most capable AI coding assistant today. With Remote Control, it can now follow you beyond the terminal. But that's just the beginning.

echokit_pty is about turning Claude Code into a true platform for innovation.

Imagine what we can build:

  • Voice-controlled coding assistants with custom hardware
  • Specialized interfaces for specific workflows
  • Custom developer tools and dashboards
  • AI-powered IDE integrations
  • Educational platforms with tailored experiences

All built on top of echokit_pty's open API.

Claude Code as a platform for builders, not just a tool for users.


Ready to build something?

Start with the echokit_pty repository. See EchoKit's full integration documentation for a complete example. Join our Discord community to share your ideas.

My Coding Assistant Lives in a Box Now | EchoKit

· 7 min read

It was 2 AM. I was deep in a coding session, fingers flying across the keyboard, completely in the zone. Then I hit a bug. I needed to run the tests.

Which meant breaking my flow. Switching windows. Typing the command. Waiting. Switching back.

I thought: What if I could just say it?

Not into my phone. Not unlocking an app. Just speak—to a device sitting on my desk.

A Small Device, Big Idea

That moment sparked an experiment. What if my AI coding assistant wasn't trapped in a terminal window, but lived in a small device on my desk? What if I could speak to it like a pair programmer sitting next to me?

Not voice typing—I hate that. But voice commands. Like having a junior developer who actually does things, not just suggests them.

So I built it.

Today, I'm excited to share how EchoKit became a voice remote control for Claude Code. And why this changes everything about how I work.

It Started with a Problem

Claude Code is amazing. It writes code, fixes bugs, runs tests, explains errors.

Yes, Claude Code now has an official Remote Control feature for mobile and web access. But it's designed for phones and browsers—not for hands-free voice control or physical devices. You still need to look at a screen and tap buttons.

I wanted something different. Something that felt like... magic.

The Missing Piece

I had EchoKit—my open-source voice AI device sitting on my desk. It can hear me, think, and respond. But it couldn't control my code editor.

I needed a bridge.

That bridge is called echokit_pty.

What is echokit_pty? It's the web version of Claude Code, but with a superpower: a WebSocket interface.

See, Claude Code was designed as a CLI tool. You run it in your terminal, type commands, get responses. That's great for terminal workflows. But for voice control? For remote access? For building anything on top of Claude Code?

You need something more.

echokit_pty is that "more."

How echokit_pty Changed Everything

Here's what echokit_pty does: it takes Claude Code and exposes it through a WebSocket server. Suddenly, Claude Code isn't just a terminal app—it's a service that anything can talk to.

My EchoKit device can send commands. A web app could send commands. A mobile app. A game controller. Anything that speaks WebSocket.

But here's the beautiful part: it's still Claude Code. All the capabilities, all the intelligence, everything that makes Claude Code amazing—just accessible through a clean, simple interface.

The Setup: Three Pieces, One Experience

Now my coding setup looks like this:

1. echokit_pty runs on my machine — Starts a WebSocket server (ws://localhost:3000/ws)

2. EchoKit Server connects to it — Handles speech recognition and text-to-speech

3. EchoKit Device sits on my desk — Listens for my voice, speaks back responses

My Voice: "Run the tests"
  ↓
EchoKit Device (hears me)
  ↓
EchoKit Server (transcribes speech)
  ↓
echokit_pty (WebSocket connection)
  ↓
Claude Code (executes the command)
  ↓
Tests run, results stream back
  ↓
EchoKit speaks: "142 tests passed, 3 failed"

All while I keep typing. No window switching. No flow breaking.

A Day in the Life

Let me show you what this actually feels like.

Morning: I sit down with coffee. "EchoKit, run the full test suite." I start reading emails while tests run in the background. Five minutes later: "Tests complete. Two failures in the auth module."

Afternoon: I'm stuck on a bug. "EchoKit, why is the login failing?" It explains the issue while I'm looking at the code. "Can you fix it?" "Done. Want me to run the tests?" "Yes."

Evening: I'm tired, don't want to type. "EchoKit, create a new feature branch called dark-mode." "Deploy staging." "Check if the build passed." Each command happens while I'm leaning back in my chair.

It feels like having a coding companion. Not a tool—a teammate.

Why This Matters

I know what you're thinking: Voice control for coding? Sounds weird. And doesn't Claude Code have Remote Control now?

You're right—it is weird at first. But here's the thing: Claude Code's Remote Control is great for mobile access, but EchoKit isn't your phone. It's a dedicated device that sits on your desk. Always on. Always listening. No unlocking, no apps, no picking it up.

Here's what I discovered:

It's not about voice typing. I'm not dictating code. That would be terrible.

It's about having a physical device. Think of it like a smart speaker for coding. It just sits there, ready to help. No screens to tap, no apps to open, no phone to find.

The magic is the always-there presence. The device lives on my desk. It's part of my workspace. I don't need to grab anything or unlock anything. I just speak.

It keeps me in the flow. That's the biggest one. I can stay focused on coding while EchoKit handles tasks in the background. It's like having a second pair of hands.

The Tech Behind the Magic

If you're curious how echokit_pty works technically, here's the short version:

PTY stands for "pseudo-terminal"—a Unix concept that lets a program control a terminal as if a user were typing. echokit_pty uses this to create a bridge between:

  • WebSocket clients → send JSON commands
  • Claude Code CLI → executes the commands
  • Response streaming → sends results back

It's built with Rust, runs locally, and is completely open source. No cloud required. Your code never leaves your machine.

But here's what I care about: it just works.

What You Can Do

So what does this actually look like in practice?

"Create a web page for me" → Claude Code generates the HTML, EchoKit confirms when done

"Run the tests" → Tests execute, EchoKit tells me the results

"Explain this error" → Claude Code analyzes, EchoKit reads the explanation

"Deploy to staging" → Deployment triggers, EchoKit confirms when complete

"Create a new branch" → Git command executes, no typing required

I can speak from across the room. Keep my hands on the keyboard while EchoKit works in the background. Get voice feedback without breaking my flow.

Building Your Own

This is the part I'm most excited about: everything here is open source.

  • EchoKit — Open hardware, Rust firmware, fully customizable
  • echokit_pty — Open source WebSocket interface for Claude Code
  • EchoKit Server — Rust-based voice AI server

You can build this yourself. Or modify it. Or extend it.

Want to add custom voice commands? Go ahead. Want to integrate with other tools? echokit_pty makes it possible. Want to build a completely different interface? The WebSocket is waiting.

The Future

This experiment showed me something: AI coding assistants can take many forms beyond screens and apps.

Claude Code's Remote Control solved mobile access. But what about specialized hardware? What about completely hands-free experiences? What about devices that do one thing perfectly?

echokit_pty is the bridge that makes these experiments possible. And EchoKit is just one example.

Imagine what else we could build:

  • Voice-controlled development environments
  • Specialized devices for specific workflows
  • Educational tools that feel like magic
  • Assistive technology for developers with disabilities

All built on top of echokit_pty's open WebSocket interface.

Try It Yourself

Ready to turn your AI assistant into a physical device?

Full Documentation: Remote Control Claude Code with Your Voice

EchoKit Hardware:

echokit_pty Repository: github.com/second-state/echokit_pty

Join the Community: EchoKit Discord

Build something cool. Then tell me about it.


PS: The first time I heard EchoKit say "Tests passed" while I was making coffee? That's when I knew this wasn't just a cool experiment. This was how I wanted to work from now on.

Day 26: Generate config.toml with Claude Code Skills | The First 30 Days with EchoKit

· 6 min read

Over the first 25 days of this series, we've configured EchoKit by manually editing config.toml files. That works fine for tweaks, but it's tedious when you're setting up EchoKit for the first time or trying a completely different configuration.

Today, we're introducing a faster way: the EchoKit Config Generator skill for Claude Code.

This skill automates the entire setup process through an interactive conversation—no manual TOML editing required.

Watch the skill in action:

What Are Claude Code Skills?

Claude Code "skills" are reusable prompts that live in .claude/skills/ directories. Think of them as mini-programs written in natural language. Instead of explaining what you want every time, you trigger a skill, and it guides the AI through a structured workflow.

Why does EchoKit need a Claude Code skill?

Setting up an EchoKit server involves many steps: writing TOML configuration, understanding platform-specific field names, collecting API keys, building the server, finding your IP address, and launching with the right commands. For beginners, this can be overwhelming. Even experienced users can forget details like which section comes first, or whether ElevenLabs uses api_key or token.

The EchoKit Config Generator skill solves this by turning setup into a conversation. More importantly, it teaches you how to configure EchoKit server along the way. As you answer questions, you learn:

  • How to set up EchoKit server — What goes into config.toml and why
  • How to run EchoKit server — The cargo build --release command, launching with debug logging
  • How to get your IP address — The skill shows you exactly how to find your actual local IP (not localhost) and construct the WebSocket URL

Unlike documentation that you read once and forget, the skill guides you through each step interactively. You see the config being generated, understand what each field does, and learn the workflow by doing it—while the skill handles the technical details for you.

The EchoKit Config Generator skill comes bundled with the echokit_server repository. Just clone the repo, and Claude Code discovers it automatically.

Installing the Skill

First, clone the echokit_server repository:

git clone https://github.com/second-state/echokit_server.git
cd echokit_server

That's it. Claude Code automatically discovers skills in .claude/skills/ directories within your workspace. No additional installation required.

Using the Skill

In Claude Code, simply say: "Generate an EchoKit config for a coding assistant"

The skill guides you through a 5-phase process:

Phase 1: Describe Your Assistant — Answer 7 questions about purpose, tone, capabilities, response style, domain knowledge, constraints, and preferences. The skill generates a sophisticated system prompt from your answers.

Phase 2: Choose Platforms — For each service (ASR, TTS, LLM), select from pre-configured options or choose "Custom" to specify any platform. The skill auto-discovers API documentation via web search for custom platforms.

Phase 3: MCP Server — Optionally add an MCP server by providing the URL.

Phase 4: Preview and Generate — Review your complete config.toml, confirm it's correct, and the skill writes both config.toml and SETUP_GUIDE.md to your chosen directory.

Phase 5: API Keys and Launch — The skill shows where to get API keys, collects them from you, updates config.toml, builds the server with cargo build --release, and launches it with debug logging enabled. When the server starts, the skill automatically detects your local IP address and displays the WebSocket URL ready for you to connect.

From zero to running EchoKit in one conversation.

Why This Matters

The Config Generator offers several advantages:

Faster Setup — Answer questions instead of reading docs and writing TOML manually. The skill handles syntax, field names, and structure automatically.

Fewer Errors — No more wrong field names, incorrect section order, or missing fields. The skill knows platform-specific details like ElevenLabs using token instead of api_key.

Custom Platform Discovery — Want to use a new LLM provider? The skill searches the web for API documentation and confirms with you. Groq, DeepSeek, Mistral, Together—all auto-discovered.

Rich System Prompts — The 7-question phase generates sophisticated system prompts tailored to your use case, saving you time crafting them manually.

Complete Workflow — It doesn't just generate a config. It collects API keys, builds the server, launches it, and even detects your local IP address. You get a ready-to-use WebSocket URL—no manual IP lookup required.

Ready Connection Details — After launching, the skill automatically finds your actual local IP address (not localhost) and displays the complete WebSocket URL. Just copy and paste it into your EchoKit device to connect.

When to Use the Skill vs. Manual Configuration

| Use the Skill When | Use Manual Config When |
| --- | --- |
| First-time EchoKit setup | Quick API key changes |
| Learning how EchoKit server works | Adjusting the history value |
| Trying new LLM providers | Minor parameter tweaks |
| Creating custom personalities | Version-controlling configs |
| Exploring custom platforms | Scripting deployments |
| Understanding the complete workflow | You know exactly what you need |

Both approaches are valid. The skill is also a learning tool—it guides you through each step while explaining what's happening, so you understand the setup process deeply. Manual editing provides precision control once you're familiar with the configuration.

Supported Platforms

Pre-configured:

  • ASR: OpenAI Whisper, Local Whisper
  • TTS: OpenAI, ElevenLabs, GPT-SoVITS
  • LLM: OpenAI Chat, OpenAI Responses API

Custom (auto-discovered via web search):

  • Any OpenAI-compatible LLM: Groq, DeepSeek, Mistral, Together, and more
  • Any platform with documented APIs

Choose "Custom" and the skill finds the rest.

What's Next: Day 27

You now have a fully configured EchoKit server running a custom personality—set up through conversation, not configuration files.

But what happens when you want to share your EchoKit setup with others? Or deploy it to multiple devices?

On Day 27, we'll explore configuration management: versioning your configs, sharing setups, and managing multiple EchoKit instances.


Ready to try the Config Generator skill or share your own configurations?

Ready to get your own EchoKit?

Start building your own voice AI agent today.

Day 25: Built-in Web Search with LLM Providers | The First 30 Days with EchoKit

· 5 min read

On Day 15, we introduced EchoKit's ability to connect to MCP servers, giving your voice agent access to external tools and actions. We showed how to connect to Tavily for web search.

On Day 23, we added DuckDuckGo MCP for real-time web search.

Both approaches required you to host or connect to an external search service. That works great, but what if there was an even simpler way?

Today, we're exploring a different approach: using the LLM provider's own built-in web search.

No separate MCP servers to configure. No API keys for search services. No extra infrastructure.

Just enable a tool, and your EchoKit can search the web.

Before diving in, let's clarify the two approaches to web search we've covered:

| Approach | Setup | Infrastructure | Control |
| --- | --- | --- | --- |
| MCP Servers (Days 15 & 23) | You connect to external search APIs | Requires separate MCP servers | Full control over search source |
| Built-in Tools (Today) | Enable in config.toml | LLM provider handles everything | Provider manages search |

Built-in tools are simpler — the LLM provider (OpenAI, xAI, etc.) handles everything. When your EchoKit needs current information, it just asks the provider, which performs the search and returns results.

MCP servers give you more control — you choose the search engine, can customize results, and can host it yourself.

Both approaches work. Today is about the simpler path: built-in tools.

The EchoKit server now supports the OpenAI Responses API — a stateful API that enables advanced LLM features including built-in web search.

Let's set up EchoKit with different LLM providers' built-in web search.

OpenAI with Web Search Preview

OpenAI offers the web_search_preview tool:

[llm]
platform = "openai_responses"
url = "https://api.openai.com/v1/responses"
api_key = "sk_ABCD"
model = "gpt-5-nano"

[[llm.extra.tools]]
type = "web_search_preview"

Key points:

  • platform = "openai_responses" enables the Responses API
  • type = "web_search_preview" enables OpenAI's built-in search

xAI Grok with Web Search

xAI's Grok offers a web_search tool with optional filtering:

[llm]
platform = "openai_responses"
url = "https://api.x.ai/v1/responses"
api_key = "xai_ABCD"
model = "grok-4-1-fast-non-reasoning"

[[llm.extra.tools]]
type = "web_search"
# Optional: filters = { "allowed_domains" = ["wikipedia.org"] }

Grok also provides an x_search tool to search posts on x.com (Twitter).

Groq with Browser Search

Groq's implementation calls it browser_search:

[llm]
platform = "openai_responses"
url = "https://api.groq.com/openai/v1/responses"
api_key = "gsk_ABCD"
model = "openai/gpt-oss-20b"

[[llm.extra.tools]]
type = "browser_search"

Ask EchoKit: "What's the Weather?"

Once configured, restart your EchoKit server and try a question that requires current information:

User: "What's the weather like in San Francisco right now?"

Under the hood, here's what happens with the Responses API:

  1. EchoKit sends query — Only the latest user message is sent
  2. LLM evaluates — The provider determines this needs current information
  3. Web search performed — The provider searches automatically
  4. Response generated — The LLM synthesizes an answer from search results
  5. Context saved — Search results are stored for follow-up questions
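
Under the hood, step 1 is an HTTP POST whose body mirrors the `[llm]` config. Here is a sketch of the payload the server might assemble for the OpenAI example above — the field names follow the OpenAI Responses API, but the exact internal wiring of the EchoKit server is an assumption:

```python
import json

# Values lifted from the [llm] config section shown earlier.
config = {
    "url": "https://api.openai.com/v1/responses",
    "model": "gpt-5-nano",
    "tools": [{"type": "web_search_preview"}],
}

def build_request(user_message: str) -> dict:
    """Assemble a Responses API request body with the built-in search tool enabled."""
    return {
        "model": config["model"],
        "input": user_message,      # only the latest user message is sent
        "tools": config["tools"],   # lets the provider decide when to search
    }

body = build_request("What's the weather like in San Francisco right now?")
print(json.dumps(body, indent=2))
```

The key point: the client never orchestrates the search itself. Declaring the tool in `tools` is enough; the provider decides whether the question needs live data.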

EchoKit might respond like this:

"Let me check the current weather...

Currently in San Francisco, it's 62 degrees Fahrenheit with partly cloudy skies. The high today will be around 68 degrees, with a low of 55 tonight."

Try Follow-up Questions

Because the Responses API is stateful, follow-up questions work naturally:

User: "What about tomorrow?"

The LLM provider already has the weather context from the previous search, so it can answer immediately without searching again.

"Tomorrow in San Francisco, expect sunny skies with a high of 72 degrees and a low of 58. Perfect weather for being outdoors."

This context awareness is one of the key advantages of the Responses API.

Built-in Tools vs. MCP: Which to Use?

We've now covered two approaches to web search. When should you use each?

Use Built-in Tools When:

  • You want the simplest possible setup
  • You're already using an LLM provider that offers search
  • You don't need to customize search behavior
  • Performance and simplicity are priorities

Use MCP Servers When:

  • You want to choose your own search engine
  • You need to filter or customize results
  • You want to host search infrastructure yourself
  • You're in a region where built-in search isn't available

Both approaches are valid. The beauty of EchoKit is that you can mix and match — use built-in tools from your provider while also connecting to custom MCP servers for specialized capabilities.

The Agentic Vision

Across Day 15, 23, and 25, we've seen EchoKit evolve from a simple chatbot into a true AI agent:

  • Day 15: Connected to external tools via MCP (Tavily search)
  • Day 23: Added DuckDuckGo for privacy-focused web search
  • Day 25: Enabled built-in search from LLM providers

Each approach adds capabilities. Your EchoKit can now:

  • Retrieve real-time information from the web
  • Reason about current events and live data
  • Respond with accurate, up-to-date answers
  • Act on that information (as we saw on Day 24 with Google Calendar)

This is the vision of agentic AI — not just conversation, but action. Not just static knowledge, but real-time information. Not just a chatbot, but a tool that bridges your voice to the entire internet.


Ready to explore more integrations or share your own agent setups?

Ready to get your own EchoKit?

Start building your own voice AI agent today.

Day 24: CES 2026 in Practice — Voice Agents That Act | The First 30 Days with EchoKit

· 6 min read

At CES 2026, the message was clear: Smartphones are so 2025.

The future isn't a bigger or foldable screen. It's AI pendants around your neck, holographic companions like Razer's Project AVA, robot pets that hug back, and always-on voice agents that act without touching any screen.

These aren't just "better assistants." They're proactive voice AI agents that listen, understand context, reason, act, and respond — all hands-free, no phone needed.

EchoKit is the open-source devkit showing how those AI devices work under the hood.

We've been building toward this. On Day 15, we introduced MCP (Model Context Protocol) as EchoKit's gateway to external tools. We showed how to connect to Tavily search. On Day 23, we added DuckDuckGo for real-time web search.

Those were about information — giving your voice agent the ability to retrieve knowledge from the web.

Today is about action.

Today, your EchoKit learns to do things for you. We will show you how to integrate Zapier's Google MCP server and EchoKit to manage your Google Calendar via voice.

Why Action Matters

Imagine this: You're rushing to get ready in the morning, hands full, and you remember you need to schedule a meeting with your team tomorrow at 2 PM.

Without action capability, your EchoKit could say, "You should schedule that meeting when you get to your computer." Helpful, but not helpful enough.

With action capability, you simply say:

"Schedule a team meeting tomorrow at 2 PM for one hour"

And your EchoKit actually does it.

No phone. No computer. No screens. Just voice.

That's the difference between a conversational AI that talks about your schedule and an agentic AI that manages it.

Zapier's Google Calendar MCP Server

For today's integration, we're using Zapier's Google Calendar MCP server. Zapier has built an excellent MCP implementation that provides:

  • Create events — add calendar entries with title, time, and duration
  • List upcoming events — see what's scheduled
  • Search events — find specific appointments
  • Update events — modify existing calendar entries

The Zapier MCP server handles all the OAuth authentication and API details, exposing clean tools that EchoKit can use to take action on your behalf. Remember that EchoKit supports MCP servers in the SSE and HTTP-Streamable mode.

Setting Up Zapier MCP Server

Before configuring EchoKit, you'll need to set up the Zapier MCP server and get your endpoint URL:

  1. Go to zapier.com/mcp — This is where you manage MCP integrations
  2. Click "+ New MCP Server" — Zapier will walk you through creating the MCP server you want
  3. Click "Rotate token" to get the MCP server URL — It looks like: `https://mcp.zapier.com/api/v1/connect?token=YOUR_TOKEN`

Keep this URL handy — you'll need it for the next step.

Configure EchoKit for Google Calendar

Now add the Zapier Google Calendar MCP server to your EchoKit config.toml:

[llm]
llm_chat_url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "YOUR_GROQ_API_KEY"
model = "llama-3.3-70b-versatile" # Or any tool-capable model
history = 5

[[llm.mcp_server]]
server = "https://mcp.zapier.com/api/v1/connect?token=YOUR_TOKEN"
type = "http_streamable"
call_mcp_message = "Hold on a second. Let me check your calendar."

Key points:

  • server: Paste the Zapier MCP server endpoint URL you copied above
  • type: Use http_streamable for Zapier MCP servers (sse is also supported)
  • call_mcp_message: What EchoKit says while accessing your calendar

Ask EchoKit: "Schedule a Team Meeting"

Once configured, restart EchoKit server and try a voice command:

User: "Schedule a team meeting tomorrow at 2 PM for one hour"

Under the hood, here's what happens:

  1. LLM parses the request — understands it's a calendar action with time and duration
  2. Tool call initiated — invokes the Google Calendar create_event tool via MCP
  3. Action executed — Zapier adds the event to your Google Calendar
  4. Confirmation returned — EchoKit confirms the action was completed
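
On the wire, step 2 is an MCP tools/call request (JSON-RPC 2.0). Here is a sketch of that message — the tool name create_event and its argument names are illustrative; the actual tool schema exposed by the Zapier MCP server may differ:

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids

def tools_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# Hypothetical calendar call for "team meeting tomorrow at 2 PM, one hour":
req = tools_call("create_event", {
    "summary": "Team meeting",
    "start": "2026-01-16T14:00:00",
    "duration_minutes": 60,
})
print(json.dumps(req, indent=2))
```

The LLM fills in the tool name and arguments from your spoken request; EchoKit and the MCP transport just carry this message to Zapier and relay the result back.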

EchoKit might respond like this:

"Let me check your calendar...

I've scheduled your team meeting for tomorrow at 2 PM. The event will last one hour."

Notice what happened: EchoKit didn't just say something. It did something.

Try It Now

Restart your EchoKit server and test it:

  1. Say: "What's on my calendar today?"
  2. Wait for EchoKit to check
  3. Say: "Schedule a test meeting tomorrow at 10 AM"
  4. Check your Google Calendar — the event should appear, actually created

If it works, you're ready to go. If not, check the troubleshooting section below.

More Voice Commands to Try

Once you have Google Calendar connected, here are some practical voice commands:

  • "What's on my calendar today?" — Get a rundown of your schedule
  • "Schedule a dentist appointment next Tuesday at 3 PM" — Create events with natural language
  • "When is my next meeting?" — Check upcoming events
  • "Block out time for deep work tomorrow morning" — Reserve focused time
  • "Move my team meeting to 3 PM" — Reschedule existing events

The LLM understands natural language timing — "tomorrow morning," "next Tuesday," "in two hours" — and converts it into proper calendar entries.
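As a rough illustration (this is not EchoKit code — the conversion happens inside the LLM when it fills in the tool arguments), here's the kind of concrete timestamp that a phrase like "tomorrow at 2 PM for one hour" must resolve to before a calendar tool can act, using a fixed reference time for reproducibility:

```python
from datetime import datetime, timedelta

# Fixed "now" so the example is reproducible
now = datetime(2025, 6, 1, 9, 30)

# "tomorrow at 2 PM for one hour"
start = (now + timedelta(days=1)).replace(hour=14, minute=0, second=0, microsecond=0)
end = start + timedelta(hours=1)

print(start.isoformat())  # 2025-06-02T14:00:00
print(end.isoformat())    # 2025-06-02T15:00:00
```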

What makes Zapier's MCP server powerful is that it's not just about calendars. Zapier connects to 5,000+ apps, and through MCP, EchoKit can potentially interact with many of them:

  • Slack — Send messages, check channels
  • Gmail — Compose emails, search inbox
  • Trello/Asana — Create tasks, update boards
  • Notion — Add database entries, create pages
  • GitHub — Create issues, check repositories

Each Zapier integration you enable adds a new action capability to your voice agent.

From Voice to Action

Your EchoKit has evolved through these 24 days:

It started as a conversational AI that could talk with you.

Then it learned to listen and understand intent.

On Days 15 and 23, it learned to search and retrieve information.

Today, it learned to act.

This is the vision of agentic AI — not just conversation, but action. Not just talking about doing things, but actually doing them.

Your EchoKit isn't just answering questions anymore. It's getting things done.


Ready to give your voice agent action capabilities?

Want to get your own EchoKit?

Start building your voice-powered productivity assistant today.

Day 23: Real-Time Web Search with DuckDuckGo MCP | The First 30 Days with EchoKit

· 4 min read

On Day 15, we introduced EchoKit's ability to connect to MCP (Model Context Protocol) servers, which unlocks access to external tools and actions beyond simple conversation. We showed an example using a Tavily-based search MCP server.

Today, we're diving deeper into real-time web search using DuckDuckGo.

Why DuckDuckGo? It's privacy-focused, doesn't require API keys for basic usage, and provides a simple way to bring real-world, up-to-date information into your voice AI conversations.

Why Real-Time Web Search Matters

LLMs have a knowledge cutoff — they only know what they were trained on. Ask about yesterday's news, today's stock prices, or events that happened after the model's training, and they'll simply... not know.

But when you connect EchoKit to a web search MCP server, something magical happens:

  • The LLM recognizes it needs current information
  • It automatically invokes the search tool
  • Results are retrieved from the web in real-time
  • The LLM synthesizes an answer citing actual sources

Suddenly, your EchoKit isn't just a chatbot anymore — it's an AI agent that can access the entire internet through voice.

DuckDuckGo Web Search MCP Server

For today's demo, we're using a DuckDuckGo-based web search MCP server. DuckDuckGo is an excellent choice because:

  • No API key required for basic usage — just point and go
  • Privacy-focused — searches aren't tracked or profiled
  • Open ecosystem — multiple open-source DuckDuckGo MCP implementations exist

The server exposes a simple search tool that queries DuckDuckGo and returns structured results with titles, URLs, and snippets.
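The shape of those results might look something like this — the exact schema varies between DuckDuckGo MCP implementations, and the entries below are placeholders:

```json
[
  {
    "title": "CES 2026 highlights",
    "url": "https://example.com/ces-2026",
    "snippet": "A short excerpt from the matching page..."
  }
]
```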

DuckDuckGo doesn't provide an official MCP server, but community implementations exist. You can check out this GitHub repo for more details: https://github.com/nickclyde/duckduckgo-mcp-server

Remember that EchoKit supports MCP servers in SSE and HTTP-Streamable modes.

Add the DuckDuckGo MCP server to your EchoKit config.toml:

[llm]
llm_chat_url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "YOUR_GROQ_API_KEY"
model = "llama-3.3-70b-versatile" # Or any tool-capable model
history = 5

[[llm.mcp_server]]
server = "YOUR_DUCKDUCKGO_MCP_ENDPOINT"
type = "http_streamable"
call_mcp_message = "Let me search the web for the latest information."

Key points:

  • server: The DuckDuckGo MCP server endpoint
  • type: http_streamable for streaming responses (sse is also supported)
  • call_mcp_message: What EchoKit says while searching (provides feedback during latency)

Ask EchoKit: "What's New in CES 2026?"

Now for the fun part. Restart EchoKit server and ask a question that requires current information:

User: "What's new in CES 2026?"

Under the hood, here's what happens:

  1. LLM recognizes it needs real-time data about CES 2026
  2. Tool call initiated — the LLM invokes the DuckDuckGo search tool via MCP
  3. Search executed — DuckDuckGo queries the web for CES 2026 news
  4. Results returned — titles, URLs, and snippets come back through MCP
  5. Answer synthesized — the LLM processes the results and generates a natural response

EchoKit might respond like this:

"Let me search the web for the latest information...

CES 2026 highlights (as of the first week of the show) ...."

And it would cite the actual sources it found.

Once you have MCP configured, you're not limited to web search. The same protocol lets EchoKit:

  • Manage Google Calendar — add events, check schedules
  • Send messages — Slack, email, Discord
  • Control smart home — Home Assistant integration for lights, AC, security
  • Read and write files — local file system access
  • Run code — execute scripts and return results

Each MCP server adds a new capability. Mix and match to build the agent you need.
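Stacking capabilities is just a matter of repeating the `[[llm.mcp_server]]` table in config.toml — a sketch combining the two servers from this series (endpoints are placeholders; substitute your own):

```toml
# Calendar actions via Zapier (Day 24)
[[llm.mcp_server]]
server = "https://mcp.zapier.com/api/v1/connect?token=YOUR_TOKEN"
type = "http_streamable"
call_mcp_message = "Hold on a second. Let me check your calendar."

# Web search via a DuckDuckGo MCP server (Day 23)
[[llm.mcp_server]]
server = "YOUR_DUCKDUCKGO_MCP_ENDPOINT"
type = "http_streamable"
call_mcp_message = "Let me search the web for the latest information."
```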

Today's DuckDuckGo web search demo shows how EchoKit breaks free from the LLM's training cutoff. It can now:

  • Answer questions about current events
  • Look up live data (sports scores, stock prices, weather)
  • Provide cited information from real sources
  • Act as a research assistant accessible by voice

This is the vision of agentic AI — not just conversation, but action. Not just static knowledge, but real-time information. Not just a chatbot, but a tool that bridges your voice to the entire internet.


Want to explore more MCP integrations or share your own agent setups?

Ready to get your own EchoKit?

Start building your own voice AI agent today.

Day 22: Flashing EchoKit from the Command Line | The First 30 Days with EchoKit

· 5 min read

Yesterday, we covered how to flash EchoKit firmware using the ESP Launchpad web tool. It's simple, browser-based, and works great for most people.

But if you're a developer — or if you've ever had the web flasher fail on you — you might want something more direct.

Today is about flashing EchoKit from the command line.

This approach is faster, gives you more control, and works even in situations where the browser-based tool might struggle. Plus, it feels more... natural for anyone comfortable with a terminal.

Why Flash from the Command Line?

The ESP Launchpad web tool is fantastic for getting started. It removes all friction: no toolchains, no dependencies, just click and flash.

But the command line approach has some real advantages:

  • Speed: Once set up, flashing is significantly faster
  • Reliability: Some USB configurations or systems don't play nicely with the web flasher — the CLI tool often works where the browser fails
  • Automation: If you're flashing multiple devices or setting up a fleet, CLI is scriptable
  • Developer experience: If you're already in the terminal, why leave it?

Best of all? The setup is straightforward if you have Rust installed.

Prerequisites: Rust Toolchain

The espflash tool we'll use is built in Rust. If you already have Rust installed, you can skip this step.

If not, installing Rust is quick:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

This will install Rust, Cargo, and the standard toolchain. Once it's done, restart your terminal or run:

source $HOME/.cargo/env

Install espflash

With Rust ready, install the flashing tools:

cargo install cargo-espflash espflash ldproxy

This will:

  • Install espflash — the actual flashing utility
  • Install cargo-espflash — a cargo helper for building and flashing
  • Install ldproxy — a linker proxy needed for some ESP32 builds

The compilation might take a few minutes. Once it's complete, you'll have the espflash command available globally.

Flashing EchoKit DIY

To flash EchoKit DIY from the command line, follow these steps:

Step 1: Download the Firmware

curl -L -o echokit https://echokit.dev/firmware/echokit_boards

Step 2: Connect Your Device

Use a USB cable to connect your computer to the USB-C port labeled OTG on your EchoKit DIY.

Your computer may prompt you to accept or trust the USB device — accept it.

Step 3: Flash the Firmware

With the device connected, run:

espflash flash --monitor --flash-size 16mb echokit

The flags are:

  • --flash-size 16mb: Sets the flash size for EchoKit DIY
  • --monitor: Keeps the connection open after flashing so you can see the serial output
  • echokit: The firmware file you downloaded

espflash will detect your serial port and ask you to select it if multiple ports are available. Once flashing completes, you'll see the device boot up in the terminal, and the screen will display the QR code.
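If espflash grabs the wrong serial device (or can't find one at all), you can point it at a port explicitly with `--port`. The device path below is only an example — substitute whatever your system assigns:

```shell
# macOS example; on Linux the port is typically /dev/ttyUSB0 or /dev/ttyACM0
espflash flash --port /dev/cu.usbserial-0001 --monitor --flash-size 16mb echokit
```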

Flashing EchoKit Box

Flashing EchoKit Box from the command line follows the same process, with just a couple of differences.

Step 1: Download the Firmware

For EchoKit Box, use the Box firmware binary:

curl -L -o echokit https://echokit.dev/firmware/echokit_box

Step 2: Connect Your Device

Use a USB cable to connect your computer to the USB-C port labeled SLAVE on your EchoKit Box.

Your computer may prompt you to accept or trust the USB device — accept it.

Step 3: Flash the Firmware

The command is identical to the DIY version:

espflash flash --monitor --flash-size 16mb echokit

espflash will detect your EchoKit Box, flash the firmware, and monitor the serial output. When it's done, the device will reboot and display the QR code on screen.

Troubleshooting

If something doesn't work, here are a few things to try:

  • Try the other USB port: On EchoKit DIY, if flashing fails on the OTG port, try the TTL port instead. Sometimes the USB data connection behaves differently on each port.
  • Force a reset: If the device isn't detected, press the RST button to reset it, then immediately run the flash command again.
  • Check USB permissions: On Linux, you might need to add your user to the dialout group or adjust udev rules for USB serial devices.

Both Approaches Have a Place

After yesterday's browser-based flashing and today's CLI approach, you now have two ways to keep your EchoKit firmware up to date:

  • ESP Launchpad (browser): Great for beginners, quick updates, or when you're already in a GUI
  • espflash (CLI): Faster, more reliable in tricky environments, and perfect for developers

Neither is "better" — they're different tools for different situations.

The important thing is that you're comfortable updating your firmware. EchoKit is an active, evolving project. New features land regularly. Being able to flash confidently — whether via browser or terminal — means you can stay current with the latest improvements.


Want to get your own EchoKit device and start building?

Join the EchoKit Discord to share your setup and see what others are building with their voice AI agents.

Day 21: Flashing EchoKit DIY and EchoKit Box from the Browser | The First 30 Days with EchoKit

· 3 min read

Over the last 20 days, we’ve been building EchoKit step by step — from voice pipelines and local models to MCP tools and personalities.

Today, I want to focus on something more foundational:

firmware.

In this post, we’ll walk through how to flash EchoKit firmware using the ESP Launchpad web tool. This approach is direct, dependency-free, and works entirely from the browser.

Want to learn to flash via the command line? We will talk about it tomorrow.

EchoKit Firmware Is Open Source — and Always Moving

EchoKit’s firmware is fully open source. The code is public, changes are visible, and improvements land continuously. Bugs are fixed in the open, and new capabilities are added incrementally.

Because of this, the firmware repository doesn’t stand still. As EchoKit evolves, the firmware evolves with it — whether that’s protocol adjustments, performance improvements, new device capabilities, or better defaults.

This means EchoKit is not a “flash once and forget” system.

You should expect to refresh the firmware from time to time. More importantly, you should feel comfortable doing so.


Flashing with ESP Launchpad

ESP Launchpad allows you to flash prebuilt EchoKit firmware directly from a browser, with no local toolchains or dependencies to install.

You can open the ESP Launchpad page here, which is preconfigured with EchoKit firmware profiles:

https://espressif.github.io/esp-launchpad/?flashConfigURL=https://echokit.dev/firmware/echokit.toml

The flashing process is exactly the same for EchoKit Box and EchoKit DIY. The only difference is which firmware profile you select.

EchoKit Box

To flash EchoKit Box, open the flashing page, connect the device to your computer via USB, select EchoKit Box, and click the Flash button.

The flashing process takes a few minutes. Once it completes successfully, the page will prompt you to reset the device. After rebooting, you’ll see the QR code screen, which indicates the firmware is ready. That’s it.

EchoKit DIY

EchoKit DIY uses the exact same flashing flow.

The only difference is the firmware profile you choose. On the same flashing page, connect your DIY device via USB, select EchoKit DIY, and click the Flash button.

Everything else is identical: the flashing takes a few minutes, you reset the device when prompted, and the QR code appears after reboot.

Once flashing is complete, move on to the next step: connecting the EchoKit server and your device so EchoKit can talk to you.

Why Firmware Refreshing Is Important

Many products try to hide firmware updates as much as possible.

EchoKit does the opposite.

EchoKit is an open system. You’re encouraged to explore it, modify it, and keep it up to date. Firmware updates are a normal part of the workflow, not an exception.

Using a browser-based flasher removes most of the friction. There are no toolchains to install, no OS-specific instructions, and no dependency management. This makes firmware updates accessible even to non-programmers.

Day 20: Running GPT-SoVITS Locally as EchoKit’s TTS Provider | The First 30 Days with EchoKit

· 4 min read

Over the past few days, we’ve been switching EchoKit between different cloud-based TTS providers and voice styles. It’s fun, it’s flexible, and it really shows how modular the EchoKit pipeline is.

But today, I want to go one step further.

Today is about running TTS fully locally. No hosted APIs. No external requests. Just an open-source model running on your own machine — and EchoKit talking through it.

For Day 20, I’m using GPT-SoVITS as EchoKit’s local TTS provider.

What Is GPT-SoVITS?

GPT-SoVITS is an open-source text-to-speech and voice cloning system that combines:

  • A GPT-style text encoder for linguistic understanding
  • SoVITS-based voice synthesis for natural prosody and timbre

Compared to traditional TTS systems, GPT-SoVITS stands out for two reasons.

First, it produces very natural, expressive speech, especially for longer sentences and conversational content.

Second, it supports high-quality voice cloning with relatively small reference audio, which has made it popular in open-source voice communities.

Most importantly for us: GPT-SoVITS can run entirely on your own hardware.

Running GPT-SoVITS Locally

To make local GPT-SoVITS easier to run, we also ported GPT-SoVITS to a Rust-based implementation.

This significantly simplifies local deployment and makes it much easier to integrate with EchoKit.

Check out Build and run a GPT-SoVITS server for details. The following steps were performed on a MacBook.

First, install the LibTorch dependencies:

curl -LO https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.4.0.zip
unzip libtorch-macos-arm64-2.4.0.zip

Then, tell the system where to find LibTorch:

export DYLD_LIBRARY_PATH=$(pwd)/libtorch/lib:$DYLD_LIBRARY_PATH
export LIBTORCH=$(pwd)/libtorch

Next, clone the source code and build the GPT-SoVITS API server:

git clone https://github.com/second-state/gsv_tts
git clone https://github.com/second-state/gpt_sovits_rs

cd gsv_tts
cargo build --release

Then, download the required models. Since I’m running GPT-SoVITS locally on my MacBook, I’m using the CPU versions:

cd resources
curl -L -o t2s.pt https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/t2s.cpu.pt
curl -L -o vits.pt https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/vits.cpu.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/ssl_model.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/bert_model.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/g2pw_model.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/mini-bart-g2p.pt

Finally, start the GPT-SoVITS API server:

TTS_LISTEN=0.0.0.0:9094 nohup target/release/gsv_tts &

Configure EchoKit to Use the Local TTS Provider

At this point, GPT-SoVITS is running as a local service and exposing a simple HTTP API.

Once the service is up, EchoKit only needs an endpoint that accepts text and returns audio.

Update the TTS section in the EchoKit server configuration:

[tts]
platform = "StreamGSV"
url = "http://localhost:9094/v1/audio/stream_speech"
speaker = "cooper"

Restart the EchoKit server, connect the service to the device, and EchoKit will start using the new local TTS provider.

A Fully Local Voice AI Pipeline

With today’s setup, we can now run the entire voice AI pipeline locally:

  • ASR: local speech-to-text
  • LLM: local open-source language models
  • TTS: GPT-SoVITS running on your own machine

That means:

  • No cloud dependency
  • No external APIs
  • No vendor lock-in

Just a complete, end-to-end voice AI system you can understand, modify, and truly own.


Want to get your own EchoKit device and make it unique?

Join the EchoKit Discord to share your custom voices and see how others are personalizing their voice AI agents.

Day 19: Switching EchoKit’s TTS Provider to Fish.audio | The First 30 Days with EchoKit

· 2 min read

Over the past few days, we’ve been iterating on different parts of EchoKit’s voice pipeline — ASR, LLMs, system prompts, and TTS (including ElevenLabs and Groq).

On Day 19, we switch EchoKit’s Text-to-Speech provider to Fish.audio, purely through a configuration change.

No code changes are required.

What Is Fish.audio

Fish.audio is a modern text-to-speech platform focused on high-quality, expressive voices and fast iteration for developers.

One notable aspect of Fish.audio is the breadth of available voices. It offers a wide range of voice styles, including voices inspired by public figures, pop culture, and anime culture references, which makes it easy to experiment with playful or character-driven agents.

In addition to preset voices, Fish.audio also supports voice cloning, allowing developers to generate speech in a customized voice when needed.

These features make it particularly interesting for conversational and personality-driven voice AI systems.

EchoKit is designed to be provider-agnostic. As long as a TTS service matches the expected interface, it can be plugged into the system without affecting the rest of the pipeline.

The Exact Change in config.toml

Switching to Fish.audio in EchoKit only requires updating the TTS section in the config.toml file:

[tts]
platform = "fish"
speaker = "03397b4c4be74759b72533b663fbd001"
api_key = "YOUR_FISH_AUDIO_API_KEY"

A brief explanation of each field:

  • platform set to "fish" tells EchoKit to use Fish.audio as the TTS provider.
  • speaker specifies the TTS model ID, which can be obtained from the Fish.audio model detail page.
  • api_key is the API key used to authenticate with the Fish.audio service.

After restarting the EchoKit server and reconnecting the device, all voice output is generated by Fish.audio.

Everything else remains unchanged:

  • ASR stays the same
  • The LLM and system prompts stay the same
  • Conversation flow and tool calls stay the same

With Fish.audio added to the list of supported TTS providers, EchoKit’s voice layer becomes even more flexible — making it easier to experiment with different voices without reworking the system.


Want to get your own EchoKit device and make it unique?

Join the EchoKit Discord to share your welcome voices and see how others are personalizing their voice AI agents!