agent-cli is a collection of local-first, AI-powered command-line agents that run entirely on your machine.
It provides a suite of powerful tools for voice and text interaction, designed for privacy, offline capability, and seamless integration with system-wide hotkeys and workflows.
Important
Local and Private by Design: All agents in this tool are designed to run 100% locally by default. Your data, whether it comes from your clipboard, microphone, or files, is not sent to any cloud API unless you opt in. This protects your privacy and lets the tools work completely offline. You can optionally configure the agents to use OpenAI or Gemini services instead.
I got tired of typing long prompts to LLMs. Speaking is faster, so I built this tool to transcribe my voice directly to the clipboard with a hotkey.
What it does:
- Voice transcription to clipboard with system-wide hotkeys (Cmd+Shift+R on macOS)
- Autocorrect any text from your clipboard
- Edit clipboard content with voice commands ("make this more formal")
- Runs locally - no internet required, your audio stays on your machine
- Works with any app that can copy/paste
I use it mostly for the transcribe function when working with LLMs. Being able to speak naturally means I can provide more context without the typing fatigue.
See agent-cli in action: Watch the demo
- autocorrect: Correct grammar and spelling in your text (e.g., from clipboard) using a local LLM with Ollama or OpenAI.
- transcribe: Transcribe audio from your microphone to text in your clipboard using a local Whisper model or OpenAI's Whisper API.
- speak: Convert text to speech using Piper HTTP server, Wyoming TTS, OpenAI, or Kokoro TTS.
- voice-edit: A voice-powered clipboard assistant that edits text based on your spoken commands.
- assistant: A hands-free voice assistant that starts and stops recording based on a wake word.
- chat: A conversational AI agent with tool-calling capabilities.
If you already have AI services running (or plan to use OpenAI), simply install:
# Using uv (recommended)
uv tool install agent-cli
# Using pip
pip install agent-cli
Then use it:
agent-cli autocorrect "this has an eror"
We offer two ways to set up agent-cli with all services:
# 1. Clone the repository
git clone https://github.com/basnijholt/agent-cli.git
cd agent-cli
# 2. Run setup (installs all services + agent-cli)
./scripts/setup-macos.sh # or setup-linux.sh
# 3. Start services
./scripts/start-all-services.sh
# 4. (Optional) Set up system-wide hotkeys
./scripts/setup-macos-hotkeys.sh # or setup-linux-hotkeys.sh
# 5. Use it!
agent-cli autocorrect "this has an eror"
Alternatively, using the built-in agent-cli commands:
# 1. Install agent-cli
uv tool install agent-cli
# 2. Install all required services
agent-cli install-services
# 3. Start all services
agent-cli start-services
# 4. (Optional) Set up system-wide hotkeys
agent-cli install-hotkeys
# 5. Use it!
agent-cli autocorrect "this has an eror"
The setup scripts automatically install:
- ✅ Package managers (Homebrew/uv) if needed
- ✅ All AI services (Ollama, Whisper, TTS, etc.)
- ✅ The agent-cli tool
- ✅ System dependencies
- ✅ Hotkey managers (if using hotkey scripts)
If you already have AI services set up or plan to use cloud services (OpenAI/Gemini):
# Using uv (recommended)
uv tool install agent-cli
# Using pip
pip install agent-cli
For a complete local setup with all AI services:
git clone https://github.com/basnijholt/agent-cli.git
cd agent-cli
| Platform | Setup Command | What It Does | Detailed Guide |
|---|---|---|---|
| 🍎 macOS | ./scripts/setup-macos.sh | Installs Homebrew (if needed), uv, Ollama, all services, and agent-cli | macOS Guide |
| 🐧 Linux | ./scripts/setup-linux.sh | Installs uv, Ollama, all services, and agent-cli | Linux Guide |
| ❄️ NixOS | See guide → | Special instructions for NixOS | NixOS Guide |
| 🐳 Docker | See guide → | Container-based setup (slower) | Docker Guide |
./scripts/start-all-services.sh
This launches all AI services in a single terminal session using Zellij.
agent-cli autocorrect "this has an eror"
# Output: this has an error
Note
The setup scripts handle everything automatically. For platform-specific details or troubleshooting, see the installation guides.
Development Installation
For contributing or development:
git clone https://github.com/basnijholt/agent-cli.git
cd agent-cli
uv sync
source .venv/bin/activate # On Windows: .venv\Scripts\activate
Want system-wide hotkeys? You'll need the repository for the setup scripts:
# If you haven't already cloned it
git clone https://github.com/basnijholt/agent-cli.git
cd agent-cli
./scripts/setup-macos-hotkeys.sh
This script automatically:
- ✅ Installs Homebrew if not present
- ✅ Installs skhd (hotkey daemon) and terminal-notifier
- ✅ Configures these system-wide hotkeys:
  - Cmd+Shift+R - Toggle voice transcription
  - Cmd+Shift+A - Autocorrect clipboard text
  - Cmd+Shift+V - Voice edit clipboard text
Note
After setup, you may need to grant Accessibility permissions to skhd in System Settings → Privacy & Security → Accessibility.
./scripts/setup-linux-hotkeys.sh
This script automatically:
- ✅ Installs notification tools if needed
- ✅ Provides configuration for your desktop environment
- ✅ Sets up these hotkeys:
  - Super+Shift+R - Toggle voice transcription
  - Super+Shift+A - Autocorrect clipboard text
  - Super+Shift+V - Voice edit clipboard text
The script supports Hyprland, GNOME, KDE, Sway, i3, XFCE, and provides instructions for manual configuration on other environments.
The only thing you need to have installed is Git to clone this repository. Everything else is handled automatically!
Our installation scripts automatically handle all dependencies:
- 🍺 Homebrew (macOS) - Installed if not present
- 🐍 uv - Python package manager - Installed automatically
- 🎶 PortAudio - For microphone and speaker I/O - Installed via package manager
- 📋 Clipboard Tools - Pre-installed on macOS, handled on Linux
| Service | Purpose | Auto-installed? |
|---|---|---|
| Ollama | Local LLM for text processing | ✅ Yes, with default model |
| Wyoming Faster Whisper | Speech-to-text | ✅ Yes, via uvx |
| Piper TTS | Text-to-speech (HTTP server) | ✅ Yes, via uvx |
| Wyoming Piper | Text-to-speech (Wyoming protocol) | ⚙️ Alternative to HTTP server |
| Kokoro-FastAPI | Premium TTS (optional) | ⚙️ Can be added later |
| Wyoming openWakeWord | Wake word detection | ✅ Yes, for assistant |
Note
TTS Provider Update: The default Piper TTS now uses HTTP server mode for better performance.
Scripts have been renamed: run-piper.sh → run-piper-wyoming.sh, run-piper2.sh → run-piper-server.sh.
Use --tts-provider piper for the new HTTP server, or --tts-provider local for Wyoming protocol.
If you prefer cloud services over local ones:
| Service | Purpose | Setup Required |
|---|---|---|
| OpenAI | LLM, Speech-to-text, TTS | API key in config |
| Gemini | LLM alternative | API key in config |
This package provides multiple command-line tools, each designed for a specific purpose.
These commands help you set up agent-cli and its required services:
- install-services: Install all required AI services (Ollama, Whisper, Piper, OpenWakeWord)
- install-hotkeys: Set up system-wide hotkeys for quick access to agent-cli features
- start-services: Start all services in a Zellij terminal session
All necessary scripts are bundled with the package, so you can run these commands immediately after installing agent-cli.
All agent-cli commands can be configured using a TOML file. The configuration file is searched for in the following locations, in order:
- ./agent-cli-config.toml (in the current directory)
- ~/.config/agent-cli/config.toml
You can also specify a path to a configuration file using the --config option, e.g., agent-cli transcribe --config /path/to/your/config.toml.
Command-line options always take precedence over settings in the configuration file.
An example configuration file is provided in example.agent-cli-config.toml.
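The lookup order and the precedence rule above can be sketched in a few lines of Python. This is an illustration only, not agent-cli's actual code: `find_config` and `merge_options` are hypothetical names, and the sketch assumes that a path passed with --config bypasses the search entirely.

```python
from pathlib import Path
from typing import Any, Optional


def find_config(
    explicit: Optional[Path] = None,
    cwd: Optional[Path] = None,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Return the configuration file to use, or None if there is none."""
    if explicit is not None:
        # Assumption: an explicit --config path wins over the search.
        return explicit
    cwd = cwd or Path.cwd()
    home = home or Path.home()
    candidates = [
        cwd / "agent-cli-config.toml",                   # searched first
        home / ".config" / "agent-cli" / "config.toml",  # searched second
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def merge_options(
    file_values: dict[str, Any], cli_values: dict[str, Any]
) -> dict[str, Any]:
    """Overlay command-line values on file values.

    A value of None stands for "not given on the command line".
    """
    merged = dict(file_values)
    merged.update({k: v for k, v in cli_values.items() if v is not None})
    return merged
```

For example, `merge_options({"llm_provider": "local"}, {"llm_provider": "openai"})` keeps the command-line value, while a `None` on the command-line side leaves the file value in place.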
You can choose to use local services (Wyoming/Ollama) or OpenAI services by setting the service_provider option in the [defaults] section of your configuration file.
[defaults]
# service_provider = "openai" # 'local' or 'openai'
# openai_api_key = "sk-..."
autocorrect
Purpose: Quickly fix spelling and grammar in any text you've copied.
Workflow: This is a simple, one-shot command.
- It reads text from your system clipboard (or from a direct argument).
- It sends the text to the configured LLM (a local Ollama model by default) with a prompt to make only technical corrections, such as spelling and grammar.
- The corrected text is copied back to your clipboard, replacing the original.
How to Use It: This tool is ideal for integrating with a system-wide hotkey.
- From Clipboard: agent-cli autocorrect
- From Argument: agent-cli autocorrect "this text has an eror"
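If you would rather call autocorrect from a script than from a hotkey, a thin wrapper around the command line is enough. The sketch below is a hypothetical helper, not part of agent-cli; it only uses the --llm-provider and --llm-ollama-model options shown in the help output, and it makes no assumption about what the command prints.

```python
import subprocess
from typing import Optional


def autocorrect_command(
    text: Optional[str] = None,
    llm_provider: str = "local",
    ollama_model: Optional[str] = None,
) -> list[str]:
    """Build the argument list for `agent-cli autocorrect`."""
    command = ["agent-cli", "autocorrect", "--llm-provider", llm_provider]
    if ollama_model is not None:
        command += ["--llm-ollama-model", ollama_model]
    if text is not None:
        # Without a text argument the tool reads from the clipboard.
        command.append(text)
    return command


def run_autocorrect(text: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(autocorrect_command(text), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or spaces.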
See the output of agent-cli autocorrect --help
Usage: agent-cli autocorrect [OPTIONS] [TEXT]
Correct text from clipboard using a local or remote LLM.
General Options
  text [TEXT]  The text to correct. If not provided, reads from clipboard. [default: None]
Options
  --help  Show this message and exit.
Provider Selection
  --llm-provider TEXT  The LLM provider to use ('local' for Ollama, 'openai', 'gemini'). [default: local]
LLM Configuration: Ollama (local)
  --llm-ollama-model TEXT  The Ollama model to use. Default is qwen3:4b. [default: qwen3:4b]
  --llm-ollama-host TEXT  The Ollama server host. Default is http://localhost:11434. [default: http://localhost:11434]
LLM Configuration: OpenAI
  --llm-openai-model TEXT  The OpenAI model to use for LLM tasks. [default: gpt-4o-mini]
  --openai-api-key TEXT  Your OpenAI API key. Can also be set with the OPENAI_API_KEY environment variable. [env var: OPENAI_API_KEY] [default: None]
LLM Configuration: Gemini
  --llm-gemini-model TEXT  The Gemini model to use for LLM tasks. [default: gemini-2.5-flash]
  --gemini-api-key TEXT  Your Gemini API key. Can also be set with the GEMINI_API_KEY environment variable. [env var: GEMINI_API_KEY] [default: None]
General Options
  --log-level TEXT  Set logging level. [default: WARNING]
  --log-file TEXT  Path to a file to write logs to. [default: None]
  --quiet -q  Suppress console output from rich.
  --config TEXT  Path to a TOML configuration file. [default: None]
  --print-args  Print the command line arguments, including variables taken from the configuration file.
transcribe
Purpose: A simple tool to turn your speech into text.
Workflow: This agent listens to your microphone and converts your speech to text in real-time.
- Run the command. It will start listening immediately.
- Speak into your microphone.
- Press Ctrl+C to stop recording.
- The transcribed text is copied to your clipboard.
- Optionally, use the --llm flag to have an Ollama model clean up the raw transcript (fixing punctuation, etc.).
How to Use It:
- Simple Transcription: agent-cli transcribe --input-device-index 1
- With LLM Cleanup: agent-cli transcribe --input-device-index 1 --llm
See the output of agent-cli transcribe --help
Usage: agent-cli transcribe [OPTIONS]
Wyoming ASR Client for streaming microphone audio to a transcription server.
Options
  --extra-instructions TEXT  Additional instructions for the LLM to process the transcription. [default: None]
  --help  Show this message and exit.
Provider Selection
  --asr-provider TEXT  The ASR provider to use ('local' for Wyoming, 'openai'). [default: local]
  --llm-provider TEXT  The LLM provider to use ('local' for Ollama, 'openai', 'gemini'). [default: local]
ASR (Audio) Configuration
  --input-device-index INTEGER  Index of the PyAudio input device to use. [default: None]
  --input-device-name TEXT  Device name keywords for partial matching. [default: None]
  --list-devices  List available audio input and output devices and exit.
ASR (Audio) Configuration: Wyoming (local)
  --asr-wyoming-ip TEXT  Wyoming ASR server IP address. [default: localhost]
  --asr-wyoming-port INTEGER  Wyoming ASR server port. [default: 10300]
ASR (Audio) Configuration: OpenAI
  --asr-openai-model TEXT  The OpenAI model to use for ASR (transcription). [default: whisper-1]
LLM Configuration: Ollama (local)
  --llm-ollama-model TEXT  The Ollama model to use. Default is qwen3:4b. [default: qwen3:4b]
  --llm-ollama-host TEXT  The Ollama server host. Default is http://localhost:11434. [default: http://localhost:11434]
LLM Configuration: OpenAI
  --llm-openai-model TEXT  The OpenAI model to use for LLM tasks. [default: gpt-4o-mini]
  --openai-api-key TEXT  Your OpenAI API key. Can also be set with the OPENAI_API_KEY environment variable. [env var: OPENAI_API_KEY] [default: None]
LLM Configuration: Gemini
  --llm-gemini-model TEXT  The Gemini model to use for LLM tasks. [default: gemini-2.5-flash]
  --gemini-api-key TEXT  Your Gemini API key. Can also be set with the GEMINI_API_KEY environment variable. [env var: GEMINI_API_KEY] [default: None]
LLM Configuration
  --llm --no-llm  Use an LLM to process the transcript. [default: no-llm]
Process Management Options
  --stop  Stop any running background process.
  --status  Check if a background process is running.
  --toggle  Toggle the background process on/off. If the process is running, it will be stopped. If the process is not running, it will be started.
General Options
  --clipboard --no-clipboard  Copy result to clipboard. [default: clipboard]
  --log-level TEXT  Set logging level. [default: WARNING]
  --log-file TEXT  Path to a file to write logs to. [default: None]
  --quiet -q  Suppress console output from rich.
  --config TEXT  Path to a TOML configuration file. [default: None]
  --print-args  Print the command line arguments, including variables taken from the configuration file.
  --transcription-log PATH  Path to log transcription results with timestamps, hostname, model, and raw output. [default: None]
speak
Purpose: Reads any text out loud.
Workflow: A straightforward text-to-speech utility.
- It takes text from a command-line argument or your clipboard.
- It sends the text to a TTS server (Piper HTTP server, Wyoming TTS, OpenAI, or Kokoro).
- The generated audio is played through your default speakers.
TTS Provider Options:
- Piper HTTP Server (default local): Fast, high-quality TTS via HTTP
  - Start server: ./scripts/run-piper-server.sh
  - Use: agent-cli speak --tts-provider piper "Hello, world!"
- Wyoming Piper: Alternative Wyoming protocol interface
  - Start server: ./scripts/run-piper-wyoming.sh
  - Use: agent-cli speak --tts-provider local "Hello, world!"
- OpenAI: Cloud-based TTS (requires API key)
  - Use: agent-cli speak --tts-provider openai "Hello, world!"
- Kokoro: High-quality local TTS (optional setup)
  - Use: agent-cli speak --tts-provider kokoro "Hello, world!"
How to Use It:
- Speak from Argument: agent-cli speak "Hello, world!"
- Speak from Clipboard: agent-cli speak
- Save to File: agent-cli speak "Hello" --save-file hello.wav
- With Piper HTTP: agent-cli speak --tts-provider piper "Hello"
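The provider list above boils down to a small lookup table: which server script (if any) has to be running, and which --tts-provider value to pass. The following sketch captures that table in Python. `START_SCRIPTS` and `speak_plan` are hypothetical names used for illustration; the script paths and provider names are the ones listed above.

```python
from typing import Optional

# Provider name -> script that starts its local server.
# None means the list above names no start script for that provider.
START_SCRIPTS: dict[str, Optional[str]] = {
    "piper": "./scripts/run-piper-server.sh",
    "local": "./scripts/run-piper-wyoming.sh",
    "openai": None,
    "kokoro": None,
}


def speak_plan(provider: str, text: Optional[str] = None) -> list[list[str]]:
    """Return the commands involved, in order: server start (if any), then speak.

    The start script keeps running, so launch it in its own terminal
    rather than waiting for it to finish.
    """
    if provider not in START_SCRIPTS:
        raise ValueError(f"unknown TTS provider: {provider!r}")
    plan: list[list[str]] = []
    script = START_SCRIPTS[provider]
    if script is not None:
        plan.append([script])
    command = ["agent-cli", "speak", "--tts-provider", provider]
    if text is not None:
        # Without a text argument the tool reads from the clipboard.
        command.append(text)
    plan.append(command)
    return plan
```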
See the output of agent-cli speak --help
Usage: agent-cli speak [OPTIONS] [TEXT]
Convert text to speech using Wyoming or OpenAI TTS server.
General Options
  text [TEXT]  Text to speak. Reads from clipboard if not provided. [default: None]
Options
  --help  Show this message and exit.
Provider Selection
  --tts-provider TEXT  The TTS provider to use ('local' for Wyoming, 'openai', 'kokoro'). [default: local]
TTS (Text-to-Speech) Configuration
  --output-device-index INTEGER  Index of the PyAudio output device to use for TTS. [default: None]
  --output-device-name TEXT  Output device name keywords for partial matching. [default: None]
  --tts-speed FLOAT  Speech speed multiplier (1.0 = normal, 2.0 = twice as fast, 0.5 = half speed). [default: 1.0]
TTS (Text-to-Speech) Configuration: Wyoming (local)
  --tts-wyoming-ip TEXT  Wyoming TTS server IP address. [default: localhost]
  --tts-wyoming-port INTEGER  Wyoming TTS server port. [default: 10200]
  --tts-wyoming-voice TEXT  Voice name to use for Wyoming TTS (e.g., 'en_US-lessac-medium'). [default: None]
  --tts-wyoming-language TEXT  Language for Wyoming TTS (e.g., 'en_US'). [default: None]
  --tts-wyoming-speaker TEXT  Speaker name for Wyoming TTS voice. [default: None]
TTS (Text-to-Speech) Configuration: OpenAI
  --tts-openai-model TEXT  The OpenAI model to use for TTS. [default: tts-1]
  --tts-openai-voice TEXT  The voice to use for OpenAI TTS. [default: alloy]
TTS (Text-to-Speech) Configuration: Kokoro
  --tts-kokoro-model TEXT  The Kokoro model to use for TTS. [default: kokoro]
  --tts-kokoro-voice TEXT  The voice to use for Kokoro TTS. [default: af_sky]
  --tts-kokoro-host TEXT  The base URL for the Kokoro API. [default: http://localhost:8880/v1]
TTS (Text-to-Speech) Configuration: Piper
  --tts-piper-host TEXT  The base URL for the Piper HTTP server. [default: http://localhost:10200]
  --tts-piper-voice TEXT  The voice to use for Piper TTS (optional). [default: None]
  --tts-piper-speaker TEXT  The speaker to use for multi-speaker voices (optional). [default: None]
  --tts-piper-speaker-id INTEGER  The speaker ID to use for multi-speaker voices (optional, overrides speaker). [default: None]
  --tts-piper-length-scale FLOAT  Speaking speed (1.0 = normal speed). [default: 1.0]
  --tts-piper-noise-scale FLOAT  Speaking variability (optional). [default: None]
  --tts-piper-noise-w-scale FLOAT  Phoneme width variability (optional). [default: None]
ASR (Audio) Configuration
  --list-devices  List available audio input and output devices and exit.
General Options
  --save-file PATH  Save TTS response audio to WAV file. [default: None]
  --log-level TEXT  Set logging level. [default: WARNING]
  --log-file TEXT  Path to a file to write logs to. [default: None]
  --quiet -q  Suppress console output from rich.
  --config TEXT  Path to a TOML configuration file. [default: None]
  --print-args  Print the command line arguments, including variables taken from the configuration file.
Process Management Options
  --stop  Stop any running background process.
  --status  Check if a background process is running.
  --toggle  Toggle the background process on/off. If the process is running, it will be stopped. If the process is not running, it will be started.
voice-edit
Purpose: A powerful clipboard assistant that you command with your voice.
Workflow: This agent is designed for a hotkey-driven workflow to act on text you've already copied.
- Copy a block of text to your clipboard (e.g., an email draft).
- Press a hotkey to run agent-cli voice-edit & in the background. The agent is now listening.
- Speak a command, such as "Make this more formal" or "Summarize the key points."
- Press the same hotkey again, which should trigger agent-cli voice-edit --stop.
- The agent transcribes your command, sends it along with the original clipboard text to the LLM, and the LLM performs the action.
- The result is copied back to your clipboard. If --tts is enabled, it will also speak the result.
How to Use It: The power of this tool is unlocked with a hotkey manager like Keyboard Maestro (macOS) or AutoHotkey (Windows). See the docstring in agent_cli/agents/voice_edit.py for a detailed Keyboard Maestro setup guide.
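The "same hotkey starts and stops" behavior in the workflow above is a toggle around a background process. One generic way to build such a toggle is a PID file, sketched below. This is not taken from agent-cli's source: `is_running`, `toggle`, and the PID file location are hypothetical, and the sketch is POSIX-only because it relies on `os.kill` with signal 0 to probe for a process.

```python
import os
import signal
import subprocess
from pathlib import Path


def is_running(pid_file: Path) -> bool:
    """True if pid_file names a process that still exists."""
    try:
        pid = int(pid_file.read_text())
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except (FileNotFoundError, ValueError, ProcessLookupError):
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def toggle(pid_file: Path, command: list[str]) -> str:
    """Stop the recorded process if it is running, otherwise start one."""
    if is_running(pid_file):
        # SIGINT is what Ctrl+C sends, so the process can shut down cleanly.
        os.kill(int(pid_file.read_text()), signal.SIGINT)
        pid_file.unlink()
        return "stopped"
    process = subprocess.Popen(command)
    pid_file.write_text(str(process.pid))
    return "started"
```

In practice you may not need any of this: the help output for voice-edit lists --toggle, --status, and --stop options, so a hotkey can call the tool's own process management directly.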
See the output of agent-cli voice-edit --help
Usage: agent-cli voice-edit [OPTIONS]
Interact with clipboard text via a voice command using local or remote services.
Usage:
- Run in foreground: agent-cli voice-edit --input-device-index 1
- Run in background: agent-cli voice-edit --input-device-index 1 &
- Check status: agent-cli voice-edit --status
- Stop background process: agent-cli voice-edit --stop
- List output devices: agent-cli voice-edit --list-output-devices
- Save TTS to file: agent-cli voice-edit --tts --save-file response.wav
Options
  --help  Show this message and exit.
Provider Selection
  --asr-provider TEXT  The ASR provider to use ('local' for Wyoming, 'openai'). [default: local]
  --llm-provider TEXT  The LLM provider to use ('local' for Ollama, 'openai', 'gemini'). [default: local]
  --tts-provider TEXT  The TTS provider to use ('local' for Wyoming, 'openai', 'kokoro'). [default: local]
ASR (Audio) Configuration
  --input-device-index INTEGER  Index of the PyAudio input device to use. [default: None]
  --input-device-name TEXT  Device name keywords for partial matching. [default: None]
  --list-devices  List available audio input and output devices and exit.
ASR (Audio) Configuration: Wyoming (local)
  --asr-wyoming-ip TEXT  Wyoming ASR server IP address. [default: localhost]
  --asr-wyoming-port INTEGER  Wyoming ASR server port. [default: 10300]
ASR (Audio) Configuration: OpenAI
  --asr-openai-model TEXT  The OpenAI model to use for ASR (transcription). [default: whisper-1]
LLM Configuration: Ollama (local)
  --llm-ollama-model TEXT  The Ollama model to use. Default is qwen3:4b. [default: qwen3:4b]
  --llm-ollama-host TEXT  The Ollama server host. Default is http://localhost:11434. [default: http://localhost:11434]
LLM Configuration: OpenAI
  --llm-openai-model TEXT  The OpenAI model to use for LLM tasks. [default: gpt-4o-mini]
  --openai-api-key TEXT  Your OpenAI API key. Can also be set with the OPENAI_API_KEY environment variable. [env var: OPENAI_API_KEY] [default: None]
LLM Configuration: Gemini
  --llm-gemini-model TEXT  The Gemini model to use for LLM tasks. [default: gemini-2.5-flash]
  --gemini-api-key TEXT  Your Gemini API key. Can also be set with the GEMINI_API_KEY environment variable. [env var: GEMINI_API_KEY] [default: None]
TTS (Text-to-Speech) Configuration
  --tts --no-tts  Enable text-to-speech for responses. [default: no-tts]
  --output-device-index INTEGER  Index of the PyAudio output device to use for TTS. [default: None]
  --output-device-name TEXT  Output device name keywords for partial matching. [default: None]
  --tts-speed FLOAT  Speech speed multiplier (1.0 = normal, 2.0 = twice as fast, 0.5 = half speed). [default: 1.0]
TTS (Text-to-Speech) Configuration: Wyoming (local)
  --tts-wyoming-ip TEXT  Wyoming TTS server IP address. [default: localhost]
  --tts-wyoming-port INTEGER  Wyoming TTS server port. [default: 10200]
  --tts-wyoming-voice TEXT  Voice name to use for Wyoming TTS (e.g., 'en_US-lessac-medium'). [default: None]
  --tts-wyoming-language TEXT  Language for Wyoming TTS (e.g., 'en_US'). [default: None]
  --tts-wyoming-speaker TEXT  Speaker name for Wyoming TTS voice. [default: None]
TTS (Text-to-Speech) Configuration: OpenAI
  --tts-openai-model TEXT  The OpenAI model to use for TTS. [default: tts-1]
  --tts-openai-voice TEXT  The voice to use for OpenAI TTS. [default: alloy]
TTS (Text-to-Speech) Configuration: Kokoro
  --tts-kokoro-model TEXT  The Kokoro model to use for TTS. [default: kokoro]
  --tts-kokoro-voice TEXT  The voice to use for Kokoro TTS. [default: af_sky]
  --tts-kokoro-host TEXT  The base URL for the Kokoro API. [default: http://localhost:8880/v1]
TTS (Text-to-Speech) Configuration: Piper
  --tts-piper-host TEXT  The base URL for the Piper HTTP server. [default: http://localhost:10200]
  --tts-piper-voice TEXT  The voice to use for Piper TTS (optional). [default: None]
  --tts-piper-speaker TEXT  The speaker to use for multi-speaker voices (optional). [default: None]
  --tts-piper-speaker-id INTEGER  The speaker ID to use for multi-speaker voices (optional, overrides speaker). [default: None]
  --tts-piper-length-scale FLOAT  Speaking speed (1.0 = normal speed). [default: 1.0]
  --tts-piper-noise-scale FLOAT  Speaking variability (optional). [default: None]
  --tts-piper-noise-w-scale FLOAT  Phoneme width variability (optional). [default: None]
Process Management Options
  --stop  Stop any running background process.
  --status  Check if a background process is running.
  --toggle  Toggle the background process on/off. If the process is running, it will be stopped. If the process is not running, it will be started.
General Options
  --save-file PATH  Save TTS response audio to WAV file. [default: None]
  --clipboard --no-clipboard  Copy result to clipboard. [default: clipboard]
  --log-level TEXT  Set logging level. [default: WARNING]
  --log-file TEXT  Path to a file to write logs to. [default: None]
  --quiet -q  Suppress console output from rich.
  --config TEXT  Path to a TOML configuration file. [default: None]
  --print-args  Print the command line arguments, including variables taken from the configuration file.
assistant
Purpose: A hands-free voice assistant that starts and stops recording based on a wake word.
Workflow: This agent continuously listens for a wake word (e.g., "Hey Nabu").
- Run the assistant command. It will start listening for the wake word.
- Say the wake word to start recording.
- Speak your command or question.
- Say the wake word again to stop recording.
- The agent transcribes your speech, sends it to the LLM, and gets a response.
- The agent speaks the response back to you and then immediately starts listening for the wake word again.
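The wake-word toggle in the steps above can be pictured as a two-state machine. The following is only a minimal sketch of that idea, not agent-cli's actual implementation: the `State` enum, the `run_turns` function, and the `("wake", None)` / `("speech", text)` event tuples are hypothetical names that stand in for the real audio pipeline.

```python
from enum import Enum, auto


class State(Enum):
    WAITING = auto()    # listening only for the wake word
    RECORDING = auto()  # capturing the spoken command


def run_turns(events):
    """Replay a stream of events and return the commands that were captured.

    `events` is a hypothetical stand-in for the audio pipeline: each item is
    either ("wake", None) for a detected wake word or ("speech", text).
    """
    state = State.WAITING
    buffer = []
    commands = []
    for kind, text in events:
        if kind == "wake":
            if state is State.WAITING:
                # First wake word: start a fresh recording.
                state = State.RECORDING
                buffer = []
            else:
                # Second wake word: close the recording and hand it off.
                commands.append(" ".join(buffer))
                state = State.WAITING
        elif kind == "speech" and state is State.RECORDING:
            buffer.append(text)
        # Speech heard while WAITING is ignored.
    return commands
```

In this sketch, anything said before the first wake word or after the second one is dropped, which mirrors the "say the wake word again to stop recording" step.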
How to Use It:
- Start the agent:
  agent-cli assistant --wake-word "ok_nabu" --input-device-index 1
- With TTS:
  agent-cli assistant --wake-word "ok_nabu" --tts --tts-wyoming-voice "en_US-lessac-medium"
See the output of agent-cli assistant --help
Usage: agent-cli assistant [OPTIONS]
Wake word-based voice assistant using local or remote services.
╭─ Options ────────────────────────────────────────────────────────────────────────────╮
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Provider Selection ─────────────────────────────────────────────────────────────────╮
│ --asr-provider TEXT The ASR provider to use ('local' for Wyoming, 'openai'). │
│ [default: local] │
│ --llm-provider TEXT The LLM provider to use ('local' for Ollama, 'openai', │
│ 'gemini'). │
│ [default: local] │
│ --tts-provider TEXT The TTS provider to use ('local' for Wyoming, 'openai', │
│ 'kokoro'). │
│ [default: local] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Wake Word Options ──────────────────────────────────────────────────────────────────╮
│ --wake-server-ip TEXT Wyoming wake word server IP address. │
│ [default: localhost] │
│ --wake-server-port INTEGER Wyoming wake word server port. [default: 10400] │
│ --wake-word TEXT Name of wake word to detect (e.g., 'ok_nabu', │
│ 'hey_jarvis'). │
│ [default: ok_nabu] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ ASR (Audio) Configuration ──────────────────────────────────────────────────────────╮
│ --input-device-index INTEGER Index of the PyAudio input device to use. │
│ [default: None] │
│ --input-device-name TEXT Device name keywords for partial matching. │
│ [default: None] │
│ --list-devices List available audio input and output devices and │
│ exit. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ ASR (Audio) Configuration: Wyoming (local) ─────────────────────────────────────────╮
│ --asr-wyoming-ip TEXT Wyoming ASR server IP address. [default: localhost] │
│ --asr-wyoming-port INTEGER Wyoming ASR server port. [default: 10300] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ ASR (Audio) Configuration: OpenAI ──────────────────────────────────────────────────╮
│ --asr-openai-model TEXT The OpenAI model to use for ASR (transcription). │
│ [default: whisper-1] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM Configuration: Ollama (local) ──────────────────────────────────────────────────╮
│ --llm-ollama-model TEXT The Ollama model to use. Default is qwen3:4b. │
│ [default: qwen3:4b] │
│ --llm-ollama-host TEXT The Ollama server host. Default is │
│ http://localhost:11434. │
│ [default: http://localhost:11434] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM Configuration: OpenAI ──────────────────────────────────────────────────────────╮
│ --llm-openai-model TEXT The OpenAI model to use for LLM tasks. │
│ [default: gpt-4o-mini] │
│ --openai-api-key TEXT Your OpenAI API key. Can also be set with the │
│ OPENAI_API_KEY environment variable. │
│ [env var: OPENAI_API_KEY] │
│ [default: None] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM Configuration: Gemini ──────────────────────────────────────────────────────────╮
│ --llm-gemini-model TEXT The Gemini model to use for LLM tasks. │
│ [default: gemini-2.5-flash] │
│ --gemini-api-key TEXT Your Gemini API key. Can also be set with the │
│ GEMINI_API_KEY environment variable. │
│ [env var: GEMINI_API_KEY] │
│ [default: None] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ TTS (Text-to-Speech) Configuration ─────────────────────────────────────────────────╮
│ --tts --no-tts Enable text-to-speech for responses. │
│ [default: no-tts] │
│ --output-device-index INTEGER Index of the PyAudio output device to │
│ use for TTS. │
│ [default: None] │
│ --output-device-name TEXT Output device name keywords for partial │
│ matching. │
│ [default: None] │
│ --tts-speed FLOAT Speech speed multiplier (1.0 = normal, │
│ 2.0 = twice as fast, 0.5 = half speed). │
│ [default: 1.0] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ TTS (Text-to-Speech) Configuration: Wyoming (local) ────────────────────────────────╮
│ --tts-wyoming-ip TEXT Wyoming TTS server IP address. │
│ [default: localhost] │
│ --tts-wyoming-port INTEGER Wyoming TTS server port. [default: 10200] │
│ --tts-wyoming-voice TEXT Voice name to use for Wyoming TTS (e.g., │
│ 'en_US-lessac-medium'). │
│ [default: None] │
│ --tts-wyoming-language TEXT Language for Wyoming TTS (e.g., 'en_US'). │
│ [default: None] │
│ --tts-wyoming-speaker TEXT Speaker name for Wyoming TTS voice. │
│ [default: None] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ TTS (Text-to-Speech) Configuration: OpenAI ─────────────────────────────────────────╮
│ --tts-openai-model TEXT The OpenAI model to use for TTS. [default: tts-1] │
│ --tts-openai-voice TEXT The voice to use for OpenAI TTS. [default: alloy] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ TTS (Text-to-Speech) Configuration: Kokoro ─────────────────────────────────────────╮
│ --tts-kokoro-model TEXT The Kokoro model to use for TTS. [default: kokoro] │
│ --tts-kokoro-voice TEXT The voice to use for Kokoro TTS. [default: af_sky] │
│ --tts-kokoro-host TEXT The base URL for the Kokoro API. │
│ [default: http://localhost:8880/v1] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ TTS (Text-to-Speech) Configuration: Piper ──────────────────────────────────────────╮
│ --tts-piper-host TEXT The base URL for the Piper HTTP server. │
│ [default: http://localhost:10200] │
│ --tts-piper-voice TEXT The voice to use for Piper TTS (optional). │
│ [default: None] │
│ --tts-piper-speaker TEXT The speaker to use for multi-speaker voices │
│ (optional). │
│ [default: None] │
│ --tts-piper-speaker-id INTEGER The speaker ID to use for multi-speaker │
│ voices (optional, overrides speaker). │
│ [default: None] │
│ --tts-piper-length-scale FLOAT Speaking speed (1.0 = normal speed). │
│ [default: 1.0] │
│ --tts-piper-noise-scale FLOAT Speaking variability (optional). │
│ [default: None] │
│ --tts-piper-noise-w-scale FLOAT Phoneme width variability (optional). │
│ [default: None] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Process Management Options ─────────────────────────────────────────────────────────╮
│ --stop Stop any running background process. │
│ --status Check if a background process is running. │
│ --toggle Toggle the background process on/off. If the process is running, it │
│ will be stopped. If the process is not running, it will be started. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ General Options ────────────────────────────────────────────────────────────────────╮
│ --save-file PATH Save TTS response audio to WAV file. │
│ [default: None] │
│ --clipboard --no-clipboard Copy result to clipboard. │
│ [default: clipboard] │
│ --log-level TEXT Set logging level. [default: WARNING] │
│ --log-file TEXT Path to a file to write logs to. │
│ [default: None] │
│ --quiet -q Suppress console output from rich. │
│ --config TEXT Path to a TOML configuration file. │
│ [default: None] │
│ --print-args Print the command line arguments, including │
│ variables taken from the configuration file. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
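The help text above says that --openai-api-key and --gemini-api-key can also be supplied through the OPENAI_API_KEY and GEMINI_API_KEY environment variables. As a rough sketch of how such a fallback typically behaves, the helper below prefers an explicit flag value and otherwise reads the environment. The function name `resolve_api_key` is hypothetical, and the flag-over-environment precedence is an assumption based on how command-line libraries usually handle this, not something the help output states.

```python
import os


def resolve_api_key(cli_value, env_var, environ=os.environ):
    """Return the API key from the flag if given, else from the environment.

    Assumption: an explicit command-line value wins over the environment
    variable. Returns None when neither source provides a key.
    """
    if cli_value:
        return cli_value
    return environ.get(env_var)
```

With neither source set, the sketch returns None, matching the [default: None] shown for both key options.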
Purpose: A full-featured, conversational AI assistant that can interact with your system.
Workflow: This is a persistent agent that keeps a spoken conversation going across multiple turns.
- Run the chat command. It will start listening for your voice.
- Speak your command or question (e.g., "What's in my current directory?").
- The agent transcribes your speech, sends it to the LLM, and gets a response. The LLM can use tools like read_file or execute_code to answer your question.
- The agent speaks the response back to you and then immediately starts listening for your next command.
- The conversation continues in this loop. Conversation history is saved between sessions.
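The chat command exposes a --last-n-messages option (default 50, with 0 disabling history) that limits how much saved history is included in the conversation. A minimal sketch of that windowing is shown below. The function name `select_history` and the plain-list representation of messages are assumptions for illustration; the on-disk history format is not described here.

```python
def select_history(messages, last_n_messages=50):
    """Return the most recent messages to send along with a new prompt.

    A value of 0 disables history entirely. The explicit check matters
    because in Python `messages[-0:]` would return the whole list.
    """
    if last_n_messages <= 0:
        return []
    return messages[-last_n_messages:]
```

For example, with five stored messages and a limit of 2, only the last two would be passed along.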
Interaction Model:
- To Interrupt: Press Ctrl+C once to stop the agent from either listening or speaking, and it will immediately return to a listening state for a new command. This is useful if it misunderstands you or you want to speak again quickly.
- To Exit: Press Ctrl+C twice in a row to terminate the application.
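One way to tell a single Ctrl+C from a double press is to compare timestamps of consecutive presses. The sketch below illustrates that idea only. The `InterruptTracker` class and the one-second window are assumptions, since the text above says "twice in a row" without giving a time limit, and the real agent may decide this differently.

```python
import time


class InterruptTracker:
    """Classify Ctrl+C presses as 'interrupt' (single) or 'exit' (double)."""

    def __init__(self, window_seconds=1.0, clock=time.monotonic):
        # window_seconds is an assumed threshold, not a documented value.
        self.window_seconds = window_seconds
        self.clock = clock
        self.last_press = None

    def press(self):
        """Record a press and report how it should be treated."""
        now = self.clock()
        is_double = (
            self.last_press is not None
            and now - self.last_press <= self.window_seconds
        )
        self.last_press = now
        return "exit" if is_double else "interrupt"
```

In a real loop, `press()` would be called from an `except KeyboardInterrupt:` handler, returning to the listening state on "interrupt" and leaving the loop on "exit".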
How to Use It:
- Start the agent:
  agent-cli chat --input-device-index 1 --tts
- Have a conversation:
  - You: "Read the pyproject.toml file and tell me the project version."
  - AI: (Reads file) "The project version is 0.1.0."
  - You: "Thanks!"
See the output of agent-cli chat --help
Usage: agent-cli chat [OPTIONS]
An chat agent that you can talk to.
╭─ Options ────────────────────────────────────────────────────────────────────────────╮
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Provider Selection ─────────────────────────────────────────────────────────────────╮
│ --asr-provider TEXT The ASR provider to use ('local' for Wyoming, 'openai'). │
│ [default: local] │
│ --llm-provider TEXT The LLM provider to use ('local' for Ollama, 'openai', │
│ 'gemini'). │
│ [default: local] │
│ --tts-provider TEXT The TTS provider to use ('local' for Wyoming, 'openai', │
│ 'kokoro'). │
│ [default: local] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ ASR (Audio) Configuration ──────────────────────────────────────────────────────────╮
│ --input-device-index INTEGER Index of the PyAudio input device to use. │
│ [default: None] │
│ --input-device-name TEXT Device name keywords for partial matching. │
│ [default: None] │
│ --list-devices List available audio input and output devices and │
│ exit. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ ASR (Audio) Configuration: Wyoming (local) ─────────────────────────────────────────╮
│ --asr-wyoming-ip TEXT Wyoming ASR server IP address. [default: localhost] │
│ --asr-wyoming-port INTEGER Wyoming ASR server port. [default: 10300] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ ASR (Audio) Configuration: OpenAI ──────────────────────────────────────────────────╮
│ --asr-openai-model TEXT The OpenAI model to use for ASR (transcription). │
│ [default: whisper-1] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM Configuration: Ollama (local) ──────────────────────────────────────────────────╮
│ --llm-ollama-model TEXT The Ollama model to use. Default is qwen3:4b. │
│ [default: qwen3:4b] │
│ --llm-ollama-host TEXT The Ollama server host. Default is │
│ http://localhost:11434. │
│ [default: http://localhost:11434] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM Configuration: OpenAI ──────────────────────────────────────────────────────────╮
│ --llm-openai-model TEXT The OpenAI model to use for LLM tasks. │
│ [default: gpt-4o-mini] │
│ --openai-api-key TEXT Your OpenAI API key. Can also be set with the │
│ OPENAI_API_KEY environment variable. │
│ [env var: OPENAI_API_KEY] │
│ [default: None] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM Configuration: Gemini ──────────────────────────────────────────────────────────╮
│ --llm-gemini-model TEXT The Gemini model to use for LLM tasks. │
│ [default: gemini-2.5-flash] │
│ --gemini-api-key TEXT Your Gemini API key. Can also be set with the │
│ GEMINI_API_KEY environment variable. │
│ [env var: GEMINI_API_KEY] │
│ [default: None] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ TTS (Text-to-Speech) Configuration ─────────────────────────────────────────────────╮
│ --tts --no-tts Enable text-to-speech for responses. │
│ [default: no-tts] │
│ --output-device-index INTEGER Index of the PyAudio output device to │
│ use for TTS. │
│ [default: None] │
│ --output-device-name TEXT Output device name keywords for partial │
│ matching. │
│ [default: None] │
│ --tts-speed FLOAT Speech speed multiplier (1.0 = normal, │
│ 2.0 = twice as fast, 0.5 = half speed). │
│ [default: 1.0] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ TTS (Text-to-Speech) Configuration: Wyoming (local) ────────────────────────────────╮
│ --tts-wyoming-ip TEXT Wyoming TTS server IP address. │
│ [default: localhost] │
│ --tts-wyoming-port INTEGER Wyoming TTS server port. [default: 10200] │
│ --tts-wyoming-voice TEXT Voice name to use for Wyoming TTS (e.g., │
│ 'en_US-lessac-medium'). │
│ [default: None] │
│ --tts-wyoming-language TEXT Language for Wyoming TTS (e.g., 'en_US'). │
│ [default: None] │
│ --tts-wyoming-speaker TEXT Speaker name for Wyoming TTS voice. │
│ [default: None] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ TTS (Text-to-Speech) Configuration: OpenAI ─────────────────────────────────────────╮
│ --tts-openai-model TEXT The OpenAI model to use for TTS. [default: tts-1] │
│ --tts-openai-voice TEXT The voice to use for OpenAI TTS. [default: alloy] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ TTS (Text-to-Speech) Configuration: Kokoro ─────────────────────────────────────────╮
│ --tts-kokoro-model TEXT The Kokoro model to use for TTS. [default: kokoro] │
│ --tts-kokoro-voice TEXT The voice to use for Kokoro TTS. [default: af_sky] │
│ --tts-kokoro-host TEXT The base URL for the Kokoro API. │
│ [default: http://localhost:8880/v1] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ TTS (Text-to-Speech) Configuration: Piper ──────────────────────────────────────────╮
│ --tts-piper-host TEXT The base URL for the Piper HTTP server. │
│ [default: http://localhost:10200] │
│ --tts-piper-voice TEXT The voice to use for Piper TTS (optional). │
│ [default: None] │
│ --tts-piper-speaker TEXT The speaker to use for multi-speaker voices │
│ (optional). │
│ [default: None] │
│ --tts-piper-speaker-id INTEGER The speaker ID to use for multi-speaker │
│ voices (optional, overrides speaker). │
│ [default: None] │
│ --tts-piper-length-scale FLOAT Speaking speed (1.0 = normal speed). │
│ [default: 1.0] │
│ --tts-piper-noise-scale FLOAT Speaking variability (optional). │
│ [default: None] │
│ --tts-piper-noise-w-scale FLOAT Phoneme width variability (optional). │
│ [default: None] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Process Management Options ─────────────────────────────────────────────────────────╮
│ --stop Stop any running background process. │
│ --status Check if a background process is running. │
│ --toggle Toggle the background process on/off. If the process is running, it │
│ will be stopped. If the process is not running, it will be started. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ History Options ────────────────────────────────────────────────────────────────────╮
│ --history-dir PATH Directory to store conversation history. │
│ [default: ~/.config/agent-cli/history] │
│ --last-n-messages INTEGER Number of messages to include in the conversation │
│ history. Set to 0 to disable history. │
│ [default: 50] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ General Options ────────────────────────────────────────────────────────────────────╮
│ --save-file PATH Save TTS response audio to WAV file. [default: None] │
│ --log-level TEXT Set logging level. [default: WARNING] │
│ --log-file TEXT Path to a file to write logs to. [default: None] │
│ --quiet -q Suppress console output from rich. │
│ --config TEXT Path to a TOML configuration file. [default: None] │
│ --print-args Print the command line arguments, including variables │
│ taken from the configuration file. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
The project uses pytest for testing. To run tests using uv:
uv run pytest
This project uses pre-commit hooks (ruff for linting and formatting, mypy for type checking) to maintain code quality. To set them up:
- Install pre-commit:
  pip install pre-commit
- Install the hooks:
  pre-commit install
Now, the hooks will run automatically before each commit.
Contributions are welcome! If you find a bug or have a feature request, please open an issue. If you'd like to contribute code, please fork the repository and submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.