Skilled developers build muscle memory and navigate the terminal effortlessly, instantly recalling commands and aliases. But what if we could make things... a lot less precise?
Introducing Talk To Your Terminal - a new, wildly unpredictable way to interact with your computer. Using your voice, you can finally express joy and frustration at your terminal in a natural manner.
Demo video: 2025-05-06_ttyt.mov
You'll need:
- A Deepgram API key for speech-to-text. Deepgram provides 200 USD of free credits, which is enough for many hours of work.
- PortAudio for cross-platform audio I/O, also available via Homebrew.
- LM Studio, plus your favorite local model.
- pyenv to manage the Python version.
- poetry for dependency management.
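Before running the setup, it can help to confirm that the command-line prerequisites are on your `PATH`. The snippet below is a minimal sketch and not part of the project: the `missing_tools` helper and the `REQUIRED_TOOLS` list are hypothetical, and `make` is included only because the setup steps use make targets. PortAudio is a library rather than an executable, so it is not covered by this check.

```python
import shutil

# Hypothetical list of executables: pyenv and poetry come from the
# requirements above; make is assumed because setup uses make targets.
REQUIRED_TOOLS = ("pyenv", "poetry", "make")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All command-line prerequisites found.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is what you want.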
- Create a venv, and run setup to enter your Deepgram API key:

      make create-venv
      make init

- Run the tests:

      make test

- Run the agent:

      make run

- Talk to your terminal! We try to safeguard against some destructive commands (e.g. deleting files).
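To illustrate what a safeguard against destructive commands might look like, here is a minimal sketch of a denylist check. It is an assumption about one possible approach, not the project's actual implementation: the `is_destructive` function, the `DESTRUCTIVE_COMMANDS` and `WRAPPERS` sets, and the choice of blocked executables are all hypothetical. It relies on Python 3.8+, where `shlex` supports `punctuation_chars` together with `whitespace_split`.

```python
import os
import shlex

# Hypothetical denylist of executables that delete or overwrite data.
DESTRUCTIVE_COMMANDS = frozenset({"rm", "rmdir", "shred", "dd", "mkfs", "truncate"})
# Hypothetical prefixes that simply run the command that follows them.
WRAPPERS = frozenset({"sudo", "command", "exec", "nohup", "time"})
# Characters that shlex groups into shell operator tokens (;, &&, |, >, ...).
PUNCTUATION = frozenset("();<>|&")


def tokenize(command):
    """Split a shell command into words and operator tokens."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""  # do not silently drop text after '#'
    return list(lexer)


def is_destructive(command):
    """Return True if any command in the line is on the denylist."""
    try:
        tokens = tokenize(command)
    except ValueError:
        return True  # unbalanced quotes: refuse rather than guess

    at_command_start = True
    for token in tokens:
        if token and set(token) <= PUNCTUATION:
            at_command_start = True  # an operator starts a new command
            continue
        if not at_command_start:
            continue  # plain argument, not an executable name
        name = os.path.basename(token)
        if name in WRAPPERS:
            continue  # the real executable comes next
        if name in DESTRUCTIVE_COMMANDS or name.startswith("mkfs."):
            return True
        at_command_start = False
    return False
```

A denylist like this is easy to bypass (for example with backticks, `find -delete`, `xargs rm`, or `sudo -u user rm`), so it is best treated as a first filter and paired with an explicit confirmation step before anything is executed.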
The goal of this project is to learn about voice-based computer use with local (small) models, and to see how far current speech models have come in this domain.
Although this is not a serious contender in the race for computer use agents, there are some genuine use cases for voice control:
- Voice provides a communication channel that does not interfere with keyboard and mouse usage. You can simultaneously talk and type.
- If you are completely unfamiliar with the shell or Bash, a natural language interface can show you the commands to run. Of course, you could just use the keyboard here.
- If you have a visual impairment, audio becomes the main way to read the screen. There are much better ways to use audio, such as the excellent NVDA screen reader.