⚠️ Heads-up: this project was vibe-coded together with AI helpers (Claude Code and Codex). I am not a Python developer. If you hit issues, please debug with AI or your own expertise, fix them, and send a PR. Treat this repo as a “here’s how it can work” manual rather than a guaranteed turnkey solution. It works on my Omarchy (Arch Linux) setup, and I’m sharing the path that got me there.
This project reproduces the hands-free dictation setup used on Arch Linux with a Wayland compositor. A dedicated key listener records audio while you hold a hotkey and forwards the audio to Faster Whisper. Once transcription finishes, the recognised text is typed into the focused window via ydotool.
The repository contains ready-to-use Python scripts, configuration templates, and systemd unit files so you can replicate the complete workflow on your own machine.
- Hold-to-talk workflow – press and hold a configurable key (e.g., Right Ctrl) to record; release to transcribe and type the text.
- Wayland-compatible typing – uses `ydotool` instead of `xdotool`, so it works on Sway, Hyprland, GNOME, KDE, etc.
- Offline transcription – powered by Faster Whisper running locally on CPU (can be upgraded to GPU if desired).
- Systemd integration – both the key listener and the `ydotoold` daemon are managed as services and start automatically after boot.
```
.
├── config.example.py                    # Template with all tunable settings
├── key_listener.py                      # Root hotkey listener (records audio, launches STT)
├── requirements.txt                     # Python dependencies
├── speech_to_text.py                    # Faster Whisper transcription + ydotool typing
├── systemd/
│   ├── speech-to-text-listener.service  # Service for key_listener.py
│   └── ydotoold.service                 # ydotool daemon with boot sequencing fix
└── LICENSE
```
Copy `config.example.py` to `config.py` and adjust it for your environment before starting the services.
- Audio & input utilities

  ```sh
  sudo pacman -S alsa-utils python-evdev
  ```

- Wayland automation tools

  ```sh
  sudo pacman -S ydotool
  ```

  `ydotool` lives in the `community` repository. If you are using another distribution, install it from your package manager or build from source.
- Optional key remapping – if you plan to trigger dictation with a mouse button or unusual key, install a remapper such as `input-remapper` or use Sway/Hyprland keybinds.
- Python 3.10+ – required for the virtual environment and Faster Whisper.
GPU acceleration (optional): install CUDA / ROCm drivers and replace the Python dependencies with the GPU build of PyTorch plus `faster-whisper` configured for your accelerator. This README covers the CPU-only setup for reliability.
```sh
sudo mkdir -p /opt
sudo chown "$USER" /opt
cd /opt
git clone https://github.com/omarchy/speech-to-text.git
cd speech-to-text
```

Feel free to adjust the target path, but remember to update the systemd unit files accordingly.
```sh
python -m venv venv
source venv/bin/activate
pip install --upgrade pip wheel
pip install -r requirements.txt
```

The default `requirements.txt` installs a CPU version of Faster Whisper (`faster-whisper`, `numpy`, `soundfile`, `evdev`).
```sh
cp config.example.py config.py
```

Edit `config.py` and review every option:

- `TARGET_USER` – the desktop user that owns the Wayland session (receives the typed text).
- `DEVICE_PATH` – the `/dev/input/event*` device that should trigger recording. Use `sudo evtest` to discover the correct device and key codes.
- `TRIGGER_KEYCODE` – the key code reported by `evtest` while you press the hotkey (default: `KEY_RIGHTCTRL`).
- `AUDIO_FILE` – temporary WAV file location (default `/tmp/recorded_audio.wav`).
- `PYTHON_VENV` & `SPEECH_TO_TEXT_SCRIPT` – paths to the interpreter and transcription script. Defaults assume the project lives in `/opt/speech-to-text`.
- `WHISPER_MODEL_SIZE` / `WHISPER_COMPUTE_TYPE` – pick another model (e.g. `tiny`, `medium`) or precision if desired.
- `YDOTOOL_SOCKET` – must match the socket path created by the systemd unit (`/run/user/<uid>/.ydotool_socket`).
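For orientation, a filled-in `config.py` might look like this. Every value below is an illustrative assumption for one particular machine, not a recommended default:

```python
# Illustrative config.py; variable names follow the documented options,
# but the concrete values (user, device, paths) are assumptions.
TARGET_USER = "alice"                          # desktop user that receives the typed text
DEVICE_PATH = "/dev/input/event3"              # keyboard device found via `sudo evtest`
TRIGGER_KEYCODE = "KEY_RIGHTCTRL"              # key code reported by evtest
AUDIO_FILE = "/tmp/recorded_audio.wav"         # temporary recording location
PYTHON_VENV = "/opt/speech-to-text/venv/bin/python"
SPEECH_TO_TEXT_SCRIPT = "/opt/speech-to-text/speech_to_text.py"
WHISPER_MODEL_SIZE = "base"                    # tiny / base / small / medium / ...
WHISPER_COMPUTE_TYPE = "int8"                  # CPU-friendly precision
YDOTOOL_SOCKET = "/run/user/1000/.ydotool_socket"  # must match ydotoold.service
```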
Copy the service files and adjust them for your UID/GID and project path.
```sh
sudo install -m 0644 systemd/ydotoold.service /etc/systemd/system/ydotoold.service
sudo install -m 0644 systemd/speech-to-text-listener.service /etc/systemd/system/speech-to-text-listener.service
```

Edit `/etc/systemd/system/ydotoold.service`:

- Replace every occurrence of `1000` with your user’s numeric UID and GID (see `id -u`, `id -g`).
- Update the socket path if you changed it in `config.py`.
Edit `/etc/systemd/system/speech-to-text-listener.service`:

- Update `WorkingDirectory` and `ExecStart` so they match the absolute project path and the Python interpreter inside your virtual environment.
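As an illustration only, an edited listener unit might end up looking roughly like this; all paths, descriptions, and ordering directives below are assumptions, so start from the unit shipped in `systemd/` rather than this sketch:

```ini
# Hypothetical speech-to-text-listener.service after editing
[Unit]
Description=Speech-to-text hotkey listener
After=ydotoold.service
Requires=ydotoold.service

[Service]
Type=simple
WorkingDirectory=/opt/speech-to-text
ExecStart=/opt/speech-to-text/venv/bin/python /opt/speech-to-text/key_listener.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
```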
Reload systemd and enable the services:
```sh
sudo systemctl daemon-reload
sudo systemctl enable --now ydotoold.service
sudo systemctl enable --now speech-to-text-listener.service
```

- Ensure `ydotoold` created the socket: `ls -l /run/user/<uid>/.ydotool_socket`
- Monitor logs:

  ```sh
  journalctl -u ydotoold.service -b
  journalctl -u speech-to-text-listener.service -b
  ```
The key listener should log that it is watching `KEY_RIGHTCTRL` (or whichever key you configured) and should transition through the recording and transcription states when you test it.
```
┌──────────────────────────┐
│ key_listener.py (root)   │
│ • watches DEVICE_PATH    │
│ • starts/stops arecord   │
│ • calls speech_to_text   │
└────────────┬─────────────┘
             │ WAV file
             ▼
┌──────────────────────────┐
│ speech_to_text.py (root) │
│ • loads Faster Whisper   │
│ • transcribes segments   │
│ • uses ydotool type      │
└────────────┬─────────────┘
             │ text events via ydotool
             ▼
     Active application
```
Key points:

- `key_listener.py` must run as root to read `/dev/input`; it launches recording via `sudo -u <user> arecord`, so the actual audio capture happens as the unprivileged desktop user and PulseAudio/PipeWire routing behaves normally.
- `speech_to_text.py` runs as root but inherits the user’s runtime environment (`XDG_RUNTIME_DIR`, Wayland display) so `ydotool` can access the compositor socket.
- The `ydotoold` service fixes a boot timing race by ensuring the user runtime directory exists before `ydotoold` starts.
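The hold-to-talk flow in `key_listener.py` reduces to a small state machine: key-down spawns `arecord` as the desktop user, key-up stops it and hands the WAV to the transcription step. A minimal sketch under those assumptions; the `HoldToTalk` helper and config values are illustrative, not the repository's actual code:

```python
import subprocess

TARGET_USER = "alice"                   # assumed config values, mirroring config.py
AUDIO_FILE = "/tmp/recorded_audio.wav"

def build_record_cmd():
    # arecord runs as the desktop user so PulseAudio/PipeWire routing works
    return ["sudo", "-u", TARGET_USER, "arecord",
            "-f", "S16_LE", "-r", "16000", AUDIO_FILE]

class HoldToTalk:
    """Hypothetical hold-to-talk state machine: one recording at a time."""

    def __init__(self):
        self.proc = None

    def on_key(self, pressed):
        if pressed and self.proc is None:
            # key-down: start recording
            self.proc = subprocess.Popen(build_record_cmd())
        elif not pressed and self.proc is not None:
            # key-up: stop arecord, then hand the WAV to speech_to_text.py
            self.proc.terminate()
            self.proc.wait()
            self.proc = None
```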
You can run everything manually before enabling the units:
```sh
sudo ./venv/bin/python key_listener.py
```

Then hold the configured hotkey. You should see logs similar to:

```
INFO: Starting audio recording
INFO: Recording started with PID ...
INFO: Stopping audio recording
INFO: Running speech-to-text
INFO: Recognised: ...
INFO: Typed text successfully
```
If typing fails, check that `ydotoold` is running and that the socket path matches `config.py`.
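For reference, the typing step amounts to invoking `ydotool type` with the daemon's socket path exported explicitly. A sketch of how that invocation could be assembled; the socket value and the helper name are assumptions, not the repository's actual code:

```python
import os

YDOTOOL_SOCKET = "/run/user/1000/.ydotool_socket"  # assumed; match your unit file

def build_type_invocation(text):
    # ydotool locates ydotoold through the YDOTOOL_SOCKET environment
    # variable, so pass it explicitly instead of relying on inheritance.
    env = dict(os.environ, YDOTOOL_SOCKET=YDOTOOL_SOCKET)
    return ["ydotool", "type", text], env

cmd, env = build_type_invocation("hello from whisper")
# subprocess.run(cmd, env=env, check=True)  # requires a running ydotoold
```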
- `Error: [Errno 19] No such device` – `DEVICE_PATH` in `config.py` is wrong or the device id changes between boots. Re-run `sudo evtest` and update the path.
- `failed to connect socket '/run/user/1000/.ydotool_socket'` – `ydotoold` did not start or the runtime directory was re-created after boot. Confirm the service uses the modified unit provided here.
- `arecord` command fails – install `alsa-utils` and confirm the microphone works (`arecord -f S16_LE -r 16000 test.wav`).
- Whisper model loads slowly – larger models can take several seconds. Consider the `tiny` or `base` model for a faster start, or configure GPU acceleration.
- Typing lag – `ydotool` sends events sequentially. If performance is an issue, experiment with the `ydotool type --delay` flag by modifying `speech_to_text.py`.
- Both services run as root. Restrict access to the repository directory and review the scripts before installing on production machines.
- `key_listener.py` invokes `sudo -u <TARGET_USER> arecord ...`. Ensure the root account can run `sudo` without prompting (the default for root).
- The scripts type whatever Faster Whisper recognises. Consider adding keyword filtering if you plan to use it in sensitive contexts.
- Change the trigger key by editing the `KEY_RIGHTCTRL` check in `key_listener.py`, or remap your preferred key/button to Right Ctrl using `input-remapper` or compositor keybinds.
- To support multiple hotkeys or languages, extend `speech_to_text.py` to pick models dynamically or to send the text to other applications (e.g., copy to clipboard instead of typing).
- GPU users can install `torch` + `faster-whisper` with `device="cuda"` in `speech_to_text.py` and adjust `WHISPER_COMPUTE_TYPE` to `float16` for a large speed boost.
Distributed under the MIT License (see LICENSE). The original idea and much of the inspiration comes from CDNsun’s “Speech-to-Text for Ubuntu” article; this repository adapts that work for Arch Linux + Wayland with additional boot-order fixes.