Offline Speech-to-Text for Arch Linux (Wayland)

⚠️ Heads-up: this project was vibe-coded together with AI helpers (Claude Code and Codex). I am not a Python developer. If you hit issues, please debug with AI or your own expertise, fix them, and send a PR. Treat this repo as a “here’s how it can work” manual rather than a guaranteed turnkey solution. It works on my Omarchy (Arch Linux) setup, and I’m sharing the path that got me there.

This project reproduces the hands-free dictation setup used on Arch Linux with a Wayland compositor. A dedicated key listener records audio while you hold a hotkey and forwards the audio to Faster Whisper. Once transcription finishes, the recognised text is typed into the focused window via ydotool.

The repository contains ready-to-use Python scripts, configuration templates, and systemd unit files so you can replicate the complete workflow on your own machine.

Features

Hold-to-talk workflow – press and hold a configurable key (e.g., Right Ctrl) to record; release to transcribe and type the text.
Wayland-compatible typing – uses ydotool instead of xdotool, so it works on Sway, Hyprland, GNOME, KDE, etc.
Offline transcription – powered by Faster Whisper running locally on CPU (can be upgraded to GPU if desired).
Systemd integration – both the key listener and ydotoold daemon are managed as services and start automatically after boot.

Repository Layout

.
├── config.example.py        # Template with all tunable settings
├── key_listener.py          # Root hotkey listener (records audio, launches STT)
├── requirements.txt         # Python dependencies
├── speech_to_text.py        # Faster Whisper transcription + ydotool typing
├── systemd/
│   ├── speech-to-text-listener.service  # Service for key_listener.py
│   └── ydotoold.service                  # ydotool daemon with boot sequencing fix
└── LICENSE

Copy config.example.py to config.py and adjust it for your environment before starting the services.

Prerequisites (Arch Linux)

Audio & input utilities
```
sudo pacman -S alsa-utils python-evdev
```
Wayland automation tools
```
sudo pacman -S ydotool
```
ydotool lives in the community repository. If you are using another distribution, install it from your package manager or build from source.
Optional key remapping – if you plan to trigger dictation with a mouse button or unusual key, install a remapper such as input-remapper or Sway/Hyprland keybinds.
Python 3.10+ – required for the virtual environment and Faster Whisper.

GPU acceleration (optional): install CUDA / ROCm drivers and replace the Python dependencies with the GPU build of PyTorch plus faster-whisper configured for your accelerator. The README covers CPU-only setup for reliability.

Installation

1. Clone the repository

sudo mkdir -p /opt
sudo chown "$USER" /opt
cd /opt
git clone https://github.com/omarchy/speech-to-text.git
cd speech-to-text

Feel free to adjust the target path, but remember to update the systemd unit files accordingly.

2. Configure Python environment

python -m venv venv
source venv/bin/activate
pip install --upgrade pip wheel
pip install -r requirements.txt

The default requirements.txt installs a CPU version of Faster Whisper (faster-whisper, numpy, soundfile, evdev).

3. Prepare `config.py`

cp config.example.py config.py

Edit config.py and review every option:

TARGET_USER – the desktop user that owns the Wayland session (receives typed text).
DEVICE_PATH – the /dev/input/event* device that should trigger recording. Use sudo evtest to discover the correct device and key codes.
TRIGGER_KEYCODE – the key code reported by evtest while you press the hotkey (default: KEY_RIGHTCTRL).
AUDIO_FILE – temporary WAV file location (default /tmp/recorded_audio.wav).
PYTHON_VENV & SPEECH_TO_TEXT_SCRIPT – paths to the interpreter and transcription script. Defaults assume the project lives in /opt/speech-to-text.
WHISPER_MODEL_SIZE / WHISPER_COMPUTE_TYPE – pick another model (e.g. tiny, medium) or precision if desired.
YDOTOOL_SOCKET – matches the socket path created by the systemd unit (/run/user/<uid>/.ydotool_socket).

4. Install systemd units

Copy the service files and adjust them for your UID/GID and project path.

sudo install -m 0644 systemd/ydotoold.service /etc/systemd/system/ydotoold.service
sudo install -m 0644 systemd/speech-to-text-listener.service /etc/systemd/system/speech-to-text-listener.service

Edit /etc/systemd/system/ydotoold.service:

Replace every occurrence of 1000 with your user’s numeric UID and GID (see id -u, id -g).
Update the socket path if you changed it in config.py.

Edit /etc/systemd/system/speech-to-text-listener.service:

Update WorkingDirectory and ExecStart so they match the absolute project path and Python interpreter inside your virtual environment.

Reload systemd and enable the services:

sudo systemctl daemon-reload
sudo systemctl enable --now ydotoold.service
sudo systemctl enable --now speech-to-text-listener.service

5. Verify services

Ensure ydotoold created the socket:
```
ls -l /run/user/<uid>/.ydotool_socket
```

Monitor logs:

journalctl -u ydotoold.service -b
journalctl -u speech-to-text-listener.service -b

The key listener should log that it is watching KEY_RIGHTCTRL (or whichever key you configure) and transitions through recording and transcription when you test it.

How It Works

┌──────────────────────────┐
│ key_listener.py (root)   │
│  • watches DEVICE_PATH   │
│  • starts/stops arecord  │
│  • calls speech_to_text  │
└────────────┬─────────────┘
             │ WAV file
             ▼
┌──────────────────────────┐
│ speech_to_text.py (root) │
│  • loads Faster Whisper  │
│  • transcribes segments  │
│  • uses ydotool type     │
└────────────┬─────────────┘
             │ text events via ydotool
             ▼
      Active application

Key points:

key_listener.py must run as root to read /dev/input and to interact with sudo -u <user> arecord. The actual audio capture happens as the unprivileged desktop user, so PulseAudio/PipeWire routing behaves normally.
speech_to_text.py runs as root but inherits the user’s runtime environment (XDG_RUNTIME_DIR, Wayland display) so ydotool can access the compositor socket. The service fixes a boot timing race by ensuring the user runtime directory exists before ydotoold starts.

Testing Without systemd

You can run everything manually before enabling the units:

sudo ./venv/bin/python key_listener.py

Then hold the configured hotkey. You should see logs similar to:

INFO: Starting audio recording
INFO: Recording started with PID ...
INFO: Stopping audio recording
INFO: Running speech-to-text
INFO: Recognised: ...
INFO: Typed text successfully

If typing fails, check that ydotoold is running and the socket path matches config.py.

Troubleshooting

Error: [Errno 19] No such device – DEVICE_PATH in config.py is wrong or the device id changes between boots. Re-run sudo evtest and update the path.
failed to connect socket '/run/user/1000/.ydotool_socket' – ydotoold did not start or the runtime directory was re-created after boot. Confirm the service uses the modified unit provided here.
arecord command fails – install alsa-utils and confirm the microphone works (arecord -f S16_LE -r 16000 test.wav).
Whisper model loads slowly – larger models can take several seconds. Consider the tiny or base model for faster start, or configure GPU acceleration.
Typing lag – ydotool sends events sequentially. If performance is an issue, experiment with the ydotool type --delay flag by modifying speech_to_text.py.

Security Notes

Both services run as root. Restrict access to the repository directory and review the scripts before installing on production machines.
key_listener.py invokes sudo -u <TARGET_USER> arecord .... Ensure the root account can run sudo without prompting (the default for root).
The scripts type whatever Faster Whisper recognises. Consider adding keyword filtering if you plan to use it in sensitive contexts.

Extending / Customising

Change the trigger key by editing the KEY_RIGHTCTRL check in key_listener.py or remap your preferred key/button to Right Ctrl using input-remapper or compositor keybinds.
To support multiple hotkeys or languages, extend speech_to_text.py to pick models dynamically or to send the text to other applications (e.g., copy to clipboard instead of typing).
GPU users can install torch + faster-whisper with device="cuda" in speech_to_text.py and adjust WHISPER_COMPUTE_TYPE to float16 for a large speed boost.

License & Credits

Distributed under the MIT License (see LICENSE). The original idea and much of the inspiration comes from CDNsun’s “Speech-to-Text for Ubuntu” article; this repository adapts that work for Arch Linux + Wayland with additional boot-order fixes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Offline Speech-to-Text for Arch Linux (Wayland)

Features

Repository Layout

Prerequisites (Arch Linux)

Installation

1. Clone the repository

2. Configure Python environment

3. Prepare `config.py`

4. Install systemd units

5. Verify services

How It Works

Testing Without systemd

Troubleshooting

Security Notes

Extending / Customising

License & Credits

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
systemd		systemd
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
config.example.py		config.example.py
key_listener.py		key_listener.py
requirements.txt		requirements.txt
speech_to_text.py		speech_to_text.py

License

michabbb/omarchy-speech-to-text

Folders and files

Latest commit

History

Repository files navigation

Offline Speech-to-Text for Arch Linux (Wayland)

Features

Repository Layout

Prerequisites (Arch Linux)

Installation

1. Clone the repository

2. Configure Python environment

3. Prepare config.py

4. Install systemd units

5. Verify services

How It Works

Testing Without systemd

Troubleshooting

Security Notes

Extending / Customising

License & Credits

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

3. Prepare `config.py`

Packages