Local speech-to-text that runs entirely on your machine. No API keys, no cloud, no subscriptions. Click the orb, speak, get text.
- Click the orb to start recording
- Speak — the orb reacts to your voice in real-time (WebGL shader + FFT audio analysis)
- Click again to stop — transcription happens locally via faster-whisper
- Text auto-copies to your clipboard
The window is transparent and always-on-top — it floats over your desktop like a glass HUD.
| Layer | Tech |
|---|---|
| Window | Electron (frameless, transparent, always-on-top) |
| UI | HTML/CSS glass panels + Three.js WebGL orb with custom GLSL shaders |
| Audio | Web Audio API (FFT visualization) + Python sounddevice (capture) |
| Transcription | faster-whisper medium model (~1.5 GB) |
| Backend | Python Flask server on localhost |
| GPU | CUDA float16 (auto-falls back to CPU int8) |
Prerequisites: Python 3.10+, Node.js 18+, a microphone
# Clone
git clone https://github.com/desertcache/velvet.git
cd velvet
# Python dependencies
pip install -r requirements.txt
# Electron
npm install
# Launch
start.bat # Windows
npx electron . # Any platformElectron automatically starts the Python backend — no need to run it separately.
First run: The whisper model (~1.5 GB) downloads automatically. This takes a few minutes depending on your connection. The status will show "Loading model..." until it's ready.
CUDA is used automatically if available — transcription is significantly faster on GPU. If you don't have a CUDA GPU, it falls back to CPU (int8) with no extra configuration.
For CUDA support, you need the NVIDIA cuDNN and cuBLAS libraries installed. See the faster-whisper docs for details.
- Windows: Transparent window with acrylic blur works out of the box
- macOS/Linux: Electron transparency works but the acrylic blur effect may vary by compositor
The UI has three layers:
- Glass panels — CSS
backdrop-filter: blur(8px)with near-zero opacity backgrounds over a transparent Electron window - Soul Orb — A Three.js sphere with custom vertex/fragment shaders. Simplex noise deforms the mesh, fresnel shading creates edge glow, and voice amplitude drives real-time shape morphing
- Waveform visualizer — Canvas-drawn frequency bars from Web Audio API FFT data, masked with a gradient fade
The orb has four visual states (IDLE / LISTENING / PROCESSING / SPEAKING), each defining colors, noise parameters, scale, and shape. All transitions are lerped frame-by-frame for smooth organic motion.
velvet/
├── main.js # Electron main process (window config + spawns Flask)
├── preload.js # IPC bridge (minimize/close)
├── index.html # UI — glass panels, controls, waveform canvas
├── orb.js # Three.js SoulOrb — WebGL shaders, state machine, audio reactivity
├── server.py # Flask backend — mic capture, whisper transcription
├── start.bat # Windows launcher
├── requirements.txt # Python deps
└── package.json # Electron dep
The Flask backend runs on localhost:5111:
| Endpoint | Method | Description |
|---|---|---|
/status |
GET | Model load state + device (CUDA/CPU) |
/record |
POST | Start mic capture |
/stop |
POST | Stop capture |
/transcribe |
POST | Run whisper on captured audio, return text |
MIT


