Vocalance offers hands-free control of your computer, enabling you to switch tabs, move the cursor on screen, dictate anywhere, and much more!
To find out more about what Vocalance can do, including detailed instructions and guides, refer to the official website.
Vocalance can be set up entirely from the source code in this repository. To do so, follow the instructions below (currently only supported on Windows):
- Open Windows PowerShell and enter the script below to install UV (a Python package manager):

  ```powershell
  powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
  ```
- Add UV to your PATH (this applies to the current terminal session only; repeat this step each time, or add UV to your permanent PATH to skip it, as shown in the note below):

  ```powershell
  $env:Path = "$HOME\.local\bin;$env:Path"
  ```
- Create a Python 3.13.9 virtual environment named `vocalance_env` with UV:

  ```powershell
  uv venv --python 3.13.9 vocalance_env
  ```
- Activate the environment:

  ```powershell
  vocalance_env\Scripts\activate
  ```
- Clone the repository:

  ```powershell
  git clone https://github.com/rick12000/vocalance.git
  ```
- Go to the repository directory:

  ```powershell
  cd vocalance
  ```
- Install Vocalance from `uv.lock`:

  ```powershell
  uv sync --active
  ```
- Run the application:

  ```powershell
  python vocalance.py
  ```
On first run, the application will start up and download any required models (such as speech recognition models) from Hugging Face or other reputable hosts. This may take several minutes depending on your internet connection.
Then you're good to go! If you haven't already, refer to Vocalance's official website for instructions on how to get started.
To reopen Vocalance after you've closed it, repeat the steps above, skipping the installation steps.
Specifically, open a new Windows PowerShell window and enter the following chained commands (taken from the setup section):
$env:Path = "$HOME\.local\bin;$env:Path"; vocalance_env\Scripts\activate; cd vocalance; uv sync --active; python vocalance.pyThis will start Vocalance.
The recommended approach is to install Vocalance with uv, since the developers can freeze and document all recommended dependencies in a `uv.lock` file, which you then install with `uv sync --active`.
If you're more familiar with a mixture of a virtual environment manager (e.g. venv, conda, or pyenv) plus pip, however, you can absolutely replace the uv steps above with your environment manager and replace `uv sync --active` with `pip install .` to install Vocalance as a package (see the sketch below). Note that this is at your discretion, and the license disclosures in this repository pertain to the pinned package versions in `uv.lock`.
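As a rough illustration of that alternative (a sketch assuming Python's built-in venv module in a Windows PowerShell session; substitute the equivalent conda or pyenv commands as appropriate):

```powershell
# Create and activate a virtual environment with the built-in venv module
python -m venv vocalance_env
vocalance_env\Scripts\activate

# Clone the repository and install Vocalance as a package with pip
git clone https://github.com/rick12000/vocalance.git
cd vocalance
pip install .

# Run the application
python vocalance.py
```

Keep in mind that pip will resolve dependency versions itself rather than using the versions pinned in `uv.lock`.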
- Operating System: Windows 10/11 (macOS and Linux support planned)
- RAM: 2 GB
- Disk: 5 GB of free space
- Hardware: A reasonably good headset or microphone is strongly recommended to improve Vocalance's output and recognition quality, but it will still work without one.
Reach out at [email protected] with the subject line "Contribution" if:
- You have software engineering experience and have feedback on how the architecture of the application could be improved.
- You want to add an original or pre-approved feature.
For now, contributions will be handled on an ad hoc basis, but formal contribution guidelines will be set up in future depending on the number of contributors.
If you want to find out more about Vocalance's architecture, refer to the technical documentation pages:
- Developer Introduction - Brief overview of the main architecture and component flow
- Audio Processing - Audio capture and speech recognition
- Command System - Command parsing and execution
- Dictation - Transcription and formatting
- User Interface - UI components and interactions
- Infrastructure - Event bus and service communication
The following features are planned additions to Vocalance, with some in early development and others under consideration:
- Eye Tracking for Cursor Control: Enable cursor control via eye movements.
  - Gaze Tracking Accuracy: Merge gaze tracking with historical screen click data and screen contents to improve accuracy, aiming for good performance even with webcam tracking.
  - Zoom Option: Add a zoom option to better direct gaze on screen contents.
- Context-Aware Commands: Implement context bucketing for commands, allowing the same command phrase (e.g., "previous") to map to different hotkeys depending on the active application (e.g., VSCode vs. Chrome). This aims to avoid disambiguation phrases (see the sketch after this list).
- LLM-Powered Text Refactoring: Select any text and reformat it via an LLM by speaking a prompt.
- Improved Text Editing & Navigation: Further enhancements to text editing and text navigation tools.
- Enhanced Predictive Features: Improve predictive capabilities based on window contents, recent context, gaze patterns, and more.
- Privacy Note: Any feature requiring local storage of potentially sensitive data (e.g., screenshots, window contents) will be deployed as an opt-in feature and disabled by default.
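To make the context bucketing idea concrete, here is a minimal sketch (hypothetical names and hotkeys throughout; this is not Vocalance's actual command system, just an illustration of the same phrase resolving to different hotkeys per application):

```python
# Hypothetical illustration of context-bucketed commands: the spoken phrase
# "previous" resolves to a different hotkey depending on the active app.

# Buckets keyed by the active application's process name; "default" is the fallback.
COMMAND_BUCKETS = {
    "Code.exe": {"previous": "ctrl+pageup"},       # VSCode: previous editor tab
    "chrome.exe": {"previous": "ctrl+shift+tab"},  # Chrome: previous browser tab
    "default": {"previous": "alt+left"},           # Elsewhere: generic "back"
}

def resolve_hotkey(phrase: str, active_app: str) -> str | None:
    """Look up a phrase's hotkey, preferring the active app's bucket."""
    bucket = COMMAND_BUCKETS.get(active_app, COMMAND_BUCKETS["default"])
    return bucket.get(phrase) or COMMAND_BUCKETS["default"].get(phrase)

print(resolve_hotkey("previous", "Code.exe"))     # ctrl+pageup
print(resolve_hotkey("previous", "notepad.exe"))  # alt+left
```

Because the lookup is keyed on the active application, no disambiguation phrase is needed: the user says the same word everywhere and the context supplies the meaning.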