Vocal Web

Vocal Web is a voice-controlled browser extension that lets users navigate and interact with the web using natural language. It combines an LLM, which translates natural-language commands into high-level action plans, with lightweight heuristic-based execution for fast, reliable interactions. Browsing is intuitive and fast compared to compute-heavy alternatives such as Claude in Chrome, though some capability is traded for that speed.

Demos

Voice input: "Show me cheap flights from Istanbul to New York on January 30th"

book_flight_demo.mp4

Voice input: buying a speaker on eBay

shopping_demo.mp4

Voice input: "I want to watch a Dwarkesh podcast video."

youtubePodcast.mp4

Text input: "Search for the Wikipedia article on The French Revolution"

wikipediaSearch.mp4

Documentation

  • Architecture and workflow: ARCHITECTURE.md
  • Folder summaries: **/SUMMARY.md

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • uv
  • direnv (recommended for environment management)

Quickstart

  1. Install Python deps with uv: uv sync.
  2. Set up direnv once, then create your local env file:
    • cp .envrc.example .envrc
    • Generate a strong key for VCAA_API_KEY (minimum 32 characters, letters/numbers/-_ only) using openssl rand -hex 32
    • Run mkcert -install && mkcert localhost 127.0.0.1 ::1 for locally trusted certificates, then point SSL_KEYFILE/SSL_CERTFILE at the generated files in .envrc.
    • Fill in the other secrets, then run direnv allow
    • Keeping asi1-mini as the model is recommended; it is currently free and performs well.
  3. Install JS tooling and build the extension bundle:
    npm install
    npm run build:ext
  4. Load the extension/dist/ folder as an unpacked extension in Chrome.
  5. Start the HTTP API bridge: uv run python -m agents.api_server (defaults to port 8081).
  6. Open the extension and paste the authentication key into the API Key field in settings.
  7. Test the extension on an active webpage: press and hold cmd/ctrl+shift+L to activate voice input.
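The command-line portion of the steps above can be condensed into one shell sketch (assuming uv, direnv, mkcert, npm, and openssl are already installed; adjust paths and secrets for your machine):

```shell
# 1. Python dependencies
uv sync

# 2. Environment: copy the template, generate a strong key, create local certs
cp .envrc.example .envrc
openssl rand -hex 32        # use the output as VCAA_API_KEY in .envrc
mkcert -install && mkcert localhost 127.0.0.1 ::1
# point SSL_KEYFILE / SSL_CERTFILE in .envrc at the generated cert files
direnv allow                # run after filling in the remaining secrets

# 3. Build the extension bundle (output lands in extension/dist/)
npm install
npm run build:ext

# 5. Start the HTTP API bridge (defaults to port 8081)
uv run python -m agents.api_server
```

Steps 4, 6, and 7 (loading extension/dist/ as an unpacked extension, pasting the API key into the settings field, and holding cmd/ctrl+shift+L) happen inside Chrome itself.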

Security

  • See docs/security/tls-setup.md for TLS/HTTPS setup and operational security guidance.
  • This tool automates multi-step browser actions and may interact with logged-in accounts, modify data, or take unintended actions. Prompt injection or malicious web content may influence its behavior. Using a sandboxed environment for safety is recommended. Use at your own discretion.
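Since the bridge's safety rests on the strength of the API key, a quick sanity check that a freshly generated key satisfies the documented VCAA_API_KEY rules (minimum 32 characters; letters, digits, `-` and `_` only) might look like this — the `key` variable name is illustrative:

```shell
# Generate a candidate key and verify it against the documented constraints
key="$(openssl rand -hex 32)"        # produces 64 hex characters
if [ "${#key}" -ge 32 ] && printf '%s' "$key" | grep -Eq '^[A-Za-z0-9_-]+$'; then
  echo "key ok: ${#key} characters"
fi
```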

Next steps

  • Currently creating my own dataset to further improve the element selection algorithms and create challenging tests. The use of language models in the navigator component will likely be reintroduced using a process-of-elimination approach once the selection algorithms are good enough to make it worth the additional cost/compute.
  • Will make <3B local open-source models available for increased privacy and free operation.

Accessibility Goals

  • We started building this project at a hackathon with the idea that LLMs could help make the web more accessible, particularly for individuals who face challenges using traditional input devices like keyboards and mice. There is a long way to go before this can be considered a true accessibility tool, as large performance improvements are still needed, but I'm very excited to keep building. If you have ideas or feedback, I'd love to hear from you.
