VYOM is built on a Modular Multi-Threaded Architecture. Unlike assistants that listen, process, and respond in a single sequential loop, VYOM decouples peripheral I/O (Voice/Listen) from core logic (NLP/Action) so that the UI does not freeze and the assistant stays responsive while a command is being processed.
The following diagram illustrates how a voice command propagates through the modular layers:
graph TD
subgraph Input_Layer [Perception]
A[🎤 Voice Input] -->|PyAudio / SpeechRecognition| B(Audio Stream)
B -->|Whisper / Google API| C{Speech-to-Text}
end
subgraph Brain_Layer [Processing]
C -->|Raw Text| D[🧠 NLP Engine]
D -->|Intent Extraction| E{Action Router}
end
subgraph Execution_Layer [Action]
E -->|System Cmd| F[OS Controller]
E -->|Web Query| G[Browser Automation]
E -->|API Call| H[Weather/IoT/News]
end
subgraph Output_Layer [Feedback]
F & G & H --> I[🗣️ TTS Engine]
I --> J[🔊 Speaker Output]
end
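The routing step in the diagram (NLP Engine to Action Router to one of three execution targets) can be sketched as a small dispatch table. This is only an illustration, not VYOM's actual code: the handler functions, the `ROUTES` table, and the keyword rules in `extract_intent` are all hypothetical placeholders for the real NLP engine and controllers.

```python
from typing import Callable, Dict


# Hypothetical handlers standing in for the Execution layer nodes.
def os_controller(text: str) -> str:
    return f"[system] would run: {text}"


def browser_automation(text: str) -> str:
    return f"[web] would search: {text}"


def api_call(text: str) -> str:
    return f"[api] would fetch: {text}"


# Intent name -> handler. The intent names are assumptions for this sketch.
ROUTES: Dict[str, Callable[[str], str]] = {
    "system": os_controller,
    "web": browser_automation,
    "api": api_call,
}


def extract_intent(text: str) -> str:
    """Toy keyword matcher; a placeholder for the real NLP engine."""
    lowered = text.lower()
    if lowered.startswith(("open ", "close ", "shutdown")):
        return "system"
    if "weather" in lowered or "news" in lowered:
        return "api"
    return "web"  # fall back to a web query


def route(text: str) -> str:
    """Pick a handler for the transcribed text and return the reply for TTS."""
    return ROUTES[extract_intent(text)](text)
```

Keeping the mapping in a dictionary means a new execution target only needs one handler function and one table entry, which fits the modular layout described here.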
To maintain the "Always Listening" capability while executing heavy AI tasks, VYOM utilizes Python's threading and asyncio modules:
- Thread 1 (Listener): Continuously monitors the microphone for the wake word.
- Thread 2 (Processor): Handles API calls to Groq/Cohere without blocking the listener.
- Thread 3 (Executor): Manages OS-level tasks and GUI updates.
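The three-thread split above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's implementation: it hands work between threads with `queue.Queue` (an assumption, the source only names `threading` and `asyncio`), leaves `asyncio` out for brevity, and replaces the microphone loop and the Groq/Cohere call with placeholders. All function names are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel that tells downstream threads to shut down


def listener(commands, out_q):
    """Stand-in for the microphone loop: forwards each recognized command."""
    for text in commands:
        out_q.put(text)
    out_q.put(STOP)


def processor(in_q, out_q):
    """Stand-in for the API call; runs without blocking the listener."""
    while True:
        item = in_q.get()
        if item is STOP:
            out_q.put(STOP)  # pass the shutdown signal along
            break
        out_q.put(f"action for: {item}")


def executor(in_q, results):
    """Stand-in for OS-level execution: records each action it receives."""
    while True:
        item = in_q.get()
        if item is STOP:
            break
        results.append(item)


def run_pipeline(commands):
    text_q = queue.Queue()
    action_q = queue.Queue()
    results = []
    threads = [
        threading.Thread(target=listener, args=(commands, text_q)),
        threading.Thread(target=processor, args=(text_q, action_q)),
        threading.Thread(target=executor, args=(action_q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each stage only reads from one queue and writes to the next, a slow processing step delays the executor but never stops the listener from accepting new input.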
For SWOC contributors, please refer to this modular map before submitting PRs:
VYOM/
│
├── Backend/ # Core backend logic for the assistant
│   │
│   ├── Automation.py # Handles task automation (system tasks, workflows)
│   ├── ChatBot.py # Manages chatbot logic and conversational flow
│   ├── ImageGeneration.py # Generates images using AI models/APIs
│   ├── Model.py # Loads and manages AI/ML models
│   ├── Productivity.py # Productivity features (notes, reminders, utilities)
│   ├── RealTimeSearchEngine.py # Performs real-time web/search queries
│   ├── SpeechToText.py # Converts spoken audio input into text
│   └── TextToSpeech.py # Converts text responses into spoken audio
│
├── Frontend/ # User interface and client-side logic
│   │
│   ├── Files/ # Runtime data and application state storage
│   │   │
│   │   ├── Database.data # Stores persistent application data
│   │   ├── ImageGeneration.data # Stores image generation history/results
│   │   ├── Mic.data # Stores microphone state and audio metadata
│   │   ├── Responses.data # Stores chatbot responses
│   │   └── Status.data # Tracks application and system status
│   │
│   ├── Graphics/ # UI assets and visual resources
│   │   │
│   │   ├── Chats.png # Chat interface icon/image
│   │   ├── Close.png # Close window button icon
│   │   ├── GUI.py # GUI layout logic using graphical assets
│   │   ├── Home.png # Home screen icon/image
│   │   ├── Mic_off.png # Microphone disabled icon
│   │   ├── Mic_on.png # Microphone enabled icon
│   │   ├── Minimize.png # Minimize window icon
│   │   ├── maximize.png # Maximize window icon
│   │   ├── minimize2.png # Alternate minimize icon
│   │   ├── settings.png # Settings icon
│   │   ├── VYOM.jpeg # Project logo / branding image
│   │   └── jarvis.gif # Animated assistant graphic
│   │
│   ├── automation/ # Frontend automation tests
│   │   └── test_gui.py # Automated tests for GUI behavior
│   │
│   ├── playwright_tests/ # Playwright-based UI testing
│   │   ├── homepage.png # Screenshot of homepage during tests
│   │   ├── index.html # Static test page for UI validation
│   │   └── test_gui.py # Playwright test cases for GUI
│   │
│   ├── tests/ # Frontend test specifications
│   │   └── test_issue4.spec.js # Test case for reported issue #4
│   │
│   ├── GUI.py # Main frontend GUI controller
│   └── test_gui.py # Manual/functional GUI test script
│
├── config/ # Configuration and environment settings
│   │
│   ├── __init__.py # Marks config as a Python package
│   └── settings.py # Centralized configuration variables
│
├── utils/ # Shared utility functions
│   │
│   ├── logger.py # Logging utilities for debugging and monitoring
│   └── memory.py # Memory management and context handling
│
├── .env.example # Sample environment variables file
├── .gitignore # Files and folders ignored by Git
├── CODE_OF_CONDUCT.md # Community guidelines and behavior rules
├── CONTRIBUTING.md # Contribution guidelines for developers
├── LICENSE # Project licensing information
├── README.md # Project overview and documentation
│
├── main.py # Application entry point
├── requirements.txt # Python dependencies list
│
├── test_logger.py # Unit tests for logger utility
└── test_memory.py # Unit tests for memory utility
- Python 3.13+
- FFmpeg (Required for audio processing)
- C++ Build Tools (Required for PyAudio on Windows)
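Two of these prerequisites can be checked from Python before installing anything. The helper below is a hypothetical sketch, not part of the VYOM codebase: it compares the interpreter version against 3.13 and looks for an `ffmpeg` executable on `PATH` with `shutil.which`. It does not attempt to detect the C++ Build Tools.

```python
import shutil
import sys

MIN_PYTHON = (3, 13)  # minimum version from the prerequisites list


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    The parameters exist only so the check can be exercised with fake values.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    if which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```

Running the script prints one line per missing prerequisite and nothing when both checks pass.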
🐧 Linux/Mac Setup (Audio Dependencies)
Most setup errors occur due to missing audio driver headers. Run the following before pip install:
- For Ubuntu/Debian:
sudo apt-get update
sudo apt-get install python3-pyaudio portaudio19-dev libasound2-dev espeak
- For macOS:
brew install portaudio
pip install pyaudio
1. Clone & Environment
git clone https://github.com/th-shivam/vyom.git && cd vyom
python -m venv .venv
source .venv/bin/activate # Mac/Linux
# .venv\Scripts\activate # Windows
2. Install & Run
pip install -r requirements.txt
python main.py
We are proud to be an official part of Social Winter of Code (SWOC) 2026!
We welcome contributors of all skill levels. To ensure a smooth collaboration, please identify your path:
- 🌱 Beginners: Look for issues labeled `good-first-issue` and `documentation`. Perfect for your first PR!
- 🛠️ Advanced: Check for `modular-enhancement` and `threading-optimization` to work on the core engine.
- Fork the repository and create your branch.
- Follow the PEP 8 style guide for Python code.
- Ensure your module is placed in the correct directory (see Project Structure).
- Open a PR with a clear description of your changes.
Full Contributing Guide | Architecture Deep Dive
This project is licensed under the MIT License. You are free to use, modify, and distribute this software, provided the original copyright and license notice are included.
TL;DR: Open-source, permissive, and community-friendly.
See the LICENSE file for the full legal text.
If you find VYOM helpful, don't forget to give it a ⭐!
VYOM v2.0 • Built with 🐍 Python • Focused on 🏗️ Modular Architecture