Vākya, which means "sentence" in Sanskrit, is a conversational AI that lets users talk to a Large Language Model (LLM) using their voice. Speak. Understand. Reply. That’s Vākya: a full loop of human-like conversation, built with Python, driven by FastAPI, and delivered in a crisp, modern UI.
- Voice-to-Voice Interaction: Talk to the AI and hear it respond back in real time.
- Real-time Transcription: See your speech transcribed instantly on screen.
- Conversational Memory: Maintains context within a session for natural dialogue.
- Session Management: Start new chats or revisit past conversations.
- Voice Customization: Choose from multiple voices for responses.
- Fun Skills Built-in: Weather updates, News headlines, Jokes on demand.
- Modern UI: Cute & aesthetic responsive interface with chat history panel.
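The conversational memory and session management features listed above could be backed by something as small as an in-memory store keyed by session ID. The sketch below is only an illustration under that assumption: `SessionStore` and its method names are hypothetical and are not taken from this project's `main.py`.

```python
import uuid
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (role, text), e.g. ("user", "hello")


class SessionStore:
    """Hypothetical in-memory chat sessions (lost when the server restarts)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, List[Message]] = {}

    def new_session(self) -> str:
        """Start a new chat and return its ID."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = []
        return session_id

    def append(self, session_id: str, role: str, text: str) -> None:
        """Record one message so later turns can see the context."""
        self._sessions[session_id].append((role, text))

    def history(self, session_id: str) -> List[Message]:
        """Return a copy of the messages in a past or current session."""
        return list(self._sessions[session_id])

    def list_sessions(self) -> List[str]:
        """Return session IDs in creation order, e.g. for a history panel."""
        return list(self._sessions)
```

A store like this keeps each chat's context separate, so starting a new chat never leaks messages from an earlier one.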
- FastAPI — backend framework for REST + WebSocket support
- Jinja2 — template rendering for frontend
- AssemblyAI — speech-to-text (transcription)
- Google Gemini — LLM for text generation
- Murf.ai — text-to-speech (streaming natural voices)
- python-dotenv — for managing API keys securely
- HTML, CSS, JavaScript — responsive UI (with chat history, persona selection)
The application follows a simple client-server architecture:
- User’s voice recorded via browser microphone
- Audio streamed to FastAPI backend over WebSocket
- AssemblyAI transcribes speech → text
- Transcribed text + history → Gemini (LLM) for response generation
- Response text → Murf.ai → natural voice audio stream
- Frontend shows transcription, AI’s reply, and plays back the audio
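The flow above can be sketched as one function per conversation turn. This is a minimal sketch, not the project's actual code: `run_turn` and the three `fake_*` stages are hypothetical stand-ins for the AssemblyAI, Gemini, and Murf.ai calls, so that the example runs without any API keys.

```python
from typing import Callable, Dict, List, Tuple

History = List[Tuple[str, str]]  # (role, text) pairs for the current session


def run_turn(
    audio: bytes,
    history: History,
    transcribe: Callable[[bytes], str],
    generate: Callable[[str, History], str],
    synthesize: Callable[[str], bytes],
) -> Dict[str, object]:
    """Run one voice-to-voice turn: speech -> text -> LLM reply -> speech."""
    transcript = transcribe(audio)           # speech-to-text stage
    reply = generate(transcript, history)    # LLM sees the text plus prior turns
    history.append(("user", transcript))     # remember both sides of the turn
    history.append(("assistant", reply))
    speech = synthesize(reply)               # text-to-speech stage
    return {"transcript": transcript, "reply": reply, "audio": speech}


# Stand-in stages (assumptions for illustration only).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(text: str, history: History) -> str:
    return f"You said: {text} (turn {len(history) // 2 + 1})"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


history: History = []
result = run_turn(b"hello", history, fake_transcribe, fake_generate, fake_synthesize)
print(result["reply"])  # You said: hello (turn 1)
```

Passing the stages in as callables keeps the turn logic independent of any one provider, which also makes it easy to test without network access.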
- Python 3.8+
- pip
- API keys for AssemblyAI, Google Gemini, and Murf.ai.
- Clone the repository:

      git clone https://github.com/Yasaswini38/Vakya-AI
      cd Vakya-AI

- Install dependencies:

      pip install -r requirements.txt

- Set up environment variables: create a .env file in the root directory and add your API keys:

      MURF_API_KEY="YOUR_MURF_API_KEY"
      ASSEMBLYAI_API_KEY="YOUR_ASSEMBLYAI_API_KEY"
      GEMINI_API_KEY="YOUR_GEMINI_API_KEY"
      NEWS_API_KEY="YOUR_NEWS_API_KEY"

- Run the application:

      uvicorn main:app --reload

The server will run on http://127.0.0.1:8000.
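If the server starts but a feature fails, a missing key in the .env file is a likely cause. The sketch below shows one way to check for that up front. It is an assumption-laden illustration: `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and the check presumes the variables have already been loaded into the environment (python-dotenv's `load_dotenv()` does this).

```python
import os
from typing import List, Mapping

# Key names taken from the .env step above.
REQUIRED_KEYS = (
    "MURF_API_KEY",
    "ASSEMBLYAI_API_KEY",
    "GEMINI_API_KEY",
    "NEWS_API_KEY",
)


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required keys that are unset or blank in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

Calling `missing_keys()` at startup and stopping with a clear message when the list is non-empty gives a faster failure than waiting for the first API call to be rejected.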
Once the setup is complete, the uvicorn main:app --reload command will start the app.
├── main.py # FastAPI backend
├── templates/
│ └── index.html # Main UI
├── static/
│ ├── script.js # Frontend JS
│ ├── style.css # Styles
│ └── voices.json # Voice list
├── uploads/ # Uploaded audio
├── .env # API keys
└── README.md