A real-time, two-way voice translation agent built with Python that lets two people who speak different languages hold a spoken conversation. The system listens to live speech, transcribes it, translates it using Google Gemini, and speaks the translated output as soon as it is ready.
Perfect for cross-language conversations, demos, learning projects, and AI-powered communication tools.
- Live speech recognition using microphone input
- Bidirectional translation between two speakers
- AI-powered translation via Google Gemini (gemini-1.5-flash)
- Text-to-speech playback for translated output
- Turn-based conversation flow
- Easily configurable language pairs
- Person 1 speaks in their native language.
- Speech is captured and transcribed using Google Speech Recognition.
- The text is translated using the Gemini Generative AI model.
- The translated text is converted into speech.
- The translated output is played aloud for Person 2.
- Roles switch and the process repeats.
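The turn-based flow above can be sketched as a small loop with the three stages passed in as plain callables. This is a minimal illustration, not the project's actual code: `Speaker`, `build_prompt`, `run_turn`, and `converse` are hypothetical names, and the prompt wording is an assumption. In the real agent the callables would presumably wrap the speech recognizer, the Gemini model, and gTTS plus playsound.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Speaker:
    name: str
    lang: str  # BCP-47 code, e.g. "en-US"


def build_prompt(text: str, source_lang: str, target_lang: str) -> str:
    """Compose a translation instruction (wording is illustrative only)."""
    return (
        f"Translate the following text from {source_lang} to {target_lang}. "
        f"Reply with the translation only.\n\n{text}"
    )


def run_turn(
    speaker: Speaker,
    listener: Speaker,
    listen: Callable[[str], str],
    translate: Callable[[str], str],
    speak: Callable[[str, str], None],
) -> str:
    """One turn: transcribe the speaker, translate, play it for the listener."""
    heard = listen(speaker.lang)
    if not heard:
        return ""  # nothing recognized, skip translation and playback
    translated = translate(build_prompt(heard, speaker.lang, listener.lang)).strip()
    speak(translated, listener.lang)
    return translated


def converse(
    p1: Speaker,
    p2: Speaker,
    listen: Callable[[str], str],
    translate: Callable[[str], str],
    speak: Callable[[str, str], None],
    turns: int,
) -> List[str]:
    """Alternate roles for a fixed number of turns and collect the translations."""
    outputs = []
    speaker, listener = p1, p2
    for _ in range(turns):
        outputs.append(run_turn(speaker, listener, listen, translate, speak))
        speaker, listener = listener, speaker  # roles switch
    return outputs
```

Keeping the stages as injected callables means the loop can be exercised with stand-in functions, without a microphone or network access.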
You can extend this list; these are typical defaults included in the example:
- English (en-US)
- Hindi (hi-IN)
- Spanish (es-ES)
- French (fr-FR)
- German (de-DE)
- Italian (it-IT)
- Portuguese (pt-BR)
- Japanese (ja-JP)
- Korean (ko-KR)
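One way to keep these codes in a single place is a small lookup table. The sketch below is hypothetical (`SUPPORTED_LANGUAGES`, `describe`, and `tts_lang` are not names from the project), and it assumes the text-to-speech step wants only the primary language subtag (for example `pt` rather than `pt-BR`), which is how gTTS language codes are usually written.

```python
# Hypothetical lookup table mirroring the list above.
SUPPORTED_LANGUAGES = {
    "en-US": "English",
    "hi-IN": "Hindi",
    "es-ES": "Spanish",
    "fr-FR": "French",
    "de-DE": "German",
    "it-IT": "Italian",
    "pt-BR": "Portuguese",
    "ja-JP": "Japanese",
    "ko-KR": "Korean",
}


def describe(code: str) -> str:
    """Return a display label such as 'Hindi (hi-IN)', or raise for unknown codes."""
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code}")
    return f"{SUPPORTED_LANGUAGES[code]} ({code})"


def tts_lang(code: str) -> str:
    """Reduce a BCP-47 code to its primary subtag, e.g. 'pt-BR' -> 'pt'.

    Assumption: the TTS backend accepts the bare primary subtag.
    """
    return code.split("-")[0].lower()
```

Adding a language would then mean adding one entry to the table, provided both the recognizer and the TTS backend support it.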
- Python 3
- speechrecognition
- google-generativeai (Gemini API)
- gTTS (Google Text-to-Speech)
- playsound
- python-dotenv
- Clone the repository

      git clone https://github.com/YashJha52/Translator.git
      cd Translator

- Install dependencies

      pip install -r requirements.txt

- Add environment variables

  Create a .env file in the project root containing:

      GOOGLE_API_KEY=your_google_gemini_api_key

- Run the translator

      python main.py

You can change the language pair in main.py, for example:

    agent = RealTimeTranslator(person1_lang='en-US', person2_lang='hi-IN')

Use any valid Google Speech Recognition / BCP-47 language code.
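A missing or placeholder API key is an easy setup mistake, so a startup check along the following lines may help. This is a sketch under assumptions: `load_api_key` is a hypothetical helper, not part of the project, and it expects the `.env` file to have already been loaded into the process environment (for instance by python-dotenv's `load_dotenv()`).

```python
import os
from typing import Mapping, Optional

PLACEHOLDER = "your_google_gemini_api_key"  # the sample value from the .env step


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GOOGLE_API_KEY from the environment, or fail with a clear message.

    Assumes .env was loaded beforehand, e.g. with
    `from dotenv import load_dotenv; load_dotenv()`.
    """
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set. Add it to the .env file in the project root."
        )
    return key
```

Passing a plain dictionary as `env` makes the check easy to try out without touching the real environment.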
- Cross-language conversations
- Language learning
- International meetings & demos
- AI & NLP experimentation
- Accessibility and communication tools
- Requires an active internet connection.
- Google Speech Recognition / Gemini API usage limits and costs may apply.
- Accuracy depends on microphone quality and background noise.
- Continuous conversation detection (non turn-based)
- Multi-speaker support
- GUI / Web interface
- Streaming-based, lower-latency translation
- Noise suppression & improved voice activity detection