
Real-time two-way voice translator built in Python using speech recognition, Google Gemini AI, and text-to-speech for seamless multilingual conversations.


YashJha52/Translator


🎙️ Real-Time Voice Translator (English ↔ Hindi)

A real-time, two-way voice translation agent built with Python that lets two people who speak different languages hold a spoken conversation. The system listens to live speech, transcribes it, translates it using Google Gemini, and speaks the translated output as soon as the translation returns.

Perfect for cross-language conversations, demos, learning projects, and AI-powered communication tools.

✨ Features

  • 🎤 Live speech recognition using microphone input
  • 🌍 Bidirectional translation between two speakers
  • 🤖 AI-powered translation via Google Gemini (gemini-1.5-flash)
  • 🔊 Text-to-speech playback for translated output
  • 🔁 Turn-based conversation flow
  • 🛠️ Easily configurable language pairs

🧠 How it works

  1. Person 1 speaks in their native language.
  2. Speech is captured and transcribed using Google Speech Recognition.
  3. The text is translated using the Gemini Generative AI model.
  4. The translated text is converted into speech.
  5. The translated output is played aloud for Person 2.
  6. Roles switch and the process repeats.
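
The steps above can be sketched as a small turn loop. This is a minimal illustration, not the repository's actual main.py: the names run_turn, run_conversation, listen_with_google and translate_with_gemini, as well as the prompt wording, are assumptions made for this example. The speech, translation, and playback stages are passed in as plain callables so the loop can be exercised without a microphone or network access.

```python
from typing import Callable, List

Listen = Callable[[str], str]               # language code -> transcribed text
Translate = Callable[[str, str, str], str]  # text, source, target -> translation
Speak = Callable[[str, str], None]          # text, language code -> plays audio


def run_turn(listen: Listen, translate: Translate, speak: Speak,
             src_lang: str, dst_lang: str) -> str:
    """One turn: capture speech in src_lang, translate it, speak it in dst_lang."""
    heard = listen(src_lang)                            # steps 1-2
    translated = translate(heard, src_lang, dst_lang)   # step 3
    speak(translated, dst_lang)                         # steps 4-5
    return translated


def run_conversation(listen: Listen, translate: Translate, speak: Speak,
                     person1_lang: str, person2_lang: str,
                     turns: int) -> List[str]:
    """Alternate speakers for a fixed number of turns."""
    src, dst = person1_lang, person2_lang
    outputs = []
    for _ in range(turns):
        outputs.append(run_turn(listen, translate, speak, src, dst))
        src, dst = dst, src  # step 6: roles switch
    return outputs


def listen_with_google(lang: str) -> str:
    """Possible `listen` backend built on the SpeechRecognition package."""
    import speech_recognition as sr  # lazy import; needs a working microphone
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio, language=lang)


def translate_with_gemini(text: str, src_lang: str, dst_lang: str) -> str:
    """Possible `translate` backend; assumes genai.configure(api_key=...) ran earlier."""
    import google.generativeai as genai  # lazy import
    model = genai.GenerativeModel("gemini-1.5-flash")
    # Hypothetical prompt wording, chosen only for this sketch.
    prompt = (f"Translate the following text from {src_lang} to {dst_lang}. "
              f"Return only the translation.\n\n{text}")
    return model.generate_content(prompt).text.strip()
```

Keeping the three stages as injected callables means any one of them could be swapped (for example, a different speech recognizer) without touching the turn logic.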

🗣️ Supported languages (default)

You can extend this list; these are typical defaults included in the example:

  • English (en-US)
  • Hindi (hi-IN)
  • Spanish (es-ES)
  • French (fr-FR)
  • German (de-DE)
  • Italian (it-IT)
  • Portuguese (pt-BR)
  • Japanese (ja-JP)
  • Korean (ko-KR)
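
One way to represent this list in code is a table keyed by language tag. The sketch below is an assumption about how such a table might look, not the project's actual data structure; SUPPORTED_LANGUAGES, tts_code and speak_with_gtts are hypothetical names. It also assumes that gTTS expects the short primary subtag (such as hi) rather than the full tag (such as hi-IN), so the region is stripped before synthesis.

```python
import os
import tempfile

# Language tags from the list above, mapped to display names.
SUPPORTED_LANGUAGES = {
    "en-US": "English",
    "hi-IN": "Hindi",
    "es-ES": "Spanish",
    "fr-FR": "French",
    "de-DE": "German",
    "it-IT": "Italian",
    "pt-BR": "Portuguese",
    "ja-JP": "Japanese",
    "ko-KR": "Korean",
}


def tts_code(tag: str) -> str:
    """Return the primary language subtag, e.g. 'hi-IN' -> 'hi'."""
    if tag not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language tag: {tag!r}")
    return tag.split("-")[0].lower()


def speak_with_gtts(text: str, tag: str) -> None:
    """Possible playback step: synthesize with gTTS, then play the MP3 file."""
    from gtts import gTTS              # lazy imports so the table above
    from playsound import playsound    # is usable without audio libraries
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "speech.mp3")
        gTTS(text=text, lang=tts_code(tag)).save(path)
        playsound(path)  # blocks until playback finishes
```

Adding a language would then mean adding one entry to the table, provided both the speech recognizer and gTTS support it.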

🧩 Tech stack

  • Python 3
  • SpeechRecognition (imported as speech_recognition)
  • google-generativeai (Gemini API)
  • gTTS (Google Text-to-Speech)
  • playsound
  • python-dotenv

⚙️ Setup instructions

  1. Clone the repository

     git clone https://github.com/YashJha52/Translator.git
     cd Translator

  2. Install dependencies

     pip install -r requirements.txt

  3. Add environment variables

     Create a .env file in the project root containing:

     GOOGLE_API_KEY=your_google_gemini_api_key

  4. Run the translator

     python main.py
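
As a rough illustration of how the GOOGLE_API_KEY value from the .env file might be read at startup, the sketch below uses python-dotenv when it is installed and otherwise falls back to variables already present in the environment. The function name load_api_key and the placeholder check are assumptions for this example; the repository's own loading code may differ.

```python
import os
from typing import Mapping, Optional

PLACEHOLDER = "your_google_gemini_api_key"


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Gemini API key, or raise if it is missing or unset."""
    if env is None:
        try:
            from dotenv import load_dotenv  # provided by python-dotenv
            load_dotenv()  # copies values from a .env file into os.environ
        except ImportError:
            pass  # fall back to variables exported in the shell
        env = os.environ
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            "GOOGLE_API_KEY is missing or still set to the placeholder value"
        )
    return key
```

The returned key would typically be handed to the Gemini client once, for example with genai.configure(api_key=load_api_key()), before any translation request is made.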

🔧 Configuration

You can change the language pair in main.py, for example:

agent = RealTimeTranslator(person1_lang='en-US', person2_lang='hi-IN')

Use any valid Google Speech Recognition / BCP-47 language code.
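
Because a mistyped code only shows up once recognition fails, it can help to check the pair before constructing the translator. The sketch below is a simplified assumption: check_language_pair is a hypothetical helper, and the pattern accepts only the language-REGION shape used in this README (such as en-US), not the full BCP-47 grammar.

```python
import re
from typing import Tuple

# Simplified shape check: two or three lowercase letters, a hyphen,
# then a two-letter uppercase region. Not a full BCP-47 parser.
_LANG_TAG = re.compile(r"[a-z]{2,3}-[A-Z]{2}")


def check_language_pair(person1_lang: str, person2_lang: str) -> Tuple[str, str]:
    """Validate both tags and return them unchanged."""
    for tag in (person1_lang, person2_lang):
        if not _LANG_TAG.fullmatch(tag):
            raise ValueError(f"expected a tag like 'en-US', got {tag!r}")
    if person1_lang == person2_lang:
        # Assumption for this sketch: identical tags leave nothing to translate.
        raise ValueError("the two speakers must use different language tags")
    return person1_lang, person2_lang
```

A call such as check_language_pair('en-US', 'hi-IN') could then run just before the RealTimeTranslator line shown above.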

🚀 Use cases

  • 🧑‍🤝‍🧑 Cross-language conversations
  • 🎓 Language learning
  • 🤝 International meetings & demos
  • 🧠 AI & NLP experimentation
  • 📢 Accessibility and communication tools

⚠️ Limitations

  • Requires an active internet connection.
  • Google Speech Recognition / Gemini API usage limits and costs may apply.
  • Accuracy depends on microphone quality and background noise.

🛣️ Future improvements

  • 🔄 Continuous conversation detection (non turn-based)
  • 🧑‍🤝‍🧑 Multi-speaker support
  • 📱 GUI / Web interface
  • ⚡ Streaming-based, lower-latency translation
  • 🎧 Noise suppression & improved voice activity detection

Updated README formatting by GitHub Copilot for clarity and code blocks.
