Transform written content into speech using Google AI (Gemini) for text generation and internet-based information retrieval.
This project is based on an example in test/app.ts. It performs the following steps:
- Fetches a voice input
- Sends a request to the Google Gemini API to receive an AI-generated response
- Automatically converts the response to speech using Text-To-Speech (TTS) technology
- Plays the generated audio
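The four steps above can be sketched as a chain of asynchronous stages. This is only an illustration of the flow: the `Stages` type, `runPipeline`, and the stub implementations are hypothetical names introduced here, not part of the `@stawa/gtts` API, and the stubs stand in for real network and audio calls.

```typescript
// Hypothetical stage signatures; none of these names come from @stawa/gtts.
type Stages = {
  transcribe: (audioPath: string) => Promise<string>; // voice input -> text
  ask: (prompt: string) => Promise<string>; // text -> AI-generated reply
  synthesize: (text: string) => Promise<string>; // reply -> audio file path
  play: (audioPath: string) => Promise<void>; // audio file -> speakers
};

// Runs the stages in the order listed above and returns the text reply.
async function runPipeline(inputPath: string, stages: Stages): Promise<string> {
  const question = await stages.transcribe(inputPath);
  const answer = await stages.ask(question);
  const audioPath = await stages.synthesize(answer);
  await stages.play(audioPath);
  return answer;
}

// Stub stages so the sketch runs without network access or audio hardware.
const stubStages: Stages = {
  transcribe: async () => "When was Facebook launched?",
  ask: async (prompt) => `You asked: ${prompt}`,
  synthesize: async () => "reply.wav",
  play: async (audioPath) => {
    console.log(`(would play ${audioPath})`);
  },
};

runPipeline("input.wav", stubStages).then((answer) => console.log(answer));
```

Passing the stages in as an object keeps the ordering logic separate from any particular speech or TTS provider, so a real implementation could swap in the library's classes without changing `runPipeline`.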
This project has been tested on Linux (Ubuntu 24.04 LTS x86_64). Windows users can install SoX via SourceForge. macOS-specific information is currently unavailable.
| Task | Priority | Status |
|---|---|---|
| Implement Gemini Chat | High | ✅ Completed |
| Develop Voice Recognition | High | ✅ Completed |
| Implement Audio Language Detection | High | ✅ Completed |
| Implement Text Language Detection | Medium | ✅ Completed |
| Implement an Audio Player | Low | ✅ Completed |
| Define Enums | Low | ✅ Completed |
| Integrate Debugging | Low | ✅ Completed |
Before using this repository, ensure the following dependencies are installed on your system:
Linux (Debian/Ubuntu):
- SoX: `sudo apt-get install sox`
- libsox-fmt-all: `sudo apt-get install libsox-fmt-all`
- FFmpeg: `sudo apt install ffmpeg`
Windows:
- SoX: Download from SourceForge
- FFmpeg: `choco install ffmpeg` (using Chocolatey), or download from the official website
macOS-specific installation instructions are not available at this time.
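To confirm the command-line tools are reachable before running the project, a small check along these lines may help. It is a minimal sketch, assuming Node.js and assuming that a binary missing from `PATH` shows up as a launch error from `spawnSync`; `isInstalled` and `findMissing` are hypothetical helper names, and the check cannot detect the `libsox-fmt-all` format package, only the `sox` and `ffmpeg` executables.

```typescript
import { spawnSync } from "node:child_process";

// Returns true when `command` can be launched, i.e. it resolves on PATH.
// Assumption: a failed launch surfaces as `result.error` (for example ENOENT).
function isInstalled(command: string): boolean {
  const result = spawnSync(command, ["--version"], { stdio: "ignore" });
  return result.error === undefined;
}

// Lists the tools from `commands` that could not be launched.
function findMissing(commands: string[]): string[] {
  return commands.filter((command) => !isInstalled(command));
}

const missingTools = findMissing(["sox", "ffmpeg"]);
if (missingTools.length > 0) {
  console.warn(`Missing dependencies: ${missingTools.join(", ")}`);
} else {
  console.log("SoX and FFmpeg were found on PATH.");
}
```

The exit status of each tool is deliberately ignored: only the ability to start the process matters here, since the two tools do not share the same version flag.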
To install the package, use one of the following commands based on your preferred package manager:
# npm
$ npm install git+https://github.com/Stawa/GTTS.git --legacy-peer-deps
# Bun
$ bun install git+https://github.com/Stawa/GTTS.git --trust

Before diving into the examples, ensure you have the following API keys and credentials:
- Google Gemini API Key (`lib.GoogleGemini`)
  - Obtain from Google Cloud Console
- TikTok SessionID (`lib.TextToSpeech`)
  - Extract from TikTok browser cookies after logging in
- Google Speech API Key (`lib.VoiceRecognition.fetchTranscriptGoogle`)
  - Generate from Google Cloud Console Credentials
- Deepgram API Key (`lib.VoiceRecognition.fetchTranscriptDeepgram`)
  - Create an account and obtain from Deepgram Console
- EdenAI API Key (`lib.SummarizeText`)
  - Sign up and retrieve from EdenAI Dashboard
Store these API keys securely and never commit them to version control. Consider using environment variables or a secure key management system.
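One way to follow this advice is to read the keys from environment variables and fail early when any are missing. The helper below is a minimal sketch: `requireKeys` and the `Env` type are hypothetical names introduced for illustration, and `GEMINI_API_KEY` is the only variable name taken from this README. In a real script you would pass `process.env` instead of the demo object.

```typescript
// A plain map of variable names to values; `process.env` has this shape.
type Env = Record<string, string | undefined>;

// Returns the requested keys, or throws one error naming every missing key.
function requireKeys<K extends string>(
  env: Env,
  names: readonly K[],
): Record<K, string> {
  const missing = names.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const keys = {} as Record<K, string>;
  for (const key of names) {
    keys[key] = env[key] as string; // safe: presence was checked above
  }
  return keys;
}

// Demo with a stand-in environment; replace `demoEnv` with `process.env`.
const demoEnv: Env = { GEMINI_API_KEY: "demo-key" };
const { GEMINI_API_KEY } = requireKeys(demoEnv, ["GEMINI_API_KEY"] as const);
console.log(`Loaded a key of length ${GEMINI_API_KEY.length}`);
```

Reporting all missing names in a single error avoids fixing one variable at a time when several credentials are required.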
Here's a concise example demonstrating how to generate a response using the Google Gemini API:
import { GoogleGemini } from "@stawa/gtts";
import dotenv from "dotenv";

// Load GEMINI_API_KEY from a local .env file into process.env.
dotenv.config();

const gemini = new GoogleGemini({
  apiKey: process.env.GEMINI_API_KEY,
  model: "gemini-1.5-flash",
  enableLogging: true,
});

async function main() {
  try {
    const question = "When was Facebook launched?";
    console.log(`Question: ${question}`);
    // Send the question to Gemini and wait for the reply.
    const response = await gemini.chat(question);
    console.log(`Gemini's response: ${response}`);
  } catch (error) {
    console.error("An error occurred:", error);
  }
}
main();

We appreciate the contributions of all our collaborators. Each person's effort helps make this project better. A special thanks to all our contributors who have helped shape this project!