MTGMAD/GemTTS

GemTTS - Qwen3-TTS Desktop Application

A cross-platform desktop application for text-to-speech using Qwen3-TTS models, with voice cloning, custom voices, and voice design.

Features

  • Voice Cloning: Clone any voice from a reference audio sample
  • TTS Custom Voice: Generate speech using preset voice models
  • Voice Design: Create custom voices by adjusting age, gender, accent, and emotion
  • Auto Model Download: Automatically downloads Qwen3-TTS models from Hugging Face
  • Cross-Platform: Works on Windows, macOS, and Linux
  • Dark/Light Mode: Comfortable interface for any lighting condition

Architecture

  • Frontend: Electron + React + TypeScript + Tailwind CSS
  • Backend: Python + FastAPI + PyTorch
  • Models: Qwen3-TTS-1.7B from Hugging Face

Prerequisites

  • Node.js 18+ and npm
  • Python 3.9+
  • Git
  • 8GB+ RAM recommended
  • 10GB+ disk space for models
  • GPU optional but recommended (CUDA-compatible)
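The software and disk requirements above can be checked before installing. The following is a hypothetical helper script, not part of the repository: it checks the Python version, the presence of npm and git, the Node.js major version, and free space on the current drive against the 10GB guideline. RAM and GPU are not checked, because the standard library has no portable way to read them.

```python
"""Hypothetical pre-install check for GemTTS; not part of the repository."""
from __future__ import annotations

import shutil
import subprocess
import sys

MIN_PYTHON = (3, 9)
MIN_NODE_MAJOR = 18
MIN_FREE_GB = 10  # disk space guideline for the downloaded models


def parse_node_major(version_output: str) -> int:
    """Turn `node --version` output such as 'v18.17.0' into its major version number."""
    return int(version_output.strip().lstrip("v").split(".")[0])


def check_prerequisites(path: str = ".") -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append(f"Python 3.9+ required, found {sys.version.split()[0]}")
    for tool in ("npm", "git"):
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")
    node = shutil.which("node")
    if node is None:
        problems.append("'node' was not found on PATH")
    else:
        try:
            result = subprocess.run([node, "--version"], capture_output=True, text=True, timeout=10)
            if parse_node_major(result.stdout) < MIN_NODE_MAJOR:
                problems.append(f"Node.js {MIN_NODE_MAJOR}+ required, found {result.stdout.strip()}")
        except (OSError, ValueError, subprocess.SubprocessError):
            problems.append("could not determine the Node.js version")
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < MIN_FREE_GB:
        problems.append(f"only {free_gb:.1f} GB free at {path!r}; {MIN_FREE_GB} GB+ recommended")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Python, Node.js, npm, git and free disk space look fine.")
```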

Installation

1. Clone the repository

git clone https://github.com/MTGMAD/GemTTS.git
cd GemTTS

2. Set up Python Backend

cd backend

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
# source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
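Activation mainly puts the virtual environment's scripts directory first on PATH; running the environment's own interpreter directly gives a single command the same packages, which is handy in scripts. A minimal sketch of resolving that interpreter on each platform, assuming the environment directory is named venv as in the commands above (the helper names are hypothetical):

```python
import sys
from pathlib import Path


def venv_python(venv_dir: str = "venv", platform: str = sys.platform) -> Path:
    """Return the interpreter path inside a virtual environment created by `python -m venv`."""
    if platform.startswith("win"):
        return Path(venv_dir) / "Scripts" / "python.exe"
    return Path(venv_dir) / "bin" / "python"


def venv_command(*args: str, venv_dir: str = "venv", platform: str = sys.platform) -> list[str]:
    """Build an argv list that runs the venv's interpreter with `args`; no activation needed."""
    return [str(venv_python(venv_dir, platform)), *args]
```

For example, from inside backend/, `subprocess.run(venv_command("main.py"), check=True)` starts the server with the environment's packages.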

3. Set up Frontend

cd ../frontend

# Install dependencies
npm install

Running the Application

Development Mode

Option 1: Run Backend and Frontend Separately

Terminal 1 - Backend:

cd backend
venv\Scripts\activate  # On Windows
# source venv/bin/activate  # On macOS/Linux
python main.py

Terminal 2 - Frontend:

cd frontend
npm run electron:dev

Option 2: Run Everything Together

cd frontend
npm run electron:dev

(This will automatically start the Python backend)
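When the backend is started for you, the frontend still has to wait until the server is actually listening before sending requests. How the repository does this is not documented here; the following is a hypothetical Python sketch of the idea, polling the documented default port 8000 until a TCP connection succeeds or a timeout expires.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 8000,
                  timeout: float = 30.0, interval: float = 0.25) -> bool:
    """Poll until a TCP connection to host:port succeeds; return False once `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An open port only shows that something is listening; requesting a known route on the backend is a stronger readiness signal.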

Production Build

cd frontend
npm run electron:build:win   # For Windows
npm run electron:build:mac   # For macOS
npm run electron:build:linux # For Linux

The built application will be in frontend/dist-electron/

Usage

  1. First Launch: The application will prompt you to download the Qwen3-TTS models. This is a one-time setup that may take 10-30 minutes depending on your internet connection.

  2. Voice Cloning Tab:

    • Upload a reference audio file (WAV, MP3, etc.)
    • Enter the text you want to speak
    • Adjust similarity and speed parameters
    • Click "Generate Voice"
  3. TTS Custom Voice Tab:

    • Select a preset voice from the dropdown
    • Enter your text
    • Adjust speed and pitch
    • Click "Generate Speech"
  4. Voice Design Tab:

    • Enter your text
    • Adjust age, gender, accent, and emotion sliders
    • Save/load presets for later use
    • Click "Generate Voice"
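The Voice Design tab can save and load presets, but their on-disk format is not described here. The following hypothetical sketch stores a preset as a small JSON file holding the four slider values listed above; the class name, field names, numeric slider values, and one-file-per-preset layout are all assumptions, and the preset name is assumed to be a safe file name.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class VoicePreset:
    """Hypothetical Voice Design preset: slider positions stored as numbers."""
    name: str
    age: float
    gender: float
    accent: float
    emotion: float


def save_preset(preset: VoicePreset, directory: Path) -> Path:
    """Write the preset to <directory>/<name>.json and return the file path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{preset.name}.json"
    path.write_text(json.dumps(asdict(preset), indent=2), encoding="utf-8")
    return path


def load_preset(path: Path) -> VoicePreset:
    """Read a preset written by save_preset; unexpected keys raise TypeError."""
    return VoicePreset(**json.loads(path.read_text(encoding="utf-8")))
```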

Project Structure

GemTTS/
├── backend/
│   ├── main.py              # FastAPI server
│   ├── model_manager.py     # Model download and management
│   ├── tts_processor.py     # TTS inference logic
│   ├── requirements.txt     # Python dependencies
│   ├── models/              # Downloaded models (auto-created)
│   ├── uploads/             # Uploaded audio files (auto-created)
│   └── outputs/             # Generated audio (auto-created)
├── frontend/
│   ├── electron/
│   │   ├── main.js          # Electron main process
│   │   └── preload.js       # Electron preload script
│   ├── src/
│   │   ├── components/      # React components
│   │   │   ├── VoiceCloning.tsx
│   │   │   ├── TTSCustomVoice.tsx
│   │   │   ├── VoiceDesign.tsx
│   │   │   └── ModelsStatus.tsx
│   │   ├── api/
│   │   │   └── apiService.ts # API client
│   │   ├── App.tsx          # Main app component
│   │   ├── App.css          # Styles
│   │   └── main.tsx         # Entry point
│   ├── package.json
│   ├── vite.config.ts
│   └── tsconfig.json
├── README.md
└── LICENSE
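The tree marks models/, uploads/ and outputs/ as auto-created. A minimal sketch of doing that idempotently at backend startup, assuming the directories sit next to the backend script as shown; the function name is hypothetical and not taken from main.py.

```python
from pathlib import Path

RUNTIME_DIRS = ("models", "uploads", "outputs")


def ensure_runtime_dirs(base: Path) -> dict[str, Path]:
    """Create the backend's working directories under `base` if missing and return their paths."""
    paths = {}
    for name in RUNTIME_DIRS:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)  # no error if the directory already exists
        paths[name] = path
    return paths
```

Calling it as `ensure_runtime_dirs(Path(__file__).resolve().parent)` from the backend script keeps the result independent of the current working directory.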

Configuration

Backend Configuration

Edit backend/main.py to change:

  • API host/port (default: 127.0.0.1:8000)
  • Model paths
  • Output settings
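Keeping these values in one place makes them easier to change. The sketch below groups the documented default address (127.0.0.1:8000) with the runtime directories from the project tree; the Settings class and its field names are hypothetical, not taken from main.py.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    """Hypothetical backend settings; defaults mirror this README."""
    host: str = "127.0.0.1"  # loopback only: not reachable from other machines
    port: int = 8000
    models_dir: Path = Path("models")
    uploads_dir: Path = Path("uploads")
    outputs_dir: Path = Path("outputs")

    @property
    def base_url(self) -> str:
        return f"http://{self.host}:{self.port}"


settings = Settings()
# A FastAPI app is typically served with these values, e.g.
# uvicorn.run(app, host=settings.host, port=settings.port)
```

If you change the port, the API endpoint URL in the frontend must be changed to match.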

Frontend Configuration

Edit frontend/src/api/apiService.ts to change:

  • API endpoint URL
  • Request timeouts
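Before editing the frontend, it can help to confirm that the backend answers at the expected URL within the intended timeout. This standard-library sketch reads /openapi.json, the schema route FastAPI serves by default unless the app disables it, and lists the paths the backend advertises; the helper names are hypothetical.

```python
import json
import urllib.request


def get_json(url: str, timeout: float = 10.0):
    """GET `url` and decode the JSON body; raises OSError (e.g. urllib.error.URLError) on failure or timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def list_endpoints(base_url: str = "http://127.0.0.1:8000", timeout: float = 10.0) -> list[str]:
    """Return the API paths listed in the backend's OpenAPI schema, sorted."""
    schema = get_json(f"{base_url}/openapi.json", timeout=timeout)
    return sorted(schema.get("paths", {}))
```

With the backend running, `print(list_endpoints())` shows which routes the frontend can call.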

Troubleshooting

Models Not Downloading

  • Check your internet connection
  • Ensure you have enough disk space (10GB+)
  • Check that Hugging Face is accessible from your network

Audio Generation Fails

  • Ensure models are fully downloaded
  • Check Python backend logs in the terminal
  • Verify your system has enough RAM (8GB+ recommended)

GPU Not Being Used

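  • Install an NVIDIA driver that supports CUDA 11.8 or newer (the PyTorch CUDA wheels bundle the CUDA runtime, so a separate CUDA toolkit install is usually not required)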
  • Install PyTorch with CUDA support:
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
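After reinstalling, you can confirm from the backend's virtual environment which PyTorch build is active. torch.cuda.is_available(), torch.version.cuda and torch.cuda.get_device_name() are standard PyTorch calls; the wrapper function itself is only a convenience for this check.

```python
def describe_torch_device() -> str:
    """Summarize whether the installed PyTorch build can use a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        return f"GPU available: {torch.cuda.get_device_name(0)} (CUDA {torch.version.cuda})"
    if torch.version.cuda is None:
        # CPU-only wheels report no CUDA version at all.
        return f"PyTorch {torch.__version__} was built without CUDA support; reinstall it with the command above"
    return f"PyTorch {torch.__version__} was built for CUDA {torch.version.cuda}, but no usable GPU was found; check the NVIDIA driver"


if __name__ == "__main__":
    print(describe_torch_device())
```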

Electron App Won't Start

  • Ensure Python backend is running first
  • Check that port 8000 is not in use
  • Look for errors in the terminal
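To check whether port 8000 is already taken, a server-style bind attempt is a direct test. A small standard-library sketch (the helper name is hypothetical); on some systems a bind can also fail briefly after a previous server on that port exits, while old connections linger in TIME_WAIT.

```python
import socket


def port_is_free(port: int = 8000, host: str = "127.0.0.1") -> bool:
    """Return True if a server could bind host:port right now, False if the port is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    print("port 8000 is free" if port_is_free() else "port 8000 is already in use")
```

If the port is busy, `netstat -ano | findstr :8000` on Windows or `lsof -i :8000` on macOS/Linux shows which process holds it.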

Development

Adding New Features

  1. Backend: Add new endpoints in main.py and processing logic in tts_processor.py
  2. Frontend: Create new components in src/components/ and wire them up in App.tsx

Testing

# Backend
cd backend
pytest

# Frontend
cd frontend
npm test

License

MIT License - see LICENSE file for details

Acknowledgments

  • Qwen3-TTS by Alibaba Cloud
  • Built with Electron, React, FastAPI, and PyTorch

Support

For issues and questions, please open an issue on GitHub.

Roadmap

  • Batch processing multiple texts
  • Voice preset library with community voices
  • Real-time voice morphing
  • SSML support for advanced text markup
  • Multi-language support
  • Voice fine-tuning interface
  • Audio effects and post-processing
  • Export to multiple formats

About

Local TTS application
