aritrodium/Supertonic-TTS

Supertonic TTS Studio

Supertonic TTS Studio is a text-to-speech application that supports both near-real-time continuous streaming and full audio file generation, running entirely on CPU. It builds on the Supertonic repository for high-quality TTS synthesis using ONNX models and multiple voice styles.

Features

  • Real-time continuous streaming of generated speech.
  • Full audio file generation with download capability.
  • Multiple voice styles with descriptions.
  • Adjustable speaking speed.
  • Character and word count tracking for input text.
  • Professional Gradio-based user interface.
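
The character and word count tracking listed above can be sketched in a few lines. This is a minimal illustration, not the code in app.py: `count_text` is a hypothetical helper name, and it assumes that words are whitespace-separated and that spaces and newlines count as characters.

```python
def count_text(text: str) -> tuple[int, int]:
    """Return (character count, word count) for the input text."""
    # Characters include spaces and newlines; words are whitespace-separated runs.
    return len(text), len(text.split())


chars, words = count_text("Hello world.\nSecond paragraph here.")
print(f"{chars} characters, {words} words")
```

In a Gradio app, a function like this would typically be bound to the text box's change event so the counts update as the user types.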

Installation

  1. Clone this repository:
     git clone https://github.com/aritrodium/Supertonic-TTS.git
     cd Supertonic-TTS
  2. Ensure you have the required dependencies installed:
     pip install -r requirements.txt
     pip install gradio numpy soundfile
  3. Clone the Supertonic repository and pull LFS files:
     git lfs install
     GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Supertone/supertonic supertonic
     cd supertonic
     git lfs pull
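
Because the clone uses GIT_LFS_SKIP_SMUDGE=1, the model files are small Git LFS pointer files until git lfs pull replaces them with the real data. The following sketch checks for leftover pointers before the app is started. The helper name `find_lfs_pointers` and the `*.onnx` file pattern are assumptions made for illustration; only the pointer-file header line is part of the Git LFS format.

```python
from pathlib import Path

# Git LFS pointer files are small text files that begin with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(root: str, pattern: str = "*.onnx") -> list[Path]:
    """Return files under `root` matching `pattern` that are still LFS pointers."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    pointers = []
    for path in sorted(root_path.rglob(pattern)):
        # Read only as many bytes as the prefix needs; model files can be large.
        with path.open("rb") as fh:
            head = fh.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(path)
    return pointers


for pointer in find_lfs_pointers("supertonic"):
    print(f"still an LFS pointer, run 'git lfs pull': {pointer}")
```

If the loop prints nothing, no matching file under supertonic/ is a pointer. Note that the function also returns an empty list when the directory does not exist, so an empty result alone does not prove the models are present.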

Usage

Run the application with Python from the repository root (not from inside supertonic/):

python app.py

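The Gradio interface listens on 0.0.0.0:7860. Open http://localhost:7860 in a browser on the same machine, or use the host's IP address from another device on the network. Setting share=True in the Gradio launch call creates a temporary public link.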

Input Controls

  • Text Input: Enter the text to synthesize. Supports multiple paragraphs.
  • Voice Style: Select a preferred voice style from the available options.
  • Speaking Speed: Adjust the rate of speech generation.

Output

  • Streaming Output: Continuous playback of generated speech chunks.
  • Download Output: Complete audio file generation and download.
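
The two output modes can be related with a small sketch: streaming yields audio chunk by chunk so playback can start early, while the download path concatenates the same chunks into one file. Everything here is illustrative and not taken from app.py. `fake_synthesize` is a placeholder tone generator standing in for the ONNX model, and `SAMPLE_RATE`, `SAMPLES_PER_CHAR`, and all function names are assumptions.

```python
import math
import re
import struct
import wave

SAMPLE_RATE = 16000      # assumed for this sketch; the real model's rate may differ
SAMPLES_PER_CHAR = 800   # arbitrary placeholder duration per input character


def split_into_chunks(text: str) -> list[str]:
    """Split text at sentence ends and line breaks, dropping empty pieces."""
    pieces = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in pieces if p.strip()]


def fake_synthesize(chunk: str, speed: float = 1.0) -> list[float]:
    """Placeholder for the TTS model: a tone that gets shorter as speed rises."""
    n = int(SAMPLES_PER_CHAR * len(chunk) / speed)
    return [0.2 * math.sin(2 * math.pi * 220 * i / SAMPLE_RATE) for i in range(n)]


def stream_speech(text: str, speed: float = 1.0):
    """Yield (samples, status) one chunk at a time."""
    chunks = split_into_chunks(text)
    for i, chunk in enumerate(chunks, start=1):
        yield fake_synthesize(chunk, speed), f"chunk {i}/{len(chunks)}"


def write_full_wav(text: str, path: str, speed: float = 1.0) -> int:
    """Concatenate all chunks into one 16-bit mono WAV; return the sample count."""
    samples = [s for audio, _ in stream_speech(text, speed) for s in audio]
    pcm = struct.pack(f"<{len(samples)}h", *(int(s * 32767) for s in samples))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return len(samples)
```

Writing the streaming path as a generator fits Gradio, which can consume generator functions and update the output on each yield. The full-file path then reuses the same chunking logic rather than duplicating it.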

Project Structure

Supertonic-TTS/
├── app.py                 # Main Gradio application
├── supertonic/            # Cloned Supertonic repository
│   ├── onnx/              # Pretrained ONNX models
│   ├── voice_styles/      # JSON files for voice styles
│   └── tts_model.py       # TTS model functions
└── README.md

Functions

  • run_tts_stream(text, speed, style_name) – Generates speech in streaming mode, yielding audio chunks and status updates.
  • run_tts_full(text, speed, style_name) – Generates a complete audio file for download.
  • list_voice_styles() – Lists all available voice style JSON files.
  • load_style_by_name(filename) – Loads a specific voice style.
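
As a rough idea of what the two voice style helpers might look like, here is a minimal sketch using only the standard library. It is not the code from app.py. The directory location comes from the project structure shown earlier, while the optional `style_dir` parameter and the filename validation are assumptions added for illustration.

```python
import json
from pathlib import Path

# Location as shown in the project structure; adjust if your layout differs.
STYLE_DIR = Path("supertonic/voice_styles")


def list_voice_styles(style_dir: Path = STYLE_DIR) -> list[str]:
    """Return the sorted file names of all voice style JSON files."""
    if not style_dir.is_dir():
        return []
    return sorted(p.name for p in style_dir.glob("*.json"))


def load_style_by_name(filename: str, style_dir: Path = STYLE_DIR):
    """Load one voice style file and return its parsed JSON content."""
    # Reject anything that is not a bare .json file name (e.g. "../x.json").
    if Path(filename).name != filename or not filename.endswith(".json"):
        raise ValueError(f"not a voice style file name: {filename!r}")
    with (style_dir / filename).open(encoding="utf-8") as fh:
        return json.load(fh)
```

The names returned by `list_voice_styles()` can populate the Voice Style dropdown directly, and the selected entry can then be passed to `load_style_by_name`.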

Contributing

Contributions are welcome. Please fork the repository, make your changes, and submit a pull request with a clear description of the changes.

License

This project is licensed under the MIT License.