🧢 CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech

📄 Paper | 🌐 Project Page | 🗂 Dataset | 🤗 Models | 🚀 Live Demo

Introduction

🧢 CapSpeech comprises over 10 million machine-annotated audio-caption pairs and nearly 0.36 million human-annotated audio-caption pairs. CapSpeech provides a new benchmark covering the following tasks:

CapTTS: style-captioned TTS
CapTTS-SE: text-to-speech synthesis with sound effects
AccCapTTS: accent-captioned TTS
EmoCapTTS: emotion-captioned TTS
AgentTTS: text-to-speech synthesis for chat agents

Usage

⚡ Quick Start

Explore CapSpeech directly in your browser — no installation needed.

🚀 Live Demo: 🤗 Spaces

🛠️ Local Deployment

Install and run CapSpeech locally.

💿 Installation & Usage: 📄 Instructions

Development

Please refer to the following documents to prepare the data, train the model, and evaluate its performance.

Main Contributors

Helin Wang at Johns Hopkins University
Jiarui Hai at Johns Hopkins University

Citation

If you find this work useful, please consider contributing to this repo and citing this work:

@misc{wang2025capspeechenablingdownstreamapplications,
      title={CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech}, 
      author={Helin Wang and Jiarui Hai and Dading Chong and Karan Thakkar and Tiantian Feng and Dongchao Yang and Junhyeok Lee and Laureano Moro Velazquez and Jesus Villalba and Zengyi Qin and Shrikanth Narayanan and Mounya Elhiali and Najim Dehak},
      year={2025},
      eprint={2506.02863},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2506.02863}, 
}

License

All datasets, listening samples, source code, pretrained checkpoints, and the evaluation toolkit are licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
See the LICENSE file for details.

Acknowledgements

This implementation is based on Parler-TTS, F5-TTS, SSR-Speech, Data-Speech, EzAudio, and Vox-Profile. We appreciate their awesome work.

🌟 Like This Project?

If you find this repo helpful or interesting, consider dropping a ⭐ — it really helps and means a lot!
