LG AI Research
Stars
Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"
Long-form streaming TTS system for multi-speaker dialogue generation
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…
[ICML 2025 Tokenization Workshop] HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
FHEVM, a full-stack framework for integrating Fully Homomorphic Encryption (FHE) with blockchain applications
Zama Bounty Program: Contribute to the FHE space and Zama's open source libraries and get rewarded 💰
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator (NeurIPS 2024)
A collection of diffusion model papers categorized by subarea
A TTS model capable of generating ultra-realistic dialogue in one pass.
Easily configurable liquidation bot for Morpho Blue
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions
Concrete: TFHE compiler that converts Python programs into their FHE equivalents
Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
The official implementation of GTCRN, an ultra-lightweight SE model.
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement
[ACM MM 2024] FlashSpeech: Efficient Zero-Shot Speech Synthesis
Evaluation Protocol for Large-Scale Zero-Shot TTS Literature
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"