- Hangzhou, Zhejiang, China
Stars
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
Using SVM-Random forest algorithms
Faster Whisper transcription with CTranslate2
Wan: Open and Advanced Large-Scale Video Generative Models
Unified automatic quality assessment for speech, music, and sound.
Voice activity detection (VAD) paper and code (from 198*~) and its classification.
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
An Open-source Streaming High-fidelity Neural Audio Codec
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
✨✨Latest Advances on Multimodal Large Language Models
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
This is the audio sample repository for speech separation model "MossFormer2".
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
A curated list of different papers and datasets in various areas of audio-visual processing
Wan: Open and Advanced Large-Scale Video Generative Models
The best OSS video generation models, created by Genmo
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
MU-LLaMA: Music Understanding Large Language Model
Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen
Text-to-Music Generation with Rectified Flow Transformers
[IJCV] FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Accepted as [NeurIPS 2024] Spotlight Presentation Paper
[AAAI 2025] Official codes of "ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models".
Official implementation of Magic Clothing: Controllable Garment-Driven Image Synthesis