Mengyang Feng ArcherFMY

💭

Fighting

180 followers · 21 following

Hangzhou, Zhejiang, China

Stars

facebookresearch / sam-audio

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,302 278 Updated Jan 5, 2026

AhmedRehaan1 / Speaker-Gender-Age-Recognition

Using SVM and Random Forest algorithms

Python 3 1 Updated May 19, 2025

SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2

Python 20,938 1,727 Updated Nov 19, 2025

Wan-Video / Wan2.2

Wan: Open and Advanced Large-Scale Video Generative Models

Python 14,198 1,693 Updated Dec 17, 2025

facebookresearch / audiobox-aesthetics

Unified automatic quality assessment for speech, music, and sound.

Python 673 48 Updated Jun 5, 2025

linan2 / Voice-activity-detection-VAD-paper-and-code

Voice activity detection (VAD) paper and code (from 198*~) and its classification.

113 14 Updated Feb 12, 2026

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 8,170 729 Updated Feb 12, 2026

facebookresearch / AudioDec

An Open-source Streaming High-fidelity Neural Audio Codec

Python 498 27 Updated Mar 4, 2025

modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Python 2,778 249 Updated Dec 8, 2025

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

17,340 1,109 Updated Feb 7, 2026

modelscope / ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 3,912 321 Updated Aug 14, 2025

jishengpeng / WavTokenizer

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,266 110 Updated Mar 2, 2025

alibabasglab / MossFormer2

This is the audio sample repository for speech separation model "MossFormer2".

Python 170 11 Updated Nov 28, 2024

v-iashin / Synchformer

Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)

Python 106 9 Updated Sep 15, 2025

krantiparida / awesome-audio-visual

A curated list of different papers and datasets in various areas of audio-visual processing

766 67 Updated Jan 30, 2024

Wan-Video / Wan2.1

Wan: Open and Advanced Large-Scale Video Generative Models

Python 15,326 2,386 Updated Dec 15, 2025

genmoai / mochi

The best OSS video generation models, created by Genmo

Python 3,594 468 Updated Nov 14, 2025

hkchengrex / MMAudio

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 2,085 244 Updated Feb 6, 2026

X-LANCE / SLAM-LLM

A Framework for Speech, Language, Audio, Music Processing with Large Language Model

Python 972 105 Updated Jan 15, 2026

shansongliu / MU-LLaMA

MU-LLaMA: Music Understanding Large Language Model

Python 302 22 Updated Aug 18, 2025

ali-vilab / ACE

All-round Creator and Editor

Python 240 19 Updated Oct 16, 2025

facebookresearch / MovieGenBench

Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen

433 23 Updated Mar 8, 2025

feizc / FluxMusic

Text-to-Music Generation with Rectified Flow Transformers

Python 1,713 128 Updated Dec 10, 2024

open-mmlab / FoleyCrafter

[IJCV] FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝

Python 643 65 Updated Jul 26, 2024

lllyasviel / Omost

Your image is almost there!

Python 7,650 440 Updated Jul 26, 2024

Tencent-Hunyuan / HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Jupyter Notebook 4,293 360 Updated Nov 27, 2025

lllyasviel / IC-Light

More relighting!

Python 8,367 527 Updated Feb 20, 2025

HVision-NKU / StoryDiffusion

Accepted as [NeurIPS 2024] Spotlight Presentation Paper

Jupyter Notebook 6,380 650 Updated Sep 26, 2024

bytedance / res-adapter

[AAAI 2025] Official codes of "ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models".

Python 769 25 Updated Apr 27, 2025

ShineChen1024 / MagicClothing

Official implementation of Magic Clothing: Controllable Garment-Driven Image Synthesis

Python 1,540 150 Updated Jul 29, 2024