

GitHub repository for the ACL 2025 paper: Recent Advances in Speech Language Models: A Survey.

171 6 Updated Jun 17, 2025

A feature-rich command-line audio/video downloader

Python 142,569 11,519 Updated Jan 18, 2026

zero-shot voice conversion & singing voice conversion, with real-time support

Python 3,528 427 Updated Apr 20, 2025

Extrapolating RLVR to General Domains without Verifiers

Python 191 11 Updated Aug 12, 2025

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

Python 1,843 137 Updated Aug 25, 2025

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 54,191 5,927 Updated Dec 30, 2025

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 1,037 79 Updated Dec 23, 2024

Awesome speech/audio LLMs, representation learning, and codec models

1,199 74 Updated Aug 13, 2025

Audio-FLAN

Jupyter Notebook 160 5 Updated Sep 23, 2025

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…

889 86 Updated Jul 8, 2025

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Python 2,731 241 Updated Dec 8, 2025

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 9,657 788 Updated May 27, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 9,322 844 Updated Jan 8, 2026

Audio Large Language Models

Python 851 43 Updated Jul 5, 2025

🚀🚀 Train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏

Python 37,558 4,491 Updated Jan 18, 2026

Curated list of datasets and tools for post-training.

4,173 342 Updated Nov 10, 2025

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

73,389 8,425 Updated Dec 22, 2025

Fully open reproduction of DeepSeek-R1

Python 25,828 2,411 Updated Nov 24, 2025

MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone

Python 22,636 1,712 Updated Sep 24, 2025

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 3,833 313 Updated Aug 14, 2025

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 44,274 5,918 Updated Aug 16, 2024

🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.

Python 1,377 182 Updated Sep 16, 2025

python wrapper for rubberband

Python 211 26 Updated Sep 30, 2024

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python 5,744 317 Updated Jan 16, 2026

Open-source and strong foundation image recognition models.

Jupyter Notebook 3,566 316 Updated Feb 18, 2025

A library for efficient similarity search and clustering of dense vectors.

C++ 38,788 4,190 Updated Jan 16, 2026

A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxiv.org/abs/2312.16171

Python 980 102 Updated May 28, 2024

Effortless AI-assisted data labeling with AI support from YOLO, Segment Anything (SAM+SAM2), MobileSAM!!

Python 3,150 329 Updated Jan 6, 2026

Uses pyecharts to reproduce the official ECharts examples.

HTML 1,423 608 Updated Aug 4, 2025

Weekly updates of Computer Science papers uploaded to arXiv.

JavaScript 106 1 Updated Jan 13, 2026