Stars
GitHub repository for the ACL 2025 paper: Recent Advances in Speech Language Models: A Survey.
A feature-rich command-line audio/video downloader
Zero-shot voice conversion and singing voice conversion, with real-time support
Extrapolating RLVR to General Domains without Verifiers
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Awesome speech/audio LLMs, representation learning, and codec models
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
🚀🚀 [Large language model] Train a 26M-parameter GPT entirely from scratch in just 2 hours! 🌏
Curated list of datasets and tools for post-training.
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Fully open reproduction of DeepSeek-R1
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
🍦 Speech-AI-Forge is a project built around TTS generation models, implementing an API server and a Gradio-based WebUI.
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Open-source and strong foundation image recognition models.
A library for efficient similarity search and clustering of dense vectors.
A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxiv.org/abs/2312.16171
Effortless data labeling with AI support from YOLO, Segment Anything (SAM and SAM2), and MobileSAM.
Uses pyecharts to reproduce the official ECharts examples.
Weekly updates on Computer Science papers uploaded to arXiv.