Stars
A toolkit for processing speech data and creating speech datasets
Microsoft.Recognizers.Text provides recognition and resolution of numbers, units, date/time, etc. in multiple languages (ZH, EN, FR, ES, PT, DE, IT, TR, HI, NL. Partial support for JA, KO, AR, SV).…
[NeurIPS 2025] OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication
SkyReels-V2: Infinite-length Film Generative model
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
A tool to obfuscate Python scripts, bind obfuscated scripts to a fixed machine, or set obfuscated scripts to expire.
A tool for generating .pex (Python EXecutable) files, lock files and venvs.
🚀 "Large Models": Train a 26M-parameter visual multimodal VLM from scratch in just 1 hour!
🚀🚀 "Large Models": Train a small 26M-parameter GPT completely from scratch in just 2 hours!
🦛 CHONK docs with Chonkie ✨ — The no-nonsense RAG library
A generative speech model for daily dialogue.
SGLang is a fast serving framework for large language models and vision language models.
A high-throughput and memory-efficient inference and serving engine for LLMs
A TTS model capable of generating ultra-realistic dialogue in one pass.
Sample code for my CUDA programming book
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …
Tesseract Open Source OCR Engine (main repository)
Collection of training data management explorations for large language models
Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 on Vox1_O when trained only on Vox2)
Modeling, training, eval, and inference code for OLMo