Stars
Open-source Claude Cowork. A desktop AI assistant that helps you with programming, file management, and any task you can describe.
Causal streaming adaptation of OpenAI Whisper for real-time transcription on small audio chunks.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
Repo for the Belebele dataset, a massively multilingual reading comprehension dataset.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
Free ChatGPT & DeepSeek API key. Free access to the DeepSeek API and GPT4 API, with support for gpt | deepseek | claude | gemini | grok and other top-ranked, commonly used large models.
Repository of Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning
✨✨Latest Advances on Multimodal Large Language Models
[NeurIPS 2024] How do Large Language Models Handle Multilingualism?
Trying to prototype a multimodal LLM that takes text and audio as input and outputs text.
Build your own visual reasoning model
Open neural machine translation models and web services
A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
Fully open reproduction of DeepSeek-R1
Multilingual Generative Pretrained Model
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration