Stars
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
Free ChatGPT & DeepSeek API key. Free access to the DeepSeek API and GPT-4 API, supporting top-ranked popular large models such as gpt | deepseek | claude | gemini | grok.
Repository of Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning
✨✨Latest Advances on Multimodal Large Language Models
[NeurIPS 2024] How do Large Language Models Handle Multilingualism?
Trying to prototype a multimodal LLM that takes text and audio as input and outputs text.
Build your own visual reasoning model
Open neural machine translation models and web services
A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
Democratizing Reinforcement Learning for LLMs
Fully open reproduction of DeepSeek-R1
Multilingual Generative Pretrained Model
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
Paper reproduction of Google's SCoRe (Training Language Models to Self-Correct via Reinforcement Learning).
STACL simultaneous translation model implemented with PaddlePaddle.
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Chinese post-training repository for Llama3 and Llama3.1 - fine-tuned and modified versions with interesting weights, plus tutorial videos & docs for training, inference, evaluation, and deployment.
alibaba / Megatron-LLaMA
Forked from NVIDIA/Megatron-LM. Best practice for training LLaMA models in Megatron-LM.