Starred repositories
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.
Official implementation of "Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs".
Official Codebase for the paper: A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone.
Building a MoE large language model from scratch: a complete walkthrough from pretraining to DPO.
Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions, with strong general video-understanding capability.
The official repo of Pai-Megatron-Patch, developed by Alibaba Cloud, for large-scale LLM & VLM training.
Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.
A Next-Generation Training Engine Built for Ultra-Large MoE Models
ScalarLM - a unified training and inference stack
PyTorch building blocks for the OLMo ecosystem
Repository containing code and data for the paper "ArgCMV: An Argument Summarization Benchmark for the LLM-era", accepted at EMNLP 2025 Main Conference.
Lumina-DiMOO - An Open-Source Multi-Modal Large Diffusion Language Model
Awesome LLM pre-training resources, including data, frameworks, and methods.
Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.
Latency and Memory Analysis of Transformer Models for Training and Inference
LLM theoretical performance analysis tools, supporting analysis of parameters, FLOPs, memory, and latency.
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
Transformer-related optimization, including BERT and GPT
[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
[TMLR 2024] Efficient Large Language Models: A Survey
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a fast serving framework for large language models and vision language models.