- Undisclosed
- Beijing
- https://manutdzou.github.io/
Stars
LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.
The homepage of OneBit model quantization framework.
Official inference framework for 1-bit LLMs
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
verl: Volcano Engine Reinforcement Learning for LLMs
SGLang is a fast serving framework for large language models and vision language models.
Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"
[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
ICRAR / ijson
Forked from isagalaev/ijson
Iterative JSON parser with Pythonic interfaces
State-of-the-Art Text Embeddings
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
A recipe for online RLHF and online iterative DPO.
Recipes to train reward model for RLHF.
Train transformer language models with reinforcement learning.
A high-throughput and memory-efficient inference and serving engine for LLMs
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Library for fast text representation and classification.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image