LLM inference in C/C++

C++ 88,424 13,446 Updated Oct 28, 2025

LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

Python 851 121 Updated Oct 28, 2025

manutdzou's blog

JavaScript 1 Updated Oct 24, 2025

The homepage of OneBit model quantization framework.

Python 193 4 Updated Feb 5, 2025

Public repo for HF blog posts

Jupyter Notebook 3,173 925 Updated Oct 28, 2025

Official inference framework for 1-bit LLMs

Python 24,307 1,882 Updated Jun 3, 2025

Open-source unified multimodal model

Python 5,215 451 Updated Oct 27, 2025

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Python 25,165 1,749 Updated Oct 13, 2025

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python 8,259 805 Updated Oct 27, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 14,829 2,363 Updated Oct 28, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 19,452 3,186 Updated Oct 28, 2025

Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"

Python 317 24 Updated Mar 4, 2025

[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Python 308 22 Updated May 22, 2025

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 497 34 Updated Feb 10, 2025

Iterative JSON parser with Pythonic interfaces

Python 1,022 56 Updated Oct 27, 2025

State-of-the-Art Text Embeddings

Python 17,769 2,704 Updated Oct 22, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 23,847 2,649 Updated Aug 12, 2024

A recipe for online RLHF and online iterative DPO.

Python 536 49 Updated Dec 28, 2024

Recipes to train reward model for RLHF.

Python 1,474 102 Updated Apr 24, 2025

Train transformer language models with reinforcement learning.

Python 16,039 2,257 Updated Oct 28, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 61,217 10,841 Updated Oct 28, 2025

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 6,226 677 Updated Oct 24, 2025

Library for fast text representation and classification.

HTML 26,393 4,807 Updated Mar 22, 2024

Python 344 34 Updated May 17, 2024

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 39,195 4,766 Updated Jun 2, 2025

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.

Python 2,261 292 Updated May 11, 2025

PyTorch implementation of BRECQ, ICLR 2021

Python 284 61 Updated Aug 1, 2021

Search all Chinese NLP datasets, with commonly used English NLP datasets included

Python 4,378 628 Updated Nov 21, 2022

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

Jupyter Notebook 31,285 3,808 Updated Jul 23, 2024