Stars
MiroThinker is an open-source deep research agent optimized for research and prediction tasks. It achieves a 60.2% Avg@8 score on the challenging GAIA benchmark.
Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration
Prototype of MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism
A throughput-oriented high-performance serving framework for LLMs
Offline optimization of your disaggregated Dynamo graph
FlashInfer: Kernel Library for LLM Serving
Repository for MLCommons Chakra schema and tools
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
🎯 An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality degradation across Weight-Only Quantization, MXFP4, NVFP4, GGUF, and adaptive schemes.
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
[WWW 2026] 🛠️ DeepAgent: A General Reasoning Agent with Scalable Toolsets
Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA 2025)
SGLang is a high-performance serving framework for large language models and multimodal models.
A machine learning accelerator core designed for energy-efficient AI at the edge.
Parametric floating-point unit with support for standard RISC-V formats and operations as well as transprecision formats.
H.265 decoder written in Verilog, verified on a Xilinx ZYNQ7035
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
Quantize transformers to any learned arbitrary 4-bit numeric format
Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware implementation through TVM.
MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
This repository contains the training code for ParetoQ, introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization".
Official repo of "Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs"
[ICLR 2025] Systematic Outliers in Large Language Models.
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.