Stars
An external log connector example for LMCache
[🔥 updating ...] AI-powered automated quantitative trading bot (fully local deployment); an AI-powered Quantitative Investment Research Platform. 📃 online docs: https://ufund-me.github.io/Qbot ✨ :news: qbot-mini: https://github.com/Charmve/iQuant
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
The evaluation framework for training-free sparse attention in LLMs
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
[NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)
KV cache compression for high-throughput LLM inference
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
[NeurIPS 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
Implementations of several LLM KV cache sparsity methods
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Supercharge Your LLM with the Fastest KV Cache Layer
An annotated nano_vllm repository, with MiniCPM4 support and the ability to register new models
A minimal cache manager for PagedAttention, on top of llama3.
Flash Attention in ~100 lines of CUDA (forward pass only)
A single-file educational implementation for understanding vLLM's core concepts and running LLM inference.
Collection of kernels written in Triton language
Survey on LLM Agents (published at COLING 2025)
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores via the WMMA API and MMA PTX instructions.
Sample codes for my CUDA programming book
Step-by-step optimization of CUDA SGEMM
How to optimize various algorithms in CUDA.