Stars
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs
FalconFS is a high-performance distributed file system (DFS) designed for AI workloads.
A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
High-Throughput, Cost-Effective Billion-Scale Vector Search with a Single GPU [to appear in SIGMOD'26]
A low-latency, billion-scale, and updatable graph-based vector store on SSD.
PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System]
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
FlashInfer: Kernel Library for LLM Serving
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Hackable and optimized Transformers building blocks, supporting a composable construction.
A Next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
PegaFlow is a high-performance KV cache offloading solution for vLLM v1 on single-node multi-GPU setups.
Supercharge Your LLM with the Fastest KV Cache Layer
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Tool for safe ergonomic Rust/C++ interop driven from existing C++ headers
Doing simple retrieval from LLMs at various context lengths to measure accuracy
Running large language models on a single GPU for throughput-oriented scenarios.
NEO is an LLM inference engine built to relieve the GPU memory crisis through CPU offloading
A Datacenter Scale Distributed Inference Serving Framework
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.