A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 2,333 205 Updated Dec 23, 2025
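As background for what a serving system like this implements, below is a minimal, hypothetical sketch of continuous batching, the scheduling pattern at the core of engines in SGLang's family: requests join and leave the running batch between decode steps instead of waiting for a full batch to drain. All names and behavior here are illustrative and nothing is taken from the repository itself.

```python
# Hypothetical sketch of continuous batching, the core loop in LLM
# serving engines of this kind. All classes here are illustrative,
# not taken from any repository in this list.
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: list[int]                              # prompt token ids
    max_new_tokens: int
    generated: list[int] = field(default_factory=list)

    def done(self) -> bool:
        return len(self.generated) >= self.max_new_tokens

def fake_decode_step(batch: list[Request]) -> list[int]:
    # Stand-in for one forward pass of the model over the whole batch.
    return [0 for _ in batch]

def serve(waiting: list[Request], max_batch: int = 8) -> None:
    running: list[Request] = []
    while waiting or running:
        # Admit new requests whenever there is batch capacity,
        # instead of waiting for the current batch to finish.
        while waiting and len(running) < max_batch:
            running.append(waiting.pop(0))
        next_tokens = fake_decode_step(running)
        for req, tok in zip(running, next_tokens):
            req.generated.append(tok)
        # Retire finished requests immediately so their slots free up.
        running = [r for r in running if not r.done()]

serve([Request(prompt=[1, 2, 3], max_new_tokens=4) for _ in range(3)])
```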

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,546 979 Updated Dec 13, 2025

A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs

C 144 16 Updated Dec 2, 2025

FalconFS is a high-performance distributed file system (DFS) designed for AI workloads.

C++ 49 16 Updated Dec 25, 2025

A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)

C++ 437 71 Updated Dec 13, 2025

Meta's fleetwide profiler framework

C++ 334 23 Updated Sep 22, 2025

High-Throughput, Cost-Effective Billion-Scale Vector Search with a Single GPU [to appear in SIGMOD'26]

Cuda 19 3 Updated Sep 26, 2025

A low-latency, billion-scale, and updatable graph-based vector store on SSD.

Jupyter Notebook 84 28 Updated Dec 12, 2025
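The two vector-search entries above are easier to place against the baseline they accelerate: an exact brute-force top-k scan. The sketch below shows that baseline with NumPy; real systems replace this linear scan with graph or IVF indexes, SSD-resident storage, and GPU kernels. The sizes and seed are arbitrary.

```python
# Hypothetical brute-force baseline for the vector search problem these
# systems accelerate: exact top-k by inner product over all vectors.
import numpy as np

rng = np.random.default_rng(0)
base = rng.standard_normal((100_000, 128)).astype(np.float32)
query = rng.standard_normal(128).astype(np.float32)

scores = base @ query                      # one score per base vector
topk = np.argpartition(-scores, 10)[:10]   # 10 best indices, unordered
topk = topk[np.argsort(-scores[topk])]     # sort the winners by score
print(topk, scores[topk])
```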

PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System]

Python 48 7 Updated Oct 19, 2025

Distributed KV cache coordinator

Go 92 68 Updated Dec 25, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 4,302 360 Updated Dec 25, 2025

FlashInfer: Kernel Library for LLM Serving

Python 4,356 616 Updated Dec 25, 2025

LongBench v2 and LongBench (ACL '25 & '24)

Python 1,050 112 Updated Jan 15, 2025

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Python 1,403 118 Updated Nov 13, 2025

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 10,212 748 Updated Dec 23, 2025

A Next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…

TypeScript 14,978 1,542 Updated Dec 25, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,480 480 Updated Dec 25, 2025

PegaFlow is a high-performance KV cache offloading solution for vLLM v1 on single-node multi-GPU setups.

Rust 12 Updated Dec 25, 2025

Supercharge Your LLM with the Fastest KV Cache Layer

Python 6,442 816 Updated Dec 25, 2025
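Several of the entries above revolve around KV cache reuse. As a rough, hypothetical illustration of the core idea (not any of these projects' actual APIs), the sketch below keys cached attention state by a hash of the token prefix, so a request that shares a cached prefix only needs to prefill its suffix.

```python
# Hypothetical sketch of prefix-keyed KV cache reuse, the general idea
# behind KV cache layers. Names and storage layout are illustrative,
# not the API of any repository listed here.
import hashlib

class PrefixKVCache:
    def __init__(self):
        self._store: dict[str, object] = {}

    @staticmethod
    def _key(tokens: list[int]) -> str:
        return hashlib.sha256(str(tokens).encode()).hexdigest()

    def lookup(self, tokens: list[int]):
        """Return KV state for the longest cached prefix, plus its length."""
        for end in range(len(tokens), 0, -1):
            kv = self._store.get(self._key(tokens[:end]))
            if kv is not None:
                return kv, end      # hit: only tokens[end:] need prefill
        return None, 0              # miss: full prefill required

    def insert(self, tokens: list[int], kv_state) -> None:
        self._store[self._key(tokens)] = kv_state

cache = PrefixKVCache()
cache.insert([1, 2, 3], kv_state="kv-for-[1,2,3]")
state, hit_len = cache.lookup([1, 2, 3, 4, 5])
assert hit_len == 3   # the shared prefix [1, 2, 3] is served from cache
```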

[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python 275 19 Updated May 1, 2025

Tool for safe ergonomic Rust/C++ interop driven from existing C++ headers

Rust 2,500 162 Updated Aug 29, 2025

Safe interop between Rust and C++

Rust 6,583 395 Updated Dec 20, 2025

Doing simple retrieval from LLMs at various context lengths to measure accuracy

Jupyter Notebook 2,122 228 Updated Aug 17, 2024
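The retrieval test this entry describes (often called "needle in a haystack") is simple to sketch: hide one fact at varying depths of contexts of varying lengths, then check whether the model can answer a question about it. The query_model stub below is hypothetical and cheats with a substring search so the script runs end to end; a real harness would call an LLM there.

```python
# Hypothetical sketch of a needle-in-a-haystack retrieval test: embed a
# fact at a given depth of a long filler context and score retrieval.
FILLER = "The sky was gray and the meeting ran long. "
NEEDLE = "The secret passphrase is mango-42."
QUESTION = "What is the secret passphrase?"

def build_context(total_chars: int, depth: float) -> str:
    haystack = (FILLER * (total_chars // len(FILLER) + 1))[:total_chars]
    pos = int(len(haystack) * depth)
    return haystack[:pos] + " " + NEEDLE + " " + haystack[pos:]

def query_model(context: str, question: str) -> str:
    # Stand-in for a real LLM call; this stub just searches the context.
    return "mango-42" if "mango-42" in context else "unknown"

for total_chars in (1_000, 10_000, 100_000):
    for depth in (0.0, 0.5, 0.9):
        answer = query_model(build_context(total_chars, depth), QUESTION)
        correct = "mango-42" in answer
        print(f"len={total_chars:>7} depth={depth:.1f} correct={correct}")
```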

Memory-Bounded GPU Acceleration for Vector Search

Python 32 4 Updated Oct 17, 2025

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,382 588 Updated Oct 28, 2024

NEO is an LLM inference engine built to alleviate the GPU memory crisis through CPU offloading

Python 75 21 Updated Jun 16, 2025
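CPU offloading, as named in the NEO entry, generally means keeping model state in host memory and staging it onto the GPU only when needed. The sketch below shows the layer-wise weight variant of that idea in PyTorch; it is a minimal illustration of the general technique under these assumptions, not NEO's actual design.

```python
# Hypothetical sketch of layer-wise CPU offloading: weights live on the
# CPU and each layer is copied to the GPU only for its forward pass.
# This shows the general technique, not any listed project's design.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
layers = [torch.nn.Linear(256, 256) for _ in range(8)]  # resident on CPU

@torch.inference_mode()
def forward_offloaded(x: torch.Tensor) -> torch.Tensor:
    x = x.to(device)
    for layer in layers:
        layer.to(device)          # stream this layer's weights in
        x = layer(x)
        layer.to("cpu")           # evict to keep GPU memory bounded
    return x

out = forward_offloaded(torch.randn(4, 256))
print(out.shape)  # torch.Size([4, 256])
```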

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,682 753 Updated Dec 25, 2025

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ 18,754 3,611 Updated Dec 24, 2025

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 10,161 1,698 Updated Dec 24, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 41,078 4,672 Updated Dec 24, 2025