RTXUX

💭 Sleepy

Code for fun

85 followers · 107 following

University of Science and Technology of China

Stars

ovg-project / kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python · 513 stars · 45 forks · Updated Oct 27, 2025

maximhq / bifrost

Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

Go · 880 stars · 85 forks · Updated Oct 29, 2025

Helicone / ai-gateway

The fastest, lightest, and easiest-to-integrate AI gateway on the market. Fully open-sourced.

Rust · 450 stars · 28 forks · Updated Jul 31, 2025

mlfoundations / evalchemy

Automatic evals for LLMs

HTML · 550 stars · 67 forks · Updated Jun 27, 2025

microsoft / RetrievalAttention

Scalable long-context LLM decoding that leverages sparsity—by treating the KV cache as a vector storage system.

Python · 97 stars · 15 forks · Updated Sep 17, 2025

Alibaba-NLP / DeepResearch

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python · 16,505 stars · 1,252 forks · Updated Oct 29, 2025

microsoft / vidur

A large-scale simulation framework for LLM inference

Python · 463 stars · 89 forks · Updated Jul 25, 2025

pku-lemonade / TokenSim

TokenSim is a tool for simulating the behavior of large language models (LLMs) in a distributed environment.

Python · 16 stars · 1 fork · Updated Sep 20, 2025

Zefan-Cai / KVCache-Factory

Unified KV Cache Compression Methods for Auto-Regressive Models

Python · 1,269 stars · 160 forks · Updated Jan 4, 2025

October2001 / Awesome-KV-Cache-Compression

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

584 stars · 15 forks · Updated Sep 30, 2025

NVIDIA / kvpress

LLM KV cache compression made easy

Python · 671 stars · 72 forks · Updated Oct 28, 2025

xlite-dev / Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python · 4,642 stars · 317 forks · Updated Aug 19, 2025

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 8,233 stars · 809 forks · Updated Oct 17, 2025

eunomia-bpf / bpftime

Userspace eBPF runtime for Observability, Network, GPU & General Extensions Framework

C++ · 1,195 stars · 123 forks · Updated Oct 27, 2025

HqWu-HITCS / Awesome-Chinese-LLM

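A curated collection of open-source Chinese large language models, focusing mainly on smaller models that can be privately deployed and are cheap to train, covering base models, domain-specific fine-tuning and applications, datasets, tutorials, and more.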

21,555 stars · 2,048 forks · Updated May 19, 2025

Azure / AzurePublicDataset

Microsoft Azure Traces

Jupyter Notebook · 1,014 stars · 165 forks · Updated Oct 20, 2025

ModelCloud / GPTQModel

LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

Python · 852 stars · 121 forks · Updated Oct 29, 2025

rosafilgueira / Optimization_CollectiveIO_MPI

This repository is dedicated to storing the different optimizations for MPI collective I/O operations that I have performed.

Groff · 1 star · Updated Jun 15, 2016

amitashnanda / ACM_PEARC_2025_Paper_Artifact

Exploring Dynamic Load Balancing Algorithms for Block-Structured Mesh-and-Particle Simulations in AMReX

Jupyter Notebook · 1 star · Updated Sep 5, 2025

oraios / serena

A powerful coding agent toolkit providing semantic retrieval and editing capabilities (MCP server & other integrations)

Python · 14,994 stars · 1,000 forks · Updated Oct 29, 2025

dongxianzhe / hydrainfer

An MLLM inference engine for academic research

Python · 14 stars · 2 forks · Updated Oct 24, 2025

janestreet / magic-trace

magic-trace collects and displays high-resolution traces of what a process is doing

OCaml · 5,128 stars · 114 forks · Updated Oct 25, 2025

LMCache / LMCache

Supercharge Your LLM with the Fastest KV Cache Layer

Python · 5,716 stars · 672 forks · Updated Oct 29, 2025

eunomia-bpf / MCPtrace

MCP server: using eBPF to trace your kernel

Python · 56 stars · 6 forks · Updated Aug 1, 2025

liguodongiot / llm-action

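This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and putting large-model applications into production).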

HTML · 21,579 stars · 2,526 forks · Updated Oct 19, 2025

karminski / one-small-step

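A simple popular-science tutorial project on technology, focused on explaining interesting, cutting-edge technical concepts and principles. Each article aims to be readable within 5 minutes.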

6,125 stars · 566 forks · Updated Aug 27, 2025

datawhalechina / happy-llm

📚 A from-scratch tutorial on the principles and practice of large language models

Jupyter Notebook · 20,657 stars · 1,810 forks · Updated Oct 17, 2025

octelium / octelium

A next-gen FOSS self-hosted unified zero trust secure access platform that can operate as a remote access VPN, a ZTNA platform, API/AI/MCP gateway, a PaaS, an ngrok-alternative and a homelab infras…

Go · 2,428 stars · 73 forks · Updated Oct 29, 2025

jjZhang94 / SyncMalloc

Cuda · 9 stars · Updated Mar 19, 2025

getml / reflect-cpp

A C++20 library for fast serialization, deserialization and validation using reflection. Supports JSON, Avro, BSON, Cap'n Proto, CBOR, CSV, flexbuffers, msgpack, parquet, TOML, UBJSON, XML, YAML / …

C++ · 1,602 stars · 143 forks · Updated Oct 28, 2025

Lists (4)

Game OSS

Genshin

GPU MM

Pending
