RTXUX
  • University of Science and Technology of China


Organizations

@ustclug @WestRabbit


Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 513 45 Updated Oct 27, 2025

Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

Go 880 85 Updated Oct 29, 2025

The fastest, lightest, and easiest-to-integrate AI gateway on the market. Fully open-sourced.

Rust 450 28 Updated Jul 31, 2025

Automatic evals for LLMs

HTML 550 67 Updated Jun 27, 2025

Scalable long-context LLM decoding that leverages sparsity—by treating the KV cache as a vector storage system.

Python 97 15 Updated Sep 17, 2025

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 16,505 1,252 Updated Oct 29, 2025

A large-scale simulation framework for LLM inference

Python 463 89 Updated Jul 25, 2025

TokenSim is a tool for simulating the behavior of large language models (LLMs) in a distributed environment.

Python 16 1 Updated Sep 20, 2025

Unified KV Cache Compression Methods for Auto-Regressive Models

Python 1,269 160 Updated Jan 4, 2025

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

584 15 Updated Sep 30, 2025

LLM KV cache compression made easy

Python 671 72 Updated Oct 28, 2025

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,642 317 Updated Aug 19, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,233 809 Updated Oct 17, 2025

Userspace eBPF runtime for Observability, Network, GPU & General Extensions Framework

C++ 1,195 123 Updated Oct 27, 2025

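A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are cheaper to train, covering base models, domain-specific fine-tunes and applications, datasets, and tutorials.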

21,555 2,048 Updated May 19, 2025

Microsoft Azure Traces

Jupyter Notebook 1,014 165 Updated Oct 20, 2025

LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

Python 852 121 Updated Oct 29, 2025

This repository is dedicated to storing the different optimizations for MPI collective IO operations that I have performed.

Groff 1 Updated Jun 15, 2016

Exploring Dynamic Load Balancing Algorithms for Block-Structured Mesh-and-Particle Simulations in AMReX

Jupyter Notebook 1 Updated Sep 5, 2025

A powerful coding agent toolkit providing semantic retrieval and editing capabilities (MCP server & other integrations)

Python 14,994 1,000 Updated Oct 29, 2025

An MLLM inference engine for academic research

Python 14 2 Updated Oct 24, 2025

magic-trace collects and displays high-resolution traces of what a process is doing

OCaml 5,128 114 Updated Oct 25, 2025

Supercharge Your LLM with the Fastest KV Cache Layer

Python 5,716 672 Updated Oct 29, 2025

MCP server: using eBPF to trace your kernel

Python 56 6 Updated Aug 1, 2025

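This project shares the technical principles behind large models along with hands-on experience (LLM engineering and deploying LLM applications).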

HTML 21,579 2,526 Updated Oct 19, 2025

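A simple collection of technical explainer tutorials that focuses on interesting, cutting-edge technical concepts and principles. Each article aims to be readable in under 5 minutes.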

6,125 566 Updated Aug 27, 2025

📚 A from-scratch tutorial on the principles and practice of large language models

Jupyter Notebook 20,657 1,810 Updated Oct 17, 2025

A next-gen FOSS self-hosted unified zero trust secure access platform that can operate as a remote access VPN, a ZTNA platform, API/AI/MCP gateway, a PaaS, an ngrok-alternative and a homelab infras…

Go 2,428 73 Updated Oct 29, 2025
Cuda 9 Updated Mar 19, 2025

A C++20 library for fast serialization, deserialization and validation using reflection. Supports JSON, Avro, BSON, Cap'n Proto, CBOR, CSV, flexbuffers, msgpack, parquet, TOML, UBJSON, XML, YAML / …

C++ 1,602 143 Updated Oct 28, 2025