Spycsh
  • Intel
  • Shanghai

92 results for source starred repositories

helm charts for deploying models with llm-d

Go Template 28 49 Updated Feb 23, 2026

llm-d benchmark scripts and tooling

Python 47 52 Updated Feb 23, 2026

Inference scheduler for llm-d

Go 131 128 Updated Feb 23, 2026

Distributed KV cache scheduling & offloading libraries

Go 103 89 Updated Feb 24, 2026

Accessible large language models via k-bit quantization for PyTorch.

Python 7,972 825 Updated Feb 23, 2026

AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.

Python 143 38 Updated Feb 24, 2026

Achieve state of the art inference performance with modern accelerators on Kubernetes

Shell 2,522 326 Updated Feb 24, 2026

Offline optimization of your disaggregated Dynamo graph

Python 192 64 Updated Feb 24, 2026

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python 2,180 369 Updated Feb 24, 2026
Python 77 75 Updated Feb 24, 2026

The vLLM XPU kernels for Intel GPU

C++ 22 27 Updated Feb 11, 2026

📰 Must-read papers and blogs on Speculative Decoding ⚡️

1,126 63 Updated Jan 24, 2026

DeepEP: an efficient expert-parallel communication library

Cuda 8,993 1,104 Updated Feb 9, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust 6,125 875 Updated Feb 24, 2026

The simplest, fastest repository for training/finetuning small-sized VLMs.

Python 4,666 468 Updated Oct 27, 2025

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 16,541 1,215 Updated Feb 24, 2026

Real time interactive streaming digital human

Python 7,141 1,125 Updated Feb 11, 2026

Open Source framework for voice and multimodal conversational AI

Python 10,423 1,752 Updated Feb 24, 2026

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,617 655 Updated Feb 24, 2026

FlashInfer: Kernel Library for LLM Serving

Python 5,013 733 Updated Feb 24, 2026

Fast inference from large language models via speculative decoding

Python 887 97 Updated Aug 22, 2024

An Application Framework for AI Engineering

Java 7,946 2,301 Updated Feb 23, 2026
C++ 24 20 Updated Oct 9, 2025

[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Python 4,488 529 Updated Feb 23, 2026

[ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning

Python 2,127 81 Updated Dec 12, 2025

An ultra-lightweight digital human model that can run in real time on mobile devices

Python 2,421 347 Updated Sep 18, 2025

A digital human that everyone can use

Python 1,854 393 Updated Nov 8, 2025

Development repository for the Triton language and compiler

MLIR 18,469 2,608 Updated Feb 24, 2026