Spycsh
  • Intel
  • Shanghai

llm-d benchmark scripts and tooling

Jupyter Notebook 41 43 Updated Jan 9, 2026

SYCL* Templates for Linear Algebra (SYCL*TLA) - SYCL based CUTLASS implementation for Intel GPUs

C++ 62 73 Updated Jan 7, 2026

Inference scheduler for llm-d

Go 117 113 Updated Jan 11, 2026

Distributed KV cache scheduling & offloading libraries

Go 94 74 Updated Jan 12, 2026

Accessible large language models via k-bit quantization for PyTorch.

Python 7,882 812 Updated Jan 8, 2026

AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.

Python 85 18 Updated Jan 10, 2026

Achieve state of the art inference performance with modern accelerators on Kubernetes

Shell 2,337 289 Updated Jan 12, 2026

Offline optimization of your disaggregated Dynamo graph

Python 146 49 Updated Jan 12, 2026

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python 2,099 351 Updated Jan 7, 2026
C++ 72 66 Updated Jan 12, 2026

The vLLM XPU kernels for Intel GPU

C++ 18 18 Updated Jan 9, 2026

📰 Must-read papers and blogs on Speculative Decoding ⚡️

1,072 56 Updated Dec 22, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,876 1,055 Updated Dec 29, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 85 135 Updated Jan 9, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,761 774 Updated Jan 12, 2026

The simplest, fastest repository for training/finetuning small-sized VLMs.

Python 4,514 440 Updated Oct 27, 2025

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 16,334 1,201 Updated Jan 12, 2026

Real time interactive streaming digital human

Python 7,005 1,088 Updated Jan 1, 2026

Open Source framework for voice and multimodal conversational AI

Python 9,783 1,611 Updated Jan 11, 2026

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,511 646 Updated Jan 12, 2026

FlashInfer: Kernel Library for LLM Serving

Python 4,602 640 Updated Jan 12, 2026

Fast inference from large language models via speculative decoding

Python 879 94 Updated Aug 22, 2024

An Application Framework for AI Engineering

Java 7,645 2,198 Updated Jan 12, 2026
C++ 24 20 Updated Oct 9, 2025

[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Python 4,447 524 Updated Aug 11, 2025

[ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning

Python 2,111 82 Updated Dec 12, 2025

An ultra-lightweight digital human model that can run in real time on mobile devices

Python 2,383 342 Updated Sep 18, 2025

A digital human that everyone can use

Python 1,815 381 Updated Nov 8, 2025