Intel · Shanghai (UTC+08:00)
Stars
Helm charts for deploying models with llm-d
Distributed KV cache scheduling & offloading libraries
Accessible large language models via k-bit quantization for PyTorch.
AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.
Achieve state of the art inference performance with modern accelerators on Kubernetes
Offline optimization of your disaggregated Dynamo graph
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
📰 Must-read papers and blogs on Speculative Decoding ⚡️
DeepEP: an efficient expert-parallel communication library
A Datacenter Scale Distributed Inference Serving Framework
The simplest, fastest repository for training/finetuning small-sized VLMs.
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
Real time interactive streaming digital human
Open Source framework for voice and multimodal conversational AI
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
FlashInfer: Kernel Library for LLM Serving
Fast inference from large language models via speculative decoding
An Application Framework for AI Engineering
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
[ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
Development repository for the Triton language and compiler