Intel - Shanghai
Stars
llm-d benchmark scripts and tooling
intel / sycl-tla
Forked from NVIDIA/cutlass
SYCL* Templates for Linear Algebra (SYCL*TLA) - a SYCL-based CUTLASS implementation for Intel GPUs
Distributed KV cache scheduling & offloading libraries
Accessible large language models via k-bit quantization for PyTorch.
AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.
Achieve state of the art inference performance with modern accelerators on Kubernetes
Offline optimization of your disaggregated Dynamo graph
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
📰 Must-read papers and blogs on Speculative Decoding ⚡️
DeepEP: an efficient expert-parallel communication library
HabanaAI / vllm-fork
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
A Datacenter Scale Distributed Inference Serving Framework
The simplest, fastest repository for training/finetuning small-sized VLMs.
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
Real time interactive streaming digital human
Open Source framework for voice and multimodal conversational AI
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
FlashInfer: Kernel Library for LLM Serving
Fast inference from large language models via speculative decoding
An Application Framework for AI Engineering
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
[ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning