Stars
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
Elana: A Simple Energy & Latency Analyzer for LLMs
Low overhead tracing library and trace visualizer for pipelined CUDA kernels
Static suckless single batch CUDA-only qwen3-0.6B mini inference engine
Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Su…
🤗A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…
A paper list of recent works on token compression for ViT and VLM
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
Use your Mac trackpad as a weighing scale
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Technical report of Kimina-Prover Preview.
Allow torch tensor memory to be released and resumed later