slime is an LLM post-training framework for RL Scaling.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
A better wrapper for using RDMA programming APIs in Rust flavor
torchcomms: a modern PyTorch communications API
A PyTorch native platform for training generative AI models
Tile primitives for speedy kernels
FlashInfer: Kernel Library for LLM Serving
An extremely fast Python package and project manager, written in Rust.
An extensible, state of the art columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LFAI&Data, part of the Linux Foundation.
pizlonator / fil-c
Forked from llvm/llvm-project. Fil-C: completely compatible memory safety for C and C++
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
verl: Volcano Engine Reinforcement Learning for LLMs
SkyRL: A Modular Full-stack RL Library for LLMs
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A lightweight, local-first experiment tracking library from Hugging Face 🤗
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a high-performance serving framework for large language models and multimodal models.
Model Compression Toolbox for Large Language Models and Diffusion Models
DeepEP: an efficient expert-parallel communication library
Deep learning in Rust, with shape checked tensors and neural networks
[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
LLaMa 7b with CUDA acceleration implemented in rust. Minimal GPU memory needed!