
chelsea0x3b

slime is an LLM post-training framework for RL Scaling.

Python · 4,398 stars · 573 forks · Updated Feb 26, 2026

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python · 3,174 stars · 645 forks · Updated Feb 26, 2026

A better wrapper for using RDMA programming APIs in Rust flavor

Rust · 77 stars · 6 forks · Updated Feb 10, 2026

Fast and Furious AMD Kernels

C++ · 368 stars · 55 forks · Updated Feb 25, 2026

torchcomms: a modern PyTorch communications API

C++ · 341 stars · 95 forks · Updated Feb 26, 2026

A PyTorch native platform for training generative AI models

Python · 5,090 stars · 718 forks · Updated Feb 26, 2026

Tile primitives for speedy kernels

Cuda · 3,185 stars · 244 forks · Updated Feb 24, 2026

FlashInfer: Kernel Library for LLM Serving

Python · 5,037 stars · 740 forks · Updated Feb 26, 2026

kernels, of the mega variety

Python · 681 stars · 44 forks · Updated Jan 29, 2026

An extremely fast Python package and project manager, written in Rust.

Rust · 79,841 stars · 2,602 forks · Updated Feb 26, 2026

An extensible, state of the art columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LFAI&Data, part of the Linux Foundation.

Rust · 2,737 stars · 135 forks · Updated Feb 26, 2026

Fil-C: completely compatible memory safety for C and C++

2,974 stars · 59 forks · Updated Feb 25, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 6,191 stars · 826 forks · Updated Feb 25, 2026

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.

Cuda · 3,181 stars · 357 forks · Updated Jan 17, 2026

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 19,382 stars · 3,291 forks · Updated Feb 26, 2026

SkyRL: A Modular Full-stack RL Library for LLMs

Python · 1,627 stars · 262 forks · Updated Feb 26, 2026

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ · 9,321 stars · 1,700 forks · Updated Feb 26, 2026

A Quirky Assortment of CuTe Kernels

Python · 815 stars · 80 forks · Updated Feb 25, 2026

A lightweight, local-first, and 🆓 experiment tracking library from Hugging Face 🤗

Python · 1,271 stars · 99 forks · Updated Feb 25, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 71,223 stars · 13,704 forks · Updated Feb 26, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python · 23,745 stars · 4,556 forks · Updated Feb 26, 2026

Model Compression Toolbox for Large Language Models and Diffusion Models

Python · 759 stars · 86 forks · Updated Aug 14, 2025

DeepEP: an efficient expert-parallel communication library

Cuda · 8,999 stars · 1,107 forks · Updated Feb 9, 2026

learn from your favorite tech companies

TypeScript · 165 stars · 16 forks · Updated Feb 9, 2026
Rust · 12 stars · Updated Jul 25, 2024

Deep learning in Rust, with shape checked tensors and neural networks

Rust · 1,895 stars · 105 forks · Updated Jul 23, 2024

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models

Rust · 6,150 stars · 373 forks · Updated Jun 24, 2024
Rust · 1 star · Updated May 14, 2023

LLaMa 7b with CUDA acceleration implemented in rust. Minimal GPU memory needed!

Rust · 111 stars · 6 forks · Updated Jul 27, 2023
Rust · 97 stars · 18 forks · Updated Nov 14, 2025