Stars
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
SGLang is a high-performance serving framework for large language models and multimodal models.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashMLA: Efficient Multi-head Latent Attention Kernels
DeepEP: an efficient expert-parallel communication library
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
A tool to modify ONNX models visually, based on Netron and Flask.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
blastrock / pkgj
Forked from mmozeiko/pkgi
pkg download & installation directly on Vita
GLake: optimizing GPU memory management and IO transmission.
An Advanced Launcher for miHoYo/HoYoverse Games
Source code for the X Recommendation Algorithm
OpenPPL / CuAssembler
Forked from cloudcores/CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
Implementations of SIMD instruction sets for systems which don't natively support them.
A demo of how to write a high-performance convolution that runs on Apple silicon