Starred repositories
Mawaqit integration - salat time and nearest mosque - in Home Assistant
Accelerating MoE with IO and Tile-aware Optimizations
LM engine is a library for pretraining/finetuning LLMs
DeepEP: an efficient expert-parallel communication library
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Frozen-in-time version of our Paper Finder agent for reproducing evaluation results
Easily embed, cluster and semantically label text datasets
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
Distributed Compiler based on Triton for Parallel Systems
A debugging and profiling tool that can trace and visualize Python code execution
A flexible and efficient training framework for large-scale alignment tasks
torchcomms: a modern PyTorch communications API
CSCS User Lab Day – Meet the Swiss National Supercomputing Centre
Post-training with Tinker
iperf3: A TCP, UDP, and SCTP network bandwidth measurement tool
A tool for bandwidth measurements on NVIDIA GPUs.
Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS.
Analyze computation-communication overlap in DeepSeek-V3/R1.
Pipeline Parallelism Emulation and Visualization
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
🔬 A fast, interactive web-based viewer for performance profiles.
verl: Volcano Engine Reinforcement Learning for LLMs
A powerful Python framework for writing and running portable regression tests and benchmarks for HPC systems.