- AWS AI
- Santa Clara, CA
- https://ydtydr.github.io/
Stars
Autocomp: AI Code Optimizer for Tensor Accelerators
GPT-Prompt-Hub is an open-source, community-driven repository for collecting, sharing, and refining custom GPT prompts
verl: Volcano Engine Reinforcement Learning for LLMs
ROCm / Megatron-LM
Forked from NVIDIA/Megatron-LM. Ongoing research training transformer models at scale.
Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.
A tool for debugging convergence issues and testing new algorithms and recipes for training LLMs with NVIDIA libraries such as Transformer Engine, Megatron-LM, and NeMo.
Kimi K2 is the large language model series developed by the Moonshot AI team.
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
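A minimal sketch of the FP16xINT4 idea behind such kernels, not the kernel itself: weights are stored as packed 4-bit integers plus per-group FP16 scales and dequantized just before the matmul. Function names, the nibble packing order, and the group size are illustrative assumptions.

```python
import torch

def dequant_int4(packed: torch.Tensor, scales: torch.Tensor, group_size: int = 128) -> torch.Tensor:
    """packed: (out, in//2) uint8, two 4-bit values per byte.
    scales: (out, in//group_size) FP16 per-group scales."""
    lo = (packed & 0x0F).to(torch.int8) - 8        # low nibble, centered to [-8, 7]
    hi = (packed >> 4).to(torch.int8) - 8          # high nibble
    w = torch.stack((lo, hi), dim=-1).flatten(-2)  # (out, in) int values
    w = w.to(torch.float16).view(w.shape[0], -1, group_size)
    return (w * scales.unsqueeze(-1)).view(w.shape[0], -1)

out_f, in_f, g = 256, 256, 128
packed = torch.randint(0, 256, (out_f, in_f // 2), dtype=torch.uint8)
scales = torch.rand(out_f, in_f // g, dtype=torch.float16) * 0.1
x = torch.randn(4, in_f)                           # small batch, where the ~4x win applies
w = dequant_int4(packed, scales, g)
y = x @ w.float().t()                              # cast up for the CPU demo matmul
```

The real kernel fuses dequantization into the matmul so the INT4 weights never materialize in FP16 in global memory; this sketch only shows the numerical transformation.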
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and expert parallelism (EP; e.g., GPU-driven)
Official implementation for "Training LLMs with MXFP4"
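An illustrative simulation of the MXFP4 block format this work trains with, not the repo's training code: blocks of 32 values share one power-of-two scale, and each element is rounded to the nearest FP4 (E2M1) grid point. The scale-selection rule here is a simplified assumption.

```python
import torch

FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

def quantize_mxfp4(x: torch.Tensor, block: int = 32) -> torch.Tensor:
    xb = x.reshape(-1, block)
    # Shared power-of-two scale per block, chosen so the block max lands near FP4's max (6.0)
    amax = xb.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    scale = torch.exp2(torch.floor(torch.log2(amax / 6.0)))
    # Round each scaled element to the nearest FP4 magnitude, keeping the sign
    mag = (xb / scale).abs().unsqueeze(-1)
    idx = (mag - FP4_GRID).abs().argmin(dim=-1)
    q = FP4_GRID[idx] * xb.sign()
    return (q * scale).reshape(x.shape)

x = torch.randn(4, 64)
print((x - quantize_mxfp4(x)).abs().mean())  # error introduced by the simulated format
```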
KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
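A hand-rolled sketch of the overlap pattern such libraries fuse at the kernel level: start an async all-gather for the next layer's sharded weight while the current layer's matmul runs. Names and shapes are illustrative; this assumes `torch.distributed` has already been initialized across `world_size` ranks.

```python
import torch
import torch.distributed as dist

def overlapped_layers(x, w0, w1_shard, world_size):
    """x: (B, D); w0: (D, D) local weight; w1_shard: (D // world_size, D) row shard."""
    w1_parts = [torch.empty_like(w1_shard) for _ in range(world_size)]
    handle = dist.all_gather(w1_parts, w1_shard, async_op=True)  # communication starts now
    y = x @ w0                       # layer-0 compute overlaps the in-flight gather
    handle.wait()                    # block only at the point w1 is actually needed
    w1 = torch.cat(w1_parts, dim=0)  # (D, D) reassembled weight
    return y @ w1
```

Fused libraries go further by interleaving communication and compute at tile granularity inside one kernel, rather than relying on separate streams as this sketch does.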
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Minimalistic large language model 3D-parallelism training
Video+code lecture on building nanoGPT from scratch
FlashMLA: Efficient Multi-head Latent Attention Kernels
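A plain-PyTorch sketch of the Multi-head Latent Attention (MLA) idea these kernels accelerate: the KV cache stores one small latent vector per token, which is up-projected to per-head keys and values at attention time. Shapes and weight names are illustrative, not FlashMLA's API.

```python
import math
import torch

B, T, D = 2, 16, 512       # batch, sequence, model dim
H, Dh, Dc = 8, 64, 128     # heads, head dim, latent (cached) dim

W_dkv = torch.randn(D, Dc) / math.sqrt(D)      # down-projection: only this output is cached
W_uk  = torch.randn(Dc, H * Dh) / math.sqrt(Dc)
W_uv  = torch.randn(Dc, H * Dh) / math.sqrt(Dc)
W_q   = torch.randn(D, H * Dh) / math.sqrt(D)

x = torch.randn(B, T, D)
c_kv = x @ W_dkv                                   # (B, T, Dc): the entire KV cache
q = (x @ W_q).view(B, T, H, Dh).transpose(1, 2)    # (B, H, T, Dh)
k = (c_kv @ W_uk).view(B, T, H, Dh).transpose(1, 2)
v = (c_kv @ W_uv).view(B, T, H, Dh).transpose(1, 2)
attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(Dh), dim=-1)
out = attn @ v                                     # (B, H, T, Dh)
# Cache cost per token: Dc floats vs. 2 * H * Dh for standard MHA (128 vs. 1024 here).
```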
Official inference framework for 1-bit LLMs
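A minimal sketch of the ternary ("1.58-bit") absmean weight quantization behind BitNet-style 1-bit LLMs: weights are scaled by their mean absolute value and rounded to {-1, 0, +1}, so matmuls reduce to additions and subtractions. Illustrative only, not the official inference kernels.

```python
import torch

def absmean_ternary(w: torch.Tensor):
    scale = w.abs().mean().clamp(min=1e-8)
    q = (w / scale).round().clamp(-1, 1)  # ternary weights in {-1, 0, +1}
    return q, scale

w = torch.randn(256, 256)
q, s = absmean_ternary(w)
x = torch.randn(4, 256)
y = (x @ q.t()) * s                       # rescale to recover the output magnitude
```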
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
The best OSS video generation models, created by Genmo
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 20+ clouds, or on-prem).
Real-time face swap and one-click video deepfake with only a single image