Kimi K2.5: Visual Agentic Intelligence
Scaling LLM Training: How Parallel-Agent Reinforcement Learning Changes the Game
Logit Dynamics in Softmax Policy Gradient Methods
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Your Efficient RL Framework Secretly Brings You Off-Policy RL Training
Small Leak Can Sink a Great Ship—Boost RL Training on MoE with 𝑰𝒄𝒆𝑷𝒐𝒑!
When Speed Kills Stability: Demystifying RL Collapse from the Training-Inference Mismatch
Defeating the Training-Inference Mismatch via FP16
Defeating Nondeterminism in LLM Inference
Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers
KAT-Coder-V1 Pro Major Upgrade: Uncovering the Key Factors in RL Training Stability
Analyzing the Complexity of Agentic Multi-Turn Training from a Tokenizer Perspective
Training-Inference Parity in MoE Models: Where Numerics Drift
A Comedy of Estimators: On KL Regularization in RL Training of LLMs
On a few pitfalls in KL divergence gradient estimation for RL
The Policy Gradient, Bias, and Variance of OPD
Revisiting On-Policy Distillation (OPD): Three Typical Failure Modes and How to Fix Them
Last Iterate of SGD Converges (Even in Unbounded Domains)
Optimizing Large Language Model Training Using FP4 Quantization
NVFP4 Pretraining: From Theory to Implementation (Part 1)
NVFP4 Pretraining: Systems Optimizations (Part 2)
SGLang RL x slime: End-to-End Implementation of QAT INT4
Getting Memory-bound Kernels to Speed-of-Light
I spent 31 hours on the math behind TurboQuant so you don't have to
SpinQuant: LLM quantization with learned rotations
mHC: Manifold-Constrained Hyper-Connections
Back to Basics: Let Denoising Generative Models Denoise
Do Machine Learning Models Memorize or Generalize?
Group Sequence Policy Optimization
The 37 Implementation Details of Proximal Policy Optimization
How We Build Trillion Parameter Reasoning RL with 10% GPUs
Efficient Multi-Adapter LLM Serving via Cross-Model KV-Cache Reuse with Activated LoRA
Learn CUTLASS the hard way - part 2!
Local Memory and Register Spilling
Heaps do lie: debugging a memory leak in vLLM
Pipeline Parallelism in SGLang: Scaling to Million-Token Contexts and Beyond
Accelerating PyTorch with CUDA Graphs