- Microsoft Research Asia
- Beijing, China
- https://xysmlx.github.io
Stars
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
An early, research-stage expert-parallel load balancer for MoE models based on linear programming (see the LP sketch after this list).
MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation Model
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
A tilelang-based training operator for the DeepSeek-V3.2-Exp DSA warmup Lightning Indexer.
WaferLLM: Large Language Model Inference at Wafer Scale
"AI-Trader: Can AI Beat the Market?" Live Trading Bench: https://ai4trade.ai Tech Report Link: https://arxiv.org/abs/2512.10971
RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning.
Building the Virtuous Cycle for AI-driven LLM Systems
Open-source continuous inference benchmarking: GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X (TPUv6e/v7, Trainium2/3, and GB300 NVL72 coming soon); DeepSeek 670B MoE, GPT-OSS.
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
A verification tool for ensuring parallelization equivalence in distributed model training.
FlashInfer: Kernel Library for LLM Serving
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
tile-ai/tilescale
Forked from tile-ai/tilelang. Tile-based language built for AI computation across all scales.
Kimi K2 is the large language model series developed by the Moonshot AI team.
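
The linear-programming angle in the MoE load-balancer entry above is concrete enough that a small worked example may help. Below is a minimal sketch, assuming a min-max objective: fractionally place experts on GPUs so that the most heavily loaded GPU carries as little routed traffic as possible. The expert loads, GPU count, and the scipy-based formulation are illustrative assumptions, not that project's actual algorithm.

```python
# Minimal sketch of an LP-based expert-parallel load balancer (illustrative only):
# fractionally assign MoE experts to GPUs so the heaviest GPU's load is minimized.
import numpy as np
from scipy.optimize import linprog

expert_load = np.array([9.0, 7.0, 4.0, 4.0, 3.0, 2.0, 1.0, 1.0])  # hypothetical tokens routed per expert
E, G = len(expert_load), 4  # number of experts, number of GPUs

# Variables: x[e, g] = fraction of expert e placed on GPU g, plus t = max per-GPU load.
n = E * G + 1
c = np.zeros(n)
c[-1] = 1.0  # objective: minimize t

# Each expert's placement fractions must sum to 1.
A_eq = np.zeros((E, n))
for e in range(E):
    A_eq[e, e * G:(e + 1) * G] = 1.0
b_eq = np.ones(E)

# Each GPU's load must stay below t:  sum_e load[e] * x[e, g] - t <= 0.
A_ub = np.zeros((G, n))
for g in range(G):
    for e in range(E):
        A_ub[g, e * G + g] = expert_load[e]
    A_ub[g, -1] = -1.0
b_ub = np.zeros(G)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * n, method="highs")
x = res.x[:-1].reshape(E, G)
print("max per-GPU load:", res.x[-1])
print("placement fractions:\n", x.round(2))
```

Because the assignment is allowed to be fractional, the optimum here equals the average load per GPU; a real balancer would additionally round or constrain the solution to whole expert replicas.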