- Microsoft Research Asia
- Beijing, China
- https://xysmlx.github.io
Stars
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
An early, research-stage expert-parallel load balancer for MoE models based on linear programming (see the LP sketch after this list).
MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation Model
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
A tilelang-based training operator for the DeepSeek-V3.2-Exp DSA warmup Lightning Indexer.
WaferLLM: Large Language Model Inference at Wafer Scale
"AI-Trader: Can AI Beat the Market?" Live Trading Bench: https://ai4trade.ai Tech Report Link: https://arxiv.org/abs/2512.10971
RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning.
Building the Virtuous Cycle for AI-driven LLM Systems
Open-source continuous inference benchmarking: GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X (TPUv6e/v7, Trainium2/3, and GB300 NVL72 coming soon); DeepSeek 670B MoE, GPT-OSS.
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
A verification tool for ensuring parallelization equivalence in distributed model training.
FlashInfer: Kernel Library for LLM Serving
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
tile-ai/tilescale
Forked from tile-ai/tilelang. Tile-based language built for AI computation across all scales.
Kimi K2 is the large language model series developed by the Moonshot AI team.
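
The linear-programming angle in the MoE load-balancer entry above is concrete enough that a small worked example may help. Below is a minimal sketch, assuming a min-max objective: fractionally place experts on GPUs so that the most heavily loaded GPU carries as little routed traffic as possible. The expert loads, GPU count, and the scipy-based formulation are illustrative assumptions, not that project's actual algorithm.

```python
# Minimal sketch of an LP-based expert-parallel load balancer (illustrative only):
# fractionally assign MoE experts to GPUs so the heaviest GPU's load is minimized.
import numpy as np
from scipy.optimize import linprog

expert_load = np.array([9.0, 7.0, 4.0, 4.0, 3.0, 2.0, 1.0, 1.0])  # hypothetical tokens routed per expert
E, G = len(expert_load), 4  # number of experts, number of GPUs

# Variables: x[e, g] = fraction of expert e placed on GPU g, plus t = max per-GPU load.
n = E * G + 1
c = np.zeros(n)
c[-1] = 1.0  # objective: minimize t

# Each expert's placement fractions must sum to 1.
A_eq = np.zeros((E, n))
for e in range(E):
    A_eq[e, e * G:(e + 1) * G] = 1.0
b_eq = np.ones(E)

# Each GPU's load must stay below t:  sum_e load[e] * x[e, g] - t <= 0.
A_ub = np.zeros((G, n))
for g in range(G):
    for e in range(E):
        A_ub[g, e * G + g] = expert_load[e]
    A_ub[g, -1] = -1.0
b_ub = np.zeros(G)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * n, method="highs")
x = res.x[:-1].reshape(E, G)
print("max per-GPU load:", res.x[-1])
print("placement fractions:\n", x.round(2))
```

Because the assignment is allowed to be fractional, the optimum here equals the average load per GPU; a real balancer would additionally round or constrain the solution to whole expert replicas.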