xysmlx

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

MLIR · 249 stars · 19 forks · Updated Dec 20, 2025

An expert-parallel load balancer for MoE models based on linear programming (early research stage).

Python · 471 stars · 27 forks · Updated Nov 19, 2025

MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation Model

769 stars · 25 forks · Updated Dec 17, 2025

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python · 1,642 stars · 83 forks · Updated Dec 20, 2025

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python · 453 stars · 19 forks · Updated Dec 8, 2025

Training operator for the DeepSeek-V3.2-Exp DSA warmup Lightning Indexer, based on TileLang

Python · 38 stars · 2 forks · Updated Nov 19, 2025

WaferLLM: Large Language Model Inference at Wafer Scale

Python · 77 stars · 11 forks · Updated Oct 31, 2025

Redesign of https://github.com/microsoft/BitBLAS

Python · 8 stars · 1 fork · Updated Dec 14, 2025

"AI-Trader: Can AI Beat the Market?" (live trading bench: https://ai4trade.ai; tech report: https://arxiv.org/abs/2512.10971)

Python · 10,204 stars · 1,622 forks · Updated Dec 19, 2025

RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning.

Python · 1,768 stars · 168 forks · Updated Dec 21, 2025

Building the Virtuous Cycle for AI-driven LLM Systems

Python · 98 stars · 15 forks · Updated Dec 16, 2025

Open-source continuous inference benchmarking: GB200 NVL72 vs. MI355X vs. B200 vs. H200 vs. MI325X, with TPUv6e/v7, Trainium2/3, and GB300 NVL72 coming soon™; models include DeepSeek 670B MoE and GPT-OSS

Python · 400 stars · 67 forks · Updated Dec 21, 2025

Python · 103 stars · 15 forks · Updated Sep 26, 2025

A size profiler for CUDA binaries

Python · 69 stars · Updated Oct 7, 2025

Ascend TileLang adapter

C++ · 165 stars · 46 forks · Updated Dec 20, 2025

Open ABI and FFI for Machine Learning Systems

C++ · 257 stars · 43 forks · Updated Dec 20, 2025

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ · 420 stars · 48 forks · Updated Dec 20, 2025

Python · 23 stars · 5 forks · Updated May 9, 2025

A verification tool for ensuring parallelization equivalence in distributed model training.

Python · 14 stars · 1 fork · Updated Sep 1, 2025

C++ · 31 stars · 2 forks · Updated Jul 2, 2025

Python · 1,368 stars · 120 forks · Updated Sep 12, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda · 4,317 stars · 610 forks · Updated Dec 21, 2025

Open-Source Frontier Voice AI

Python · 18,797 stars · 2,079 forks · Updated Dec 17, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python · 19,444 stars · 1,995 forks · Updated Nov 1, 2025

Tile primitives for speedy kernels

Cuda · 3,008 stars · 217 forks · Updated Dec 9, 2025

Tile-based language built for AI computation across all scales

C++ · 102 stars · 2 forks · Updated Dec 20, 2025

Kimi K2 is the large language model series developed by the Moonshot AI team

9,743 stars · 705 forks · Updated Nov 7, 2025