ziyuhuang123

A lightweight design for computation-communication overlap.

Cuda 183 8 Updated Oct 10, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,690 973 Updated Nov 4, 2025
Python 13 Updated Jan 14, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,161 82 Updated Aug 28, 2025

ScheMoE

Python 8 2 Updated Jan 13, 2025
Python 1 Updated Sep 19, 2025

kernels, of the mega variety

Python 597 26 Updated Sep 28, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,364 241 Updated Nov 3, 2025

[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive

Cuda 47 2 Updated Sep 24, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 3,842 297 Updated Nov 5, 2025

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,667 320 Updated Oct 19, 2024

Enhanced compiler frontend. Support Auto Compute + Auto Schedule + Auto Tensorize for tensor compilers.

C 6 1 Updated Dec 19, 2022
Jupyter Notebook 18 3 Updated Jan 24, 2024

Distributed Compiler based on Triton for Parallel Systems

Python 1,211 104 Updated Oct 17, 2025

KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA (+ more DSLs)

Python 643 79 Updated Nov 4, 2025

Github Pages template based upon HTML and Markdown for personal, portfolio-based websites.

HTML 15,754 3,530 Updated Nov 3, 2025

A foolproof tutorial: how to use Clash to bypass the Great Firewall

163 8 Updated Sep 26, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 152,089 31,040 Updated Nov 4, 2025

⏰ Collaboratively track worldwide conference deadlines (Website, Python Cli, Wechat Applet) / If you find it useful, please star this project, thanks~

Rust 8,114 545 Updated Nov 5, 2025

Fastest kernels written from scratch

Cuda 384 51 Updated Sep 18, 2025
C++ 9 2 Updated Oct 30, 2024

flash attention tutorial written in python, triton, cuda, cutlass

Cuda 442 47 Updated May 14, 2025
Cuda 18 8 Updated Mar 12, 2025

GEMM by WMMA (tensor core)

Cuda 14 9 Updated Jul 31, 2022

Inference code for Llama models

Python 58,899 9,812 Updated Jan 26, 2025

We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstra…

C++ 186 11 Updated Jan 28, 2025

Some microbenchmark practices

Cuda 1 Updated Apr 28, 2023

collection of benchmarks to measure basic GPU capabilities

C++ 445 67 Updated Oct 24, 2025