Edenzzzz 🏀

Expert Specialization MoE Solution based on CUTLASS

Cuda 16 Updated Nov 6, 2025
Python 3 Updated Nov 3, 2025

[EuroSys'25] Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization

Python 21 5 Updated Aug 6, 2025
Python 35 3 Updated Nov 6, 2025

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python 135 8 Updated Oct 22, 2025

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 281 50 Updated Nov 11, 2025

LongLive: Real-time Interactive Long Video Generation

Python 814 51 Updated Nov 3, 2025
Python 79 1 Updated Oct 7, 2025

[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention

Python 563 29 Updated Oct 5, 2025

Distributed MoE in a Single Kernel [NeurIPS '25]

Cuda 91 12 Updated Sep 30, 2025

[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive

Cuda 48 2 Updated Sep 24, 2025

🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation feedback, cross-platform NVIDIA/AMD, Kernelbook + KernelBench

Python 98 3 Updated Nov 10, 2025

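🥢 Cook like Lao Xiang Ji (Home Original Chicken) 🐔. The main part was completed in 2024; this is not an official Lao Xiang Ji repository. The text comes from the "Lao Xiang Ji Dish Traceability Report" and has been summarized, edited, and organized. CookLikeHOC.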

JavaScript 22,020 2,214 Updated Oct 17, 2025

NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer

Cuda 142 11 Updated Sep 18, 2025

Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚

Python 32,375 2,198 Updated Nov 3, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,168 1,918 Updated Nov 1, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,763 1,523 Updated Nov 10, 2025

Text-audio foundation model from Boson AI

Python 7,617 564 Updated Sep 15, 2025

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 462 106 Updated Nov 12, 2025
Python 147 14 Updated Dec 27, 2024

A curated list of recent papers on efficient video attention for video diffusion models, covering sparsification, quantization, caching, and related techniques.

46 4 Updated Oct 27, 2025

open-source coding LLM for software engineering tasks

Python 1,039 120 Updated Sep 30, 2025

The true story of the heartache and darkness behind the development of the Noah Pangu large model.

11,377 1,364 Updated Jul 9, 2025

Storing long contexts in tiny caches with self-study

Python 214 21 Updated Oct 17, 2025

ArcticInference: vLLM plugin for high-throughput, low-latency inference

Python 299 38 Updated Nov 11, 2025

kernels, of the mega variety

Python 599 27 Updated Sep 28, 2025

ByteCheckpoint: A Unified Checkpointing Library for LFMs

Python 252 17 Updated Jul 10, 2025

https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching

Python 387 38 Updated Jul 5, 2025