

A benchmark for evaluating LLMs on open-ended CS problems. Exploring the Next Frontier of Computer Science.

Python 84 21 Updated Jan 19, 2026

Accelerating MoE with IO and Tile-aware Optimizations

Python 547 44 Updated Jan 14, 2026

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 3,228 218 Updated Jan 18, 2026

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding

Python 79 2 Updated Dec 2, 2025

Real-Time VLAs via Future-state-aware Asynchronous Inference.

Python 287 14 Updated Jan 12, 2026

Miles is an enterprise-facing reinforcement learning framework for large-scale MoE post-training and production workloads, forked from and co-evolving with slime.

Python 738 77 Updated Jan 18, 2026

RouterArena: An open framework for evaluating LLM routers with standardized datasets, metrics, an automated framework, and a live leaderboard.

Python 59 7 Updated Jan 16, 2026

StreamDiffusion, Live Stream APP

Python 320 28 Updated Dec 25, 2025

LM engine is a library for pretraining/finetuning LLMs

Python 110 24 Updated Jan 12, 2026

A bunch of kernels that might make stuff slower 😉

Python 75 9 Updated Jan 19, 2026

[NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning

Python 85 Updated Nov 29, 2025

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python 247 15 Updated Jan 17, 2026

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Python 838 54 Updated Oct 15, 2025

DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder

177 7 Updated Oct 5, 2025

DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space

Python 334 9 Updated Oct 5, 2025

Fast and memory-efficient exact kmeans

Python 135 10 Updated Nov 11, 2025

This project aims to collect the latest "call for reviewers" links from various top CS/ML/AI conferences/journals

1,066 45 Updated Dec 11, 2025

Sequence Parallel For Sparse VideoGen

Python 1 Updated Sep 8, 2025
Python 721 47 Updated Nov 30, 2025

A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention

273 5 Updated Dec 1, 2025

RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python 9,010 787 Updated Jan 18, 2026

[NeurIPS 2025] Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation

Python 574 32 Updated Nov 11, 2025

kernels, of the mega variety

Python 649 40 Updated Sep 28, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 753 81 Updated Jan 10, 2026

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,176 114 Updated Jan 18, 2026

[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.

Cuda 908 81 Updated Dec 31, 2025

SkyRL: A Modular Full-stack RL Library for LLMs

Python 1,462 222 Updated Jan 19, 2026

MAGI-1: Autoregressive Video Generation at Scale

Python 3,628 230 Updated Jun 17, 2025

Let's make video diffusion practical!

Python 16,528 1,624 Updated Oct 16, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,316 116 Updated Dec 27, 2025