- Middle of nowhere in the US
- 01:10 (UTC -06:00)
- LinkedIn: in/wenxuan-tan-065b48250
Stars
Expert Specialization MoE Solution based on CUTLASS
[EuroSys'25] Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
LongLive: Real-time Interactive Long Video Generation
[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention
Distributed MoE in a Single Kernel [NeurIPS '25]
[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
🔥 LLM-powered GPU kernel synthesis: train models to convert PyTorch ops into optimized Triton kernels via SFT+RL, with multi-turn compilation feedback, cross-platform NVIDIA/AMD support, and Kernelbook + KernelBench
🥢 Cook like Laoxiangji (老乡鸡) 🐔. The main part was completed in 2024; this is not an official Laoxiangji repository. The text comes from the Laoxiangji Dish Traceability Report (《老乡鸡菜品溯源报告》), summarized, edited, and organized. CookLikeHOC.
NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Text-audio foundation model from Boson AI
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, and caching
Open-source coding LLM for software engineering tasks
Storing long contexts in tiny caches with self-study
ArcticInference: vLLM plugin for high-throughput, low-latency inference
ByteCheckpoint: A Unified Checkpointing Library for LFMs
Context-parallel attention that accelerates DiT model inference with dynamic caching (https://wavespeed.ai/)