Tianjin University
Tianjin, China
Stars
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.
Fast and memory-efficient exact attention
FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus Agent Tools, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae…
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
slime is an LLM post-training framework for RL Scaling.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
🦜🔗 Build context-aware reasoning applications
A 2021 reading list for engineers: recommended books on computer science, software engineering, entrepreneurship, ideas, mathematics, and biography
Large Language Model (LLM) Systems Paper List
A framework for generating realistic LLM serving workloads
Awesome LLMs on Device: A Comprehensive Survey
Kimi K2 is the large language model series developed by Moonshot AI team
A collection of prompts, system prompts and LLM instructions
CS-BAOYAN / CSLabInfo2025
Forked from CS-BAOYAN/CSLabInfo2024. A collection of 2025 CS graduate-admission (baoyan) lab/advisor recruitment announcements. Labs wishing to advertise are welcome to open a PR.
Tile primitives for speedy kernels
verl: Volcano Engine Reinforcement Learning for LLMs
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Universal LLM Deployment Engine with ML Compilation
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
A Datacenter Scale Distributed Inference Serving Framework