Starred repositories
LinkedIn Profile Data Extraction Tool extracts data from a LinkedIn profile once a user tries to access or review it
A curated list of awesome skills, hooks, slash-commands, agent orchestrators, applications, and plugins for Claude Code by Anthropic
FlashMLA: Efficient Multi-head Latent Attention Kernels
Wan: Open and Advanced Large-Scale Video Generative Models
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Easiest and laziest way to build multi-agent LLM applications.
What would you do with 1000 H100s...
Achieve state of the art inference performance with modern accelerators on Kubernetes
Best practices & guides on how to write distributed PyTorch training code
Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
CUDA Templates and Python DSLs for High-Performance Linear Algebra
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
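AIInfra (AI infrastructure) refers to the AI system stack, from underlying hardware such as chips up to the upper-level software stack, that supports training and inference of large AI models.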
Fast CUDA matrix multiplication from scratch
Kimi K2 is the large language model series developed by Moonshot AI team
High-Resolution 3D Human Digitization from A Single Image.
End-to-End Object Detection with Transformers
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Implementation of Stable Diffusion with PyTorch
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
Witness the aha moment of VLM with less than $3.
Supercharge Your LLM with the Fastest KV Cache Layer