A Triton crash course for beginners, covering:
- Introduction to GPU architecture
- Writing a simple CUDA kernel: softmax
- Introduction to Triton and Triton softmax kernel
- Tensor Cores and Triton matrix multiplication
- Debugging kernels using NVIDIA Nsight Compute (NCU)
- Flash-Attention algorithm
- Triton Flash-Attention kernels (forward and backward)
- Triton kernel examples #1
- Triton kernel examples #2