| Algorithms | Variants |
|---|---|
| Random | bernoulli normal uniform |
| Quantization | symmetric per-block per-tensor q2 q4 q8 fp4 |
| Reduction | mean sum prod max min arg[max\|min] per-cube per-plane |
| Matmul | mma unit tma multi-stage specialization ordered multi-rows |
| Convolution | mma unit tma multi-stage im2col |
| Attention | mma unit multi-rows |

If you want to contribute new kernels, please read GUIDE.md.