- AWS AI
- Santa Clara, CA
- https://ydtydr.github.io/
Stars: 4 stars written in C++
FlashMLA: Efficient Multi-head Latent Attention Kernels
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)