Reverse engineering of the Master Quality Authenticated (MQA) codec: a tool to identify MQA encoding and the master's sample rate.
Updated Apr 8, 2023 - C++
Decoding Attention is optimized for multi-head attention (MHA), multi-query attention (MQA), grouped-query attention (GQA), and multi-head latent attention (MLA), using CUDA cores for the decoding stage of LLM inference.