Welcome to the Hexagon-MLIR tutorials! These hands-on examples will guide you through the process of writing, compiling, and executing Triton kernels and PyTorch models on Qualcomm Hexagon NPUs.
Start with Triton Tutorials
Start with PyTorch Tutorials
These tutorials demonstrate how to leverage Qualcomm's Hexagon NPU targets for AI workloads. You'll discover how to:
- Write Triton Kernels: Create kernels that run efficiently on Qualcomm Hexagon NPUs
- Understand the Compilation Pipeline: Follow your code from Python through multiple IR transformations to optimized machine code
- Optimize Performance: Leverage Hexagon hardware features such as multi-threading, vector processing, and memory hierarchy optimization
- Debug and Profile: Use built-in tools to analyze and improve your kernel performance
- Use the PyTorch Flow: Compile PyTorch models and execute them through the Hexagon-MLIR flow
Before diving into the tutorials, make sure you have:
- Hexagon-MLIR framework installed (see the Installation Guide)
- Python environment with required dependencies
- Access to Hexagon hardware or simulator
- Basic understanding of Python and tensor operations
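As a quick sanity check of the Python environment item above, a minimal sketch like the following reports whether some commonly expected packages are importable. The module names `torch` and `triton` are assumptions based on the tutorial topics, and `check_modules` is a hypothetical helper, not part of Hexagon-MLIR; the Installation Guide remains the source for the actual dependency list.

```python
import importlib.util
import sys

# Assumed module names, inferred from the tutorial topics (Triton kernels,
# PyTorch models). The real dependency list is in the Installation Guide.
ASSUMED_MODULES = ("torch", "triton")


def check_modules(names=ASSUMED_MODULES):
    """Return a dict mapping each top-level module name to True if importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    for name, found in check_modules().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

This only confirms that the packages can be located by the interpreter. It does not verify versions, the Hexagon-MLIR installation itself, or access to hardware or a simulator.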