UPSTREAM PR #17156: HIP: WMMA-MMQ kernels for RDNA 4 #163
Access the complete analysis in the LOCI Dashboard
Performance Analysis Summary: WMMA-MMQ Kernels for RDNA 4
Overview
Analysis of PR #163 shows minimal performance impact on CPU inference paths, with changes focused on GPU acceleration for AMD RDNA 4 architecture. The highest measured performance changes are within statistical noise levels and unrelated to the core GPU optimizations introduced.
Key Findings
Performance Metrics:
Power Consumption Analysis:
Code Analysis:
Impact Assessment:
Actionable Recommendations:
The analysis confirms this is a targeted GPU optimization with no CPU performance regressions and proper architectural isolation between GPU and CPU code paths.
Access the complete analysis in the LOCI Dashboard
Performance Analysis Summary
Overview
Analysis of version
Key Findings
Performance Metrics:
Core Function Impact:
Power Consumption Analysis:
Flame Graph and CFG Analysis:
Code Review Insights:
Conclusion:
Mirrored from ggml-org/llama.cpp#17156
Enables WMMA-MMQ kernels for the RDNA 4 architecture on AMD GPUs.
This follows an approach similar to ggml-org/llama.cpp#14624.
The following performance results were collected with ./build/bin/llama-bench.
They were measured on ggml/llama.cpp master up to and including commit 5b180c3.
Build command for the following performance results (GGML_HIP_ROCWMMA_FATTN=OFF):
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" cmake -S . -B build -DGGML_HIP=ON -DGGML_CUDA_FORCE_MMQ=OFF -DGGML_HIP_UMA=OFF -DGGML_HIP_ROCWMMA_FATTN=OFF -DGPU_TARGETS="gfx1201" -DGGML_HIP_GRAPHS=OFF -DLLAMA_CURL=OFF -DGGML_CUDA_FORCE_CUBLAS=OFF -DCMAKE_BUILD_TYPE=Release && cmake --build build --config Release -- -j 32
Build command for the following performance results (GGML_HIP_ROCWMMA_FATTN=ON):
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" cmake -S . -B build -DGGML_HIP=ON -DGGML_HIP_UMA=OFF -DGGML_HIP_ROCWMMA_FATTN=ON -DGPU_TARGETS=gfx1201 -DGGML_HIP_GRAPHS=OFF -DLLAMA_CURL=OFF -DGGML_CUDA_FORCE_CUBLAS=OFF -DCMAKE_BUILD_TYPE=Release && cmake --build build --config Release -- -j 32
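The two cmake invocations above are long and nearly identical, so it can help to list only the flags that differ. The following is a minimal sketch, not part of the PR: the helper name only_in is hypothetical, the flag strings are copied from the commands above (with the quotes around gfx1201 dropped), and nothing is actually built.

```shell
#!/bin/sh
# Flag sets copied from the two cmake invocations above
# (quotes around gfx1201 dropped so both spell the target the same way).
first="-DGGML_HIP=ON -DGGML_CUDA_FORCE_MMQ=OFF -DGGML_HIP_UMA=OFF -DGGML_HIP_ROCWMMA_FATTN=OFF -DGPU_TARGETS=gfx1201 -DGGML_HIP_GRAPHS=OFF -DLLAMA_CURL=OFF -DGGML_CUDA_FORCE_CUBLAS=OFF -DCMAKE_BUILD_TYPE=Release"
second="-DGGML_HIP=ON -DGGML_HIP_UMA=OFF -DGGML_HIP_ROCWMMA_FATTN=ON -DGPU_TARGETS=gfx1201 -DGGML_HIP_GRAPHS=OFF -DLLAMA_CURL=OFF -DGGML_CUDA_FORCE_CUBLAS=OFF -DCMAKE_BUILD_TYPE=Release"

# only_in (hypothetical helper): print each flag of $1 that does not appear in $2.
only_in() {
  for f in $1; do
    case " $2 " in
      *" $f "*) ;;                 # flag present in both sets, skip it
      *) printf '%s\n' "$f" ;;     # flag unique to the first argument
    esac
  done
}

only_first=$(only_in "$first" "$second")
only_second=$(only_in "$second" "$first")

echo "Only in the first build:"
echo "$only_first"
echo "Only in the second build:"
echo "$only_second"
```

Read this way, the two configurations differ in the value of GGML_HIP_ROCWMMA_FATTN, and the first command additionally passes GGML_CUDA_FORCE_MMQ=OFF explicitly.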