Tags: ROCm/hipBLASLt
[hipblasLT] SK use Tree reduction if < 2 split (#3403)

## Motivation
If origami selects Parallel reduction and sk.grid/tiles == 1, a post kernel call is launched. This PR sets the reduction strategy to Tree reduction in that case.

## Technical Details
After calling getSKReduction, a check determines whether sk.grid/tiles < 2; if so, the reduction type is set to Tree reduction. A small change to origami is intended to keep `select_grid` behaviour the same.

## Test Plan
hipblaslt-test ran locally and passed. Waiting for CI; still need to verify that the downstream issue is resolved.

## Test Result
hipblaslt-test passed; awaiting other results.

## Submission Checklist
- [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------
Co-authored-by: Jeffrey Novotny <[email protected]>
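The check described under Technical Details could look roughly like the sketch below. This is not the code from the PR: `ReductionStrategy`, `clamp_reduction`, and the use of integer division for `grid / tiles` are assumptions made for illustration, and only the rule "fewer than 2 splits implies Tree reduction" comes from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the reduction types mentioned in the PR text.
enum class ReductionStrategy { Tree, Parallel };

// Hypothetical helper: takes the strategy suggested by the selection step
// (getSKReduction in the PR text) and overrides it with Tree reduction
// when each tile is split across fewer than 2 workgroups.
constexpr ReductionStrategy clamp_reduction(ReductionStrategy suggested,
                                            std::uint32_t grid,
                                            std::uint32_t tiles) {
    if (tiles == 0) {
        return suggested;  // nothing to split; leave the suggestion alone
    }
    // Assumption: the split factor is the integer quotient grid / tiles.
    const bool fewer_than_two_splits = (grid / tiles) < 2;
    return fewer_than_two_splits ? ReductionStrategy::Tree : suggested;
}
```

With these assumptions, a grid of 256 workgroups over 256 tiles is forced to Tree reduction, while 512 workgroups over 256 tiles keeps whatever strategy was suggested.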
Revert bad logic - Low Offset Overflow (#2472) (#2647)

## Motivation
Raised while debugging a ticket. The low 32 bits of the read address were incremented when moving to the next tile, but the high 32 bits were not. This can fail when the workspace buffer is allocated at an address close to a 32-bit boundary: incrementing to the next tile can wrap the low 32 bits to 0, and because the carry was not handled correctly, the read address ends up out of bounds, before the beginning of the buffer.

Reverting bad logic from ROCm/rocm-libraries#1080.

## Submission Checklist
- [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: mahmoodw <[email protected]>
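The wrap-around described in the Motivation can be reproduced on the host with a small model. The sketch below is illustrative only: `Addr64`, `advance_low_only`, `advance_with_carry`, and `flatten` are hypothetical names, and the real logic lives in generated kernel code rather than in C++ like this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64-bit address kept as two 32-bit halves.
struct Addr64 {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Models the failure mode: only the low half advances, so a wrap
// silently moves the address backwards by 4 GiB.
constexpr Addr64 advance_low_only(Addr64 a, std::uint32_t stride) {
    return {static_cast<std::uint32_t>(a.lo + stride), a.hi};
}

// Propagates the carry from the low half into the high half.
constexpr Addr64 advance_with_carry(Addr64 a, std::uint32_t stride) {
    const std::uint32_t lo = static_cast<std::uint32_t>(a.lo + stride);
    const std::uint32_t carry = (lo < a.lo) ? 1u : 0u;  // unsigned wrap detected
    return {lo, static_cast<std::uint32_t>(a.hi + carry)};
}

// Recombines the halves into one 64-bit value for comparison.
constexpr std::uint64_t flatten(Addr64 a) {
    return (static_cast<std::uint64_t>(a.hi) << 32) | a.lo;
}
```

For a base address whose low half is `0xFFFFF000` and a tile stride of `0x2000`, the low-only version lands below the base address, which matches the "before the beginning of the buffer" symptom, while the carry-aware version lands exactly `0x2000` bytes past it.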
Fix StreamK ExtraIters Bug (#1933) (#2008)

## Motivation
This PR fixes a bug in the StreamK extraIters calculation and improves naming conventions for the parallel reduction path.

## Technical Details
Fixes a bug in the extraIters calculation that would cause incorrect results.

## Test Plan
Passed all CI tests.

## Test Result

## Submission Checklist
- [X] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Ali Yazdani <[email protected]>
Co-authored-by: Val Movsik <[email protected]>
[rocm-libraries] ROCm/rocm-libraries#1753 (commit 0a25de4)

Cherry-Pick StreamK Changes to rocm 7.0

## Motivation
Some StreamK features/improvements are needed.

## Technical Details
This PR avoids multiple potential overflows in StreamK math.

## Test Plan
Locally on GFX950 and CI.

## Test Result
[----------] Global test environment tear-down
[==========] 19997 tests from 12 test suites ran. (1601396 ms total)
[  PASSED  ] 19997 tests.
hipBLASLt version: 100000
hipBLASLt git version: 20250912-42-17-gb1537e7cb6-dirty
command line: ./hipblaslt-test

## Submission Checklist
- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
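The PR text does not say which expressions could overflow. As a generic illustration of the kind of issue it refers to, the sketch below assumes a hypothetical product of tile count and iterations per tile, and shows how widening to 64 bits before multiplying avoids a 32-bit wrap. The function names and the choice of quantities are assumptions, not taken from hipBLASLt.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: the product is computed in 32 bits and wraps modulo 2^32.
constexpr std::uint32_t total_iters_narrow(std::uint32_t tiles,
                                           std::uint32_t iters_per_tile) {
    return static_cast<std::uint32_t>(tiles * iters_per_tile);
}

// Hypothetical: one operand is widened first, so the multiplication is
// carried out in 64 bits and cannot wrap for 32-bit inputs.
constexpr std::uint64_t total_iters_wide(std::uint32_t tiles,
                                         std::uint32_t iters_per_tile) {
    return static_cast<std::uint64_t>(tiles) * iters_per_tile;
}
```

For 70000 tiles with 70000 iterations each, the true product is 4900000000, which exceeds the 32-bit range; the narrow version returns a wrapped value while the wide version returns the exact count.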
[rocm-libraries] ROCm/rocm-libraries#1233 (commit 976b9c4) Origami lib for F8BS_TN_SABV (#521) This PR adds library for F8BS_TN with row-wise scaling (SABV). These changes have been reviewed and validated, passed CI.