Enable handle from cublasCreate to be used in cublasLt calls #587
vosen merged 16 commits into vosen:master from
Pull request overview
This PR enables cublasHandle_t to be used interchangeably with cublasLtHandle_t in cublasLt function calls, implementing support for CUDA's specification that allows a regular cuBLAS handle to be passed to cuBLASLt functions.
Key changes:
- Moves BLAS handle types from `zluda_blas` and `zluda_blaslt` to `zluda_common` as `BlasHandle` and `BlasLtHandle`
- Uses `#[repr(transparent)]` newtype wrappers around a shared `BlasHandles` struct that contains `OnceLock` fields for both `rocblas_handle` and `hipblasLtHandle_t`
- Implements lazy initialization of `hipblasLtHandle_t` when a `cublasHandle_t` is used in cublasLt functions
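The design summarized in the list above can be sketched roughly as follows. This is a minimal single-crate illustration, not the PR's actual code: the `usize` aliases stand in for the real FFI handle types, the constants `0x1000` and `0x2000` stand in for real handle-creation calls, and the method names (`create`, `as_lt`, `hipblas_lt`, `is_lt_initialized`, `layouts_match`) are hypothetical.

```rust
use std::mem::{align_of, size_of};
use std::sync::OnceLock;

// Hypothetical stand-ins for the real FFI handle types.
type RocblasHandle = usize;
type HipblasLtHandle = usize;

/// Shared state behind both handle kinds; each backend handle is set at most once.
#[derive(Default)]
pub struct BlasHandles {
    rocblas: OnceLock<RocblasHandle>,
    hipblas_lt: OnceLock<HipblasLtHandle>,
}

/// Newtype seen by the BLAS entry points.
#[repr(transparent)]
pub struct BlasHandle(BlasHandles);

/// Newtype seen by the BLASLt entry points.
#[repr(transparent)]
pub struct BlasLtHandle(BlasHandles);

impl BlasHandle {
    /// Models a cublasCreate-style call: only the rocBLAS side is initialized.
    pub fn create() -> Self {
        let handles = BlasHandles::default();
        // Placeholder value; a real implementation would call into rocBLAS here.
        handles.rocblas.get_or_init(|| 0x1000);
        BlasHandle(handles)
    }

    /// Reinterprets this handle as a BLASLt handle. Within one compilation this
    /// is sound because both wrappers are #[repr(transparent)] over BlasHandles.
    pub fn as_lt(&self) -> &BlasLtHandle {
        unsafe { &*(self as *const BlasHandle as *const BlasLtHandle) }
    }
}

impl BlasLtHandle {
    /// Returns the hipBLASLt handle, creating it on first use.
    pub fn hipblas_lt(&self) -> HipblasLtHandle {
        // Placeholder value; a real implementation would call into hipBLASLt here.
        *self.0.hipblas_lt.get_or_init(|| 0x2000)
    }

    /// Reports whether the hipBLASLt side has been created yet.
    pub fn is_lt_initialized(&self) -> bool {
        self.0.hipblas_lt.get().is_some()
    }
}

/// Checks that both wrappers have the size and alignment of the shared struct.
pub fn layouts_match() -> bool {
    size_of::<BlasHandle>() == size_of::<BlasHandles>()
        && size_of::<BlasLtHandle>() == size_of::<BlasHandles>()
        && align_of::<BlasHandle>() == align_of::<BlasLtHandle>()
}
```

Note that the sketch only shows the single-compilation case. It says nothing about whether `BlasHandles` has the same layout when compiled separately into two libraries.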
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| zluda_common/src/lib.rs | Adds handle module and updates from_cuda_object! macro to use $crate:: for better portability |
| zluda_common/src/handle.rs | New file defining BlasHandles struct with OnceLock fields and newtype wrappers BlasHandle/BlasLtHandle sharing the same COOKIE for interchangeability |
| zluda_blaslt/src/impl.rs | Replaces local Handle with BlasLtHandle, adds lazy initialization function hipblas_lt_init, and updates all functions to use the new handle type |
| zluda_blas/src/impl.rs | Replaces local Handle with BlasHandle, adds rocblas_unwrap helper, and updates all functions to use the new handle type |
This is a good start, but there are subtle issues with this approach which need to be resolved.
- Most importantly, it's not clear what extent of handle layout compatibility we guarantee. The PR comment says that a cuBLAS handle can be passed to cuBLASLt. Does it work the other way around? The code here implies so, but if it does not, then the code can probably be simplified.
  Anyway, this behavior is really non-obvious and it must be covered by a test (both cublas handle into cublaslt and cublaslt handle into cublas).
- The memory layout of `BlasHandles` is wrong. Rust only guarantees field ordering for types with `repr(C)`, and `BlasHandles` has no such attribute, so it can legally be `{ rocblas, hipblas_lt }` in zluda_blas and `{ hipblas_lt, rocblas }` in zluda_blaslt. And just marking the types `repr(C)` will not resolve the issue, because `repr(C)` is not transitive: `OnceLock<rocblas_handle>` can have a different size and layout between zluda_blas and zluda_blaslt. The solution would be indirection: `Box<T>` is guaranteed to have the same layout as `*mut T`, which, if `T` is `Sized`, is transitively the same as `usize`. If the cublaslt handle is not guaranteed to be cublas compatible, then you could have your blas handle type be `#[repr(C)] { Box<BlasLtHandle>, ... <Blas fields> }` and your blas_lt handle be `#[repr(C)] { Box<BlasLtHandle> }`, and they are going to be guaranteed to have a compatible layout.
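The indirection suggested in the second point can be sketched as below. This is an illustration under assumptions, not code from the PR: `LtState` is a hypothetical name for whatever the BLASLt side owns, the `usize` fields stand in for real FFI handles, and `as_lt` and `box_is_pointer_sized` are hypothetical helpers.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical state owned by the BLASLt side; its internal layout never
/// needs to be known by the BLAS side, which only stores a pointer to it.
pub struct LtState {
    pub hipblas_lt: usize,
}

/// Layout of a handle created through the BLASLt entry point.
#[repr(C)]
pub struct BlasLtHandle {
    pub lt: Box<LtState>,
}

/// Layout of a handle created through the BLAS entry point:
/// the same first field, followed by BLAS-only state.
#[repr(C)]
pub struct BlasHandle {
    pub lt: Box<LtState>,
    pub rocblas: usize,
}

/// Box<T> for a Sized T has the size and alignment of a raw pointer,
/// which in turn matches usize.
pub fn box_is_pointer_sized() -> bool {
    size_of::<Box<LtState>>() == size_of::<*mut LtState>()
        && size_of::<Box<LtState>>() == size_of::<usize>()
        && align_of::<Box<LtState>>() == align_of::<usize>()
}

/// Views a BLAS handle through the BLASLt layout. Both structs are repr(C)
/// and start with the same field at offset 0, so the shared prefix is valid.
pub fn as_lt(handle: &BlasHandle) -> &BlasLtHandle {
    unsafe { &*(handle as *const BlasHandle as *const BlasLtHandle) }
}
```

With this shape, only the pointer-sized first field has to agree across libraries. The reverse cast, from a BLASLt handle to a BLAS handle, is deliberately absent because the BLASLt layout has no BLAS fields to read.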
A Even though it is not valid, a user could theoretically pass a

I've changed the representation of the blas library handles to once again be separate. Now,

I've updated this again to try to fix the memory layout issues. The hipblasLt handle is now managed by zluda_blaslt, which zluda_blas now has a runtime dependency on.
Conflicts:
- Cargo.lock
- zluda_blas/src/impl.rs
- zluda_blas/src/lib.rs
- zluda_blaslt/src/impl.rs
- zluda_windows/src/lib.rs
A `cublasHandle_t` may be passed to `cublasLt` calls in place of a `cublasLtHandle_t`. This PR adds support for this functionality.

This is implemented by moving the ZLUDA blas and blaslt handles to `zluda_common`, as `BlasHandle` and `BlasLtHandle`. These are both newtype wrappers around the `BlasHandles` struct, using `#[repr(transparent)]` to ensure that they have the same in-memory representation as `BlasHandles`. `ZludaObject` is then implemented for both types, with the same `COOKIE`, so that LiveCheck values of both types will be interchangeable.

`BlasHandles` contains `OnceLock` fields for a `rocblas_handle` and a `hipblasLtHandle_t`. These are initialized in their respective `Create` functions. All `cublasLt` functions that take a handle then also allow lazy initialization of a `hipblasLtHandle_t`.

The other option would be to always initialize a `hipblasLtHandle_t` in `zluda_blas::impl::create`. This is probably low overhead, but I was trying to avoid introducing a dependency on `hipblasLt` into `zluda_blas`.
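The trade-off between the two options can be illustrated with a small sketch. It assumes a hypothetical `Handles` struct with a creation counter added purely for demonstration, and uses the constant `0x2000` in place of a real hipBLASLt creation call.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical handle state; `creations` exists only to make the
/// number of backend creations observable in this sketch.
#[derive(Default)]
pub struct Handles {
    hipblas_lt: OnceLock<usize>,
    creations: AtomicUsize,
}

impl Handles {
    /// Lazy option: nothing is created until a BLASLt call needs it.
    pub fn lazy() -> Self {
        Handles::default()
    }

    /// Eager option: the BLASLt handle is created up front, which in a real
    /// implementation would make the BLAS side depend on hipBLASLt.
    pub fn eager() -> Self {
        let handles = Handles::default();
        handles.hipblas_lt();
        handles
    }

    /// Returns the BLASLt handle, creating it on first use only.
    pub fn hipblas_lt(&self) -> usize {
        *self.hipblas_lt.get_or_init(|| {
            self.creations.fetch_add(1, Ordering::Relaxed);
            0x2000 // placeholder for a real creation call
        })
    }

    /// Number of times the backend handle was created.
    pub fn creations(&self) -> usize {
        self.creations.load(Ordering::Relaxed)
    }
}
```

In both variants `OnceLock::get_or_init` ensures the creation closure runs at most once per `Handles` value. The only difference is whether the cost is paid at creation time or on the first BLASLt call.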