Description
This is a cross-reference RFC on DLPack-based exchange. As of now, DLPack exchange relies on Python functions such as tensor.__dlpack__(). While these work well for common cases, the overhead of such an exchange is around 0.2-0.3 us for a very well optimized version and can rise to 0.4-1 us for less optimized implementations.
For a function that takes three arguments, f(a, b, c), if we run a DLPack exchange for each argument, the total conversion overhead usually comes to around 0.7-3 us.
While such overhead is acceptable in many settings, in GPU applications an extra 1-3 us can still be significant. For a kernel that takes 2 us to finish, 0.7 us means about 35% additional execution overhead.
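The arithmetic behind these figures can be checked with a small sketch. The helper name `overhead_fraction` is hypothetical, and the inputs are simply the per-argument costs and the 2 us kernel time quoted in this description, not new measurements.

```python
def overhead_fraction(per_arg_us: float, n_args: int, kernel_us: float) -> float:
    """Return the total conversion overhead as a fraction of kernel run time."""
    return per_arg_us * n_args / kernel_us


# Per-argument exchange costs quoted in the description (microseconds).
quoted_costs_us = [
    ("well optimized, low end", 0.2),
    ("well optimized, high end", 0.3),
    ("less optimized, high end", 1.0),
]

for label, per_arg_us in quoted_costs_us:
    # Three arguments, as in f(a, b, c); 2 us kernel, as in the example above.
    frac = overhead_fraction(per_arg_us, n_args=3, kernel_us=2.0)
    print(f"{label}: {per_arg_us * 3:.1f} us total, {frac:.0%} of a 2 us kernel")
```

Because the cost is paid once per argument, it grows linearly with the number of tensors passed, which is why short kernels with several arguments are the most affected.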
Recently, we proposed developing a set of dedicated C functions to help DLPack-based exchange for array libraries that work with C extensions; please see more context here
In the context of array-api, it would be useful to standardize the specific fields for this fast exchange path:
mypackage.Tensor.__c_dlpack_from_pyobject__
mypackage.Tensor.__c_dlpack_to_pyobject__
mypackage.Tensor.__c_dlpack_tensor_allocator__
Note that the proposed fast exchange functions can be used in conjunction with the current DLPack exchange, so that fallback cases are handled gracefully.
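A minimal sketch of how a consumer might select between the two paths, assuming a library opts in simply by defining the proposed attribute on its tensor type. The function `choose_exchange_path` and the classes `FastTensor` and `PlainTensor` are hypothetical, and what the attribute actually holds is left as a placeholder because this description does not specify it.

```python
def choose_exchange_path(obj) -> str:
    """Return which exchange path a consumer could use for obj."""
    tensor_type = type(obj)
    # Assumption: presence of the proposed attribute on the type signals
    # that the C-level fast path is available.
    if hasattr(tensor_type, "__c_dlpack_from_pyobject__"):
        return "c_fast_path"
    # Fallback: the existing Python-level DLPack protocol.
    if hasattr(tensor_type, "__dlpack__"):
        return "python_dlpack"
    raise TypeError(f"{tensor_type.__name__} does not support DLPack exchange")


class FastTensor:
    # Placeholder value only; the real contents are defined by the proposal.
    __c_dlpack_from_pyobject__ = None

    def __dlpack__(self, **kwargs):
        raise NotImplementedError("illustration only")


class PlainTensor:
    def __dlpack__(self, **kwargs):
        raise NotImplementedError("illustration only")


print(choose_exchange_path(FastTensor()))
print(choose_exchange_path(PlainTensor()))
```

Checking the type rather than the instance means the lookup result can be cached per tensor type, so the selection itself adds little per-call cost.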