[RFC]: DLPack C Function for Speed Exchange

This is a cross-reference RFC on DLPack-based exchange. As of now, DLPack exchange relies on Python functions such as `tensor.__dlpack__()`. While these work well for common cases, the overhead of each exchange is around 0.2-0.3 us for a very well-optimized implementation, and can go up to 0.4-1 us for less optimized ones.

For a function that takes three arguments, `f(a, b, c)`, running DLPack exchange on each argument usually brings the total conversion overhead to around 0.7-3 us.

While such overhead can be acceptable in many settings, in GPU applications an extra 1-3 us can still be significant. For a kernel that takes 2 us to finish, 0.7 us means roughly 35% additional execution overhead.
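The arithmetic behind these figures can be sketched in a few lines. This is only an illustration: the helper names `total_exchange_us` and `overhead_fraction` are made up for this example, and the per-argument cost is simply the 0.7 us total from above divided across three arguments.

```python
def total_exchange_us(per_arg_us: float, num_args: int) -> float:
    """Total conversion cost when every argument goes through one exchange."""
    return per_arg_us * num_args


def overhead_fraction(kernel_us: float, exchange_us: float) -> float:
    """Exchange cost relative to the kernel's own run time."""
    return exchange_us / kernel_us


# Hypothetical inputs: a 2 us kernel and three arguments whose exchanges
# add up to 0.7 us in total.
per_arg_us = 0.7 / 3
exchange_us = total_exchange_us(per_arg_us, 3)
fraction = overhead_fraction(2.0, exchange_us)
print(f"exchange: {exchange_us:.2f} us, relative overhead: {fraction:.0%}")
```

The shorter the kernel, the larger this fraction becomes for the same fixed exchange cost, which is why small GPU kernels are the sensitive case.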

We recently proposed developing a set of dedicated C functions to help DLPack-based exchange for array libraries that work with C extensions. Please see more context here:

https://github.com/dmlc/dlpack/issues/175

In the context of array-api, it would be useful to standardize the specific fields for such speed exchange:

- `mypackage.Tensor.__c_dlpack_from_pyobject__`
- `mypackage.Tensor.__c_dlpack_to_pyobject__`
- `mypackage.Tensor.__c_dlpack_tensor_allocator__`

Note that the proposed speed exchange functions can be used in conjunction with the current DLPack exchange to handle fallback cases gracefully.
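One way a consumer might combine the two paths is to probe the tensor type for the proposed field first and fall back to `__dlpack__()` otherwise. The sketch below only illustrates that dispatch decision. `FastTensor`, `PlainTensor`, and `pick_exchange_path` are hypothetical names, and the placeholder `object()` stands in for whatever the field actually holds, which is defined in the linked proposal rather than here.

```python
class FastTensor:
    """Hypothetical tensor type that advertises the proposed C field."""
    # Placeholder value: the real contents are specified by the proposal.
    __c_dlpack_from_pyobject__ = object()

    def __dlpack__(self, **kwargs):
        raise NotImplementedError("stub for illustration only")


class PlainTensor:
    """Hypothetical tensor type that only supports the Python protocol."""

    def __dlpack__(self, **kwargs):
        raise NotImplementedError("stub for illustration only")


def pick_exchange_path(obj) -> str:
    """Decide which exchange path a consumer could take for `obj`."""
    tensor_type = type(obj)
    # Prefer the C-level path when the type exposes the proposed field.
    if hasattr(tensor_type, "__c_dlpack_from_pyobject__"):
        return "c_function"
    # Otherwise fall back to the existing Python-level DLPack exchange.
    if hasattr(tensor_type, "__dlpack__"):
        return "python_protocol"
    raise TypeError(f"{tensor_type.__name__} does not support DLPack exchange")


print(pick_exchange_path(FastTensor()))
print(pick_exchange_path(PlainTensor()))
```

Probing the type rather than the instance means a consumer could cache the decision per type, so the lookup itself does not add per-call overhead.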


[RFC]: DLPack C Function for Speed Exchange #973
