[RFC]: DLPack C Function for Speed Exchange #973

@tqchen

This is a cross-reference RFC on DLPack-based exchange. As of now, DLPack exchange relies on Python functions such as tensor.__dlpack__(). While they work well for common cases, the overhead of such an exchange is around 0.2-0.3 us per call for a very well optimized implementation, and can go up to 0.4-1 us for a less optimized one.

For a function that takes three arguments, f(a, b, c), and assuming we run a DLPack exchange for each argument, the total conversion overhead usually comes to around 0.7-3 us.

While such overhead is acceptable in many settings, in GPU applications an extra 1-3 us can still be significant. For a kernel that takes 2 us to finish, 0.7 us of exchange means roughly 30% additional overhead in execution.
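The arithmetic behind these figures can be sketched as follows. This is only an illustration of the numbers quoted above; the helper names are hypothetical and not part of any proposal. Multiplying the per-call cost by three arguments gives about 0.6-0.9 us for an optimized implementation and 1.2-3 us for a less optimized one, consistent with the quoted 0.7-3 us range. For a 2 us kernel, 0.7 us of exchange is 35% of the kernel's own run time, or about 26% of the combined time, which brackets the rough 30% figure.

```python
def total_exchange_overhead_us(per_arg_us: float, num_args: int) -> float:
    """Overhead of converting every argument once, in microseconds."""
    return per_arg_us * num_args


def overhead_vs_kernel(overhead_us: float, kernel_us: float) -> float:
    """Overhead as a fraction of the kernel's own run time."""
    return overhead_us / kernel_us


def overhead_share_of_total(overhead_us: float, kernel_us: float) -> float:
    """Overhead as a fraction of the combined (exchange + kernel) time."""
    return overhead_us / (overhead_us + kernel_us)


if __name__ == "__main__":
    # Per-call costs quoted in the RFC text, applied to f(a, b, c).
    for per_arg_us in (0.2, 0.3, 0.4, 1.0):
        print(per_arg_us, total_exchange_overhead_us(per_arg_us, 3))

    # The 2 us kernel with 0.7 us of exchange overhead.
    print(overhead_vs_kernel(0.7, 2.0))
    print(overhead_share_of_total(0.7, 2.0))
```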

Recently, we proposed developing a set of dedicated C functions to speed up DLPack-based exchange for array libraries that are implemented as C extensions. For more context, see:

dmlc/dlpack#175

In the context of array-api, it would be useful to standardize the specific fields used for this fast exchange:

  • mypackage.Tensor.__c_dlpack_from_pyobject__
  • mypackage.Tensor.__c_dlpack_to_pyobject__
  • mypackage.Tensor.__c_dlpack_tensor_allocator__

Note that the proposed fast exchange functions can be used in conjunction with the current DLPack exchange, so that fallback cases are handled gracefully.
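A minimal sketch of how a consumer might choose between the two paths is shown below. It only illustrates the dispatch decision. The attribute name comes from the list above, but what the attribute holds is not specified here, so the placeholder value, the helper `select_exchange_path`, and the two tensor classes are all hypothetical. A real consumer would perform this check from C and call the exported function directly.

```python
from typing import Any

# Attribute name taken from the RFC text; its contents are not specified there.
FAST_ATTR = "__c_dlpack_from_pyobject__"


def select_exchange_path(obj: Any) -> str:
    """Return which exchange path a consumer would take for obj (hypothetical helper)."""
    tensor_type = type(obj)
    # Prefer the C-level exchange when the tensor type advertises it.
    if getattr(tensor_type, FAST_ATTR, None) is not None:
        return "c_function"
    # Otherwise fall back to the existing Python-level DLPack protocol.
    if hasattr(tensor_type, "__dlpack__"):
        return "python_dlpack"
    raise TypeError(f"{tensor_type.__name__} does not support DLPack exchange")


class FastTensor:
    """Hypothetical tensor type that exposes the proposed field."""

    # Placeholder only; assumed to stand in for whatever the producer exports.
    __c_dlpack_from_pyobject__ = object()

    def __dlpack__(self, **kwargs):
        raise NotImplementedError("stub for illustration")


class PlainTensor:
    """Hypothetical tensor type that only supports the current protocol."""

    def __dlpack__(self, **kwargs):
        raise NotImplementedError("stub for illustration")


if __name__ == "__main__":
    print(select_exchange_path(FastTensor()))
    print(select_exchange_path(PlainTensor()))
```

Looking the field up on the type rather than the instance mirrors the `mypackage.Tensor.<field>` spelling above, and means the check can be cached per type instead of repeated per argument.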

Labels: API extension, Needs Discussion, RFC, topic: DLPack, topic: Device Handling

Status: Stage 0