This is an early issue to publicly discuss whether we can use the Array API (see #22352) for k-means and make it run on GPUs, using PyTorch in particular.
@fcharras has already started to run some promising experiments using the raw PyTorch API. Maybe you could link to a gist with your code?
Unfortunately, the current state of the Array API is likely too limiting because AFAIK it does not yet expose the equivalent of torch.cdist, torch.expand and torch.scatter_add_.
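To make the gap concrete, here is a minimal sketch of one Lloyd iteration written only with broadcasting, reductions and a matrix product, i.e. without cdist or scatter_add_. It is not the code from the experiments mentioned above. The function name `lloyd_step` and the `xp` namespace argument are hypothetical, and NumPy is used as the namespace only so that the sketch runs anywhere.

```python
import numpy as np


def lloyd_step(X, centers, xp=np):
    """One Lloyd iteration using only generic array operations.

    `xp` is an assumed array namespace (NumPy here); this is a sketch,
    not scikit-learn's implementation.
    """
    n_clusters = centers.shape[0]

    # Stand-in for cdist: broadcast to (n_samples, n_clusters, n_features),
    # then reduce over the feature axis to get squared distances.
    diff = X[:, None, :] - centers[None, :, :]
    sq_dist = xp.sum(diff * diff, axis=-1)
    labels = xp.argmin(sq_dist, axis=1)

    # Stand-in for scatter_add_: a one-hot membership matrix turns the
    # per-cluster sums into a matrix product.
    mask = labels[:, None] == xp.arange(n_clusters)[None, :]
    one_hot = xp.asarray(mask, dtype=X.dtype)
    sums = one_hot.T @ X
    counts = xp.sum(one_hot, axis=0)

    # Empty clusters keep their previous center.
    safe_counts = xp.maximum(counts, xp.asarray(1, dtype=X.dtype))
    new_centers = xp.where(
        counts[:, None] > 0, sums / safe_counts[:, None], centers
    )
    return labels, new_centers
```

Both stand-ins materialize large intermediates: the broadcast difference has shape (n_samples, n_clusters, n_features) and the one-hot matrix has shape (n_samples, n_clusters). Dedicated primitives can avoid allocating these, which is one way to see why their absence matters here.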
The purpose of this issue is to precisely identify what is blocking us with the current state of Array API and discuss potential solutions:
- use this use case to report to the Array API standardization committee what we need, so that the spec can evolve and benefit everybody;
- alternatively, explore the use of a multi-dispatch system such as uarray, which is being adopted in scipy, to make it possible to maintain a PyTorch-specific optimized code path alongside a slower yet generic Array API code path and a NumPy-optimized code path that would rely on our current Cython code;
- decide that the estimator-level engine API proposed in [DRAFT] Engine plugin API and engine entry point for Lloyd's KMeans #25535 is the only sane way to make this estimator run on GPU (which I now doubt personally).