Array API support for k-means #26585
Here is a gist with what I think is a rather comprehensive implementation of Lloyd's algorithm in PyTorch: https://gist.github.com/fcharras/ce1f1df7d15675268827e1fb9b65265b. Scroll down to the bottom of the file for a quick tester. This function is almost a drop-in replacement for the private Lloyd function in the scikit-learn implementation.

For k-means, running this on GPU does offer a speedup over current scikit-learn performance (in the 2x-5x range, I think?), but in the realm of GPU implementations of k-means it is a bit underwhelming. Both the "pairwise distance + min lookup" and the "weight multiplication + centroid update" steps should be fused rather than called separately with intermediate results materialized in memory (the memory reads/writes are the bottleneck); fusing them would offer another 2x-5x speedup.

I'll post a similar gist for the KNN. For the KNN it's a different story: to my knowledge, the best brute-force implementations require materializing the pairwise distance matrix in memory and cannot get past that IO bottleneck, so the achievable speedup is more limited, and the PyTorch implementation should be decently close to the best you can get.
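For readers who want the shape of the two unfused steps described above without opening the gist, here is a minimal NumPy sketch (a hypothetical helper, not the gist's actual code): step 1 materializes the full pairwise distance matrix before the min lookup, and step 2 materializes the weighted points before a scatter-add centroid update, exactly the intermediate-results pattern criticized above (`np.add.at` plays the role of `torch.scatter_add_`).

```python
import numpy as np

def lloyd_iteration(X, centroids, sample_weight=None):
    """One unfused Lloyd iteration: both intermediate results
    (the distance matrix and the weighted points) hit memory."""
    n_samples = X.shape[0]
    n_clusters = centroids.shape[0]
    if sample_weight is None:
        sample_weight = np.ones(n_samples)
    # Step 1: pairwise squared distances, then min lookup.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    labels = dists.argmin(axis=1)
    # Step 2: weight multiplication, then scatter-add centroid update.
    sums = np.zeros_like(centroids)
    np.add.at(sums, labels, X * sample_weight[:, None])
    weights = np.zeros(n_clusters)
    np.add.at(weights, labels, sample_weight)
    new_centroids = sums / np.maximum(weights, 1e-12)[:, None]
    return new_centroids, labels
```

A fused GPU kernel would compute each sample's nearest centroid and accumulate its contribution in one pass, never writing `dists` or `X * sample_weight[:, None]` to global memory.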
It would be interesting to see if
I think it would be interesting to write a special-cased version of the PyTorch implementation of k-means without sample weights, because in practice very few people use non-None sample weights. Then it would be nice to have a summary table of run results for the various implementations and hardware on the same data. Maybe let's try a dataset just small enough that cuML does not crash: rapidsai/cuml#5470.
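To make the suggestion concrete, here is a sketch (hypothetical helper, NumPy for illustration) of what the weight-free centroid update could look like: with all weights equal to 1, the scatter-add reduces to plain `bincount` calls and the weight multiplication disappears entirely.

```python
import numpy as np

def update_centroids_unweighted(X, labels, n_clusters):
    """Centroid update when every sample weight is 1: per-cluster counts
    and per-feature sums via bincount, no weight vector needed."""
    counts = np.bincount(labels, minlength=n_clusters)
    sums = np.stack(
        [np.bincount(labels, weights=X[:, j], minlength=n_clusters)
         for j in range(X.shape[1])],
        axis=1,
    )
    # Guard against empty clusters (scikit-learn relocates them instead).
    return sums / np.maximum(counts, 1)[:, None]
```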
A Triton implementation of https://github.com/soda-inria/sklearn-numba-dpex/tree/main/sklearn_numba_dpex/kmeans would also be interesting to compare against.
This is an early issue to publicly discuss the possibility (or not) of using the Array API (see #22352) for k-means, to make it run on GPUs, using PyTorch in particular.
@fcharras has already started to run some promising experiments using the raw PyTorch API. Maybe you could link to a gist with your code?
Unfortunately, the current state of the Array API is likely too limiting because, AFAIK, it does not yet expose equivalents of `torch.cdist`, `torch.expand`, and `torch.scatter_add_`.

The purpose of this issue is to precisely identify what is blocking us with the current state of the Array API and to discuss potential solutions: