RFC: Consider changing the output of transformers returning sparse matrices to sparse arrays #32216

Currently, a couple of transformers return a SciPy sparse matrix. With the ongoing array API work across the ecosystem, this could become a real nuisance: sparse matrices are not arrays and might not be supported.
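To make "matrices are not arrays" concrete, here is a small sketch contrasting the two SciPy sparse containers on the same data. It assumes SciPy >= 1.8, where `scipy.sparse.csr_array` is available; the 2x2 input is purely illustrative.

```python
from scipy import sparse

data = [[1, 2], [3, 4]]
mat = sparse.csr_matrix(data)  # legacy container with np.matrix semantics
arr = sparse.csr_array(data)   # newer container with np.ndarray semantics

# With a sparse matrix, `*` is matrix multiplication...
matmul_result = (mat * mat).toarray()       # [[7, 10], [15, 22]]
# ...whereas with a sparse array, `*` is element-wise, as for np.ndarray.
elementwise_result = (arr * arr).toarray()  # [[1, 4], [9, 16]]
# `@` is matrix multiplication for both containers.
matmul_via_at = (arr @ arr).toarray()       # [[7, 10], [15, 22]]
```

Code written against one container can therefore silently compute something different when handed the other, which is why the choice of output type matters.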

To illustrate, consider the following example:

```python
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import TfidfVectorizer

text = ['Example Category', 'Random Word']
transformer = TfidfVectorizer()
vector = transformer.fit_transform(text)
pdist(vector.todense())  # raises TypeError with recent SciPy
```

With a recent SciPy, this fails with the following error:

```pytb
TypeError: Inputs of type `numpy.matrix` are not supported.
```

This is because `.todense()` on a sparse matrix returns a `numpy.matrix` (not a NumPy array), and SciPy nowadays dispatches through the array API, which does not support `numpy.matrix`.
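The sketch below reproduces the problem and shows two workarounds available today: `.toarray()`, which always returns a plain `ndarray`, or wrapping the result of `.todense()` with `np.asarray`. It assumes `TfidfVectorizer` still returns a sparse matrix, as described in this issue.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import TfidfVectorizer

text = ["Example Category", "Random Word"]
vector = TfidfVectorizer().fit_transform(text)  # currently a sparse matrix

# On a sparse *matrix*, .todense() returns np.matrix, which the
# array-API dispatch in recent SciPy rejects.
dense = vector.todense()

# Workarounds: explicitly request a plain ndarray before calling pdist.
distances = pdist(vector.toarray())
distances_alt = pdist(np.asarray(dense))
```

Both workarounds put the burden on the user to know that `.todense()` and `.toarray()` differ for sparse matrices.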

If we returned a sparse array instead, `.todense()` would return a NumPy `ndarray` and the call would succeed.
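The proposed behaviour can be emulated today by wrapping the transformer output in `scipy.sparse.csr_array`. This is a minimal sketch assuming SciPy >= 1.8 (which introduced the sparse array classes); it does not reflect how scikit-learn would implement the change internally.

```python
import numpy as np
from scipy import sparse
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import TfidfVectorizer

text = ["Example Category", "Random Word"]
vector = TfidfVectorizer().fit_transform(text)

# Emulate the proposed output type by converting to a sparse array.
vector_arr = sparse.csr_array(vector)

dense = vector_arr.todense()  # a plain np.ndarray, not np.matrix
distances = pdist(dense)      # accepted by the array-API dispatch
```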

So the broad question is: do we want to anticipate this by emitting a `FutureWarning`, then change the output type and ask users to convert explicitly to a sparse matrix if they want to keep the previous behaviour?
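One possible shape for such a deprecation cycle is sketched below. The helper `_wrap_sparse_output` and its `sparse_container` parameter are hypothetical names used only for illustration, not existing scikit-learn API; the sketch also assumes CSR output (as produced by `TfidfVectorizer`) and SciPy >= 1.8.

```python
import warnings

import numpy as np
from scipy import sparse


def _wrap_sparse_output(X, *, sparse_container="warn"):
    """Hypothetical helper: choose the container for a transformer's sparse output.

    "warn" keeps today's sparse matrix but emits a FutureWarning;
    "array" and "matrix" select the container explicitly.
    """
    if not sparse.issparse(X):
        return X  # dense outputs are left untouched
    if sparse_container == "warn":
        warnings.warn(
            "The output will change from a SciPy sparse matrix to a sparse "
            "array in a future release. Wrap the result with "
            "scipy.sparse.csr_matrix(...) to keep the current behaviour.",
            FutureWarning,
        )
        return X
    if sparse_container == "array":
        return sparse.csr_array(X)  # assumes CSR is the desired format
    if sparse_container == "matrix":
        return sparse.csr_matrix(X)
    raise ValueError(f"Unknown sparse_container: {sparse_container!r}")


# Usage: the default warns and returns the input unchanged; after the
# switch, users needing a sparse matrix convert back explicitly.
X = sparse.csr_matrix(np.eye(2))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unchanged = _wrap_sparse_output(X)
as_array = _wrap_sparse_output(X, sparse_container="array")
back_to_matrix = sparse.csr_matrix(as_array)  # explicit opt-out
```

The explicit `sparse.csr_matrix(...)` call at the end is the kind of conversion users would be asked to add if they rely on matrix semantics.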
