
RFC: Consider changing the output of transformers that return a sparse matrix to a sparse array #32216

@glemaitre

Description

Currently, a couple of our transformers return a sparse matrix. With the ongoing array API work across the ecosystem, this could become annoying: a matrix is not an array and might not be supported.

To illustrate with an example, let's look at:

```python
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import TfidfVectorizer

text = ['Example Category', 'Random Word']
transformer = TfidfVectorizer()
vector = transformer.fit_transform(text)  # a scipy.sparse CSR *matrix*
pdist(vector.todense())  # .todense() on a sparse matrix returns a numpy.matrix
```

With recent SciPy versions, this fails with the following error:

TypeError: Inputs of type `numpy.matrix` are not supported.

This is because `.todense()` converts a sparse matrix into a dense `numpy.matrix`, and nowadays SciPy dispatches through the array API, which does not accept `numpy.matrix` inputs.
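To make the type difference concrete, here is a small sketch that uses an arbitrary 2x2 sparse matrix as a stand-in for the `TfidfVectorizer` output. It shows that `.todense()` yields a `numpy.matrix`, while `.toarray()` already yields a plain `numpy.ndarray`, so calling `pdist(vector.toarray())` is one workaround available today. The `pdist` call on the `numpy.matrix` is deliberately left out because whether it raises depends on the installed SciPy version.

```python
import numpy as np
import scipy.sparse as sp
from scipy.spatial.distance import pdist

# Arbitrary stand-in for a transformer output returned as a sparse *matrix*.
X_matrix = sp.csr_matrix(np.array([[0.0, 1.0], [1.0, 0.0]]))

dense_via_todense = X_matrix.todense()  # numpy.matrix (a 2-D-only ndarray subclass)
dense_via_toarray = X_matrix.toarray()  # plain numpy.ndarray

print(type(dense_via_todense).__name__)  # matrix
print(type(dense_via_toarray).__name__)  # ndarray

# A plain ndarray is accepted by pdist regardless of the array API dispatch.
distances = pdist(dense_via_toarray)
```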

If we instead returned a sparse array, `.todense()` would return a NumPy `ndarray` and the call would succeed.
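A minimal sketch of the proposed behaviour, assuming SciPy 1.8 or later (where `scipy.sparse.csr_array` is available). The 2x2 input is again an arbitrary stand-in for a transformer output; wrapping it in `csr_array` changes what `.todense()` hands back.

```python
import numpy as np
import scipy.sparse as sp
from scipy.spatial.distance import pdist

# Stand-in for an output that is currently a sparse matrix.
X_matrix = sp.csr_matrix(np.array([[0.0, 1.0], [1.0, 0.0]]))

# Returning a sparse *array* instead changes the type produced by .todense().
X_array = sp.csr_array(X_matrix)
dense = X_array.todense()

print(type(dense).__name__)  # ndarray
distances = pdist(dense)  # plain ndarray input, so no numpy.matrix rejection
```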

So the broad question is: do we want to anticipate this by emitting a `FutureWarning`, changing the output type, and asking users to convert explicitly to a sparse matrix if they want to keep the previous behaviour?
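One way such a deprecation cycle could look is sketched below. The helper `wrap_sparse_output` and its `sparse_container` parameter are hypothetical and not part of scikit-learn; they only illustrate the three states (warn, keep matrix, switch to array). The user-side part, converting back with `scipy.sparse.csr_matrix(...)`, is the explicit conversion the question refers to.

```python
import warnings

import numpy as np
import scipy.sparse as sp


def wrap_sparse_output(X, sparse_container="warn"):
    """Hypothetical helper (not a scikit-learn API) sketching a deprecation path.

    "warn"   -> keep returning a sparse matrix, but emit a FutureWarning
    "matrix" -> return a sparse matrix silently (explicit opt-in to old behaviour)
    "array"  -> return a sparse array (the proposed future default)
    """
    if sparse_container == "warn":
        warnings.warn(
            "The output type will change from a sparse matrix to a sparse array "
            "in a future release. Use scipy.sparse.csr_matrix(...) on the output "
            "to keep the previous behaviour.",
            FutureWarning,
        )
        return sp.csr_matrix(X)
    if sparse_container == "matrix":
        return sp.csr_matrix(X)
    if sparse_container == "array":
        return sp.csr_array(X)
    raise ValueError(f"Unknown sparse_container: {sparse_container!r}")


X = sp.csr_array(np.eye(2))

# During the transition, the default keeps the old type but warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = wrap_sparse_output(X)

# User side: an explicit conversion restores the sparse-matrix type.
X_old = sp.csr_matrix(X)
```

Making the conversion explicit keeps the cost of the change on the (hopefully few) users who rely on `numpy.matrix` semantics, while everyone else gets array-API-friendly outputs.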
