Currently, we have a couple of transformers that return sparse matrices. With the work on the array API in the ecosystem, this can be annoying because matrices are not arrays and might not be supported.
To illustrate with an example, let's look at:
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import TfidfVectorizer
text = ['Example Category', 'Random Word']
transformer = TfidfVectorizer()
vector = transformer.fit_transform(text)
pdist(vector.todense())
It fails with the following error:
TypeError: Inputs of type `numpy.matrix` are not supported.
This happens because .todense() converts a sparse matrix to a numpy.matrix rather than a NumPy array, and SciPy now dispatches through the array API, which does not accept numpy.matrix inputs.
If we returned a sparse array instead, .todense() would provide a NumPy array and we would be fine.
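To make the difference concrete, here is a minimal sketch that uses a small hand-built matrix as a stand-in for the TF-IDF output (the values are made up for illustration). It assumes a SciPy version that ships sparse arrays (scipy.sparse.csr_array) and only compares what .todense() returns for each container:

```python
import numpy as np
from scipy.sparse import csr_array, csr_matrix
from scipy.spatial.distance import pdist

# Hypothetical stand-in for the transformer output: a 2 x 4 sparse matrix.
dense = np.array([[0.0, 1.0, 0.0, 2.0],
                  [3.0, 0.0, 0.0, 0.0]])
as_matrix = csr_matrix(dense)   # what the transformer returns today
as_array = csr_array(as_matrix)  # the sparse-array equivalent

# The sparse matrix densifies to numpy.matrix, the sparse array to numpy.ndarray.
type_from_matrix = type(as_matrix.todense())
type_from_array = type(as_array.todense())

# pdist accepts the ndarray obtained from the sparse array.
distances = pdist(as_array.todense())
```

With the sparse array, the same pdist call goes through because its input is a plain numpy.ndarray.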
So the broad question is: do we want to anticipate this by raising a FutureWarning, then change the output type to sparse arrays and ask people to convert explicitly to a sparse matrix if they want to keep the previous behaviour?
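As a rough sketch of what the explicit conversion could look like on the user side, assuming a transformer that already returns a sparse array (the variable names and values below are hypothetical, and this is not an existing scikit-learn option):

```python
import numpy as np
from scipy.sparse import csr_array, csr_matrix

# Hypothetical future transformer output: a sparse array.
new_output = csr_array(np.array([[0.0, 1.0], [2.0, 0.0]]))

# Explicit opt-in to the previous behaviour: wrap it in a sparse matrix.
legacy_output = csr_matrix(new_output)

# The legacy container densifies to numpy.matrix again, as before the change.
legacy_dense_type = type(legacy_output.todense())
```

The conversion is a one-liner, so the cost of keeping the old behaviour would mostly be the need to know about it, which is what the FutureWarning would address.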