Description
From #2969 (comment):
"Anything that uses liblinear (and possibly other bundled C as opposed to Cython code) will segfault when given CSR arrays with 64 bit indices (e.g. LogisticRegression(), LinearSVC() etc). This is fairly critical IMO, and even if sparse arrays with 64 bit indices won't be supported there in the near future (or at all), it would be good to check for indices dtype and raise a python exception when appropriate. This is also the reason these tests need to be run with pytest-xdist using the -n 1 option, so that pytest could recover from a crashed interpreter."
I assume the same is true of SVC, SVR.
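For reference, here is a minimal sketch of how a CSR matrix with 64-bit indices can be constructed to reproduce the problem (the shapes, density and the choice of `LinearSVC` are arbitrary and only for illustration):

```python
import numpy as np
import scipy.sparse as sp
from sklearn.svm import LinearSVC

# Build a small CSR matrix and force its index arrays to 64-bit ints,
# mimicking what scipy does automatically for very large matrices.
X = sp.random(20, 5, density=0.5, format="csr", random_state=0)
X.indices = X.indices.astype(np.int64)
X.indptr = X.indptr.astype(np.int64)
y = np.arange(20) % 2

# Reported to crash inside the liblinear wrapper instead of raising
# a Python exception:
LinearSVC().fit(X, y)
```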
The issue is that scipy.sparse only relatively recently began to support large sparse matrices, i.e. matrices where the `indptr` and `indices` arrays of a `csr_matrix` may be 64-bit ints. This case should be ruled out for the liblinear/libsvm solvers. I think the best solution (so that we can later support or reject large sparse matrices more systematically) is to add a boolean parameter such as `accept_large_sparse` to `sklearn.utils.check_array`.
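A rough sketch of what such a check could look like; the parameter name `accept_large_sparse` and the helper below are just a proposal, not existing API:

```python
import numpy as np
import scipy.sparse as sp


def _check_large_sparse(X, accept_large_sparse=True):
    """Raise a ValueError if X uses 64-bit indices and the caller
    (e.g. a liblinear/libsvm based estimator) cannot handle them."""
    if accept_large_sparse or not sp.issparse(X):
        return
    # CSR/CSC store their index arrays in `indices`/`indptr`;
    # COO uses `row`/`col`.
    for attr in ("indices", "indptr", "row", "col"):
        index_array = getattr(X, attr, None)
        if index_array is not None and index_array.dtype == np.int64:
            raise ValueError(
                "Only sparse matrices with 32-bit integer indices are "
                "accepted. Got %s indices in attribute %r."
                % (index_array.dtype, attr)
            )
```

Estimators wrapping liblinear/libsvm would then pass `accept_large_sparse=False` to `check_array` and get a clear Python exception instead of a segfault.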