Closed
Hi all
I'm working with a pretty large data set and am having an issue with line 758 of text.py (CountVectorizer code):
indptr.append(len(j_indices))
In my case, the length of j_indices exceeds the maximum value of a signed int (2,147,483,647 if int is 32 bits). indptr is an int array, so the value does not fit.
I tried making indptr a long array, but that leads to other, bigger memory issues.
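A minimal sketch of the limit being hit, assuming indptr is a standard-library array.array with typecode "i" (an assumption based on "int array" above, not checked against text.py). The helper try_append is hypothetical and only probes whether a value fits; the limit is computed from the platform's item size rather than hard-coded.

```python
from array import array


def try_append(typecode, value):
    """Return True if `value` can be appended to an array of `typecode`."""
    arr = array(typecode)
    try:
        arr.append(value)
    except OverflowError:
        # Raised when the value does not fit in the underlying C type.
        return False
    return True


# Largest value a signed C int can hold on this platform.
int_max = 2 ** (8 * array("i").itemsize - 1) - 1

fits_at_limit = try_append("i", int_max)        # the last value that fits
fits_past_limit = try_append("i", int_max + 1)  # one past the limit
fits_in_int64 = try_append("q", int_max + 1)    # "q" is a signed 64-bit type

# Bytes per stored entry for each typecode.
bytes_per_entry = (array("i").itemsize, array("q").itemsize)
```

A wider typecode such as "q" accepts the larger value, but each entry then takes array("q").itemsize bytes instead of array("i").itemsize, which is consistent with the extra memory pressure described above.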
Any thoughts?