BUG: sklearn.feature_extraction.text.HashingVectorizer.fit_transform raises ValueError when it shouldn't

#### Description
`sklearn.feature_extraction.text.HashingVectorizer.fit_transform` raises `ValueError: indices and data should have the same size` when the input is large enough (in the reproduction below, 203,452 documents of 1,432 characters each). Transforming the same data in smaller chunks runs fine.
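Because chunked input works, one possible workaround is to transform the documents in slices and stack the resulting sparse blocks. This is a minimal sketch, not a scikit-learn API: `transform_in_chunks` is a hypothetical helper, and the default chunk size of 100,000 is simply the slice size that succeeded in the reproduction. `non_negative` is left out because that parameter was deprecated in 0.19 and removed in later releases. Since `HashingVectorizer` is stateless and normalizes each row independently, stacking per-chunk results should give the same matrix as a single call.

```python
import scipy.sparse as sp
from sklearn.feature_extraction.text import HashingVectorizer


def transform_in_chunks(vectorizer, docs, chunk_size=100000):
    """Hypothetical helper: hash `docs` slice by slice and stack the rows.

    Assumes `docs` is a non-empty, sliceable sequence of strings and that
    `vectorizer` is stateless (as HashingVectorizer is), so each slice can
    be transformed independently.
    """
    blocks = [
        vectorizer.transform(docs[start:start + chunk_size])
        for start in range(0, len(docs), chunk_size)
    ]
    # Stack the per-chunk CSR blocks row-wise into one CSR matrix.
    return sp.vstack(blocks, format="csr")


# Small usage example with the report's analyzer settings (minus non_negative).
vectorizer = HashingVectorizer(analyzer="char", n_features=1024,
                               ngram_range=(4, 16))
docs = ["A" * 50] * 10
X_chunked = transform_in_chunks(vectorizer, docs, chunk_size=3)
```

On the full reproduction input, `transform_in_chunks(vectorizer, X)` would process 100,000 documents per call, the size the report shows succeeding.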

#### Steps/Code to Reproduce

```python
import sklearn
from sklearn.feature_extraction.text import HashingVectorizer
print('scikit-learn version')
print(sklearn.__version__)
vectorizer = HashingVectorizer(
    analyzer='char', non_negative=True,
    n_features=1024, ngram_range=(4, 16))
X = ['A'*1432]*203452
print('works')
vectorizer.fit_transform(X[:100000])
print('does not work')
vectorizer.fit_transform(X)
```

#### Expected Results
```
scikit-learn version
0.18.1
works
does not work
```
#### Actual Results
```
scikit-learn version
0.18.1
works
does not work
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-aae200adab09> in <module>()
     10 vectorizer.fit_transform(X[:100000])
     11 print('does not work')
---> 12 vectorizer.fit_transform(X)

/Users/benkaehler/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/sklearn/feature_extraction/text.py in transform(self, X, y)
    485 
    486         analyzer = self.build_analyzer()
--> 487         X = self._get_hasher().transform(analyzer(doc) for doc in X)
    488         if self.binary:
    489             X.data.fill(1)

/Users/benkaehler/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/sklearn/feature_extraction/hashing.py in transform(self, raw_X, y)
    147 
    148         X = sp.csr_matrix((values, indices, indptr), dtype=self.dtype,
--> 149                           shape=(n_samples, self.n_features))
    150         X.sum_duplicates()  # also sorts the indices
    151         if self.non_negative:

/Users/benkaehler/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/scipy/sparse/compressed.py in __init__(self, arg1, shape, dtype, copy)
     96             self.data = np.asarray(self.data, dtype=dtype)
     97 
---> 98         self.check_format(full_check=False)
     99 
    100     def getnnz(self, axis=None):

/Users/benkaehler/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/scipy/sparse/compressed.py in check_format(self, full_check)
    165         # check index and data arrays
    166         if (len(self.indices) != len(self.data)):
--> 167             raise ValueError("indices and data should have the same size")
    168         if (self.indptr[-1] > len(self.indices)):
    169             raise ValueError("Last value of index pointer should be less than "

ValueError: indices and data should have the same size
```
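A plausible explanation, not confirmed in this report, is a 32-bit integer overflow. In the traceback, `X.sum_duplicates()` runs only after the CSR matrix is built, so the hasher appears to store one entry per character n-gram occurrence before merging them. Counting those occurrences for the reproduction input puts the 100,000-document slice below the signed 32-bit maximum (2**31 - 1) and the full 203,452 documents above it. The helper `char_ngram_count` is hypothetical and assumes the text contains no whitespace for the analyzer to collapse.

```python
def char_ngram_count(doc_len, min_n, max_n):
    """Hypothetical helper: number of character n-grams in one document.

    Each n in [min_n, max_n] contributes doc_len - n + 1 sliding windows
    (or none if the document is shorter than n).
    """
    return sum(max(doc_len - n + 1, 0) for n in range(min_n, max_n + 1))


INT32_MAX = 2**31 - 1
per_doc = char_ngram_count(1432, 4, 16)  # 18499 n-grams per document

for n_docs in (100000, 203452):
    total = per_doc * n_docs
    # Prints "100000 1849900000 False" then "203452 3763658548 True"
    print(n_docs, total, total > INT32_MAX)
```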
#### Versions
```
Darwin-16.5.0-x86_64-i386-64bit
Python 3.5.3 |Continuum Analytics, Inc.| (default, Mar  6 2017, 12:15:08) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.18.1
```
and
```
Linux-2.6.32-504.16.2.el6.x86_64-x86_64-with-centos-6.6-Final
Python 3.5.3 |Continuum Analytics, Inc.| (default, Mar  6 2017, 11:58:13) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.18.1
```

Issue #8941