sklearn 0.20.2: OverflowError: signed integer is greater than maximum when calling selector.fit_transform #13045

Closed
@jiajiexiao

Description

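An "OverflowError: signed integer is greater than maximum" is raised when calling selector.fit_transform() to select significant features from a large sparse matrix produced by DictVectorizer (millions to billions of samples and hundreds of thousands of features). As the traceback under Actual Results shows, the error originates in DictVectorizer.transform(), which builds the input passed to the selector.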

Steps/Code to Reproduce

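import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import GenericUnivariateSelect, chi2
protein_vec = DictVectorizer(sparse=True, dtype=np.uint16).fit(protein_in_pairs)
selector = GenericUnivariateSelect(chi2, 'fpr', param=UserInput.fpr_alpha)
protein_vec_selected = selector.fit_transform(protein_vec.transform(protein_in_pairs), labels_balanced)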

Expected Results

No error is thrown, and the significant features are returned.

Actual Results

  File "/home/xx/.conda/envs/seqfeaturizer/lib/python3.6/site-packages/sklearn/feature_extraction/dict_vectorizer.py", line 292, in transform
    return self._transform(X, fitting=False)
  File "/home/xx/.conda/envs/seqfeaturizer/lib/python3.6/site-packages/sklearn/feature_extraction/dict_vectorizer.py", line 181, in _transform
    indptr.append(len(indices))
OverflowError: signed integer is greater than maximum
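
The failing line, indptr.append(len(indices)), suggests that the row-pointer buffer cannot hold the running count of stored entries. A minimal sketch of that mechanism follows, assuming indptr is a standard-library array.array of C ints (typecode "i"). That typecode is an assumption inferred from the error message, not something the traceback states, and the names int_max and overflowed are only illustrative.

```python
from array import array

# Assumption: the vectorizer accumulates CSR row pointers in a C-int array
# (typecode "i"); this is inferred from the traceback, not confirmed here.
indptr = array("i", [0])

# Largest value a signed C int of this width can hold (2**31 - 1 for 4 bytes).
int_max = 2 ** (8 * indptr.itemsize - 1) - 1

indptr.append(int_max)  # the largest representable value still fits

try:
    # Stands in for len(indices) once the matrix holds more than int_max
    # stored entries in total.
    indptr.append(int_max + 1)
    overflowed = False
except OverflowError as exc:
    overflowed = True
    print(f"OverflowError: {exc}")
```

If this assumption holds, the limit depends on the total number of stored (nonzero) entries rather than on the number of rows or columns alone. With a 4-byte C int the limit is 2**31 - 1, about 2.1 billion entries, which a data set with billions of samples would pass even with only a few features per sample.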

Versions

System:
python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
executable: /home/xx/.conda/envs/seqfeaturizer/bin/python
machine: Linux-3.10.0-693.11.6.el7.x86_64-x86_64-with-centos-7.4.1708-Core

BLAS:
macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
lib_dirs: /home/xx/.conda/envs/seqfeaturizer/lib
cblas_libs: mkl_rt, pthread

Python deps:
pip: 18.1
setuptools: 40.6.3
sklearn: 0.20.2
numpy: 1.15.4
scipy: 1.1.0
Cython: None
pandas: 0.23.4
