Closed
Description
The partial_fit method in IncrementalPCA does integer division instead of float division here:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/incremental_pca.py#L249
This causes IncrementalPCA to give the wrong output under Python 2, where / applied to two integers discards the fractional part of the quotient.
np.sqrt((self.n_samples_seen_ * n_samples) /
should be changed to
np.sqrt(float(self.n_samples_seen_ * n_samples) /
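To see how large the difference is, here is a minimal sketch of the two expressions above. The denominator is cut off in the snippet, so this sketch assumes it is the total sample count (seen plus current batch). The function names and `n_total` are hypothetical, and `//` is used to emulate Python 2's `/` on integers so the example behaves the same under Python 3.

```python
import math


def scale_int_division(n_seen, n_batch):
    # Emulates Python 2, where `/` between two ints floors the quotient.
    n_total = n_seen + n_batch  # assumed denominator, not shown in the issue
    return math.sqrt((n_seen * n_batch) // n_total)


def scale_float_division(n_seen, n_batch):
    # Proposed fix: cast the numerator to float before dividing.
    n_total = n_seen + n_batch  # assumed denominator, not shown in the issue
    return math.sqrt(float(n_seen * n_batch) / n_total)


# Sample counts from the reproduction below: A has 2 rows, B has 3 rows.
print(scale_int_division(2, 3))    # sqrt(6 // 5) = sqrt(1) = 1.0
print(scale_float_division(2, 3))  # sqrt(6 / 5) = sqrt(1.2), roughly 1.095
```

With these counts the scaling factor is off by roughly 9%, which would be enough to explain a visible change in the transformed output.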
Steps/Code to Reproduce
Running the following code under Python 2 and Python 3 gives different output.
from __future__ import print_function
import numpy as np
from sklearn.decomposition import IncrementalPCA
A = np.array([[1, 2, 4], [5, 3, 6]])
B = np.array([[6, 7, 3], [5, 2, 1], [3, 5, 6]])
C = np.array([[3, 2, 1]])
ipca = IncrementalPCA(n_components=2)
ipca.partial_fit(A)
ipca.partial_fit(B)
print(ipca.transform(C))
Expected Results
Python 3 output:
[[-1.48864923 -3.15618645]]
Actual Results
Python 2 output:
[[-1.9943712 -2.86487266]]
Versions
Linux-4.10.0-27-generic-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Nov 19 2016, 06:48:10) \n[GCC 5.4.0 20160609]')
('NumPy', '1.13.1')
('SciPy', '0.19.1')
('Scikit-Learn', '0.18.2')