Description
I am trying to compute the log-likelihood of data under a PCA model with n_components < n_features, using the score method of the decomposition.PCA class. The method works fine when n_components equals n_features, but when I set n_components < n_features and call score after fitting the PCA, I get a ValueError that says "array must not contain infs or NaNs". However, np.isfinite(X).all() returns True for the data I am using. I traced the error back to the computation of the precision matrix, which hits a division by zero somewhere. Even if I transform the data into the lower-dimensional basis and run score on that, it still raises an error.
This comes partly from trying to understand how this example evaluates the score of PCA models with different numbers of components, and from trying to reproduce it manually.
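For reference, here is a minimal NumPy-only sketch of what I understand the manual computation to be: the average per-sample log-likelihood under a probabilistic PCA model, where the noise variance is the mean of the discarded eigenvalues of the sample covariance. The helper name ppca_average_log_likelihood and the synthetic data are made up for illustration; this is not scikit-learn's implementation, and it may differ from it in normalization details (for example, dividing by n_samples versus n_samples - 1).

```python
import numpy as np


def ppca_average_log_likelihood(X, n_components):
    """Average per-sample log-likelihood under a probabilistic PCA model.

    Hypothetical helper for illustration only; not scikit-learn's code.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    if not 0 < n_components < n_features:
        raise ValueError("this sketch assumes 0 < n_components < n_features")

    # Center the data and build the sample covariance (ddof=1 is an assumption).
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (n_samples - 1)

    # Eigendecomposition, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Noise variance: mean of the eigenvalues that are not retained.
    sigma2 = eigvals[n_components:].mean()
    if sigma2 <= 0:
        raise ValueError("noise variance is not positive; model covariance is singular")

    # Model covariance C = U (L - sigma2 I) U^T + sigma2 I.
    U = eigvecs[:, :n_components]
    C = (U * (eigvals[:n_components] - sigma2)) @ U.T + sigma2 * np.eye(n_features)

    # Gaussian log-density of each centered sample under N(0, C).
    _, logdet = np.linalg.slogdet(C)
    mahalanobis = np.einsum("ij,ij->i", Xc, np.linalg.solve(C, Xc.T).T)
    log_like = -0.5 * (n_features * np.log(2 * np.pi) + logdet + mahalanobis)
    return log_like.mean()


# Synthetic data (assumption): 5 features with clearly different scales.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])
score = ppca_average_log_likelihood(X, n_components=2)
print("score =", score)
```

In this sketch the explicit check on sigma2 marks the place where a zero noise variance would make the model covariance singular, so a degenerate case fails with a clear message instead of propagating infs or NaNs. I have not confirmed that this is the cause of the error above.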
Steps/Code to Reproduce
Example:
from sklearn import decomposition, datasets
digits = datasets.load_digits()
X_digits = digits.data
y_digits = digits.target
pca = decomposition.PCA(n_components=30)
pca.fit(X_digits)
print('score =', pca.score(X_digits))
Versions
Linux-4.4.0-38-generic-x86_64-with-debian-jessie-sid
Python 3.5.1 |Anaconda 2.4.1 (64-bit)| (default, Dec 7 2015, 11:16:01)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.11.2
SciPy 0.18.1
Scikit-Learn 0.18a