
Finding log-likelihood for PCA with n_components < n_features (decomposition.PCA.score error) #7568

Closed
@pearsonkyle

Description

I am trying to find the log-likelihood of observing data under a model that has n_components < n_features, using the score function of the decomposition.PCA class. This function works fine if I set n_components = n_features, but when I set n_components < n_features and evaluate the score after the PCA fit, I get a ValueError saying "array must not contain infs or NaNs", even though np.isfinite(X).all() returns True for the data I am using. I can trace the error back to the calculation of the precision matrix of the data, where a division by zero occurs somewhere. Even if I transform the data into the lower-dimensional basis and run the score on that, it still raises the error.
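A quick way to check whether the precision matrix is indeed where the non-finite values appear is to inspect the fitted model directly. This is a diagnostic sketch using the digits data from the reproduction below; it assumes (as the traceback suggests) that score goes through get_precision(), whose matrix inversion can blow up when noise_variance_ is zero:

```python
import numpy as np
from sklearn import decomposition, datasets

X = datasets.load_digits().data  # 64 features

pca = decomposition.PCA(n_components=30)
pca.fit(X)

# If noise_variance_ is zero, the inversion inside get_precision()
# can divide by zero and produce the infs/NaNs that score() trips on.
print('noise_variance_:', pca.noise_variance_)
print('precision finite:', np.isfinite(pca.get_precision()).all())
```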

This comes partly from trying to understand how this example evaluates the scores of PCAs with different numbers of components, and from trying to reproduce it manually.

Steps/Code to Reproduce

Example:

from sklearn import decomposition, datasets
digits = datasets.load_digits()
X_digits = digits.data
y_digits = digits.target

pca = decomposition.PCA(n_components=30)  # n_components < n_features (64)
pca.fit(X_digits)
print('score =', pca.score(X_digits))  # raises ValueError here
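For reference, the value that score should return can be computed by hand: under the probabilistic PCA model the samples are Gaussian with mean pca.mean_ and covariance pca.get_covariance(), and score is defined as the average per-sample log-likelihood. A sketch of that manual computation, assuming SciPy's multivariate_normal for the log-density:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn import decomposition, datasets

X = datasets.load_digits().data

pca = decomposition.PCA(n_components=30)
pca.fit(X)

# Probabilistic PCA models the data as Gaussian with mean pca.mean_
# and covariance W W^T + sigma^2 I, which get_covariance() returns.
rv = multivariate_normal(mean=pca.mean_, cov=pca.get_covariance())

# score() is the mean of the per-sample log-likelihoods.
manual_score = rv.logpdf(X).mean()
print('manual score =', manual_score)
```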

Versions

Linux-4.4.0-38-generic-x86_64-with-debian-jessie-sid
Python 3.5.1 |Anaconda 2.4.1 (64-bit)| (default, Dec 7 2015, 11:16:01)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.11.2
SciPy 0.18.1
Scikit-Learn 0.18a
