BUG: MLE for PCA mis-estimates rank #16730

Closed
@larsoner

Description

After #16224 it looks like this code no longer produces the correct result:

import numpy as np
from sklearn.decomposition import PCA
n_samples, n_dim = 1000, 10
X = np.random.RandomState(0).randn(n_samples, n_dim)
X[:, -1] = np.mean(X[:, :-1], axis=-1)  # true X dim is n_dim - 1
pca_skl = PCA(n_components='mle', svd_solver='full')
pca_skl.fit(X)
assert pca_skl.n_components_ == n_dim - 1

Before #16224 this passed (n_components_ == 9) but after #16224 it gives 8. Not sure why this would happen given the singular value spectrum looks good:

import matplotlib.pyplot as plt
# singular values only; this form works on older and newer NumPy alike
s = np.linalg.svd(X, compute_uv=False)
plt.stem(s)
plt.show()

[Figure_1: stem plot of the singular values of X]

Maybe an off-by-one error somewhere?
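
As an independent check that the expected answer really is n_dim - 1, the numerical rank can be computed without PCA. This is only a sketch: it assumes the same X as above, centers it by hand (PCA centers the data before its SVD), and uses the default tolerance of np.linalg.matrix_rank, which is not the criterion the MLE code uses.

```python
import numpy as np

n_samples, n_dim = 1000, 10
X = np.random.RandomState(0).randn(n_samples, n_dim)
X[:, -1] = np.mean(X[:, :-1], axis=-1)  # last column depends linearly on the others

# Center the columns by hand, mirroring what PCA does before its SVD.
X_centered = X - X.mean(axis=0)

# Count the singular values above NumPy's default tolerance.
rank = np.linalg.matrix_rank(X_centered)

# Singular values in descending order; the last one should be numerically zero.
s = np.linalg.svd(X_centered, compute_uv=False)

print("numerical rank:", rank)
print("two smallest singular values:", s[-2:])
```

If the rank printed here matches n_dim - 1 while PCA('mle') reports one fewer, the data is not the problem and the discrepancy is in the dimensionality estimate.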

cc'ing @lschwetlick since it was your PR
