Problems in sklearn.decomposition.PCA with the "n_components='mle'" option #4441

Closed
@alexis-mignon

Description

We have found several problems in the implementation of the method that automatically selects the number of components in PCA:

  1. The algorithm never tests full rank, most probably because the loops over the rank always stop at rank - 1 (for i in range(rank)).
  2. If two eigenvalues are equal, there is a log(0) issue.
  3. Zero eigenvalues are not handled explicitly.
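
To make item 1 concrete, here is a minimal sketch of the off-by-one. The helper names are hypothetical and this is not the scikit-learn code itself; it only shows which candidate ranks a loop written as `for i in range(rank)` can visit, compared with a loop that includes the full rank.

```python
def ranks_visited_exclusive(full_rank):
    """Candidate ranks seen by `for i in range(full_rank)` (hypothetical helper)."""
    # range(n) yields 0 .. n - 1, so the value n itself is never produced.
    return list(range(full_rank))


def ranks_visited_inclusive(full_rank):
    """Candidate ranks seen when the loop is extended to include full_rank."""
    # range(1, n + 1) yields 1 .. n, so the full rank is tested as well.
    return list(range(1, full_rank + 1))


full_rank = 4
print(ranks_visited_exclusive(full_rank))
print(ranks_visited_inclusive(full_rank))
```

Whether the scan should start at 0 or 1 is a separate question; the point is only that the upper bound excludes the full rank.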

Possible solutions:

  • For (1): Check the loop ranges.
  • For (3): Detect eigenvalues smaller than the numerical noise beforehand and exclude them from the rank scan.
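
A rough sketch of the idea for (3), assuming a relative threshold of `len(spectrum) * machine epsilon * largest eigenvalue`. Both the threshold and the function name `significant_eigenvalues` are assumptions chosen for illustration, not something scikit-learn provides.

```python
import sys


def significant_eigenvalues(spectrum, rtol=None):
    """Return the eigenvalues that lie above a relative noise threshold.

    `rtol` defaults to len(spectrum) * machine epsilon (an assumed choice).
    """
    if not spectrum:
        return []
    if rtol is None:
        rtol = len(spectrum) * sys.float_info.epsilon
    # Anything at or below this value is treated as numerical noise.
    threshold = max(spectrum) * rtol
    return [value for value in spectrum if value > threshold]


spectrum = [4.0, 1.0, 1e-20, 0.0]
kept = significant_eigenvalues(spectrum)
# The rank scan would then only consider ranks up to len(kept).
print(kept, len(kept))
```

The rank scan would then be limited to `len(kept)` candidates, so zero or noise-level eigenvalues never reach the likelihood computation.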

I have no idea for (2). We ran into the problem here with very small eigenvalues (within numerical noise) that were exactly identical. I never managed to create a synthetic dataset that reproduces the problem, since even with symmetric datasets there is always a small difference (on the order of numerical precision) between theoretically identical eigenvalues.
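
The failure in (2) can be reproduced in isolation without a dataset. The sketch below assumes the pairwise term has the form log((l_i - l_j) * (1/l_j - 1/l_i)), as in Minka-style dimension selection; that form and the function name `pairwise_log_term` are assumptions for illustration. It uses `math.log`, which raises `ValueError` at zero, whereas `numpy.log` would return `-inf` with a warning.

```python
import math


def pairwise_log_term(l_i, l_j):
    """Log of the assumed pairwise eigenvalue term (hypothetical helper)."""
    # The product is exactly 0.0 when the two eigenvalues are identical.
    product = (l_i - l_j) * (1.0 / l_j - 1.0 / l_i)
    return math.log(product)


# Distinct eigenvalues: the term is finite.
print(pairwise_log_term(2.0, 1.0))

# Identical noise-level eigenvalues: log(0) is hit.
try:
    pairwise_log_term(1e-17, 1e-17)
except ValueError as exc:
    print("log(0) for identical eigenvalues:", exc)
```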
