Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Remove unnecessary restriction on number of samples in IncrementalPCA #30224

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

ThomasGesseyJonesPX
Copy link
Contributor

Currently when calling IncrementalPCA.partial_fit() the number of samples in X always has to be greater or equal to the number of PCA components due to this hardcoded check

elif not self.n_components <= n_samples:

However, this restriction actually only needs to apply to the first call to partial_fit(), which performs the initial SVD decomposition. Once this first call has established the initial $U$ and $\Sigma$, none of the algorithms steps (see original paper https://www.cs.toronto.edu/~dross/ivt/RossLimLinYang_ijcv.pdf) require any size restrictions on the number of samples in the subsequent batches passed to partial_fit. The current error message hence unnecessary restricts the usage of IncrementalPCA, in particular in online applications where data might arrive in batches of different sizes.

This PR, updates the aforementioned check to only apply to the first call of partial_fit and adds a non-regression test for this issue.

Copy link

github-actions bot commented Nov 5, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: c752837. Link to the linter CI: here

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Some minor feedback below, but otherwise LGTM.

We probably need better tests (e.g. on non-spherical data) for this estimator, and we could also clarify what are the expectations in terms of accuracy between the full-batch and incremental version of PCA, but this is beyond the scope of this PR.

@ThomasGesseyJonesPX
Copy link
Contributor Author

Thanks for the feedback! I have implemented all of your suggestions

@ThomasGesseyJonesPX ThomasGesseyJonesPX force-pushed the correct_errors_in_incremental_pca branch from d2d4f27 to c752837 Compare November 7, 2024 09:47
@adrinjalali adrinjalali merged commit 94f8875 into scikit-learn:main Nov 7, 2024
30 checks passed
@ThomasGesseyJonesPX ThomasGesseyJonesPX deleted the correct_errors_in_incremental_pca branch November 7, 2024 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants