[MRG+1] Allow sparse input to incremental PCA #13960
Looks good to me.
NicolasHug left a comment:
Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:.
Yes, but if the way you handle sparse matrices is to densify them, you might as well get the user to do that.
A clarification: if the user wants to densify and fit multiple sparse matrices batch-wise, this is not currently possible. Example scenario: you want to fit a single estimator to multiple large…
Why can't the user currently pass them, dense, one by one to partial_fit?
I would presume that converting the entire sparse matrix to dense at once would be undesirable for memory reasons.
That's why we require the user to make the data dense in situations where doing it automatically may be deleterious.
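A minimal sketch of the user-side workaround raised in this exchange: slice a sparse matrix into row batches, densify one batch at a time, and pass each dense batch to `partial_fit`. It assumes scikit-learn, SciPy and NumPy are installed; the random data, the batch size of 20 and `n_components=5` are illustrative choices, not values from this PR.

```python
from scipy import sparse
from sklearn.decomposition import IncrementalPCA

# Illustrative data (assumption): a random 100 x 30 sparse CSR matrix.
X_sparse = sparse.random(100, 30, density=0.1, format="csr", random_state=0)

ipca = IncrementalPCA(n_components=5)
batch_size = 20  # hypothetical; every batch must have at least n_components rows

for start in range(0, X_sparse.shape[0], batch_size):
    # Only this row slice is densified, so peak dense memory is
    # batch_size x n_features rather than n_samples x n_features.
    X_batch = X_sparse[start:start + batch_size].toarray()
    ipca.partial_fit(X_batch)

# Project a few rows, densifying them explicitly first.
X_reduced = ipca.transform(X_sparse[:10].toarray())
```

The loop works, but every user has to write it (and keep batches large enough for `n_components`), which is the motivation for letting `fit` do the batching itself.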
Note: I think the failing tests are caused by an unrelated change; the failures are in the grid search CV tests.
@NicolasHug @jnothman any further comments?
jnothman left a comment:
Sorry my review time has been limited. Please add a check that partial_fit still raises an appropriate error when passed sparse X. Otherwise lgtm, thanks!
NicolasHug left a comment:
As Joel mentioned, please test the error in partial_fit for sparse input.
Also, in all the methods that accept/return a sparse X, the documented type of X should be changed to "array-like or sparse matrix".
Otherwise LGTM too!
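A minimal sketch of the requested check, written with a plain try/except so it runs without pytest. It assumes a scikit-learn version that includes this change, in which `partial_fit` still rejects sparse input with a `TypeError` (only `fit` densifies batch-wise); the data shape and `n_components=3` are arbitrary.

```python
from scipy import sparse
from sklearn.decomposition import IncrementalPCA

# Arbitrary sparse input for the check.
X_sparse = sparse.random(50, 10, density=0.2, format="csr", random_state=0)
ipca = IncrementalPCA(n_components=3)

# partial_fit is expected to refuse sparse X rather than silently densify it.
raised = False
try:
    ipca.partial_fit(X_sparse)
except TypeError:
    raised = True
```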
@jnothman @NicolasHug should be good to go now.
NicolasHug left a comment:
Thanks @scottgigante!
Thanks Scott!
Thanks @jnothman and @NicolasHug for the detailed reviews!
Reference Issues/PRs
Fixes #13957.
What does this implement/fix? Explain your changes.
Implements sparse input for IncrementalPCA. Because IncrementalPCA processes its input in batches, it is by design well suited to sparse data: this change allows X to be sparse and, if it is, converts the data to dense one batch at a time, so the full dense matrix is never held in memory.
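A sketch of the intended behaviour, assuming a scikit-learn release that includes this change: fitting on a sparse matrix should give the same components as fitting on its dense equivalent, because each batch is densified before the same incremental update is applied. The data, `n_components=5` and `batch_size=20` are illustrative assumptions.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import IncrementalPCA

# Illustrative data (assumption): the same matrix in sparse and dense form.
X_sparse = sparse.random(100, 30, density=0.1, format="csr", random_state=0)
X_dense = X_sparse.toarray()

# fit() accepts the sparse matrix directly and densifies it batch by batch.
ipca_sparse = IncrementalPCA(n_components=5, batch_size=20).fit(X_sparse)
ipca_dense = IncrementalPCA(n_components=5, batch_size=20).fit(X_dense)

# The two fits see numerically identical batches, so results should match.
max_diff = np.abs(ipca_sparse.components_ - ipca_dense.components_).max()
print(max_diff)
```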