Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

@scottgigante
Copy link
Contributor

Reference Issues/PRs

Fixes #13957.

What does this implement/fix? Explain your changes.

Implements sparse input for IncrementalPCA. IncrementalPCA is by design suited to accepting sparse input; this allows the input to be sparse, and if it is so, converts the data to dense on a batch-wise basis.

@scottgigante scottgigante force-pushed the feature/incremental_pca_sparse_input branch 4 times, most recently from 880a7a3 to 17d7248 Compare May 27, 2019 23:57
@abhinavsagar
Copy link
Contributor

Looks good to me.

@scottgigante scottgigante changed the title Allow sparse input to incremental PCA [MRG] Allow sparse input to incremental PCA May 28, 2019
@scottgigante scottgigante force-pushed the feature/incremental_pca_sparse_input branch from 17d7248 to dc276bf Compare May 28, 2019 13:04
@scottgigante scottgigante changed the title [MRG] Allow sparse input to incremental PCA [WIP] Allow sparse input to incremental PCA May 28, 2019
Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:.

@scottgigante scottgigante force-pushed the feature/incremental_pca_sparse_input branch 6 times, most recently from 3afe748 to 9e119f3 Compare May 29, 2019 03:31
@scottgigante scottgigante changed the title [WIP] Allow sparse input to incremental PCA [MRG] Allow sparse input to incremental PCA May 29, 2019
@jnothman
Copy link
Member

If I want to apply partial_fit batchwise to multiple input matrices, I would not be able to run step 2 without step 1.

yes, but if the way you treat sparse matrices is to densify them, you might as well get the user to do that...

@scottgigante
Copy link
Contributor Author

If I want to apply partial_fit batchwise to multiple input matrices, I would not be able to run step 2 without step 1.

yes, but if the way you treat sparse matrices is to densify them, you might as well get the user to do that...

A clarification: if the user wants to batchwise densify and fit multiple sparse matrices, then this is not currently possible. Example scenario: you want to fit a single estimator to multiple large mtx files. Alternative solution would be to load all of the files, concatenate the matrices with scipy.sparse.vstack and then apply fit.

@jnothman
Copy link
Member

jnothman commented Jun 2, 2019 via email

@scottgigante
Copy link
Contributor Author

I would presume that turning the entire sparse matrix to dense at once would be undesirable for memory reasons.

@jnothman
Copy link
Member

jnothman commented Jun 3, 2019 via email

@scottgigante scottgigante force-pushed the feature/incremental_pca_sparse_input branch from 9e119f3 to 5b45e71 Compare June 3, 2019 15:51
@scottgigante
Copy link
Contributor Author

Note: it's failing tests due to some other unrelated change, I think - failing tests in grid search CV.

@scottgigante scottgigante reopened this Jun 6, 2019
@scottgigante scottgigante force-pushed the feature/incremental_pca_sparse_input branch 2 times, most recently from 28c78a0 to 7ea580c Compare June 6, 2019 21:47
@scottgigante
Copy link
Contributor Author

@NicolasHug @jnothman any further comments?

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my review time has been limited. Please add a check that partial_fit still raises an appropriate error when passed sparse X. Otherwise lgtm, thanks!

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Joel mentioned please test the error in partial_fit for dense input.

Also all the methods that accept/return a sparse X should be changed to array-like or sparse matrix

Otherwise LGTM too!

@scottgigante scottgigante force-pushed the feature/incremental_pca_sparse_input branch from 7ea580c to bf978a8 Compare June 12, 2019 18:22
@scottgigante
Copy link
Contributor Author

@jnothman @NicolasHug should be good to go now.

@scottgigante scottgigante force-pushed the feature/incremental_pca_sparse_input branch from bf978a8 to cb5a185 Compare June 13, 2019 14:46
Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @scottgigante !

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @scottgigante !

@scottgigante scottgigante changed the title [MRG] Allow sparse input to incremental PCA [MRG+1] Allow sparse input to incremental PCA Jun 13, 2019
@jnothman jnothman merged commit df7dd83 into scikit-learn:master Jun 14, 2019
@jnothman
Copy link
Member

Thanks Scott!

@scottgigante
Copy link
Contributor Author

Thanks @jnothman and @NicolasHug for the detailed reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IncrementalPCA should accept sparse input

4 participants