[MRG+1] Allow sparse input to incremental PCA #13960

scottgigante · 2019-05-27T23:17:57Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Implements sparse input for IncrementalPCA. IncrementalPCA is well suited by design to sparse input because it already processes data in batches; this change allows the input to fit to be sparse and, when it is, converts the data to dense one batch at a time.
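For illustration, a minimal sketch of the intended usage, assuming a scikit-learn release that includes this change. The data shape, density, and `batch_size` are arbitrary values picked for the example, not taken from the PR.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import IncrementalPCA

# Random sparse data; shape and density are arbitrary illustration values.
X_sparse = sparse.random(200, 30, density=0.1, format="csr", random_state=0)

# fit() walks over the rows in batches and densifies one batch at a time,
# so the full dense matrix is not materialized during fitting.
ipca = IncrementalPCA(n_components=5, batch_size=50)
ipca.fit(X_sparse)

# Reference fit on the fully densified data, using the same batch size.
ipca_dense = IncrementalPCA(n_components=5, batch_size=50)
ipca_dense.fit(X_sparse.toarray())

# Both estimators saw the same batches, so the components should agree.
same_components = np.allclose(ipca.components_, ipca_dense.components_)
```

The comparison against the dense fit is only a sanity check; in practice the dense copy would be skipped, since avoiding it is the point of the change.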

abhinavsagar · 2019-05-28T12:06:34Z

Looks good to me.

doc/modules/decomposition.rst

sklearn/decomposition/incremental_pca.py

NicolasHug

Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:.

sklearn/decomposition/incremental_pca.py

sklearn/decomposition/tests/test_incremental_pca.py

jnothman · 2019-05-30T12:32:27Z

If I want to apply partial_fit batchwise to multiple input matrices, I would not be able to run step 2 without step 1.

yes, but if the way you treat sparse matrices is to densify them, you might as well get the user to do that...

scottgigante · 2019-06-01T23:29:48Z

If I want to apply partial_fit batchwise to multiple input matrices, I would not be able to run step 2 without step 1.

yes, but if the way you treat sparse matrices is to densify them, you might as well get the user to do that...

A clarification: if the user wants to densify and fit multiple sparse matrices batch by batch, this is not currently possible. Example scenario: you want to fit a single estimator to multiple large mtx files. An alternative would be to load all of the files, concatenate the matrices with scipy.sparse.vstack, and then apply fit.
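A rough sketch of what the user-side version of this scenario could look like, assuming `partial_fit` only accepts dense input. The helper `partial_fit_sparse` is hypothetical and not part of scikit-learn, and the random matrices stand in for data loaded from separate files.

```python
from scipy import sparse
from sklearn.decomposition import IncrementalPCA
from sklearn.utils import gen_batches


def partial_fit_sparse(ipca, matrices, batch_size):
    """Feed several sparse matrices to ipca.partial_fit as dense batches.

    Hypothetical helper for illustration only.
    """
    for X in matrices:
        X = X.tocsr()  # CSR makes row slicing cheap
        # min_batch_size keeps every batch at least n_components rows long.
        batches = gen_batches(
            X.shape[0], batch_size, min_batch_size=ipca.n_components or 0
        )
        for rows in batches:
            # Only one batch is dense at any time.
            ipca.partial_fit(X[rows].toarray())
    return ipca


# Stand-ins for matrices loaded from separate files.
matrices = [
    sparse.random(120, 20, density=0.1, format="csr", random_state=seed)
    for seed in range(3)
]
ipca = partial_fit_sparse(IncrementalPCA(n_components=4), matrices, batch_size=40)
```

Unlike concatenating with scipy.sparse.vstack and calling fit, this keeps only one input matrix and one dense batch in play at a time, at the cost of the user writing the batching loop.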

jnothman · 2019-06-02T04:11:25Z

Why can't the user currently pass them, dense, one by one to partial_fit?

scottgigante · 2019-06-02T18:05:44Z

I would presume that turning the entire sparse matrix to dense at once would be undesirable for memory reasons.

jnothman · 2019-06-03T08:29:25Z

That's why we require the user to make the data dense in situations where doing it automatically may be deleterious.

sklearn/decomposition/tests/test_incremental_pca.py

scottgigante · 2019-06-03T16:24:06Z

Note: the failing tests appear to be caused by some other, unrelated change, I think; the failures are in grid search CV.

scottgigante · 2019-06-06T22:30:43Z

@NicolasHug @jnothman any further comments?

jnothman

Sorry my review time has been limited. Please add a check that partial_fit still raises an appropriate error when passed sparse X. Otherwise lgtm, thanks!
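A minimal sketch of such a check, assuming the error raised for sparse input is a `TypeError` (as in released scikit-learn versions that include this change). In the test suite this would more naturally be written with `pytest.raises`; plain `try`/`except` is used here to keep the snippet self-contained.

```python
from scipy import sparse
from sklearn.decomposition import IncrementalPCA

# Small sparse matrix; shape and density are arbitrary illustration values.
X_sparse = sparse.random(50, 10, density=0.2, format="csr", random_state=0)
ipca = IncrementalPCA(n_components=2)

try:
    # partial_fit is expected to reject sparse input instead of densifying it.
    ipca.partial_fit(X_sparse)
except TypeError as exc:  # assumed exception type, see the note above
    raised = True
    message = str(exc)
else:
    raised = False
    message = ""

assert raised, "partial_fit should raise on sparse input"
```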

NicolasHug

As Joel mentioned, please test the error raised by partial_fit for sparse input.

Also, the docstrings of all the methods that accept or return a sparse X should describe it as "array-like or sparse matrix".

Otherwise LGTM too!

scottgigante · 2019-06-12T19:05:54Z

@jnothman @NicolasHug should be good to go now.

sklearn/decomposition/tests/test_incremental_pca.py

NicolasHug

Thanks @scottgigante !

jnothman · 2019-06-14T06:11:31Z

Thanks Scott!

scottgigante · 2019-06-14T11:26:29Z

Thanks @jnothman and @NicolasHug for the detailed reviews!

scottgigante force-pushed the feature/incremental_pca_sparse_input branch 4 times, most recently from 880a7a3 to 17d7248 Compare May 27, 2019 23:57

abhinavsagar approved these changes May 28, 2019

scottgigante changed the title ~~Allow sparse input to incremental PCA~~ [MRG] Allow sparse input to incremental PCA May 28, 2019

jnothman reviewed May 28, 2019

doc/modules/decomposition.rst Outdated

sklearn/decomposition/incremental_pca.py

sklearn/decomposition/incremental_pca.py Outdated

scottgigante force-pushed the feature/incremental_pca_sparse_input branch from 17d7248 to dc276bf Compare May 28, 2019 13:04

scottgigante commented May 28, 2019

sklearn/decomposition/incremental_pca.py Outdated

scottgigante changed the title ~~[MRG] Allow sparse input to incremental PCA~~ [WIP] Allow sparse input to incremental PCA May 28, 2019

jnothman reviewed May 28, 2019

sklearn/decomposition/incremental_pca.py Outdated

NicolasHug reviewed May 28, 2019

sklearn/decomposition/incremental_pca.py

sklearn/decomposition/tests/test_incremental_pca.py Outdated

sklearn/decomposition/tests/test_incremental_pca.py Outdated

scottgigante force-pushed the feature/incremental_pca_sparse_input branch 6 times, most recently from 3afe748 to 9e119f3 Compare May 29, 2019 03:31

scottgigante changed the title ~~[WIP] Allow sparse input to incremental PCA~~ [MRG] Allow sparse input to incremental PCA May 29, 2019

NicolasHug reviewed Jun 3, 2019

sklearn/decomposition/tests/test_incremental_pca.py Outdated

sklearn/decomposition/tests/test_incremental_pca.py Outdated

scottgigante force-pushed the feature/incremental_pca_sparse_input branch from 9e119f3 to 5b45e71 Compare June 3, 2019 15:51

scottgigante closed this Jun 6, 2019

scottgigante reopened this Jun 6, 2019

scottgigante force-pushed the feature/incremental_pca_sparse_input branch 2 times, most recently from 28c78a0 to 7ea580c Compare June 6, 2019 21:47

jnothman reviewed Jun 6, 2019

NicolasHug reviewed Jun 7, 2019

scottgigante force-pushed the feature/incremental_pca_sparse_input branch from 7ea580c to bf978a8 Compare June 12, 2019 18:22

NicolasHug reviewed Jun 12, 2019

sklearn/decomposition/tests/test_incremental_pca.py Outdated

allow sparse input to incremental PCA

cb5a185

scottgigante force-pushed the feature/incremental_pca_sparse_input branch from bf978a8 to cb5a185 Compare June 13, 2019 14:46

NicolasHug approved these changes Jun 13, 2019

scottgigante changed the title ~~[MRG] Allow sparse input to incremental PCA~~ [MRG+1] Allow sparse input to incremental PCA Jun 13, 2019

jnothman approved these changes Jun 14, 2019

jnothman merged commit df7dd83 into scikit-learn:master Jun 14, 2019

scottgigante deleted the feature/incremental_pca_sparse_input branch June 14, 2019 11:26

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

ENH allow sparse input to incremental PCA (scikit-learn#13960)

568534d

scottgigante mentioned this pull request Oct 5, 2019

Use sklearn.decomposition.IncrementalPCA for scprep.reduce.pca with sparse data KrishnaswamyLab/scprep#79

Open
