
[MRG+1] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division #9492


Conversation

jrbourbeau
Contributor

Reference Issue

Fixes #9489

What does this implement/fix? Explain your changes.

Currently, the partial_fit method of sklearn.decomposition.IncrementalPCA uses integer division under Python 2 and float division under Python 3 (see https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/incremental_pca.py#L249). This PR ensures that float division is used on all Python versions by casting the relevant numerator to a floating-point number.
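For illustration, the difference boils down to how `/` behaves on two integers; this sketch is not the sklearn source, and the variable names are made up — `//` is used to reproduce Python 2's truncating behavior on any interpreter:

```python
# Hypothetical stand-ins for the quantities partial_fit divides:
numerator = 5      # e.g. a running sample count
denominator = 2    # e.g. a batch size

# Python 2's "/" on two ints truncates, equivalent to "//" everywhere:
py2_result = numerator // denominator          # integer division: 2
# The PR's fix: cast the numerator so "/" keeps the fractional part:
fixed_result = float(numerator) / denominator  # float division: 2.5

print(py2_result, fixed_result)
```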

Any other comments?

While casting the numerator to a float fixes issue #9489, I'm not sure if this is the preferred style, or if adding from __future__ import division would be a better option.

@NelleV
Member

NelleV commented Aug 4, 2017

Hi @jrbourbeau
Thanks a lot for the patch. It looks good overall, but can you add a test?

@NelleV
Member

NelleV commented Aug 4, 2017

(also, it might be more elegant to use the from __future__ import division solution.)
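As a sketch of that suggestion: the future import changes the meaning of `/` for the whole module, so no per-site casts are needed, and floor division stays available via `//`:

```python
# With this at the top of a module, "/" performs true (float) division
# even on Python 2, matching Python 3 semantics.
from __future__ import division

print(7 / 2)   # 3.5 on both Python 2 and Python 3
print(7 // 2)  # 3; floor division remains available explicitly
```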

@jnothman
Member

jnothman commented Aug 5, 2017 via email

@jrbourbeau
Contributor Author

I would prefer the from __future__ import division route as well. @jnothman, do you think it's safe to assume that the internals of sklearn.decomposition.IncrementalPCA aren't something users will be doing a copy-paste with?

@jrbourbeau
Contributor Author

@NelleV, do you have any suggestions for a test to add? I was thinking the original code that caught this problem in issue #9489 might make a good test. Something like

import numpy as np
from sklearn.decomposition import IncrementalPCA

def test_partial_fit_correct_answer():
    A = np.array([[1, 2, 4], [5, 3, 6]])
    B = np.array([[6, 7, 3], [5, 2, 1], [3, 5, 6]])
    C = np.array([[3, 2, 1]])

    ipca = IncrementalPCA(n_components=2)
    ipca.partial_fit(A)
    ipca.partial_fit(B)
    # Known answer is [[-1.48864923, -3.15618645]]
    np.testing.assert_allclose(ipca.transform(C), [[-1.48864923, -3.15618645]])

But I'm not sure if hard-coding the correct answer is the way to go, or if one should simply check for float division (since that's the real problem here) with something like

def test_float_division():
    assert 3/2 == 1.5

@jnothman
Member

jnothman commented Aug 9, 2017

A test hard-coding the correct answer, with a comment saying it's a non-regression test for issue #9489, would be great.

@lesteve
Member

lesteve commented Aug 9, 2017

Here is a snippet that can be easily turned into a regression test:

import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
A = rng.randn(5, 3) + 2
B = rng.randn(7, 3) + 5

pca = IncrementalPCA(n_components=2)
pca.partial_fit(A)
pca.n_samples_seen_ = float(pca.n_samples_seen_)
pca.partial_fit(B)
print(pca.singular_values_)

pca2 = IncrementalPCA(n_components=2)
pca2.partial_fit(A)
pca2.partial_fit(B)
print(pca2.singular_values_)

With scikit-learn 0.18.2 and Python 2.7, the output is:

[ 7.70269857  3.88561777]
[ 6.64122066  3.88184665]
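That snippet could be turned into a self-contained test along these lines (a sketch: the test function name is made up, and the float cast of n_samples_seen_ follows the snippet above to force float-division behavior regardless of interpreter):

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

def test_incremental_pca_partial_fit_float_division():
    # Non-regression test for issue #9489: under Python 2, integer division
    # inside partial_fit skewed the results of subsequent calls.
    rng = np.random.RandomState(0)
    A = rng.randn(5, 3) + 2
    B = rng.randn(7, 3) + 5

    pca = IncrementalPCA(n_components=2)
    pca.partial_fit(A)
    # Force float arithmetic for the sample counter
    pca.n_samples_seen_ = np.float64(pca.n_samples_seen_)
    pca.partial_fit(B)
    singular_vals_float_samples_seen = pca.singular_values_

    pca2 = IncrementalPCA(n_components=2)
    pca2.partial_fit(A)
    pca2.partial_fit(B)
    singular_vals_int_samples_seen = pca2.singular_values_

    # Both code paths must agree once division is consistently float
    np.testing.assert_allclose(singular_vals_float_samples_seen,
                               singular_vals_int_samples_seen)
```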

@jrbourbeau jrbourbeau changed the title Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division [MRG] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division Aug 10, 2017
@@ -273,3 +273,17 @@ def test_whitening():
assert_almost_equal(X, Xinv_ipca, decimal=prec)
assert_almost_equal(X, Xinv_pca, decimal=prec)
assert_almost_equal(Xinv_pca, Xinv_ipca, decimal=prec)


def test_partial_fit_correct_answer():
Member


I'd rather have something like #9492 (comment) as a regression test. To me this seems less magical than comparing to a "known answer" whose provenance is not very clear.

Contributor Author


Hey @lesteve, thanks for the comment! I'll modify the test accordingly

@lesteve
Member

lesteve commented Aug 10, 2017

Also, you'll want to add an entry to doc/whats_new.rst about your fix. Do mention it is Python 2 only.

@jrbourbeau
Contributor Author

It looks like I added a bunch of other commits I didn't intend to in my most recent push. I was trying to update doc/whats_new.rst with an entry for this PR, but had to rebase on master to get the updates to doc/whats_new.rst that had been added since the feature branch for this PR was made.

I attempted the rebase following the instructions from the contributing guide (https://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html#rebasing-on-master), but I must have accidentally included all the other commits as well.

I'm not exactly sure what to do here. Does anyone know of a way to fix this? Or should I open up a new PR? Thanks!!

@jrbourbeau
Contributor Author

I made a backup branch before I used git rebase. Would

# reset branch back to the saved point
git reset --hard backup_branch
# push this saved point to the remote branch for this PR
git push -f origin remote_feature_branch

work to revert this PR back to its state before I did the rebase?

@jnothman
Member

jnothman commented Aug 10, 2017 via email

@jrbourbeau jrbourbeau force-pushed the IncrementalPCA-partial_fit-float-division branch from 51a693d to 3016393 on August 11, 2017 00:27
@jrbourbeau
Contributor Author

Thanks so much @jnothman!

I want to add an entry to doc/whats_new.rst related to this PR. However, there have been changes made to the current master branch version of doc/whats_new.rst. Namely, a section for version 0.20 has been added. I would like to add an entry for this PR to the version 0.20 section without creating any merge conflicts with the current master branch.

I was wondering how I should go about this. One possible way would be to copy the new version 0.20 section from the current master branch into my feature branch, add an entry for this PR, and then commit it. Is this the preferred way, or is there a different way to go about this?

@jnothman
Member

jnothman commented Aug 11, 2017 via email

…into IncrementalPCA-partial_fit-float-division

Want to get recent changes made to `doc/whats_new.rst` into my feature branch
@jrbourbeau
Contributor Author

It worked! Thanks for helping me with that @jnothman

@jrbourbeau jrbourbeau changed the title [MRG] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division [WIP] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division Aug 11, 2017
@jrbourbeau
Contributor Author

jrbourbeau commented Aug 11, 2017

PR #9526 updated doc/whats_new.rst on master in preparation for the 0.19 release, leading to a merge conflict. But now all conflicts have been resolved. Unless there are any other comments, I think this PR is good to go.

To summarize, this PR:

  1. Makes sure the partial_fit method for sklearn.decomposition.IncrementalPCA uses float division under Python 2.
  2. Adds a non-regression test for issue #9489 (IncrementalPCA.partial_fit doesn't use float division in python 2) to sklearn/decomposition/tests/test_incremental_pca.py.
  3. Updates doc/whats_new.rst with an entry for this PR.

@jrbourbeau jrbourbeau changed the title [WIP] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division [MRG] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division Aug 11, 2017
Member

@jnothman jnothman left a comment


Yes, LGTM. Thanks

occurs due to changes in the modelling logic (bug fixes or enhancements), or in
random sampling procedures.

- :class:`decomposition.IncrementalPCA` (bug fix)
Member


add "in Python 2"

@jnothman jnothman changed the title [MRG] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division [MRG+1] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division Aug 14, 2017
@NelleV NelleV merged commit 86d8f18 into scikit-learn:master Aug 14, 2017
@NelleV
Member

NelleV commented Aug 14, 2017

Thanks @jrbourbeau !

@jrbourbeau jrbourbeau deleted the IncrementalPCA-partial_fit-float-division branch August 14, 2017 19:56
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…lPCA uses float division (scikit-learn#9492)

* Ensures that partial_fit uses float division

* Switches to using future division for float division

* Adds non-regression test for issue scikit-learn#9489

* Updates test to remove dependence on a "known answer"

* Updates doc/whats_new.rst with entry for this PR

* Specifies bug fix is for Python 2 versions in doc/whats_new.rst
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
Successfully merging this pull request may close these issues.

IncrementalPCA.partial_fit doesn't use float division in python 2
4 participants