[MRG+1] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division #9492
Conversation
Hi @jrbourbeau
(also, it might be more elegant to use the `from __future__ import division` solution)
I've preferred the `__future__` import, but others have preferred converting to float. The latter is worthwhile where we expect users to copy-paste, as in examples. Otherwise, I suspect we should consider adding the future imports to the top of every file and including a CI check for it.
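For context, a quick sketch of the division semantics being discussed (Python 3 behavior shown; on Python 2, `/` between two ints truncates unless the `__future__` import is in effect):

```python
# Python 3 semantics, which `from __future__ import division`
# also gives to Python 2 code:
print(3 / 2)    # true division  -> 1.5
print(3 // 2)   # floor division -> 1

# The alternative fix discussed in this PR: cast an operand to float,
# which forces true division on Python 2 and 3 alike.
print(float(3) / 2)  # -> 1.5
```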
I would prefer the `from __future__ import division` solution.
@NelleV, do you have any suggestions for a test to add? I was thinking the original code that caught this problem in issue #9489 might be a good test. Something like:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA


def test_partial_fit_correct_answer():
    A = np.array([[1, 2, 4], [5, 3, 6]])
    B = np.array([[6, 7, 3], [5, 2, 1], [3, 5, 6]])
    C = np.array([[3, 2, 1]])
    ipca = IncrementalPCA(n_components=2)
    ipca.partial_fit(A)
    ipca.partial_fit(B)
    # Known answer is [[-1.48864923, -3.15618645]]
    np.testing.assert_almost_equal(ipca.transform(C),
                                   [[-1.48864923, -3.15618645]])
```

But I'm not sure if hard-coding the correct answer is the way to go, or if one should simply check for float division (since that's the real problem here) with something like:

```python
def test_float_division():
    assert 3 / 2 == 1.5
```
A test hard-coding the correct answer, with a comment saying it's a non-regression test for issue #9489, would be great.
Here is a snippet that can be easily turned into a regression test:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
A = rng.randn(5, 3) + 2
B = rng.randn(7, 3) + 5

pca = IncrementalPCA(n_components=2)
pca.partial_fit(A)
# Force a float sample count so the next update uses float division
pca.n_samples_seen_ = float(pca.n_samples_seen_)
pca.partial_fit(B)
print(pca.singular_values_)

pca2 = IncrementalPCA(n_components=2)
pca2.partial_fit(A)
pca2.partial_fit(B)
print(pca2.singular_values_)
```

With scikit-learn 0.18.2 and Python 2.7, the two printed results differ.
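On a scikit-learn version with the fix, both runs above should give the same result, which suggests an assertion-based form of the snippet (a sketch only; the function name is invented here, and the test actually merged in the PR may differ):

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA


def test_incremental_pca_partial_fit_float_division():
    # Non-regression check for the Python 2 integer-division bug (#9489):
    # forcing a float n_samples_seen_ must not change the fitted model.
    rng = np.random.RandomState(0)
    A = rng.randn(5, 3) + 2
    B = rng.randn(7, 3) + 5

    pca = IncrementalPCA(n_components=2)
    pca.partial_fit(A)
    # With a float sample count, the internal update always uses
    # float division, regardless of Python version.
    pca.n_samples_seen_ = float(pca.n_samples_seen_)
    pca.partial_fit(B)

    pca2 = IncrementalPCA(n_components=2)
    pca2.partial_fit(A)
    pca2.partial_fit(B)

    np.testing.assert_allclose(pca.singular_values_,
                               pca2.singular_values_)
```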
```diff
@@ -273,3 +273,17 @@ def test_whitening():
     assert_almost_equal(X, Xinv_ipca, decimal=prec)
     assert_almost_equal(X, Xinv_pca, decimal=prec)
     assert_almost_equal(Xinv_pca, Xinv_ipca, decimal=prec)
+
+
+def test_partial_fit_correct_answer():
```
I'd rather have something like #9492 (comment) as a regression test. To me this seems less magical than comparing to a "known answer" whose provenance is not very clear.
Hey @lesteve, thanks for the comment! I'll modify the test accordingly.
Also, you will want to add an entry to doc/whats_new.rst about your fix. Do mention it is Python 2 only.
It looks like I added a bunch of other commits I didn't intend to in my most recent push. I was trying to update my branch and attempted the rebase following the instructions from the contributing guide (https://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html#rebasing-on-master), but must have accidentally included all the other commits as well. I'm not exactly sure what to do here. Does anyone know of a way to fix this? Or should I open a new PR? Thanks!
I made a backup branch before I used `git rebase`. Would

```shell
# reset branch back to the saved point
git reset --hard backup_branch
# push this saved point to the remote branch for this PR
git push -f origin remote_feature_branch
```

work to revert this PR back to its state before I did the rebase?
Yes, that should work.
Thanks so much @jnothman! I want to add an entry to doc/whats_new.rst related to this PR. However, there have been changes made to the current master branch version of doc/whats_new.rst; namely, a section for version 0.20 has been added. I would like to add an entry for this PR to the version 0.20 section without creating any merge conflicts with the current master branch. I was wondering how I should go about this. One possible way would be to copy the new version 0.20 section from the current master branch into my feature branch, add an entry for this PR, and then commit it. Is this the preferred way, or is there a different way to go about this?
On my machine it'll look like:

```shell
git pull upstream master
git merge FETCH_HEAD
```
…into IncrementalPCA-partial_fit-float-division: want to get recent changes made to `doc/whats_new.rst` into my feature branch
It worked! Thanks for helping me with that @jnothman
PR #9526 updated. To summarize, this PR:
Yes, LGTM. Thanks
doc/whats_new.rst (Outdated)

```diff
 occurs due to changes in the modelling logic (bug fixes or enhancements), or in
 random sampling procedures.

+- :class:`decomposition.IncrementalPCA` (bug fix)
```
add "in Python 2"
Thanks @jrbourbeau!
…lPCA uses float division (scikit-learn#9492)

* Ensures that partial_fit uses float division
* Switches to using future division for float division
* Adds non-regression test for issue scikit-learn#9489
* Updates test to remove dependence on a "known answer"
* Updates doc/whats_new.rst with entry for this PR
* Specifies bug fix is for Python 2 versions in doc/whats_new.rst
Reference Issue
Fixes #9489

What does this implement/fix? Explain your changes.
Currently the `partial_fit` method for `sklearn.decomposition.IncrementalPCA` uses integer division on Python 2 and float division on Python 3 (see https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/incremental_pca.py#L249). This PR ensures that float division is used on all Python versions by casting the relevant numerator to a floating-point number.

Any other comments?
While casting the numerator to a float fixes issue #9489, I'm not sure if this is the preferred style, or if adding `from __future__ import division` is a better option.
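To illustrate the class of bug being fixed, here is a hypothetical sketch (not the actual code in `incremental_pca.py`; `updated_mean` and its arguments are invented for this example) of a weighted running-mean update where Python 2 integer division would silently truncate, together with both fixes discussed in this thread:

```python
# Fix 2 discussed in the thread: make `/` true division on Python 2 as well.
from __future__ import division

import numpy as np


def updated_mean(last_mean, last_n, X):
    """Combine a running mean over last_n samples with a new batch X.

    On Python 2 *without* the __future__ import, an integer last_n
    divided by an integer n_total would truncate; fix 1 would be to
    write float(last_n) explicitly in the numerator.
    """
    n_total = last_n + X.shape[0]
    return (last_n * last_mean + X.sum(axis=0)) / n_total


# 6 samples with mean 0, then a batch of 4 samples of ones:
# the combined mean is 4/10 = 0.4 per feature.
print(updated_mean(np.zeros(3), 6, np.ones((4, 3))))  # -> [0.4 0.4 0.4]
```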