[MRG + 1] Raising an error when batch_size < n_components in IncrementalPCA #9303
@@ -211,11 +211,18 @@ def partial_fit(self, X, y=None, check_input=True):
             self.components_ = None

         if self.n_components is None:
-            self.n_components_ = n_features
+            if self.components_ is None:
+                self.n_components_ = min(n_samples, n_features)
+            else:
+                self.n_components_ = self.components_.shape[0]
         elif not 1 <= self.n_components <= n_features:
             raise ValueError("n_components=%r invalid for n_features=%d, need "
                              "more rows than columns for IncrementalPCA "

I have no idea what "more rows than columns" means here ...

I don't think the message is good here either, but I wanted to focus my pull request on the points mentioned in #6452 (so that it would be reviewed and merged more quickly).

                              "processing" % (self.n_components, n_features))
+        elif not self.n_components <= n_samples:
+            raise ValueError("n_components=%r must be less or equal to "
+                             "the batch number of samples "

Funnily enough we were chatting with @ogrisel about this yesterday in an unrelated context. IIUC he was hoping that

Also I bumped into the same problem (checking that n_components <= n_features but not n_components <= n_samples) in sklearn/decomposition/pca.py yesterday. There are also slight inconsistencies between

@lesteve I have a pull-request for PCA as well --> #8742. Do you want to have a look, I do think it may be a quick case to just finish off.

+                             "%d." % (self.n_components, n_samples))
         else:
             self.n_components_ = self.n_components

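The branching in this hunk can be read in isolation as a small pure function. The following is a minimal sketch only: `resolve_n_components` and its `fitted_components` argument are hypothetical names introduced for illustration, not part of scikit-learn, and a plain list of rows stands in for `self.components_`.

```python
def resolve_n_components(n_components, n_samples, n_features,
                         fitted_components=None):
    """Hypothetical standalone mirror of the n_components checks in the diff.

    fitted_components stands in for self.components_: None before the first
    partial_fit call, otherwise a sequence with one entry per component.
    """
    if n_components is None:
        if fitted_components is None:
            # First batch: nothing fitted yet, so take the largest rank the
            # batch can support.
            return min(n_samples, n_features)
        # Later batches: reuse the number of components already fitted.
        return len(fitted_components)
    if not 1 <= n_components <= n_features:
        raise ValueError(
            "n_components=%r invalid for n_features=%d, need more rows than "
            "columns for IncrementalPCA processing"
            % (n_components, n_features))
    if not n_components <= n_samples:
        # New check from this pull request: the batch must contain at least
        # n_components rows.
        raise ValueError(
            "n_components=%r must be less or equal to the batch number of "
            "samples %d." % (n_components, n_samples))
    return n_components
```

Under these assumptions, a batch of 2 rows with 3 features and `n_components=3` passes the feature check and then fails the new sample check, which is the case the added test exercises through `partial_fit`.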
@@ -4,6 +4,7 @@
 from sklearn.utils.testing import assert_almost_equal
 from sklearn.utils.testing import assert_array_almost_equal
 from sklearn.utils.testing import assert_raises
+from sklearn.utils.testing import assert_raises_regex

 from sklearn import datasets
 from sklearn.decomposition import PCA, IncrementalPCA

@@ -73,10 +74,41 @@ def test_incremental_pca_inverse():

 def test_incremental_pca_validation():
     # Test that n_components is >=1 and <= n_features.
-    X = [[0, 1], [1, 0]]
-    for n_components in [-1, 0, .99, 3]:
-        assert_raises(ValueError, IncrementalPCA(n_components,
-                                                 batch_size=10).fit, X)
+    X = np.array([[0, 1, 0], [1, 0, 0]])
+    n_samples, n_features = X.shape
+    for n_components in [-1, 0, .99, 4]:
+        assert_raises_regex(ValueError,
+                            "n_components={} invalid for n_features={}, need"
+                            " more rows than columns for IncrementalPCA "
+                            "processing".format(n_components, n_features),
+                            IncrementalPCA(n_components, batch_size=10).fit, X)

Should this also be raised for partial_fit?

+
+    # Tests that n_components is also <= n_samples.
+    n_components = 3
+    assert_raises_regex(ValueError,
+                        "n_components={} must be less or equal to "
+                        "the batch number of samples {}".format(
+                            n_components, n_samples),
+                        IncrementalPCA(
+                            n_components=n_components).partial_fit, X)
+
+
+def test_n_components_none():
+    # Ensures that n_components == None is handled correctly
+    rng = np.random.RandomState(1999)
+    for n_samples, n_features in [(50, 10), (10, 50)]:
+        X = rng.rand(n_samples, n_features)
+        ipca = IncrementalPCA(n_components=None)
+
+        # First partial_fit call, ipca.n_components_ is inferred from
+        # min(X.shape)
+        ipca.partial_fit(X)
+        assert ipca.n_components_ == min(X.shape)
+
+        # Second partial_fit call, ipca.n_components_ is inferred from
+        # ipca.components_ computed from the first partial_fit call
+        ipca.partial_fit(X)
+        assert ipca.n_components_ == ipca.components_.shape[0]


 def test_incremental_pca_set_params():

Can you add regression tests for these? I guess if n_features < n_samples we had an error earlier?

Yes, the master had an error if n_samples < n_features (you wrote the opposite, but I believe it was a typo, right?). As a ‘visual’ aid, this is the partial_fit method, so n_samples is equivalent to the size of the batches used.
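To make the batch-size point concrete, here is a minimal sketch, assuming that `fit` slices `X` into consecutive chunks of at most `batch_size` rows and hands each chunk to `partial_fit`. The diff does not show how `fit` builds its batches, so that slicing is an assumption, and the helpers `batch_lengths` and `undersized_batches` are hypothetical names, not scikit-learn functions.

```python
def batch_lengths(n_samples, batch_size):
    """Row counts of consecutive batches of at most batch_size rows.

    Assumption: batches are plain consecutive slices, so only the last one
    can be shorter than batch_size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [min(batch_size, n_samples - start)
            for start in range(0, n_samples, batch_size)]


def undersized_batches(n_samples, batch_size, n_components):
    """Indices of batches with fewer rows than n_components.

    Each such batch would fail the n_components <= n_samples check that
    this pull request adds to partial_fit.
    """
    lengths = batch_lengths(n_samples, batch_size)
    return [i for i, length in enumerate(lengths) if n_components > length]
```

For example, 10 samples with `batch_size=4` give batches of 4, 4 and 2 rows, so with `n_components=3` only the trailing batch would be rejected under this assumption.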