Add n_components="auto" to NMF when H and W are provided #26634
* adding "auto" as default value for n_components
* updating parameter validation, deprecation notice, documentation and tests accordingly
Co-authored-by: Alexandre Landeau <[email protected]>
Co-authored-by: avigny <[email protected]>
jeremiedbb
left a comment
Thanks for the PR @Alexlandeau.
Here are some comments. In particular, this PR should target version 1.4, since the feature freeze for 1.3 has already happened. Also, the addition of "auto" and the future change of default should also be done for NMF and MiniBatchNMF.
Please also add an entry in the v1.4.rst changelog.
Co-authored-by: Jérémie du Boisberranger <[email protected]>
adding `init="custom"` so that we actually test the modified code
…nto bug/nmf_26392
This allows checking the shapes of W or H when they are given to the estimator. Added the corresponding test as well.
jeremiedbb
left a comment
Looking good. Here's a new batch of comments, mostly nitpicks.
    X, H=H_true, n_components="auto", update_H=False
)  # should not fail
You can check that H.shape has not changed, i.e. H.shape == H_true.shape.
You can also check that W has the appropriate shape.
done in e5cffdb
def test_nmf_n_components_auto(Estimator):
    rng = np.random.RandomState(0)
Please add a small description of the goal of this test, like "Check that n_components is correctly inferred from the provided custom initialization."
And similarly for the other tests.
done in 0e65acd
CI is failing because this PR introduces many FutureWarnings. You need to check every place where we call
Should we also add
Yes, we don't want warnings in the examples either.
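One way to locate the call sites that trigger such a warning is to escalate FutureWarning to an error while running them. The snippet below is a minimal sketch using only the standard `warnings` module. `resolve_default`, the `"warn"` sentinel and the two example callers are hypothetical stand-ins for illustration, not scikit-learn code.

```python
import warnings


def resolve_default(n_components="warn"):
    """Hypothetical stand-in for an estimator whose default is being changed."""
    if n_components == "warn":
        # Left at the default: announce the upcoming change of default.
        warnings.warn(
            "The default value of n_components will change; set it explicitly.",
            FutureWarning,
        )
        return None  # keep the old default behaviour for now
    return n_components


def call_sites_that_warn(callables):
    """Return the names of the callables that emit a FutureWarning when run."""
    offenders = []
    for func in callables:
        with warnings.catch_warnings():
            # Escalate FutureWarning to an exception so it cannot go unnoticed.
            warnings.simplefilter("error", FutureWarning)
            try:
                func()
            except FutureWarning:
                offenders.append(func.__name__)
    return offenders


def implicit_default():
    # Relies on the default value, so it triggers the warning.
    return resolve_default()


def explicit_auto():
    # Sets the parameter explicitly, so no warning is emitted.
    return resolve_default(n_components="auto")


offenders = call_sites_that_warn([implicit_default, explicit_auto])
print(offenders)
```

Setting the parameter explicitly at each reported call site (in tests and in examples) is what silences the warning.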
I was wrong, the scikit-learn/examples/compose/plot_compare_reduction.py Lines 53 to 70 in bcbc2fc
jeremiedbb
left a comment
LGTM. Thanks @Alexlandeau and @avigny
I fixed the remaining failing test and triggered the CI to run the examples to make sure we don't miss future warnings there. It'll be a longer run.
sklearn/decomposition/_nmf.py
Outdated
if H is not None:
    self._n_components = _num_samples(H)
elif W is not None:
    self._n_components = _num_features(W)
Could you please extend the tests to cover this branch?
Small note about this branch:
If H is None and W is not None, we now set self._n_components = _num_features(W) instead of self._n_components = X.shape[1].
Situation before the PR
If you have init = "custom" ==> Error because W is not set
If you have init != "custom" ==> A new W is created with the shape (n_samples, X.shape[1]). This is misleading because X.shape[1] can be different from _num_features(W)
see
scikit-learn/sklearn/decomposition/_nmf.py
Lines 1208 to 1216 in 9cbcc1f
    if self.solver == "mu":
        avg = np.sqrt(X.mean() / self._n_components)
        W = np.full((n_samples, self._n_components), avg, dtype=X.dtype)
    else:
        W = np.zeros((n_samples, self._n_components), dtype=X.dtype)
else:
    W, H = _initialize_nmf(
        X, self._n_components, init=self.init, random_state=self.random_state
    )
Situation after the PR
If you have init = "custom" [same behaviour] ==> Error because W is not set
If you have init != "custom" [changed behaviour] ==> A new W is created, but its shape now matches the shape of the W passed as argument
the test was added in d8b6bf8
If you have init != "custom" [changed behaviour] ==> A new W is created, but its shape now matches the shape of the W passed as argument
I don't think this is desirable behavior. It's also a breaking change. In that case we should completely ignore W and fall back to X.shape[1].
It requires a little bit of refactoring because it means that we can't compute n_components before checking in which situation we are. I directly pushed some changes to fix the behavior. The benefit is that we no longer have to rely on _num_samples and _num_features :)
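The resolution order discussed in this thread can be sketched in plain Python, working on shapes only. This is an illustrative reading of the discussion, not the code merged in `_nmf.py`: `resolve_n_components` and its parameters are hypothetical names, and the exact error handling is an assumption.

```python
def resolve_n_components(
    n_components, X_shape, W_shape=None, H_shape=None, init=None, update_H=True
):
    """Hypothetical helper: resolve n_components="auto" from array shapes.

    Shapes follow the NMF convention X ~ W @ H, with X of shape
    (n_samples, n_features), W of shape (n_samples, n_components) and
    H of shape (n_components, n_features).
    """
    n_samples, n_features = X_shape

    # An explicit value always wins over inference.
    if n_components != "auto":
        return n_components

    if init == "custom" and update_H:
        # Custom initialization: both W and H must be provided (assumption:
        # a missing array is reported with a ValueError).
        if W_shape is None or H_shape is None:
            raise ValueError("init='custom' requires both W and H")
        return H_shape[0]

    if not update_H:
        # H is held fixed, so it alone determines the number of components.
        if H_shape is None:
            raise ValueError("update_H=False requires H")
        return H_shape[0]

    # init != "custom": any provided W or H is ignored, fall back to X.shape[1].
    return n_features


X_shape = (6, 5)
print(resolve_n_components("auto", X_shape, W_shape=(6, 3), H_shape=(3, 5), init="custom"))
print(resolve_n_components("auto", X_shape, H_shape=(2, 5), update_H=False))
print(resolve_n_components("auto", X_shape, W_shape=(6, 3)))
```

The last call illustrates the point made above: with init != "custom", a provided W does not influence the result, and the value falls back to X.shape[1].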
ogrisel
left a comment
Once the missing test is added, LGTM.
Co-authored-by: Olivier Grisel <[email protected]>
…nto bug/nmf_26392
Thanks @jeremiedbb for the last review and suggestions!
Thanks!!
Thank you @Alexlandeau for this!
…cikit-learn#26634) Co-authored-by: avigny <[email protected]> Co-authored-by: Jérémie du Boisberranger <[email protected]> Co-authored-by: Olivier Grisel <[email protected]>
Fixes #26392