@Alexlandeau
Contributor

Fixes #26392

  • adding "auto" as default value for n_components. In this mode, n_components will be inferred from H and W if they are provided
  • adding additional validation on inferred n_components to prevent inconsistencies
  • updating parameter validation, deprecation notice, documentation and tests accordingly
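
The inference and consistency check listed above can be sketched roughly as follows. This is only an illustration, not scikit-learn's implementation: `infer_n_components` is a hypothetical helper, shapes are passed as plain tuples, and it assumes H has shape (n_components, n_features) and W has shape (n_samples, n_components).

```python
def infer_n_components(n_components="auto", H_shape=None, W_shape=None):
    """Hypothetical helper: resolve n_components from the shapes of H and W.

    H_shape is (n_components, n_features); W_shape is (n_samples, n_components).
    Returns None when "auto" is requested but neither matrix is provided.
    """
    candidates = {}
    if H_shape is not None:
        candidates["H"] = H_shape[0]  # number of rows of H
    if W_shape is not None:
        candidates["W"] = W_shape[1]  # number of columns of W

    # H and W must agree with each other when both are given.
    if len(set(candidates.values())) > 1:
        raise ValueError(f"H and W imply different n_components: {candidates}")

    if n_components == "auto":
        return next(iter(candidates.values()), None)

    # An explicit value must agree with whatever the matrices imply.
    for name, value in candidates.items():
        if value != n_components:
            raise ValueError(
                f"n_components={n_components} is inconsistent with {name} "
                f"(which implies {value})"
            )
    return n_components
```

For example, `infer_n_components("auto", H_shape=(3, 10))` resolves to 3, while passing an H with 3 rows together with a W with 4 columns raises a `ValueError`.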

* adding "auto" as default value for n_components
* updating parameter validation, deprecation notice, documentation and tests accordingly
---------

Co-authored-by: Alexandre Landeau <[email protected]>
Co-authored-by: avigny <[email protected]>
@github-actions

github-actions bot commented Jun 20, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 024ed75. Link to the linter CI: here

@ogrisel ogrisel changed the title from "Fix issue 26392 on NMF" to "Add n_components="auto" to NMF when H and W are provided" Jun 20, 2023
Member

@jeremiedbb jeremiedbb left a comment

Thanks for the PR @Alexlandeau.

Here are some comments. In particular, this PR should target version 1.4, since the feature freeze for 1.3 has already happened. The addition of "auto" and the future change of default should also be done for NMF and MiniBatchNMF.

Please also add an entry in the v1.4.rst changelog.

Member

@jeremiedbb jeremiedbb left a comment

Looking good. Here's a new batch of comments, mostly nitpicks.

Comment on lines +980 to +981
    X, H=H_true, n_components="auto", update_H=False
)  # should not fail
Member

You can check that the shape of H has not changed (H.shape == H_true.shape).
You can also check that W has the appropriate shape.

Contributor

done in e5cffdb

Comment on lines 938 to 939
def test_nmf_n_components_auto(Estimator):
    rng = np.random.RandomState(0)
Member

Please add a small description of the goal of this test, like "Check that n_components is correctly inferred from the provided custom initialization."

And similarly for the other tests.

Contributor

done in 0e65acd

@jeremiedbb
Member

CI is failing because this PR introduces many FutureWarnings. You need to check every place where we call (MiniBatch)NMF or non_negative_factorization where n_components is left to its default value and change it to "auto" :)
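
To illustrate why untouched defaults trigger these warnings, here is a rough sketch of a common deprecation pattern. It is an assumption about how such a change of default can be wired, not a copy of scikit-learn's code: `ToyNMF`, its `resolve_n_components` method, and the "warn" sentinel are all hypothetical names used for illustration.

```python
import warnings


class ToyNMF:
    """Hypothetical stand-in for an estimator whose default is being changed."""

    def __init__(self, n_components="warn"):
        # "warn" is a sentinel meaning "the caller did not pass n_components".
        self.n_components = n_components

    def resolve_n_components(self, n_features):
        n_components = self.n_components
        if n_components == "warn":
            # Default left untouched: warn, then keep the old behaviour.
            warnings.warn(
                "The default value of n_components will change; "
                "set it explicitly to silence this warning.",
                FutureWarning,
            )
            n_components = None
        if n_components is None or n_components == "auto":
            # No custom H or W in this toy, so both fall back to n_features.
            return n_features
        return n_components
```

With this pattern, `ToyNMF().resolve_n_components(5)` emits a FutureWarning, whereas `ToyNMF(n_components="auto").resolve_n_components(5)` returns the same value silently, which is why every call site relying on the default has to be updated.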

@avigny
Contributor

avigny commented Jun 23, 2023

Should we also add n_components="auto" to this example? I haven't found other places where the warning could occur.

"reduce_dim": [PCA(iterated_power=7), NMF(max_iter=1_000)],

@jeremiedbb
Member

> Should we also add n_components="auto" to this example? I haven't found other places where the warning could occur.

Yes, we don't want warnings in the examples either

@avigny
Contributor

avigny commented Jun 23, 2023

> > Should we also add n_components="auto" to this example? I haven't found other places where the warning could occur.
>
> Yes, we don't want warnings in the examples either

I was wrong: the n_components values are set by the GridSearchCV, so there is no need to pass n_components to NMF explicitly.

N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
    {
        "reduce_dim": [PCA(iterated_power=7), NMF(max_iter=1_000)],
        "reduce_dim__n_components": N_FEATURES_OPTIONS,
        "classify__C": C_OPTIONS,
    },
    {
        "reduce_dim": [SelectKBest(mutual_info_classif)],
        "reduce_dim__k": N_FEATURES_OPTIONS,
        "classify__C": C_OPTIONS,
    },
]
reducer_labels = ["PCA", "NMF", "KBest(mutual_info_classif)"]
grid = GridSearchCV(pipe, n_jobs=1, param_grid=param_grid)
grid.fit(X, y)

Member

@jeremiedbb jeremiedbb left a comment

LGTM. Thanks @Alexlandeau and @avigny

I fixed the remaining failing test and triggered the CI to run the examples to make sure we don't miss future warnings there. It'll be a longer run.

if H is not None:
    self._n_components = _num_samples(H)
elif W is not None:
    self._n_components = _num_features(W)
Member

Could you please extend the tests to cover this branch?

Contributor

Small note about this branch:

If H is None and W is not None, we now set self._n_components = _num_features(W) instead of self._n_components = X.shape[1].

Situation before the PR

If you have init = "custom" ==> Error, because W is not set.
If you have init != "custom" ==> A new W is created with shape (n_samples, X.shape[1]). This is misleading because X.shape[1] can be different from _num_features(W).
See:

    if self.solver == "mu":
        avg = np.sqrt(X.mean() / self._n_components)
        W = np.full((n_samples, self._n_components), avg, dtype=X.dtype)
    else:
        W = np.zeros((n_samples, self._n_components), dtype=X.dtype)
else:
    W, H = _initialize_nmf(
        X, self._n_components, init=self.init, random_state=self.random_state
    )

Situation after the PR

If you have init = "custom" [same behaviour] ==> Error, because W is not set.
If you have init != "custom" [changed behaviour] ==> A new W is created, but its shape is now the same as the shape of the W passed as argument.

Contributor

the test was added in d8b6bf8

Member

> If you have init != "custom" [changed behaviour] ==> A new W is created, but its shape is now the same as the shape of the W passed as argument.

I don't think this is a desirable behavior. It's also a breaking change. In that case we should completely ignore W and fall back to X.shape[1].

It requires a little bit of refactoring, because it means that we can't compute n_components before checking which situation we are in. I directly pushed some changes to fix the behavior. The benefit is that we no longer have to rely on _num_samples and _num_features :)
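
The idea of checking the situation first and only then resolving n_components can be sketched as below. This is a hedged illustration rather than the code that was pushed: `resolve_auto_n_components` is a hypothetical function, shapes are plain tuples, and the choice to read n_components from H whenever a custom init is used or H is kept fixed is an assumption made for the sketch.

```python
def resolve_auto_n_components(X_shape, H_shape=None, W_shape=None,
                              init=None, update_H=True):
    """Hypothetical sketch: resolve n_components="auto" per situation.

    X_shape is (n_samples, n_features); H_shape is (n_components, n_features).
    """
    n_samples, n_features = X_shape

    # The provided H is only used with a custom init or when it is kept fixed.
    if init == "custom" or not update_H:
        if H_shape is None:
            raise ValueError("H must be provided in this situation.")
        if H_shape[1] != n_features:
            raise ValueError("H must have shape (n_components, n_features).")
        return H_shape[0]

    # Otherwise W and H are freshly initialized: W_shape is deliberately
    # ignored and the previous default, n_features, applies.
    return n_features
```

Under these assumptions, passing only a W together with a non-custom init resolves to X.shape[1], matching the fallback described above, instead of silently adopting the shape of the ignored W.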

Member

@ogrisel ogrisel left a comment

Once the missing test is added, LGTM.

@jeremiedbb jeremiedbb merged commit ee5d94e into scikit-learn:main Jun 26, 2023
@Alexlandeau
Contributor Author

Thanks @jeremiedbb for the last review and suggestions!

@avigny
Contributor

avigny commented Jun 27, 2023

Thanks!!

@yotamcons
Contributor

Thank you @Alexlandeau for this!

punndcoder28 pushed a commit to punndcoder28/scikit-learn that referenced this pull request Jul 29, 2023
…cikit-learn#26634)

Co-authored-by: avigny <[email protected]>
Co-authored-by: Jérémie du Boisberranger <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
…cikit-learn#26634)

Co-authored-by: avigny <[email protected]>
Co-authored-by: Jérémie du Boisberranger <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
Development

Successfully merging this pull request may close these issues.

NMF fit transform without updating H should not require the user to input "n_components"
