
MAINT Parameters validation for decomposition.dict_learning #24871


Merged

Conversation

@OmarManzoor (Contributor)

Reference Issues/PRs

Towards #24862

What does this implement/fix? Explain your changes.

  • Added the validate_params decorator for dict_learning function.

Any other comments?

@BenjaminBossan (Contributor) left a comment

Let me preface this by saying that I don't know exactly what this function does, so I can only review superficially. I only found a minor issue with the docstring of alpha; the rest LGTM.

Setting max_iter=0 doesn't seem to do anything useful, but it's possible, so I guess we can allow 0.

{
    "X": ["array-like"],
    "n_components": [Interval(Integral, 1, None, closed="left")],
    "alpha": [Interval(Real, 0, None, closed="left")],

According to the docstring, alpha should be an int. However, it works with a float, and looking at the code, it is passed to sparse_encode, whose docs say it can be a float. So presumably the docstring is incorrect here and should be amended.
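As a quick sanity check of the reviewer's point: the `Interval(Real, ...)` constraint above is built on Python's `numbers.Real` abstract base class, which covers both int and float, so an alpha given as either type satisfies the constraint even though the docstring said int:

```python
# numbers.Real accepts both int and float, but not complex; this is why
# a float alpha passes the Interval(Real, 0, None) constraint above.
from numbers import Real

print(isinstance(1, Real))       # True: int is a Real
print(isinstance(0.5, Real))     # True: float is a Real
print(isinstance(1 + 2j, Real))  # False: complex is not a Real
```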

@glemaitre (Member)

We iterated a bit on the problem of parameter validation for functions. Indeed, we are in the case where a class calls a function, so we would end up with double validation.

We already got this pattern in another issue: #24868 (comment)

The idea would be to invert the pattern: the function should instantiate an object and return the class attributes that are required. For instance, that is what is intended here: #24884

Here we could isolate the main algorithm in a _dict_learning function that should not do any validation. Then, dict_learning will call DictionaryLearning. Since DictionaryLearning validates its parameters, we will not need the @validate_params decorator.

@OmarManzoor could you have a look at the class/function to see if this would make sense?
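The pattern suggested here can be sketched with toy stand-ins (these are illustrative names, not the actual scikit-learn implementation): the public function instantiates the estimator, lets the class do the parameter validation during fit, and returns the fitted attributes the function's API promises.

```python
# Minimal sketch of "function delegates to class" (hypothetical names).
# Validation lives in the class, so the function needs no decorator.

class ToyDictionaryLearning:
    """Stand-in for DictionaryLearning; validates its params in fit()."""

    def __init__(self, n_components=2, alpha=1.0):
        self.n_components = n_components
        self.alpha = alpha

    def fit(self, X):
        if self.n_components < 1:
            raise ValueError("n_components must be >= 1")
        # Pretend we learned a dictionary; real code would iterate here.
        self.components_ = [[0.0] * len(X[0]) for _ in range(self.n_components)]
        self.error_ = [0.0]
        return self


def toy_dict_learning(X, n_components=2, alpha=1.0):
    # No @validate_params needed: the class validates for us.
    est = ToyDictionaryLearning(n_components=n_components, alpha=alpha).fit(X)
    return est.components_, est.error_
```

With this shape, invalid parameters raise from a single place (the class), and the function stays a thin convenience wrapper.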

@OmarManzoor (Contributor, Author)

> We iterated a bit on the problem of parameter validation for functions. Indeed, we are in the case where a class calls a function, so we would end up with double validation.
>
> We already got this pattern in another issue: #24868 (comment)
>
> The idea would be to invert the pattern: the function should instantiate an object and return the class attributes that are required. For instance, that is what is intended here: #24884
>
> Here we could isolate the main algorithm in a _dict_learning function that should not do any validation. Then, dict_learning will call DictionaryLearning. Since DictionaryLearning validates its parameters, we will not need the @validate_params decorator.
>
> @OmarManzoor could you have a look at the class/function to see if this would make sense?

Sure, I will check this out.

@jeremiedbb jeremiedbb added the Validation related to input validation label Nov 18, 2022
@glemaitre (Member) left a comment

It could be worth noting in the changelog that we expose the new parameter and attribute in the class.

Together with these changes, we should add a couple of unit tests to ensure that it works as expected. These tests probably already exist for the function, and we should just adapt them for the class.

@glemaitre glemaitre removed their request for review December 1, 2022 14:15
@jeremiedbb (Member) left a comment

Thanks for the PR @OmarManzoor. Here are a couple of suggestions.

Also, I agree that the tests about numerical stability can be improved, but that is somewhat unrelated to the param validation. Could you open a separate PR for that? It would make the reviews easier.

Comment on lines 1204 to 1209
estimator.code_,
estimator.components_,
estimator.error_,
estimator.n_iter_,
)
return estimator.code_, estimator.components_, estimator.error_

I don't think it's necessary to store the code. It can be even bigger than X for an overcomplete dictionary and make the estimator very heavy in memory. The code is what comes out of transform so we just have to call it here.

Comment on lines 1036 to 1056
@validate_params(
    {
        "X": ["array-like"],
        "n_components": [Interval(Integral, 1, None, closed="left"), None],
        "alpha": [Interval(Real, 0, None, closed="left")],
        "max_iter": [Interval(Integral, 0, None, closed="left")],
        "tol": [Interval(Real, 0, None, closed="left")],
        "method": [StrOptions({"lars", "cd"})],
        "n_jobs": [Integral, None],
        "dict_init": [np.ndarray, None],
        "code_init": [np.ndarray, None],
        "callback": [callable, None],
        "verbose": ["verbose"],
        "random_state": ["random_state"],
        "return_n_iter": ["boolean"],
        "positive_dict": ["boolean"],
        "positive_code": ["boolean"],
        "method_max_iter": [Interval(Integral, 0, None, closed="left")],
    }
)
def dict_learning(

Sorry for the back and forth but it was finally decided that we can do partial validation for function parameters. It means that the dict should only contain the parameters that are not params of the DictionaryLearning class or that have different constraints.
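The "partial validation" idea can be illustrated with a toy decorator (this mimics the idea in plain Python; it is not scikit-learn's actual `validate_params` implementation, and the parameter names are taken from the snippet above for illustration): only the function-specific parameters are listed in the constraints dict, and everything else is left for the class to validate.

```python
# Toy partial-validation decorator: checks only the parameters listed in
# `constraints`, leaving the rest to the estimator class.
import functools
import inspect


def partial_validate(constraints):
    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, check in constraints.items():
                value = bound.arguments[name]
                if not check(value):
                    raise ValueError(f"Invalid value for {name!r}: {value!r}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Only the function-specific parameters appear in the constraints dict;
# shared parameters like alpha are validated by the class instead.
@partial_validate({
    "return_n_iter": lambda v: isinstance(v, bool),
    "method_max_iter": lambda v: isinstance(v, int) and v >= 0,
})
def toy_dict_learning(X, alpha=1.0, return_n_iter=False, method_max_iter=1000):
    return len(X)
```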

jeremiedbb
jeremiedbb previously approved these changes Dec 9, 2022
@jeremiedbb (Member) left a comment

I was looking at the failures in the CI so I directly pushed some changes to fix it. Should be good now. LGTM

@OmarManzoor (Contributor, Author)

OmarManzoor commented Dec 9, 2022

> I was looking at the failures in the CI so I directly pushed some changes to fix it. Should be good now. LGTM

I think the CI failures are related to errors in this test, test_sparse_pca.py. The resulting values don't match. I am not sure whether the code_ that was being used earlier is the same as the code we now obtain via transform.

@jeremiedbb jeremiedbb dismissed their stale review December 9, 2022 13:13

needs more investigation

@jeremiedbb (Member)

@OmarManzoor it was indeed more complicated than expected :)
The issue was that the function used to do what an efficient fit_transform would do, but the class did not implement it. So I added a fit_transform method that is not just fit.transform. I also had to tweak a few things because we want to guarantee that fit.transform is equivalent to fit_transform, which is hard to ensure for this kind of algorithm.

@OmarManzoor (Contributor, Author)

OmarManzoor commented Dec 12, 2022

> @OmarManzoor it was indeed more complicated than expected :) The issue was that the function used to do what an efficient fit_transform would do, but the class did not implement it. So I added a fit_transform method that is not just fit.transform. I also had to tweak a few things because we want to guarantee that fit.transform is equivalent to fit_transform, which is hard to ensure for this kind of algorithm.

Looks good! Thank you for the updates.

@jeremiedbb (Member) left a comment

LGTM. Thanks @OmarManzoor.

@glemaitre (Member)

LGTM as well.

@glemaitre glemaitre enabled auto-merge (squash) December 28, 2022 18:12
@glemaitre glemaitre merged commit 5aa9b99 into scikit-learn:main Dec 28, 2022
@OmarManzoor OmarManzoor deleted the validate_params_for_dict_learning branch December 30, 2022 08:11
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 3, 2023
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
Labels: module:decomposition, Validation (related to input validation)