
MAINT Parameters validation for decomposition.dict_learning #24871


Merged

Conversation

@OmarManzoor (Contributor)

Reference Issues/PRs

Towards #24862

What does this implement/fix? Explain your changes.

  • Added the validate_params decorator for dict_learning function.

Any other comments?

@BenjaminBossan (Contributor) left a comment

Let me preface this by saying that I don't know exactly what this function does, so I can only review superficially. I only found a minor issue with the docstring of alpha; the rest LGTM.

Setting max_iter=0 doesn't seem to do anything useful, but it's possible, so I guess we can allow 0.

{
    "X": ["array-like"],
    "n_components": [Interval(Integral, 1, None, closed="left")],
    "alpha": [Interval(Real, 0, None, closed="left")],

According to the docstring, alpha should be an int. However, it works with a float, and looking at the code, it is passed to sparse_encode, whose docs say it can be a float. So presumably the docstring is incorrect here and should be amended.
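As a quick sanity check of the reviewer's point: the `Interval(Real, ...)` constraint above is built on Python's `numbers.Real` abstract base class, which covers both int and float, so an alpha given as either type satisfies the constraint even though the docstring said int:

```python
# numbers.Real accepts both int and float, but not complex; this is why
# a float alpha passes the Interval(Real, 0, None) constraint above.
from numbers import Real

print(isinstance(1, Real))       # True: int is a Real
print(isinstance(0.5, Real))     # True: float is a Real
print(isinstance(1 + 2j, Real))  # False: complex is not a Real
```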

@glemaitre (Member)

We iterated a bit on the problem of parameter validation for functions. Indeed, we are in the case where a class calls a function, so we would end up with double validation.

We already got this pattern in another issue: #24868 (comment)

The idea would be to invert the pattern: the function should instantiate an object and return the class attributes that are required. For instance, that is what is intended here: #24884

Here we could isolate the main algorithm in a _dict_learning function that should not do any validation. Then, dict_learning will call DictionaryLearning. Since DictionaryLearning validates its parameters, we will not need the @validate_params decorator.

@OmarManzoor could you have a look at the class/function to see if this would make sense?
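The pattern suggested here can be sketched with toy stand-ins (these are illustrative names, not the actual scikit-learn implementation): the public function instantiates the estimator, lets the class do the parameter validation during fit, and returns the fitted attributes the function's API promises.

```python
# Minimal sketch of "function delegates to class" (hypothetical names).
# Validation lives in the class, so the function needs no decorator.

class ToyDictionaryLearning:
    """Stand-in for DictionaryLearning; validates its params in fit()."""

    def __init__(self, n_components=2, alpha=1.0):
        self.n_components = n_components
        self.alpha = alpha

    def fit(self, X):
        if self.n_components < 1:
            raise ValueError("n_components must be >= 1")
        # Pretend we learned a dictionary; real code would iterate here.
        self.components_ = [[0.0] * len(X[0]) for _ in range(self.n_components)]
        self.error_ = [0.0]
        return self


def toy_dict_learning(X, n_components=2, alpha=1.0):
    # No @validate_params needed: the class validates for us.
    est = ToyDictionaryLearning(n_components=n_components, alpha=alpha).fit(X)
    return est.components_, est.error_
```

With this shape, invalid parameters raise from a single place (the class), and the function stays a thin convenience wrapper.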

@OmarManzoor (Contributor, Author)

> We iterated a bit on the problem of parameter validation for functions. Indeed, we are in the case where a class calls a function, so we would end up with double validation.
>
> We already got this pattern in another issue: #24868 (comment)
>
> The idea would be to invert the pattern: the function should instantiate an object and return the class attributes that are required. For instance, that is what is intended here: #24884
>
> Here we could isolate the main algorithm in a _dict_learning function that should not do any validation. Then, dict_learning will call DictionaryLearning. Since DictionaryLearning validates its parameters, we will not need the @validate_params decorator.
>
> @OmarManzoor could you have a look at the class/function to see if this would make sense?

Sure, I will check this out.

@jeremiedbb jeremiedbb added the Validation related to input validation label Nov 18, 2022
@glemaitre (Member) left a comment

It could be worth noting in the changelog that we expose the new parameter and attribute in the class.

Together with these changes, we should add a couple of unit tests to ensure that it works as expected. These tests probably already exist for the function, and we should just adapt them for the class.

@glemaitre glemaitre removed their request for review December 1, 2022 14:15
@jeremiedbb (Member) left a comment

Thanks for the PR @OmarManzoor. Here are a couple of suggestions.

Also, I agree that the tests about numerical stability can be improved, but that is somewhat unrelated to the param validation. Could you open a separate PR for that? It would make the reviews easier.

Comment on lines 1204 to 1209
estimator.code_,
estimator.components_,
estimator.error_,
estimator.n_iter_,
)
return estimator.code_, estimator.components_, estimator.error_

I don't think it's necessary to store the code. It can be even bigger than X for an overcomplete dictionary and make the estimator very heavy in memory. The code is what comes out of transform so we just have to call it here.

Comment on lines 1036 to 1056
@validate_params(
    {
        "X": ["array-like"],
        "n_components": [Interval(Integral, 1, None, closed="left"), None],
        "alpha": [Interval(Real, 0, None, closed="left")],
        "max_iter": [Interval(Integral, 0, None, closed="left")],
        "tol": [Interval(Real, 0, None, closed="left")],
        "method": [StrOptions({"lars", "cd"})],
        "n_jobs": [Integral, None],
        "dict_init": [np.ndarray, None],
        "code_init": [np.ndarray, None],
        "callback": [callable, None],
        "verbose": ["verbose"],
        "random_state": ["random_state"],
        "return_n_iter": ["boolean"],
        "positive_dict": ["boolean"],
        "positive_code": ["boolean"],
        "method_max_iter": [Interval(Integral, 0, None, closed="left")],
    }
)
def dict_learning(

Sorry for the back and forth but it was finally decided that we can do partial validation for function parameters. It means that the dict should only contain the parameters that are not params of the DictionaryLearning class or that have different constraints.
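The "partial validation" idea can be illustrated with a toy decorator (this mimics the idea in plain Python; it is not scikit-learn's actual `validate_params` implementation, and the parameter names are taken from the snippet above for illustration): only the function-specific parameters are listed in the constraints dict, and everything else is left for the class to validate.

```python
# Toy partial-validation decorator: checks only the parameters listed in
# `constraints`, leaving the rest to the estimator class.
import functools
import inspect


def partial_validate(constraints):
    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, check in constraints.items():
                value = bound.arguments[name]
                if not check(value):
                    raise ValueError(f"Invalid value for {name!r}: {value!r}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Only the function-specific parameters appear in the constraints dict;
# shared parameters like alpha are validated by the class instead.
@partial_validate({
    "return_n_iter": lambda v: isinstance(v, bool),
    "method_max_iter": lambda v: isinstance(v, int) and v >= 0,
})
def toy_dict_learning(X, alpha=1.0, return_n_iter=False, method_max_iter=1000):
    return len(X)
```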

jeremiedbb
jeremiedbb previously approved these changes Dec 9, 2022
@jeremiedbb (Member) left a comment

I was looking at the failures in the CI so I directly pushed some changes to fix it. Should be good now. LGTM

@OmarManzoor (Contributor, Author)

OmarManzoor commented Dec 9, 2022

> I was looking at the failures in the CI so I directly pushed some changes to fix it. Should be good now. LGTM

I think the CI failures are related to errors in this test, test_sparse_pca.py. The resulting values don't match. I am not sure whether the code_ that was being used earlier is the same as the code we now obtain via transform.

@jeremiedbb jeremiedbb dismissed their stale review December 9, 2022 13:13

needs more investigation

@jeremiedbb (Member)

@OmarManzoor it was indeed more complicated than expected :)
The issue was that the function used to do what an efficient fit_transform would do, but the class did not implement it. So I added a fit_transform method that is not just fit.transform. I also had to tweak a few things because we want to guarantee that fit.transform is equivalent to fit_transform, which is hard to ensure for this kind of algorithm.

@OmarManzoor (Contributor, Author)

OmarManzoor commented Dec 12, 2022

> @OmarManzoor it was indeed more complicated than expected :) The issue was that the function used to do what an efficient fit_transform would do, but the class did not implement it. So I added a fit_transform method that is not just fit.transform. I also had to tweak a few things because we want to guarantee that fit.transform is equivalent to fit_transform, which is hard to ensure for this kind of algorithm.

Looks good! Thank you for the updates.

@jeremiedbb (Member) left a comment

LGTM. Thanks @OmarManzoor.

@glemaitre (Member)

LGTM as well.

@glemaitre glemaitre enabled auto-merge (squash) December 28, 2022 18:12
@glemaitre glemaitre merged commit 5aa9b99 into scikit-learn:main Dec 28, 2022
@OmarManzoor OmarManzoor deleted the validate_params_for_dict_learning branch December 30, 2022 08:11
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 3, 2023
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
Labels: module:decomposition, Validation (related to input validation)