MAINT Parameters validation for decomposition.dict_learning #24871
Conversation
Let me preface that I don't know what exactly this function is doing, so I can only review superficially. I only found a minor issue with the doc of `alpha`; the rest LGTM.

Setting `max_iter=0` doesn't seem to do anything useful, but it's possible, so I guess we can allow 0.
    {
        "X": ["array-like"],
        "n_components": [Interval(Integral, 1, None, closed="left")],
        "alpha": [Interval(Real, 0, None, closed="left")],
According to the docstring, `alpha` should be an int. However, it works with a float, and looking at the code, it is passed to `sparse_encode`, whose docs say it can be a float. So presumably the docstring is incorrect here and should be amended.
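For what it's worth, the `Interval(Real, ...)` constraint shown above already covers both spellings, since Python ints register as `numbers.Real`:

    from numbers import Real

    # int is a registered subtype of numbers.Real, so a constraint typed on
    # Real accepts both alpha=1 and alpha=1.0:
    print(isinstance(1, Real), isinstance(1.0, Real))  # True True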
We iterated a bit on the problem of validating parameters for functions. Indeed, we are in the case where a class calls a function, so we would end up with double validation. We already hit this pattern in another issue: #24868 (comment). The idea would be to reverse the pattern: the function should instantiate an object and return the class attributes that are required. For instance, that is what is intended here: #24884. Here we could isolate the main algorithm in a …

@OmarManzoor could you have a look at the class/function to see if this would make sense?
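A minimal sketch of the inverted pattern under discussion — simplified signature, not the exact scikit-learn code, and `code_` reflects the intermediate state of this PR (revised later in this thread):

    from sklearn.decomposition import DictionaryLearning

    def dict_learning(X, n_components=None, alpha=1, **params):
        # Delegate to the class so parameter validation happens once, in
        # DictionaryLearning, instead of in both the function and the class.
        estimator = DictionaryLearning(
            n_components=n_components, alpha=alpha, **params
        ).fit(X)
        return estimator.code_, estimator.components_, estimator.error_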
Sure, I will check this out.
It could be worth adding in the changelog that we expose the new parameter and attribute in the class.

Together with these changes, we should add a couple of unit tests to ensure that it works as expected. These tests probably already exist for the function, and we should just adapt them for the class.
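For instance, a hedged sketch of the kind of unit test meant here — the test name, data, and assertions are illustrative, not the PR's actual tests:

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    def test_dict_learning_class_exposes_function_outputs():
        rng = np.random.RandomState(0)
        X = rng.randn(12, 8)
        est = DictionaryLearning(n_components=4, random_state=0).fit(X)
        # the fitted class should expose the quantities the function returns
        assert hasattr(est, "error_")
        assert hasattr(est, "n_iter_")
        assert est.n_iter_ <= est.max_iter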
Thanks for the PR @OmarManzoor. Here are a couple of suggestions.

Also, I agree that the tests about numerical stability can be improved, but that's mostly unrelated to the param validation. Could you open a separate PR for that? It would make the reviews easier.
        estimator.code_,
        estimator.components_,
        estimator.error_,
        estimator.n_iter_,
    )
    return estimator.code_, estimator.components_, estimator.error_
I don't think it's necessary to store the code. It can be even bigger than X for an overcomplete dictionary and make the estimator very heavy in memory. The code is what comes out of `transform`, so we just have to call it here.
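Concretely, something like this at the end of the function — a sketch, where `estimator` is the fitted `DictionaryLearning` instance from the earlier snippet:

    # Rather than persisting the code as an estimator attribute (it can be
    # larger than X for an overcomplete dictionary), recompute it on demand:
    code = estimator.transform(X)  # shape (n_samples, n_components)
    return code, estimator.components_, estimator.error_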
    @validate_params(
        {
            "X": ["array-like"],
            "n_components": [Interval(Integral, 1, None, closed="left"), None],
            "alpha": [Interval(Real, 0, None, closed="left")],
            "max_iter": [Interval(Integral, 0, None, closed="left")],
            "tol": [Interval(Real, 0, None, closed="left")],
            "method": [StrOptions({"lars", "cd"})],
            "n_jobs": [Integral, None],
            "dict_init": [np.ndarray, None],
            "code_init": [np.ndarray, None],
            "callback": [callable, None],
            "verbose": ["verbose"],
            "random_state": ["random_state"],
            "return_n_iter": ["boolean"],
            "positive_dict": ["boolean"],
            "positive_code": ["boolean"],
            "method_max_iter": [Interval(Integral, 0, None, closed="left")],
        }
    )
    def dict_learning(
Sorry for the back and forth, but it was finally decided that we can do partial validation for function parameters. It means that the dict should only contain the parameters that are not params of the `DictionaryLearning` class, or that have different constraints.
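A hedged sketch of what partial validation trims the decorator down to. The exact subset kept in the merged PR may differ; the idea is just that parameters already validated by `DictionaryLearning` drop out:

    @validate_params(
        {
            # only parameters that are not constructor params of
            # DictionaryLearning, or whose constraints differ here:
            "X": ["array-like"],
            "method": [StrOptions({"lars", "cd"})],
            "return_n_iter": ["boolean"],
            "method_max_iter": [Interval(Integral, 0, None, closed="left")],
        }
    )
    def dict_learning(X, n_components=None, *, alpha=1, method="lars",
                      return_n_iter=False, method_max_iter=1000, **params):
        ...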
I was looking at the failures in the CI, so I directly pushed some changes to fix them. Should be good now. LGTM
I think the CI failures are related to errors in this test, test_sparse_pca.py. The resulting values don't match. I am not sure if the …
@OmarManzoor it was indeed more complicated than expected :)
Looks good! Thank you for the updates.
LGTM. Thanks @OmarManzoor.
LGTM as well.
MAINT Parameters validation for decomposition.dict_learning (scikit-learn#24871) Co-authored-by: jeremiedbb <[email protected]> Co-authored-by: Jérémie du Boisberranger <[email protected]>
Reference Issues/PRs
Towards #24862
What does this implement/fix? Explain your changes.
Adds parameter validation for the `decomposition.dict_learning` function, as part of #24862.