ENH expose n_oversamples in PCA when using solver="randomized" #21109
The parameter n_oversamples of sklearn.utils.extmath.randomized_svd defaults to 10, and the calling function sklearn.decomposition.PCA.fit_transform provides no way to change it. As a result, when the input data has many more than 10 features, the randomized SVD result can differ substantially from the exact one, with large errors.
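To make the effect concrete, here is a minimal sketch that calls randomized_svd directly and compares its top singular values with an exact SVD. The Gaussian test matrix and the helper top_singular_value_error are assumptions made for illustration, not code from this PR. The idea it shows: with the default n_oversamples=10, only n_components + 10 random directions are sampled, whereas raising n_oversamples until n_components + n_oversamples equals the number of features lets the sampled subspace span the whole column space, so the result should match the exact SVD up to rounding.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd


def top_singular_value_error(X, n_components, n_oversamples, seed=0):
    """Hypothetical helper: max relative error of the top singular values
    returned by randomized_svd compared with an exact SVD."""
    exact = np.linalg.svd(X, compute_uv=False)[:n_components]
    _, approx, _ = randomized_svd(
        X, n_components, n_oversamples=n_oversamples, random_state=seed
    )
    return np.max(np.abs(approx - exact) / exact)


# Synthetic Gaussian data (an assumption for illustration): its singular
# values decay slowly, which is the hard case for randomized SVD.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 50))
n_components = 5

# Default oversampling: only 5 + 10 = 15 random directions are sampled.
err_default = top_singular_value_error(X, n_components, n_oversamples=10)

# Oversample up to the number of features: 5 + 45 = 50 directions, enough
# to span the full column space of X.
err_full = top_singular_value_error(
    X, n_components, n_oversamples=X.shape[1] - n_components
)

print(f"n_oversamples=10: max relative error {err_default:.2e}")
print(f"n_oversamples=45: max relative error {err_full:.2e}")
```

The exact error values depend on the data and the random seed; the point is only the trend as n_oversamples grows relative to the number of features.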
@jeremiedbb you probably want to give your advice regarding which API to use for this parameter.
I'd keep the API of randomized_svd, i.e. an integer. I agree that it might be interesting to have it be a float representing a ratio instead of an absolute value, but then we should discuss changing it in randomized_svd directly.
First, thank you for your reply.
Just some tiny suggestions.
@x-shadow-man: the CI fails because some of the changes to the code have not been properly formatted. Setting up pre-commit (see optional step 9 of this section) will make it easier for you to work, and will prevent problems both in your code and in the CI jobs. 🙂
Some last tiny suggestions, and after fixing the code formatting issues, this will LGTM.
LGTM. Thank you, @x-shadow-man.
Just a few comments, otherwise LGTM. Thanks @x-shadow-man
You should check that my suggestions are working with
Thanks @x-shadow-man. Merging.
Thanks for your help. I am also happy to have solved this problem!
…t-learn#21109) Co-authored-by: Guillaume Lemaitre <[email protected]>
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Fixes #20589
PCA returns highly inaccurate results when number of features is large #20589
Any other comments?