For sklearn.decomposition.PCA you can pass a decimal argument for n_components to specify the minimum explained variance ratio you want returned by fit(). Would it be possible to also have this behavior for TruncatedSVD?
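For reference, a minimal sketch of the PCA behavior being described. The synthetic data and the 0.90 threshold are assumptions chosen only for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (assumed for illustration): 200 samples, 10 features
# whose scales decay so that a few directions dominate the variance.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 10)) * np.linspace(5.0, 0.5, 10)

# A float in (0, 1) asks PCA to keep the fewest components whose
# cumulative explained variance ratio passes the threshold.
pca = PCA(n_components=0.90, svd_solver="full").fit(X)

print("components kept:", pca.n_components_)
print("variance explained:", pca.explained_variance_ratio_.sum())
```

The number of components actually kept is exposed as `n_components_` after fitting.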
There was related discussion about this in #7973. Generally I think it would be good to have better consistency between PCA and TruncatedSVD. The implementation should take care to avoid the issues raised in #10034 that were fixed in #10042 ...
So I'm a bit confused on this one. Isn't the point of TruncatedSVD that it only computes the first k components of the SVD? To get the distribution of explained variance (to determine how many components to keep), we'd first have to compute a full SVD, calculate the explained variances, find the cutoff, and retain the selected components, right? That's at least how PCA does it. So, without truncating a full SVD, I don't know how we'd be able to decide a priori the right number of components.
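The steps listed above (full SVD, explained variances, cutoff) can be sketched with NumPy alone. This is an illustration of the idea, not scikit-learn's actual code; the helper name `n_components_for_variance` and the test data are hypothetical:

```python
import numpy as np

def n_components_for_variance(X, threshold):
    """Hypothetical helper: smallest k whose cumulative explained
    variance ratio reaches `threshold`, found via a full SVD."""
    Xc = X - X.mean(axis=0)                  # center the data, as PCA does
    s = np.linalg.svd(Xc, compute_uv=False)  # ALL singular values (full SVD)
    ratio = s**2 / np.sum(s**2)              # explained variance ratio per component
    cumulative = np.cumsum(ratio)
    # First index where the cumulative ratio reaches the threshold, as a count.
    k = int(np.searchsorted(cumulative, threshold, side="left")) + 1
    return min(k, len(s)), cumulative

# Assumed synthetic data, for illustration only.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 10)) * np.linspace(5.0, 0.5, 10)

k, cumulative = n_components_for_variance(X, 0.90)
print("components needed:", k)
```

The cutoff can only be located because every singular value is available, which is exactly the work a truncated solver is meant to avoid.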
Yes, it looks like in PCA a float n_components only works if svd_solver="full" (i.e. a full SVD is computed), while TruncatedSVD doesn't have an algorithm="full" option. So if we wanted to implement this we would need to add it. I'm not sure it would be worth it overall.
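In the meantime, one possible user-side workaround is to fit TruncatedSVD with an upper bound on the number of components and then keep only as many as the cumulative explained_variance_ratio_ requires. This is a sketch under assumptions, not a proposed API: the helper name `truncated_svd_for_variance`, the `k_max` bound, and the dense synthetic data are all hypothetical, and the approach fails if `k_max` is too small to reach the threshold.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

def truncated_svd_for_variance(X, threshold, k_max, random_state=0):
    """Hypothetical helper: fit TruncatedSVD with k_max components and
    return the fitted model plus the number of leading components whose
    cumulative explained variance ratio reaches `threshold`."""
    svd = TruncatedSVD(n_components=k_max, random_state=random_state).fit(X)
    cumulative = np.cumsum(svd.explained_variance_ratio_)
    if cumulative[-1] < threshold:
        # Without a full SVD we cannot know how many more are needed.
        raise ValueError("k_max components do not reach the threshold")
    k = int(np.searchsorted(cumulative, threshold, side="left")) + 1
    return svd, k

# Assumed synthetic data, for illustration only.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 10)) * np.linspace(5.0, 0.5, 10)

svd, k = truncated_svd_for_variance(X, threshold=0.90, k_max=8)
X_reduced = svd.transform(X)[:, :k]  # keep only the first k components
print("components kept:", k, "shape:", X_reduced.shape)
```

The cost is that `k_max` still has to be guessed up front, which is the limitation raised above.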
Since it's an issue from 2018 and there wasn't much interest in it from users since, closing as won't fix. Thanks for investigating @Micky774 !