[MRG] Improve speed of plot_pca_vs_fa_model_selection.py example #13685
  def compute_scores(X):
      pca = PCA(svd_solver='full')
-     fa = FactorAnalysis()
+     fa = FactorAnalysis(tol=1)
I am not sure it's a good idea to advertise using such a high tolerance without a comment or empirical evidence that it's ok.
my 2c
@agramfort Thanks for your comment. I totally agree! Do you think that the comment introduced in d79e45d is enough?
I also added the same comment in b742f21 for the plot_rbm_logistic_classification.py example, where I also increased the tolerance in #13648.
Please show us how tol affects the results. For FA, how many iterations do you end up doing with tol=1 vs. the default?
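A question like this can be checked directly, since a fitted FactorAnalysis exposes the number of EM iterations as `n_iter_`. The snippet below is only a minimal sketch on synthetic data: the data sizes, the noise level, and `n_components=10` are illustrative assumptions, not the settings or measurements from this PR.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Synthetic low-rank data plus homoscedastic noise.
# All sizes below are illustrative assumptions, not the example's values.
rng = np.random.RandomState(42)
n_samples, n_features, rank = 500, 25, 5
W = rng.randn(n_features, rank)
X = rng.randn(n_samples, rank) @ W.T + rng.randn(n_samples, n_features)

results = {}
for tol in (1e-2, 1.0):  # 1e-2 is the default tolerance of FactorAnalysis
    fa = FactorAnalysis(n_components=10, tol=tol, random_state=0).fit(X)
    # n_iter_ is the number of EM iterations run;
    # score(X) is the average log-likelihood of the samples.
    results[tol] = (fa.n_iter_, fa.score(X))
    print(f"tol={tol:g}: n_iter_={results[tol][0]}, "
          f"mean log-likelihood={results[tol][1]:.3f}")
```

With a fixed `random_state`, both fits follow the same EM trajectory, so the looser tolerance can only stop at the same iteration or earlier. How much the log-likelihood differs depends on the data.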
I'm closing this one as the issue has been solved in #21671.
Reference Issues/PRs
Partial fix to #13383
What does this implement/fix? Explain your changes.
I propose two modifications to the example examples/decomposition/plot_pca_vs_fa_model_selection.py mentioned in #13383:
- use a looser tolerance for Factor Analysis: FactorAnalysis(tol=1)
- drop n_components = 45 from the evaluated numbers of components

The build time (on my computer) decreases from 11.846 to 3.437 seconds.
These modifications are motivated by the fact that, among the three dimensionality reduction techniques trained, only Factor Analysis with 40 and 45 components takes a significant amount of time to fit:
After applying the 2 modifications mentioned above, we achieve significant speed improvements:
The printed chosen best numbers of components remain unchanged (I added the nature of the noise to improve its readability):
However, the two plots changed slightly:
Homoscedastic Noise: [original plot] [modified plot]
Heteroscedastic Noise: [original plot] [modified plot]
As you can see, the estimated FA scores (average log-likelihood) differ at 40 components for both noise types, and the points for 45 components are removed from both plots. However, I think the plots are still fine: the interpretation is unchanged, because this part of the curve isn't important in the example.
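A rough way to check whether a looser tolerance changes the model-selection outcome is to compare the cross-validated FA scores for both settings over a grid of component counts. This is a sketch under stated assumptions: the synthetic data, the grid `n_components_grid`, and the helper `fa_cv_scores` are hypothetical and do not reproduce the example's actual data or results.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.model_selection import cross_val_score

# Synthetic rank-5 data with homoscedastic noise (illustrative sizes only).
rng = np.random.RandomState(0)
n_samples, n_features, rank = 300, 20, 5
X = rng.randn(n_samples, rank) @ rng.randn(rank, n_features)
X += rng.randn(n_samples, n_features)

n_components_grid = [2, 5, 10, 15]  # hypothetical grid, not the example's


def fa_cv_scores(X, tol):
    """Mean cross-validated average log-likelihood for each grid value."""
    scores = []
    for n in n_components_grid:
        fa = FactorAnalysis(n_components=n, tol=tol, random_state=0)
        # With no y, cross_val_score falls back to the estimator's score().
        scores.append(np.mean(cross_val_score(fa, X)))
    return np.array(scores)


default_scores = fa_cv_scores(X, tol=1e-2)  # default tolerance
loose_scores = fa_cv_scores(X, tol=1.0)     # tolerance proposed in this PR

best_default = n_components_grid[int(np.argmax(default_scores))]
best_loose = n_components_grid[int(np.argmax(loose_scores))]
print("best n_components (tol=1e-2):", best_default)
print("best n_components (tol=1):   ", best_loose)
print("max abs score difference:    ", np.max(np.abs(default_scores - loose_scores)))
```

If the selected number of components agrees and the score curves differ only away from the maximum, that would support the claim that the interpretation is unchanged. On other data the two settings may well disagree.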