[MRG] Improve speed of plot_pca_vs_fa_model_selection.py example #13685


Framartin
Contributor

@Framartin Framartin commented Apr 21, 2019

Reference Issues/PRs

Partial fix to #13383

What does this implement/fix? Explain your changes.

I propose two modifications to the example examples/decomposition/plot_pca_vs_fa_model_selection.py mentioned in #13383:

  • Increase the tolerance of the FactorAnalysis: FactorAnalysis(tol=1)
  • Do not run CV for n_components = 45.

The build time (on my computer) decreases from 11.846 to 3.437 seconds.
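The two modifications can be sketched roughly as follows. This is a minimal illustration rather than the exact diff: the toy data, its smaller size, and the `fa_tol` parameter are assumptions made here so the snippet runs quickly on its own.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.model_selection import cross_val_score


def compute_scores(X, n_components, fa_tol=1.0):
    """Cross-validated average log-likelihood for PCA and FactorAnalysis."""
    pca = PCA(svd_solver="full")
    # Modification 1: a looser tolerance lets FactorAnalysis stop iterating
    # earlier, trading some precision of the fit for speed.
    fa = FactorAnalysis(tol=fa_tol)

    pca_scores, fa_scores = [], []
    for n in n_components:
        pca.n_components = n
        fa.n_components = n
        pca_scores.append(np.mean(cross_val_score(pca, X)))
        fa_scores.append(np.mean(cross_val_score(fa, X)))
    return pca_scores, fa_scores


# Assumed toy data: low-rank signal plus noise, much smaller than in the example.
rng = np.random.RandomState(0)
n_samples, n_features, rank = 200, 20, 5
X = rng.randn(n_samples, rank) @ rng.randn(rank, n_features)
X += rng.randn(n_samples, n_features)

# Modification 2: stop the grid one step early. With 50 features this would
# drop n_components = 45; with the 20 features assumed here it drops 15.
n_components = np.arange(0, n_features - 5, 5)
pca_scores, fa_scores = compute_scores(X, n_components)
```

Passing `fa_tol` as an argument is only a convenience for experimenting with other tolerances; the change proposed here hard-codes `tol=1`.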

These modifications are motivated by the fact that, among the 3 dimensionality reduction techniques trained, only Factor Analysis with 40 and 45 components takes significant time to fit:

Total running time of the script: ( 0 minutes 11.846 seconds)

I - Homoscedastic noise
0 components...
    Time PCA: 0:00:00.022663
    Time FA: 0:00:00.042642
5 components...
    Time PCA: 0:00:00.018533
    Time FA: 0:00:00.042504
10 components...
    Time PCA: 0:00:00.017095
    Time FA: 0:00:00.056470
15 components...
    Time PCA: 0:00:00.023689
    Time FA: 0:00:00.079793
20 components...
    Time PCA: 0:00:00.018067
    Time FA: 0:00:00.089906
25 components...
    Time PCA: 0:00:00.016740
    Time FA: 0:00:00.105046
30 components...
    Time PCA: 0:00:00.017797
    Time FA: 0:00:00.120986
35 components...
    Time PCA: 0:00:00.017244
    Time FA: 0:00:00.188839
40 components...
    Time PCA: 0:00:00.017586
    Time FA: 0:00:02.802308
45 components...
    Time PCA: 0:00:00.017656
    Time FA: 0:00:01.555045
Time PCA MLE: 0:00:00.000007

II - Heteroscedastic noise
0 components...
    Time PCA: 0:00:00.016803
    Time FA: 0:00:00.026970
5 components...
    Time PCA: 0:00:00.016859
    Time FA: 0:00:00.069176
10 components...
    Time PCA: 0:00:00.016116
    Time FA: 0:00:00.080635
15 components...
    Time PCA: 0:00:00.016484
    Time FA: 0:00:00.101170
20 components...
    Time PCA: 0:00:00.016745
    Time FA: 0:00:00.116705
25 components...
    Time PCA: 0:00:00.015989
    Time FA: 0:00:00.123466
30 components...
    Time PCA: 0:00:00.016480
    Time FA: 0:00:00.126200
35 components...
    Time PCA: 0:00:00.016477
    Time FA: 0:00:00.158986
40 components...
    Time PCA: 0:00:00.017423
    Time FA: 0:00:02.712774
45 components...
    Time PCA: 0:00:00.015865
    Time FA: 0:00:01.289101
Time PCA MLE: 0:00:00.000016
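Per-component timings in this format could be collected with a small wrapper like the one below. The instrumentation actually used is not part of the diff, so `timed_cv_score` and the toy data are hypothetical; the sketch only shows one plausible way to produce `timedelta`-style output.

```python
from datetime import timedelta
from time import perf_counter

import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.model_selection import cross_val_score


def timed_cv_score(estimator, X):
    """Return (mean CV score, elapsed time as a timedelta) for one estimator."""
    start = perf_counter()
    score = np.mean(cross_val_score(estimator, X))
    return score, timedelta(seconds=perf_counter() - start)


# Assumed toy data, smaller than in the example so the sketch runs quickly.
rng = np.random.RandomState(0)
X = rng.randn(200, 5) @ rng.randn(5, 20) + rng.randn(200, 20)

timings = {}
for n in (0, 5, 10):
    print(f"{n} components...")
    _, t_pca = timed_cv_score(PCA(svd_solver="full", n_components=n), X)
    _, t_fa = timed_cv_score(FactorAnalysis(n_components=n), X)
    timings[n] = (t_pca, t_fa)
    print(f"    Time PCA: {t_pca}")
    print(f"    Time FA: {t_fa}")
```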

After applying the 2 modifications mentioned above, the FA fit for 40 components drops from about 2.8 seconds to about 0.3 seconds, and the fit for 45 components is skipped entirely:

I - Homoscedastic noise
0 components...
    Time PCA: 0:00:00.025566
    Time FA: 0:00:00.036936
5 components...
    Time PCA: 0:00:00.021508
    Time FA: 0:00:00.045077
10 components...
    Time PCA: 0:00:00.020512
    Time FA: 0:00:00.043922
15 components...
    Time PCA: 0:00:00.015972
    Time FA: 0:00:00.072845
20 components...
    Time PCA: 0:00:00.016356
    Time FA: 0:00:00.088447
25 components...
    Time PCA: 0:00:00.017344
    Time FA: 0:00:00.096822
30 components...
    Time PCA: 0:00:00.016814
    Time FA: 0:00:00.112733
35 components...
    Time PCA: 0:00:00.016290
    Time FA: 0:00:00.185826
40 components...
    Time PCA: 0:00:00.016499
    Time FA: 0:00:00.325999
Time PCA MLE: 0:00:00.000008

II - Heteroscedastic noise
0 components...
    Time PCA: 0:00:00.015870
    Time FA: 0:00:00.028239
5 components...
    Time PCA: 0:00:00.018323
    Time FA: 0:00:00.065301
10 components...
    Time PCA: 0:00:00.015181
    Time FA: 0:00:00.074517
15 components...
    Time PCA: 0:00:00.018003
    Time FA: 0:00:00.102947
20 components...
    Time PCA: 0:00:00.016931
    Time FA: 0:00:00.112086
25 components...
    Time PCA: 0:00:00.017937
    Time FA: 0:00:00.114989
30 components...
    Time PCA: 0:00:00.017200
    Time FA: 0:00:00.126424
35 components...
    Time PCA: 0:00:00.017375
    Time FA: 0:00:00.140677
40 components...
    Time PCA: 0:00:00.016992
    Time FA: 0:00:00.270111
Time PCA MLE: 0:00:00.000014

The best numbers of components printed by the script remain unchanged (I added the type of noise to the output to improve readability):

Homoscedastic Noise:
best n_components by PCA CV = 10
best n_components by FactorAnalysis CV = 10
best n_components by PCA MLE = 10
Heteroscedastic Noise:
best n_components by PCA CV = 35
best n_components by FactorAnalysis CV = 10
best n_components by PCA MLE = 38
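For reference, the "best n_components" values are the grid points with the highest cross-validated score, while the PCA MLE value comes from fitting PCA with `n_components='mle'`. A minimal sketch, with a hypothetical helper `best_n_components`, made-up scores, and assumed toy data:

```python
import numpy as np
from sklearn.decomposition import PCA


def best_n_components(n_components, scores):
    """Pick the grid value whose cross-validated score is highest."""
    return int(np.asarray(n_components)[np.argmax(scores)])


# Made-up scores, only to illustrate the selection rule.
grid = np.array([0, 5, 10])
toy_scores = [-3.0, -1.0, -2.0]
n_best_cv = best_n_components(grid, toy_scores)

# PCA MLE: the dimensionality is inferred during fit, with no cross-validation.
rng = np.random.RandomState(0)
X = rng.randn(200, 5) @ rng.randn(5, 20) + rng.randn(200, 20)
pca = PCA(svd_solver="full", n_components="mle").fit(X)
n_best_mle = pca.n_components_

print("best n_components by CV = %d" % n_best_cv)
print("best n_components by PCA MLE = %d" % n_best_mle)
```

Because the CV choice is restricted to the grid, dropping the last grid point only changes the result if that point was the previous maximum.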

However, the 2 plots changed slightly:

Homoscedastic Noise

Original plot: [image: homo]

Modified plot: [image: homo_tol=1_loop_to_40]

Heteroscedastic Noise

Original plot: [image: hetero]

Modified plot: [image: hetero_tol=1_loop_to_40]

As you can see, the estimated FA scores (average log-likelihood) differ at 40 components for both types of noise, and the points for 45 components are removed from the 2 plots. However, I think the plots are acceptable: the interpretation is unchanged, because this part of the curve is not what the example is about.
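One way to put numbers on this difference is to compare the cross-validated FA scores under the default tolerance (`tol=1e-2`) and under `tol=1` on the same grid. The snippet below is only a sketch under assumptions: the helper `fa_cv_score`, the toy data, and the reduced grid are not taken from the example.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.model_selection import cross_val_score


def fa_cv_score(X, n, tol):
    """Mean cross-validated log-likelihood of FactorAnalysis for one setting."""
    fa = FactorAnalysis(n_components=n, tol=tol)
    return float(np.mean(cross_val_score(fa, X)))


# Assumed toy data: low-rank signal plus noise, smaller than in the example.
rng = np.random.RandomState(0)
X = rng.randn(200, 5) @ rng.randn(5, 20) + rng.randn(200, 20)

n_grid = [5, 10, 15]
default_scores = [fa_cv_score(X, n, tol=1e-2) for n in n_grid]
loose_scores = [fa_cv_score(X, n, tol=1.0) for n in n_grid]

for n, d, l in zip(n_grid, default_scores, loose_scores):
    print(f"n_components={n}: tol=1e-2 -> {d:.3f}, tol=1 -> {l:.3f}")

# The selected number of components is what matters for the example.
same_choice = int(np.argmax(default_scores)) == int(np.argmax(loose_scores))
print("same best n_components:", same_choice)
```

If the selected value agrees for both tolerances on the real data, that supports keeping the looser setting in the example.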

@Framartin Framartin changed the title from "Improve speed of plot_pca_vs_fa_model_selection.py example" to "[MRG] Improve speed of plot_pca_vs_fa_model_selection.py example" Apr 21, 2019


 def compute_scores(X):
     pca = PCA(svd_solver='full')
-    fa = FactorAnalysis()
+    fa = FactorAnalysis(tol=1)
Member

I am not sure it's a good idea to advertise using such a high tolerance without a comment or empirical evidence that it's ok.

my 2c

Contributor Author

@Framartin Framartin Apr 22, 2019

@agramfort Thanks for your comment. I totally agree! Do you think that the comment introduced in d79e45d is enough?

I also added the same comment in b742f21 for the plot_rbm_logistic_classification.py example, where I also increased the tolerance (see #13648).

@agramfort
Member

agramfort commented Apr 22, 2019 via email

Base automatically changed from master to main January 22, 2021 10:51
@cmarmo
Contributor

cmarmo commented Jan 24, 2022

I'm closing this one as the issue has been solved in #21671.

@cmarmo cmarmo closed this Jan 24, 2022