[MRG] Improve speed of plot_pca_vs_fa_model_selection.py example #13685

Framartin · 2019-04-21T03:34:46Z

Reference Issues/PRs

Partial fix to #13383

What does this implement/fix? Explain your changes.

I propose two modifications to the example examples/decomposition/plot_pca_vs_fa_model_selection.py, mentioned in #13383:

Increase the tolerance of the FactorAnalysis: FactorAnalysis(tol=1)
Do not run CV for n_components = 45.
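As a rough illustration, the two changes could look like the sketch below. It assumes the example builds its grid with `np.arange` and scores each model with `cross_val_score`; the synthetic data, the `n_components` argument of `compute_scores`, and the exact grid expression are stand-ins chosen for illustration, not the example's actual code.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.model_selection import cross_val_score

# Stand-in data (assumption): rank-10 signal plus homoscedastic noise.
rng = np.random.RandomState(0)
n_samples, n_features, rank = 200, 50, 10
W = rng.randn(rank, n_features)
X = rng.randn(n_samples, rank) @ W + rng.randn(n_samples, n_features)

# Change 2: stop the grid at 40 components instead of 45.
n_components = np.arange(0, 45, 5)  # 0, 5, ..., 40


def compute_scores(X, n_components):
    """Return mean CV log-likelihoods for PCA and FA over the grid."""
    pca = PCA(svd_solver="full")
    fa = FactorAnalysis(tol=1)  # Change 1: looser tolerance than the default
    pca_scores, fa_scores = [], []
    for n in n_components:
        pca.n_components = n
        fa.n_components = n
        pca_scores.append(np.mean(cross_val_score(pca, X)))
        fa_scores.append(np.mean(cross_val_score(fa, X)))
    return pca_scores, fa_scores


pca_scores, fa_scores = compute_scores(X, n_components)
```

Passing the grid as an argument is only a convenience for this sketch; it makes it easy to rerun the same function with and without the 45-component point.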

The build time (on my computer) decreases from 11.846 to 3.437 seconds.

These modifications are motivated by the fact that, among the 3 dimensionality reduction techniques trained, only Factor Analysis with 40 and 45 components takes a significant amount of time to fit:

Total running time of the script: ( 0 minutes 11.846 seconds)

I - Homoscedastic noise
0 components...
    Time PCA: 0:00:00.022663
    Time FA: 0:00:00.042642
5 components...
    Time PCA: 0:00:00.018533
    Time FA: 0:00:00.042504
10 components...
    Time PCA: 0:00:00.017095
    Time FA: 0:00:00.056470
15 components...
    Time PCA: 0:00:00.023689
    Time FA: 0:00:00.079793
20 components...
    Time PCA: 0:00:00.018067
    Time FA: 0:00:00.089906
25 components...
    Time PCA: 0:00:00.016740
    Time FA: 0:00:00.105046
30 components...
    Time PCA: 0:00:00.017797
    Time FA: 0:00:00.120986
35 components...
    Time PCA: 0:00:00.017244
    Time FA: 0:00:00.188839
40 components...
    Time PCA: 0:00:00.017586
    Time FA: 0:00:02.802308
45 components...
    Time PCA: 0:00:00.017656
    Time FA: 0:00:01.555045
Time PCA MLE: 0:00:00.000007

II - Heteroscedastic noise
0 components...
    Time PCA: 0:00:00.016803
    Time FA: 0:00:00.026970
5 components...
    Time PCA: 0:00:00.016859
    Time FA: 0:00:00.069176
10 components...
    Time PCA: 0:00:00.016116
    Time FA: 0:00:00.080635
15 components...
    Time PCA: 0:00:00.016484
    Time FA: 0:00:00.101170
20 components...
    Time PCA: 0:00:00.016745
    Time FA: 0:00:00.116705
25 components...
    Time PCA: 0:00:00.015989
    Time FA: 0:00:00.123466
30 components...
    Time PCA: 0:00:00.016480
    Time FA: 0:00:00.126200
35 components...
    Time PCA: 0:00:00.016477
    Time FA: 0:00:00.158986
40 components...
    Time PCA: 0:00:00.017423
    Time FA: 0:00:02.712774
45 components...
    Time PCA: 0:00:00.015865
    Time FA: 0:00:01.289101
Time PCA MLE: 0:00:00.000016
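Timings of this shape can be collected with a small harness like the one below. This is a sketch under assumptions: it times the whole `cross_val_score` call (the log above does not say exactly what was timed), uses stand-in data, and only visits a few grid points to stay quick. The helper name `timed_cv` is hypothetical.

```python
from datetime import datetime

import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.model_selection import cross_val_score

# Stand-in data (assumption), kept small so the sketch runs quickly.
rng = np.random.RandomState(0)
X = rng.randn(200, 10) @ rng.randn(10, 50) + rng.randn(200, 50)


def timed_cv(estimator, X):
    """Return (mean CV score, elapsed wall-clock time) for one estimator."""
    start = datetime.now()
    score = np.mean(cross_val_score(estimator, X))
    return score, datetime.now() - start


timings = {}
for n in (0, 20, 40, 45):  # subset of the grid, for illustration only
    print(f"{n} components...")
    _, t_pca = timed_cv(PCA(svd_solver="full", n_components=n), X)
    _, t_fa = timed_cv(FactorAnalysis(n_components=n), X)
    # A timedelta prints as H:MM:SS.ffffff, matching the log format above.
    print(f"    Time PCA: {t_pca}")
    print(f"    Time FA: {t_fa}")
    timings[n] = (t_pca, t_fa)
```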

After applying the two modifications above, the FA time for 40 components drops from roughly 2.7-2.8 s to roughly 0.3 s in both noise settings:

I - Homoscedastic noise
0 components...
    Time PCA: 0:00:00.025566
    Time FA: 0:00:00.036936
5 components...
    Time PCA: 0:00:00.021508
    Time FA: 0:00:00.045077
10 components...
    Time PCA: 0:00:00.020512
    Time FA: 0:00:00.043922
15 components...
    Time PCA: 0:00:00.015972
    Time FA: 0:00:00.072845
20 components...
    Time PCA: 0:00:00.016356
    Time FA: 0:00:00.088447
25 components...
    Time PCA: 0:00:00.017344
    Time FA: 0:00:00.096822
30 components...
    Time PCA: 0:00:00.016814
    Time FA: 0:00:00.112733
35 components...
    Time PCA: 0:00:00.016290
    Time FA: 0:00:00.185826
40 components...
    Time PCA: 0:00:00.016499
    Time FA: 0:00:00.325999
Time PCA MLE: 0:00:00.000008

II - Heteroscedastic noise
0 components...
    Time PCA: 0:00:00.015870
    Time FA: 0:00:00.028239
5 components...
    Time PCA: 0:00:00.018323
    Time FA: 0:00:00.065301
10 components...
    Time PCA: 0:00:00.015181
    Time FA: 0:00:00.074517
15 components...
    Time PCA: 0:00:00.018003
    Time FA: 0:00:00.102947
20 components...
    Time PCA: 0:00:00.016931
    Time FA: 0:00:00.112086
25 components...
    Time PCA: 0:00:00.017937
    Time FA: 0:00:00.114989
30 components...
    Time PCA: 0:00:00.017200
    Time FA: 0:00:00.126424
35 components...
    Time PCA: 0:00:00.017375
    Time FA: 0:00:00.140677
40 components...
    Time PCA: 0:00:00.016992
    Time FA: 0:00:00.270111
Time PCA MLE: 0:00:00.000014

The selected best numbers of components are unchanged (I added the noise type to the printed output to improve readability):

Homoscedastic Noise:
best n_components by PCA CV = 10
best n_components by FactorAnalysis CV = 10
best n_components by PCA MLE = 10
Heteroscedastic Noise:
best n_components by PCA CV = 35
best n_components by FactorAnalysis CV = 10
best n_components by PCA MLE = 38
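For context, here is a hedged sketch of how such "best n_components" values can be derived: take the grid point with the highest mean CV score, and let PCA pick its own dimensionality with `n_components="mle"`. The data and the helper `cv_scores` are assumptions for illustration, so the printed numbers will not necessarily match the ones above.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.model_selection import cross_val_score

# Stand-in data (assumption): rank-10 signal plus homoscedastic noise.
rng = np.random.RandomState(0)
X = rng.randn(300, 10) @ rng.randn(10, 50) + rng.randn(300, 50)

n_components = np.arange(0, 45, 5)


def cv_scores(make_estimator, X, grid):
    """Mean CV log-likelihood for each grid value (hypothetical helper)."""
    return [np.mean(cross_val_score(make_estimator(n), X)) for n in grid]


pca_scores = cv_scores(lambda n: PCA(svd_solver="full", n_components=n), X, n_components)
fa_scores = cv_scores(lambda n: FactorAnalysis(n_components=n, tol=1), X, n_components)

# Pick the grid point with the highest mean held-out log-likelihood.
n_pca_cv = n_components[np.argmax(pca_scores)]
n_fa_cv = n_components[np.argmax(fa_scores)]

# PCA can also choose the dimensionality itself via the MLE option.
n_pca_mle = PCA(svd_solver="full", n_components="mle").fit(X).n_components_

print("Homoscedastic Noise:")
print(f"best n_components by PCA CV = {n_pca_cv}")
print(f"best n_components by FactorAnalysis CV = {n_fa_cv}")
print(f"best n_components by PCA MLE = {n_pca_mle}")
```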

However, the two plots changed slightly:

Homoscedastic Noise

Original plot

Modified plot

Heteroscedastic Noise

Original plot

Modified plot

As you can see, the estimated FA scores (average log-likelihood) differ at 40 components for both noise types, and the points for 45 components are gone from both plots. I still think the plots are OK: the interpretation is unchanged, because this part of the curve isn't important in the example.
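The size of that score difference can be checked directly. The sketch below compares the mean CV log-likelihood of `FactorAnalysis` with its default tolerance and with `tol=1` at a few grid points; the data and the chosen grid points are assumptions, so the gaps it prints are not the ones behind the plots above.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.model_selection import cross_val_score

# Stand-in data (assumption): rank-10 signal plus homoscedastic noise.
rng = np.random.RandomState(0)
X = rng.randn(200, 10) @ rng.randn(10, 50) + rng.randn(200, 50)

score_gap = {}
for n in (10, 35, 40):
    default = np.mean(cross_val_score(FactorAnalysis(n_components=n), X))
    loose = np.mean(cross_val_score(FactorAnalysis(n_components=n, tol=1), X))
    # A gap near zero means the looser tolerance barely moves the curve here.
    score_gap[n] = loose - default
    print(f"n_components={n}: default={default:.3f}, tol=1={loose:.3f}")
```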

agramfort · 2019-04-21T13:53:34Z

examples/decomposition/plot_pca_vs_fa_model_selection.py



 def compute_scores(X):
    pca = PCA(svd_solver='full')
-    fa = FactorAnalysis()
+    fa = FactorAnalysis(tol=1)


I am not sure it's a good idea to advertise using such a high tolerance without a comment or empirical evidence that it's ok.

my 2c

Framartin · 2019-04-22

@agramfort Thanks for your comment. I totally agree! Do you think that the comment introduced in d79e45d is enough?

I also added the same comment in b742f21 to the plot_rbm_logistic_classification.py example, where I also increased the tolerance in #13648.


agramfort · 2019-04-22T06:12:40Z

Please show us how tol affects the results. For FA, how many iterations do you end up doing with tol=1 vs. the default?
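One way to get those numbers, as a rough sketch: fit `FactorAnalysis` twice and read the fitted `n_iter_` and `loglike_` attributes. The data below is a stand-in (an assumption), so the counts it prints say nothing about the actual example.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Stand-in data (assumption): rank-10 signal plus homoscedastic noise.
rng = np.random.RandomState(0)
X = rng.randn(200, 10) @ rng.randn(10, 50) + rng.randn(200, 50)

n_iter = {}
for n in (10, 40):
    fa_default = FactorAnalysis(n_components=n).fit(X)
    fa_loose = FactorAnalysis(n_components=n, tol=1).fit(X)
    n_iter[n] = (fa_default.n_iter_, fa_loose.n_iter_)
    # loglike_ holds the log-likelihood at each iteration; compare the last one.
    print(
        f"n_components={n}: "
        f"default tol -> {fa_default.n_iter_} iterations, "
        f"final loglike {fa_default.loglike_[-1]:.1f}; "
        f"tol=1 -> {fa_loose.n_iter_} iterations, "
        f"final loglike {fa_loose.loglike_[-1]:.1f}"
    )
```

Both fits follow the same iteration sequence (same data, same default `random_state`), so the `tol=1` run can only stop at or before the default one.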

cmarmo · 2022-01-24T07:52:03Z

I'm closing this one as the issue has been solved in #21671.

Framartin added 5 commits April 20, 2019 22:06

improve docstring indentation

fdf87a8

add tol=1 to FactorAnalysis and do not run CV for 45 components

81cddb7

add print statement to improve readability

2c7c917

add comment about the removal of 45 components in CV

a07794b

fix pep8

cc7049f

Framartin changed the title ~~Improve speed of plot_pca_vs_fa_model_selection.py example~~ [MRG] Improve speed of plot_pca_vs_fa_model_selection.py example Apr 21, 2019

agramfort reviewed Apr 21, 2019


Framartin added 2 commits April 21, 2019 21:58

add comment about tolerance value

d79e45d

add comment about tolerance value for the plot_rbm_logistic_classific…

b742f21

…ation.py example

amueller added the Needs work label Aug 6, 2019

Base automatically changed from master to main January 22, 2021 10:51

cmarmo closed this Jan 24, 2022
