MNT accelerate plot_iterative_imputer_variants_comparison.py #21748


Merged: 20 commits, Feb 23, 2022

Conversation

siavrez
Contributor

@siavrez siavrez commented Nov 22, 2021

Adding bootstrapping to ExtraTrees with a 0.75 sample fraction improves runtime by 4.8 seconds with 5 folds and 3 seconds with 3 folds. I also changed the number of folds to 3. Total runtime is now 10.1 +/- 1.3 seconds, down from 24 +/- 3.3 seconds.

Reference Issues/PRs

#21598

What does this implement/fix? Explain your changes.

Any other comments?
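The change described above can be sketched as follows. This is an illustrative reconstruction, not the PR's exact diff: apart from `bootstrap=True` and the 0.75 sample fraction, all parameter values and the toy data are assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Enabling bootstrapping with a 0.75 sample fraction means each tree is fit
# on a subsample of the rows, which is where the speed-up comes from.
estimator = ExtraTreesRegressor(
    n_estimators=10,     # illustrative; the example's value may differ
    bootstrap=True,
    max_samples=0.75,    # each tree sees 75% of the training rows
    random_state=0,
)
imputer = IterativeImputer(estimator=estimator, random_state=0)

X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0], [7.0, 8.0]])
X_imputed = imputer.fit_transform(X)
```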

@siavrez
Contributor Author

siavrez commented Nov 22, 2021

MSE using 3 folds instead of 5:

5 folds:
    Original Full Data                       0.631302
    SimpleImputer    mean                    0.826854
                     median                  0.832756
    IterativeImputer BayesianRidge           0.695367
                     DecisionTreeRegressor   0.764438
                     ExtraTreesRegressor     0.701408
                     KNeighborsRegressor     0.834774

3 folds:
    Original Full Data                       0.657900
    SimpleImputer    mean                    0.868160
                     median                  0.875723
    IterativeImputer BayesianRidge           0.676844
                     DecisionTreeRegressor   0.903947
                     ExtraTreesRegressor     0.744838
                     KNeighborsRegressor     0.869787

MSE is worse across the board, but the relative differences between methods are similar.
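The fold-count change can be reproduced in miniature with `cross_val_score`; the dataset and estimator below are stand-ins, not the example's actual California housing setup.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Same scoring, different numbers of CV folds: 3 folds means fewer fits
# (faster) but each score is computed on fewer, larger test splits.
scores_5 = cross_val_score(BayesianRidge(), X, y,
                           scoring="neg_mean_squared_error", cv=5)
scores_3 = cross_val_score(BayesianRidge(), X, y,
                           scoring="neg_mean_squared_error", cv=3)
```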

@glemaitre glemaitre changed the title accelerate plot_iterative_imputer_variants_comparison.py added bootst… MNT accelerate plot_iterative_imputer_variants_comparison.py Nov 23, 2021
@adrinjalali adrinjalali mentioned this pull request Nov 23, 2021
41 tasks
Member

@glemaitre glemaitre left a comment

LGTM

@adrinjalali
Member

We have a lot of ConvergenceWarning reported by IterativeImputer now, we should make sure examples don't have such warnings:

/home/circleci/project/sklearn/impute/_iterative.py:700: ConvergenceWarning: [IterativeImputer] Early stopping criterion not reached.
  warnings.warn(

@siavrez
Contributor Author

siavrez commented Nov 24, 2021

I'll try to find a way to avoid the ConvergenceWarning.

@siavrez
Contributor Author

siavrez commented Nov 24, 2021

The only other example of iterative imputation also uses the California housing dataset. If using another dataset for this one is an option, I can try to find one that raises the fewest ConvergenceWarnings.

@siavrez
Contributor Author

siavrez commented Nov 24, 2021

I checked the implementation of tol in IterativeImputer: `normalized_tol = self.tol * np.max(np.abs(X[~mask_missing_values]))`. One possible explanation is that the California housing dataset is full of outliers, so in this scenario the tolerance check is based only on outlier values (for 5 variables).
[attached image: CalH]
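The quoted line from the implementation can be illustrated in isolation; the toy data below is an assumption, chosen to show how a single outlier inflates the effective tolerance.

```python
import numpy as np

tol = 1e-3
X = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [500.0, 6.0]])  # 500.0 plays the role of an outlier
mask_missing_values = np.isnan(X)

# The stopping threshold scales with the largest absolute observed value,
# so one large entry dominates the convergence check: here the effective
# tolerance becomes 0.5 instead of something on the scale of typical values.
normalized_tol = tol * np.max(np.abs(X[~mask_missing_values]))
```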

@siavrez
Contributor Author

siavrez commented Nov 24, 2021

Even after 250 iterations, DecisionTreeRegressor does not converge for 5 variables.

@glemaitre
Member

We have a lot of ConvergenceWarning reported by IterativeImputer now, we should make sure examples don't have such warnings

@adrinjalali
This is an issue with the IterativeImputer itself. I don't think that we can do anything to remove these warnings; they are already raised on the original example on my computer.

xref: #14338

I was planning to have a look at this.

@siavrez
Contributor Author

siavrez commented Nov 24, 2021

There are no warnings with the new implementation, but I had to change the tree and set the tolerance for each estimator.
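A hedged sketch of that kind of fix: the actual estimator and tolerance values used in the PR are not shown here, so the numbers below are illustrative only.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

# Setting tol (and max_iter) explicitly gives the early-stopping criterion
# a realistic chance of being met, which avoids the ConvergenceWarning.
imputer = IterativeImputer(
    estimator=BayesianRidge(),
    max_iter=25,
    tol=1e-2,
    random_state=0,
)

X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0], [7.0, 8.0]])
X_imputed = imputer.fit_transform(X)
```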

@siavrez
Contributor Author

siavrez commented Nov 24, 2021

Iterative imputation without scaling:

    Original Full Data                       0.631302
    SimpleImputer    mean                    0.826854
                     median                  0.832756
    IterativeImputer BayesianRidge           0.696538
                     RandomForestRegressor   0.713138
                     ExtraTreesRegressor     0.714369
                     KNeighborsRegressor     0.837740

With robust scaling:

    Original Full Data                       0.630870
    SimpleImputer    mean                    0.830052
                     median                  0.835994
    IterativeImputer BayesianRidge           0.699316
                     RandomForestRegressor   0.712361
                     ExtraTreesRegressor     0.727760
                     KNeighborsRegressor     0.779323
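The "with robust scaling" variant presumably corresponds to a pipeline like the one below; this is an assumption about the wiring, not the PR's exact code. Scikit-learn's scalers disregard NaNs during fit and preserve them in transform, so scaling can precede imputation.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

pipeline = make_pipeline(
    RobustScaler(),  # NaNs are ignored in fit and kept in transform
    IterativeImputer(estimator=BayesianRidge(), random_state=0),
)

X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]])
X_imputed = pipeline.fit_transform(X)
```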

Member

@adrinjalali adrinjalali left a comment

Otherwise, plus @ogrisel's suggestion, LGTM.

@glemaitre glemaitre self-requested a review November 25, 2021 17:25
@glemaitre glemaitre self-requested a review November 25, 2021 17:37
@glemaitre glemaitre self-requested a review November 25, 2021 18:05
@siavrez siavrez requested a review from ogrisel November 28, 2021 09:43
Member

@ogrisel ogrisel left a comment

The code looks good and the speed-up is nice but the top-level docstring still needs to be adapted to reflect the content of the code.

  • ExtraTreesRegressor needs to be replaced by RandomForestRegressor in several occurrences;
  • mentions of DecisionTreeRegressor need to be removed;
  • the pipeline with the expansion of a degree 2 polynomial kernel needs to be introduced.

While we are at it, we could add a final comment emphasizing that while some methods are seemingly better than others on average, the error bars observed on the cross-validated scores are still very wide in all cases.

We could finally emphasize that some estimators such as HistGradientBoostingRegressor can natively deal with missing features and are often recommended over building pipelines with complex and costly missing values imputation strategies.

@siavrez
Contributor Author

siavrez commented Dec 8, 2021

"Egress is over the account limit" seems to be the cause of the failing tests.

Member

@glemaitre glemaitre left a comment

Otherwise LGTM.

@siavrez siavrez requested a review from ogrisel January 7, 2022 15:17
Member

@glemaitre glemaitre left a comment

LGTM

Member

@jeremiedbb jeremiedbb left a comment

Time is now 4 sec instead of 16 sec. LGTM. Thanks @siavrez!

@jeremiedbb jeremiedbb merged commit 8286f02 into scikit-learn:main Feb 23, 2022
thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Mar 1, 2022
…cikit-learn#21748)

Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Adrin Jalali <[email protected]>
Co-authored-by: Jérémie du Boisberranger <[email protected]>
6 participants