IterativeImputer not converging (at all) #14338

amueller · 2019-07-13T19:15:48Z

Observed in #14330: the IterativeImputer doesn't converge, as a matter of fact, the convergence criterion doesn't seem to go down with iterations.
To me that indicates that either there's an issue with the implementation, an issue with the metric, or that our stopping criterion is not suitable.

Were there examples on which it converged? It would be nice to plot the convergence but we don't allow callbacks like that. But just printing it on https://github.com/scikit-learn/scikit-learn/blob/master/examples/impute/plot_iterative_imputer_variants_comparison.py shows that it doesn't go down.

amueller · 2019-07-13T19:22:53Z

It seems to converge for BayesianRidge but not for anything else?

jnothman · 2019-07-14T02:21:42Z

While developing iterative imputer we realised that RidgeCV had sharp changes across iterations due to selecting a different alpha. So you might need something equivalent to decreasing learning rate to stop big fluctuations.

amueller · 2019-07-14T16:00:27Z

Similar things seem true for RandomForestRegressor and DecisionTreeRegressor.
Does missForest do a learning rate?

amueller · 2019-07-14T20:40:22Z

After each iteration the difference between the previous and the new imputed data matrix is assessed
for the continuous and categorical parts. The stopping criterion is defined such that the imputation
process is stopped as soon as both differences have become larger once. In case of only one type
of variable the computation stops as soon as the corresponding difference goes up for the first time.
However, the imputation last performed where both differences went up is generally less accurate
than the previous one. Therefore, whenever the computation stops due to the stopping criterion (and
not due to ’maxiter’) the before last imputation matrix is returned.

That seems... strange and is quite different from what we're doing.

amueller · 2019-07-14T20:43:29Z

This is the paper:
https://academic.oup.com/bioinformatics/article/28/1/112/219101

They are also using a normalized error, I don't think we're doing that.

sergeyf · 2019-08-06T18:21:53Z

It makes sense for it not to converge if we're sampling from the posterior. I haven't thought about the non-sampling case, and all of our close studies were with BayesianRidge, where conversion seemed fine (as already pointed out).

We added convergence to have more feature-parity with missForest, but our stopping criterion is different (as you pointed out). Have any of your examples suggested a better criterion? Should we try a switch to NRMSE?

Here is our current criterion given a user-passed tol:

normalized_tol = tol * np.max(np.abs(X[~mask_missing_values]))
inf_norm = np.linalg.norm(Xt - Xt_previous, ord=np.inf, axis=None)
if inf_norm < normalized_tol:
    stopped = True
else:
    stopped = False

If we used NRMSE, we'd get something like this:

normalized_tol = np.sqrt(tol) * np.std(X[~mask_missing_values])
rmse = np.sqrt(np.mean((Xt - Xt_previous)**2)
if rmse < normalized_tol:
    stopped = True
else:
    stopped = False

amueller · 2019-08-06T19:07:06Z

I haven't done experiments. It's a bit strange to do something that's not in the literature. But if we'd implement the literature we'd have different convergence criteria based on the BaseEstimator which is also strange, and we wouldn't know what to do in general.

My main problem right now is that the behavior for forests is non-sensical. We will always tell the user we didn't converge, independent of max_iter, and potentially waste a lot of time.

We could just not warn on non-convergence but I have no idea if that's better or worse. I really don't like useless and even misleading warnings.

If you can find a better convergence criterion that would be great, but I figure the authors of MissForest have tried that and didn't find one?

sergeyf · 2019-08-06T19:26:35Z

I can mess around with a few different criteria with RandomForests and RidgeCV as the estimators. Do you have a quick code snippet showing that it doesn't converge on some standard dataset that I can start with?

amueller · 2019-08-06T20:31:24Z

#14330, our example that's currently in master.

sergeyf · 2019-08-06T20:47:04Z

Thanks, I'll take a look when work dies down later this week.

…

On Tue, Aug 6, 2019 at 1:33 PM Andreas Mueller ***@***.***> wrote: #14330 <#14330>, our example that's currently in master. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#14338?email_source=notifications&email_token=AAOJV3HRGP4BX6QMPHNTW6DQDHNZLA5CNFSM4IDA6S52YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3WL7AQ#issuecomment-518832002>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAOJV3BRJK4R53MGVBULAWTQDHNZLANCNFSM4IDA6S5Q> .

sergeyf · 2019-08-07T15:47:05Z

It does look like the convergence criterion we currently have is much harder to reach than the one based on L2 norm that I sketched above. Even having that one satisfied requires raising tol=0.01 in the California housing example.

Also, the resulting values (with verbose=True) suggest that the MissForest approach (stop as soon as difference goes up once) will stop almost instantly:

[IterativeImputer] Completing matrix with shape (1651, 8)
[IterativeImputer] Change: 150.62540875988776, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 168.4118759459612, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 160.59066495975725, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 166.9114776657906, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 166.44792073992258, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 186.05851770595208, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 162.45250321704452, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 197.03936766728808, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 173.7129718576556, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 170.6855623698636, scaled tolerance: 73.90341902767585 
c:\users\serge\github\scikit-learn\sklearn\impute\_iterative.py:609: ConvergenceWarning: [IterativeImputer] Early stopping criterion not reached.

Note that the change goes up right away from 150

Here the change is np.sqrt(np.mean((Xt - Xt_previous)**2))
and scaled tolerance is normalized_tol = np.sqrt(self.tol) * np.std(X[~mask_missing_values]).

It's bizarre to either have it stop instantly OR to just keep going until max_tol is reached. Here's another one where the early stopping criterion isn't reached:

[IterativeImputer] Completing matrix with shape (1651, 8)
[IterativeImputer] Change: 174.54944838828126, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 173.66703582762716, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 149.5384099099918, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 150.3383817915679, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 136.13070686825725, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 182.6894084181655, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 196.4934645783515, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 156.6539777730629, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 164.56931419726476, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 141.13059298301417, scaled tolerance: 74.06775309177029

The difference becomes larger for the first time right away from 174 to 173.

My intuition is that we should switch to the L2 version and raised the default tol.

jnothman · 2019-08-08T01:13:52Z

Would the approach from missforest work if we required that the criterion is held for multiple iterations as per n_iter_no_change?

sergeyf · 2019-08-08T01:36:00Z

Here's the first dataset:
150, 168, 160, 166, 166, 186, 162, 197, 173, 170
For n_iter_no_change==1 -> stops when we see 168 and backtrack to the 150 iter.
For n_iter_no_change==2 -> stops when we see 160 and backtrack to the 150 iter?!

174, 173, 149, 150, 136, 182, 196, 156, 164, 141
For n_iter_no_change==1 -> stops when we see 150 and backtrack to the 149 iter.
For n_iter_no_change==2 -> stops when we see 196 and backtrack to the 136 iter.

With larger n_iter_no_change we have to keep more results in memory so we can backtrack. I'm not sure if it's worth it...

amueller · 2019-08-16T20:21:48Z

Running this on several datasets my intuition was that sometimes missforest just doesn't really do much and it is done after one iteration, which seems fine. And in other cases it actually converges.

amueller · 2019-08-16T20:22:11Z

Maybe we should try the datasets from the missforest paper?

jnothman · 2019-10-27T22:59:45Z

I don't think this is a blocker. Certainly not as long as iterative imputer is regarded as experimental

jnothman · 2019-11-05T22:19:37Z

Moving to 0.23

adrinjalali · 2020-04-20T10:03:19Z

moving to 0.24. We probably should start thinking of this as "high priority", as tagged lol.

amueller · 2020-06-08T19:51:04Z

I think the problem is that no-one has an idea how to solve it, right?

We could:
a) magic the stopping criterion based on the estimator (fragile)
b) make the stopping criterion a parameter and have the user choose it, so that when there's a convergence warning they could choose to change it
c) make the stopping criterion a required parameter
d) have different estimators with different stopping criteria or even different built-in base estimators, say a "IterativeForestImputation" and "IterativeLinearImputation".

I guess b) isn't too bad, if people read warning messages?

sergeyf · 2020-06-08T19:59:41Z

I would also vote for (b), and add my previous conclusion: My intuition is that we should switch to the L2 version and raised the default tol.

adrinjalali · 2020-06-09T08:19:20Z

We could also:

allow a callable stopping criterion for users to fine tune it
accept an iteration_hyperparams parameter which gives the hyper parameters to the base estimator at each iteration, based on the iteration number and loss maybe? This can be a list of length n_iter of dict of params or a callable giving the new hyper parameters at each iteration.

Wouldn't that be flexible enough?

jnothman · 2020-06-25T14:35:55Z

Wouldn't that be flexible enough?

Flexible enough but not usable enough? I like the idea of having the user specify a hyperparam path or something. If it existed: (a) would it achieve the goal of improving the chance of meaningful convergence; and (b) would it be clear how a user should use it?

adrinjalali · 2020-06-26T12:45:28Z

I think good documentation should be good enough to handle the usability. Also, for both of these parameters I'm suggesting some accepted literals which would be used by non-advanced users. It would also allow us to have some magical defaults which usually converge, and if not, the user has enough flexibility to fine tune the imputation.

ghost · 2020-12-08T12:53:37Z

Hi,

So if I understand correctly, the convergence error will always be produced, regardless of the max_iter? I am currently running it on a dataset (133,70) with the ExtraTreeRegressor and with max_iter = 60 it still produces the error and takes forever to impute.

MhdAlkh · 2021-05-27T09:08:15Z

Hi,
I'm having the error " ConvergenceWarning: [IterativeImputer] Early stopping criterion not reached." reached.", ConvergenceWarning)".
Any conclusion on how to solve it?

shuq007 · 2021-08-21T05:00:38Z

Have changed most of the parameters for ConvergenceWarning that is only slowing down the process but no change in warnings. So, I use the following lines to ignore warnings that suppress all the warnings:

import warnings
warnings.filterwarnings(action='ignore', category=DeprecationWarning)
warnings.filterwarnings(action='ignore', category=UserWarning)
warnings.filterwarnings(action='ignore', category=FutureWarning)
warnings.filterwarnings(action='ignore', category=RuntimeWarning)

amueller mentioned this issue Jul 13, 2019

DOC: fix convergence warning in iterative example #14330

Closed

amueller mentioned this issue Jul 14, 2019

add more verbosity to iterative imputer #14367

Merged

LuxMiranda mentioned this issue Jul 16, 2019

IterativeImputer does not behave as expected when using fit() and transform() on a train/test split versus using fit_transform() on an entire dataset #14383

Closed

amueller mentioned this issue Aug 6, 2019

[WIP] Example of multiple imputation with IterativeImputer #13025

Open

adrinjalali added this to the 0.22 milestone Oct 21, 2019

adrinjalali added Blocker High Priority High priority issues and pull requests Moderate Anything that requires some knowledge of conventions and best practices labels Oct 21, 2019

adrinjalali removed the Blocker label Oct 29, 2019

jnothman modified the milestones: 0.22, 0.23 Nov 5, 2019

amueller mentioned this issue Mar 4, 2020

Any plans to stabilize IterativeImputer? What are the current roadblocks to doing so? #16638

Closed

rth mentioned this issue Apr 14, 2020

Callbacks API #16925

Closed

5 tasks

adrinjalali modified the milestones: 0.23, 0.24 Apr 20, 2020

cmarmo removed this from the 0.24 milestone Oct 15, 2020

This was referenced Nov 20, 2021

Improve documentation of the impute module #21722

Open

MNT accelerate plot_iterative_imputer_variants_comparison.py #21748

Merged

jeremiedbb mentioned this issue Dec 16, 2021

[WIP] Callback API continued #22000

Draft

4 tasks

cmarmo added the module:impute label Mar 23, 2022

cmarmo mentioned this issue Oct 30, 2022

first attempt to change iterative_imputer #22078

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IterativeImputer not converging (at all) #14338

IterativeImputer not converging (at all) #14338

amueller commented Jul 13, 2019

amueller commented Jul 13, 2019

jnothman commented Jul 14, 2019 via email

amueller commented Jul 14, 2019

amueller commented Jul 14, 2019

amueller commented Jul 14, 2019

sergeyf commented Aug 6, 2019 •

edited

Loading

amueller commented Aug 6, 2019

sergeyf commented Aug 6, 2019

amueller commented Aug 6, 2019

sergeyf commented Aug 6, 2019 via email

sergeyf commented Aug 7, 2019

jnothman commented Aug 8, 2019 via email

sergeyf commented Aug 8, 2019

amueller commented Aug 16, 2019

amueller commented Aug 16, 2019

jnothman commented Oct 27, 2019

jnothman commented Nov 5, 2019

adrinjalali commented Apr 20, 2020

amueller commented Jun 8, 2020 •

edited

Loading

sergeyf commented Jun 8, 2020

adrinjalali commented Jun 9, 2020

jnothman commented Jun 25, 2020

adrinjalali commented Jun 26, 2020

ghost commented Dec 8, 2020

MhdAlkh commented May 27, 2021

shuq007 commented Aug 21, 2021 •

edited

Loading

IterativeImputer not converging (at all) #14338

IterativeImputer not converging (at all) #14338

Comments

amueller commented Jul 13, 2019

amueller commented Jul 13, 2019

jnothman commented Jul 14, 2019 via email

amueller commented Jul 14, 2019

amueller commented Jul 14, 2019

amueller commented Jul 14, 2019

sergeyf commented Aug 6, 2019 • edited Loading

amueller commented Aug 6, 2019

sergeyf commented Aug 6, 2019

amueller commented Aug 6, 2019

sergeyf commented Aug 6, 2019 via email

sergeyf commented Aug 7, 2019

jnothman commented Aug 8, 2019 via email

sergeyf commented Aug 8, 2019

amueller commented Aug 16, 2019

amueller commented Aug 16, 2019

jnothman commented Oct 27, 2019

jnothman commented Nov 5, 2019

adrinjalali commented Apr 20, 2020

amueller commented Jun 8, 2020 • edited Loading

sergeyf commented Jun 8, 2020

adrinjalali commented Jun 9, 2020

jnothman commented Jun 25, 2020

adrinjalali commented Jun 26, 2020

ghost commented Dec 8, 2020

MhdAlkh commented May 27, 2021

shuq007 commented Aug 21, 2021 • edited Loading

sergeyf commented Aug 6, 2019 •

edited

Loading

amueller commented Jun 8, 2020 •

edited

Loading

shuq007 commented Aug 21, 2021 •

edited

Loading