IterativeImputer not converging (at all) #14338

Open
amueller opened this issue Jul 13, 2019 · 26 comments
Labels
High Priority · Moderate · module:impute

Comments

@amueller
Member

Observed in #14330: the IterativeImputer doesn't converge; in fact, the convergence criterion doesn't even decrease with iterations.
To me that indicates either an issue with the implementation, an issue with the metric, or that our stopping criterion is not suitable.

Were there examples on which it converged? It would be nice to plot the convergence, but we don't allow callbacks like that. Just printing it in https://github.com/scikit-learn/scikit-learn/blob/master/examples/impute/plot_iterative_imputer_variants_comparison.py shows that it doesn't go down.
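For reference, a minimal way to print the per-iteration change is to turn on verbose output. This is a sketch, not the example script: it assumes a current scikit-learn with the experimental import enabled, and the random data and RandomForestRegressor settings are illustrative.

```python
# Sketch: watch the IterativeImputer convergence check via verbose output.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
X[rng.uniform(size=X.shape) < 0.2] = np.nan  # knock out ~20% of entries

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=10, random_state=0),
    max_iter=10,
    verbose=2,  # prints "[IterativeImputer] Change: ..." each round
    random_state=0,
)
# May emit a ConvergenceWarning -- which is exactly the behavior at issue.
X_imputed = imputer.fit_transform(X)
```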

@amueller
Member Author

It seems to converge for BayesianRidge but not for anything else?

@jnothman
Member

jnothman commented Jul 14, 2019 via email

@amueller
Member Author

Similar things seem true for RandomForestRegressor and DecisionTreeRegressor.
Does missForest do a learning rate?

@amueller
Member Author

After each iteration the difference between the previous and the new imputed data matrix is assessed
for the continuous and categorical parts. The stopping criterion is defined such that the imputation
process is stopped as soon as both differences have become larger once. In case of only one type
of variable the computation stops as soon as the corresponding difference goes up for the first time.
However, the imputation last performed where both differences went up is generally less accurate
than the previous one. Therefore, whenever the computation stops due to the stopping criterion (and
not due to 'maxiter') the before last imputation matrix is returned.

That seems... strange and is quite different from what we're doing.
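Read literally, that rule can be sketched as a small driver loop (a hypothetical helper, not missForest's or scikit-learn's actual code; impute_once stands in for one full round of per-column imputation):

```python
import numpy as np

def missforest_stop(impute_once, X0, max_iter=10):
    """Repeat impute_once until the change between successive results
    first increases, then return the result from before the increase
    (the 'before last' imputation the quote describes)."""
    X_prev = impute_once(X0)
    prev_diff = np.sum((X_prev - X0) ** 2)
    for _ in range(max_iter - 1):
        X_new = impute_once(X_prev)
        diff = np.sum((X_new - X_prev) ** 2)
        if diff > prev_diff:
            return X_prev  # stopping criterion fired: discard X_new
        X_prev, prev_diff = X_new, diff
    return X_prev  # stopped due to max_iter instead

# Toy impute_once with a scripted sequence of shrinking-then-growing steps.
deltas = iter([4.0, 2.0, 1.0, 3.0, 0.5])
result = missforest_stop(lambda X: X + next(deltas), np.zeros(3))
```

With the scripted steps above, the change first grows on the fourth round, so the helper returns the third round's result.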

@amueller
Member Author

This is the paper:
https://academic.oup.com/bioinformatics/article/28/1/112/219101

They are also using a normalized error, I don't think we're doing that.

@sergeyf
Contributor

sergeyf commented Aug 6, 2019

It makes sense for it not to converge if we're sampling from the posterior. I haven't thought about the non-sampling case, and all of our close studies were with BayesianRidge, where convergence seemed fine (as already pointed out).

We added convergence to have more feature-parity with missForest, but our stopping criterion is different (as you pointed out). Have any of your examples suggested a better criterion? Should we try a switch to NRMSE?

Here is our current criterion given a user-passed tol:

normalized_tol = tol * np.max(np.abs(X[~mask_missing_values]))
inf_norm = np.linalg.norm(Xt - Xt_previous, ord=np.inf, axis=None)
stopped = inf_norm < normalized_tol

If we used NRMSE, we'd get something like this:

normalized_tol = np.sqrt(tol) * np.std(X[~mask_missing_values])
rmse = np.sqrt(np.mean((Xt - Xt_previous)**2))
stopped = rmse < normalized_tol
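To make the comparison concrete, here is a standalone sketch of both rules applied to one synthetic pair of consecutive imputation matrices (names follow the snippets above; the data and the 20% mask are made up):

```python
import numpy as np

def inf_norm_stop(X, Xt, Xt_previous, mask_missing_values, tol):
    # current rule: inf-norm of the change vs. the largest observed value
    normalized_tol = tol * np.max(np.abs(X[~mask_missing_values]))
    inf_norm = np.linalg.norm(Xt - Xt_previous, ord=np.inf, axis=None)
    return inf_norm < normalized_tol

def nrmse_stop(X, Xt, Xt_previous, mask_missing_values, tol):
    # proposed rule: RMSE of the change vs. the spread of the observed values
    normalized_tol = np.sqrt(tol) * np.std(X[~mask_missing_values])
    rmse = np.sqrt(np.mean((Xt - Xt_previous) ** 2))
    return rmse < normalized_tol

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 4))
mask = rng.uniform(size=X.shape) < 0.2      # pretend 20% were missing
Xt_previous = X.copy()
Xt = X + 0.01 * rng.normal(size=X.shape)    # small change between rounds

stopped_inf = inf_norm_stop(X, Xt, Xt_previous, mask, tol=1e-3)
stopped_nrmse = nrmse_stop(X, Xt, Xt_previous, mask, tol=1e-3)
```

On this toy change the inf-norm rule does not fire while the NRMSE rule does, i.e. the current criterion is the stricter of the two.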

@amueller
Member Author

amueller commented Aug 6, 2019

I haven't done experiments. It's a bit strange to do something that's not in the literature. But if we implemented the literature, we'd have different convergence criteria depending on the base estimator, which is also strange, and we wouldn't know what to do in general.

My main problem right now is that the behavior for forests is nonsensical: we will always tell the user we didn't converge, independent of max_iter, and potentially waste a lot of time.

We could just not warn on non-convergence but I have no idea if that's better or worse. I really don't like useless and even misleading warnings.

If you can find a better convergence criterion that would be great, but I figure the authors of MissForest have tried that and didn't find one?

@sergeyf
Contributor

sergeyf commented Aug 6, 2019

I can mess around with a few different criteria with RandomForests and RidgeCV as the estimators. Do you have a quick code snippet showing that it doesn't converge on some standard dataset that I can start with?

@amueller
Member Author

amueller commented Aug 6, 2019

#14330, our example that's currently in master.

@sergeyf
Contributor

sergeyf commented Aug 6, 2019 via email

@sergeyf
Contributor

sergeyf commented Aug 7, 2019

It does look like the convergence criterion we currently have is much harder to reach than the L2-based one I sketched above. Even satisfying that one requires raising tol to 0.01 in the California housing example.

Also, the resulting values (with verbose=True) suggest that the MissForest approach (stop as soon as difference goes up once) will stop almost instantly:

[IterativeImputer] Completing matrix with shape (1651, 8)
[IterativeImputer] Change: 150.62540875988776, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 168.4118759459612, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 160.59066495975725, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 166.9114776657906, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 166.44792073992258, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 186.05851770595208, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 162.45250321704452, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 197.03936766728808, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 173.7129718576556, scaled tolerance: 73.90341902767585 
[IterativeImputer] Change: 170.6855623698636, scaled tolerance: 73.90341902767585 
c:\users\serge\github\scikit-learn\sklearn\impute\_iterative.py:609: ConvergenceWarning: [IterativeImputer] Early stopping criterion not reached.

Note that the change goes up right away, from 150 to 168.

Here the change is np.sqrt(np.mean((Xt - Xt_previous)**2))
and the scaled tolerance is normalized_tol = np.sqrt(self.tol) * np.std(X[~mask_missing_values]).

It's bizarre to either have it stop instantly OR to just keep going until max_iter is reached. Here's another run where the early stopping criterion isn't reached:

[IterativeImputer] Completing matrix with shape (1651, 8)
[IterativeImputer] Change: 174.54944838828126, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 173.66703582762716, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 149.5384099099918, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 150.3383817915679, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 136.13070686825725, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 182.6894084181655, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 196.4934645783515, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 156.6539777730629, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 164.56931419726476, scaled tolerance: 74.06775309177029 
[IterativeImputer] Change: 141.13059298301417, scaled tolerance: 74.06775309177029 

The difference becomes larger for the first time at the third step, from 149 to 150.

My intuition is that we should switch to the L2 version and raise the default tol.

@jnothman
Member

jnothman commented Aug 8, 2019 via email

@sergeyf
Contributor

sergeyf commented Aug 8, 2019

Here's the first dataset:
150, 168, 160, 166, 166, 186, 162, 197, 173, 170
For n_iter_no_change==1 -> stops when we see 168 and backtrack to the 150 iter.
For n_iter_no_change==2 -> stops when we see 160 and backtrack to the 150 iter?!

174, 173, 149, 150, 136, 182, 196, 156, 164, 141
For n_iter_no_change==1 -> stops when we see 150 and backtrack to the 149 iter.
For n_iter_no_change==2 -> stops when we see 196 and backtrack to the 136 iter.

With larger n_iter_no_change we have to keep more results in memory so we can backtrack. I'm not sure if it's worth it...
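One reading of that backtracking scheme (a hypothetical helper; n_iter_no_change here means rounds without improving on the best change so far, and the returned index is the round whose result we would keep in memory) reproduces the stop points listed above:

```python
def stop_index(changes, n_iter_no_change):
    """Index of the round to keep: the best (smallest) change seen so far,
    once n_iter_no_change consecutive rounds fail to improve on it; the
    last round if the criterion never fires."""
    best_i = 0
    since_best = 0
    for i in range(1, len(changes)):
        if changes[i] < changes[best_i]:
            best_i, since_best = i, 0
        else:
            since_best += 1
            if since_best == n_iter_no_change:
                return best_i  # backtrack to the best round kept in memory
    return len(changes) - 1

seq1 = [150, 168, 160, 166, 166, 186, 162, 197, 173, 170]
seq2 = [174, 173, 149, 150, 136, 182, 196, 156, 164, 141]
```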

@amueller
Member Author

Running this on several datasets, my impression was that sometimes missForest just doesn't do much and is done after one iteration, which seems fine. In other cases it actually converges.

@amueller
Member Author

Maybe we should try the datasets from the missforest paper?

@adrinjalali added this to the 0.22 milestone Oct 21, 2019
@adrinjalali added the Blocker, High Priority, and Moderate labels Oct 21, 2019
@jnothman
Member

I don't think this is a blocker; certainly not as long as IterativeImputer is regarded as experimental.

@jnothman
Member

jnothman commented Nov 5, 2019

Moving to 0.23

@adrinjalali
Member

Moving to 0.24. We probably should start thinking of this as "high priority", as tagged lol.

@adrinjalali modified the milestones: 0.23, 0.24 Apr 20, 2020
@amueller
Member Author

amueller commented Jun 8, 2020

I think the problem is that no-one has an idea how to solve it, right?

We could:
a) magic the stopping criterion based on the estimator (fragile)
b) make the stopping criterion a parameter and have the user choose it, so that when there's a convergence warning they could choose to change it
c) make the stopping criterion a required parameter
d) have different estimators with different stopping criteria, or even different built-in base estimators, say an "IterativeForestImputation" and an "IterativeLinearImputation".

I guess b) isn't too bad, if people read warning messages?

@sergeyf
Contributor

sergeyf commented Jun 8, 2020

I would also vote for (b), and repeat my earlier conclusion: my intuition is that we should switch to the L2 version and raise the default tol.

@adrinjalali
Member

We could also:

  • allow a callable stopping criterion for users to fine tune it
  • accept an iteration_hyperparams parameter which gives the hyperparameters to the base estimator at each iteration, based on the iteration number and maybe the loss. This could be a list of length n_iter of dicts of params, or a callable giving the new hyperparameters at each iteration.

Wouldn't that be flexible enough?
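Neither parameter exists in scikit-learn; as a sketch of what the callable form of iteration_hyperparams might look like (all names here are hypothetical, and the loop is a placeholder rather than the imputer's real fitting code):

```python
def iteration_hyperparams(iteration, loss):
    # e.g. grow the forest as iterations proceed (illustrative schedule)
    return {"n_estimators": 10 * (iteration + 1)}

def run_rounds(n_iter, get_params):
    """Drive n_iter imputation rounds, asking get_params for the base
    estimator's hyperparameters before each one."""
    history, loss = [], float("inf")
    for it in range(n_iter):
        params = get_params(it, loss)
        history.append(params)
        # a real loop would fit the base estimator with **params here
        loss = 1.0 / (it + 1)  # placeholder per-round loss
    return history

history = run_rounds(3, iteration_hyperparams)
```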

@jnothman
Member

Wouldn't that be flexible enough?

Flexible enough but not usable enough? I like the idea of having the user specify a hyperparam path or something. If it existed: (a) would it achieve the goal of improving the chance of meaningful convergence; and (b) would it be clear how a user should use it?

@adrinjalali
Member

I think good documentation should be good enough to handle the usability. Also, for both of these parameters I'm suggesting some accepted literals which would be used by non-advanced users. It would also allow us to have some magical defaults which usually converge, and if not, the user has enough flexibility to fine tune the imputation.

@cmarmo cmarmo removed this from the 0.24 milestone Oct 15, 2020
@ghost

ghost commented Dec 8, 2020

Hi,

So if I understand correctly, the convergence warning will always be produced, regardless of max_iter? I am currently running it on a dataset of shape (133, 70) with ExtraTreeRegressor, and with max_iter = 60 it still produces the warning and takes forever to impute.

@MhdAlkh

MhdAlkh commented May 27, 2021

Hi,
I'm getting the warning "ConvergenceWarning: [IterativeImputer] Early stopping criterion not reached."
Any conclusion on how to solve it?

@shuq007

shuq007 commented Aug 21, 2021

I have changed most of the parameters related to the ConvergenceWarning, but that only slows down the process with no change in the warnings. So I use the following lines to ignore them, which suppress all warnings of these categories:

import warnings
warnings.filterwarnings(action='ignore', category=DeprecationWarning)
warnings.filterwarnings(action='ignore', category=UserWarning)
warnings.filterwarnings(action='ignore', category=FutureWarning)
warnings.filterwarnings(action='ignore', category=RuntimeWarning)
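A narrower alternative (sketch) is to silence only the imputer's ConvergenceWarning rather than whole categories; note that sklearn's ConvergenceWarning subclasses UserWarning, which is why the UserWarning filter above already covers it. The fallback class below is just so this snippet runs without scikit-learn installed, and the recorded-warnings block only demonstrates that the filter takes effect.

```python
import warnings

try:
    from sklearn.exceptions import ConvergenceWarning
except ImportError:  # fallback so the sketch is self-contained
    class ConvergenceWarning(UserWarning):
        pass

with warnings.catch_warnings(record=True) as caught:
    warnings.filterwarnings("ignore", category=ConvergenceWarning)
    # Stand-in for the warning IterativeImputer emits on non-convergence.
    warnings.warn("[IterativeImputer] Early stopping criterion not reached.",
                  ConvergenceWarning)

# Nothing was recorded: the targeted filter suppressed the warning.
n_caught = len(caught)
```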
