
[MRG] Make IterativeImputer doctest more stable #13026


Merged
merged 9 commits into scikit-learn:iterativeimputer on Jan 23, 2019

Conversation

jnothman
Member

@jnothman commented Jan 22, 2019

At #11977 and related PRs, we've been encountering a troublesome doctest that sometimes fails with:

115     >>> print(np.round(imp.transform(X_test)))
Differences (unified diff with -expected +actual):
    @@ -1,3 +1,3 @@
     [[ 1.  2.]
      [ 6.  3.]
    - [26.  6.]]
    + [24.  6.]]

This PR is to help us debug what might be going on: to see, for instance, whether we can get the issue to occur outside of a doctest context on some Travis instances and not others.
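
For context, here is the flaky comparison rewritten as an ordinary assertion so it can run on every CI instance outside the doctest machinery. This is a hedged sketch: the matrices and expected values are illustrative stand-ins, not the exact docstring data.

    import numpy as np
    from numpy.testing import assert_allclose
    from sklearn.experimental import enable_iterative_imputer  # noqa: required in released scikit-learn
    from sklearn.impute import IterativeImputer

    # Illustrative fit data in which the second feature is ~2x the first.
    imp = IterativeImputer(random_state=0)
    imp.fit([[1, 2], [3, 6], [4, 8], [np.nan, 3], [7, np.nan]])
    X_test = [[np.nan, 2], [6, np.nan], [np.nan, 6]]

    # Expected values assume the imputer learns x2 ~= 2 * x1.
    assert_allclose(np.round(imp.transform(X_test)),
                    [[1., 2.], [6., 12.], [3., 6.]])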

@jnothman
Member Author

Ping @sergeyf

@sergeyf
Contributor

sergeyf commented Jan 22, 2019

Wow, is 'heisenbug' a common bug type? How appropriate =)

@jnothman
Member Author

@sergeyf
Contributor

sergeyf commented Jan 22, 2019

Oh I'm familiar with the uncertainty principle etc. I was just wondering if it's a common term for software bugs.

@jnothman
Member Author

Follow the link.

@jnothman
Member Author

So we have failures on all Travis instances, despite apparent success on the doctest!

        assert_allclose(
            np.round(imp.transform(X_test)),
            [[1., 2.],
             [6., 3.],
>            [26., 6.]])
E       AssertionError: 
E       Not equal to tolerance rtol=1e-07, atol=0
E       
E       (mismatch 16.66666666666667%)
E        x: array([ 1.,  2.,  6.,  3., 24.,  6.])
E        y: array([ 1.,  2.,  6.,  3., 26.,  6.])

I'm not getting that failure locally.

@jnothman
Member Author

I suspect that, because of the iterative nature of this optimisation, small floating-point differences result in large changes to the learnt coefficients. On my machine these are the RidgeCV models learnt in the example:

0 int: 1.6847826086956452 coef: [0.32608696]
1 int: -5.09940049636776 coef: [3.049708]
2 int: 1.6721032018275959 coef: [0.32789787]
3 int: -5.099409763929247 coef: [3.04971263]
4 int: 1.6721037001117616 coef: [0.32789738]
5 int: -5.099419031962363 coef: [3.04971727]
6 int: 1.6721041982557412 coef: [0.32789688]
7 int: -5.099428299341254 coef: [3.0497219]
8 int: 1.672104696548769 coef: [0.32789638]
9 int: -5.099437567638342 coef: [3.04972653]
10 int: 1.672105194796364 coef: [0.32789588]
11 int: -5.09944683617838 coef: [3.04973117]
12 int: 1.6721056930191658 coef: [0.32789538]
13 int: -5.099456104032713 coef: [3.0497358]
14 int: 1.672106191220379 coef: [0.32789488]
15 int: -4.652421905418542 coef: [2.89991114]
16 int: 2.237405688867285 coef: [0.10405314]
17 int: -11.424889030892283 coef: [6.21247668]
18 int: 1.8390356100552203 coef: [0.16096465]
19 int: -11.424754375661257 coef: [6.21240935]

on Travis:

0 int: 1.6847826086956452 coef: [0.32608696]
1 int: -5.0994004961969175 coef: [3.049708]
2 int: 1.672103201856015 coef: [0.32789787]
3 int: -5.099409764476283 coef: [3.04971263]
4 int: 1.818621639876683 coef: [0.26986734]
5 int: -6.026955210130643 coef: [3.51348789]
6 int: 1.824778409309882 coef: [0.24493355]
7 int: -6.795593173494343 coef: [3.89780925]
8 int: 1.7434482914862914 coef: [0.25655237]
9 int: -6.795584945777488 coef: [3.89780513]
10 int: 1.743448020386955 coef: [0.25655264]
11 int: -6.795576715768165 coef: [3.89780102]
12 int: 1.7434477500099121 coef: [0.25655291]
13 int: -1.854289190325142 coef: [2.11372197]
14 int: 1.7132116881514807 coef: [0.28678913]
15 int: -5.973694205425158 coef: [3.48685723]
16 int: 2.202533079390852 coef: [0.10842849]
17 int: -8.726949467628984 coef: [5.13024935]
18 int: 1.8631458859439283 coef: [0.16619803]
19 int: -10.689565132692788 coef: [5.84481104]

I still don't understand why we're sometimes getting the doctest passing on Travis, and sometimes getting it failing...

Something else we see is that there are some sudden shifts in the fits. Iterations 15-17 on my machine and iteration 13 on Travis are very different to the ones before (noting that the iterations cycle between two features, so compare all odds to odds and evens to evens). I suspect that these are due to discrete changes in best alpha in RidgeCV (and will check soon).

It's very possible that early stopping would fix this, with a sufficiently large tolerance.
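
A rough sketch of the kind of instrumentation that produces numbers like those above: refit a RidgeCV model at each round-robin step and print its parameters. The loop and data here are illustrative, not the actual debug code from this PR.

    import numpy as np
    from sklearn.linear_model import RidgeCV

    # Illustrative two-feature matrix with one missing value per feature.
    X = np.array([[np.nan, 3.], [7., np.nan], [1., 2.], [2., 4.], [4., 8.]])
    missing = np.isnan(X)
    X_filled = np.where(missing, np.nanmean(X, axis=0), X)  # mean initialisation

    for it in range(20):
        feat = it % 2             # the iterations cycle between the two features
        other = 1 - feat
        rows = ~missing[:, feat]  # train where the target feature is observed
        est = RidgeCV().fit(X_filled[rows, other:other + 1], X_filled[rows, feat])
        print(it, 'int:', est.intercept_, 'coef:', est.coef_, 'alpha:', est.alpha_)
        X_filled[missing[:, feat], feat] = est.predict(
            X_filled[missing[:, feat], other:other + 1])

Printing est.alpha_ alongside the coefficients would also confirm whether the sudden shifts line up with changes in the selected alpha.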

@jnothman
Member Author

For the early stopping argument, here are the first few iterations of training data imputation:

[[1.        2.       ]
 [4.        3.       ]
 [7.        3.9673913]]
[[1.        2.       ]
 [4.0497235 3.       ]
 [7.        3.9673913]]
[[1.         2.        ]
 [4.0497235  3.        ]
 [7.         3.96738831]]
[[1.         2.        ]
 [4.04972813 3.        ]
 [7.         3.96738831]]
[[1.         2.        ]
 [4.04972813 3.        ]
 [7.         3.96738533]]
[[1.         2.        ]
 [4.04973277 3.        ]
 [7.         3.96738533]]
[[1.         2.        ]
 [4.04973277 3.        ]
 [7.         3.96738234]]

Are we seriously improving the imputation with changes in the order of 3e-6? Maybe we are, and should let the model spend more time forgetting its initialisation?
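
If early stopping were added, the convergence check could look something like this. A hypothetical sketch: the tol value and the normalisation by the observed-data scale are guesses, not a settled API.

    import numpy as np

    def has_converged(X_prev, X_curr, X_obs_max_abs, tol=1e-3):
        # Stop once the largest change in imputed values is small
        # relative to the scale of the observed data.
        return np.max(np.abs(X_curr - X_prev)) < tol * X_obs_max_abs

Under a criterion like that, round-to-round changes of ~3e-6 on data of scale ~10 would stop the loop well before iteration 20.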

@jnothman
Member Author

And yes, those sudden changes in model are due to sudden changes in choice of alpha. Does this make RidgeCV a bad choice, or just bad for toy data with two features?
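
To illustrate the contrast (an assumed toy comparison, not code from this PR): RidgeCV picks its alpha from a discrete candidate grid, so a tiny perturbation of the data can flip which candidate wins and change the fit discontinuously, while BayesianRidge estimates its regularisation as a continuous quantity.

    import numpy as np
    from sklearn.linear_model import RidgeCV, BayesianRidge

    rng = np.random.RandomState(0)
    X = rng.normal(size=(6, 1))
    y = 2 * X.ravel() + rng.normal(scale=0.1, size=6)

    # RidgeCV: the chosen alpha is one of a handful of discrete candidates.
    ridge = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X, y)
    print('RidgeCV alpha:', ridge.alpha_, 'coef:', ridge.coef_)

    # BayesianRidge: the equivalent regularisation strength varies smoothly.
    br = BayesianRidge().fit(X, y)
    print('BayesianRidge lambda_/alpha_:', br.lambda_ / br.alpha_, 'coef:', br.coef_)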

@sergeyf
Contributor

sergeyf commented Jan 22, 2019

If floating point differences are actually changing which alpha gets chosen, I'm guessing it's a toy feature issue. I would bet floating point differences probably matter less if you have many more samples. Can you see if a different RidgeCV alpha is chosen if you, say, add 2 more rows to the training set in this example?

@jnothman
Member Author

Experimenting with more data (just making some data with two features and some random covariance from make_classification) suggests that:

  • convergence happens
  • different estimators can still converge on quite different results in this kind of toy case; the method may be quite brittle for some datasets (which explains further why multiple imputation is important)
  • it might not be great to have RidgeCV as a default because with the discrete alphas it has a non-smooth objective that is brittle to small changes, which isn't great when you're going to apply it repeatedly to slightly-changing input and want it to converge. BayesianRidge might be better.
  • this should all be taken with a grain of salt because the data is very artificial
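
A hedged reconstruction of that experiment, with all parameters guessed (and max_iter per the released API; the branch at the time may have differed):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.experimental import enable_iterative_imputer  # noqa: required in released scikit-learn
    from sklearn.impute import IterativeImputer

    # Two correlated features; knock out ~20% of entries at random.
    X, _ = make_classification(n_samples=200, n_features=2, n_informative=2,
                               n_redundant=0, random_state=0)
    rng = np.random.RandomState(0)
    X_missing = np.where(rng.uniform(size=X.shape) < 0.2, np.nan, X)

    imp = IterativeImputer(max_iter=50, random_state=0)
    X_imputed = imp.fit_transform(X_missing)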

@sergeyf
Contributor

sergeyf commented Jan 22, 2019

I like the idea of making BayesianRidge default for everything. It's definitely slower, but probably worth it. Maybe that can be done in this PR, along with adding the Heisenbug test?
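
At the call site, swapping in BayesianRidge looks roughly like this; `estimator` is the parameter name in the released API, and the branch at the time may have named it differently.

    from sklearn.experimental import enable_iterative_imputer  # noqa
    from sklearn.impute import IterativeImputer
    from sklearn.linear_model import BayesianRidge

    # BayesianRidge as the per-feature regressor instead of RidgeCV.
    imp = IterativeImputer(estimator=BayesianRidge(), random_state=0)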

@jnothman
Member Author

jnothman commented Jan 22, 2019

I don't think we need to add the heisenbug test. We need to make the example more stable by making the linear fit unequivocal, e.g. with

    imp.fit([[np.nan, 3], [7, np.nan], [1, 2], [2, 4], [4, 8]])

However, this assumption of linear covariance is probably poor in practice, and wouldn't be learnt by a decision tree.
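
On the fully observed rows of that proposal, the second feature is exactly twice the first, so any sensible linear model recovers slope ~2 and intercept ~0 regardless of tiny numeric perturbations. A quick check:

    import numpy as np

    # Complete rows of the proposed fit data: x2 is exactly 2 * x1.
    x1 = np.array([1., 2., 4.])
    x2 = np.array([2., 4., 8.])
    slope, intercept = np.polyfit(x1, x2, deg=1)
    print(slope, intercept)  # ~2.0, ~0.0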

@sergeyf
Contributor

sergeyf commented Jan 23, 2019 via email

@jnothman changed the title from "[WIP] Debugging a doctest heisenbug in iterativeimputer branch" to "[MRG] Make IterativeImputer doctest more stable" on Jan 23, 2019
@sergeyf
Contributor

sergeyf commented Jan 23, 2019

Awesome!

Just to be clear, which PR should change RidgeCV to BayesianRidge as default?

Member

@adrinjalali left a comment


Thanks @jnothman !

@jnothman
Member Author

jnothman commented Jan 23, 2019 via email

@jnothman merged commit 34b7a46 into scikit-learn:iterativeimputer on Jan 23, 2019
@jnothman
Member Author

Merging as a minor doc fix
