
[MRG] Make IterativeImputer doctest more stable #13026


Merged
merged 9 commits into scikit-learn:iterativeimputer on Jan 23, 2019

Conversation

jnothman
Member

@jnothman commented Jan 22, 2019

At #11977 and related PRs, we've been encountering a troublesome doctest that sometimes fails with:

115     >>> print(np.round(imp.transform(X_test)))
Differences (unified diff with -expected +actual):
    @@ -1,3 +1,3 @@
     [[ 1.  2.]
      [ 6.  3.]
    - [26.  6.]]
    + [24.  6.]]

This PR is to help us debug what might be going on: to see, for instance, whether we can get the issue to occur outside of a doctest context on some Travis instances and not others.
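
For context, here is the flaky comparison rewritten as an ordinary assertion so it can run on every CI instance outside the doctest machinery. This is a hedged sketch: the matrices and expected values are illustrative stand-ins, not the exact docstring data.

    import numpy as np
    from numpy.testing import assert_allclose
    from sklearn.experimental import enable_iterative_imputer  # noqa: required in released scikit-learn
    from sklearn.impute import IterativeImputer

    # Illustrative fit data in which the second feature is ~2x the first.
    imp = IterativeImputer(random_state=0)
    imp.fit([[1, 2], [3, 6], [4, 8], [np.nan, 3], [7, np.nan]])
    X_test = [[np.nan, 2], [6, np.nan], [np.nan, 6]]

    # Expected values assume the imputer learns x2 ~= 2 * x1.
    assert_allclose(np.round(imp.transform(X_test)),
                    [[1., 2.], [6., 12.], [3., 6.]])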

@jnothman
Member Author

Ping @sergeyf

@sergeyf
Contributor

sergeyf commented Jan 22, 2019

Wow, is 'heisenbug' a common bug type? How appropriate =)

@jnothman
Member Author

@sergeyf
Contributor

sergeyf commented Jan 22, 2019

Oh I'm familiar with the uncertainty principle etc. I was just wondering if it's a common term for software bugs.

@jnothman
Member Author

Follow the link.

@jnothman
Member Author

So we have failures on all Travis instances, despite apparent success on the doctest!

        assert_allclose(
            np.round(imp.transform(X_test)),
            [[1., 2.],
             [6., 3.],
>            [26., 6.]])
E       AssertionError: 
E       Not equal to tolerance rtol=1e-07, atol=0
E       
E       (mismatch 16.66666666666667%)
E        x: array([ 1.,  2.,  6.,  3., 24.,  6.])
E        y: array([ 1.,  2.,  6.,  3., 26.,  6.])

I'm not getting that failure locally.

@jnothman
Member Author

I suspect that, because of the iterative nature of this optimisation, small floating-point differences result in large changes to the learnt coefficients. On my machine these are the RidgeCV models learnt in the example:

0 int: 1.6847826086956452 coef: [0.32608696]
1 int: -5.09940049636776 coef: [3.049708]
2 int: 1.6721032018275959 coef: [0.32789787]
3 int: -5.099409763929247 coef: [3.04971263]
4 int: 1.6721037001117616 coef: [0.32789738]
5 int: -5.099419031962363 coef: [3.04971727]
6 int: 1.6721041982557412 coef: [0.32789688]
7 int: -5.099428299341254 coef: [3.0497219]
8 int: 1.672104696548769 coef: [0.32789638]
9 int: -5.099437567638342 coef: [3.04972653]
10 int: 1.672105194796364 coef: [0.32789588]
11 int: -5.09944683617838 coef: [3.04973117]
12 int: 1.6721056930191658 coef: [0.32789538]
13 int: -5.099456104032713 coef: [3.0497358]
14 int: 1.672106191220379 coef: [0.32789488]
15 int: -4.652421905418542 coef: [2.89991114]
16 int: 2.237405688867285 coef: [0.10405314]
17 int: -11.424889030892283 coef: [6.21247668]
18 int: 1.8390356100552203 coef: [0.16096465]
19 int: -11.424754375661257 coef: [6.21240935]

on Travis:

0 int: 1.6847826086956452 coef: [0.32608696]
1 int: -5.0994004961969175 coef: [3.049708]
2 int: 1.672103201856015 coef: [0.32789787]
3 int: -5.099409764476283 coef: [3.04971263]
4 int: 1.818621639876683 coef: [0.26986734]
5 int: -6.026955210130643 coef: [3.51348789]
6 int: 1.824778409309882 coef: [0.24493355]
7 int: -6.795593173494343 coef: [3.89780925]
8 int: 1.7434482914862914 coef: [0.25655237]
9 int: -6.795584945777488 coef: [3.89780513]
10 int: 1.743448020386955 coef: [0.25655264]
11 int: -6.795576715768165 coef: [3.89780102]
12 int: 1.7434477500099121 coef: [0.25655291]
13 int: -1.854289190325142 coef: [2.11372197]
14 int: 1.7132116881514807 coef: [0.28678913]
15 int: -5.973694205425158 coef: [3.48685723]
16 int: 2.202533079390852 coef: [0.10842849]
17 int: -8.726949467628984 coef: [5.13024935]
18 int: 1.8631458859439283 coef: [0.16619803]
19 int: -10.689565132692788 coef: [5.84481104]

I still don't understand why we're sometimes getting the doctest passing on Travis, and sometimes getting it failing...

Something else we see is that there are some sudden shifts in the fits. Iterations 15-17 on my machine and iteration 13 on Travis are very different to the ones before (noting that the iterations cycle between two features, so compare all odds to odds and evens to evens). I suspect that these are due to discrete changes in best alpha in RidgeCV (and will check soon).

It's very possible that early stopping would fix this, with a sufficiently large tolerance.
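
A rough sketch of the kind of instrumentation that produces numbers like those above: refit a RidgeCV model at each round-robin step and print its parameters. The loop and data here are illustrative, not the actual debug code from this PR.

    import numpy as np
    from sklearn.linear_model import RidgeCV

    # Illustrative two-feature matrix with one missing value per feature.
    X = np.array([[np.nan, 3.], [7., np.nan], [1., 2.], [2., 4.], [4., 8.]])
    missing = np.isnan(X)
    X_filled = np.where(missing, np.nanmean(X, axis=0), X)  # mean initialisation

    for it in range(20):
        feat = it % 2             # the iterations cycle between the two features
        other = 1 - feat
        rows = ~missing[:, feat]  # train where the target feature is observed
        est = RidgeCV().fit(X_filled[rows, other:other + 1], X_filled[rows, feat])
        print(it, 'int:', est.intercept_, 'coef:', est.coef_, 'alpha:', est.alpha_)
        X_filled[missing[:, feat], feat] = est.predict(
            X_filled[missing[:, feat], other:other + 1])

Printing est.alpha_ alongside the coefficients would also confirm whether the sudden shifts line up with changes in the selected alpha.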

@jnothman
Member Author

For the early stopping argument, here are the first few iterations of training data imputation:

[[1.        2.       ]
 [4.        3.       ]
 [7.        3.9673913]]
[[1.        2.       ]
 [4.0497235 3.       ]
 [7.        3.9673913]]
[[1.         2.        ]
 [4.0497235  3.        ]
 [7.         3.96738831]]
[[1.         2.        ]
 [4.04972813 3.        ]
 [7.         3.96738831]]
[[1.         2.        ]
 [4.04972813 3.        ]
 [7.         3.96738533]]
[[1.         2.        ]
 [4.04973277 3.        ]
 [7.         3.96738533]]
[[1.         2.        ]
 [4.04973277 3.        ]
 [7.         3.96738234]]

Are we seriously improving the imputation with changes in the order of 3e-6? Maybe we are, and should let the model spend more time forgetting its initialisation?
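
If early stopping were added, the convergence check could look something like this. A hypothetical sketch: the tol value and the normalisation by the observed-data scale are guesses, not a settled API.

    import numpy as np

    def has_converged(X_prev, X_curr, X_obs_max_abs, tol=1e-3):
        # Stop once the largest change in imputed values is small
        # relative to the scale of the observed data.
        return np.max(np.abs(X_curr - X_prev)) < tol * X_obs_max_abs

Under a criterion like that, round-to-round changes of ~3e-6 on data of scale ~10 would stop the loop well before iteration 20.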

@jnothman
Member Author

And yes, those sudden changes in model are due to sudden changes in choice of alpha. Does this make RidgeCV a bad choice, or just bad for toy data with two features?
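
To illustrate the contrast (an assumed toy comparison, not code from this PR): RidgeCV picks its alpha from a discrete candidate grid, so a tiny perturbation of the data can flip which candidate wins and change the fit discontinuously, while BayesianRidge estimates its regularisation as a continuous quantity.

    import numpy as np
    from sklearn.linear_model import RidgeCV, BayesianRidge

    rng = np.random.RandomState(0)
    X = rng.normal(size=(6, 1))
    y = 2 * X.ravel() + rng.normal(scale=0.1, size=6)

    # RidgeCV: the chosen alpha is one of a handful of discrete candidates.
    ridge = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X, y)
    print('RidgeCV alpha:', ridge.alpha_, 'coef:', ridge.coef_)

    # BayesianRidge: the equivalent regularisation strength varies smoothly.
    br = BayesianRidge().fit(X, y)
    print('BayesianRidge lambda_/alpha_:', br.lambda_ / br.alpha_, 'coef:', br.coef_)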

@sergeyf
Contributor

sergeyf commented Jan 22, 2019

If floating point differences are actually changing which alpha gets chosen, I'm guessing it's a toy feature issue. I would bet floating point differences probably matter less if you have many more samples. Can you see if a different RidgeCV alpha is chosen if you, say, add 2 more rows to the training set in this example?

@jnothman
Member Author

Experimenting with more data (just making some data with two features and some random covariance from make_classification) suggests that:

  • convergence happens
  • different estimators can still converge on quite different results in this kind of toy case; the method may be quite brittle for some datasets (which explains further why multiple imputation is important)
  • it might not be great to have RidgeCV as a default because with the discrete alphas it has a non-smooth objective that is brittle to small changes, which isn't great when you're going to apply it repeatedly to slightly-changing input and want it to converge. BayesianRidge might be better.
  • this should all be taken with a grain of salt because the data is very artificial
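
A hedged reconstruction of that experiment, with all parameters guessed (and max_iter per the released API; the branch at the time may have differed):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.experimental import enable_iterative_imputer  # noqa: required in released scikit-learn
    from sklearn.impute import IterativeImputer

    # Two correlated features; knock out ~20% of entries at random.
    X, _ = make_classification(n_samples=200, n_features=2, n_informative=2,
                               n_redundant=0, random_state=0)
    rng = np.random.RandomState(0)
    X_missing = np.where(rng.uniform(size=X.shape) < 0.2, np.nan, X)

    imp = IterativeImputer(max_iter=50, random_state=0)
    X_imputed = imp.fit_transform(X_missing)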

@sergeyf
Contributor

sergeyf commented Jan 22, 2019

I like the idea of making BayesianRidge default for everything. It's definitely slower, but probably worth it. Maybe that can be done in this PR, along with adding the Heisenbug test?
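
At the call site, swapping in BayesianRidge looks roughly like this; `estimator` is the parameter name in the released API, and the branch at the time may have named it differently.

    from sklearn.experimental import enable_iterative_imputer  # noqa
    from sklearn.impute import IterativeImputer
    from sklearn.linear_model import BayesianRidge

    # BayesianRidge as the per-feature regressor instead of RidgeCV.
    imp = IterativeImputer(estimator=BayesianRidge(), random_state=0)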

@jnothman
Member Author

jnothman commented Jan 22, 2019

I don't think we need to add the heisenbug test. We need to make the example more stable by making the linear fit unequivocal, e.g. with

    imp.fit([[np.nan, 3], [7, np.nan], [1, 2], [2, 4], [4, 8]])

However, this assumption of linear covariance is probably poor in practice, and wouldn't be learnt by a decision tree.
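
On the fully observed rows of that proposal, the second feature is exactly twice the first, so any sensible linear model recovers slope ~2 and intercept ~0 regardless of tiny numeric perturbations. A quick check:

    import numpy as np

    # Complete rows of the proposed fit data: x2 is exactly 2 * x1.
    x1 = np.array([1., 2., 4.])
    x2 = np.array([2., 4., 8.])
    slope, intercept = np.polyfit(x1, x2, deg=1)
    print(slope, intercept)  # ~2.0, ~0.0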

@sergeyf
Contributor

sergeyf commented Jan 23, 2019 via email

@jnothman changed the title from "[WIP] Debugging a doctest heisenbug in iterativeimputer branch" to "[MRG] Make IterativeImputer doctest more stable" on Jan 23, 2019
@sergeyf
Contributor

sergeyf commented Jan 23, 2019

Awesome!

Just to be clear, which PR should change RidgeCV to BayesianRidge as default?

Member

@adrinjalali left a comment


Thanks @jnothman !

@jnothman
Member Author

jnothman commented Jan 23, 2019 via email

@jnothman merged commit 34b7a46 into scikit-learn:iterativeimputer on Jan 23, 2019
@jnothman
Member Author

Merging as a minor doc fix
