
[MRG + 1] FIX Calculation of standard deviation of predictions in ARDRegression #10153


Merged: 12 commits, Nov 22, 2017

Conversation

@jdoepfert (Contributor) commented Nov 16, 2017

Reference Issues/PRs

Fixes #10128

What does this implement/fix? Explain your changes.

This resolves the issue that a ValueError is raised upon calling predict() with return_std=True after fitting ARDRegression() on particular inputs X:

    import numpy as np
    from sklearn.linear_model import ARDRegression

    X = np.array([[1, 0],
                  [0, 0]])
    y = np.array([0, 0])
    clf = ARDRegression(n_iter=1)
    clf.fit(X, y)
    clf.predict(X, return_std=True)
    # ValueError: shapes (2,1) and (2,2) not aligned: 1 (dim 1) != 2 (dim 0)

I think this PR actually resolves the following bug in the algorithm:

  1. In the algorithm's fitting loop (L473), sigma_ and coef_ are first updated using the parameter estimates from the previous iteration (e.g. keep_lambda).
  2. Then the parameters themselves are updated (e.g. keep_lambda is recomputed according to lambda_ < self.threshold_lambda).

At prediction time, a similar keep_lambda comparison, self.lambda_ < self.threshold_lambda (L551), is performed again to adapt the shape of X to the shape of self.sigma_. However, due to the algorithm's structure outlined above, this comparison is not identical to the keep_lambda that was used to calculate sigma_, since keep_lambda and lambda_ are updated after sigma_. Therefore, on rare occasions when keep_lambda changes during the last iteration of the algorithm, the shapes of the adapted X and self.sigma_ in L552 will not match.

The fix applies the updates for sigma_ and coef_ one more time after the last iteration of the algorithm, so that the final sigma_ and coef_ are calculated from the updated parameter estimates. This is also in line with this publication; see the discussion in #10128.
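To make the mask/shape mismatch concrete, here is a small self-contained sketch in plain NumPy (not the actual sklearn internals; all names and values are illustrative): sigma is built with the mask from the previous iteration, the mask then shrinks in the final parameter update, and prediction filters X with the new mask.

```python
import numpy as np

# Toy illustration of the bug (plain NumPy, all names illustrative; this is
# not the sklearn code itself). sigma is computed with the mask from the
# previous iteration, then the mask shrinks in the final parameter update.
lambda_ = np.array([1.0, 5.0, 50.0])
threshold_lambda = 10.0

keep_prev = np.array([True, True, True])   # mask in force when sigma was built
sigma = np.eye(int(keep_prev.sum()))       # shape (3, 3)

keep_new = lambda_ < threshold_lambda      # final mask: only 2 entries kept
X = np.ones((4, 3))
X_kept = X[:, keep_new]                    # shape (4, 2)

# Prediction-time product: (4, 2) @ (3, 3) -> the ValueError from the issue
try:
    X_kept @ sigma
except ValueError as exc:
    print("shape mismatch:", exc)

# The fix: recompute sigma once more with the final mask before storing it,
# so the stored covariance matches the mask used at prediction time
sigma_fixed = np.eye(int(keep_new.sum()))  # shape (2, 2)
print((X_kept @ sigma_fixed).shape)        # -> (4, 2)
```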

    @@ -110,6 +111,22 @@ def test_std_bayesian_ridge_ard_with_constant_input():
        assert_array_less(y_std, expected_upper_boundary)


    def test_regression_issue_10128():
        # this ARDRegression test throws a `ValueError` on master, commit 5963fd2
        np.random.seed(752)
Member:

I think we don't have to rely on a random seed; we can create a deterministic toy example. We should check the data (specifically the unique value) and construct such a matrix as input.

Contributor Author:

I still haven't succeeded in coming up with a smaller example.

Contributor Author:

OK, I came up with a better example! I will push it later.

Contributor Author:

done

    @@ -110,6 +111,22 @@ def test_std_bayesian_ridge_ard_with_constant_input():
        assert_array_less(y_std, expected_upper_boundary)


    def test_regression_issue_10128():
        # this ARDRegression test throws a `ValueError` on master, commit 5963fd2
Member:

Maybe a more succinct description would be better. You can also put the issue number instead of the commit.

Contributor Author:

done

    clf = ARDRegression()
    clf.fit(X_train, y_train)
    clf.predict(X_test, return_std=True)

Member:

Maybe checking the value of the coef would be meaningful in this case.

Contributor Author:

done

Contributor Author:

I now check for the correct shape of sigma_.
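As an illustration of that kind of check (a hedged sketch, not the PR's actual test; the data here is made up): after fit, sigma_ should be square, with one row and column per coefficient whose lambda_ stays below threshold_lambda.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

# Illustrative data, not the data from the PR's test
X = np.array([[1.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.5, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

clf = ARDRegression()
clf.fit(X, y)

# With the fix, sigma_ matches the set of features kept by the
# lambda threshold, whatever that set turns out to be
n_kept = int(np.sum(clf.lambda_ < clf.threshold_lambda))
assert clf.sigma_.shape == (n_kept, n_kept)
```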

    for iter_ in range(self.n_iter):
        # Compute mu and sigma (using Woodbury matrix identity)
        # Compute sigma and mu (using Woodbury matrix identity)

    def update_sigma(X, alpha_, lambda_, keep_lambda, n_samples):
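For context, the helper under discussion computes the posterior covariance via the Woodbury matrix identity. A hedged sketch of what such a helper does (simplified from the PR diff; np.linalg.inv stands in for the pseudo-inverse sklearn uses, and the signature mirrors the diff above):

```python
import numpy as np

def update_sigma(X, alpha_, lambda_, keep_lambda, n_samples):
    """Woodbury form of sigma = (diag(lambda) + alpha * X^T X)^(-1),
    restricted to the columns selected by keep_lambda."""
    X_keep = X[:, keep_lambda]
    inv_lambda = 1.0 / lambda_[keep_lambda]
    # Middle term: (I / alpha + X diag(1/lambda) X^T)^(-1) -- an
    # (n_samples, n_samples) inverse instead of an
    # (n_features, n_features) one
    middle = np.linalg.inv(np.eye(n_samples) / alpha_ +
                           (X_keep * inv_lambda) @ X_keep.T)
    sigma_ = -(inv_lambda[:, None] * X_keep.T) @ middle @ (X_keep * inv_lambda)
    # Add diag(1/lambda) to complete the Woodbury expansion
    sigma_.flat[::sigma_.shape[1] + 1] += inv_lambda
    return sigma_
```

When there are few samples and many features this is cheaper than inverting the (n_features, n_features) matrix directly; the two forms agree up to numerical error.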
Member:

I am not sure what is best: keep the function here, or make it a private method.
@jnothman WDYT?

Member:

I am fine leaving it here if the object still pickles.

Contributor Author:

Is this tested somewhere?

@codecov bot commented Nov 17, 2017

Codecov Report

Merging #10153 into master will decrease coverage by 0.07%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #10153      +/-   ##
==========================================
- Coverage   96.21%   96.13%   -0.08%     
==========================================
  Files         337      337              
  Lines       62891    62942      +51     
==========================================
  Hits        60508    60508              
- Misses       2383     2434      +51
| Impacted Files | Coverage Δ |
|---|---|
| sklearn/linear_model/bayes.py | 95.67% <100%> (+0.22%) ⬆️ |
| sklearn/linear_model/tests/test_bayes.py | 91.66% <100%> (+1.24%) ⬆️ |
| sklearn/svm/bounds.py | 94.73% <0%> (-5.27%) ⬇️ |
| sklearn/model_selection/tests/test_split.py | 92.29% <0%> (-4.15%) ⬇️ |
| sklearn/tests/test_cross_validation.py | 97.76% <0%> (-1.97%) ⬇️ |
| sklearn/utils/testing.py | 76.48% <0%> (-0.52%) ⬇️ |
| sklearn/model_selection/_validation.py | 96.91% <0%> (-0.02%) ⬇️ |
| sklearn/cross_validation.py | 98.59% <0%> (-0.01%) ⬇️ |
| sklearn/model_selection/_split.py | 99.13% <0%> (-0.01%) ⬇️ |
| sklearn/linear_model/tests/test_omp.py | 100% <0%> (ø) ⬆️ |
| ... and 6 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5963fd2...7741afc.

@jdoepfert (Contributor Author) commented Nov 17, 2017

Any idea why AppVeyor fails? If I interpret the huge log correctly, def test_predict_proba_binary() in sklearn/neural_network/tests/test_mlp.py is the failing test. On my machine, the tests run without problems.

EDIT: AppVeyor now passes again.

@jdoepfert jdoepfert changed the title [WIP] Fix bug in ARDRegression: Calculation of standard deviation of predictions [MRG] Fix bug in ARDRegression: Calculation of standard deviation of predictions Nov 18, 2017
@glemaitre (Member)

LGTM

ping @agramfort @jnothman (you might want to give your opinion regarding private methods/locally defined function)

@glemaitre glemaitre changed the title [MRG] Fix bug in ARDRegression: Calculation of standard deviation of predictions [MRG + 1] FIX Calculation of standard deviation of predictions in ARDRegression Nov 18, 2017
@agramfort (Member)

@jdoepfert please document the bug fix in what's new page.

@jdoepfert (Contributor Author)

@agramfort I updated what's new, hopefully correctly; I am not sure about the format.

@glemaitre glemaitre merged commit 7ffad2c into scikit-learn:master Nov 22, 2017
@glemaitre (Member)

OK, so that's an MRG+2. Thanks @jdoepfert.

@jdoepfert (Contributor Author)

@glemaitre thanks for reviewing!

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

Successfully merging this pull request may close these issues.

ARDRegression still crashes when trained on some constant y
3 participants