[MRG + 1] Fix BayesianRidge() and ARDRegression() for constant target vectors #10095
Conversation
I would have thought that
@glemaitre Yeah, I like that solution better as well, will implement it. But note that then
sklearn/linear_model/bayes.py (Outdated)

@@ -162,7 +162,8 @@ def fit(self, X, y):
        n_samples, n_features = X.shape

        # Initialization of the values of the parameters
        alpha_ = 1. / np.var(y)
        eps = np.spacing(1)
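The diff above can be illustrated with a standalone sketch (not the actual scikit-learn source; the variable names mirror those in `fit`):

```python
import numpy as np

y = np.full(4, 3.7)          # constant target vector -> np.var(y) == 0.0

# Before the fix: dividing by zero variance yields inf
with np.errstate(divide="ignore"):
    alpha_bad = 1. / np.var(y)
print(alpha_bad)             # inf -> downstream predictions become NaN

# After the fix: a tiny eps keeps the initial noise precision finite
eps = np.spacing(1)          # ~2.22e-16
alpha_good = 1. / (np.var(y) + eps)
print(np.isfinite(alpha_good))  # True
```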
You might want to add a small comment to explain why we add eps
Thx for the suggestion. I added a comment, and additionally applied the same fix to ARDRegression.
I would prefer you use:

    eps = np.finfo(np.float64).eps

as here you're not taking into account the dtype of X or y anyway.
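For 64-bit floats the two spellings are numerically identical; the suggestion is about making the dtype assumption explicit rather than changing the value. A quick check:

```python
import numpy as np

# np.spacing(1) is the gap between 1.0 and the next float64,
# which is exactly the float64 machine epsilon
assert np.spacing(1) == np.finfo(np.float64).eps
print(np.finfo(np.float64).eps)  # 2.220446049250313e-16
```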
done
    # constant target vectors
    n_samples = 4
    n_features = 5
    constant_value = np.random.rand()
use a RandomState
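i.e., replace the global np.random call with a seeded RandomState so the test data is reproducible. A sketch of what the reviewer is asking for (the variable names are illustrative):

```python
import numpy as np

# A seeded RandomState makes the test deterministic across runs
random_state = np.random.RandomState(42)

n_samples, n_features = 4, 5
constant_value = random_state.rand()
X = random_state.rand(n_samples, n_features)
y = np.full(n_samples, constant_value)

# Re-seeding yields the exact same draws, so the test cannot fail by chance
assert np.random.RandomState(42).rand() == constant_value
```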
done
    # The standard dev. should be relatively small (< 0.1 is tested here)
    n_samples = 4
    n_features = 5
    constant_value = np.random.rand()
idem use RandomState
def test_std_ard_with_constant_input():
    # Test ARDRegression standard dev. for edge case of constant target vector
    # The standard dev. should be relatively small (< 0.1 is tested here)
That seems quite a large standard dev...
tl;dr: I decreased the upper bound to 0.01.

Playing around with a couple of random initializations, the std was around 0.005. To make sure that the test won't fail by chance, I originally chose 0.1 as an upper bound. Now that I use a random state in the test, it's safe to reduce the bound to 0.01.
thx @jdoepfert

awesome!
… vectors (scikit-learn#10095)

* add test for issue scikit-learn#10092
* add comment to test
* split into two tests
* add tests for scores, alpha and beta
* adapt tests: n_samples != n_features
* add test when no intercept is fitted
* add handling of constant target vector when intercept is fitted
* fix typo in comments
* fix format issues
* replace original fix with simpler fix
* add comment
* increase upper boundary for test
* increase upper boundary for test
* merge tests for ARDRegression and BayesianRidge
* use random state in tests
* decrease upper bound for std
* replace np.spacing(1) -> np.finfo(np.float64).eps
Reference Issues/PRs

Fixes #10092

What does this implement/fix? Explain your changes.

This PR fixes the issue that when fitting a BayesianRidge or ARDRegression regressor with a constant target vector y, the predict method yields NaN arrays for both the predictions and the respective standard deviations. This occurs because the estimated precision of the noise, alpha_, is initialized with 1/np.var(y) = inf.

This PR fixes the issue by initializing alpha_ with 1/(np.var(y) + eps), where eps = np.spacing(1).

Any other comments?

The returned standard deviation std for the predictions will not be zero for a constant target vector (as suggested in #10092), but a small number instead. I added a test for this. However, std approaches zero as the number of samples increases, as expected:
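A minimal end-to-end sketch of the fixed behavior (assumes a scikit-learn version that includes this fix; the seed and sizes are illustrative):

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.RandomState(0)
n_samples, n_features = 10, 5
X = rng.rand(n_samples, n_features)
y = np.full(n_samples, rng.rand())   # constant target vector

model = BayesianRidge().fit(X, y)
pred, std = model.predict(X, return_std=True)

# Both are finite after the fix; before it, they were NaN arrays
print(np.isfinite(pred).all(), np.isfinite(std).all())
```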