
[MRG] Fix pass sample weights to final estimator #15773


Merged
merged 33 commits into scikit-learn:master on Dec 9, 2019

Conversation

J-A16
Contributor

@J-A16 J-A16 commented Dec 4, 2019

Reference Issues/PRs

Fixes #13425

What does this implement/fix? Explain your changes.

RANSACRegressor now passes sample_weight to the base estimator when training the final model.
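A minimal sketch of the new behavior (the data and names here are illustrative, using the default LinearRegression base estimator):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RANSACRegressor

X, y = make_regression(n_samples=200, n_features=2, noise=2.0, random_state=0)
weights = np.random.RandomState(0).uniform(0.5, 2.0, size=y.shape[0])

# The weights given here now also reach the base estimator's fit()
# when the final model is refit on the detected inliers.
reg = RANSACRegressor(random_state=0)
reg.fit(X, y, sample_weight=weights)
print(reg.estimator_.coef_.shape)  # coefficients of the final, weighted fit
```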

Any other comments?

@jnothman
Member

jnothman commented Dec 4, 2019

This requires tests, and the loss functions should probably also be weighted.

@J-A16
Contributor Author

J-A16 commented Dec 4, 2019

I just fixed the line length and pushed to my repository; how do I rerun the tests?

@J-A16
Contributor Author

J-A16 commented Dec 4, 2019

I will look into the issue you sent me.

@J-A16
Contributor Author

J-A16 commented Dec 4, 2019

@jnothman, you said earlier that just passing a custom base_estimator that requires sample_weight would be enough. For this new test, as long as I pass a dummy estimator that requires sample_weight and the fit runs without error, is no specific assert statement necessary?

@jnothman
Member

jnothman commented Dec 4, 2019 via email

@J-A16
Contributor Author

J-A16 commented Dec 4, 2019

How do you propose testing correct handling of weights?



def test_ransac_base_estimator_fit_sample_weight():
    class DummyLinearRegression(LinearRegression):
Member

This estimator is exactly a LinearRegression then, so we could use LinearRegression directly.

Contributor Author

The sample_weight is optional in the original LinearRegression, the point of the dummy is to make it necessary in the call. The old _ransac.py code breaks, as it should, using this test.
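A sketch of what such a dummy could look like (illustrative, not the exact test code): subclassing LinearRegression and raising when sample_weight is missing means the pre-fix final refit, which dropped the weights, fails loudly.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, RANSACRegressor


class DummyLinearRegression(LinearRegression):
    """A LinearRegression whose fit() *requires* sample_weight."""

    def fit(self, X, y, sample_weight=None):
        if sample_weight is None:
            raise ValueError("sample_weight is required")
        return super().fit(X, y, sample_weight=sample_weight)


X, y = make_regression(n_samples=50, n_features=1, noise=1.0, random_state=0)
ransac = RANSACRegressor(DummyLinearRegression(), random_state=0)
# With the old _ransac.py the final refit dropped the weights and this
# raised; with the fix it completes without error.
ransac.fit(X, y, sample_weight=np.ones(y.shape[0]))
```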

Member

OK I see. I think the test below is better since we check the fitted model.

base_estimator = DummyLinearRegression()
ransac_estimator = RANSACRegressor(base_estimator, random_state=0)
n_samples = y.shape[0]
weights = np.ones(n_samples)
Member

I don't think that the test is actually testing anything. We should make sure that passing non-unit weights leads to a final model trained on non-unit weights. One way to do that is to pass a sample_weight with non-unit weights and make sure that some of these weights will be used. Then we can use ransac_estimator.inlier_mask_ to train a model which should give the same results as the fitted model in ransac.
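That suggestion could be sketched as follows (assuming current scikit-learn APIs; this mirrors the idea, not the final test): fit RANSAC with non-unit weights, then refit a plain LinearRegression on the inliers with the same weights and compare the coefficients.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, RANSACRegressor

X, y = make_regression(n_samples=100, n_features=1, noise=1.0, random_state=0)
rng = np.random.RandomState(0)
weights = rng.randint(1, 10, size=y.shape[0]).astype(float)  # non-unit weights

ransac = RANSACRegressor(LinearRegression(), random_state=0)
ransac.fit(X, y, sample_weight=weights)

# Refit on the inliers with the matching weights; the coefficients should
# agree with the final model fitted inside RANSAC.
mask = ransac.inlier_mask_
ref = LinearRegression().fit(X[mask], y[mask], sample_weight=weights[mask])
assert np.allclose(ransac.estimator_.coef_, ref.coef_)
```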

@J-A16
Contributor Author

J-A16 commented Dec 4, 2019

As in where is it?

@J-A16
Contributor Author

J-A16 commented Dec 4, 2019

Found it. Do I just add it to the last version's file?

@glemaitre
Member

@glemaitre When it comes to weighting the loss functions, what do you suggest?

What do you mean?

:mod:`sklearn.linear_model`
...........................

- |Fix| Fixed a bug that made :class:`linear_model.RANSACRegressor` fail when
Member

Suggested change
- |Fix| Fixed a bug that made :class:`linear_model.RANSACRegressor` fail when
- |Fix| Fixed a bug that made :class:`linear_model.RANSACRegressor` failed when

Contributor Author

If we remove all the extra info around RANSACRegressor, the original reads:
Fixed a bug that made RANSACRegressor fail when

The suggestion reads:
Fixed a bug that made RANSACRegressor failed when

The original grammar is correct. Are you saying this is just the standard? Should I change something else?

Member

Sorry my mistake

Member

The bug is not about failing if the estimator requires weights. It's that the weights should have been passed in any case.

"Fixed a bug where sample_weight were not used when fitting the final estimator ..."

Contributor Author

Ah, right. I corrected it.

J-A16 and others added 2 commits December 4, 2019 12:27
Co-Authored-By: Guillaume Lemaitre <[email protected]>
Co-Authored-By: Guillaume Lemaitre <[email protected]>
@J-A16
Contributor Author

J-A16 commented Dec 4, 2019

@glemaitre When it comes to weighting the loss functions, what do you suggest?

What do you mean?

jnothman mentions that the loss functions within RANSACRegressor.fit() should probably be weighted, do you have a suggestion for how I should go about it?

@glemaitre
Member

jnothman mentions that the loss functions within RANSACRegressor.fit() should probably be weighted, do you have a suggestion for how I should go about it?

I see. I would suggest first merging this PR and opening another one to solve the issue with the loss.
Basically, we just need to define a loss that weights the error y - y_pred. The relevant lines are from 285 to 297. Then, in the case where the loss is a callable and sample_weight is not None, we need to check that the callable takes a sample_weight argument; otherwise we should raise an error.

@glemaitre
Member

To be more specific, it would be a diff around this:

diff --git a/sklearn/linear_model/_ransac.py b/sklearn/linear_model/_ransac.py
index 40ebb3a084..50a9fecf29 100644
--- a/sklearn/linear_model/_ransac.py
+++ b/sklearn/linear_model/_ransac.py
@@ -283,20 +283,22 @@ class RANSACRegressor(MetaEstimatorMixin, RegressorMixin,
             residual_threshold = self.residual_threshold
 
         if self.loss == "absolute_loss":
-            if y.ndim == 1:
-                loss_function = lambda y_true, y_pred: np.abs(y_true - y_pred)
-            else:
-                loss_function = lambda \
-                    y_true, y_pred: np.sum(np.abs(y_true - y_pred), axis=1)
+
+            def loss_function(y_true, y_pred, sample_weight=None):
+                sample_weight = np.ones(y_true.shape)
+                error = np.abs(sample_weight * (y_true - y_pred))
+                return error if y.ndim == 1 else np.sum(error, axis=1)
 
         elif self.loss == "squared_loss":
-            if y.ndim == 1:
-                loss_function = lambda y_true, y_pred: (y_true - y_pred) ** 2
-            else:
-                loss_function = lambda \
-                    y_true, y_pred: np.sum((y_true - y_pred) ** 2, axis=1)
+
+            def loss_function(y_true, y_pred, sample_weight=None):
+                sample_weight = np.ones(y_true.shape)
+                error = sample_weight * ((y_true - y_pred) ** 2)
+                return error if y.ndim == 1 else np.sum(error, axis=1)
 
         elif callable(self.loss):
+            # FIXME: check that self.loss takes a `sample_weight`
+            # parameter when sample_weight is not None
             loss_function = self.loss
 
         else:
@@ -373,7 +375,12 @@ class RANSACRegressor(MetaEstimatorMixin, RegressorMixin,
 
             # residuals of all data for current random sample model
             y_pred = base_estimator.predict(X)
-            residuals_subset = loss_function(y, y_pred)
+            if sample_weight is None:
+                residuals_subset = loss_function(y, y_pred)
+            else:
+                residuals_subset = loss_function(
+                    y, y_pred, sample_weight=sample_weight
+                )
 
             # classify data into inliers and outliers
             inlier_mask_subset = residuals_subset < residual_threshold

I would need to think a bit more regarding the testing. But in some way, we want to check some model equivalence or differences.

@J-A16
Contributor Author

J-A16 commented Dec 5, 2019

loss_has_sample_weight = 'sample_weight' in signature(self.loss).parameters
if sample_weight is not None and loss_has_sample_weight:
    loss_function = self.loss
else:
    raise ValueError()

Do we automatically raise a ValueError here if both conditions aren't met?
I would use a function to test for the sample_weight parameter, but I couldn't find a general parameter testing function like has_fit_parameter().

Also, isn't this statement automatically wiping out whatever value sample_weight had?:

sample_weight = np.ones(y_true.shape)

Should it be this?:

if sample_weight is None:
    sample_weight = np.ones(y_true.shape)
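For the missing general helper, something along these lines could work (callable_has_parameter is a hypothetical name; has_fit_parameter only inspects an estimator's fit method):

```python
import inspect

import numpy as np


def callable_has_parameter(func, param):
    # Hypothetical generic analogue of sklearn.utils.validation's
    # has_fit_parameter(), but for arbitrary callables.
    return param in inspect.signature(func).parameters


def abs_loss(y_true, y_pred):
    return np.abs(y_true - y_pred)


def weighted_abs_loss(y_true, y_pred, sample_weight=None):
    if sample_weight is None:
        sample_weight = np.ones_like(y_true)
    return sample_weight * np.abs(y_true - y_pred)


print(callable_has_parameter(abs_loss, "sample_weight"))           # False
print(callable_has_parameter(weighted_abs_loss, "sample_weight"))  # True
```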

@J-A16
Contributor Author

J-A16 commented Dec 5, 2019

Also, is this pull request ready?

Member

@glemaitre glemaitre left a comment

LGTM

@glemaitre
Member

@jnothman do you want to have a look at it? Basically, I would be interested to know whether we should add sample_weight to the loss right now or whether it can be done in a subsequent PR.

@glemaitre
Member

Also, isn't this statement automatically wiping out whatever value sample_weight had?:

yes I made a mistake. Basically this is just to have an idea of what to do :)

Do we automatically raise a ValueError here if both conditions aren't met?

I think that this is fine if the defined loss handles sample_weight but the user does not give one. But again, this is along those lines.

What will be important is to have some proper tests.

Member

@jnothman jnothman left a comment

Otherwise this LGTM, but yes, we need an issue re: the loss functions.

@J-A16
Contributor Author

J-A16 commented Dec 8, 2019

@jnothman, I fixed the what's new entry.

@jnothman
Member

jnothman commented Dec 8, 2019

Please resolve conflicts, ensuring the change log remains in sorted order

@jnothman jnothman merged commit 1c42e79 into scikit-learn:master Dec 9, 2019
@jnothman
Member

jnothman commented Dec 9, 2019

Thanks @J-A16

@glemaitre
Member

@J-A16 Thanks for your efforts.

Do you want to make the next PR to include sample_weight in the loss function? If you lack the time, I can make the PR.

@J-A16
Contributor Author

J-A16 commented Dec 9, 2019

@glemaitre, you mentioned the tests should check for equivalences or differences, so perhaps I should implement the different loss functions in the tests, instantiate different RANSACRegressor objects, each with its own loss function, and assert that the results are the same for corresponding loss functions?

@glemaitre
Member

equivalences or differences

I was thinking about checking that sample_weight will have an impact on the loss and therefore on the final estimator found.

assert that the results are the same for corresponding loss functions?

It could be a start.
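One way to check that impact, sketched with a plain LinearRegression (illustrative data; the eventual test would go through the RANSAC loss instead): heavily upweighting a shifted subset of samples should move the fitted coefficients away from the unit-weight fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.uniform(size=(100, 1))
y = 3.0 * X.ravel() + rng.normal(scale=0.1, size=100)
y[:10] += 5.0  # a shifted group of samples

unit = LinearRegression().fit(X, y, sample_weight=np.ones(100))
heavy_w = np.where(np.arange(100) < 10, 50.0, 1.0)
heavy = LinearRegression().fit(X, y, sample_weight=heavy_w)

# Upweighting the shifted group changes the fitted solution.
assert not np.allclose(unit.coef_, heavy.coef_)
```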

panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020
Successfully merging this pull request may close these issues.

RANSAC does not pass sample weights to final estimator