[MRG] BUG apply sample weights to RANSAC loss functions #15952
Conversation
sklearn/linear_model/_ransac.py
Outdated
@@ -373,7 +373,10 @@ def fit(self, X, y, sample_weight=None):

            # residuals of all data for current random sample model
            y_pred = base_estimator.predict(X)
            residuals_subset = loss_function(y, y_pred)
            if sample_weight is None:
Could you use `sklearn.utils._check_sample_weight`? It will give something like:

    sample_weight = _check_sample_weight(sample_weight, X)
    residuals_subset = loss_function(y, y_pred) * sample_weight
Thinking a bit more about it, I have the intuition that this is different from other models, where we try to reduce the loss taking all samples into account, meaning we try to reduce the error of the points with high weights. Here the high-weight samples will only be discarded and recognized as outliers. @jnothman, does what I say make sense (or is it too late here :))?
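The suggested change reads, in runnable form, roughly as follows. This is a sketch only: `weighted_residuals` and the absolute loss are illustrative stand-ins for RANSAC's internals, not code from the PR.

```python
import numpy as np
from sklearn.utils.validation import _check_sample_weight

def weighted_residuals(y_true, y_pred, X, sample_weight=None):
    """Per-sample absolute loss scaled by validated sample weights."""
    # stand-in for RANSAC's per-sample loss; the PR multiplies the loss
    # output by sample_weight before thresholding inliers/outliers
    residuals = np.abs(y_true - y_pred)
    sample_weight = _check_sample_weight(sample_weight, X)  # ones if None
    return residuals * sample_weight

X = np.zeros((4, 1))  # only used here for length validation
res = weighted_residuals(np.array([1.0, 2.0, 3.0, 4.0]),
                         np.array([1.0, 1.0, 1.0, 1.0]),
                         X, sample_weight=np.array([1.0, 2.0, 1.0, 0.5]))
# res == [0., 2., 2., 1.5]
```

With `sample_weight=None` the helper falls back to uniform weights, so the unweighted behavior is unchanged.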
Hmmmm... Might have to think more about this when I have headspace to do so.

I think that low weight on an outlier should mean it's more outlying... If we ensure that the invariances we have previously discussed are upheld, then this should be the case. But it might be hard to uphold them, since aspects of this algorithm are discrete?
I just updated the function to use the syntax suggested by @glemaitre. Though I don't have as thorough a knowledge of sklearn, I think @glemaitre's point makes sense. Perhaps it would be more consistent to do something like the below?
A non-regression test might check for specific values, which we've tested conform with our expectations relative to master? In any case, please add a test.
Sure thing @jnothman, I can get to adding a test this week. I'll ping you once it's added 👍
Superseded by #23371
Reference Issues/PRs
Fixes #15836
What does this implement/fix? Explain your changes.
Ensures that `sample_weight` is applied to the output of the `loss_function` when determining inliers/outliers at each iteration.
Any other comments?
I'm not too sure of the best way to add a non-regression test for this behavior. I originally tried fitting two RANSAC estimators with/without weights and then compared their output coefficients (which were indeed different), but there was no way to know whether the difference came from `sample_weight` being used during the underlying estimators' `.fit()` or from `sample_weight` being applied to the output of the `loss_function`. Would it be useful to run an example with/without weights, checking the resulting `.inlier_mask_` (or something similar), and then write a test that asserts the `.inlier_mask_` is the same?
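One invariance mentioned earlier in the thread can be checked directly: fitting with uniform weights should reproduce the unweighted fit exactly, since neither the base estimator's solution nor the residual threshold should change. A sketch of such a check (the data and variable names are illustrative, not taken from the PR):

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(100, 1))
y = 3.0 * X.ravel() + rng.normal(scale=0.05, size=100)
y[:5] += 10.0  # inject a few gross outliers

# same random_state so both fits draw identical random subsets
fit_uniform = RANSACRegressor(random_state=0).fit(
    X, y, sample_weight=np.ones(len(y)))
fit_unweighted = RANSACRegressor(random_state=0).fit(X, y)

# uniform weights must be indistinguishable from no weights at all
assert np.array_equal(fit_uniform.inlier_mask_, fit_unweighted.inlier_mask_)
```

A test along these lines would catch a weighting bug that breaks the uniform-weight invariance, though it cannot by itself distinguish weights applied in the base estimator's `.fit()` from weights applied to the loss output.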