[MRG] BUG apply sample weights to RANSAC loss functions #15952

Closed

jcusick13
Contributor

Reference Issues/PRs

Fixes #15836

What does this implement/fix? Explain your changes.

Ensures that sample_weight is applied to the output of the loss_function when determining inliers/outliers at each iteration.

Any other comments?

I'm not too sure of the best way to add a non-regression test for this behavior. I originally tried fitting two RANSAC estimators with/without weights and then compared their output coefficients (which were indeed different) but there was no way to know that the difference was from sample_weight being used during the underlying estimators' .fit() or if it was because of sample_weight being applied to the output of the loss_function.

Would it be useful to run an example with/without weights, checking the resulting .inlier_mask_ (or something similar), and then writing a test that asserts the .inlier_mask_ is the same?
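
One possible way to separate the two effects, sketched here under stated assumptions, is to use a stub estimator whose fit ignores sample_weight, so that any change in the inlier mask can only come from weighting the loss. The names MeanStub and inlier_mask are hypothetical and not part of scikit-learn; the weighting rule (loss multiplied by weight, compared against a threshold) mirrors what this PR proposes and is not a statement about what RANSACRegressor currently does.

```python
class MeanStub:
    """Hypothetical stub regressor: fit() accepts but ignores sample_weight."""

    def fit(self, X, y, sample_weight=None):
        # Deliberately unweighted, so the weights cannot influence the model.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def inlier_mask(model, X, y, threshold, sample_weight=None):
    """Flag samples whose (optionally weighted) absolute loss is below threshold."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    y_pred = model.predict(X)
    losses = [abs(t - p) * w for t, p, w in zip(y, y_pred, sample_weight)]
    return [loss < threshold for loss in losses]


X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 1.0, 1.0, 5.0]
weights = [1.0, 1.0, 1.0, 0.1]

# The fitted model is identical with or without weights (mean_ == 2.0).
model = MeanStub().fit(X, y, sample_weight=weights)

mask_unweighted = inlier_mask(model, X, y, threshold=1.5)
mask_weighted = inlier_mask(model, X, y, threshold=1.5, sample_weight=weights)

# Any difference between the masks is attributable to the loss weighting alone.
print(mask_unweighted)  # [True, True, True, False]
print(mask_weighted)    # [True, True, True, True]
```

Because the stub's fit is weight-independent, a test built this way would not need to guess which code path caused a change in the result.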

@@ -373,7 +373,10 @@ def fit(self, X, y, sample_weight=None):

# residuals of all data for current random sample model
y_pred = base_estimator.predict(X)
residuals_subset = loss_function(y, y_pred)
if sample_weight is None:
Member

Could you use _check_sample_weight from sklearn.utils.validation? It would give something like:

sample_weight = _check_sample_weight(sample_weight, X)
residuals_subset = loss_function(y, y_pred) * sample_weight
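
As a rough illustration of why this pattern removes the `if sample_weight is None` branch, here is a minimal pure-Python sketch. check_weights is a simplified, hypothetical stand-in for a sample-weight validator (None becomes unit weights, a scalar is broadcast, a length mismatch raises); the real scikit-learn helper may differ in its details. absolute_loss is likewise an assumed example loss.

```python
from numbers import Number


def check_weights(sample_weight, n_samples):
    """Simplified, hypothetical stand-in for a sample-weight validator."""
    if sample_weight is None:
        # No weights given: every sample counts equally.
        return [1.0] * n_samples
    if isinstance(sample_weight, Number):
        # A scalar weight is broadcast to all samples.
        return [float(sample_weight)] * n_samples
    weights = [float(w) for w in sample_weight]
    if len(weights) != n_samples:
        raise ValueError(
            f"sample_weight has {len(weights)} entries, expected {n_samples}"
        )
    return weights


def absolute_loss(y_true, y_pred):
    """Per-sample absolute error (assumed example loss)."""
    return [abs(t - p) for t, p in zip(y_true, y_pred)]


y = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 5.0]

# With None, the multiplication is a no-op, so no special case is needed.
unit = check_weights(None, len(y))
residuals_subset = [r * w for r, w in zip(absolute_loss(y, y_pred), unit)]

# With explicit weights, each per-sample loss is scaled elementwise.
custom = check_weights([2.0, 1.0, 0.5], len(y))
weighted = [r * w for r, w in zip(absolute_loss(y, y_pred), custom)]
```

Normalising the weights once up front keeps the residual computation a single expression regardless of whether the caller passed weights.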

@glemaitre glemaitre changed the title from "[MRG] Apply sample weights to RANSAC loss functions" to "[MRG] BUG apply sample weights to RANSAC loss functions" on Dec 23, 2019
@glemaitre
Member

Thinking a bit more about sample_weight in RANSAC, I find something counter-intuitive. The semantics of the weights seem to be the inverse of those in other algorithms. A low weight on an outlier makes that point more likely to be picked as an inlier, because its weighted loss will be lower than that of an inlier with a high weight.

My intuition is that this differs from other models, where we minimize the loss over all samples and therefore try hardest to reduce the error on the points with high weights. Here, samples with high weights are simply more likely to be discarded as outliers.

@jnothman Does what I say make sense (or is it too late here :))?
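
To make the concern above concrete, here is a small sketch with made-up numbers. It assumes the rule "inlier if loss times weight is below the threshold"; the function name weighted_inliers and all values are hypothetical.

```python
def weighted_inliers(residuals, sample_weight, threshold):
    """Inlier if the weighted residual falls below the threshold (assumed rule)."""
    return [r * w < threshold for r, w in zip(residuals, sample_weight)]


# Sample 0 fits well (small residual) and is given a high weight.
# Sample 1 fits badly (large residual) and is given a low weight.
residuals = [0.5, 4.0]
sample_weight = [10.0, 0.1]
threshold = 1.0

plain = [r < threshold for r in residuals]
weighted = weighted_inliers(residuals, sample_weight, threshold)

print(plain)     # [True, False]
print(weighted)  # [False, True]
```

Under this rule the well-fitting, highly weighted sample is rejected and the poorly fitting, lightly weighted sample is accepted, which is the reversal described in the comment.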

@jnothman
Member

jnothman commented Dec 24, 2019 via email

@jcusick13
Contributor Author

I just updated the function to use the syntax suggested by @glemaitre. Although my knowledge of sklearn is not as thorough, I think @glemaitre's point makes sense. Perhaps it would be more consistent to do something like the below?

residuals_subset = loss_function(y, y_pred) * (1 - sample_weight)
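
A quick arithmetic check of this alternative, using made-up values, may help when weighing it. The sketch below only evaluates the expression above for a few weights; inverted_weight_loss is a hypothetical name, and the observations hold only for this exact formula.

```python
def inverted_weight_loss(residuals, sample_weight):
    """Evaluate residual * (1 - weight) per sample, as in the proposal above."""
    return [r * (1.0 - w) for r, w in zip(residuals, sample_weight)]


residuals = [2.0, 2.0, 2.0, 2.0]
sample_weight = [0.0, 0.5, 1.0, 2.0]

losses = inverted_weight_loss(residuals, sample_weight)
print(losses)  # [2.0, 1.0, 0.0, -2.0]

# Unit weights (a common default when no weights are passed) zero out every loss.
default_losses = inverted_weight_loss(residuals, [1.0] * len(residuals))
print(default_losses)  # [0.0, 0.0, 0.0, 0.0]
```

With this formula a weight of 1 removes the residual entirely and any weight above 1 yields a negative loss, so it seems to presuppose weights scaled to the range [0, 1) rather than arbitrary non-negative weights.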

Member

@jnothman jnothman left a comment

@jnothman
Member

A non-regression test might check for specific values that we have verified conform to our expectations relative to master?

In any case, please add a |Fix| entry to the change log at doc/whats_new/v0.23.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@jcusick13
Contributor Author

Sure thing @jnothman, I can get to adding a test this week. I'll ping you once it's added 👍

@glemaitre
Member

Superseded by #23371

@glemaitre glemaitre closed this May 17, 2022
@cmarmo cmarmo added the "Superseded" label (PR has been replaced by a newer PR) and removed the "Stalled" and "help wanted" labels on Aug 3, 2022
Labels: Enhancement, module:linear_model, Superseded (PR has been replaced by a newer PR)
Successfully merging this pull request may close these issues.

RANSACRegressor should use weighted loss functions
5 participants