[MRG] BUG apply sample weights to RANSAC loss functions #15952
Conversation
sklearn/linear_model/_ransac.py
Outdated
@@ -373,7 +373,10 @@ def fit(self, X, y, sample_weight=None):

            # residuals of all data for current random sample model
            y_pred = base_estimator.predict(X)
            residuals_subset = loss_function(y, y_pred)
            if sample_weight is None:
Could you use `sklearn.utils._check_sample_weight`? It will give something like:

    sample_weight = _check_sample_weight(sample_weight, X)
    residuals_subset = loss_function(y, y_pred) * sample_weight
Thinking a bit more about it, I have the intuition that this is different from other models, where we try to reduce the loss taking all samples into account, meaning we try to reduce the error of the points with high weights. Here the high-weight samples will only be discarded and recognized as outliers. @jnothman, does what I say make sense (or is it too late here :))?
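The suggested change reads, in runnable form, roughly as follows. This is a sketch only: `weighted_residuals` and the absolute loss are illustrative stand-ins for RANSAC's internals, not code from the PR.

```python
import numpy as np
from sklearn.utils.validation import _check_sample_weight

def weighted_residuals(y_true, y_pred, X, sample_weight=None):
    """Per-sample absolute loss scaled by validated sample weights."""
    # stand-in for RANSAC's per-sample loss; the PR multiplies the loss
    # output by sample_weight before thresholding inliers/outliers
    residuals = np.abs(y_true - y_pred)
    sample_weight = _check_sample_weight(sample_weight, X)  # ones if None
    return residuals * sample_weight

X = np.zeros((4, 1))  # only used here for length validation
res = weighted_residuals(np.array([1.0, 2.0, 3.0, 4.0]),
                         np.array([1.0, 1.0, 1.0, 1.0]),
                         X, sample_weight=np.array([1.0, 2.0, 1.0, 0.5]))
# res == [0., 2., 2., 1.5]
```

With `sample_weight=None` the helper falls back to uniform weights, so the unweighted behavior is unchanged.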
Hmmmm... Might have to think more about this when I have headspace to do so.

I think that low weight on an outlier should mean it's more outlying... If we ensure that the invariances we have previously discussed are upheld, then this should be the case. But it might be hard to uphold them, since aspects of this algorithm are discrete?
I just updated the function to use the syntax suggested by @glemaitre. Though I don't have as thorough a knowledge of sklearn, I think @glemaitre's point makes sense. Perhaps it would be more consistent to do something like the below?
A non-regression test might check for specific values, which we've tested conform with our expectations relative to master? In any case, please add a test.
Sure thing @jnothman, I can get to adding a test this week. I'll ping you once it's added 👍
Superseded by #23371
Reference Issues/PRs
Fixes #15836
What does this implement/fix? Explain your changes.
Ensures that `sample_weight` is applied to the output of the `loss_function` when determining inliers/outliers at each iteration.
Any other comments?
I'm not too sure of the best way to add a non-regression test for this behavior. I originally tried fitting two RANSAC estimators with/without weights and then compared their output coefficients (which were indeed different), but there was no way to know whether the difference came from `sample_weight` being used during the underlying estimators' `.fit()` or from `sample_weight` being applied to the output of the `loss_function`. Would it be useful to run an example with/without weights, checking the resulting `.inlier_mask_` (or something similar), and then write a test that asserts the `.inlier_mask_` is the same?
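One invariance mentioned earlier in the thread can be checked directly: fitting with uniform weights should reproduce the unweighted fit exactly, since neither the base estimator's solution nor the residual threshold should change. A sketch of such a check (the data and variable names are illustrative, not taken from the PR):

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(100, 1))
y = 3.0 * X.ravel() + rng.normal(scale=0.05, size=100)
y[:5] += 10.0  # inject a few gross outliers

# same random_state so both fits draw identical random subsets
fit_uniform = RANSACRegressor(random_state=0).fit(
    X, y, sample_weight=np.ones(len(y)))
fit_unweighted = RANSACRegressor(random_state=0).fit(X, y)

# uniform weights must be indistinguishable from no weights at all
assert np.array_equal(fit_uniform.inlier_mask_, fit_unweighted.inlier_mask_)
```

A test along these lines would catch a weighting bug that breaks the uniform-weight invariance, though it cannot by itself distinguish weights applied in the base estimator's `.fit()` from weights applied to the loss output.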