[MRG] Fix pass sample weights to final estimator #15773
Conversation
…nd RadiusNeighborsRegressor
This requires tests, and the loss functions should probably also be weighted.

I just fixed the line length and pushed to my repository; how do I rerun the tests?

I will look into the issue you sent me.

@jnothman, before you said just passing a custom base_estimator that required a sample_weight was enough. When I create this new test, as long as I'm passing the dummy estimator that requires the sample_weight and it runs with no problem, is no specific assert statement really necessary?
Now I'm not sure that the test I proposed before is sufficient. It would be sufficient to test non-regression, but not sufficient to test correct handling of weights.

How do you propose testing correct handling of weights?
```python
def test_ransac_base_estimator_fit_sample_weight():
    class DummyLinearRegression(LinearRegression):
```
This estimator is exactly a LinearRegression then, so we could use LinearRegression directly.
The sample_weight is optional in the original LinearRegression; the point of the dummy is to make it required in the call. The old _ransac.py code breaks, as it should, under this test.
OK I see. I think the test below is better since we check the fitted model.
```python
    base_estimator = DummyLinearRegression()
    ransac_estimator = RANSACRegressor(base_estimator, random_state=0)
    n_samples = y.shape[0]
    weights = np.ones(n_samples)
```
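A hedged sketch of how such a non-regression test could look end to end (the dummy class, data, and names here are illustrative, not the PR's exact test):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor


class DummyLinearRegression(LinearRegression):
    """Illustrative estimator whose fit() makes sample_weight mandatory."""

    def fit(self, X, y, sample_weight):
        # sample_weight is deliberately a required argument: the pre-fix
        # RANSAC code, which refit the final model without weights, raises here.
        return super().fit(X, y, sample_weight=sample_weight)


rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.randn(100)

ransac_estimator = RANSACRegressor(DummyLinearRegression(), random_state=0)
# With the fix, weights reach every fit() call, including the final refit.
ransac_estimator.fit(X, y, sample_weight=np.ones(y.shape[0]))
```

Running through without a TypeError is the non-regression part; as discussed above, it does not by itself prove the weights were used correctly.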
I don't think that the test is actually testing anything. We should make sure that passing non-unit weights leads to a final model trained on non-unit weights. One way to do that is to pass a sample_weight with non-unit weights and be sure that some of these weights will be used. Then we can use ransac_estimator.inlier_mask_ to train a model which should give the same results as the fitted model in ransac.
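One possible concretization of this suggestion (assumed data and synthetic weights; not the PR's final test): fit RANSAC with non-unit weights, then refit a plain LinearRegression on the reported inliers with the matching weights and compare coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

rng = np.random.RandomState(42)
X = rng.randn(200, 3)
y = X @ np.array([2.0, -1.0, 0.5]) + 0.05 * rng.randn(200)

# Non-unit weights, so an unweighted final fit would give different coef_.
sample_weight = rng.randint(1, 4, size=y.shape[0]).astype(float)

ransac_estimator = RANSACRegressor(LinearRegression(), random_state=0)
ransac_estimator.fit(X, y, sample_weight=sample_weight)

# Refit on the selected inliers with the same weights.
inliers = ransac_estimator.inlier_mask_
ref = LinearRegression().fit(
    X[inliers], y[inliers], sample_weight=sample_weight[inliers]
)

# If the final model was trained with weights, the coefficients coincide.
assert np.allclose(ransac_estimator.estimator_.coef_, ref.coef_)
```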
As in, where is it?

Found it. Do I just add it to the last version's file?

What do you mean?
doc/whats_new/v0.23.rst (Outdated)

```rst
:mod:`sklearn.linear_model`
...........................

- |Fix| Fixed a bug that made :class:`linear_model.RANSACRegressor` fail when
```
```diff
-- |Fix| Fixed a bug that made :class:`linear_model.RANSACRegressor` fail when
+- |Fix| Fixed a bug that made :class:`linear_model.RANSACRegressor` failed when
```
If we remove all the extra info around RANSACRegressor, the original reads:

> Fixed a bug that made RANSACRegressor fail when

The suggestion reads:

> Fixed a bug that made RANSACRegressor failed when

The original grammar is correct. Are you saying this is just the standard? Should I change something else?
Sorry, my mistake.
The bug is not about failing if the estimator requires weights. It's that weights should have been passed in any case.

> "Fixed a bug where sample_weight were not used when fitting the final estimator ..."
Ah, right. I corrected it.
Co-Authored-By: Guillaume Lemaitre <[email protected]>
jnothman mentions that the loss functions within RANSACRegressor.fit() should probably be weighted. Do you have a suggestion for how I should go about it?

I see. I would suggest first merging this PR and then opening another one to solve the issue with the loss.
To be more specific, it would be a diff around this:

```diff
diff --git a/sklearn/linear_model/_ransac.py b/sklearn/linear_model/_ransac.py
index 40ebb3a084..50a9fecf29 100644
--- a/sklearn/linear_model/_ransac.py
+++ b/sklearn/linear_model/_ransac.py
@@ -283,20 +283,22 @@ class RANSACRegressor(MetaEstimatorMixin, RegressorMixin,
             residual_threshold = self.residual_threshold

         if self.loss == "absolute_loss":
-            if y.ndim == 1:
-                loss_function = lambda y_true, y_pred: np.abs(y_true - y_pred)
-            else:
-                loss_function = lambda \
-                    y_true, y_pred: np.sum(np.abs(y_true - y_pred), axis=1)
+
+            def loss_function(y_true, y_pred, sample_weight=None):
+                sample_weight = np.ones(y_true.shape)
+                error = np.abs(sample_weight * (y_true - y_pred))
+                return error if y.ndim == 1 else np.sum(error, axis=1)
+
         elif self.loss == "squared_loss":
-            if y.ndim == 1:
-                loss_function = lambda y_true, y_pred: (y_true - y_pred) ** 2
-            else:
-                loss_function = lambda \
-                    y_true, y_pred: np.sum((y_true - y_pred) ** 2, axis=1)
+
+            def loss_function(y_true, y_pred, sample_weight=None):
+                sample_weight = np.ones(y_true.shape)
+                error = sample_weight * ((y_true - y_pred) ** 2)
+                return error if y.ndim == 1 else np.sum(error, axis=1)
+
         elif callable(self.loss):
+            # FIXME: check that self.loss has `sample_weight` parameters if
+            # it is sample_weight is not None
             loss_function = self.loss

         else:
@@ -373,7 +375,12 @@ class RANSACRegressor(MetaEstimatorMixin, RegressorMixin,

             # residuals of all data for current random sample model
             y_pred = base_estimator.predict(X)
-            residuals_subset = loss_function(y, y_pred)
+            if sample_weight is None:
+                residuals_subset = loss_function(y, y_pred)
+            else:
+                residuals_subset = loss_function(
+                    y, y_pred, sample_weight=sample_weight
+                )

             # classify data into inliers and outliers
             inlier_mask_subset = residuals_subset < residual_threshold
```

I would need to think a bit more regarding the testing. But in some way, we want to check some model equivalence or differences.
Do we automatically raise a ValueError here if both conditions aren't met? Also, isn't this statement automatically wiping out whatever value sample_weight had?:

Should it be this?:
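For illustration only (this is our own sketch, not the code under review): the conventional pattern is to substitute unit weights only when the caller passed none, rather than overwriting sample_weight unconditionally.

```python
import numpy as np


def absolute_loss(y_true, y_pred, sample_weight=None):
    # Fall back to unit weights only when none were provided,
    # so caller-supplied weights are not silently discarded.
    if sample_weight is None:
        sample_weight = np.ones(y_true.shape)
    error = np.abs(sample_weight * (y_true - y_pred))
    return error if y_true.ndim == 1 else np.sum(error, axis=1)


y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])
unit = absolute_loss(y_true, y_pred)                        # 0.5, 0.0, 1.0
scaled = absolute_loss(y_true, y_pred, np.array([2.0, 1.0, 1.0]))  # 1.0, 0.0, 1.0
```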
Also, is this pull request ready?
LGTM
@jnothman do you want to have a look at it? Basically, I would be interested to know if we should add
Yes, I made a mistake. Basically this is just to give an idea of what to do :)

I think that this is fine if the loss as defined handles sample_weight but a user does not give one. But again, this is along these lines. What will be important is to have some proper tests.
Otherwise this LGTM, but yes, we need an issue re the loss functions.
@jnothman, I fixed the what's new entry.

Please resolve conflicts, ensuring the change log remains in sorted order.

Thanks @J-A16

@J-A16 Thanks for your efforts. Do you want to make the next PR to include
@glemaitre, you mentioned the tests should check for equivalences or differences, so perhaps I should implement the different loss functions in the tests, instantiate different RANSACRegressor objects, each with its own loss function, and assert that the results are the same for corresponding loss functions?
I was thinking about checking that
It could be a start. |
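One simple property-based check along these lines (our assumption about a possible starting point, not the elided suggestion itself): with integer weights, a weighted per-sample loss summed over the data should equal the unweighted loss on a dataset where each sample is repeated sample_weight times.

```python
import numpy as np


def squared_loss(y_true, y_pred, sample_weight=None):
    # Per-sample squared error, optionally scaled by sample_weight.
    if sample_weight is None:
        sample_weight = np.ones(y_true.shape)
    return sample_weight * (y_true - y_pred) ** 2


y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([0.0, 2.5, 3.0])
w = np.array([2, 1, 3])

weighted_total = squared_loss(y_true, y_pred, w).sum()
repeated_total = squared_loss(np.repeat(y_true, w), np.repeat(y_pred, w)).sum()
assert np.isclose(weighted_total, repeated_total)  # both equal 2.25
```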
Reference Issues/PRs
Fixes #13425
What does this implement/fix? Explain your changes.
RANSACRegressor now passes sample_weight through to the wrapped estimator when training the final model.
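A minimal usage sketch of the fixed behavior (synthetic data, illustrative only):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

rng = np.random.RandomState(0)
X = rng.randn(50, 1)
y = 3.0 * X.ravel() + 0.1 * rng.randn(50)

reg = RANSACRegressor(LinearRegression(), random_state=0)
# sample_weight now reaches the final refit on the inliers as well.
reg.fit(X, y, sample_weight=np.full(y.shape[0], 2.0))
print(reg.estimator_.coef_)  # close to the true slope of 3.0
```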
Any other comments?