[MRG] Deprecate residual_metric and add support for loss in RANSAC #5497

Merged
MechCoder merged 1 commit into scikit-learn:master on Jan 15, 2016

Conversation

MechCoder
Member

Partly fixes #4740

Supplying arbitrary residual metrics for 1-D targets was not possible.

from sklearn.linear_model import RANSACRegressor
from sklearn.datasets import make_regression
X, y = make_regression()
res_met = lambda dy: dy ** 2
ransac = RANSACRegressor(min_samples=5, residual_metric=res_met)
ransac.fit(X, y)
# Raises IndexError: too many indices for array

The workaround was to explicitly define res_met to accept 2-D arrays (since a reshape is done internally), which is non-obvious.

import numpy as np
res_met = lambda dy: np.sum(dy**2, axis=1)
ransac = RANSACRegressor(min_samples=5, residual_metric=res_met)
ransac.fit(X, y)
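
A NumPy-only sketch of why the first callable fails while the workaround succeeds, assuming (as described above) that the residuals of a 1-D target are reshaped to a column before residual_metric is called. The mask-indexing step is an assumption about where the IndexError comes from, not a quote from the scikit-learn source; `sample_idxs` and `threshold` are illustrative names.

```python
import numpy as np

# Residuals for a 1-D target, reshaped to a column as described above.
diff = np.array([0.5, -2.0, 1.0]).reshape(-1, 1)
threshold = 1.5                          # illustrative value
sample_idxs = np.arange(diff.shape[0])   # illustrative 1-D index array

naive = lambda dy: dy ** 2                        # keeps the (n, 1) shape
workaround = lambda dy: np.sum(dy ** 2, axis=1)   # reduces to shape (n,)

mask_2d = naive(diff) < threshold        # boolean mask of shape (3, 1)
mask_1d = workaround(diff) < threshold   # boolean mask of shape (3,)

# A 2-D boolean mask cannot index a 1-D array.
try:
    sample_idxs[mask_2d]
    naive_error = None
except IndexError as exc:
    naive_error = exc

# The 1-D mask selects the samples whose squared residual is below threshold.
inliers = sample_idxs[mask_1d]
```

The reduction over axis=1 is what turns the column of residuals back into one value per sample.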

@MechCoder
Member Author

ping @amueller

In hindsight, I think something like loss_function, which takes y_true and y_pred as input and returns the loss, would have been better, since that way I can use the functions in sklearn.metrics directly.

@MechCoder
Member Author

Tests should fail because of this (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/tests/test_ransac.py#L296). I find it odd that the residual metric for a 1-D target expects a 2-D array.

I'm not sure what the cleanest way to proceed is, except for deprecating the behavior.

@MechCoder
Member Author

ping @ahojnnes. Would be great to have your input.

@agramfort
Member

test?

@MechCoder
Member Author

I can add a test once we decide what to do with the current behavior.

Currently, residual_metric accepts a 2-D array and returns a 1-D array, even for single-output y.

@agramfort
Member

no opinion ...

@MechCoder
Member Author

deprecate?

@agramfort
Member

agramfort commented Oct 21, 2015 via email

@MechCoder
Member Author

See: #2025

@agramfort
Member

agramfort commented Oct 21, 2015 via email

@MechCoder
Member Author

also pinging @jnothman and @arjoly as they reviewed the earlier PR

@amueller
Member

@MechCoder travis is unhappy

@@ -177,6 +183,15 @@ def __init__(self, base_estimator=None, min_samples=None,
self.residual_metric = residual_metric
self.random_state = random_state

def _residual_metric(residual):
Member

You don't need this.

Member

I mean you don't need to make a method out of this.

@arjoly
Member

arjoly commented Oct 23, 2015

Can you add tests?

@MechCoder
Member Author

@amueller Tests are failing because I do not know what to do with the existing behavior for 1-D array.

The parameter residual_metric accepts a callable that takes a 2-D array and returns a 1-D array.
This holds for both 1-D and 2-D targets.

For example, to support arbitrary residual metrics for a 1-D target right now, I need to do this:

res_met = lambda dy: np.sum(dy**2, axis=1)
ransac = RANSACRegressor(min_samples=5, residual_metric=res_met)
ransac.fit(X, y)

which is unusual, don't you think?

@amueller
Member

@agramfort @arjoly IRL @MechCoder and I just discussed the current behavior. We thought it might be more scikit-learn style to pass a scorer instead of the residual_metric. That would allow reuse of the functions in the metrics module.
We could deprecate residual_metric and introduce scoring. That makes writing your own one slightly harder, though.
Alternatively, we could pass a score_func which is just one of the metrics functions. That would mean we can't use strings, though. Wdyt?

@agramfort
Member

agramfort commented Oct 30, 2015 via email

@MechCoder
Member Author

@agramfort

# Previous
import numpy as np
from sklearn.linear_model import RANSACRegressor
# residual_metric must return one value per sample, so reduce over axis=1
res_metric = lambda dy: np.sum(np.abs(dy), axis=1)
ransac = RANSACRegressor(residual_metric=res_metric)
ransac.fit(X, y)

# Suggested approach 1 (proposed API)
from sklearn.metrics import mean_absolute_error
ransac = RANSACRegressor(scoring=mean_absolute_error)
ransac.fit(X, y)

# Suggested approach 2 (proposed API)
from sklearn.metrics import make_scorer
scorer = make_scorer(mean_absolute_error, greater_is_better=False)
ransac = RANSACRegressor(scoring=scorer)
ransac.fit(X, y)
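
To make the difference between the two suggestions concrete, here is a small sketch of what each object returns when called on its own. It does not pass anything to RANSACRegressor, since the `scoring` parameter above was only a proposal in this thread. The data and the LinearRegression model are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_absolute_error

# Tiny illustrative dataset: a line with one perturbed point.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
y[0] += 5.0

model = LinearRegression().fit(X, y)

# Approach 1 style: a plain metric function called as metric(y_true, y_pred).
mae = mean_absolute_error(y, model.predict(X))

# Approach 2 style: a scorer called as scorer(estimator, X, y).
# greater_is_better=False flips the sign so that larger means better.
scorer = make_scorer(mean_absolute_error, greater_is_better=False)
score = scorer(model, X, y)
```

Both calls return a single number for the whole dataset (`score` is the negated `mae`), not one value per sample.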

@agramfort
Member

agramfort commented Oct 31, 2015 via email

@MechCoder
Member Author

Then, how do you suggest we provide string inputs?

@agramfort
Member

agramfort commented Oct 31, 2015 via email

@MechCoder
Member Author

sounds good to me as well.

@amueller
Member

amueller commented Nov 2, 2015

I'm happy to use callables that take y_true and y_pred. It's just that I don't think we use this kind of function as an option anywhere else. But I don't have a strong opinion.
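
A minimal NumPy sketch of what callables taking y_true and y_pred could look like if they return one loss value per sample for both 1-D and 2-D targets. The function names and the choice to sum over outputs are illustrative assumptions, not the code merged in this PR.

```python
import numpy as np

def absolute_loss(y_true, y_pred):
    """Per-sample absolute error; sums over outputs for 2-D targets."""
    err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return err if err.ndim == 1 else err.sum(axis=1)

def squared_loss(y_true, y_pred):
    """Per-sample squared error; sums over outputs for 2-D targets."""
    err = (np.asarray(y_true) - np.asarray(y_pred)) ** 2
    return err if err.ndim == 1 else err.sum(axis=1)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 1.0])

# 1-D target: no reshape or axis argument needed from the caller.
abs_1d = absolute_loss(y_true, y_pred)

# 2-D target with two identical outputs: still one value per sample.
sq_2d = squared_loss(np.column_stack([y_true, y_true]),
                     np.column_stack([y_pred, y_pred]))
```

With this signature the caller never has to know about an internal reshape, which was the confusing part of residual_metric.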

MechCoder changed the title from "[MRG] Supply arbitrary residual_metrics to RANSAC for 1-D targets" to "[MRG] Deprecate residual_metric and add support for loss in RANSAC" on Nov 3, 2015
@MechCoder
Member Author

@amueller @agramfort I've made changes. Please review.
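
A hedged usage sketch of the `loss` parameter this PR introduces, passing a callable that takes y_true and y_pred and returns one loss value per sample. It passes a callable rather than a string because the accepted string names may differ between scikit-learn releases. The dataset settings and the `abs_loss` helper are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RANSACRegressor

# Small, well-conditioned problem so that 5 samples suffice for a fit.
X, y = make_regression(n_samples=100, n_features=2, noise=1.0,
                       random_state=0)

def abs_loss(y_true, y_pred):
    # Hypothetical helper: per-sample absolute error for a 1-D target.
    return np.abs(y_true - y_pred)

ransac = RANSACRegressor(min_samples=5, loss=abs_loss, random_state=0)
ransac.fit(X, y)

# Boolean mask with one entry per training sample.
n_inliers = int(ransac.inlier_mask_.sum())
```

Compared with the residual_metric workaround, the callable here works directly on the 1-D arrays and needs no axis=1 reduction.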

@agramfort
Member

that's fine with me. I'll let @amueller validate.

@MechCoder
Member Author

@amueller I can haz reviews?

else:
residual_metric = self.residual_metric
raise ValueError(
"loss should be 'absolute_loss', 'squared_loss' or a callable."
Contributor

Missing space in the string at the end. Capitalize "Got".

@ahojnnes
Contributor

Left you two comments, otherwise LGTM.

@MechCoder
Member Author

Two +1's from @agramfort and @ahojnnes. Will merge when Travis passes.

MechCoder added a commit that referenced this pull request on Jan 15, 2016:
[MRG] Deprecate residual_metric and add support for loss in RANSAC
MechCoder merged commit ac2ff4a into scikit-learn:master on Jan 15, 2016
MechCoder deleted the ransac_residual branch on January 15, 2016 at 03:36
Successfully merging this pull request may close these issues.

RANSACRegressor residual_metric has a weird docstring
5 participants