[MRG+1] Add new regression metric - Mean Squared Log Error #7655
Changes from all commits
e00d9b3
0fa24ab
35de499
62f1317
@@ -14,6 +14,7 @@
 # Jochen Wersdorfer <[email protected]>
 # Lars Buitinck
 # Joel Nothman <[email protected]>
+# Karan Desai <[email protected]>
 # Noel Dawe <[email protected]>
 # Manoj Kumar <[email protected]>
 # Michael Eickenberg <[email protected]>
@@ -33,6 +34,7 @@
 __ALL__ = [
     "mean_absolute_error",
     "mean_squared_error",
+    "mean_squared_log_error",
     "median_absolute_error",
     "r2_score",
     "explained_variance_score"
@@ -241,6 +243,73 @@ def mean_squared_error(y_true, y_pred,
     return np.average(output_errors, weights=multioutput)
 
 
+def mean_squared_log_error(y_true, y_pred,
+                           sample_weight=None,
+                           multioutput='uniform_average'):
+    """Mean squared logarithmic error regression loss
+
+    Read more in the :ref:`User Guide <mean_squared_log_error>`.
+
+    Parameters
+    ----------
+    y_true : array-like of shape = (n_samples) or (n_samples, n_outputs)
Member (on the `y_true` docstring line above):
nitpick: you should write
+        Ground truth (correct) target values.
+
+    y_pred : array-like of shape = (n_samples) or (n_samples, n_outputs)
+        Estimated target values.
+
+    sample_weight : array-like of shape = (n_samples), optional
+        Sample weights.
+
+    multioutput : string in ['raw_values', 'uniform_average'] \
+            or array-like of shape = (n_outputs)
+        Defines aggregating of multiple output values.
+        Array-like value defines weights used to average errors.
+
+        'raw_values' :
+            Returns a full set of errors when the input is of multioutput
+            format.
+
+        'uniform_average' :
+            Errors of all outputs are averaged with uniform weight.
+
+    Returns
+    -------
+    loss : float or ndarray of floats
+        A non-negative floating point value (the best value is 0.0), or an
+        array of floating point values, one for each individual target.
+
+    Examples
+    --------
+    >>> from sklearn.metrics import mean_squared_log_error
+    >>> y_true = [3, 5, 2.5, 7]
+    >>> y_pred = [2.5, 5, 4, 8]
+    >>> mean_squared_log_error(y_true, y_pred)  # doctest: +ELLIPSIS
+    0.039...
+    >>> y_true = [[0.5, 1], [1, 2], [7, 6]]
+    >>> y_pred = [[0.5, 2], [1, 2.5], [8, 8]]
+    >>> mean_squared_log_error(y_true, y_pred)  # doctest: +ELLIPSIS
+    0.044...
+    >>> mean_squared_log_error(y_true, y_pred, multioutput='raw_values')
+    ... # doctest: +ELLIPSIS
+    array([ 0.004...,  0.083...])
+    >>> mean_squared_log_error(y_true, y_pred, multioutput=[0.3, 0.7])
+    ... # doctest: +ELLIPSIS
+    0.060...
+
+    """
+    y_type, y_true, y_pred, multioutput = _check_reg_targets(
+        y_true, y_pred, multioutput)
+
+    if not (y_true >= 0).all() or not (y_pred >= 0).all():
Member:
It can be used with anything > -1, right?

Author:
@amueller It can be, but

Author:
Additionally, I just recalled that I read somewhere this metric is used for positive values only; still, there is

Member:
alright.

Member:
Yes, my reading of the equation agrees that it's designed for non-negative values with an exponential trend.
+        raise ValueError("Mean Squared Logarithmic Error cannot be used when "
+                         "targets contain negative values.")
+
+    return mean_squared_error(np.log(y_true + 1), np.log(y_pred + 1),
+                              sample_weight, multioutput)
+
+
 def median_absolute_error(y_true, y_pred):
     """Median absolute error regression loss
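For readers following the review, the added function boils down to an ordinary mean squared error computed on log-transformed targets. A minimal standalone sketch (assuming 1-D, non-negative inputs; the version in the diff additionally handles multioutput arrays and sample weights through `_check_reg_targets` and `mean_squared_error`):

```python
import numpy as np

def msle(y_true, y_pred):
    """Mean squared logarithmic error for 1-D non-negative targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if (y_true < 0).any() or (y_pred < 0).any():
        raise ValueError("targets contain negative values")
    # np.log1p(x) computes log(x + 1), matching the diff's np.log(y + 1)
    # but with better accuracy near zero
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

print(round(msle([3, 5, 2.5, 7], [2.5, 5, 4, 8]), 5))  # 0.03973
```

The printed value agrees with the `0.039...` doctest in the proposed docstring.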
Member:
Kaggle's note that "RMSLE penalizes an under-predicted estimate greater than an over-predicted estimate" may be valuable here.
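That asymmetry is easy to check numerically. In this sketch, a prediction 2 below a true value of 10 incurs a larger squared-log error than a prediction 2 above it (the example values here are illustrative, not from the PR):

```python
import numpy as np

def sq_log_err(t, p):
    """Squared logarithmic error for a single (true, predicted) pair."""
    return (np.log1p(t) - np.log1p(p)) ** 2

under = sq_log_err(10, 8)   # under-prediction by 2
over = sq_log_err(10, 12)   # over-prediction by 2
print(under > over)  # True: under-prediction is penalized more
```

This follows from the concavity of the logarithm: moving the same absolute distance below the true value changes `log1p` by more than moving the same distance above it.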