[MRG] Enhancement: Add MAPE as an evaluation metric #10711
Changes from all commits
@@ -87,6 +87,7 @@ Scoring Function
 'explained_variance'           :func:`metrics.explained_variance_score`
 'max_error'                    :func:`metrics.max_error`
 'neg_mean_absolute_error'      :func:`metrics.mean_absolute_error`
+'neg_mape'                     :func:`metrics.mean_absolute_percentage_error`
 'neg_mean_squared_error'       :func:`metrics.mean_squared_error`
 'neg_root_mean_squared_error'  :func:`metrics.mean_squared_error`
 'neg_mean_squared_log_error'   :func:`metrics.mean_squared_log_error`
@@ -1859,6 +1860,46 @@ Here is a small example of usage of the :func:`mean_absolute_error` function::
     >>> mean_absolute_error(y_true, y_pred, multioutput=[0.3, 0.7])
     0.85...

+.. _mean_absolute_percentage_error:
+
+Mean absolute percentage error
+------------------------------
+
+The :func:`mean_absolute_percentage_error` function, also known as **MAPE**, computes the
+`mean absolute percentage error
+<https://en.wikipedia.org/wiki/Mean_absolute_percentage_error>`_, a risk metric
+corresponding to the expected value of the absolute percentage error loss, or the
+:math:`l1`-norm of the percentage loss.
+
+If :math:`\hat{y}_i` is the predicted value of the :math:`i`-th sample
+and :math:`y_i` is the corresponding true value, then the mean absolute percentage error
+(MAPE) estimated over :math:`n_{\text{samples}}` is defined as
+
+.. math::
+
+  \text{MAPE}(y, \hat{y}) = \frac{100}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} \left| \frac{y_i - \hat{y}_i}{y_i} \right|.
+
+Here is a small example of usage of the :func:`mean_absolute_percentage_error` function::
+
+  >>> from sklearn.metrics import mean_absolute_percentage_error
+  >>> y_true = [3, -0.5, 2, 7]
+  >>> y_pred = [2.5, 0.0, 2, 8]
+  >>> mean_absolute_percentage_error(y_true, y_pred)
+  32.738...
+
+MAPE computes the error relative to the true value, so the same absolute distance between
+prediction and ground truth leads to a smaller error when the true value is larger.
+In particular, the metric is not shift-invariant. For example, if :math:`y_{true}` and
+:math:`y_{pred}` in the example above are shifted by adding 10, the error becomes smaller::
+
+  >>> from sklearn.metrics import mean_absolute_percentage_error
+  >>> import numpy as np
+  >>> y_true = np.array([3, -0.5, 2, 7])
+  >>> y_pred = np.array([2.5, 0.0, 2, 8])
+  >>> y_true = y_true + 10
+  >>> y_pred = y_pred + 10
+  >>> mean_absolute_percentage_error(y_true, y_pred)
+  3.747...

 .. _mean_squared_error:

 Mean squared error

Review discussion on this hunk:

Comment: I would put the (MAPE) at the first mention of mean absolute percentage error above. Also maybe add at least one sentence of explanation.
Reply: done

Comment: I haven't reviewed how the rest of the document does it, but it seems excessively pedantic to say the sum starts at 0 and ends at n_samples-1: it makes the formula a little harder to read.
Reply: I was trying to follow the formula from Wikipedia. We can make the change in another PR if enough people agree with it :)
Reply: @jnothman, I think it's better to keep it as is, to remain consistent with the other metrics definitions. We can create an issue to apply the change to all metrics separately.

Comment: Maybe add an example of it not being shift-invariant, i.e. add 10 to y_true and y_pred and show that the error is much smaller, and add a sentence to explain.
Reply: done
@@ -20,6 +20,7 @@
 # Michael Eickenberg <[email protected]>
 # Konstantin Shmelkov <[email protected]>
 # Christian Lorentzen <[email protected]>
+# Mohamed Ali Jamaoui <[email protected]>
 # License: BSD 3 clause

@@ -36,6 +37,7 @@
 __ALL__ = [
     "max_error",
     "mean_absolute_error",
+    "mean_absolute_percentage_error",
     "mean_squared_error",
     "mean_squared_log_error",
     "median_absolute_error",

@@ -189,6 +191,47 @@ def mean_absolute_error(y_true, y_pred,
     return np.average(output_errors, weights=multioutput)


+def mean_absolute_percentage_error(y_true, y_pred):
+    """Mean absolute percentage error regression loss
+
+    Read more in the :ref:`User Guide <mean_absolute_percentage_error>`.
+
+    Parameters
+    ----------
+    y_true : array-like of shape = (n_samples,)
+        Ground truth (correct) target values.
+
+    y_pred : array-like of shape = (n_samples,)
+        Estimated target values.
+
+    Returns
+    -------
+    loss : float
+        A positive floating point value between 0.0 and 100.0,
+        the best value is 0.0.
+
+    Examples
+    --------
+    >>> from sklearn.metrics import mean_absolute_percentage_error
+    >>> y_true = [3, -0.5, 2, 7]
+    >>> y_pred = [2.5, 0.0, 2, 8]
+    >>> mean_absolute_percentage_error(y_true, y_pred)
+    32.738...
+    """
+    y_type, y_true, y_pred, _ = _check_reg_targets(y_true, y_pred,
+                                                   'uniform_average')
+
+    if y_type == 'continuous-multioutput':
+        raise ValueError("Multioutput not supported "
+                         "in mean_absolute_percentage_error")
+
+    if (y_true == 0).any():
+        raise ValueError("mean_absolute_percentage_error requires"
+                         " y_true to not include zeros")
+
+    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
+
+
 def mean_squared_error(y_true, y_pred,
                        sample_weight=None,
                        multioutput='uniform_average', squared=True):

Review discussion on this hunk:

Comment (on the multioutput check): It's fine not to support it for now, but I think there is little doubt that multi-output MAPE would be the same on the flattened input: this would give an identical measure to macro-averaging. If we supported the variant mentioned in Wikipedia where you divide by the mean y_true, that is a different matter, because the mean across all columns may be inappropriate.
Reply: I guess there is a possibility to add the two kinds of implementations of MAPE and allow the user to switch between them. We can also do that when users request it :)

Comment (on the zero check): This is not currently executed in any tests. It should be tested.
Reply: I added a test case for it.
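The zero-value guard discussed above can be exercised with a small standalone sketch. This is a hypothetical simplified version of the PR's function (it omits the `_check_reg_targets` validation helper and supports single output only):

```python
import numpy as np

def mean_absolute_percentage_error(y_true, y_pred):
    # Simplified sketch of the function added in this PR.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if (y_true == 0).any():
        # MAPE divides by each true value, so zeros are undefined.
        raise ValueError("mean_absolute_percentage_error requires"
                         " y_true to not include zeros")
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

try:
    mean_absolute_percentage_error([3, 0, 2], [2.5, 0.1, 2])
except ValueError as exc:
    print(exc)  # the guard fires when any true target is exactly zero
```

A dedicated test asserting that this `ValueError` is raised (as the reviewer requested) is a one-liner with `pytest.raises(ValueError)`.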
Review discussion on the scorer name (`neg_mape` in the scoring table):

Comment: Why not spell it out fully here, like all the other metrics? i.e. `neg_mean_absolute_percentage_error`.
Reply: @lesteve I clarified in the PR description above that the name has to be chosen/voted on by all of us. Initially I used `neg_mean_absolute_percentage_error`, but then, since `mape` is already a famous acronym which also makes the metric name cleaner, I chose to switch to `neg_mape`. However, we can change back to the long version if most of us think that's the right thing to do.
Comment: I would be in favour of the `neg_mean_absolute_percentage_error` version personally. It is more consistent with `neg_mean_absolute_error` and more consistent with the metric name (`metrics.mean_absolute_percentage_error`). Happy to hear what others think.
Comment: I would also be in favor of using the explicit expanded name by default and introducing `neg_mape` as an alias, as we do for `neg_mse`.
Comment: Actually, we do not have `neg_mse`. I thought we had.
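Whichever name is chosen, the `neg_*` prefix reflects scikit-learn's convention that scorers follow "greater is better": a loss like MAPE is exposed as its negation so that model selection can always maximize the score. A minimal sketch of that convention (plain Python, not the actual scorer registration in this PR; `mape` and `neg_mape` here are illustrative names):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error as a percentage (lower is better)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def neg_mape(y_true, y_pred):
    # Negated loss: higher (closer to 0) now means a better model,
    # matching the greater-is-better scorer convention.
    return -mape(y_true, y_pred)

print(neg_mape([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]))  # approximately -32.738
```

In scikit-learn itself this negation is what `make_scorer(..., greater_is_better=False)` does when a metric is registered as a scorer.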