calibration_loss calculator added #10971
Conversation
This pull request introduces 1 alert when merging ce6ef9e into d3f8b1e - view on lgtm.com
Comment posted by lgtm.com
Fixed
You need to add it to the common tests for metrics, add an explanation to the user guide, and add it to the relevant examples, probably the ones about the calibration curve and the Brier loss.
Tests are only complaining about flake8 errors. Don't worry about lgtm.com. Please update the model evaluation user guide.
Won't this measure vary with sample order?
The calibration loss is defined as a measure to assess the quality of
learning methods and learned models. A calibration measure based on
overlapping binning is CAL (Caruana and Niculescu-Mizil, 2004).
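Editorial note: for readers following the thread, here is a minimal, self-contained sketch of what an overlapping-binning (CAL-style) calibration measure can look like. It is illustrative only: the function name, the per-window error term (absolute gap between mean predicted probability and observed positive rate), and the final averaging are assumptions, not necessarily the formula implemented in this PR, so its output will not match the PR's expected values. The standard CAL description bins examples sorted by predicted probability, which this sketch does; whether the PR code sorts is exactly the question raised above.

import numpy as np


def sliding_window_calibration_loss(y_true, y_prob, bin_size=2):
    """Illustrative CAL-style calibration measure with overlapping bins.

    Slides a window of ``bin_size`` samples over the examples sorted by
    predicted probability and averages the absolute gap between each
    window's mean predicted probability and its observed positive rate.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)

    # CAL as described by Caruana and Niculescu-Mizil bins examples
    # ordered by predicted value, which also makes the result
    # independent of the original sample order.
    order = np.argsort(y_prob)
    y_true, y_prob = y_true[order], y_prob[order]

    errors = []
    for start in range(0, len(y_true) - bin_size + 1):
        window_true = y_true[start:start + bin_size]
        window_prob = y_prob[start:start + bin_size]
        # gap between observed positive rate and mean predicted probability
        errors.append(abs(window_true.mean() - window_prob.mean()))
    return float(np.mean(errors))


print(sliding_window_calibration_loss([0, 1, 1, 0, 1, 1],
                                      [0.1, 0.8, 0.9, 0.3, 1.0, 0.95]))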
"read more in the User Guide"
sklearn/metrics/classification.py
Outdated
@@ -1993,3 +1993,64 @@ def brier_score_loss(y_true, y_prob, sample_weight=None, pos_label=None):
    y_true = np.array(y_true == pos_label, int)
    y_true = _check_binary_probabilistic_predictions(y_true, y_prob)
    return np.average((y_true - y_prob) ** 2, weights=sample_weight)


def calibration_loss(y_true, y_prob, bin_size=2.0):
If bin_size is an integer, use 2, not 2.0
sklearn/metrics/classification.py
Outdated
        bin_end = bin_start + bin_size
        actual_per_pos_class = (y_true[bin_start:bin_end]
                                .sum()) / float(bin_size)
Float should not be needed
np.mean should be preferred anyway
For actual_per_neg_class, np.mean couldn't be used, so to maintain consistency I didn't use it here either!
Then at least remove float
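Editorial note: a quick sketch of the equivalences being discussed, with variable names taken from the diff above. On Python 3 (or with from __future__ import division) the explicit float cast is redundant, and the negative-class fraction can be written in terms of the mean without calling np.mean directly.

import numpy as np

y_true = np.array([0, 1, 1, 0, 1, 1])
bin_start, bin_size = 0, 2
bin_end = bin_start + bin_size
window = y_true[bin_start:bin_end]

# form used in the diff, with the explicit float cast
pos_with_float = window.sum() / float(bin_size)

# float-free alternatives (assuming true division is in effect)
assert pos_with_float == window.mean()                              # np.mean form
assert (bin_size - window.sum()) / bin_size == 1 - window.mean()    # negative class without float
print(pos_with_float, 1 - window.mean())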
All tests passing, except lgtm!
Again: user guide?
@jnothman I have added an example case and description along with the code, like the other metrics.
Still missing from test_common.py. You need to describe this in the user guide, i.e. doc/modules/model_evaluation.rst
sklearn/metrics/classification.py
Outdated
    pos_loss = 0.0
    neg_loss = 0.0

    for bin_start in range(0, len(y_true)-bin_size + 1):
Space around -
Done
sklearn/metrics/classification.py
Outdated
        pos_loss += bin_error_pos

        actual_per_neg_class = (bin_size - y_true[bin_start:bin_end]
                                .sum()) / float(bin_size)
Please add from __future__ import division
at the top of the file instead of casting to float
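Editorial note: for anyone unfamiliar with this suggestion, under Python 2 dividing two integers with / truncates, while the __future__ import switches / to true division (its behaviour on Python 3), so the explicit float() casts become unnecessary. A quick illustration:

from __future__ import division  # no-op on Python 3; enables true division on Python 2

bin_size = 2
print(3 / bin_size)   # 1.5 -- true division, so float(bin_size) is not needed
print(3 // bin_size)  # 1   -- floor division is still available when an integer result is wanted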
Done
    y_true = np.array([0, 1, 1, 0, 1, 1])
    y_pred = np.array([0.1, 0.8, 0.9, 0.3, 1.0, 0.95])
    calibration_loss_val = calibration_loss(y_true, y_pred, bin_size=2)
    assert_almost_equal(calibration_loss_val, 0.46999, decimal=4)
Where is this example from? Either cite a reference, or show how you calculated it
Due to some issues with this, I have added it here: #12479
This PR could be closed?
Superseded by #12479
Reference Issues/PRs
New calibration_loss added
What does this implement/fix? Explain your changes.
calibration_loss calculator added
Any other comments?