[MRG] Add calibration loss metric in classification #12479
Conversation
sklearn/metrics/classification.py
Outdated
y_prob : array, shape (n_samples,)
    Probabilities of the positive class.
bin_size : int
    Size of the bin (samples) analysed in one iteration
In my opinion it is better to say: "Size of the bin (samples) analysed on each iteration".
Done
sklearn/metrics/classification.py
Outdated
""" | ||
pos_loss = 0.0 | ||
neg_loss = 0.0 | ||
for bin_start in range(0, len(y_true) - bin_size + 1): |
Why not just use range(len(y_true) - bin_size + 1)?
Done
pos_loss /= (len(y_true) - bin_size + 1)
neg_loss /= (len(y_true) - bin_size + 1)
loss = (0.5) * (pos_loss + neg_loss)
IMO it would be better to save len(y_true) in a variable to avoid calling len() several times.
Done
                  - actual_per_pos_class).sum()
pos_loss += bin_error_pos
actual_per_neg_class = (bin_size - y_true[bin_start:bin_end]
                        .sum()) / bin_size
If bin_size is negative this will raise an exception. Maybe a check has to be added before this point.
Done
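Putting the excerpted pieces together, here is a minimal sketch of the sliding-window loss under discussion. The sorting step, the exact error terms, and the validation messages are assumptions that go beyond what the diff excerpts show:

```python
import numpy as np

def calibration_loss(y_true, y_prob, bin_size=2):
    """Sketch of the sliding-window calibration loss from this PR.

    Assumes y_true holds {0, 1} labels and y_prob the predicted
    probabilities of the positive class.
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    if bin_size <= 0:
        # Guard against the negative/zero bin_size case raised in review.
        raise ValueError("bin_size must be a positive integer")
    n = len(y_true)  # cache the length, as suggested in the review
    if bin_size > n:
        raise ValueError("bin_size cannot exceed the number of samples")
    # Sort samples by predicted probability so each window groups
    # similar predictions (an assumption about the intended design).
    order = np.argsort(y_prob)
    y_true, y_prob = y_true[order], y_prob[order]
    pos_loss = 0.0
    neg_loss = 0.0
    for bin_start in range(n - bin_size + 1):
        bin_end = bin_start + bin_size
        # Mean predicted vs. observed positive rate in this window.
        pred_pos = y_prob[bin_start:bin_end].mean()
        actual_pos = y_true[bin_start:bin_end].mean()
        pos_loss += abs(pred_pos - actual_pos)
        # Mirror computation for the negative class.
        pred_neg = 1.0 - pred_pos
        actual_neg = (bin_size - y_true[bin_start:bin_end].sum()) / bin_size
        neg_loss += abs(pred_neg - actual_neg)
    pos_loss /= (n - bin_size + 1)
    neg_loss /= (n - bin_size + 1)
    return 0.5 * (pos_loss + neg_loss)
```

With bin_size=1 and perfectly calibrated predictions the loss is 0; a confident wrong prediction (y_true=[0], y_prob=[1.0]) gives 1.0.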
I am getting this error:
AssertionError: Docstring Error: sklearn.metrics.classification.calibration_loss arg mismatch:
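The quoted AssertionError comes from scikit-learn's docstring consistency check, which compares the function signature against the numpydoc Parameters section. A hypothetical illustration of the kind of mismatch that triggers it:

```python
import inspect

def calibration_loss(y_true, y_prob, bin_size=2):
    """Compute calibration loss.

    Parameters
    ----------
    y_true : array, shape (n_samples,)
        True targets.
    y_prob : array, shape (n_samples,)
        Probabilities of the positive class.
    """
    # `bin_size` is absent from the Parameters section above, which is
    # the kind of signature/docstring mismatch the test flags.

signature_args = list(inspect.signature(calibration_loss).parameters)
documented_args = ["y_true", "y_prob"]  # as listed in the docstring
missing = sorted(set(signature_args) - set(documented_args))
```

Adding a `bin_size : int` entry to the Parameters section makes the two lists agree and the assertion pass.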
sklearn/metrics/classification.py
Outdated
    Probabilities of the positive class.
bin_size : int
    Size of the bin (samples) analysed in each iteration
Returns
Try to leave a newline below Returns, and make the dashed underline match the length of "Returns":
Suggested change:
Returns
-------
sklearn/metrics/classification.py
Outdated
-------
score : float
    Calibration loss
Examples
The same
Suggested change:
Examples
--------
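For reference, here is a numpydoc-style fragment in which each dashed underline matches the length of its section header, as the reviewer requests (the doctest line is a hypothetical example):

```python
# Illustrative numpydoc fragment: "Returns" gets 7 dashes,
# "Examples" gets 8, matching each header's length exactly.
docstring = """Compute calibration loss.

Returns
-------
score : float
    Calibration loss.

Examples
--------
>>> calibration_loss([0, 1], [0.1, 0.9], bin_size=1)  # doctest: +SKIP
"""

lines = docstring.splitlines()
for header, underline in zip(lines, lines[1:]):
    stripped = underline.strip()
    if stripped and set(stripped) == {"-"}:
        # numpydoc requires the dashes to match the header's length.
        assert len(stripped) == len(header.strip())
```

Mismatched underline lengths are what numpydoc-based tooling (and the Travis check mentioned below) complains about.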
sklearn/metrics/classification.py
Outdated
----------
y_true : array, shape (n_samples,)
    True targets.
y_prob : array, shape (n_samples,)
Suggested change (whitespace only):
y_prob : array, shape (n_samples,)
sklearn/metrics/classification.py
Outdated
    True targets.
y_prob : array, shape (n_samples,)
    Probabilities of the positive class.
bin_size : int
Suggested change (whitespace only):
bin_size : int
If you apply my comments, this solves the Travis error.
@eamanu Done!!
LGTM. Please edit the PR title to add the [MRG] tag.
I need to add my name to the contributors list, right?
The changelog entry is sufficient, but it should be in v0.21.rst, not v0.20.rst.
I'm not doing a full review yet. This needs to be listed in doc/modules/{model_evaluation,classes}.rst.
There's an interesting discussion of debiasing the calibration error in https://arxiv.org/pdf/1909.10155.pdf. That's a current NeurIPS paper, but the method they discuss is actually already established, so it might be a good candidate. cc @thomasjpfan, who has shown interest.
Reference Issues/PRs
#10971
What does this implement/fix? Explain your changes.
Added calibration loss metric for classification
Any other comments?