
[MRG] Add calibration loss metric in classification #12479


Closed
aishgrt1 wants to merge 14 commits

Conversation

aishgrt1 (Contributor):

Reference Issues/PRs

#10971

What does this implement/fix? Explain your changes.

Added calibration loss metric for classification

Any other comments?

y_prob : array, shape (n_samples,)
    Probabilities of the positive class.
bin_size : int
    Size of the bin (samples) analysed in one iteration
Contributor:

In my opinion it's better to say: "Size of the bin (samples) analysed in each iteration".

Contributor Author:

Done

"""
pos_loss = 0.0
neg_loss = 0.0
for bin_start in range(0, len(y_true) - bin_size + 1):
Contributor:

Why not just use `range(len(y_true) - bin_size + 1)`?

Contributor Author:

Done

pos_loss /= (len(y_true) - bin_size + 1)
neg_loss /= (len(y_true) - bin_size + 1)
loss = (0.5) * (pos_loss + neg_loss)

Contributor:

IMO it would be better to save len(y_true) in a variable to avoid calling len() several times.

Contributor Author:

Done

- actual_per_pos_class).sum()
pos_loss += bin_error_pos
actual_per_neg_class = (bin_size - y_true[bin_start:bin_end]
                        .sum()) / bin_size
Contributor:

If bin_size is negative this will raise an exception. Maybe add a check for that beforehand.

Contributor Author:

Done

Contributor Author:

I am getting this error:
AssertionError: Docstring Error: sklearn.metrics.classification.calibration_loss arg mismatch:

    Probabilities of the positive class.
bin_size : int
    Size of the bin (samples) analysed in each iteration
Returns
Contributor (@eamanu, Oct 30, 2018):

Try leaving a newline below `Returns` and make the dashed underline match the length of `Returns`:

Suggested change:

    Returns
    -------
score : float
    Calibration loss
Examples
Contributor (@eamanu, Oct 30, 2018):

The same here:

Suggested change (the 10-dash underline `----------` becomes 8 dashes to match `Examples`):

    Examples
    --------
y_true : array, shape (n_samples,)
    True targets.
y_prob : array, shape (n_samples,)
Contributor:

Suggested change (the visible text is identical, so this is presumably a whitespace-only fix):

    y_prob : array, shape (n_samples,)
    True targets.
y_prob : array, shape (n_samples,)
    Probabilities of the positive class.
bin_size : int
Contributor:

Suggested change (again presumably whitespace-only):

    bin_size : int

eamanu (Contributor) commented Oct 30, 2018:

If you apply my comments, this will solve the Travis error.

aishgrt1 (Contributor Author):

@eamanu Done!!

eamanu (Contributor) left a comment:

LGTM. Please edit the PR to add the [MRG] tag.

@aishgrt1 changed the title from "Add calibration loss metric in classification" to "[MRG] Add calibration loss metric in classification" on Oct 30, 2018
aishgrt1 (Contributor Author):

I need to add my name to the contributors list, right?

jnothman (Member):

> I need to add my name to the contributors list, right?

The changelog entry is sufficient, but it should be in v0.21.rst, not v0.20.rst.

jnothman (Member) left a comment:

I'm not doing a full review yet. This needs to be in doc/modules/{model_evaluation,classes}.rst

amueller (Member):

This needs to add https://www.math.ucdavis.edu/~saito/data/roc/ferri-class-perf-metrics.pdf as a reference, and maybe references therein. Maybe we should add that one to the metric docs somewhere in general?

amueller (Member):

There's an interesting discussion of debiasing the calibration error in https://arxiv.org/pdf/1909.10155.pdf

That's a current NeurIPS paper but the method they are discussing is actually already established, so it might be a good candidate. cc @thomasjpfan who has shown interest.

ogrisel (Member) commented Nov 9, 2020:

I just realized that #11096 implements the same thing plus additional variations, and has more complete documentation.

Sorry @aishgrt1 for our failure to review your work properly on time. Thanks again for your contribution.

@ogrisel closed this Nov 9, 2020
@ogrisel added the Superseded label (PR has been replaced by a newer PR) and removed the Stalled label Nov 9, 2020
Labels: module:metrics, Superseded (PR has been replaced by a newer PR)

5 participants