[MRG+1] Log loss bug fixed #7166
Conversation
enhance log_loss labels option feature log_loss changed test log_loss case u add ValueError in log_loss
Looks like there are conflicts, would you mind rebasing on master?
doc/whats_new.rst (Outdated)

```rst
- The :func:`ignore_warnings` now accept a category argument to ignore only
  the warnings of a specified type. By `Thierry Guillemot`_.

- Added Labels flag to `metrics.log_loss` to correct metrics when only one class is present
```
lowercase labels, and surround it with double backticks (see line 129 for how to format parameters)
also change `metrics.log_loss` to :class:`metrics.log_loss`
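Combining both review suggestions, the changelog entry might read as follows (a sketch of the reStructuredText, using the :class: role the reviewer asked for):

```rst
- Added ``labels`` flag to :class:`metrics.log_loss` to correct the metric
  when only one class is present in ``y_true``.
```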
enhance log_loss labels option feature log_loss changed test log_loss case u add ValueError in log_loss
enhance log_loss labels option feature log_loss changed test log_loss case u add ValueError in log_loss
…n/scikit-learn into log_loss_bug_fixed3
Hmm, I'm not sure why past commits are showing up multiple times here...
sklearn/metrics/classification.py (Outdated)

```rst
sample_weight : array-like of shape = [n_samples], optional
    Sample weights.

.. versionadded:: 0.18
```
`versionadded` goes below the new thing, indented at the same level as "If not provided".
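Concretely, the placement being asked for looks like this (a sketch of the docstring fragment; the "If not provided" wording is assumed from the comment):

```rst
labels : array-like, optional (default=None)
    If not provided, labels will be inferred from y_true.

    .. versionadded:: 0.18
```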
Saw that one just now. Pushed a fix.
Thanks @nelson-liu - I think the latest push fixes the issues you pointed out. However, on testing the code, I am not sure if it fully resolves all three issues - in particular #4033.
@nelson-liu - I'll be travelling for the next few days, but let me know if you have any thoughts on the above. From my side, OK to merge as-is (I don't think the code will degrade and there are some improvements), but I am not sure it fully squashes all three bugs so might be good to have a second pair of eyes on this. Let me know your thoughts and I can do some more work on this when I am back.
```python
lb.fit(labels) if labels is not None else lb.fit(y_true)
if labels is None and len(lb.classes_) == 1:
    raise ValueError('y_true has only one label. Please provide '
            'the true labels explicitly through the labels argument.')
```
this doesn't look like PEP 8 to me
We should also add another ValueError if `len(lb.classes_) == 1 and labels is not None`, saying that `labels` should have more than one unique label.
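The combined validation being discussed could be sketched like this (a hypothetical helper, not the PR's code; names and error wording are illustrative):

```python
from sklearn.preprocessing import LabelBinarizer


def check_log_loss_labels(y_true, labels=None):
    # Fit on the explicit labels if given, otherwise infer from y_true.
    lb = LabelBinarizer()
    if labels is not None:
        lb.fit(labels)
    else:
        lb.fit(y_true)

    # Fewer than two classes makes log loss ill-defined either way,
    # but the remedy differs depending on where the labels came from.
    if len(lb.classes_) == 1:
        if labels is None:
            raise ValueError('y_true has only one label. Please provide '
                             'the true labels explicitly through the '
                             'labels argument.')
        else:
            raise ValueError('The labels array needs to contain at least '
                             'two labels for log_loss.')
    return lb
```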
besides git issues / minor cosmetic changes, this LGTM. @MechCoder reviewed the original PR, would you mind taking another look?
```python
raise ValueError('y_true has only one label. Please provide '
    'the true labels explicitly through the labels argument.')

T = lb.transform(y_true)
```
not your fault but single letter variable names are not great :-/ feel free to replace.
```python
    'the true labels explicitly through the labels argument.')

T = lb.transform(y_true)
```
I feel like the logic below should be changed. It assumes that there are exactly two classes. We can check now whether that's true.
Which logic are you referring to?
Lines 1611 and 1612
But when is `lb.transform(X).shape[1] == 1` and `len(lb.classes_) > 2`?
when the input is malformed?
Is there a check anywhere that `len(labels) == y_pred.shape[1]`?
> Is there a check anywhere that `len(labels) == y_pred.shape[1]`?

Yup, we should be checking that as well.

> when the input is malformed?

Sorry, I am slow. Could you give an example? :(
Actually, can `T.shape[1]` happen now at all?
Never mind, I'm slow.
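For context on the exchange above: LabelBinarizer collapses the binary case into a single output column, which is why the downstream code special-cases `T.shape[1] == 1`. A quick sketch (not from the PR):

```python
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()

# Binary problems are collapsed to a single output column,
# so downstream code has to special-case T.shape[1] == 1.
binary = lb.fit([0, 1]).transform([0, 1, 1])

# With three or more classes, transform yields one column per class.
multi = lb.fit([0, 1, 2]).transform([0, 1, 2])

print(binary.shape, multi.shape)
```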
This seems like the right thing to do. But does it actually fix all the issues? In cross-validation, we don't pass the labels by default, do we?
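To illustrate the concern: on a fold where `y_true` happens to contain a single class, the full label set cannot be inferred, so `labels` has to be passed explicitly. A sketch against the API this PR proposes (exact error wording may differ between versions):

```python
from sklearn.metrics import log_loss

# A CV fold where every true label happens to be class 0,
# while the probabilities cover three classes.
y_true = [0, 0]
y_pred = [[0.9, 0.05, 0.05],
          [0.8, 0.10, 0.10]]

try:
    # Without labels, the label set cannot be inferred from y_true alone.
    log_loss(y_true, y_pred)
except ValueError as exc:
    print('raised:', exc)

# Passing labels explicitly makes the loss well defined.
loss = log_loss(y_true, y_pred, labels=[0, 1, 2])
print(loss)
```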
```python
# because y_true label are the same, there should be an error if the
# labels option has not been used

# error_logloss = log_loss(y_true, y_score)
```
either remove or uncomment?
remove, i think...
@nelson-liu Would you have the time to cherry-pick the commits (or in whatever way you like) onto a new PR? I can do it myself but then we would need one more reviewer to get it merged. Thanks!
Sure, today might not be great but I'll see if I can squeeze some time in while I wait for jobs to run. Else I can do it tomorrow.
@MechCoder luckily (?), the cluster I'm working on is down for maintenance. I've pulled the relevant commits from this PR and addressed the issues; waiting for CI tests (at least Travis) to pass on my branch before making a PR.
@nelson-liu Thanks! I appreciate the efforts.
Superseded again by #7239
all credit for this goes to @hongguangguo (#6714)
Reference Issue
metrics.log_loss fails when any classes are missing in y_true #4033
Fix a bug, the result is wrong when use sklearn.metrics.log_loss with one class, #4546
Log_loss is calculated incorrectly when only 1 class present #6703
What does this implement/fix? Explain your changes.
Added a `labels` option. When the number of unique labels in `y_true` is less than the number of columns of `y_score`/`y_pred`, `labels` should be used so that `len(np.unique(labels))` equals `y_pred.shape[1]`.
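The shape relation described above can be sketched as follows (hypothetical data; assumes the `labels` argument this PR adds):

```python
import numpy as np
from sklearn.metrics import log_loss

# y_true contains a single unique label, while y_score has three columns.
y_true = [2, 2]
y_score = np.array([[0.2, 0.7, 0.1],
                    [0.6, 0.2, 0.2]])
labels = [0, 1, 2]

# The labels argument restores len(np.unique(labels)) == y_score.shape[1].
assert len(np.unique(labels)) == y_score.shape[1]
loss = log_loss(y_true, y_score, labels=labels)
print(loss)
```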