[MRG+1] Log loss bug fixed #7166
Conversation
enhance log_loss labels option feature log_loss changed test log_loss case u add ValueError in log_loss
Looks like there are conflicts, would you mind rebasing on master?
doc/whats_new.rst (Outdated)

```rst
- The :func:`ignore_warnings` now accept a category argument to ignore only
  the warnings of a specified type. By `Thierry Guillemot`_.

- Added Labels flag to `metrics.log_loss` to correct metrics when only one class is present
```
lowercase labels, and surround it with double backticks (see line 129 for how to format parameters)
also change `metrics.log_loss` to :class:`metrics.log_loss`
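Combining both review suggestions, the changelog entry might read as follows (a sketch of the reStructuredText, using the :class: role the reviewer asked for):

```rst
- Added ``labels`` flag to :class:`metrics.log_loss` to correct the metric
  when only one class is present in ``y_true``.
```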
enhance log_loss labels option feature log_loss changed test log_loss case u add ValueError in log_loss
enhance log_loss labels option feature log_loss changed test log_loss case u add ValueError in log_loss
…n/scikit-learn into log_loss_bug_fixed3
Hmm, I'm not sure why past commits are showing up multiple times here...
sklearn/metrics/classification.py (Outdated)

```rst
sample_weight : array-like of shape = [n_samples], optional
    Sample weights.

.. versionadded:: 0.18
```
`versionadded` goes below the new thing, indented at the same level as "If not provided".
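Concretely, the placement being asked for looks like this (a sketch of the docstring fragment; the "If not provided" wording is assumed from the comment):

```rst
labels : array-like, optional (default=None)
    If not provided, labels will be inferred from y_true.

    .. versionadded:: 0.18
```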
Saw that one just now. Pushed a fix.
Thanks @nelson-liu - I think the latest push fixes the issues you pointed out. However, on testing the code, I am not sure if it fully resolves all three issues - in particular #4033.
@nelson-liu - I'll be travelling for the next few days, but let me know if you have any thoughts on the above. From my side, OK to merge as-is (I don't think the code will degrade and there are some improvements), but I am not sure it fully squashes all three bugs so might be good to have a second pair of eyes on this. Let me know your thoughts and I can do some more work on this when I am back.
```python
lb.fit(labels) if labels is not None else lb.fit(y_true)
if labels is None and len(lb.classes_) == 1:
    raise ValueError('y_true has only one label. Please provide '
            'the true labels explicitly through the labels argument.')
```
this doesn't look like PEP 8 to me
We should also add another ValueError if `len(lb.classes_) == 1 and labels is not None`, saying that `labels` should have more than one unique label.
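The combined validation being discussed could be sketched like this (a hypothetical helper, not the PR's code; names and error wording are illustrative):

```python
from sklearn.preprocessing import LabelBinarizer


def check_log_loss_labels(y_true, labels=None):
    # Fit on the explicit labels if given, otherwise infer from y_true.
    lb = LabelBinarizer()
    if labels is not None:
        lb.fit(labels)
    else:
        lb.fit(y_true)

    # Fewer than two classes makes log loss ill-defined either way,
    # but the remedy differs depending on where the labels came from.
    if len(lb.classes_) == 1:
        if labels is None:
            raise ValueError('y_true has only one label. Please provide '
                             'the true labels explicitly through the '
                             'labels argument.')
        else:
            raise ValueError('The labels array needs to contain at least '
                             'two labels for log_loss.')
    return lb
```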
besides git issues / minor cosmetic changes, this LGTM. @MechCoder reviewed the original PR, would you mind taking another look?
```python
raise ValueError('y_true has only one label. Please provide '
    'the true labels explicitly through the labels argument.')

T = lb.transform(y_true)
```
not your fault but single letter variable names are not great :-/ feel free to replace.
```python
    'the true labels explicitly through the labels argument.')

T = lb.transform(y_true)
```
I feel like the logic below should be changed. It assumes that there are exactly two classes. We can check now whether that's true.
Which logic are you referring to?
Lines 1611 and 1612
But when is `lb.transform(X).shape[1] == 1` and `len(lb.classes_) > 2`?
when the input is malformed?
Is there a check anywhere that `len(labels) == y_pred.shape[1]`?
> Is there a check anywhere that `len(labels) == y_pred.shape[1]`?

Yup, we should be checking that as well.

> when the input is malformed?

Sorry, I am slow. Could you give an example? :(
Actually, can `T.shape[1]` happen now at all?
Never mind, I'm slow.
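For context on the exchange above: LabelBinarizer collapses the binary case into a single output column, which is why the downstream code special-cases `T.shape[1] == 1`. A quick sketch (not from the PR):

```python
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()

# Binary problems are collapsed to a single output column,
# so downstream code has to special-case T.shape[1] == 1.
binary = lb.fit([0, 1]).transform([0, 1, 1])

# With three or more classes, transform yields one column per class.
multi = lb.fit([0, 1, 2]).transform([0, 1, 2])

print(binary.shape, multi.shape)
```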
This seems like the right thing to do. But does it actually fix all the issues? In cross-validation, we don't pass the labels by default, do we?
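To illustrate the concern: on a fold where `y_true` happens to contain a single class, the full label set cannot be inferred, so `labels` has to be passed explicitly. A sketch against the API this PR proposes (exact error wording may differ between versions):

```python
from sklearn.metrics import log_loss

# A CV fold where every true label happens to be class 0,
# while the probabilities cover three classes.
y_true = [0, 0]
y_pred = [[0.9, 0.05, 0.05],
          [0.8, 0.10, 0.10]]

try:
    # Without labels, the label set cannot be inferred from y_true alone.
    log_loss(y_true, y_pred)
except ValueError as exc:
    print('raised:', exc)

# Passing labels explicitly makes the loss well defined.
loss = log_loss(y_true, y_pred, labels=[0, 1, 2])
print(loss)
```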
```python
# because y_true label are the same, there should be an error if the
# labels option has not been used

# error_logloss = log_loss(y_true, y_score)
```
either remove or uncomment?
remove, i think...
@nelson-liu Would you have the time to cherry-pick the commits (or in whatever way you like) onto a new PR? I can do it myself but then we would need one more reviewer to get it merged. Thanks!
Sure, today might not be great but I'll see if I can squeeze some time in while I wait for jobs to run. Else I can do it tomorrow.
@MechCoder luckily (?), the cluster I'm working on is down for maintenance. I've pulled the relevant commits from this PR and addressed the issues; waiting for CI tests (at least Travis) to pass on my branch before making a PR.
@nelson-liu Thanks! I appreciate the efforts.
Superseded again by #7239
all credit for this goes to @hongguangguo (#6714)
Reference Issue
metrics.log_loss fails when any classes are missing in y_true #4033
Fix a bug, the result is wrong when use sklearn.metrics.log_loss with one class, #4546
Log_loss is calculated incorrectly when only 1 class present #6703
What does this implement/fix? Explain your changes.
Added a `labels` option. When the number of unique labels in `y_true` is less than the number of columns of `y_score`/`y_pred`, `labels` should be used so that `len(np.unique(labels))` equals `y_pred.shape[1]`.
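The shape relation described above can be sketched as follows (hypothetical data; assumes the `labels` argument this PR adds):

```python
import numpy as np
from sklearn.metrics import log_loss

# y_true contains a single unique label, while y_score has three columns.
y_true = [2, 2]
y_score = np.array([[0.2, 0.7, 0.1],
                    [0.6, 0.2, 0.2]])
labels = [0, 1, 2]

# The labels argument restores len(np.unique(labels)) == y_score.shape[1].
assert len(np.unique(labels)) == y_score.shape[1]
loss = log_loss(y_true, y_score, labels=labels)
print(loss)
```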