[MRG+1] fixed log_loss bug #6714
Conversation
sklearn/metrics/classification.py (Outdated)

        predict_proba method.

    labels : array-like
        When len(unique(y_true)) < len(unique(y_pred)), you must use labels option and
WDYT about rewording this as: "if labels is not provided, the labels are inferred from y_true or, more specifically, set to np.unique(y_true)"?
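(For context, a minimal sketch of what that default inference looks like; the `y_true` values here are made up purely for illustration:)

```python
import numpy as np

y_true = [1, 1, 1]   # only one class actually observed
labels = None        # caller did not pass labels

if labels is None:
    # the reworded docstring: labels are inferred from y_true,
    # i.e. set to np.unique(y_true)
    labels = np.unique(y_true)

print(labels)  # [1] -- a single inferred label, which is where the bug bites
```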
    assert_almost_equal(score1, score2)


def test_log_loss():
Please change the name of the test to something more specific; we are not testing log_loss but the correctness of the labels argument.
@MechCoder I have rewritten some points. There may still be some points to change; please tell me and I will improve them.
    clf.fit(X, y)


    y_score = clf.predict_proba([[2, 2], [2, 2]])
For the test you can probably hard code y_score = [[0, 1], [0, 1]].
Also, can you run flake8 on your code?
As said before, something between 0 and 1 might be better, to avoid the clipping done below.
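A hypothetical version of the test values that would follow both suggestions (hard-coded probabilities strictly between 0 and 1, so the clipping inside log_loss never changes them; the values are illustrative only):

```python
from sklearn.metrics import log_loss

# hard-coded probabilities instead of clf.predict_proba(...),
# strictly between 0 and 1 so the clipping inside log_loss is a no-op
y_true = [1, 1]
y_score = [[0.2, 0.8], [0.1, 0.9]]

# only class 1 appears in y_true, so the labels argument is needed
loss = log_loss(y_true, y_score, labels=[0, 1])
```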
Previously I did not use flake8. After running flake8, there were some points to change. Thanks.
from sklearn.metrics import zero_one_loss
from sklearn.metrics import brier_score_loss

from sklearn.tree import DecisionTreeClassifier
unused import
What is the reason for closing, @hongguangguo?
Oh, I didn't know the right time to close the pull request. @jnothman
As in, you think it should have been further reviewed or merged, but it's remained dormant? A better idea would be to say "could someone please review this?"
    assert_almost_equal(calculated_log_loss, ture_log_loss)

extra line
Just some minor cosmetic comments. +1 for merge.
@hongguangguo Would you have time to address the comments? It would be nice to get this in 0.18.
@MechCoder - looks like @hongguangguo might be on an extended "break" from coding, judging from other GitHub activity. How do you feel about someone else making the changes you highlight? When would this have to be done by to get it into the code in time for v0.18?
@indianajensen The beta is scheduled for mid-August and the final release is tentatively planned for the first week of September. However, this doesn't have to be in by 0.18, though it would be very nice to have. Given @hongguangguo's long period of inactivity even after being reminded, I think it's reasonable for you to cherry-pick the commits from this PR and try to complete it to merging quality.
Thanks @nelson-liu. I certainly don't want to take any credit for @hongguangguo's great work here, but I have put up a new PR as per your suggestion, and we can back that out later if it becomes redundant. Please have a look and see if you think I have missed anything.
Closing in favour of #7166 |
    T = lb.transform(y_true)

    if T.shape[1] == 1:
So I'd say "if T.shape[1] == 1 and len(labels) == 2". Is there a check that the shape of y_pred matches len(labels)?
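A rough sketch of what those two checks could look like, written as a standalone helper for illustration (the name `_check_log_loss_inputs` is hypothetical and not part of scikit-learn):

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

def _check_log_loss_inputs(y_true, y_pred, labels):
    """Hypothetical helper illustrating the checks suggested above."""
    lb = LabelBinarizer()
    lb.fit(labels)
    T = lb.transform(y_true)

    # binary case: LabelBinarizer yields a single column, expand it to two
    if T.shape[1] == 1 and len(labels) == 2:
        T = np.append(1 - T, T, axis=1)

    y_pred = np.asarray(y_pred)
    # the number of columns in y_pred must be consistent with the labels
    if y_pred.shape[1] != len(labels):
        raise ValueError("y_pred has %d columns but %d labels were given"
                         % (y_pred.shape[1], len(labels)))
    return T, y_pred
```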
…ber of classes in y_true and y_pred differ

Fixes #4033, #4546, #6703

* fixed log_loss bug; enhanced the log_loss labels option; changed the test_log_loss case; added a ValueError in log_loss
* fixed the error message when y_pred and y_test labels don't match (fixes as per existing pull request #6714); corrected doc/whats_new.rst syntax and the formatting of credits; fixed the versionadded comment; removed superfluous lines
* Wrap up changes to fix the log_loss bug and clean up log_loss: fix a typo in whats_new; refactor the conditional and move the dtype check before np.clip; general cleanup of log_loss; remove dtype checks; edit the non-regression test and wording; misc doc fixes / clarifications and final touches; fix naming of the y_score2 variable; specify that log loss is only valid for 2 labels or more
Reference Issue
metrics.log_loss fails when any classes are missing in y_true #4033
Fix a bug, the result is wrong when use sklearn.metrics.log_loss with one class, #4546
Log_loss is calculated incorrectly when only 1 class present #6703
What does this implement/fix? Explain your changes.
Added a labels option. When len(unique(y_true)) is less than the number of columns of y_score/y_pred, labels should be passed so that len(unique(labels)) equals y_pred.shape[1].
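As an illustration of the intended usage (the numbers below are made up; with the labels option, log_loss can be computed even when y_true contains fewer classes than y_pred has columns):

```python
from sklearn.metrics import log_loss

y_true = [2, 2]                      # only one class observed in y_true
y_pred = [[0.2, 0.7, 0.1],
          [0.6, 0.2, 0.2]]           # probability columns for three classes

# without labels this raises an error, since np.unique(y_true) has length 1
# while y_pred has 3 columns; labels makes the column-to-class mapping explicit
loss = log_loss(y_true, y_pred, labels=[1, 2, 3])
```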
Any other comments?