[MRG + 2] FIX bug in svm's decision values when decision_function_shape is 'ovr' #7724
Conversation
The sign of the dec values should be changed before they are fed into the function _ovr_decision_function.
I hope this PR is correct.
It would be great to add a test that fails on master and passes in this branch. I think the place where it belongs is sklearn/svm/tests/test_svm.py.
+1 on the test!
Why not change voting to
For instance, after changing it:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.utils.testing import assert_true

base_points = np.array([[1, 1], [2, 2]])
X = np.vstack((
    base_points * [1, 1],    # Q1
    base_points * [-1, 1],   # Q2
    base_points * [1, -1],   # Q3
    base_points * [-1, -1]   # Q4
))
y = [0] * 2 + [1] * 2 + [2] * 2 + [3] * 2
svm = SVC(kernel='linear', decision_function_shape='ovr')
svm.fit(X, y)

# One point close to the decision boundary and another far away
base_points = np.array([[5, 5], [10, 10]])
# For all the quadrants (classes)
X = np.vstack((
    base_points * [1, 1],    # Q1
    base_points * [-1, 1],   # Q2
    base_points * [1, -1],   # Q3
    base_points * [-1, -1]   # Q4
))
list(zip(svm.predict(X), svm.decision_function(X)))
```

gives

```
[(0, array([ 0.25,  2.  ,  1.  ,  2.75])),
 (0, array([ 0.5,  2. ,  1. ,  2.5])),
 (1, array([ 2.  ,  0.25,  2.75,  1.  ])),
 (1, array([ 2. ,  0.5,  2.5,  1. ])),
 (2, array([ 2.  ,  2.75,  0.25,  1.  ])),
 (2, array([ 2. ,  2.5,  0.5,  1. ])),
 (3, array([ 2.75,  2.  ,  1.  ,  0.25])),
 (3, array([ 2.5,  2. ,  1. ,  0.5]))]
```

You can see that for the first prediction, where the predicted class is 0, the decision value is highest for class 3 and not class 0. If you fix it as
you get

```
[(0, array([ 3.25,  1.  ,  2.  , -0.25])),
 (0, array([ 3.5,  1. ,  2. , -0.5])),
 (1, array([ 1.  ,  3.25, -0.25,  2.  ])),
 (1, array([ 1. ,  3.5, -0.5,  2. ])),
 (2, array([ 1.  , -0.25,  3.25,  2.  ])),
 (2, array([ 1. , -0.5,  3.5,  2. ])),
 (3, array([-0.25,  1.  ,  2.  ,  3.25])),
 (3, array([-0.5,  1. ,  2. ,  3.5]))]
```

which is correct... And the test should be the above snippet with the last line replaced by

```python
from sklearn.utils.testing import assert_array_equal  # also needed

decisions = svm.decision_function(X)
pos_class_decisions = decisions[range(8), svm.predict(X)].reshape([4, 2])
# Test if the point closer to the decision boundary has a lower value
# compared to a point farther away from the boundary
assert_true(np.all(pos_class_decisions[:, 0] < pos_class_decisions[:, 1]))
# Assert that the predicted class has the maximum value
assert_array_equal(np.argmax(decisions, axis=1), svm.predict(X))
```

I stupidly tested it on the fixed code ;)
@raghavrv thank you very much for your reply. So you agree with me that this is indeed a bug, and we should change it to
I think it should be option 1 and you think it should be option 2. The reason I proposed option 1 is this (from line 413 onwards in master/sklearn/utils/multiclass.py):

This piece of code aggregates the sum of confidences from each of the n_classes * (n_classes - 1) / 2 classifiers. The k-th binary classifier discriminates between class i and class j: it subtracts the confidence from class i and adds it to class j, so the confidence is actually the confidence in class j, right? However, in the OvO case of SVM, the dec values are positive for the first class (class i) and negative for the second class (class j). So I think we should flip the sign of dec, so that class i receives negative confidence and class j receives positive confidence, to match the convention in _ovr_decision_function defined in master/sklearn/utils/multiclass.py.

Now, let's run an example, with plots of the original code (first row), option 1 (second row), and option 2 (third row).
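The pairwise aggregation being discussed can be sketched as follows. This is a simplified rendering of the loop described above, not the exact library code; the function name and the tie-breaking transform are approximations:

```python
import numpy as np

def ovr_from_ovo(predictions, confidences, n_classes):
    """Aggregate the n_classes * (n_classes - 1) / 2 pairwise (OvO)
    outputs into per-class OvR-style scores.  For the k-th classifier
    (class i vs class j), a positive confidence counts *for* j and
    *against* i, which is why the sign of SVC's dec values matters."""
    n_samples = predictions.shape[0]
    votes = np.zeros((n_samples, n_classes))
    sum_of_confidences = np.zeros((n_samples, n_classes))
    k = 0
    for i in range(n_classes):
        for j in range(i + 1, n_classes):
            sum_of_confidences[:, i] -= confidences[:, k]
            sum_of_confidences[:, j] += confidences[:, k]
            # prediction 0 is a vote for class i, prediction 1 for class j
            votes[predictions[:, k] == 0, i] += 1
            votes[predictions[:, k] == 1, j] += 1
            k += 1
    # Squash confidences into (-1/3, 1/3) so they only break ties in votes
    transformed = sum_of_confidences / (3 * (np.abs(sum_of_confidences) + 1))
    return votes + transformed
```

If the confidences are fed in with the wrong sign, the votes and the tie-breaking term disagree, which is exactly the symptom shown in the decision values above.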
@raghavrv Raghav, I tried your code as well, but I got a totally different result. I used your base points, X and y.
Options 1 and 2 give:
When we use np.argmax, the result should be consistent with y. So I think option 1 -- changing the sign of dec -- is right. What do you think?
@btdai Thanks for the patient response. I think you are right! Option 1 is the correct way to go here. Apologies for the confusion... The test should be along the lines of

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.utils.testing import assert_true
from sklearn.utils.testing import assert_array_equal

base_points = np.array([[1, 1], [2, 2], [2, 1], [1, 2]])
X = np.vstack((
    base_points * [1, 1],    # Q1
    base_points * [-1, 1],   # Q2
    base_points * [-1, -1],  # Q3
    base_points * [1, -1]    # Q4
))
y = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4

# One point close to the decision boundary and another far away
base_points = np.array([[5, 5], [10, 10]])
# For all the quadrants (classes)
X_test = np.vstack((
    base_points * [1, 1],    # Q1
    base_points * [-1, 1],   # Q2
    base_points * [-1, -1],  # Q3
    base_points * [1, -1],   # Q4
))

svm = SVC(kernel='linear', decision_function_shape='ovr')
svm.fit(X, y)
decisions = svm.decision_function(X_test)

# Subtract 1 from predictions to get indices for that class (quadrant)
pos_class_decisions = decisions[range(8), svm.predict(X_test) - 1].reshape([4, 2])
# Test if the point closer to the decision boundary has a lower value
# compared to a point farther away from the boundary
assert_true(np.all(pos_class_decisions[:, 0] < pos_class_decisions[:, 1]))

# Assert that the predicted class has the maximum value
# Add 1 to convert argmax positions to class labels (quadrants)
assert_array_equal(np.argmax(decisions, axis=1) + 1, svm.predict(X_test))

# [1000, 0.1] is closer to Q4 compared to Q2/Q3
confidences = svm.decision_function([[1000, 0.1]])[0]
confidence_Q4 = confidences[4 - 1]
confidence_Q3 = confidences[3 - 1]
confidence_Q2 = confidences[2 - 1]
assert_true(confidence_Q4 > confidence_Q3)
assert_true(confidence_Q4 > confidence_Q2)
assert_true(confidence_Q2 > confidence_Q3)
```

If you have a better test, please feel free to use that! :) This test fails on master as well as with my (incorrect) fix...
I don't understand how this test tests the right thing. We want a tie between the hard predicted classes on the test point, right? So you should compare [.1, .1] and [1, 1] in all the quadrants and see that the decision function for that quadrant is higher further away from 0,0.
It instead checks
True, we need not have 4 points to train. Just the [-1, 1] etc. would suffice...
Thank you @raghavrv, I think your test case is good, but I'm still new to the scikit-learn community and I'm not sure whether @amueller Andreas' idea is to have just one test case added. I made X and X_test the same length, so we can compare the array of predictions with y directly. I have also changed the class labels, so that we do not need to add or subtract when comparing. This piece of code can't run on my scikit-learn (mine is not the developer version); could you help me run it and see if the current code works in your development setup? Also, I think we should assert that the decision values for the positive class are greater than 0 as well.
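The extra assertion on the positive class could look something like this. This is a standalone sketch with made-up decision values standing in for the real SVC output:

```python
import numpy as np

# Made-up OvR-style decision values for 2 samples and 3 classes;
# each row's largest (and only positive) entry is that sample's true class.
decisions = np.array([[2.5, -0.3, -0.1],
                      [-0.2, 2.8, -0.4]])
y = np.array([0, 1])  # true class labels, starting at 0

# The decision value of the true (positive) class should be > 0 ...
assert np.all(decisions[np.arange(len(y)), y] > 0)
# ... and should also be the row-wise maximum, matching the prediction.
assert np.array_equal(np.argmax(decisions, axis=1), y)
```

With labels starting at 0 and X_test aligned with y, the fancy indexing `decisions[np.arange(len(y)), y]` picks out each sample's own-class decision value directly, with no off-by-one adjustment.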
@raghavrv ok, I didn't entirely understand your test case. I'm good with just including it, commenting it thoroughly, and making it as succinct as possible.
@amueller I'm sorry to be the last one blocking this issue. When would you like to release 0.18.1?
Sorry to ask again: I see there's test_multiclass.py in master/sklearn/tests. Do I need to create another test just for this function?
Would like to release ASAP. The test should be in
Hi, I have created this PR: #7810. It adds the test file test_ovr_decision_function.py in sklearn/svm/tests. Please review this PR, thank you.
You need to add the test in this PR and not create a separate PR; I will close #7810. Also could you add the tests in For this you just need to create an additional commit in this branch, i.e. btdai_patch_ovr_decision_function, and then use git push to update the branch on your fork. Sorry if it is not clear enough; maybe try to read again the links about GitHub that we gave you before. Do say if you get completely stuck, though.
@lesteve thank you for your comments. Yeah, I should have just added the test case in this branch. I've now added it; it seems to be going through some automatic checks.
Please add your Also, Travis is failing because of flake8 issues; if you are on Linux or OSX you can run
Needs a rebase. Then good for merge. Thanks @btdai
…m/btdai/scikit-learn into btdai_patch_ovr_decision_function
Sorry, still needs a rebase.
i.e. you probably need to update your local copy of master, then merge or rebase with it.
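A typical sequence for that looks something like the following. The remote names are assumptions about the usual fork setup (`upstream` for the main scikit-learn repo, `origin` for your fork), not taken from this thread:

```shell
git fetch upstream                                # fetch the latest upstream master
git checkout btdai_patch_ovr_decision_function    # switch to the PR branch
git rebase upstream/master                        # replay this branch's commits on top
git push --force-with-lease origin btdai_patch_ovr_decision_function
```

Note that rebasing rewrites the branch's commits, so an ordinary `git push` is rejected afterwards; that is why a force push is needed to update the PR.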
ya, git rebase btdai_patch_....
```
bingtian@ASPIRES7:~/git$ git rebase btdai_patch_ovr_decision_function
```

rebase says up to date, so I push with

Still can't make it.
You want something like:
Hi, I did git rebase with -i, but then when I tried to push it said "Everything up-to-date". Is that normal? Thanks. Please help @amueller @raghavrv @jnothman
…into btdai_patch_ovr_decision_function Conflicts: doc/whats_new.rst
Ok, I should have done git merge master. Thank you @jnothman
```
@@ -4881,8 +4886,12 @@ David Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson.

.. _Peng Meng: https://github.com/mpjlu

<<<<<<< HEAD
```

You've not correctly resolved merge conflicts
Thanks, @btdai!
Thank you for your help, @jnothman. I can work on the next bug now :)
Reference Issue
Fixes #6416
What does this implement/fix? Explain your changes.
Changed the sign of dec for the function `_ovr_decision_function`,
from

```python
return _ovr_decision_function(dec < 0, dec, len(self.classes_))
```

to

```python
return _ovr_decision_function(dec < 0, -dec, len(self.classes_))
```

Reason: for the function `_ovr_decision_function`, the first argument should be consistent with the second one, so if we take dec < 0, i.e. set True for negative decision values, the confidences we feed into `_ovr_decision_function` need to be inverted as well.
Hence I proposed the above change. Please kindly refer to the examples in issue #6416, thank you very much.
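As a toy check of that consistency argument, using made-up pairwise decision values rather than real SVC output:

```python
import numpy as np

# libsvm's OvO convention: dec > 0 means the *first* class of the pair wins.
dec = np.array([1.5, -0.7, 0.2])  # made-up pairwise decision values

predictions = dec < 0  # True flags a win for the *second* class
confidences = -dec     # flip sign so positive also means "for the second class"

# First and second arguments now agree: a True prediction always comes with
# a positive confidence, and a False one with a negative confidence.
assert np.all((confidences > 0) == predictions)
```

Passing `dec` unnegated, as on master, breaks exactly this invariant: the sample the classifier voted against would receive the positive confidence.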
Any other comments?
the sign of dec values should be changed to feed into function _ovr_decision_function