Incorrect predictions when fitting a LogisticRegression model on binary outcomes with multi_class='multinomial' #9889

Comments
Yes, just taking the coef for one class indeed seems incorrect. Is there
any way to adjust the coef of one class (and the intercept) given the other
to get the right probabilities?
|
Indeed, there is a difference in the way we want to compute the probabilities: (1) the binary case applies the sigmoid to a single decision value w·x, while (2) the multinomial case applies the softmax over all class scores w_k·x.

So we do expect different results between (1) and (2). In the current code, we incorrectly use the sigmoid on case (2). Removing lines 762 and 763 does solve this problem, but changes the API of coef_. Another way to fix it is to change predict_proba directly:

```python
if self.multi_class == "ovr":
    return super(LogisticRegression, self)._predict_proba_lr(X)
elif self.coef_.shape[0] == 1:
    decision = self.decision_function(X)
    return softmax(np.c_[-decision, decision], copy=False)
else:
    return softmax(self.decision_function(X), copy=False)
```

It will also break the current behavior of predict_proba. |
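To see why the softmax over [-decision, decision] is the right transformation, note that the L2-penalised binary multinomial optimum is symmetric (w_0 = -w_1, since any common shift of the two rows leaves the likelihood unchanged but adds penalty), so the single stored coefficient row is w_1 and the plain sigmoid under-scales the margin by a factor of two. A small NumPy sketch with made-up coefficients (w0, w1, and the helper functions are illustrative, not scikit-learn code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.RandomState(0)
X = rng.randn(6, 3)
w1 = np.array([0.5, -1.0, 2.0])  # hypothetical class-1 row of the multinomial solution
w0 = -w1                         # symmetry of the penalised binary multinomial optimum

p_true = softmax(np.c_[X @ w0, X @ w1])[:, 1]        # correct multinomial probability
decision = X @ w1                                    # decision values from the stored row
p_fixed = softmax(np.c_[-decision, decision])[:, 1]  # the proposed fix
p_buggy = sigmoid(decision)                          # the current (buggy) behaviour

print(np.allclose(p_true, p_fixed))  # True: the fix recovers the multinomial probabilities
print(np.allclose(p_true, p_buggy))  # False: the sigmoid halves the effective margin
```

Equivalently, sigmoid(2 * decision) would also match, which is exactly why sigmoid(decision) does not.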
If it didn't cause too many further issues, I think the first solution would be better, i.e. changing the API of coef_. I also believe anyone currently using coef_ from a binary multinomial fit is working with misleading values anyway. |
We already break the generalisation elsewhere, as in |
I agree it would make more sense to have coef_.shape = (n_classes, n_features) even when n_classes = 2, to have more consistency and avoid special cases.

However, it is a valid argument also for the OVR case (actually, it is nice to have the same coef_ API for both multinomial and OVR cases). Does that mean we should change the coef_ API in all cases? It is an important API change which will break a lot of user code, and which might not be consistent with the rest of scikit-learn. |
Another option could be to always collapse to a single coefficient row when n_classes = 2, i.e. store the difference of the two multinomial rows and keep the sigmoid in predict_proba. I believe by doing this, the only code that would get broken would be anyone who already knew about the bug and had coded their own workaround.

Note: If we do this, it is not returning the actual correct parameters with regards to the regularisation, despite the fact that the solutions will be identical in terms of prediction. This may make it a no-go. |
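The identical-predictions claim can be checked numerically: applying the sigmoid to the difference of the two class rows reproduces the softmax probabilities exactly, for any pair of rows. A toy sketch (the arrays and helper functions are illustrative, not scikit-learn internals):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.RandomState(42)
X = rng.randn(8, 4)
W = rng.randn(2, 4)                      # toy multinomial coefficients, one row per class

p_softmax = softmax(X @ W.T)[:, 1]       # multinomial probability of class 1
w_diff = W[1] - W[0]                     # collapsed, OVR-style coefficient row
p_sigmoid = sigmoid(X @ w_diff)          # sigmoid on the collapsed row

print(np.allclose(p_softmax, p_sigmoid))  # True: identical predictions
```

The predictions match because softmax over two scores depends only on their difference; the collapsed row, however, is not the minimiser of the L2-penalised multinomial objective, which is the reservation noted above.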
Changing the binary coef_ in the general case is just not going to happen.
If you want to fix a bug, fix a bug...
…On 10 October 2017 at 00:05, Tom Dupré la Tour ***@***.***> wrote:
I agree it would make more sense to have coef_.shape = (n_classes,
n_features) even when n_classes = 2, to have more consistency and avoid
special cases.
However, it is a valid argument also for the OVR case (actually, it is
nice to have the same coef_ API for both multinomial and OVR cases). Does
that mean we should change the coef_ API in all cases? It is an important
API changes which will break a lot of user code, and which might not be
consistent with the rest of scikit-learn.
|
Are the learnt probabilities then equivalent if it changes to ovr for 2
classes? Seems a reasonable idea to me.
|
Thinking about it, it works with no regularisation, but when regularisation is involved we should expect slightly different results for the multinomial and OVR solutions. Maybe then just change the coef_ documentation for this case, and possibly add a warning? |
why the warning? what would it say?
|
The issue is that the values of coef_ returned from a binary multinomial fit cannot be interpreted as the coefficients of an ordinary binary logistic regression.

This is true in the regularised case in particular, where the two solutions genuinely differ.

I believe this would surprise and cause errors for many people upon receiving a 1D vector of coefficients. I would suggest a warning message when fitting with multi_class='multinomial' on two classes.
|
I think it would be excessive noise to warn on every such fit. Why not just amend the coef_ description? Most users will not be manually making probabilistic interpretations of coef_ in any case, and we can't in general stop users misinterpreting things on the basis of assumption rather than reading the docs...
|
Fair enough. My only argument to the contrary would be that naively interpreting coef_ is an easy mistake to make. However, I agree in this case that if a user is only receiving a 1D vector of coefficients (i.e. it is not in the general form as for dimensions > 2), then they should be checking the documentation for exactly what this means, so amending the coef_ description should be sufficient. |
So to sum up, we need to fix predict_proba for the binary multinomial case and amend the coef_ documentation accordingly.

Do you want to do it @rwolst ? |
Sure, I'll do it. |
Thanks |
Hi @rwolst, are you still planning to work on this? |
@srajanpaliwal I'll have a pull request this evening. |
Description

Incorrect predictions when fitting a LogisticRegression model on binary outcomes with multi_class='multinomial'.

Steps/Code to Reproduce
Expected Results

If we compare against R or against multi_class='ovr', the log loss (which is approximately proportional to the objective function, as the regularisation is set to be negligible through the choice of C) is incorrect. We expect the log loss to be roughly 0.5922995.

Actual Results

The actual log loss when using multi_class='multinomial' is 0.61505641264.

Further Information
See the stack exchange question https://stats.stackexchange.com/questions/306886/confusing-behaviour-of-scikit-learn-logistic-regression-multinomial-optimisation?noredirect=1#comment583412_306886 for more information.

The issue, it seems, is caused in https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/linear_model/logistic.py#L762. In the multinomial case, even if classes.size == 2, we cannot reduce to a 1D case by throwing away one of the vectors of coefficients (as we can in normal binary logistic regression). This is essentially the difference between the softmax (where the parameterisation is redundant) and ordinary logistic regression.

This can be fixed by commenting out lines 762 and 763. I am apprehensive, however, that this may cause some other unknown issues, which is why I am posting it as a bug.
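The softmax redundancy mentioned above can be made concrete: shifting every class's coefficient row by the same vector leaves the softmax probabilities unchanged, so no single row of the multinomial solution carries the full information on its own. A minimal NumPy sketch (toy arrays and a local softmax helper, not scikit-learn internals):

```python
import numpy as np

def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.RandomState(1)
X = rng.randn(4, 3)
W = rng.randn(2, 3)        # toy multinomial coefficients, one row per class
shift = rng.randn(3)       # an arbitrary common shift of both rows

P1 = softmax(X @ W.T)
P2 = softmax(X @ (W + shift).T)   # same shift added to every row
print(np.allclose(P1, P2))        # True: softmax is shift-invariant

# But throwing away one row and applying the sigmoid changes the probabilities:
p_drop = 1.0 / (1.0 + np.exp(-(X @ W[1])))
print(np.allclose(P1[:, 1], p_drop))  # False: W[1] alone is not enough
```

Only the difference between the rows is identified by the softmax, which is why keeping a single row (rather than the difference of the two) produces incorrect predictions.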
Versions
Linux-4.10.0-33-generic-x86_64-with-Ubuntu-16.04-xenial
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.19.0