Incorrect predictions when fitting a LogisticRegression model on binary outcomes with multi_class='multinomial'. #9889

Closed
rwolst opened this issue Oct 9, 2017 · 18 comments · Fixed by #9939
Labels: Bug, Easy, help wanted

Comments

@rwolst
Contributor

rwolst commented Oct 9, 2017

Description

Incorrect predictions when fitting a LogisticRegression model on binary outcomes with multi_class='multinomial'.

Steps/Code to Reproduce

    from sklearn.linear_model import LogisticRegression
    import sklearn.metrics
    import numpy as np

    # Set up a logistic regression object
    lr = LogisticRegression(C=1000000, multi_class='multinomial',
                            solver='sag', tol=0.0001, warm_start=False,
                            verbose=0)

    # Set independent variable values
    Z = np.array([
       [ 0.        ,  0.        ],
       [ 1.33448632,  0.        ],
       [ 1.48790105, -0.33289528],
       [-0.47953866, -0.61499779],
       [ 1.55548163,  1.14414766],
       [-0.31476657, -1.29024053],
       [-1.40220786, -0.26316645],
       [ 2.227822  , -0.75403668],
       [-0.78170885, -1.66963585],
       [ 2.24057471, -0.74555021],
       [-1.74809665,  2.25340192],
       [-1.74958841,  2.2566389 ],
       [ 2.25984734, -1.75106702],
       [ 0.50598996, -0.77338402],
       [ 1.21968303,  0.57530831],
       [ 1.65370219, -0.36647173],
       [ 0.66569897,  1.77740068],
       [-0.37088553, -0.92379819],
       [-1.17757946, -0.25393047],
       [-1.624227  ,  0.71525192]])
    
    # Set dependent variable values
    Y = np.array([1, 0, 0, 1, 0, 0, 0, 0, 
                  0, 0, 1, 1, 1, 0, 0, 1, 
                  0, 0, 1, 1], dtype=np.int32)

    lr.fit(Z, Y)
    p = lr.predict_proba(Z)
    print(sklearn.metrics.log_loss(Y, p)) # ...

    print(lr.intercept_)
    print(lr.coef_)

Expected Results

Comparing against R, or against multi_class='ovr', the log loss (which is approximately proportional to the objective function, since the regularisation is made negligible through the choice of C) should be roughly 0.5922995.
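For reference, a minimal sketch of the ovr comparison, reusing Z and Y from the code above:

    # Same data and settings, but with the one-vs-rest parameterisation.
    lr_ovr = LogisticRegression(C=1000000, multi_class='ovr', solver='sag',
                                tol=0.0001)
    lr_ovr.fit(Z, Y)
    print(sklearn.metrics.log_loss(Y, lr_ovr.predict_proba(Z)))  # roughly 0.5923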

Actual Results

The actual log loss when using multi_class='multinomial' is 0.61505641264.

Further Information

See the stack exchange question https://stats.stackexchange.com/questions/306886/confusing-behaviour-of-scikit-learn-logistic-regression-multinomial-optimisation?noredirect=1#comment583412_306886 for more information.

The issue, it seems, is caused by https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/linear_model/logistic.py#L762. In the multinomial case, even if classes.size == 2, we cannot reduce to a 1D case by throwing away one of the vectors of coefficients (as we can in ordinary binary logistic regression). This is essentially a difference between softmax (redundancy allowed) and logistic regression.

This can be fixed by commenting out lines 762 and 763. However, I am apprehensive that this may cause other, unknown issues, which is why I am posting it as a bug.

Versions

Linux-4.10.0-33-generic-x86_64-with-Ubuntu-16.04-xenial
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.19.0

@jnothman
Member

jnothman commented Oct 9, 2017 via email

@TomDLT
Member

TomDLT commented Oct 9, 2017

This is essentially a difference between softmax (redundancy allowed) and logistic regression.

Indeed, there is a difference in the way we want to compute predict_proba:

  1. In OVR-LR, you want a sigmoid: exp(D(x)) / (exp(D(x)) + 1), where D(x) is the decision function.
  2. In multinomial-LR with n_classes=2, you want a softmax: exp(D(x)) / (exp(D(x)) + exp(-D(x))).

So we do expect different results between (1) and (2).
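To see the difference concretely, here is a quick numpy sketch (not part of the fix); note that (2) is just a sigmoid applied to 2 * D(x), so the two only agree when D(x) = 0:

    import numpy as np

    d = 0.8  # an arbitrary decision value D(x)
    p_sigmoid = np.exp(d) / (np.exp(d) + 1)           # (1): about 0.690
    p_softmax = np.exp(d) / (np.exp(d) + np.exp(-d))  # (2): about 0.832, equals sigmoid(2*d)
    print(p_sigmoid, p_softmax)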
However, there is indeed a bug and your fix is correct:

In the current code, we incorrectly use the sigmoid in case (2). Removing lines 762 and 763 does solve this problem, but it changes the API of self.coef_, which specifically states that coef_ is of shape (1, n_features) when the given problem is binary.

Another way to fix it is to change predict_proba directly, adding the binary + multinomial case:

        # Inside LogisticRegression.predict_proba; softmax comes from
        # sklearn.utils.extmath, np is numpy.
        if self.multi_class == "ovr":
            return super(LogisticRegression, self)._predict_proba_lr(X)
        elif self.coef_.shape[0] == 1:
            # Binary + multinomial: only one row of coefficients is stored,
            # so rebuild the two-class scores [-D(x), D(x)] before the softmax.
            decision = self.decision_function(X)
            return softmax(np.c_[-decision, decision], copy=False)
        else:
            return softmax(self.decision_function(X), copy=False)

It will also break the current behavior of predict_proba, but we don't need a deprecation cycle if we consider it a bug.

@rwolst
Contributor Author

rwolst commented Oct 9, 2017

If it didn't cause too many further issues, I think the first solution would be better, i.e. changing the API of self.coef_ for the multinomial case. This is because, if we fit a logistic regression lr, then upon inspecting the lr.coef_ and lr.intercept_ objects it is clear what model is being used.

I also believe anyone using multinomial for a binary case (as I was) is doing it as part of some more general functionality and will also be fitting non-binary models depending on their data. If they want to access the parameters of the models via .intercept_ and .coef_ (as I was), their generalisation will be broken in the binary case if only predict_proba is changed.

@jnothman
Member

jnothman commented Oct 9, 2017

We already break the generalisation elsewhere, as in decision_function and in the coef_ shape for other multiclass methods. I think maintaining internal consistency here might be more important than some abstract concern that "generalisation will be broken". I think we should choose modifying predict_proba. This also makes it clear that the multinomial case does not suddenly introduce more free parameters.

@TomDLT
Member

TomDLT commented Oct 9, 2017

I agree it would make more sense to have coef_.shape = (n_classes, n_features) even when n_classes = 2, to have more consistency and avoid special cases.

However, the same argument also applies to the OVR case (actually, it is nice to have the same coef_ API for both the multinomial and OVR cases). Does that mean we should change the coef_ API in all cases? That is an important API change which would break a lot of user code, and which might not be consistent with the rest of scikit-learn...

@rwolst
Contributor Author

rwolst commented Oct 9, 2017

Another option could be to always use the ovr method in the binary case, even when multi_class is set to multinomial. This would avoid models having exactly the same coefficients but predicting different values due to different multi_class parameters. As previously mentioned, if predict_proba gets changed, the multinomial prediction would be particularly confusing to someone who just looks at the 1D coefficients in coef_ (I think the ovr case is the intuitive one).

I believe by doing this, the only code that would get broken would be anyone who already knew about the bug and had coded their own workaround.

Note: if we do this, the returned parameters are not actually correct with regard to the regularisation, even though the solutions will be identical in terms of prediction. This may make it a no-go.

@jnothman
Member

jnothman commented Oct 9, 2017 via email

@jnothman
Member

jnothman commented Oct 9, 2017 via email

@rwolst
Contributor Author

rwolst commented Oct 9, 2017

Thinking about it, this works with no regularisation, but when regularisation is involved we should expect slightly different results for ovr and multinomial.
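A quick sketch of that difference, reusing Z and Y from the reproduction with a non-negligible penalty (C=1.0); the multinomial log loss is only meaningful once predict_proba is fixed, and the exact numbers are illustrative:

    # With a real penalty the two parameterisations reach different optima,
    # since the symmetric multinomial model penalises both coefficient rows.
    for mc in ('ovr', 'multinomial'):
        m = LogisticRegression(C=1.0, multi_class=mc, solver='sag',
                               tol=0.0001, max_iter=10000)
        m.fit(Z, Y)
        print(mc, sklearn.metrics.log_loss(Y, m.predict_proba(Z)))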

Maybe then just change predict_proba as suggested, and add a warning message when fitting a binary model with multinomial.

@jnothman
Member

jnothman commented Oct 9, 2017 via email

@rwolst
Contributor Author

rwolst commented Oct 10, 2017

The issue is that the values of coef_ do not intuitively describe the model in the binary case when using multinomial. If someone fits a binary logistic regression and receives back a 1D vector of coefficients (call it W for convenience), I would assume they will think the predicted probability of a new observation X is given by

exp(dot(W,X)) / (1 + exp(dot(W,X)))

This is true in the ovr case only. In the multinomial case, it is actually given by

exp(dot(W,X)) / (exp(dot(-W,X)) + exp(dot(W,X)))

I believe this would surprise and cause errors for many people upon receiving a 1D vector of coefficients W, so I think they should be warned about it. In fact, I wouldn't be surprised if people currently using the logistic regression coefficients in the multinomial, binary-outcome case have bugs in their code.
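A short sketch contrasting the two readings, assuming lr and Z from the reproduction above (and that predict_proba has been fixed):

    import numpy as np

    d = lr.intercept_ + Z @ lr.coef_.ravel()         # D(x) from the 1D coefficients
    p_assumed = np.exp(d) / (1 + np.exp(d))          # the ovr-style reading most users expect
    p_actual = np.exp(d) / (np.exp(-d) + np.exp(d))  # the multinomial model's reading
    # p_actual matches lr.predict_proba(Z)[:, 1]; p_assumed does not.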

I would suggest a warning message when .fit is called with multinomial and binary outcomes are detected. Something along the lines of (this can probably be made more concise):

Fitting a binary model with multi_class=multinomial. The returned `coef_` and `intercept_` values form the coefficients for outcome 1 (True); use `-coef_` and `-intercept_` to form the coefficients for outcome 0 (False).
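As a rough illustration only (the placement and wording here are hypothetical, not an agreed implementation), such a warning inside fit might look like:

    import warnings

    # Hypothetical check inside LogisticRegression.fit, after classes_ is set.
    if self.multi_class == 'multinomial' and len(self.classes_) == 2:
        warnings.warn("Fitting a binary model with multi_class='multinomial'. "
                      "coef_ and intercept_ hold the coefficients for outcome 1 "
                      "(True); use -coef_ and -intercept_ for outcome 0 (False).")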

@TomDLT added the Bug label Oct 10, 2017
@jnothman
Member

jnothman commented Oct 11, 2017 via email

@rwolst
Contributor Author

rwolst commented Oct 11, 2017

Fair enough. My only argument to the contrary would be that using multinomial for binary classification is a fairly uncommon thing to do, so the warning would be infrequent.

However, I agree that in this case, if a user is only receiving a 1D vector of coefficients (i.e. it is not in the general form used for dimensions > 2), they should be checking the documentation for exactly what this means, so amending the coef_ description should suffice.
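A sketch of how the coef_ docstring might be amended (the wording is illustrative, not final):

    coef_ : array, shape (1, n_features) or (n_classes, n_features)
        Coefficient of the features in the decision function.
        `coef_` is of shape (1, n_features) when the given problem is binary.
        In particular, when `multi_class='multinomial'`, `coef_` corresponds
        to outcome 1 (True) and `-coef_` corresponds to outcome 0 (False).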

@TomDLT
Member

TomDLT commented Oct 11, 2017

So to sum up, we need to:

  * change predict_proba to use the softmax in the binary multinomial case, as discussed above
  * amend the coef_ and intercept_ documentation to explain the binary multinomial case

Do you want to do it @rwolst ?

@TomDLT added the Easy and Need Contributor labels Oct 11, 2017
@rwolst
Contributor Author

rwolst commented Oct 12, 2017

Sure, I'll do it.

@jnothman
Member

Thanks

@srajanpaliwal
Contributor

Hi @rwolst,
if you are not working on this issue, I can take it up.

@rwolst
Contributor Author

rwolst commented Oct 17, 2017

@srajanpaliwal I'll have a pull request this evening.
