Incorrect predictions when fitting a LogisticRegression model on binary outcomes with multi_class='multinomial'. #9889

Closed
rwolst opened this issue Oct 9, 2017 · 18 comments · Fixed by #9939
Labels: Bug, Easy, help wanted

Comments

@rwolst
Contributor

rwolst commented Oct 9, 2017

Description

Incorrect predictions when fitting a LogisticRegression model on binary outcomes with multi_class='multinomial'.

Steps/Code to Reproduce

    from sklearn.linear_model import LogisticRegression
    import sklearn.metrics
    import numpy as np

    # Set up a logistic regression object
    lr = LogisticRegression(C=1000000, multi_class='multinomial',
                            solver='sag', tol=0.0001, warm_start=False,
                            verbose=0)

    # Set independent variable values
    Z = np.array([
       [ 0.        ,  0.        ],
       [ 1.33448632,  0.        ],
       [ 1.48790105, -0.33289528],
       [-0.47953866, -0.61499779],
       [ 1.55548163,  1.14414766],
       [-0.31476657, -1.29024053],
       [-1.40220786, -0.26316645],
       [ 2.227822  , -0.75403668],
       [-0.78170885, -1.66963585],
       [ 2.24057471, -0.74555021],
       [-1.74809665,  2.25340192],
       [-1.74958841,  2.2566389 ],
       [ 2.25984734, -1.75106702],
       [ 0.50598996, -0.77338402],
       [ 1.21968303,  0.57530831],
       [ 1.65370219, -0.36647173],
       [ 0.66569897,  1.77740068],
       [-0.37088553, -0.92379819],
       [-1.17757946, -0.25393047],
       [-1.624227  ,  0.71525192]])
    
    # Set dependent variable values
    Y = np.array([1, 0, 0, 1, 0, 0, 0, 0, 
                  0, 0, 1, 1, 1, 0, 0, 1, 
                  0, 0, 1, 1], dtype=np.int32)

    lr.fit(Z, Y)
    p = lr.predict_proba(Z)
    print(sklearn.metrics.log_loss(Y, p)) # ...

    print(lr.intercept_)
    print(lr.coef_)

Expected Results

Comparing against R, or against multi_class='ovr', the log loss (which is approximately proportional to the objective function, since the regularisation is made negligible through the choice of C) should be roughly 0.5922995.
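For reference, a minimal sketch of the ovr comparison, reusing Z and Y from the code above:

    # Same data and settings, but with the one-vs-rest parameterisation.
    lr_ovr = LogisticRegression(C=1000000, multi_class='ovr', solver='sag',
                                tol=0.0001)
    lr_ovr.fit(Z, Y)
    print(sklearn.metrics.log_loss(Y, lr_ovr.predict_proba(Z)))  # roughly 0.5923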

Actual Results

The actual log loss when using multi_class='multinomial' is 0.61505641264.

Further Information

See the stack exchange question https://stats.stackexchange.com/questions/306886/confusing-behaviour-of-scikit-learn-logistic-regression-multinomial-optimisation?noredirect=1#comment583412_306886 for more information.

The issue, it seems, is caused by https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/linear_model/logistic.py#L762. In the multinomial case, even if classes.size == 2, we cannot reduce to a 1D case by throwing away one of the vectors of coefficients (as we can in ordinary binary logistic regression). This is essentially a difference between softmax (redundancy allowed) and logistic regression.

This can be fixed by commenting out lines 762 and 763. However, I am apprehensive that this may cause other, unknown issues, which is why I am posting it as a bug.

Versions

Linux-4.10.0-33-generic-x86_64-with-Ubuntu-16.04-xenial
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.19.0

@jnothman
Member

jnothman commented Oct 9, 2017 via email

@TomDLT
Member

TomDLT commented Oct 9, 2017

This is essentially a difference between softmax (redundancy allowed) and logistic regression.

Indeed, there is a difference in the way we want to compute predict_proba:

  1. In OVR-LR, you want a sigmoid: exp(D(x)) / (exp(D(x)) + 1), where D(x) is the decision function.
  2. In multinomial-LR with n_classes=2, you want a softmax: exp(D(x)) / (exp(D(x)) + exp(-D(x))).

So we do expect different results between (1) and (2).
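To see the difference concretely, here is a quick numpy sketch (not part of the fix); note that (2) is just a sigmoid applied to 2 * D(x), so the two only agree when D(x) = 0:

    import numpy as np

    d = 0.8  # an arbitrary decision value D(x)
    p_sigmoid = np.exp(d) / (np.exp(d) + 1)           # (1): about 0.690
    p_softmax = np.exp(d) / (np.exp(d) + np.exp(-d))  # (2): about 0.832, equals sigmoid(2*d)
    print(p_sigmoid, p_softmax)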
However, there is indeed a bug and your fix is correct:

In the current code, we incorrectly use the sigmoid in case (2). Removing lines 762 and 763 does solve this problem, but it changes the API of self.coef_, which specifically states that coef_ is of shape (1, n_features) when the given problem is binary.

Another way to fix it is to change predict_proba directly, adding the binary + multinomial case:

        # Inside LogisticRegression.predict_proba; softmax comes from
        # sklearn.utils.extmath, np is numpy.
        if self.multi_class == "ovr":
            return super(LogisticRegression, self)._predict_proba_lr(X)
        elif self.coef_.shape[0] == 1:
            # Binary + multinomial: only one row of coefficients is stored,
            # so rebuild the two-class scores [-D(x), D(x)] before the softmax.
            decision = self.decision_function(X)
            return softmax(np.c_[-decision, decision], copy=False)
        else:
            return softmax(self.decision_function(X), copy=False)

It will also break the current behavior of predict_proba, but we don't need a deprecation cycle if we consider it a bug.

@rwolst
Contributor Author

rwolst commented Oct 9, 2017

If it didn't cause too many further issues, I think the first solution would be better, i.e. changing the API of self.coef_ for the multinomial case. This is because, if we fit a logistic regression lr, then upon inspecting the lr.coef_ and lr.intercept_ objects it is clear what model is being used.

I also believe anyone using multinomial for a binary case (as I was) is doing it as part of some more general functionality and will also be fitting non-binary models depending on their data. If they want to access the parameters of the models via .intercept_ and .coef_ (as I was), their generalisation will be broken in the binary case if only predict_proba is changed.

@jnothman
Member

jnothman commented Oct 9, 2017

We already break the generalisation elsewhere, as in decision_function and in the coef_ shape for other multiclass methods. I think maintaining internal consistency here might be more important than some abstract concern that "generalisation will be broken". I think we should choose modifying predict_proba. This also makes it clear that the multinomial case does not suddenly introduce more free parameters.

@TomDLT
Member

TomDLT commented Oct 9, 2017

I agree it would make more sense to have coef_.shape = (n_classes, n_features) even when n_classes = 2, to have more consistency and avoid special cases.

However, the same argument also applies to the OVR case (actually, it is nice to have the same coef_ API for both the multinomial and OVR cases). Does that mean we should change the coef_ API in all cases? That is an important API change which would break a lot of user code, and which might not be consistent with the rest of scikit-learn...

@rwolst
Contributor Author

rwolst commented Oct 9, 2017

Another option could be to always use the ovr method in the binary case, even when multi_class is set to multinomial. This would avoid models having exactly the same coefficients but predicting different values due to different multi_class parameters. As previously mentioned, if predict_proba gets changed, the multinomial prediction would be particularly confusing to someone who just looks at the 1D coefficients in coef_ (I think the ovr case is the intuitive one).

I believe by doing this, the only code that would get broken would be anyone who already knew about the bug and had coded their own workaround.

Note: if we do this, the returned parameters are not actually correct with regard to the regularisation, even though the solutions will be identical in terms of prediction. This may make it a no-go.

@jnothman
Member

jnothman commented Oct 9, 2017 via email

@jnothman
Member

jnothman commented Oct 9, 2017 via email

@rwolst
Contributor Author

rwolst commented Oct 9, 2017

Thinking about it, this works with no regularisation, but when regularisation is involved we should expect slightly different results for ovr and multinomial.
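A quick sketch of that difference, reusing Z and Y from the reproduction with a non-negligible penalty (C=1.0); the multinomial log loss is only meaningful once predict_proba is fixed, and the exact numbers are illustrative:

    # With a real penalty the two parameterisations reach different optima,
    # since the symmetric multinomial model penalises both coefficient rows.
    for mc in ('ovr', 'multinomial'):
        m = LogisticRegression(C=1.0, multi_class=mc, solver='sag',
                               tol=0.0001, max_iter=10000)
        m.fit(Z, Y)
        print(mc, sklearn.metrics.log_loss(Y, m.predict_proba(Z)))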

Maybe then just change predict_proba as suggested, and add a warning message when fitting a binary model with multinomial.

@jnothman
Member

jnothman commented Oct 9, 2017 via email

@rwolst
Contributor Author

rwolst commented Oct 10, 2017

The issue is that the values of coef_ do not intuitively describe the model in the binary case when using multinomial. If someone fits a binary logistic regression and receives back a 1D vector of coefficients (call it W for convenience), I would assume they will think the predicted probability of a new observation X is given by

exp(dot(W,X)) / (1 + exp(dot(W,X)))

This is true in the ovr case only. In the multinomial case, it is actually given by

exp(dot(W,X)) / (exp(dot(-W,X)) + exp(dot(W,X)))

I believe this would surprise and cause errors for many people upon receiving a 1D vector of coefficients W, so I think they should be warned about it. In fact, I wouldn't be surprised if people currently using the logistic regression coefficients in the multinomial, binary-outcome case have bugs in their code.
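A short sketch contrasting the two readings, assuming lr and Z from the reproduction above (and that predict_proba has been fixed):

    import numpy as np

    d = lr.intercept_ + Z @ lr.coef_.ravel()         # D(x) from the 1D coefficients
    p_assumed = np.exp(d) / (1 + np.exp(d))          # the ovr-style reading most users expect
    p_actual = np.exp(d) / (np.exp(-d) + np.exp(d))  # the multinomial model's reading
    # p_actual matches lr.predict_proba(Z)[:, 1]; p_assumed does not.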

I would suggest a warning message when .fit is called with multinomial and binary outcomes are detected. Something along the lines of (this can probably be made more concise):

Fitting a binary model with multi_class=multinomial. The returned `coef_` and `intercept_` values form the coefficients for outcome 1 (True); use `-coef_` and `-intercept_` to form the coefficients for outcome 0 (False).
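As a rough illustration only (the placement and wording here are hypothetical, not an agreed implementation), such a warning inside fit might look like:

    import warnings

    # Hypothetical check inside LogisticRegression.fit, after classes_ is set.
    if self.multi_class == 'multinomial' and len(self.classes_) == 2:
        warnings.warn("Fitting a binary model with multi_class='multinomial'. "
                      "coef_ and intercept_ hold the coefficients for outcome 1 "
                      "(True); use -coef_ and -intercept_ for outcome 0 (False).")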

@TomDLT added the Bug label Oct 10, 2017
@jnothman
Member

jnothman commented Oct 11, 2017 via email

@rwolst
Contributor Author

rwolst commented Oct 11, 2017

Fair enough. My only argument to the contrary would be that using multinomial for binary classification is a fairly uncommon thing to do, so the warning would be infrequent.

However, I agree that in this case, if a user is only receiving a 1D vector of coefficients (i.e. it is not in the general form used for dimensions > 2), they should be checking the documentation for exactly what this means, so amending the coef_ description should suffice.
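A sketch of how the coef_ docstring might be amended (the wording is illustrative, not final):

    coef_ : array, shape (1, n_features) or (n_classes, n_features)
        Coefficient of the features in the decision function.
        `coef_` is of shape (1, n_features) when the given problem is binary.
        In particular, when `multi_class='multinomial'`, `coef_` corresponds
        to outcome 1 (True) and `-coef_` corresponds to outcome 0 (False).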

@TomDLT
Member

TomDLT commented Oct 11, 2017

So to sum up, we need to:

  * change predict_proba to use the softmax in the binary multinomial case, as discussed above
  * amend the coef_ and intercept_ documentation to explain the binary multinomial case

Do you want to do it @rwolst ?

@TomDLT added the Easy and Need Contributor labels Oct 11, 2017
@rwolst
Contributor Author

rwolst commented Oct 12, 2017

Sure, I'll do it.

@jnothman
Member

Thanks

@srajanpaliwal
Contributor

Hi @rwolst,
if you are not working on this issue, I can take it up.

@rwolst
Contributor Author

rwolst commented Oct 17, 2017

@srajanpaliwal I'll have a pull request this evening.
