Linear Discriminant Analysis eigen solver questionable implementation #11727

xiaoxionglin · 2018-08-01T09:14:56Z

Description

There seems to be a bug in the eigen solver part of LDA.

Steps/Code to Reproduce

When you use LDA with eigen solver. The decision function is implemented as

scores = safe_sparse_dot(X, self.coef_.T, dense_output=True) + self.intercept_

evals, evecs = linalg.eigh(Sb, Sw)
        self.coef_ = np.dot(self.means_, evecs).dot(evecs.T)
        self.intercept_ = (-0.5 * np.diag(np.dot(self.means_, self.coef_.T)) +
                           np.log(self.priors_))

where self.means_ is the mean for each class.

self.coef_ implemented here is essentially the same as self.means_.

https://github.com/scikit-learn/scikit-learn/blob/f0ab589f/sklearn/linear_model/base.py#L278
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/discriminant_analysis.py

Actual Results

This means the decision function becomes

scores= X @ means_ -0.5 * np.diag(means_ @ means_.T) + np.log(priors_)

Expected Results

while the true decision function should be

scores= X @ linalg.inv(Sw) @ means_ -0.5 * np.diag(means_ @ linalg.inv(Sw) @ means_.T) + np.log(priors_)

These could all be caused by the wrong line in eigen solver:

self.coef_ = np.dot(self.means_, evecs).dot(evecs.T)

where as in lsqr solver it is:

self.coef_ = linalg.lstsq(self.covariance_, self.means_.T)[0].T

Sw means Covariance within group, the same as self.covariance_

Versions

The text was updated successfully, but these errors were encountered:

agramfort · 2018-08-01T09:33:14Z

don't we have tests to assess that results are the same across solvers? why is this not picked up? can you provide a non-regression test?

xiaoxionglin · 2018-08-01T10:07:06Z

I'm also wondering.

I played a bit with iris dataset

print(__doc__)

import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()

X = iris.data
X=X@np.diag([10,5,1,0.1])
#X= np.concatenate((X,X[:,0:2].mean(1,keepdims=True),np.zeros((150,2))),1)
y = iris.target
target_names = iris.target_names

ldal = LinearDiscriminantAnalysis(solver='lsqr', shrinkage=0.1)
ldal.fit(X,y)
print(ldal.score(X,y))
plt.imshow(ldal.covariance_)

lda = LinearDiscriminantAnalysis(solver='eigen', shrinkage=0.1)
lda.fit(X,y)
print(lda.score(X,y))
plt.figure()
plt.imshow(lda.covariance_)

By the implementation difference I mentioned above, as long as the covariance matrix is not identity matrix, there should be difference between these 2 solvers.

but it only gives different results in 2 scenarios:

variance for important features are very different and you use shrinkage
features are collinear and you use shrinkage

There seems to be something specific to shrinkage... I don't get it.

xiaoxionglin · 2018-08-06T10:19:49Z

by changing self.coef_ from

self.coef_ = np.dot(self.means_, evecs).dot(evecs.T)

to

coef_=self.means_@np.linalg.inv(self.covariance_);

eigen solver reached the same performance as lsqr solver

agramfort · 2018-08-10T11:52:54Z

can you propose a non-regression test?

agamemnonc · 2019-03-01T18:17:31Z

don't we have tests to assess that results are the same across solvers? why is this not picked up? can you provide a non-regression test?

The reason the test passes is that it only checks that the predictions are the same and not that the posterior probabilities are equal. The test introduced in #11796 (test_lda_predict_proba) passes for svd and lsqr solvers, but fails for the eigen solver.

agamemnonc · 2019-03-04T12:38:52Z

@lxx74n74n can you please confirm that the fix in #11796 solves this issue? Please feel free to jump in the discussion in that PR.

agamemnonc mentioned this issue Mar 1, 2019

[MRG+2] Fix LDA predict_proba() #11796

Merged

agramfort closed this as completed in #11796 Mar 7, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Linear Discriminant Analysis eigen solver questionable implementation #11727

Linear Discriminant Analysis eigen solver questionable implementation #11727

xiaoxionglin commented Aug 1, 2018 •

edited by glemaitre

Loading

agramfort commented Aug 1, 2018 via email

Uh oh!

xiaoxionglin commented Aug 1, 2018 •

edited

Loading

Uh oh!

xiaoxionglin commented Aug 6, 2018 •

edited

Loading

Uh oh!

agramfort commented Aug 10, 2018

Uh oh!

agamemnonc commented Mar 1, 2019

Uh oh!

agamemnonc commented Mar 4, 2019

Uh oh!

Uh oh!

Linear Discriminant Analysis eigen solver questionable implementation #11727

Linear Discriminant Analysis eigen solver questionable implementation #11727

Comments

xiaoxionglin commented Aug 1, 2018 • edited by glemaitre Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Steps/Code to Reproduce

Actual Results

Expected Results

Versions

agramfort commented Aug 1, 2018 via email

Uh oh!

xiaoxionglin commented Aug 1, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xiaoxionglin commented Aug 6, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

agramfort commented Aug 10, 2018

Uh oh!

agamemnonc commented Mar 1, 2019

Uh oh!

agamemnonc commented Mar 4, 2019

Uh oh!

xiaoxionglin commented Aug 1, 2018 •

edited by glemaitre

Loading

xiaoxionglin commented Aug 1, 2018 •

edited

Loading

xiaoxionglin commented Aug 6, 2018 •

edited

Loading