-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
Linear Discriminant Analysis eigen solver questionable implementation #11727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
don't we have tests to assess that results are the same across solvers?
why is this not picked up?
can you provide a non-regression test?
|
I'm also wondering. I played a bit with iris dataset print(__doc__)
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
iris = datasets.load_iris()
X = iris.data
X=X@np.diag([10,5,1,0.1])
#X= np.concatenate((X,X[:,0:2].mean(1,keepdims=True),np.zeros((150,2))),1)
y = iris.target
target_names = iris.target_names
ldal = LinearDiscriminantAnalysis(solver='lsqr', shrinkage=0.1)
ldal.fit(X,y)
print(ldal.score(X,y))
plt.imshow(ldal.covariance_)
lda = LinearDiscriminantAnalysis(solver='eigen', shrinkage=0.1)
lda.fit(X,y)
print(lda.score(X,y))
plt.figure()
plt.imshow(lda.covariance_) By the implementation difference I mentioned above, as long as the covariance matrix is not identity matrix, there should be difference between these 2 solvers. but it only gives different results in 2 scenarios:
There seems to be something specific to shrinkage... I don't get it. |
by changing self.coef_ = np.dot(self.means_, evecs).dot(evecs.T) to coef_=self.means_@np.linalg.inv(self.covariance_); eigen solver reached the same performance as lsqr solver |
can you propose a non-regression test? |
The reason the test passes is that it only checks that the predictions are the same and not that the posterior probabilities are equal. The test introduced in #11796 ( |
@lxx74n74n can you please confirm that the fix in #11796 solves this issue? Please feel free to jump in the discussion in that PR. |
Uh oh!
There was an error while loading. Please reload this page.
Description
There seems to be a bug in the eigen solver part of LDA.
Steps/Code to Reproduce
When you use LDA with eigen solver. The decision function is implemented as
where self.means_ is the mean for each class.
self.coef_
implemented here is essentially the same as self.means_.https://github.com/scikit-learn/scikit-learn/blob/f0ab589f/sklearn/linear_model/base.py#L278
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/discriminant_analysis.py
Actual Results
This means the decision function becomes
Expected Results
while the true decision function should be
These could all be caused by the wrong line in eigen solver:
where as in lsqr solver it is:
Sw means Covariance within group, the same as self.covariance_
Versions
The text was updated successfully, but these errors were encountered: