FIX: Reduce bias of covariance.MinCovDet with consistency correction #32117
In some cases, this test fails under the new implementation: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/covariance/tests/test_robust_covariance.py#L35. The failing test uses a much lower tolerance threshold than the other tests of this function, so I suggest raising the threshold. The example code in Issue #23162 shows that the new implementation is much less biased than the original one. The failure therefore stems from the variability of the results combined with the low threshold for that specific test, not from the new implementation.
Reference Issues/PRs
Partially fixes Issue #23162
What does this implement/fix? Explain your changes.
Background:
The output of the covariance.MinCovDet estimator is strongly biased (see Issue #23162) because it lacks a consistency correction. This PR adds the missing correction, reducing the bias. In my comment on the issue, I explain the problem and show that this PR produces less biased output.
Changes:
I added the function _consistency_correction to compute the multiplicative consistency factor, and I use it to correct the robust covariance estimate.
The correction is also needed in a second place in the code. The original implementation used an ad hoc correction from the original paper there, which I replaced with the factor returned by _consistency_correction. This change makes the code more consistent, and the new correction is better grounded in theory.
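For readers unfamiliar with the idea, here is a minimal sketch of a multiplicative consistency factor. It is not the code of this PR: the body of _consistency_correction is not shown here and may differ. The sketch assumes the standard factor for Gaussian data from the robust statistics literature, c = alpha / F(q; p + 2), where alpha is the fraction of retained points, q is the alpha-quantile of a chi-squared distribution with p degrees of freedom, and F(.; p + 2) is the chi-squared CDF with p + 2 degrees of freedom. All function names below are hypothetical, and the chi-squared helpers are written with the standard library only so the snippet runs without SciPy. The one-dimensional simulation uses the central half of a sample as a simplified stand-in for the subset that MCD selects.

```python
import math
import random


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x), computed by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = a
    for _ in range(10_000):
        k += 1.0
        term *= x / k
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def chi2_cdf(x, df):
    """CDF of the chi-squared distribution with df degrees of freedom."""
    return reg_lower_gamma(df / 2.0, x / 2.0)


def chi2_ppf(q, df):
    """Quantile of the chi-squared distribution, found by bisection."""
    lo, hi = 0.0, max(float(df), 1.0)
    while chi2_cdf(hi, df) < q:  # grow the bracket until it contains q
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def consistency_factor(alpha, p):
    """Assumed factor alpha / F_{chi2, p+2}(chi2 quantile of alpha with p df)."""
    if alpha >= 1.0:
        return 1.0  # all points retained: nothing to correct
    q_alpha = chi2_ppf(alpha, p)
    return alpha / chi2_cdf(q_alpha, p + 2)


def central_half_variance(n=20_000, seed=0):
    """Variance of the half of a standard normal sample closest to its median."""
    rng = random.Random(seed)
    xs = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    median = xs[n // 2]
    kept = sorted(xs, key=lambda v: abs(v - median))[: n // 2]
    mean = sum(kept) / len(kept)
    return sum((v - mean) ** 2 for v in kept) / len(kept)


raw = central_half_variance()
factor = consistency_factor(alpha=0.5, p=1)
print(f"raw variance of central half: {raw:.3f}")
print(f"consistency factor: {factor:.3f}")
print(f"corrected variance: {raw * factor:.3f}")
```

The true variance of the simulated data is 1. The raw variance of the retained half falls well below that, and multiplying by the factor brings it back close to 1, which is the kind of bias reduction this PR targets.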
Any other comments?
The estimate is still slightly biased because it lacks a finite-sample correction. This should be added in a future PR; the covMcd implementation in the R package robustbase can serve as a template: https://rdrr.io/cran/robustbase/src/R/covMcd.R.