TST test_k_means_fit_predict: do not test fit determinism together with predict/labels_ equality #13751

Closed
wants to merge 2 commits into from

jnothman
Member

I think this might fix #12644.

The comments there suggest the failure is due to fit and fit_predict learning permuted clusters. I don't think the intention of this test is to check the determinism / idempotence of fit/fit_predict, even if that would also be a good thing to ensure and to test.
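To illustrate what "permuted clusters" means here: two fits can produce the same partition of the samples while numbering the clusters differently, so a plain element-wise comparison of label arrays fails even though the clusterings agree. The helper below is a minimal sketch in plain Python; `same_partition` is a hypothetical name used for illustration and is not part of scikit-learn or of this PR.

```python
def same_partition(labels_a, labels_b):
    """Return True if the two labelings group samples identically,
    ignoring how the clusters are numbered."""
    if len(labels_a) != len(labels_b):
        return False
    forward = {}   # label in a -> label in b
    backward = {}  # label in b -> label in a
    for a, b in zip(labels_a, labels_b):
        # Each label must map to exactly one label on the other side.
        if forward.setdefault(a, b) != b:
            return False
        if backward.setdefault(b, a) != a:
            return False
    return True


first_fit = [0, 0, 1, 2, 2]
second_fit = [2, 2, 0, 1, 1]   # same groups, different cluster ids

strictly_equal = first_fit == second_fit
equal_up_to_permutation = same_partition(first_fit, second_fit)
```

Here `strictly_equal` is `False` while `equal_up_to_permutation` is `True`. For a score rather than a yes/no answer, `sklearn.metrics.adjusted_rand_score` is also invariant to relabeling.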

The intention of the test here is:

    # check that fit.predict gives same result as fit_predict
    # There's a very small chance of failure with elkan on unstructured dataset
    # because predict method uses fast euclidean distances computation which
    # may cause small numerical instabilities.

Therefore the test should not call fit twice, but rather check that calling predict after fit_predict gives the same labels. That is what the present PR now tests.
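A minimal sketch of that check, assuming a recent scikit-learn and synthetic, well-separated blobs. The data, parameters, and variable names are illustrative assumptions, not the contents of the actual test in the PR.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data: three well-separated blobs (not the data of the real test).
X, _ = make_blobs(
    n_samples=200,
    centers=[[-5.0, -5.0], [0.0, 0.0], [5.0, 5.0]],
    cluster_std=0.5,
    random_state=0,
)

km = KMeans(n_clusters=3, n_init=10, random_state=0)

# Fit once and keep the labels produced while fitting.
labels_from_fit_predict = km.fit_predict(X)

# Ask the same fitted estimator to predict, without fitting a second time.
labels_from_predict = km.predict(X)

labels_match = np.array_equal(labels_from_fit_predict, labels_from_predict)
```

Because only one fit happens, the comparison cannot be affected by a second fit converging to a relabeled or different solution.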

@jnothman jnothman mentioned this pull request Apr 30, 2019
@qinhanmin2014
Member

@jnothman I've reopened #12648; that PR is consistent with 0.20.X

@qinhanmin2014
Member

There's a comment at the beginning of the test: # check that fit.predict gives same result as fit_predict.
Perhaps it's acceptable to merge #12648 since that PR is already approved and merged into 0.20.X?

@jnothman
Member Author

It was approved as a short-term fix, and no other fix was merged on the basis that the problem seemed to have gone away.

@jnothman
Member Author

I'll remove the reference to fit from the comment, but the point was to check that the iterative updating done when fitting matches the assignment given by pairwise_distances_argmin.
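As a rough sketch of that comparison, assuming a recent scikit-learn: the labels stored by fit can be checked against a direct nearest-center assignment computed with `sklearn.metrics.pairwise_distances_argmin`. The data and parameters below are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances_argmin

# Illustrative, well-separated data so that no sample sits near a tie.
X, _ = make_blobs(
    n_samples=150,
    centers=[[-5.0, -5.0], [0.0, 0.0], [5.0, 5.0]],
    cluster_std=0.5,
    random_state=1,
)

km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)

# Index of the closest learned center for every sample.
nearest_center = pairwise_distances_argmin(X, km.cluster_centers_)

agrees = np.array_equal(km.labels_, nearest_center)
```

On unstructured data, where samples can be nearly equidistant from two centers, this agreement is where small floating-point differences could in principle show up.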

@jeremiedbb
Member

> I don't think the intention of this test is to check the determinism / idempotence of fit/fit_predict

Actually it was my intention :)
There's already a test which corresponds to what your changes test: test_predict

@jnothman
Member Author

Oh!
Hmm. If the point is to test the consistency of multiple fits, why does comparing fit(X).predict(X) to fit_predict(X) come into it??? I agree there is redundancy between those two tests, but I think test_k_means_fit_predict should not be doing what it is, nor named as it is, if its intention is to check the consistency of multiple calls to fit.

@jeremiedbb
Member

My intention was just to test that calling fit(X).predict(X) gives the same result as calling fit_predict(X). There are very similar tests for other estimators, like test_bayesian_mixture_fit_predict for instance.

@jnothman
Member Author

> My intention was just to test that calling fit(X).predict(X) gives the same result as calling fit_predict(X).

But under the assumption that fit_predict(X) and fit(X).labels_ return the same thing (which they do, even if it's not obviously tested), then test_predict tests exactly that... given that the fit is consistent across calls.

Can we rename the failing test to be test_fit_idempotence or something? And then skip it?? :|

@jeremiedbb
Member

> given that the fit is consistent across calls.

I think it is. If I recall correctly, the failures appear when using the elkan algorithm, because the predict method always uses the lloyd algorithm. Even though both algorithms should give the same results, there might be some numerical instabilities.
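The kind of numerical instability mentioned here can be illustrated in plain Python. The "fast" squared Euclidean distance uses the expansion ||x - c||² = x·x - 2 x·c + c·c, which is cheaper for many pairs but can lose precision through cancellation. The sketch below is a generic illustration with deliberately exaggerated magnitudes, not a reproduction of scikit-learn's code path; both function names are hypothetical.

```python
def direct_sq_dist(x, c):
    """Squared distance computed from the coordinate differences."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def expanded_sq_dist(x, c):
    """Squared distance via the expansion x.x - 2 x.c + c.c."""
    xx = sum(xi * xi for xi in x)
    cc = sum(ci * ci for ci in c)
    xc = sum(xi * ci for xi, ci in zip(x, c))
    return xx - 2.0 * xc + cc


# Large coordinates that differ by exactly 1 (exaggerated on purpose).
x = [1e8 + 1.0]
c = [1e8]

direct = direct_sq_dist(x, c)      # the difference is exact, so this is 1.0
expanded = expanded_sq_dist(x, c)  # the squares exceed float64 precision

formulas_disagree = direct != expanded
```

With realistic data the discrepancy is far smaller, but it can still flip the nearest center for a sample that is almost equidistant from two centers.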

I'm ok to rename and skip the test

@qinhanmin2014
Member

@jnothman maybe close this one and merge #12648?

@jnothman jnothman closed this Apr 30, 2019
Successfully merging this pull request may close these issues.

test_k_means_fit_predict failing on some MacPython runs
3 participants