
[MRG] Fix #2372: StratifiedKFold less impact on the original order of samples. #2450


Conversation

@dnouri (Contributor) commented Sep 17, 2013

See #2372 for motivation.


from sklearn import cross_validation as cval
from sklearn.svm import SVC
# X, y: a subsample of the digits dataset (see the commit message below)
model = SVC(C=10, gamma=0.005)
cv = cval.StratifiedKFold(y, 5)
assert cval.cross_val_score(model, X, y, cv=cv, n_jobs=-1).mean() < 0.91
Inline review comment from a Member:

I would rather not use n_jobs=-1 when not explicitly writing tests for parallel computing as currently it can fail on some platforms (for instance if numpy is built against openblas) and we would have to protect the tests against such failures.
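
As a sketch only (reusing model, X, y, and cv from the snippet above; nothing here is prescribed by the PR), the same assertion without the parallel flag would look like:

# Default n_jobs=1: folds run sequentially, so the test does not depend on
# the platform's multiprocessing / OpenBLAS behaviour.
scores = cval.cross_val_score(model, X, y, cv=cv)
assert scores.mean() < 0.91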

@ogrisel (Member) commented Sep 19, 2013

Thanks @dnouri for tackling this. +1 for merging once my comments are addressed.

I'm using a smaller number of examples from the digits dataset because
that cuts down test execution time for me from 17s to 4s and still
yields very similar results.
It appears that the sorted() call is unnecessary here.
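
For illustration only (assuming the call in question wraps the label values returned by np.unique, which is not spelled out in this commit message), np.unique already returns its result in sorted order, so an extra sorted() around it would be redundant:

import numpy as np

y = np.array([2, 0, 1, 2, 0, 1, 1])
labels = np.unique(y)               # array([0, 1, 2]), already in sorted order
assert list(labels) == sorted(labels)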
@ogrisel (Member) commented Sep 19, 2013

Thanks for addressing my comments @dnouri! This looks good to me, +1 for merge on my side. Any other reviews, @agramfort @GaelVaroquaux @larsmans @mblondel @glouppe @arjoly? (I intentionally didn't ping Andy, to let him focus on his thesis :)

@dnouri could you please add an entry to the doc/whats_new.rst file to document this change?

@agramfort (Member)

+1 for merge

@dnouri (Contributor, Author) commented Sep 20, 2013

Updated whats_new.rst. Thanks for reviewing!

@ogrisel (Member) commented Sep 20, 2013

Merged by rebase. Thanks @dnouri!

@ogrisel closed this Sep 20, 2013
@ogrisel reopened this Sep 20, 2013
@ogrisel (Member) commented Sep 20, 2013

Actually, I reverted the merge because it caused a test failure under Python 3 (which could probably have been fixed), but more importantly, the strategy used here is no longer real k-fold cross-validation, as some samples may never appear in the test set.

[1 4 6] [0 2 3 5]
[0 2 3 5] [1 4 6]
[2 4 5 6] [0 1 3]
[0 1 3 5] [2 4 6]
Inline review comment from a Member:

This is a regression: sample number #5 is never part of the test set. I will try to come up with another way to do this.
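
As an illustration of the invariant being described (a sketch only; the helper name is hypothetical and not part of the PR): in real k-fold cross-validation the test folds partition the samples, so every index should appear in exactly one test fold, which is the property reported as violated by the folds above.

import numpy as np

def assert_test_folds_partition_samples(cv, n_samples):
    # cv is any iterable of (train_indices, test_indices) pairs,
    # e.g. a StratifiedKFold instance.
    counts = np.zeros(n_samples, dtype=int)
    for train, test in cv:
        counts[test] += 1
    # In proper k-fold CV every sample lands in the test set exactly once.
    assert (counts == 1).all(), "some samples never (or more than once) appear in a test fold"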

@ogrisel (Member) commented Sep 20, 2013

I came up with a working implementation in #2463. I also reused the updated tests from this PR and added more checks. Please let's continue the review over there.

@ogrisel closed this Sep 20, 2013