LabelSegmentedKFold cross-validation iterator #4709

Closed

BigCrunsh

This PR implements a variant of KFold that ensures the train and test splits are disjoint with respect to a third-party label. As far as I can see, all other Leave*Out methods return either all combinations of the third-party labels or a sample of them. However, if every label should appear in a test set and the number of distinct labels is huge, this k-fold variant is more efficient.

Example: Assume that we have 1,000,000 observations from 10,000 different users and we would like to estimate the generalization performance for new users. The train and test splits should therefore be disjoint with respect to users, and each user should occur exactly once in a test set. "Leave one user out" would be intractable here, since it requires one split per user, i.e. 10,000 model fits.
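To make the idea concrete, here is a minimal sketch of such a splitter in plain Python. It is not the code from this PR: the function name `label_segmented_kfold`, its parameters, and the round-robin assignment of labels to folds are all assumptions chosen for illustration.

```python
import random


def label_segmented_kfold(labels, n_folds=3, seed=0):
    """Yield (train_indices, test_indices) pairs for a label-disjoint k-fold.

    Hypothetical helper for illustration only. Whole labels (e.g. users) are
    assigned to folds, so no label ever appears in both train and test.
    """
    unique_labels = sorted(set(labels))
    if n_folds > len(unique_labels):
        raise ValueError("n_folds cannot exceed the number of distinct labels")

    # Shuffle the distinct labels reproducibly, then deal them out round-robin.
    rng = random.Random(seed)
    rng.shuffle(unique_labels)
    fold_of = {label: i % n_folds for i, label in enumerate(unique_labels)}

    for k in range(n_folds):
        test = [i for i, label in enumerate(labels) if fold_of[label] == k]
        train = [i for i, label in enumerate(labels) if fold_of[label] != k]
        yield train, test


# Seven observations from four users, split into two folds.
user_ids = ["u1", "u1", "u2", "u3", "u3", "u3", "u4"]
for train, test in label_segmented_kfold(user_ids, n_folds=2):
    train_users = {user_ids[i] for i in train}
    test_users = {user_ids[i] for i in test}
    print(sorted(train_users), sorted(test_users))
```

With this scheme the number of model fits is `n_folds` rather than the number of distinct labels, and every observation lands in exactly one test set. The round-robin assignment balances the number of labels per fold, not the number of observations, so fold sizes can differ when users contribute unevenly.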

This is related to #4583.

This PR implements a variant of KFold that ensures that the train and
test split is disjoint w.r.t. a third-party label.
@BigCrunsh BigCrunsh force-pushed the label-segmented-kFold branch from 109bf38 to cf47467 Compare May 12, 2015 08:53
@BigCrunsh BigCrunsh changed the title [WIP] LabelSegmentedKFold LabelSegmentedKFold cross-validation iterator May 12, 2015
@amueller
Member

Is this different from #4444?

@BigCrunsh
Author

@amueller: oh, I haven't seen it. thx.

2 participants