LabelSegmentedKFold cross-validation iterator #4709

BigCrunsh · 2015-05-12T08:06:46Z

This PR implements a variant of KFold that ensures the train and test splits are disjoint w.r.t. a third-party label. As far as I can see, all other Leave*Out methods return either all combinations of the third-party labels or a sample of them. However, if every label should appear in a test set and the number of distinct labels is huge, this k-fold variant is more efficient.

Example: Assume we have 1,000,000 observations from 10,000 different users and want to estimate the generalization performance for new users. The train and test splits should then be disjoint w.r.t. users, and each user should occur in exactly one test set. "Leave one user out" would be intractable here, since it needs one fit per user.
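To make the idea concrete, here is a minimal sketch in plain Python. It is not the code in this PR: the function name `label_disjoint_kfold`, its parameters, and the greedy balancing strategy are assumptions chosen for illustration. Whole label groups are assigned to folds, largest group first, so that no label ends up on both sides of a split.

```python
from collections import Counter


def label_disjoint_kfold(labels, n_folds=3):
    """Yield (train_idx, test_idx) pairs in which no label is shared.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    counts = Counter(labels)
    if n_folds > len(counts):
        raise ValueError("n_folds cannot exceed the number of distinct labels")

    # Greedy balancing: put the largest remaining label group into the
    # currently smallest fold, so fold sizes stay roughly equal.
    fold_sizes = [0] * n_folds
    fold_of_label = {}
    for label, size in counts.most_common():
        smallest = fold_sizes.index(min(fold_sizes))
        fold_of_label[label] = smallest
        fold_sizes[smallest] += size

    # Each fold serves as the test set once; everything else is training data.
    for fold in range(n_folds):
        test = [i for i, lab in enumerate(labels) if fold_of_label[lab] == fold]
        train = [i for i, lab in enumerate(labels) if fold_of_label[lab] != fold]
        yield train, test


# Toy data: nine observations from five users.
users = ["a", "a", "a", "b", "b", "c", "d", "d", "e"]
splits = list(label_disjoint_kfold(users, n_folds=3))
```

With 10,000 users and `n_folds=10`, a scheme like this would need 10 fits instead of the 10,000 required by leaving out one user at a time.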

This is related to #4583.

amueller · 2015-05-12T16:13:59Z

Is this different from #4444 ?

BigCrunsh · 2015-05-13T05:40:24Z

@amueller: oh, I haven't seen it. thx.

LabelSegmentedKFold cross-validation iterator

cf47467

This PR implements a variant of KFold that ensures that the train and test split is disjoint w.r.t. a third-party label.

BigCrunsh force-pushed the label-segmented-kFold branch from 109bf38 to cf47467 · May 12, 2015 08:53

BigCrunsh changed the title ~~[WIP] LabelSegmentedKFold~~ LabelSegmentedKFold cross-validation iterator May 12, 2015

BigCrunsh closed this May 13, 2015

BigCrunsh mentioned this pull request May 13, 2015

[MRG + 1] Added DisjointLabelKFold to perform K-Fold cv on sets with disjoint labels. #4444

Closed
