Stratified K-fold Cross Validation Python script #5129
emmabartholomeeusen
started this conversation in
General
Replies: 1 comment
There's a large discussion on why we do not support over/undersampling: #3269
I am using Orange to predict customer churn and compare different learners based on accuracy, F1, etc.
As my problem is unbalanced (10% churn, 90% no churn), I want to oversample. However, Orange does not allow oversampling within the cross-validation (the Test & Score widget).
Therefore, I want to first generate 10 stratified folds from my input data, so that the 10% churn / 90% no churn distribution is preserved in each fold. Then I want to oversample within each fold to get a 50/50 distribution, and add the fold number to each instance as a feature. Lastly, within the Test & Score widget, I would do cross-validation by feature, using the fold number. I think I have to implement this myself with a Python script. Is there anyone who could help me do this?
Thank you! Emma
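The steps described in the question can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions, not Orange functionality: the function name `stratified_oversampled_folds` and its parameters are hypothetical, it uses simple random oversampling (repeating minority rows with replacement), and it works on a list of class labels rather than an Orange table.

```python
import random
from collections import defaultdict


def stratified_oversampled_folds(labels, n_folds=10, seed=0):
    """Hypothetical helper: return (row_indices, fold_ids).

    row_indices lists the original rows to keep, with minority rows
    repeated; fold_ids gives the fold number of each listed row.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    # Stratified assignment: shuffle each class, then deal its rows
    # round-robin so every fold keeps roughly the original class ratio.
    folds = [defaultdict(list) for _ in range(n_folds)]
    for y, idx in by_class.items():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds][y].append(i)

    row_indices, fold_ids = [], []
    for f, per_class in enumerate(folds):
        if not per_class:  # fewer rows than folds: nothing to do
            continue
        # Oversample every class in this fold up to the largest class size.
        # A class that is absent from a fold cannot be oversampled.
        target = max(len(rows) for rows in per_class.values())
        for y, rows in per_class.items():
            extra = [rng.choice(rows) for _ in range(target - len(rows))]
            for i in rows + extra:
                row_indices.append(i)
                fold_ids.append(f)
    return row_indices, fold_ids


# Example with the 10% / 90% split from the question (1000 made-up rows).
labels = ["churn"] * 100 + ["stay"] * 900
rows, fold_ids = stratified_oversampled_folds(labels, n_folds=10, seed=42)
```

Because duplicates are drawn only from rows of the same fold, no original row appears in more than one fold. Note that this plan also oversamples the rows that serve as test data in each round, so scores are computed on a 50/50 distribution rather than the original 10/90. In a Python Script widget, `rows` could presumably be used to select instances from the input table and `fold_ids` attached as a categorical column for cross-validation by feature; that wiring is not shown here.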