FEA Group aware Time-based cross validation #16236
@getgaurav2, tests may be failing. Let us know when you want a review.
Linting is failing, actually.
Sure, @jnothman. Will do. Thank you.
@getgaurav2 - gentle reminder @ogrisel
@getgaurav2 are you still working on this PR?
@albertvillanova - yes. I would like to finish this, please. I have been busy at work lately. I will find some time this week to make some progress. Thanks.
@getgaurav2 Great! Because I do really need this feature in the next release ;)
Hi @getgaurav2, I tried pushing here but I got an error: remote: Permission to getgaurav2/scikit-learn.git denied to Pimpwhippa. My git remote -v output is correct, I think. Could you please check whether you allow edits to this PR? Thank you.
@getgaurav2, as you have seen, it is not a good idea to rebase onto master during a Pull Request. If you would like to sync branches, it is better to do a merge:
Anyway, it does not seem necessary to do it: see the GitHub message: This branch has no conflicts with the base branch.
Let's see if this helps. Please add test cases to test_split.py.
@albertvillanova - would you want to check the latest code base?
@getgaurav2 thank you. I just opened a PR.
Quick check: why is this file empty?
I would like to point out an issue in your implementation. I have created a new test case:
and this is what I get:
As you can see, there are gaps between the train and the test indices, i.e. they are not contiguous. I would expect the following result:
By the way, and concerning the gaps, if you look at the next v0.24 release version of
'In each split, test indices must be higher than before, and thus shuffling in cross validator is inappropriate'.
I mean, this is allowed.
@albertvillanova - Thanks for your review. I have incorporated your feedback in the latest commit. Can you please check again?
@getgaurav2, thanks for your contribution. Your implementation now gives the expected result in the case I pointed out above. However, when using the new parameters ( Maybe I would suggest forgetting about these parameters for the moment, so that this PR can be finished. You could later open a new PR to implement those parameters. What do you think? The unexpected behavior for the parameter
As you can see, there is no gap between training and test sets. The unexpected behavior for the parameter
In this case, I think the result should be:
Maybe I would also add the test case I pointed out above:
@albertvillanova - Thanks for your feedback. Can you please review the latest commit?
@getgaurav2, as you have already done an amazing amount of work here (thank you), my suggestion would be to leave for another PR the implementation of the parameters Just a minor addition: could you please add And please, add a whatsnew entry to the file
@albertvillanova - Thank you. Can you please check now?
@getgaurav2 thanks. I think your PR is ready for a review round. Could you please change its name from WIP to MRG?
@glemaitre this seems like a good candidate for a "time series related" "project board".
Hi, can I check what the status of this PR is? Would I be correct that this PR would also close issue #6322 and related old PR #6351 's Also I feel like a gap functionality that is similar to timegapsplit #13761 and the current
@lu0x1a0 this is up for grabs. Now that we have metadata routing, this can nicely move forward if you're up for it.
@adrinjalali, @lu0x1a0 - I would love to continue working on it and see this one through to completion, if that's okay.
@getgaurav2 that'd be fantastic!
…nto issue_14257
@adrinjalali - would you please be able to check the issue in the doc build and give some direction? Thank you!
Something has gone wrong here, probably in your merge with
Reference Issues/PRs
Fixes #14257
What does this implement/fix? Explain your changes.
1. Assumes that groups are contiguous.
2. Split the groups into train and test indices using TimeSeriesSplit.
3. Use these group indices to get the indices for the original data.
4. Use the max_train_size parameter to trim the train_array for each iteration of split.
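The four steps above can be illustrated with a minimal sketch. This is not the code from the PR: the function name group_time_series_split is hypothetical, and the sketch assumes that groups are contiguous and already ordered in time, and that max_train_size counts samples (keeping the most recent ones) rather than groups.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit


def group_time_series_split(groups, n_splits=3, max_train_size=None):
    """Yield (train, test) sample indices without splitting a test group.

    Hypothetical helper for illustration only, not the PR's implementation.
    Assumes `groups` is contiguous and ordered in time.
    """
    groups = np.asarray(groups)

    # Step 1: unique groups in order of first appearance. np.unique sorts
    # by value, so reorder using the index of each group's first sample.
    _, first_idx = np.unique(groups, return_index=True)
    ordered_groups = groups[np.sort(first_idx)]

    # Step 2: split the groups (not the samples) with TimeSeriesSplit.
    tscv = TimeSeriesSplit(n_splits=n_splits)
    for train_g, test_g in tscv.split(ordered_groups):
        # Step 3: map group indices back to indices in the original data.
        train = np.flatnonzero(np.isin(groups, ordered_groups[train_g]))
        test = np.flatnonzero(np.isin(groups, ordered_groups[test_g]))

        # Step 4 (assumption): trim by sample count, keeping the most
        # recent training samples. This may cut the oldest kept group.
        if max_train_size is not None and len(train) > max_train_size:
            train = train[-max_train_size:]
        yield train, test


groups = ["a", "a", "b", "b", "b", "c", "d", "d"]
for train, test in group_time_series_split(groups, n_splits=3):
    print(train, test)
```

Because the split is made over groups, every test fold contains whole groups and always comes after its training fold, with no gap between them. Trimming by samples is only one possible reading of step 4; trimming by whole groups would be an equally plausible design.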
Any other comments?
Question: