[WIP] RollingWindow cross-validation #3638
Changes Unknown when pulling c806ebf on x0l:cv_timeseries into scikit-learn:master.
Changes Unknown when pulling 348f114 on x0l:cv_timeseries into scikit-learn:master.
Changes Unknown when pulling c056e12 on x0l:cv_timeseries into scikit-learn:master.
Changes Unknown when pulling 698d723 on x0l:cv_timeseries into scikit-learn:master.
Changes Unknown when pulling 8854b66 on x0l:cv_timeseries into scikit-learn:master.
A cross-validation strategy for time series, see http://robjhyndman.com/hyndsight/tscvexample
Initial commit, tests and unfinished docs.
Sorry for the lack of feedback. A lot of the devs are very busy at the moment.
See also discussion here: #3202
I think this has been rejected for the wrong reasons. A sequence where order matters should not be confused with a time series. Often you find yourself with a sequence of data points where the ordering matters but you know nothing about the absolute time, only the relative order. Let's say you want to predict future data points based on all previous data points. To validate this correctly, you have to choose a split that preserves order: you train on the first part and test on the last part, which you pretend not to have seen yet. But you might want to calculate the score on more than just a single split. Anyways, thanks for contributing @0x0L. This class is super helpful!
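To make the order-preserving idea above concrete, here is a minimal sketch in plain Python. The helper `ordered_splits` and its parameters (`n_splits`, `test_size`, `max_train_size`) are hypothetical names chosen for illustration; this is not the `RollingWindow` API from this PR. Each split trains only on indices that come before its test block, and several consecutive test blocks give more than one score.

```python
def ordered_splits(n_samples, n_splits, test_size, max_train_size=None):
    """Yield (train_indices, test_indices) pairs that preserve sample order.

    Hypothetical helper for illustration; not the API proposed in this PR.
    """
    # The test blocks occupy the last n_splits * test_size samples.
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train_start = 0
        if max_train_size is not None:
            # Rolling window: keep only the most recent training samples.
            train_start = max(0, test_start - max_train_size)
        train = list(range(train_start, test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


# Expanding training set: every split trains on everything seen so far.
for train, test in ordered_splits(n_samples=10, n_splits=3, test_size=2):
    print(train, test)
```

Passing `max_train_size` switches from an expanding training set to a fixed-length window that slides forward with each split. Which of the two is appropriate depends on whether older samples are still informative.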
Interesting. This wasn't rejected at all, though @amueller made a comment that he clearly went back on in #6322. This PR was ignored for no good reason, and the contributor closed it. A
This is related to walk-forward optimization/cross-validation, which I had proposed building and Gael had rejected in this issue: I spoke privately with @MechCoder on the topic recently but don't recall if we landed on anything specific.
@elgehelge Thanks, that's nice. As I recall, I closed the PR because I thought I would package it and a few other contribs (notably RVMs) in a separate repo, but I never found the time to do so :)
Ah well, sorry to all whose work was not deemed appropriate at the time. I think there's been a recent move to acknowledge the need for CV splitters that accommodate common kinds of non-IID data. If there is a better (i.e. familiar in its research community) name for
I think adding splitters that encourage good practices is good. While we don't really support time-series specific models (and I'd like to keep it that way), I think we should acknowledge that people are using sklearn models for time-series data (a lot!) and we should make it easy for them to do The Right Thing (tm).
I don't really like the name of the class. Hopefully, someone will find a better name for this.