[WIP] RollingWindow cross-validation #3638

Closed
wants to merge 1 commit into from

@0x0L 0x0L commented Sep 4, 2014

A cross-validation strategy for timeseries, see http://robjhyndman.com/hyndsight/tscvexample

Initial commit, tests and unfinished docs

I don't really like the name of the class. Hopefully, someone will find a better name for this.

@coveralls

Coverage Status

Changes Unknown when pulling c806ebf on x0l:cv_timeseries into * on scikit-learn:master*.

@coveralls

Coverage Status

Changes Unknown when pulling 348f114 on x0l:cv_timeseries into * on scikit-learn:master*.

@0x0L changed the title from "WIP RollingWindow cross-validation" to "[WIP] RollingWindow cross-validation" on Sep 4, 2014

@coveralls

Coverage Status

Changes Unknown when pulling c056e12 on x0l:cv_timeseries into * on scikit-learn:master*.

@coveralls

Coverage Status

Changes Unknown when pulling 698d723 on x0l:cv_timeseries into * on scikit-learn:master*.

@coveralls

Coverage Status

Changes Unknown when pulling 8854b66 on x0l:cv_timeseries into * on scikit-learn:master*.

@amueller
Member

Sorry for the lack of feedback. A lot of the devs are very busy at the moment.
We don't really have any time-series specific algorithms, so this might not be a great fit.

@mjbommar
Contributor

See also discussion here: #3202

@elgehelge

I think this has been rejected for the wrong reasons.

A sequence in which order matters should not be confused with a time series. Often you have a sequence of data points whose ordering matters without knowing anything about absolute time, only relative time.

Say you want to predict future data points from all previous data points. To validate this correctly, you have to choose a split that preserves order: you train on the first part and test on the last part, which you pretend not to have seen yet. But you might want to compute the score on more than a single split.
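
To make the idea concrete, here is a minimal pure-Python sketch of order-preserving splits with a growing training window. The helper name `order_preserving_splits` and its parameters `min_train` and `test_size` are hypothetical and chosen for illustration only; this is not the API proposed in this PR.

```python
def order_preserving_splits(n_samples, min_train, test_size):
    """Yield (train, test) index lists; every train index precedes every test index.

    Hypothetical helper for illustration, not the class from this PR.
    """
    if min_train < 1 or test_size < 1:
        raise ValueError("min_train and test_size must be positive")
    start = min_train
    while start + test_size <= n_samples:
        train = list(range(0, start))                  # everything seen so far
        test = list(range(start, start + test_size))   # the "unseen" future block
        yield train, test
        start += test_size                             # move the split point forward


splits = list(order_preserving_splits(n_samples=7, min_train=3, test_size=2))
for train, test in splits:
    print("train:", train, "test:", test)
```

Each split scores the model on a block it has not trained on, and averaging over the splits gives a score based on more than one train/test boundary.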

Anyways, thanks for contributing @0x0L. This class is super helpful!

@jnothman
Member

jnothman commented Sep 1, 2016

Interesting. This wasn't rejected at all, though @amueller made a comment that he clearly went back on in #6322. This PR was ignored for no good reason, and the contributor closed it. A TimeSeriesSplit has recently been merged in #6586, but I admit that this implementation has some enviable features. I think we (@yenchenlin?) should look at porting some enhancements from here. And I like the name RollingWindow, which may also be a term in the literature.
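
For readers following along, a minimal usage sketch of the merged splitter, assuming a scikit-learn version that ships `sklearn.model_selection.TimeSeriesSplit` (0.18 or later). The toy data is made up for illustration.

```python
from sklearn.model_selection import TimeSeriesSplit

# Toy data: six ordered samples with one feature each (illustrative only).
X = [[i] for i in range(6)]

tscv = TimeSeriesSplit(n_splits=3)
splits = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]

for train, test in splits:
    # Training indices always come before the test indices.
    print("train:", train, "test:", test)
```

The training set grows with each split, so this shows the expanding-window behaviour only, not the extra features of the implementation in this PR.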

@mjbommar
Contributor

mjbommar commented Sep 2, 2016

This is related to walk-forward optimization/cross-validation, which I had proposed building and Gael had rejected in this issue:
#3202

I spoke privately with @MechCoder on the topic recently but don't recall if we landed on anything specific.

@0x0L
Author

0x0L commented Sep 2, 2016

@elgehelge Thanks, that's nice

As I recall, I closed the PR because I thought I would package it and a few other contribs (notably RVMs) in a separate repo, but I never found the time to do so :)

@jnothman
Member

jnothman commented Sep 3, 2016

Ah well, sorry to all whose work was not deemed appropriate at the time. I think there's been a recent move to acknowledge the need for CV splitters that accommodate common kinds of non-IID data. If there is a better (i.e. familiar in its research community) name for TimeSeriesSplit, you have a couple of days to propose it! If there are features missing, let's work on it.

@amueller
Member

amueller commented Sep 6, 2016

I think adding splitters that encourage good practices is good. While we don't really support time-series specific models (and I'd like to keep it that way), I think we should acknowledge that people are using sklearn models for time-series data (a lot!) and we should make it easy for them to do The Right Thing (tm)

6 participants