Closed
Description
To my knowledge, sklearn does not currently support rigorous cross-validation of time-dependent problems. All out-of-the-box cross-validation routines will construct training folds that include future information relative to test folds.
One approach is to implement walk-forward/backward cross-validation, which constrains fold selection so that CV training folds come from periods prior to the CV test folds, i.e., the test folds are out-of-sample. Common choices include trailing/rolling windows (train on blocks k-2, k-1; test on block k) and cumulative windows (train on blocks 1, 2, ..., k-1; test on block k).
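To make the two window schemes concrete, here is a minimal sketch in plain Python. The function name `walk_forward_blocks` and its `window` parameter are made up for illustration and are not part of sklearn. Blocks are numbered 1..n in time order, and the trailing variant skips early folds that cannot fill a full window (one possible convention, not the only one).

```python
def walk_forward_blocks(n_blocks, window=None):
    """Yield (train_blocks, test_block) over time-ordered blocks 1..n_blocks.

    window=None gives cumulative windows (1, ..., k-1; k).
    window=w gives trailing windows (k-w, ..., k-1; k).
    Hypothetical helper, for illustration only.
    """
    # First test block: 2 for cumulative, window + 1 for trailing,
    # so that every trailing training window is full-sized.
    first_test = 2 if window is None else window + 1
    for k in range(first_test, n_blocks + 1):
        start = 1 if window is None else k - window
        yield list(range(start, k)), k


trailing = list(walk_forward_blocks(5, window=2))
cumulative = list(walk_forward_blocks(5))

for train, test in trailing:
    print("trailing   train", train, "test", test)
for train, test in cumulative:
    print("cumulative train", train, "test", test)
```

In both schemes every training block precedes the test block, so no future information leaks into training.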
References:
- https://riskcalc.moodysrms.com/us/research/crm/validation_tech_report_020305.pdf, pp. 13-15
- http://blog.quantopian.com/parameter-optimization/
If we are OK including this in sklearn.cross_validation, my plan was to add:
sklearn.cross_validation.TrailingWalkForward(ordered_index=...)
sklearn.cross_validation.CumulativeWalkForward(ordered_index=...)
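A rough sketch of what these two iterators might look like, written in plain Python so it runs without sklearn. Only the class names and the `ordered_index` argument come from the proposal; the `n_blocks` and `window` parameters, the equal-size blocking, and the choice to yield `(train, test)` lists of sample positions are assumptions. Ties in `ordered_index` are not treated specially here.

```python
class CumulativeWalkForward:
    """Hypothetical sketch: train on all blocks before the test block."""

    window = None  # None means "use every earlier block"

    def __init__(self, ordered_index, n_blocks=4):
        # Sample positions sorted by their time key (ties keep input order).
        self.order = sorted(range(len(ordered_index)),
                            key=lambda i: ordered_index[i])
        self.n_blocks = n_blocks  # assumed parameter, not in the proposal

    def __iter__(self):
        n = len(self.order)
        # Boundaries of n_blocks contiguous, near-equal blocks in time order.
        bounds = [n * b // self.n_blocks for b in range(self.n_blocks + 1)]
        first = 1 if self.window is None else self.window
        for k in range(first, self.n_blocks):
            lo = 0 if self.window is None else k - self.window
            train = self.order[bounds[lo]:bounds[k]]
            test = self.order[bounds[k]:bounds[k + 1]]
            yield train, test


class TrailingWalkForward(CumulativeWalkForward):
    """Hypothetical sketch: train only on the `window` blocks before the test block."""

    def __init__(self, ordered_index, n_blocks=4, window=2):
        super().__init__(ordered_index, n_blocks)
        self.window = window  # assumed parameter, not in the proposal


# Unsorted time keys; positions are indices into this list.
times = [30, 10, 20, 40, 60, 50, 80, 70]
cumulative_folds = list(CumulativeWalkForward(ordered_index=times))
trailing_folds = list(TrailingWalkForward(ordered_index=times))
```

Yielding `(train, test)` index pairs mirrors how the existing cross-validation iterators are consumed, which should keep the door open for passing such an object wherever a `cv` iterable is accepted.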
We'd also need to discuss if/how to integrate this with grid_search.