Description
Describe the workflow you want to enable
The current TimeSeriesSplit in scikit-learn yields one contiguous test block per split and has no notion of a prediction horizon, which limits its use for scenarios requiring forecasts at several specific future steps (e.g., predicting 1, 3, and 5 days ahead). I propose adding a new class, MultiHorizonTimeSeriesSplit, to enable cross-validation with multiple prediction horizons in a single split.
This would allow users to:
- Specify a list of horizons (e.g., [1, 3, 5]) to generate train-test splits where the test set includes indices for multiple future steps.
- Evaluate time series models for short-, medium-, and long-term forecasts simultaneously.
- Simplify workflows for applications like demand forecasting, financial modeling, or weather prediction, avoiding manual splitting.
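For reference, the manual splitting mentioned above can be approximated today with the existing TimeSeriesSplit. This is only a minimal sketch under one assumed interpretation of "horizon": each test index is the last training index plus a horizon, and max(horizons) samples are reserved after each training window via test_size. The variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20, 30)        # ten daily temperatures (in °C)
horizons = np.array([1, 2])  # assumed meaning: steps ahead of the last training sample

# Reserve max(horizons) samples after each training window.
tscv = TimeSeriesSplit(n_splits=2, test_size=int(horizons.max()))

manual_splits = []
for train_idx, _ in tscv.split(X):
    # Discard the contiguous test block and offset each horizon
    # from the last training index instead.
    test_idx = train_idx[-1] + horizons
    manual_splits.append((train_idx, test_idx))
    print(f"Train indices: {train_idx}, Test indices: {test_idx}")
```

With these settings the training windows end at indices 5 and 7, because TimeSeriesSplit anchors its test blocks to the end of the series. The boilerplate inside the loop is what the proposed class would hide.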
Example usage with daily temperatures:
from sklearn.model_selection import MultiHorizonTimeSeriesSplit
import numpy as np
# Daily temperatures for 10 days (in °C)
X = np.array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
cv = MultiHorizonTimeSeriesSplit(n_splits=2, horizons=[1, 2])
for train_idx, test_idx in cv.split(X):
    print(f"Train indices: {train_idx}, Test indices: {test_idx}")
Expected output:
Train indices: [0 1 2 3 4], Test indices: [5 6]
Train indices: [0 1 2 3 4 5 6], Test indices: [7 8]
Describe your proposed solution
I propose implementing a new class, MultiHorizonTimeSeriesSplit, inheriting from TimeSeriesSplit. The class will:
- Add a horizons parameter (list of integers) to specify prediction steps.
- Modify the split method to generate test indices for each horizon while preserving temporal order.
- Include input validation to ensure valid horizons and splits.
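The three points above could be sketched roughly as follows. This is not an existing scikit-learn class and not a final design. It assumes that each test index equals the last training index plus a horizon, and that window placement is delegated to a plain TimeSeriesSplit with test_size set to the largest horizon. The gap parameter is left out to keep the sketch short.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit


class MultiHorizonTimeSeriesSplit(TimeSeriesSplit):
    """Hypothetical sketch: one test index per horizon in every split."""

    def __init__(self, n_splits=5, *, horizons=(1,), max_train_size=None):
        super().__init__(n_splits=n_splits, max_train_size=max_train_size)
        self.horizons = horizons

    def split(self, X, y=None, groups=None):
        # Validation runs lazily, when the generator is first iterated.
        horizons = np.asarray(self.horizons)
        if horizons.ndim != 1 or horizons.size == 0:
            raise ValueError("horizons must be a non-empty 1D sequence.")
        if not np.issubdtype(horizons.dtype, np.integer) or horizons.min() < 1:
            raise ValueError("horizons must contain integers >= 1.")
        horizons = np.unique(horizons)  # sorted, so temporal order is kept

        # Reserve max(horizons) samples after each training window; the
        # base splitter raises ValueError if the series is too short.
        base = TimeSeriesSplit(
            n_splits=self.n_splits,
            max_train_size=self.max_train_size,
            test_size=int(horizons[-1]),
        )
        for train_idx, _ in base.split(X, y, groups):
            yield train_idx, train_idx[-1] + horizons


cv = MultiHorizonTimeSeriesSplit(n_splits=2, horizons=[1, 3, 5])
for train_idx, test_idx in cv.split(np.arange(12)):
    print(f"Train indices: {train_idx}, Test indices: {test_idx}")
```

Under these assumptions the test indices are non-contiguous when the horizons are, and the splits are anchored to the end of the series, so the exact indices depend on how the final implementation places its windows.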
To ensure the correctness of MultiHorizonTimeSeriesSplit, we will develop unit tests covering various configurations and edge cases. For benchmarking, we will assess the computational efficiency and correctness of the new class compared to manual splitting. We will use synthetic time series to evaluate scalability and measure split generation time and memory usage, running tests on a personal laptop.
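As a rough illustration of the benchmarking plan, the sketch below times split generation and records peak traced memory on synthetic random walks. The helper names (manual_multi_horizon_split, benchmark_splitter) are hypothetical, and the manual baseline assumes that each test index is the last training index plus a horizon. The proposed class would be measured the same way once it exists.

```python
import time
import tracemalloc

import numpy as np
from sklearn.model_selection import TimeSeriesSplit


def manual_multi_horizon_split(X, n_splits, horizons):
    """Manual baseline: offset each horizon from the last training index."""
    horizons = np.asarray(horizons)
    base = TimeSeriesSplit(n_splits=n_splits, test_size=int(horizons.max()))
    for train_idx, _ in base.split(X):
        yield train_idx, train_idx[-1] + horizons


def benchmark_splitter(make_splits, n_samples, seed=0):
    """Return (number of splits, elapsed seconds, peak traced bytes)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=n_samples).cumsum()  # synthetic random walk

    tracemalloc.start()
    start = time.perf_counter()
    n_generated = sum(1 for _ in make_splits(X))  # consume the generator
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return n_generated, elapsed, peak


for n in (1_000, 10_000, 100_000):
    count, seconds, peak = benchmark_splitter(
        lambda X: manual_multi_horizon_split(X, n_splits=5, horizons=[1, 3, 5]),
        n,
    )
    print(f"n={n:>7}: {count} splits, {seconds * 1e3:.2f} ms, peak {peak / 1e6:.2f} MB")
```

Timings from a single run on a laptop are noisy, so repeating each measurement and reporting the median would likely give more stable numbers.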
Describe alternatives you've considered, if relevant
No response
Additional context
No response