A baseline project for tick-level time-series forecasting. The target y is modeled with a dual-head LSTM: one head predicts the probability that y is non-zero, and the other regresses the magnitude conditioned on y being non-zero. The project also includes a complete pipeline from raw data to training.
- config.yaml: single source of truth for paths and training hyperparameters
- scripts/: runnable pipeline scripts (0→4)
- src/: model, dataset, losses, and evaluation utilities
- raw_data/: raw data directory (ignored by default)
- data/: parquet artifacts (ignored by default)
- models/: trained checkpoints (ignored by default)
Recommended (virtualenv):
python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip

Default input path is config.yaml -> paths.raw_root.
scripts/0_merge_raw_data.py expects a per-day folder layout (folder name is a number). Each day folder contains:
- times.csv: timestamps
- factor_values.csv: factor features (column count controlled by merge_raw.factor_count, default prefix f)
- y_values.csv: target y
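The layout described above can be checked before running the merge step. The following is a minimal sketch, not the logic of scripts/0_merge_raw_data.py: the helper name `find_day_folders` is hypothetical, and it assumes only what is stated here, namely that day folders have numeric names and contain the three CSV files.

```python
from pathlib import Path

# File names taken from the per-day layout described above.
EXPECTED_FILES = ("times.csv", "factor_values.csv", "y_values.csv")


def find_day_folders(raw_root):
    """Return numeric-named day folders under raw_root that hold all expected files.

    Hypothetical helper for illustration; the real merge script may behave differently.
    """
    root = Path(raw_root)
    days = []
    for child in root.iterdir():
        # Day folders are named with a number; skip anything else.
        if not (child.is_dir() and child.name.isdecimal()):
            continue
        # Keep only folders that contain every expected CSV file.
        if all((child / name).is_file() for name in EXPECTED_FILES):
            days.append(child)
    # Sort numerically so that folder "10" comes after folder "9".
    return sorted(days, key=lambda p: int(p.name))
```

Calling `find_day_folders("raw_data")` from the repo root would list the folders that look ready to merge, assuming paths.raw_root points at raw_data/.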
From the repo root:
python scripts/0_merge_raw_data.py # merge csv -> partitioned parquet
python scripts/1_cleaning.py # clean (trading hours, outliers, ...)
python scripts/2_feature_engineering.py # add session_id (FE parquet)
python scripts/3_model_input_prep.py # train/val/test split + y_reg
python scripts/4_lstm_dualhead_training.py # train dual-head LSTM (optional Optuna)

You can also train the non-zero classifier only:
python scripts/4_lstm_nonzero_training.py

Outputs are written to:
- data/ (see config.yaml -> paths.*)
- models/ (checkpoint name: config.yaml -> training.checkpoint.*)
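To run the five pipeline steps in order and stop at the first failure, a small wrapper can help. This is a sketch under stated assumptions: the script paths come from the commands listed in this README, while `build_commands` and `run_pipeline` are hypothetical names that are not part of the repo.

```python
import subprocess
import sys

# Script paths as listed in this README, in pipeline order (0 -> 4).
PIPELINE_STEPS = [
    "scripts/0_merge_raw_data.py",
    "scripts/1_cleaning.py",
    "scripts/2_feature_engineering.py",
    "scripts/3_model_input_prep.py",
    "scripts/4_lstm_dualhead_training.py",
]


def build_commands(steps=PIPELINE_STEPS, python=sys.executable):
    """Build one argv list per step, using the current interpreter by default."""
    return [[python, step] for step in steps]


def run_pipeline(steps=PIPELINE_STEPS):
    """Run each step from the repo root; raise on the first non-zero exit code."""
    for cmd in build_commands(steps):
        print("running:", " ".join(cmd))
        # check=True raises subprocess.CalledProcessError if the script fails.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` from the repo root, inside the activated virtualenv, runs the same commands as above, one after another.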
src/model.py defines LSTMDualHead:
- Head A: probability of y != 0 (binary classification)
- Head B: magnitude conditioned on non-zero (regression on y_reg = sign(y) * log1p(|y|))
- Common final prediction: y_pred = p_nonzero * y_hat_nonzero
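The target transform and the final combination above can be illustrated in plain Python. This is a sketch, not the code in src/: the function names are hypothetical, and it assumes that Head B's output is mapped back to the original scale (through the inverse transform sign(z) * expm1(|z|)) before being multiplied by p_nonzero.

```python
import math


def to_y_reg(y):
    """Signed log transform: y_reg = sign(y) * log1p(|y|)."""
    return math.copysign(math.log1p(abs(y)), y)


def from_y_reg(y_reg):
    """Inverse of to_y_reg: y = sign(y_reg) * expm1(|y_reg|)."""
    return math.copysign(math.expm1(abs(y_reg)), y_reg)


def combine(p_nonzero, y_hat_nonzero):
    """Final prediction: y_pred = p_nonzero * y_hat_nonzero."""
    return p_nonzero * y_hat_nonzero


# Example: Head A gives p_nonzero, Head B gives a prediction in y_reg space.
p_nonzero = 0.25
y_reg_hat = to_y_reg(4.0)              # stand-in for a Head B output
y_hat_nonzero = from_y_reg(y_reg_hat)  # assumed: back to the original scale
y_pred = combine(p_nonzero, y_hat_nonzero)
```

The signed log compresses large magnitudes while preserving sign, and multiplying by p_nonzero shrinks the prediction toward zero when a non-zero tick is unlikely.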
For research/learning and engineering demonstration only. Not financial advice.