Jackson00Han/market_forecasting

market_forecasting

A baseline project for tick-level time-series forecasting. The target y is modeled with a dual-head LSTM: one head predicts the probability that y is non-zero, and the other regresses the magnitude of y conditioned on it being non-zero. The repository also provides a complete pipeline from raw data to training.

Project Structure

  • config.yaml: single source of truth for paths and training hyperparams
  • scripts/: runnable pipeline scripts (0→4)
  • src/: model, dataset, losses, and evaluation utilities
  • raw_data/: raw data directory (ignored by default)
  • data/: parquet artifacts (ignored by default)
  • models/: trained checkpoints (ignored by default)

Setup

Recommended (virtualenv):

python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip

Raw Data Format

The default input path is set by config.yaml -> paths.raw_root.

scripts/0_merge_raw_data.py expects one folder per day, each named with a number. Each day folder contains:

  • times.csv: timestamps
  • factor_values.csv: factor features (column count controlled by merge_raw.factor_count, default prefix f)
  • y_values.csv: target y
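
Before running the merge step, it can help to check that the raw directory matches the layout above. The following is a minimal sketch, not code from the repository: the helper name find_day_folders and the choice to skip incomplete folders are assumptions made for illustration. It uses only the folder and file names listed above.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = ("times.csv", "factor_values.csv", "y_values.csv")


def find_day_folders(raw_root):
    """Return numerically named day folders that contain all expected files.

    Hypothetical helper for illustration; the real merge script may
    validate its input differently.
    """
    root = Path(raw_root)
    days = []
    for child in root.iterdir():
        # Keep only directories whose name is a number, e.g. "0" or "17".
        if not (child.is_dir() and child.name.isdecimal()):
            continue
        missing = [name for name in EXPECTED_FILES if not (child / name).is_file()]
        if missing:
            print(f"skipping {child.name}: missing {', '.join(missing)}")
            continue
        days.append(child)
    # Sort numerically so that day 10 comes after day 2.
    return sorted(days, key=lambda p: int(p.name))
```

Calling find_day_folders("raw_data") from the repo root would list the day folders that look complete, assuming paths.raw_root points at raw_data/.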

Run the Pipeline (Raw → Train)

From the repo root:

python scripts/0_merge_raw_data.py          # merge csv -> partitioned parquet
python scripts/1_cleaning.py                # clean (trading hours, outliers, ...)
python scripts/2_feature_engineering.py     # add session_id (FE parquet)
python scripts/3_model_input_prep.py        # train/val/test split + y_reg
python scripts/4_lstm_dualhead_training.py  # train dual-head LSTM (optional Optuna)
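
The five commands above can also be chained from Python so that a failing step stops the run. This is a rough sketch under stated assumptions: run_pipeline and its runner parameter are hypothetical names, not part of the repository, and the script list is simply copied from the commands above.

```python
import subprocess
import sys

# Script order copied from the commands above.
PIPELINE = [
    "scripts/0_merge_raw_data.py",
    "scripts/1_cleaning.py",
    "scripts/2_feature_engineering.py",
    "scripts/3_model_input_prep.py",
    "scripts/4_lstm_dualhead_training.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in order and stop at the first non-zero exit code.

    Hypothetical helper; `runner` is injectable so the loop can be
    exercised without launching real processes.
    """
    completed = []
    for script in scripts:
        # Use the current interpreter so the active virtualenv is respected.
        result = runner([sys.executable, script])
        if result.returncode != 0:
            raise RuntimeError(f"{script} failed with exit code {result.returncode}")
        completed.append(script)
    return completed
```

Calling run_pipeline() from the repo root runs all five steps. Passing a shorter list, such as PIPELINE[:3], stops after feature engineering.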

You can also train the non-zero classifier only:

python scripts/4_lstm_nonzero_training.py

Outputs are written to:

  • data/ (see config.yaml -> paths.*)
  • models/ (checkpoint name: config.yaml -> training.checkpoint.*)

Model (Brief)

src/model.py defines LSTMDualHead:

  • Head A: probability of y != 0 (binary classification)
  • Head B: magnitude conditioned on non-zero (regression on y_reg = sign(y) * log1p(|y|))
  • Common final prediction: y_pred = p_nonzero * y_hat_nonzero
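
The regression target and the combined prediction can be illustrated with scalar arithmetic. This is a minimal sketch, not the code in src/model.py: the function names are hypothetical, and the README does not say whether the product is taken on the y_reg scale or on the original scale of y, so the inverse transform is shown as a separate, optional step.

```python
import math


def to_y_reg(y):
    """Signed log transform from the README: y_reg = sign(y) * log1p(|y|)."""
    return math.copysign(math.log1p(abs(y)), y)


def from_y_reg(y_reg):
    """Inverse of to_y_reg: y = sign(y_reg) * expm1(|y_reg|)."""
    return math.copysign(math.expm1(abs(y_reg)), y_reg)


def combine(p_nonzero, y_hat_nonzero):
    """Combined prediction from the README: y_pred = p_nonzero * y_hat_nonzero."""
    return p_nonzero * y_hat_nonzero


# Example: Head A outputs 0.25, Head B outputs a value on the y_reg scale.
head_b_output = to_y_reg(-3.5)
# Assumption: map Head B back to the original scale before combining.
y_pred = combine(0.25, from_y_reg(head_b_output))
```

The transform compresses large magnitudes while keeping the sign, and it maps y = 0 to y_reg = 0, so zero and non-zero targets stay distinguishable after the transform.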

Disclaimer

For research/learning and engineering demonstration only. Not financial advice.
