market_forecasting

A baseline project for tick-level time-series forecasting. The target y is modeled with a dual-head LSTM: one head predicts the probability that y is non-zero, and the other regresses the magnitude conditioned on y being non-zero. The repository also includes a complete pipeline from raw data to training.

Project Structure

config.yaml: single source of truth for paths and training hyperparameters
scripts/: runnable pipeline scripts (steps 0 through 4)
src/: model, dataset, losses, and evaluation utilities
raw_data/: raw data directory (ignored by default)
data/: parquet artifacts (ignored by default)
models/: trained checkpoints (ignored by default)
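
This README refers to settings in the form config.yaml -> section.key. The sketch below shows one way to resolve such a dotted key once the file has been parsed into a nested mapping (for example with PyYAML's yaml.safe_load). The helper name get_setting and the example values are hypothetical; only the key names paths.raw_root and merge_raw.factor_count come from this README.

```python
from collections.abc import Mapping


def get_setting(config, dotted_key):
    """Resolve a key such as 'paths.raw_root' in a nested mapping."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            raise KeyError(f"missing config key: {dotted_key!r} (at {part!r})")
        node = node[part]
    return node


# Hypothetical parsed config: the values are placeholders, not the
# project's real defaults.
example_config = {
    "paths": {"raw_root": "raw_data"},
    "merge_raw": {"factor_count": 10},
}

raw_root = get_setting(example_config, "paths.raw_root")
factor_count = get_setting(example_config, "merge_raw.factor_count")
```

Raising KeyError with the full dotted path makes a misspelled setting easier to spot than a bare lookup failure deep inside a script.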

Setup

Recommended (virtualenv):

python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install -r requirements.txt

Raw Data Format

The default input path is set by config.yaml -> paths.raw_root.

scripts/0_merge_raw_data.py expects one folder per day, each named with a number. Each day folder contains:

times.csv: timestamps
factor_values.csv: factor features (column count set by merge_raw.factor_count; column names use the prefix f by default)
y_values.csv: target y
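
To make the layout concrete, here is a minimal sketch of reading one day folder with the standard library. It is not the code in scripts/0_merge_raw_data.py. It assumes the three files have no header row, hold one row per tick, and are aligned by row order; the function names and the 0-based column naming (f0, f1, ...) are also assumptions.

```python
import csv
from pathlib import Path


def list_day_folders(raw_root):
    """Return sub-folders with numeric names, sorted by numeric value."""
    days = [p for p in Path(raw_root).iterdir() if p.is_dir() and p.name.isdigit()]
    return sorted(days, key=lambda p: int(p.name))


def _read_rows(path):
    # Assumption: plain CSV without a header; blank lines are skipped.
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle) if row]


def load_day(day_dir, factor_count, prefix="f"):
    """Join times, factors, and y for one day into a list of dict records."""
    day_dir = Path(day_dir)
    times = _read_rows(day_dir / "times.csv")
    factors = _read_rows(day_dir / "factor_values.csv")
    targets = _read_rows(day_dir / "y_values.csv")
    if not (len(times) == len(factors) == len(targets)):
        raise ValueError(f"row counts differ in {day_dir}")

    records = []
    for t_row, f_row, y_row in zip(times, factors, targets):
        if len(f_row) != factor_count:
            raise ValueError(f"expected {factor_count} factor columns in {day_dir}")
        record = {"day": int(day_dir.name), "time": t_row[0], "y": float(y_row[0])}
        for i, value in enumerate(f_row):
            record[f"{prefix}{i}"] = float(value)  # hypothetical 0-based naming
        records.append(record)
    return records
```

Sorting by int(p.name) rather than by string keeps day 9 ahead of day 10.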

Run the Pipeline (Raw → Train)

From the repo root:

python scripts/0_merge_raw_data.py          # merge csv -> partitioned parquet
python scripts/1_cleaning.py                # clean (trading hours, outliers, ...)
python scripts/2_feature_engineering.py     # add session_id (FE parquet)
python scripts/3_model_input_prep.py        # train/val/test split + y_reg
python scripts/4_lstm_dualhead_training.py  # train dual-head LSTM (optional Optuna)
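
The five steps above can also be chained from a small driver, sketched below. This wrapper is not part of the repository. The names PIPELINE_STEPS, build_commands, and run_pipeline are hypothetical, and the sketch assumes each script runs without extra arguments and exits non-zero on failure.

```python
import subprocess
import sys

# Script paths as listed in this README, in execution order.
PIPELINE_STEPS = [
    "scripts/0_merge_raw_data.py",
    "scripts/1_cleaning.py",
    "scripts/2_feature_engineering.py",
    "scripts/3_model_input_prep.py",
    "scripts/4_lstm_dualhead_training.py",
]


def build_commands(steps=PIPELINE_STEPS, start=0, python=sys.executable):
    """Return one argv list per step, beginning at index `start`."""
    return [[python, script] for script in steps[start:]]


def run_pipeline(start=0):
    """Run the steps in order and stop at the first failure."""
    for command in build_commands(start=start):
        print("running:", " ".join(command))
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(command, check=True)
```

Calling run_pipeline() from the repo root runs everything. run_pipeline(start=3) resumes from the split step, which is only valid if the outputs of the earlier steps are already on disk.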

You can also train the non-zero classifier only:

python scripts/4_lstm_nonzero_training.py

Outputs are written to:

data/ (see config.yaml -> paths.*)
models/ (checkpoint name: config.yaml -> training.checkpoint.*)

Model (Brief)

src/model.py defines LSTMDualHead:

Head A: probability of y != 0 (binary classification)
Head B: magnitude conditioned on non-zero (regression on y_reg = sign(y) * log1p(|y|))
Common final prediction: y_pred = p_nonzero * y_hat_nonzero
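
The target transform and the combination rule can be sketched in plain Python. The forward transform is the formula given for Head B. The inverse, and the choice to invert Head B's output back to the original scale before multiplying by p_nonzero, are assumptions about how y_hat_nonzero is obtained; src/ may do this differently. All function names are hypothetical.

```python
import math


def to_y_reg(y):
    """Signed log transform: y_reg = sign(y) * log1p(|y|)."""
    return math.copysign(math.log1p(abs(y)), y)


def from_y_reg(y_reg):
    """Inverse of to_y_reg: y = sign(y_reg) * expm1(|y_reg|)."""
    return math.copysign(math.expm1(abs(y_reg)), y_reg)


def combine_heads(p_nonzero, y_reg_hat):
    """y_pred = p_nonzero * y_hat_nonzero.

    Assumption: Head B outputs a value on the y_reg scale, so it is mapped
    back to the original scale before weighting by Head A's probability.
    """
    y_hat_nonzero = from_y_reg(y_reg_hat)
    return p_nonzero * y_hat_nonzero


# Example: Head A is 50% confident y is non-zero, Head B predicts y = 4.
y_pred = combine_heads(0.5, to_y_reg(4.0))
```

The transform compresses large magnitudes while keeping the sign and mapping 0 to 0, so a heavy-tailed target becomes easier to regress.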

Disclaimer

For research/learning and engineering demonstration only. Not financial advice.
