This repository contains our course project for predicting public transport delays (in minutes) for a given stop and timestamp.
We build two models: a baseline OLS regression and an improved artificial neural network (ANN) that uses feature engineering.
We trained the models on data we collected ourselves via the VBB API. Data could also be collected through the GTFS standard, although it would presumably be less accurate.
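The OLS baseline picks the coefficients β that minimize the sum of squared residuals. This has the closed-form solution given by the normal equations (XᵀX)β = Xᵀy. The dependency-free sketch below only illustrates what such a baseline computes. It is not the project's training code, and the helper names `fit_ols` and `predict_ols` are hypothetical.

```python
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, b1, ..., bk] minimizing the sum of squared residuals."""
    # Prepend a column of ones so the first coefficient is the intercept.
    rows = [[1.0, *map(float, r)] for r in X]
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Solve the augmented system with Gauss-Jordan elimination and partial pivoting.
    a = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        if abs(a[pivot][col]) < 1e-12:  # absolute tolerance, fine for scaled features
            raise ValueError("X^T X is singular; features are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    # The coefficient part is now diagonal, so each beta_i is a single division.
    return [a[i][p] / a[i][i] for i in range(p)]


def predict_ols(beta: Sequence[float], x: Sequence[float]) -> float:
    """Predicted delay in minutes for one feature vector x."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
```

A real pipeline would normally use a library regression implementation instead. The point here is that the baseline is linear in the features, which is what the ANN improves on.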
- Predicts `delay_minutes` for each (`stop_id`, `timestamp`)
- Uses engineered features such as:
  - time features (hour/weekday + cyclical sin/cos)
  - rush-hour / weekend indicators
  - station one-hot encoding
  - lag features (previous delays per station)
  - station statistics (mean/std delay)
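The feature list above can be sketched in plain Python as follows. This is a minimal illustration under assumptions, not the project's pipeline. The rush-hour windows, the 0.0 fill value for missing lags, the column names, and the helper names `time_features` and `add_features` are all hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

# Assumed rush-hour windows: 07:00-09:59 and 16:00-18:59 on weekdays.
RUSH_HOURS = set(range(7, 10)) | set(range(16, 19))


def time_features(ts: datetime) -> dict:
    """Hour/weekday, their cyclical sin/cos encodings, and calendar flags."""
    hour, weekday = ts.hour, ts.weekday()  # weekday(): Monday=0 ... Sunday=6
    return {
        "hour": hour,
        "weekday": weekday,
        # Map periodic values onto a circle so that 23:00 and 00:00 end up close together.
        "hour_sin": math.sin(2 * math.pi * hour / 24),
        "hour_cos": math.cos(2 * math.pi * hour / 24),
        "weekday_sin": math.sin(2 * math.pi * weekday / 7),
        "weekday_cos": math.cos(2 * math.pi * weekday / 7),
        "is_rush_hour": int(weekday < 5 and hour in RUSH_HOURS),
        "is_weekend": int(weekday >= 5),
    }


def add_features(records: list, n_lags: int = 2) -> list:
    """Turn raw (stop_id, timestamp, delay_minutes) records into feature rows."""
    records = sorted(records, key=lambda r: (r["stop_id"], r["timestamp"]))
    stations = sorted({r["stop_id"] for r in records})

    # Per-station mean/std delay (population std).
    delays = defaultdict(list)
    for r in records:
        delays[r["stop_id"]].append(r["delay_minutes"])
    stats = {s: (mean(d), pstdev(d)) for s, d in delays.items()}

    history = defaultdict(list)  # delays already seen at each station, in time order
    rows = []
    for r in records:
        stop, past = r["stop_id"], history[r["stop_id"]]
        row = time_features(r["timestamp"])
        # One-hot encoding of the station.
        row.update({f"station_{s}": int(s == stop) for s in stations})
        # Lag features: previous delays at the same station (0.0 when missing, an assumption).
        for k in range(1, n_lags + 1):
            row[f"lag_{k}"] = past[-k] if len(past) >= k else 0.0
        row["station_mean_delay"], row["station_std_delay"] = stats[stop]
        row["delay_minutes"] = r["delay_minutes"]  # prediction target
        rows.append(row)
        past.append(r["delay_minutes"])
    return rows
```

Because the station statistics are derived from the target, they should be computed on the training split only and reused at inference time, just like the saved scalers and encoders. Otherwise the model sees information from the evaluation data.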
- `results/` – trained models, scalers/encoders, plots, and exported artifacts
- `scripts/` or `src/` – training, evaluation, and inference code (ANN + OLS)
- `docker/` (or `images/`) – Dockerfiles for Subgoal 6/7 images and compose setups
- ANN model: `ann_improved_model_new.keras`
- OLS model: `currentOlsSolution.pkl`
- Activation data: `activation_data.csv`
- Plots: training history, predicted-vs-actual, feature importance
We publish three images on Docker Hub and provide docker-compose files for running them:
- `knowledgeBase` – model artifacts under `/tmp/knowledgeBase/`
- `activationBase` – activation data under `/tmp/activationBase/`
- `codeBase` – applies the ANN or OLS model to the activation data and writes the prediction outputs into the shared `/tmp` volume
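The `codeBase` step can be pictured as a small script that reads from the two mounted directories and writes back into `/tmp`. The sketch below is hypothetical. Only the two model file names and the `/tmp/knowledgeBase/` and `/tmp/activationBase/` locations come from this README, and it assumes the files sit directly in those directories. The command-line flags, the output path `/tmp/codeBase/<model>_predictions.csv`, and the assumption that `activation_data.csv` already holds preprocessed numeric features next to `stop_id` and `timestamp` columns are illustrative only.

```python
import argparse
import csv
import pickle
from pathlib import Path

# File names as listed in this README.
MODEL_FILES = {"ann": "ann_improved_model_new.keras", "ols": "currentOlsSolution.pkl"}


def resolve_paths(model: str, tmp_root: Path = Path("/tmp")) -> dict:
    """Locate the model artifact, the activation data, and the output file in the shared volume."""
    if model not in MODEL_FILES:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODEL_FILES)}")
    return {
        "model": tmp_root / "knowledgeBase" / MODEL_FILES[model],
        "activation": tmp_root / "activationBase" / "activation_data.csv",
        # Hypothetical output location; the real image may use a different file name.
        "output": tmp_root / "codeBase" / f"{model}_predictions.csv",
    }


def load_model(model: str, path: Path):
    """Load the Keras ANN or the pickled OLS model."""
    if model == "ann":
        from tensorflow import keras  # imported lazily so OLS runs don't need TensorFlow
        return keras.models.load_model(str(path))
    with path.open("rb") as fh:
        return pickle.load(fh)


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Apply the ANN or OLS model to the activation data.")
    parser.add_argument("--model", choices=sorted(MODEL_FILES), default="ann")
    parser.add_argument("--tmp-root", type=Path, default=Path("/tmp"))
    args = parser.parse_args(argv)

    import numpy as np  # only needed when actually predicting

    paths = resolve_paths(args.model, args.tmp_root)
    model = load_model(args.model, paths["model"])
    with paths["activation"].open(newline="") as fh:
        rows = list(csv.DictReader(fh))

    # Assumption: every column except stop_id/timestamp is an already-preprocessed numeric feature.
    feature_cols = [c for c in rows[0] if c not in ("stop_id", "timestamp")]
    X = np.array([[float(r[c]) for c in feature_cols] for r in rows])
    # Assumption: both loaded objects expose .predict(X) returning one value per row.
    preds = np.ravel(model.predict(X))

    paths["output"].parent.mkdir(parents=True, exist_ok=True)
    with paths["output"].open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["stop_id", "timestamp", "predicted_delay_minutes"])
        for r, p in zip(rows, preds):
            writer.writerow([r.get("stop_id"), r.get("timestamp"), float(p)])
```

The container entrypoint would then call `main()`, for example `main(["--model", "ols"])` to use the OLS baseline instead of the ANN. Keeping the model choice as a flag lets one image serve both models against the same shared volume.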
License: AGPL-3.0