This project is part of the course "Advanced AI-Based Application Systems", taught by Prof. Dr.-Ing. Marcus Grum at the University of Potsdam. The goal is to predict train delays in the Berlin and Brandenburg regions using AI models. We have implemented both an Artificial Neural Network (ANN) and an Ordinary Least Squares (OLS) regression model to tackle this problem.
├── code/ # Python scripts for data processing and modeling
│   ├── dataPreparation/ # Data cleaning and preprocessing scripts
│   │   ├── clean_train_data.py
│   │   ├── clean_weather_data.py
│   │   ├── fetch_trains.py # Fetches train data from Kaggle
│   │   ├── match_state_to_station.py
│   │   ├── merge_and_split_data.py
│   │   ├── prep_data.py
│   │   └── scrape_weather.py # Scrapes weather data from DWD (unused)
│   ├── model/ # Trained models and evaluation results
│   │   ├── ANN/ # ANN metrics and visualizations
│   │   └── OLS/ # OLS metrics and visualizations
│   ├── ANN.py # ANN model implementation
│   ├── applyANN.py # Script to apply trained ANN model
│   ├── applyOLS.py # Script to apply trained OLS model
│   ├── feature_selection.py # Feature selection and importance analysis
│   └── OLS_model.py # OLS model implementation
├── data/ # Dataset storage
│   ├── cleaned/ # Cleaned and structured datasets
│   └── scraped/ # Scraped data
│       ├── DBtrainrides.csv # Large file, not included in the Git repository; download it from the link provided below
│       ├── scraped_data.csv # Train data from Kaggle
│       └── scraped_weather.csv # Unused
├── images/ # Docker images for deployment
│   ├── activationBase_appointmentOrDisappointment/
│   │   ├── Dockerfile
│   │   └── README.md
│   ├── codeBase_appointmentOrDisappointment/
│   │   ├── Dockerfile
│   │   └── README.md
│   ├── knowledgeBase_appointmentOrDisappointment/
│   │   ├── Dockerfile
│   │   └── README.md
│   ├── learningBase_appointmentOrDisappointment/
│   │   ├── Dockerfile
│   │   └── README.md
│   └── docker-compose.yml # Compose file for training and evaluation
├── scenarios/ # Use case scenarios for model application
│   ├── ANN/
│   │   └── docker-compose.yml # Compose file for ANN application
│   └── OLS/
│       └── docker-compose.yml # Compose file for OLS application
├── .gitignore
└── README.md
- Train Delays Data (Kaggle)
- Weather Data
  - Scraped from DWD (Deutscher Wetterdienst) but not used due to mismatched timestamps with train data. See dataset here.
  - Meteostat Library: Used instead for weather feature integration.
The data processing workflow includes:
- Cleaning & Normalization: Handling missing values and inconsistencies.
- Feature Selection: Identifying relevant features for training.
- Filtering Data: Selecting only Berlin & Brandenburg regions.
- Data Splitting: 80% training, 20% testing.
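The filtering and splitting steps can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the project's code: the `state` and `delay_min` field names, the sample records, and the helpers `filter_region` and `split_train_test` are hypothetical, and the repository's own scripts (such as `merge_and_split_data.py`) may work differently, for example with Pandas and Scikit-learn.

```python
import random

# Hypothetical sample records; the field names are assumptions for illustration.
RIDES = [
    {"station": f"S{i}", "state": state, "delay_min": i % 7}
    for i, state in enumerate(
        ["Berlin", "Brandenburg", "Bayern", "Berlin", "Hessen"] * 5
    )
]

TARGET_STATES = {"Berlin", "Brandenburg"}


def filter_region(rows, states=TARGET_STATES):
    """Keep only the rows whose state is in the target set."""
    return [row for row in rows if row["state"] in states]


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows reproducibly, then split off the test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


regional = filter_region(RIDES)
train, test = split_train_test(regional)
print(len(regional), len(train), len(test))
```

Fixing the seed keeps the split reproducible between runs, which matters when the training and testing sets are later baked into a Docker image.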
- Implemented using TensorFlow.
- Includes feature scaling, batch normalization, and dropout layers.
- Optimized using Mean Squared Error (MSE) and Mean Absolute Error (MAE).
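For reference, the two error measures named above can be written out in a few lines of plain Python. This is a sketch with made-up delay values, not the project's TensorFlow code; the source does not state which of the two serves as the training loss and which as a reported metric.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; penalizes large errors more strongly."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; stays in the unit of the target."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical delays in minutes (observed vs. predicted).
actual = [0.0, 5.0, 12.0, 3.0]
predicted = [1.0, 4.0, 10.0, 3.0]

mse = mean_squared_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(f"MSE={mse:.2f}  MAE={mae:.2f}")
```

Because MAE is expressed in the same unit as the target, it is often the easier number to read when the target is a delay in minutes, while MSE reacts more strongly to occasional large misses.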
- Implemented using Statsmodels.
- Serves as a baseline regression model.
- Evaluated using residual analysis, Q-Q plots, and regression diagnostics.
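To make the baseline concrete, the sketch below fits a one-feature OLS line with the closed-form formulas and computes the residuals, which are the quantities that residual analysis and Q-Q plots inspect. It is a plain-Python illustration with made-up numbers and hypothetical helper names; the project itself uses Statsmodels and, presumably, more than one feature.

```python
def fit_simple_ols(x, y):
    """Closed-form OLS for y = intercept + slope * x with a single feature."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def compute_residuals(x, y, intercept, slope):
    """Observed value minus fitted value for every data point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical data: one numeric feature and the delay in minutes.
feature = [0.0, 1.0, 2.0, 3.0, 4.0]
delay = [1.0, 3.0, 5.0, 7.0, 10.0]

intercept, slope = fit_simple_ols(feature, delay)
resid = compute_residuals(feature, delay, intercept, slope)
print(round(intercept, 3), round(slope, 3))
print([round(r, 3) for r in resid])
```

With an intercept in the model, OLS residuals sum to zero by construction, so diagnostics look at their spread and shape rather than their mean.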
The project is containerized for reproducibility and deployment. It includes four Docker images:
| Image Name | Purpose |
|---|---|
| learningBase | Stores training and testing data. |
| activationBase | Stores activation data. |
| knowledgeBase | Stores trained ANN and OLS models. |
| codeBase | Contains scripts for applying models. |
Each image has a README.md explaining its setup, and all images are based on the busybox image.
This project uses Docker for containerization, ensuring reproducible results. Follow these steps to set up the environment and run the models:
- Clone the Repository:
  git clone https://github.com/luisahorlledecastro/AI-CPS.git
  cd AI-CPS
- Create a Docker Volume:
  This volume will be used to share data between the containers and your local machine.
  docker volume create ai_system
- Build the Docker Images:
  Build the images for training and prediction:
  docker-compose build
  This command creates the necessary Docker images.
The initial training and evaluation of the models are performed within a Docker container.
- Start the Training Container:
  docker-compose up -d # Runs in detached mode
- Navigate to the Code Directory:
  cd <your_path>/AI-CPS/code
- Run the Training Scripts:
  python ANN.py
  python OLS_model.py
  The trained models and results are saved to the model/ directory on your local machine, which is mounted as a volume.
- Exit the Container:
  exit
- Stop the Training Container:
  docker-compose down
After training the models, you can apply them to new data using the separate docker-compose.yml files in the scenarios directory.
Applying the ANN Model:
- Navigate to the ANN Scenario Directory:
  cd scenarios/ANN
- Run the ANN Application:
  docker-compose up -d
Applying the OLS Model:
- Navigate to the OLS Scenario Directory:
  cd scenarios/OLS
- Run the OLS Application:
  docker-compose up -d
These commands will start the necessary containers, using the knowledgeBase, activationBase, and codeBase images. The results of the inference are saved to your local machine via the mounted volume.
Model evaluation metrics and visualizations are stored in model_metrics/, inside the respective model directories (ANN/ or OLS/).
The following images are published on Docker Hub and can be pulled directly:
docker pull luisahorlledecastro/learningBase_appointmentOrDisappointment:latest
docker pull luisahorlledecastro/activationBase_appointmentOrDisappointment:latest
docker pull luisahorlledecastro/knowledgeBase_appointmentOrDisappointment:latest
docker pull luisahorlledecastro/codeBase_appointmentOrDisappointment:latest
- Python: 3.12.4
- Libraries:
  - TensorFlow
  - Statsmodels
  - Pandas, NumPy, Scikit-learn
  - BeautifulSoup (for web scraping)
  - Meteostat (for weather data)
  - Matplotlib, Seaborn (for visualization)
This project was collaboratively developed by:
- Dita Pelaj
- Luísa Hörlle de Castro
This project is licensed under the AGPL-3.0 License as required by the course.