

luisahorlledecastro/AI-CPS


Appointment or Disappointment - AI-Based Train Delay Prediction

This project is part of the course "Advanced AI-Based Application Systems", taught by Prof. Dr.-Ing. Marcus Grum at the University of Potsdam. The goal is to predict train delays in the Berlin and Brandenburg regions using AI models. We have implemented both an Artificial Neural Network (ANN) and an Ordinary Least Squares (OLS) regression model to tackle this problem.


πŸ“ Project Structure

├── code/                          # Python scripts for data processing and modeling
│   ├── dataPreparation/           # Data cleaning and preprocessing scripts
│   │   ├── clean_train_data.py
│   │   ├── clean_weather_data.py
│   │   ├── fetch_trains.py        # Fetches train data from Kaggle
│   │   ├── match_state_to_station.py
│   │   ├── merge_and_split_data.py
│   │   ├── prep_data.py
│   │   └── scrape_weather.py      # Scrapes weather data from DWD (unused)
│   ├── model/                     # Trained models and evaluation results
│   │   ├── ANN/                   # ANN metrics and visualizations
│   │   └── OLS/                   # OLS metrics and visualizations
│   ├── ANN.py                     # ANN model implementation
│   ├── applyANN.py                # Script to apply the trained ANN model
│   ├── applyOLS.py                # Script to apply the trained OLS model
│   ├── feature_selection.py       # Feature selection and importance analysis
│   └── OLS_model.py               # OLS model implementation
├── data/                          # Dataset storage
│   ├── cleaned/                   # Cleaned and structured datasets
│   └── scraped/                   # Scraped data
│       ├── DBtrainrides.csv       # Large file, not tracked in Git; download it from the link below
│       ├── scraped_data.csv       # Train data from Kaggle
│       └── scraped_weather.csv    # Unused
├── images/                        # Docker images for deployment
│   ├── activationBase_appointmentOrDisappointment/
│   │   ├── Dockerfile
│   │   └── README.md
│   ├── codeBase_appointmentOrDisappointment/
│   │   ├── Dockerfile
│   │   └── README.md
│   ├── knowledgeBase_appointmentOrDisappointment/
│   │   ├── Dockerfile
│   │   └── README.md
│   ├── learningBase_appointmentOrDisappointment/
│   │   ├── Dockerfile
│   │   └── README.md
│   └── docker-compose.yml         # Compose file for training and evaluation
├── scenarios/                     # Use case scenarios for model application
│   ├── ANN/
│   │   └── docker-compose.yml     # Compose file for ANN application
│   └── OLS/
│       └── docker-compose.yml     # Compose file for OLS application
├── .gitignore
└── README.md

Data Sources

  • Train Delays Data (Kaggle)

    • Dataset 1: Downloaded using fetch_trains.py, available on Kaggle.
    • Dataset 2: Manually downloaded (not included in Git due to its large size, >700MB). You can download the dataset here.
  • Weather Data

    • Scraped from DWD (Deutscher Wetterdienst) but not used due to mismatched timestamps with train data. See dataset here.
    • Meteostat Library: Used instead for weather feature integration.
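Because the DWD data was dropped over mismatched timestamps, weather features have to be aligned with train events in some way. The sketch below shows one minimal approach: round each departure to the nearest full hour and look up the hourly observation for that station. The weather layout (a dict keyed by station and hour, e.g. built from hourly Meteostat data) and the `station`/`departure` field names are hypothetical; the actual merge in `merge_and_split_data.py` may work differently.

```python
from datetime import datetime, timedelta


def nearest_hour(ts: datetime) -> datetime:
    """Round a timestamp to the nearest full hour (exactly hh:30 rounds up)."""
    base = ts.replace(minute=0, second=0, microsecond=0)
    if ts - base >= timedelta(minutes=30):
        return base + timedelta(hours=1)
    return base


def attach_weather(rides, hourly_weather):
    """Attach the hourly weather observation closest in time to each train ride.

    rides: list of dicts with hypothetical keys 'station' and 'departure' (datetime).
    hourly_weather: hypothetical dict mapping (station, hour) -> weather features.
    Rides without a matching hour get weather=None instead of being dropped.
    """
    merged = []
    for ride in rides:
        key = (ride["station"], nearest_hour(ride["departure"]))
        merged.append({**ride, "weather": hourly_weather.get(key)})
    return merged


if __name__ == "__main__":
    weather = {("Berlin Hbf", datetime(2024, 7, 1, 9)): {"temp": 21.5, "prcp": 0.0}}
    rides = [{"station": "Berlin Hbf", "departure": datetime(2024, 7, 1, 8, 47)}]
    print(attach_weather(rides, weather)[0]["weather"])  # {'temp': 21.5, 'prcp': 0.0}
```

Keeping unmatched rides with `weather=None` lets the cleaning step decide later whether to impute or drop them.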

Data Preparation

The data processing workflow includes:

  • Cleaning & Normalization: Handling missing values and inconsistencies.
  • Feature Selection: Identifying relevant features for training.
  • Filtering Data: Selecting only Berlin & Brandenburg regions.
  • Data Splitting: 80% training, 20% testing.
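The filtering, cleaning, normalization, and 80/20 split steps can be sketched in plain Python as follows. This is a minimal illustration, not the code in `prep_data.py` or `merge_and_split_data.py`: the `state` column name, min-max scaling as the normalization method, and the random seed are assumptions.

```python
import random

REGIONS = {"Berlin", "Brandenburg"}


def prepare(rows, features, target, test_size=0.2, seed=42):
    """Filter, clean, scale and split rows into (X_train, y_train), (X_test, y_test).

    rows: list of dicts; 'state' is a hypothetical column naming the federal state.
    """
    # Filtering + cleaning: keep Berlin/Brandenburg rows with no missing values.
    needed = [*features, target]
    kept = [
        r for r in rows
        if r.get("state") in REGIONS and all(r.get(c) is not None for c in needed)
    ]

    # Splitting: deterministic shuffle, then cut off the test fraction.
    random.Random(seed).shuffle(kept)
    n_test = round(len(kept) * test_size)
    test, train = kept[:n_test], kept[n_test:]

    # Normalization: min-max scaling fitted on the training split only.
    bounds = {}
    for c in features:
        lo = min(r[c] for r in train)
        hi = max(r[c] for r in train)
        bounds[c] = (lo, (hi - lo) or 1.0)  # avoid division by zero for constant columns

    def to_xy(part):
        X = [[(r[c] - bounds[c][0]) / bounds[c][1] for c in features] for r in part]
        y = [r[target] for r in part]
        return X, y

    return to_xy(train), to_xy(test)
```

Fitting the scaler on the training partition only keeps information from the test set out of preprocessing. For full-sized data, pandas and scikit-learn (`train_test_split`, `MinMaxScaler`) provide the same operations.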

🧠 Models

Artificial Neural Network (ANN)

  • Implemented using TensorFlow.
  • Includes feature scaling, batch normalization, and dropout layers.
  • Optimized using Mean Squared Error (MSE) and Mean Absolute Error (MAE).
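The feature scaling and the two error measures named above are easy to state precisely. The sketch below shows z-score standardization (one common choice; the exact scaler used in `ANN.py` is an assumption) together with MSE and MAE in plain Python. In Keras, a regression network is often compiled with `loss="mse"` and `metrics=["mae"]`; whether `ANN.py` does exactly this is likewise an assumption.

```python
from statistics import fmean, pstdev


def fit_standardizer(column):
    """Learn (mean, std) from a training column; std falls back to 1.0 for constant columns."""
    return fmean(column), (pstdev(column) or 1.0)


def standardize(column, mu, sigma):
    """Apply z-score scaling (x - mean) / std using statistics from the training data."""
    return [(x - mu) / sigma for x in column]


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")


def mse(y_true, y_pred):
    """Mean Squared Error: average squared residual, penalising large misses strongly."""
    _check(y_true, y_pred)
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean Absolute Error: average absolute residual, in the target's own unit."""
    _check(y_true, y_pred)
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))
```

MSE is smooth and therefore convenient as a training loss, while MAE is easier to read: if the target is a delay in minutes, MAE is the average miss in minutes.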

Ordinary Least Squares (OLS)

  • Implemented using Statsmodels.
  • Serves as a baseline regression model.
  • Evaluated using residual analysis, Q-Q plots, and regression diagnostics.
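For reference, OLS picks the coefficients that minimize the sum of squared residuals, which is the solution of the normal equations (AᵀA)b = Aᵀy, where A is the feature matrix with a leading column of ones. The minimal sketch below solves them directly in plain Python and returns the residuals used in the diagnostics above. It is illustrative only; `OLS_model.py` uses Statsmodels rather than code like this.

```python
def ols_fit(X, y):
    """Return OLS coefficients [intercept, b1, ..., bk] minimising the sum of squared residuals.

    X: list of rows, each a list of k feature values; y: list of targets.
    A column of ones is prepended for the intercept (like sm.add_constant).
    """
    A = [[1.0, *row] for row in X]
    k = len(A[0])
    # Normal equations: (A^T A) b = A^T y
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)] for i in range(k)]
    v = [sum(a[i] * t for a, t in zip(A, y)) for i in range(k)]
    # Solve by Gaussian elimination with partial pivoting.
    for col in range(k):
        p = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[p][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear features?)")
        M[col], M[p] = M[p], M[col]
        v[col], v[p] = v[p], v[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    # Back substitution on the upper-triangular system.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (v[i] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b


def predict(b, X):
    """Fitted values b0 + b1*x1 + ... + bk*xk for each row."""
    return [b[0] + sum(w * x for w, x in zip(b[1:], row)) for row in X]


def residuals(b, X, y):
    """Observed minus fitted values, the input to residual and Q-Q diagnostics."""
    return [t - p for t, p in zip(y, predict(b, X))]
```

In Statsmodels the equivalent fit is `sm.OLS(y, sm.add_constant(X)).fit()`, which avoids forming AᵀA explicitly (more numerically robust) and also reports standard errors and regression diagnostics.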

Docker Setup

The project is containerized for reproducibility and deployment. It includes four Docker images:

| Image Name     | Purpose                                    |
|----------------|--------------------------------------------|
| learningBase   | Stores training and testing data.          |
| activationBase | Stores activation data.                    |
| knowledgeBase  | Stores trained ANN and OLS models.         |
| codeBase       | Contains scripts for applying the models.  |

Each image has a README.md explaining its setup, and all images are based on the busybox image.

Follow these steps to set up the environment and run the models:

Setting Up the Environment

  1. Clone the Repository:

    git clone https://github.com/luisahorlledecastro/AI-CPS.git
    cd AI-CPS
  2. Create a Docker Volume:

    This volume will be used to share data between the containers and your local machine.

    docker volume create ai_system
  3. Build the Docker Images:

    Build the images for training and prediction:

    docker-compose build

    This command creates the necessary Docker images.

Running the Models

Training and Evaluation (Initial Setup):

The initial training and evaluation of the models are performed within a Docker container.

  1. Start the Training Container:

    docker-compose up -d  # Runs in detached mode
  2. Navigate to the Code Directory:

    cd <your_path>/AI-CPS/code
  3. Run the Training Scripts:

    python ANN.py
    python OLS_model.py

    The trained models and results are saved to the model/ directory on your local machine, which is mounted as a volume.

  4. Exit the container:

    exit
  5. Stop the Training Container:

    docker-compose down

Applying the Trained Models (Inference):

After training the models, you can apply them to new data using the separate docker-compose.yml files in the scenarios directory.

Applying the ANN Model:

  1. Navigate to the ANN Scenario Directory:

    cd scenarios/ANN
  2. Run the ANN Application:

    docker-compose up -d

Applying the OLS Model:

  1. Navigate to the OLS Scenario Directory:

    cd scenarios/OLS
  2. Run the OLS Application:

    docker-compose up -d

These commands will start the necessary containers, using the knowledgeBase, activationBase, and codeBase images. The results of the inference are saved to your local machine via the mounted volume.
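Conceptually, the codeBase container reads a model from the knowledgeBase image and input rows from the activationBase image through the shared volume, then writes predictions back. The sketch below illustrates that flow for a linear model. The JSON model format (`features` and `coefficients` keys), the activation CSV, and the `predicted_delay` output column are all hypothetical; the real `applyOLS.py` and `applyANN.py` scripts and their file formats may differ.

```python
import csv
import json
from pathlib import Path


def apply_linear_model(model_file: Path, activation_file: Path, output_file: Path) -> list[float]:
    """Score activation data with a stored linear model and write the predictions.

    Hypothetical layout: model_file is JSON {"features": [...], "coefficients": [b0, b1, ...]};
    activation_file is a CSV whose header contains those feature names.
    """
    model = json.loads(model_file.read_text())
    features, coef = model["features"], model["coefficients"]

    preds = []
    with activation_file.open(newline="") as fh:
        for row in csv.DictReader(fh):
            x = [float(row[f]) for f in features]
            preds.append(coef[0] + sum(w * v for w, v in zip(coef[1:], x)))

    # Write results to the shared volume so they are visible on the host machine.
    output_file.parent.mkdir(parents=True, exist_ok=True)
    with output_file.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["predicted_delay"])
        writer.writerows([p] for p in preds)
    return preds
```

Passing all three paths as arguments keeps the function independent of where each image mounts its files, so the same code works inside a container or locally.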

Accessing Results

Model evaluation metrics and visualizations are stored in model_metrics/, inside the respective model directories (ANN/ or OLS/).
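One of the OLS diagnostics listed above is a Q-Q plot of the residuals. As a sketch of what such a plot compares (not the plotting code used in this repository), the function below pairs standard normal quantiles with the sorted, standardized residuals, using the common (i - 0.5)/n plotting positions. It needs at least two residuals.

```python
from statistics import NormalDist, fmean, stdev


def qq_points(residuals):
    """Return (theoretical, sample) quantile pairs for a normal Q-Q plot.

    Sample quantiles: sorted residuals standardised by their mean and sample std.
    Theoretical quantiles: N(0, 1) inverse CDF at plotting positions (i - 0.5) / n.
    """
    n = len(residuals)
    mu, sd = fmean(residuals), stdev(residuals)
    sample = sorted((r - mu) / sd for r in residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))
```

Scattering these pairs (for example with Matplotlib) against the line y = x gives the familiar picture: points close to the line suggest approximately normal residuals. Statsmodels also offers `sm.qqplot` to draw this directly.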

Docker Images

The following images are published on Docker Hub and can be pulled directly:

docker pull luisahorlledecastro/learningbase_appointmentordisappointment:latest
docker pull luisahorlledecastro/activationbase_appointmentordisappointment:latest
docker pull luisahorlledecastro/knowledgebase_appointmentordisappointment:latest
docker pull luisahorlledecastro/codebase_appointmentordisappointment:latest

βš™οΈ Dependencies

  • Python: 3.12.4
  • Libraries:
    • TensorFlow
    • Statsmodels
    • Pandas, NumPy, Scikit-learn
    • BeautifulSoup (for web scraping)
    • Meteostat (for weather data)
    • Matplotlib, Seaborn (for visualization)

Authors

This project was collaboratively developed by:

  • Dita Pelaj
  • Luísa Hörlle de Castro

License

This project is licensed under the AGPL-3.0 License as required by the course.
