aliivaezii/FBWM-FTOPSIS-PPO

SUPRA-PPO

Integrating Fuzzy MCDM Priors into Deep Reinforcement Learning for Sustainable and Resilient Supplier Selection and Dynamic Order Allocation

Python 3.9+ · License: CC BY-NC-ND 4.0 · Stable-Baselines3 · Gymnasium · Code style: PEP 8 · Docker


Paper: "Integrating Fuzzy MCDM Priors into Deep Reinforcement Learning for Sustainable and Resilient Supplier Selection and Dynamic Order Allocation"

Authors: Ali Vaezi · Erfan Rabbani · Giulia Bruno

Affiliation: Politecnico di Torino, Italy



Overview

This repository provides the complete implementation for a two-phase hybrid framework that integrates Multi-Criteria Decision Making (MCDM) with Deep Reinforcement Learning (DRL) for sustainable supplier order allocation under demand uncertainty.

Phase I applies Fuzzy Best-Worst Method (FBWM) and Fuzzy TOPSIS to evaluate suppliers across economic, environmental, and resilience dimensions. Phase II embeds these sustainability priors into a custom PPO agent (SUPRA-PPO) operating within a stochastic supply chain simulation.
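
As a rough illustration of how Phase I could turn fuzzy ratings into one closeness score per supplier, the sketch below implements a textbook fuzzy TOPSIS over triangular fuzzy numbers. It is not taken from src/mcdm_evaluation.py: the function names, the example ratings, the use of crisp (already defuzzified) weights, and the treatment of every criterion as a benefit criterion are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple

# Triangular fuzzy number: (lower, modal, upper)
TFN = Tuple[float, float, float]


def vertex_distance(a: TFN, b: TFN) -> float:
    """Vertex-method distance between two triangular fuzzy numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)


def fuzzy_topsis(ratings: List[List[TFN]], weights: Sequence[float]) -> List[float]:
    """Return one closeness coefficient per supplier (higher is better).

    ratings[i][j] is the fuzzy rating of supplier i on criterion j.
    All criteria are assumed to be benefit criteria (an assumption).
    """
    n_criteria = len(weights)
    # Linear-scale normalisation: divide by the largest upper bound per criterion.
    col_max = [max(row[j][2] for row in ratings) for j in range(n_criteria)]
    ideal: TFN = (1.0, 1.0, 1.0)
    anti_ideal: TFN = (0.0, 0.0, 0.0)

    scores = []
    for row in ratings:
        d_pos = d_neg = 0.0
        for j, (low, mid, up) in enumerate(row):
            scale = weights[j] / col_max[j]
            weighted = (low * scale, mid * scale, up * scale)
            d_pos += vertex_distance(weighted, ideal)
            d_neg += vertex_distance(weighted, anti_ideal)
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical ratings for two suppliers on three criteria
# (economic, environmental, resilience); not data from the repository.
example_ratings = [
    [(7, 9, 10), (5, 7, 9), (7, 9, 10)],  # supplier A
    [(3, 5, 7), (3, 5, 7), (1, 3, 5)],    # supplier B
]
example_weights = [0.5, 0.3, 0.2]
closeness = fuzzy_topsis(example_ratings, example_weights)
```

In this toy input supplier A dominates supplier B on every criterion, so A receives the higher closeness coefficient. Scores of this kind are what the README calls sustainability priors.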

The key research question: When and how do sustainability priors improve RL-based order allocation?


Architecture

┌─────────────────────────────────────────────────────────┐
│                    Phase I: MCDM                        │
│  ┌──────────┐    ┌───────────┐    ┌──────────────────┐  │
│  │   FBWM   │───▸│  Criteria │───▸│     FTOPSIS      │  │
│  │ (weights) │    │  Weights  │    │ (supplier ranks) │  │
│  └──────────┘    └───────────┘    └────────┬─────────┘  │
└────────────────────────────────────────────┼────────────┘
                                             │
                            Sustainability Priors (scores)
                                             │
┌────────────────────────────────────────────▼────────────┐
│                 Phase II: RL Training                    │
│  ┌──────────────────┐    ┌───────────────────────────┐  │
│  │  SupplyChainEnv  │◂──▸│       SUPRA-PPO           │  │
│  │  (gymnasium)     │    │  - Adaptive Entropy (ES)   │  │
│  │  - 5 suppliers   │    │  - FTOPSIS reward shaping  │  │
│  │  - 2 products    │    │  - CV-based scheduling     │  │
│  │  - 4 scenarios   │    └───────────────────────────┘  │
│  └──────────────────┘                                   │
└─────────────────────────────────────────────────────────┘
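
The diagram mentions adaptive entropy and CV-based scheduling without giving a formula. One plausible reading, shown purely as a sketch, is that the entropy bonus grows with the coefficient of variation (CV) of recent demand, so the agent explores more when demand is volatile. The function names, the linear mapping, and all default values below are assumptions, not the repository's implementation.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the absolute mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        return 0.0
    return statistics.pstdev(values) / abs(mean)


def entropy_coefficient(
    recent_demand: Sequence[float],
    ent_min: float = 0.001,  # hypothetical lower bound
    ent_max: float = 0.02,   # hypothetical upper bound
    cv_cap: float = 1.0,     # CV at which the coefficient saturates
) -> float:
    """Map recent demand volatility linearly onto [ent_min, ent_max]."""
    cv = min(coefficient_of_variation(recent_demand), cv_cap)
    return ent_min + (ent_max - ent_min) * cv / cv_cap


stable_coef = entropy_coefficient([100.0] * 12)
volatile_coef = entropy_coefficient([40.0, 160.0, 70.0, 130.0, 20.0, 180.0])
```

With constant demand the CV is zero and the coefficient sits at its lower bound; a volatile window pushes it towards the upper bound.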

Requirements

  • Python: 3.9 or later
  • Key packages: See requirements.txt for the full pinned list
pip install -r requirements.txt

Quick Start

  1. Create and activate a Python virtual environment:
python -m venv venv
source venv/bin/activate
  2. Install dependencies:
pip install -r requirements.txt
  3. (Optional) Recompute the MCDM supplier scores (pre-computed scores are provided in results/mcdm/):
python src/mcdm_evaluation.py
  4. Run the full experiment suite (54 runs: 36 main runs from 4 scenarios × 3 models × 3 seeds, plus 18 ablation runs):
bash run_all_experiments.sh
# or individually:
python src/run_experiment.py
  5. Run the Misaligned Shock adversarial experiment (Section 5.4):
python src/run_misaligned_shock.py
# or for a specific seed subset:
python src/run_misaligned_shock.py --seeds 42,123,456
  6. Generate publication-quality figures:
python src/visualize_results.py
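
The run count for the full suite can be checked by enumerating the main grid. The sketch below uses the model labels, scenario names, and seeds listed in this README; the variable names are illustrative, and the 18 ablation runs are only counted, not enumerated, because their composition is not spelled out here.

```python
from itertools import product

MODELS = ("M1", "M2", "M3")
SCENARIOS = (
    "Stable Operations",
    "High Volatility",
    "Systemic Shock",
    "Misaligned Shock",
)
SEEDS = (42, 123, 456)
N_ABLATION_RUNS = 18  # stated in the README; breakdown not given

# One main run per (scenario, model, seed) combination.
main_runs = list(product(SCENARIOS, MODELS, SEEDS))
total_runs = len(main_runs) + N_ABLATION_RUNS

for scenario, model, seed in main_runs[:3]:
    print(f"{scenario} | {model} | seed={seed}")
```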

Docker (Alternative)

Build and run the entire experiment suite in a container:

# Build the image
docker build -t supra-ppo .

# Run environment validation
docker run --rm supra-ppo

# Run experiments (the bind mount persists /app/results to ./results on the host)
docker run --rm -v "$(pwd)/results:/app/results" supra-ppo \
    python src/run_experiment.py

# Run visualisation
docker run --rm -v "$(pwd)/results:/app/results" supra-ppo \
    python src/visualize_results.py

Repository Structure

├── src/
│   ├── run_experiment.py          # Main experiment runner (4 scenarios × 3 models × 3 seeds)
│   ├── run_misaligned_shock.py    # Misaligned Shock adversarial experiment runner
│   ├── mcdm_evaluation.py         # FBWM + FTOPSIS supplier scoring
│   ├── visualize_results.py       # Publication-quality figure generator
│   └── generate_supplier_table.py # Supplier evaluation LaTeX table
├── scripts/
│   ├── check_status.py            # Experiment progress monitor
│   └── evaluate_trained_models.py # Post-training model evaluation
├── test/
│   ├── test_environment.py        # Supply chain environment unit tests
│   └── test_reward_fix.py         # Reward function validation tests
├── results/
│   └── mcdm/                      # Pre-computed MCDM scores (tracked in Git)
│       ├── fbwm_weights.csv       # Fuzzy BWM criteria weights
│       ├── ftopsis_rankings.csv   # Fuzzy TOPSIS supplier rankings
│       ├── dimensional_scores.csv # Per-dimension supplier scores
│       └── supplier_scores_for_rl.npy  # NumPy array used by RL agent
├── requirements.txt               # Pinned Python dependencies
├── Dockerfile                     # Container-based reproducibility
├── .dockerignore                  # Docker build exclusions
├── run_all_experiments.sh         # Shell script to run full experiment suite
├── LICENSE                        # CC BY-NC-ND 4.0 (pre-publication)
└── README.md

Note: Trained model checkpoints (~160 MB) and generated figures are excluded from Git. Run the experiment scripts to reproduce them locally.
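
The pre-computed priors in results/mcdm/supplier_scores_for_rl.npy can be inspected with NumPy. The helper below is a minimal sketch: its name is hypothetical, and it assumes the file holds a plain numeric array with one score per supplier, which this README does not state explicitly.

```python
from pathlib import Path
from typing import Union

import numpy as np


def load_supplier_scores(path: Union[str, Path]) -> np.ndarray:
    """Load supplier scores as a flat float array and sanity-check them."""
    scores = np.asarray(np.load(path), dtype=float).ravel()
    if scores.size == 0:
        raise ValueError("score file is empty")
    if not np.all(np.isfinite(scores)):
        raise ValueError("score file contains NaN or infinite values")
    return scores


# Example call, run from the repository root:
# scores = load_supplier_scores("results/mcdm/supplier_scores_for_rl.npy")
# print(scores.shape, scores)
```

Flattening the array keeps the helper tolerant of a (5,) or (5, 1) layout; if the stored array is actually two-dimensional per criterion, the `ravel()` call should be dropped.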


Core Features

  • SUPRA-PPO Algorithm: A PPO variant featuring:
    • Adaptive Entropy Scheduling (ES) for dynamic exploration–exploitation balance
    • Uncertainty-Driven Regularisation (UDR) for promoting robust policies
  • Supply Chain Environment: A gymnasium-compatible simulation with:
    • Non-stationary demand (trend, seasonality, stochastic shocks)
    • Lognormal lead times and systemic supplier disruptions
    • Four market scenarios: Stable Operations, High Volatility, Systemic Shock, Misaligned Shock
  • MCDM Integration: Fuzzy Best-Worst Method (FBWM) + Fuzzy TOPSIS (FTOPSIS) to generate sustainability–resilience priors
  • Ablation Study: Three models (M1 = Base-Stock, M2 = Vanilla PPO, M3 = PPO + FTOPSIS priors) across 4 scenarios × 3 seeds (36 main runs; 54 in total including 18 ablation runs)
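
To make the environment features above more concrete, the following sketch generates non-stationary demand (linear trend, sinusoidal seasonality, Gaussian noise, occasional multiplicative shocks) and samples lognormal lead times. It uses only the standard library and is not the repository's SupplyChainEnv; every function name, parameter, and default value is an assumption chosen for illustration.

```python
import math
import random
from typing import List


def simulate_demand(
    n_periods: int,
    base: float = 100.0,
    trend: float = 0.5,
    season_amp: float = 20.0,
    season_period: int = 12,
    noise_sd: float = 5.0,
    shock_prob: float = 0.05,
    shock_scale: float = 1.5,
    seed: int = 42,
) -> List[float]:
    """Demand with trend, seasonality, noise, and random shocks."""
    rng = random.Random(seed)
    demand = []
    for t in range(n_periods):
        level = base + trend * t
        level += season_amp * math.sin(2.0 * math.pi * t / season_period)
        value = level + rng.gauss(0.0, noise_sd)
        if rng.random() < shock_prob:
            value *= shock_scale  # stochastic demand shock
        demand.append(max(0.0, value))  # demand cannot be negative
    return demand


def sample_lead_time(rng: random.Random, mu: float = 1.0, sigma: float = 0.25) -> float:
    """Draw a strictly positive lead time from a lognormal distribution."""
    return rng.lognormvariate(mu, sigma)


weekly_demand = simulate_demand(52)
lead_time = sample_lead_time(random.Random(42))
```

Seeding the generator makes each trajectory repeatable, which is the same idea as the fixed seeds (42, 123, 456) used for the experiments.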

Experimental Design

| Factor      | Levels | Details |
|-------------|--------|---------|
| Models      | 3      | M1: Base-Stock heuristic, M2: Vanilla PPO, M3: SUPRA-PPO (PPO + FTOPSIS priors) |
| Scenarios   | 4      | Stable Operations, High Volatility, Systemic Shock, Misaligned Shock |
| Seeds       | 3      | 42, 123, 456 (for statistical robustness) |
| Timesteps   | 3M     | Per model-scenario combination |
| Sensitivity | 5      | λ_sust ∈ {0.0, 0.1, 0.2, 0.3, 0.5} |
| Total runs  | 54     | 36 main (3 models × 4 scenarios × 3 seeds) + 18 ablation |
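
The sensitivity factor λ_sust suggests that the sustainability prior enters the reward as a weighted term. The README does not give the exact shaping formula, so the sketch below assumes one simple form: the base reward plus λ_sust times the order-weighted average supplier score. The function name and the example numbers are hypothetical.

```python
from typing import Sequence

LAMBDA_SUST_GRID = (0.0, 0.1, 0.2, 0.3, 0.5)  # levels listed in the README


def shaped_reward(
    base_reward: float,
    order_quantities: Sequence[float],
    supplier_scores: Sequence[float],
    lambda_sust: float,
) -> float:
    """Add a sustainability bonus proportional to the order-weighted score."""
    if len(order_quantities) != len(supplier_scores):
        raise ValueError("one score is required per supplier")
    total = sum(order_quantities)
    if total <= 0:
        return base_reward  # nothing ordered, no bonus
    weighted_score = sum(
        q * s for q, s in zip(order_quantities, supplier_scores)
    ) / total
    return base_reward + lambda_sust * weighted_score


# Hypothetical step: five suppliers, all volume sent to the first one.
scores = [0.9, 0.6, 0.4, 0.7, 0.5]
orders = [50.0, 0.0, 0.0, 0.0, 0.0]
rewards = {lam: shaped_reward(-10.0, orders, scores, lam) for lam in LAMBDA_SUST_GRID}
```

Under this assumed form, λ_sust = 0.0 leaves the base reward unchanged, and larger values increasingly favour allocations to highly ranked suppliers.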

Results Preview

Scenario / Metric M1 (Base-Stock) M2 (Vanilla PPO) M3 (SUPRA-PPO)
Stable — cost Baseline Improved Best ($d=1.31$)
High Volatility — cost Baseline Best Penalised by priors
Systemic Shock — sustainability Low Moderate Highest ($d=0.71$)
Misaligned Shock — cost Baseline Moderate Survives ($d=1.09$)
Misaligned Shock — sustainability Best Collapses ($d=2.07$)

Full quantitative results are available after running the experiments.
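
The d values in the table look like standardised effect sizes. Assuming they denote Cohen's d with a pooled standard deviation (the README does not say so explicitly), such a value could be computed from per-seed metrics as sketched below. The function name and the sample numbers are illustrative and are not experiment outputs.

```python
import math
import statistics
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    if pooled_sd == 0:
        raise ValueError("pooled standard deviation is zero")
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_sd


# Made-up metric values for two models over three seeds each.
effect = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```

With only three seeds per configuration, an effect size computed this way carries wide uncertainty, so it is best read alongside the raw per-seed values.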


Reproducibility

  • All configuration parameters are embedded in run_experiment.py — no external config files needed
  • All four scenarios (Stable, High Volatility, Systemic Shock, Misaligned Shock), hyperparameters, and random seeds are explicitly declared in configuration dictionaries
  • The Misaligned Shock adversarial scenario is reproduced via src/run_misaligned_shock.py
  • Pre-computed MCDM scores are provided in results/mcdm/ so the RL experiments can run without recomputing them
  • Large outputs (trained models, generated figures, TensorBoard logs) are excluded from Git via .gitignore
  • To fully reproduce all 54 runs: install dependencies, run bash run_all_experiments.sh, then python src/visualize_results.py

Citation

If you use this code in your research, please cite using the "Cite this repository" button on GitHub, or:

@article{vaezi2026suprappo,
  author    = {Vaezi, Ali and Rabbani, Erfan and Bruno, Giulia},
  title     = {Integrating Fuzzy {MCDM} Priors into Deep Reinforcement Learning
               for Sustainable and Resilient Supplier Selection and Dynamic Order Allocation},
  journal   = {International Journal of Production Economics},
  year      = {2026},
  note      = {Under review},
  url       = {https://github.com/aliivaezii/FBWM-FTOPSIS-PPO},
}

Contributing

Contributions are welcome. Please see CONTRIBUTING.md for guidelines.


License

This project is licensed under CC BY-NC-ND 4.0 during the pre-publication period. You may view and share the code with attribution, but commercial use and derivative works are not permitted. After the associated paper is published, this repository will be re-licensed under the MIT License.
