SUPRA-PPO

Integrating Fuzzy MCDM Priors into Deep Reinforcement Learning for Sustainable and Resilient Supplier Selection and Dynamic Order Allocation

Paper: "Integrating Fuzzy MCDM Priors into Deep Reinforcement Learning for Sustainable and Resilient Supplier Selection and Dynamic Order Allocation"

Authors: Ali Vaezi · Erfan Rabbani · Giulia Bruno

Affiliation: Politecnico di Torino, Italy

Overview

This repository provides the complete implementation for a two-phase hybrid framework that integrates Multi-Criteria Decision Making (MCDM) with Deep Reinforcement Learning (DRL) for sustainable supplier order allocation under demand uncertainty.

Phase I applies Fuzzy Best-Worst Method (FBWM) and Fuzzy TOPSIS to evaluate suppliers across economic, environmental, and resilience dimensions. Phase II embeds these sustainability priors into a custom PPO agent (SUPRA-PPO) operating within a stochastic supply chain simulation.
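As a rough illustration of the Phase I ranking step, the sketch below computes Fuzzy TOPSIS closeness coefficients from triangular fuzzy ratings. It uses linear normalisation, per-criterion ideal solutions, and the vertex distance. This is one common textbook variant, not necessarily the one implemented in src/mcdm_evaluation.py. The function names, the two criteria, and the example ratings are hypothetical. The fuzzy criteria weights, which FBWM would supply, are simply assumed as input.

```python
import math
from typing import List, Sequence, Tuple

TFN = Tuple[float, float, float]  # triangular fuzzy number (l, m, u)


def vertex_distance(a: TFN, b: TFN) -> float:
    """Vertex distance between two triangular fuzzy numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)


def normalise(column: Sequence[TFN], is_benefit: bool) -> List[TFN]:
    """Linear normalisation of one criterion column (assumes strictly positive ratings)."""
    if is_benefit:
        u_max = max(u for _, _, u in column)
        return [(l / u_max, m / u_max, u / u_max) for l, m, u in column]
    l_min = min(l for l, _, _ in column)
    # Cost criterion: invert so that smaller raw values score higher
    return [(l_min / u, l_min / m, l_min / l) for l, m, u in column]


def fuzzy_topsis(ratings: List[List[TFN]], weights: List[TFN],
                 benefit: List[bool]) -> List[float]:
    """Closeness coefficient CC_i in [0, 1] for each alternative (higher is better)."""
    n_alt = len(ratings)
    d_plus = [0.0] * n_alt
    d_minus = [0.0] * n_alt
    for j, w in enumerate(weights):
        column = normalise([row[j] for row in ratings], benefit[j])
        # Weighted normalised ratings: component-wise product with the fuzzy weight
        weighted = [tuple(r * wk for r, wk in zip(t, w)) for t in column]
        # Fuzzy positive / negative ideal solutions for this criterion
        best = max(u for _, _, u in weighted)
        worst = min(l for l, _, _ in weighted)
        for i, v in enumerate(weighted):
            d_plus[i] += vertex_distance(v, (best, best, best))
            d_minus[i] += vertex_distance(v, (worst, worst, worst))
    return [dm / (dp + dm) if dp + dm > 0 else 0.0 for dp, dm in zip(d_plus, d_minus)]


# Hypothetical example: criterion 0 = quality (benefit), criterion 1 = cost (cost)
ratings = [
    [(7, 9, 10), (1, 2, 3)],   # S1: high quality, low cost
    [(3, 5, 7), (5, 7, 9)],    # S2: middling on both
    [(1, 1, 3), (7, 9, 10)],   # S3: low quality, high cost
]
weights = [(0.3, 0.5, 0.7), (0.3, 0.5, 0.7)]
cc = fuzzy_topsis(ratings, weights, benefit=[True, False])
```

The coefficients lie in [0, 1] and rank S1 > S2 > S3 in this example. In the framework described above, FTOPSIS scores of this kind are the sustainability priors passed to Phase II.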

The key research question: When and how do sustainability priors improve RL-based order allocation?

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Phase I: MCDM                        │
│  ┌──────────┐    ┌───────────┐    ┌──────────────────┐  │
│  │   FBWM   │───▸│  Criteria │───▸│     FTOPSIS      │  │
│  │(weights) │    │  Weights  │    │ (supplier ranks) │  │
│  └──────────┘    └───────────┘    └────────┬─────────┘  │
└────────────────────────────────────────────┼────────────┘
                                             │
                            Sustainability Priors (scores)
                                             │
┌────────────────────────────────────────────▼────────────┐
│                  Phase II: RL Training                  │
│  ┌──────────────────┐    ┌───────────────────────────┐  │
│  │  SupplyChainEnv  │◂──▸│       SUPRA-PPO           │  │
│  │  (gymnasium)     │    │  - Adaptive Entropy (ES)  │  │
│  │  - 5 suppliers   │    │  - FTOPSIS reward shaping │  │
│  │  - 2 products    │    │  - CV-based scheduling    │  │
│  │  - 4 scenarios   │    └───────────────────────────┘  │
│  └──────────────────┘                                   │
└─────────────────────────────────────────────────────────┘
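The diagram names Adaptive Entropy (ES) and CV-based scheduling but does not specify the rule. One plausible reading is sketched here as an assumption, not as the repository's method. It scales PPO's entropy-bonus coefficient by the coefficient of variation (CV = standard deviation / |mean|) of recent episode returns and anneals the coefficient linearly over training. The class name, the constants, and the choice of episode returns as the CV signal are all hypothetical.

```python
import statistics
from collections import deque


class CVEntropySchedule:
    """Hypothetical CV-driven entropy-coefficient schedule for a PPO loss."""

    def __init__(self, c_min: float = 0.001, c_max: float = 0.05,
                 cv_ref: float = 0.5, window: int = 20) -> None:
        self.c_min, self.c_max, self.cv_ref = c_min, c_max, cv_ref
        self.returns: deque = deque(maxlen=window)  # most recent episode returns

    def update(self, episode_return: float) -> None:
        self.returns.append(episode_return)

    def coefficient(self, progress: float) -> float:
        """Entropy weight for a training progress fraction in [0, 1]."""
        values = list(self.returns)
        if len(values) < 2:
            cv = self.cv_ref  # too little data: assume the reference variability
        else:
            mean = statistics.fmean(values)
            cv = statistics.stdev(values) / abs(mean) if mean != 0 else self.cv_ref
        signal = min(cv / self.cv_ref, 1.0)          # more variability -> more exploration
        decay = 1.0 - min(max(progress, 0.0), 1.0)   # anneal towards c_min over training
        return self.c_min + (self.c_max - self.c_min) * signal * decay


volatile = CVEntropySchedule()
for r in (-100.0, -60.0, -140.0, -90.0):
    volatile.update(r)

stable = CVEntropySchedule()
for r in (-100.0, -101.0, -99.0, -100.0):
    stable.update(r)
```

Under this rule, volatile returns keep the coefficient high early in training, while stable returns push it toward c_min. The value would be used as the weight on the entropy bonus in the PPO objective.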

Requirements

Python: 3.9 or later
Key packages: See requirements.txt for the full pinned list

pip install -r requirements.txt

Quick Start

Create and activate a Python virtual environment:

python -m venv venv
source venv/bin/activate

Install dependencies:

pip install -r requirements.txt

(Optional) Recompute MCDM supplier scores — pre-computed scores are provided in results/mcdm/:

python src/mcdm_evaluation.py

Run the full experiment suite (54 runs: 36 main runs from 4 scenarios × 3 models × 3 seeds, plus 18 ablation runs):

bash run_all_experiments.sh
# or individually:
python src/run_experiment.py

Run the Misaligned Shock adversarial experiment (Section 5.4):

python src/run_misaligned_shock.py
# or for a specific seed subset:
python src/run_misaligned_shock.py --seeds 42,123,456

Generate publication-quality figures:

python src/visualize_results.py

Docker (Alternative)

Build and run the entire experiment suite in a container:

# Build the image
docker build -t supra-ppo .

# Run environment validation
docker run --rm supra-ppo

# Run experiments (the volume mount writes results to ./results on the host)
docker run --rm -v $(pwd)/results:/app/results supra-ppo \
    python src/run_experiment.py

# Run visualisation
docker run --rm -v $(pwd)/results:/app/results supra-ppo \
    python src/visualize_results.py

Repository Structure

├── src/
│   ├── run_experiment.py          # Main experiment runner (4 scenarios × 3 models × 3 seeds)
│   ├── run_misaligned_shock.py    # Misaligned Shock adversarial experiment (Section 5.4)
│   ├── mcdm_evaluation.py         # FBWM + FTOPSIS supplier scoring
│   ├── visualize_results.py       # Publication-quality figure generator
│   └── generate_supplier_table.py # Supplier evaluation LaTeX table
├── scripts/
│   ├── check_status.py            # Experiment progress monitor
│   └── evaluate_trained_models.py # Post-training model evaluation
├── test/
│   ├── test_environment.py        # Supply chain environment unit tests
│   └── test_reward_fix.py         # Reward function validation tests
├── results/
│   └── mcdm/                      # Pre-computed MCDM scores (tracked in Git)
│       ├── fbwm_weights.csv       # Fuzzy BWM criteria weights
│       ├── ftopsis_rankings.csv   # Fuzzy TOPSIS supplier rankings
│       ├── dimensional_scores.csv # Per-dimension supplier scores
│       └── supplier_scores_for_rl.npy  # NumPy array used by RL agent
├── requirements.txt               # Pinned Python dependencies
├── Dockerfile                     # Container-based reproducibility
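├── .dockerignore                  # Docker build exclusions
├── .gitignore                     # Git exclusions for large outputs
├── run_all_experiments.sh         # Shell script to run full experiment suite
├── CITATION.cff                   # Citation metadata
├── CONTRIBUTING.md                # Contribution guidelines
├── LICENSE                        # CC BY-NC-ND 4.0 (pre-publication)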
└── README.md

Note: Trained model checkpoints (~160 MB) and generated figures are excluded from Git. Run the experiment scripts to reproduce them locally.

Core Features

SUPRA-PPO Algorithm: A PPO variant featuring:
- Adaptive Entropy Scheduling (ES) for dynamic exploration–exploitation balance
- Uncertainty-Driven Regularisation (UDR) for promoting robust policies
Supply Chain Environment: A gymnasium-compatible simulation with:
- Non-stationary demand (trend, seasonality, stochastic shocks)
- Lognormal lead times and systemic supplier disruptions
- Four market scenarios: Stable Operations, High Volatility, Systemic Shock, Misaligned Shock
MCDM Integration: Fuzzy Best-Worst Method (FBWM) + Fuzzy TOPSIS (FTOPSIS) to generate sustainability–resilience priors
Ablation Study: Three models (M1 = Base-Stock, M2 = Vanilla PPO, M3 = PPO + FTOPSIS priors) across 4 scenarios × 3 seeds (36 main runs; 54 in total including the 18 additional ablation runs)
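The environment features above can be sketched with the standard library's random module. The sketch covers demand with a linear trend, a sinusoidal season, Gaussian noise, and occasional additive shocks; lognormal lead times; and a systemic disruption event that hits all suppliers at once. Every parameter name and value below is hypothetical and chosen only for illustration. The actual SupplyChainEnv dynamics may differ.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class DemandParams:
    """Hypothetical demand-process parameters (not taken from the repository)."""
    base: float = 100.0        # mean demand at t = 0
    trend: float = 0.2         # linear drift per period
    season_amp: float = 15.0   # seasonal amplitude
    season_period: int = 52    # periods per seasonal cycle
    noise_sd: float = 10.0     # Gaussian noise standard deviation
    shock_prob: float = 0.02   # probability of a demand shock in a period
    shock_size: float = 60.0   # additive size of a shock


def sample_demand(t: int, p: DemandParams, rng: random.Random) -> float:
    """Non-stationary demand: trend + seasonality + noise + occasional shocks."""
    mean = p.base + p.trend * t + p.season_amp * math.sin(2 * math.pi * t / p.season_period)
    shock = p.shock_size if rng.random() < p.shock_prob else 0.0
    return max(0.0, rng.gauss(mean, p.noise_sd) + shock)  # demand cannot be negative


def sample_lead_time(median: float, sigma: float, rng: random.Random) -> int:
    """Lognormal lead time in whole periods; ln(L) ~ N(ln(median), sigma)."""
    return max(1, round(rng.lognormvariate(math.log(median), sigma)))


def sample_disruptions(n_suppliers: int, systemic_prob: float,
                       idiosyncratic_prob: float, rng: random.Random) -> List[bool]:
    """True = supplier disrupted this period; a systemic event hits every supplier."""
    if rng.random() < systemic_prob:
        return [True] * n_suppliers
    return [rng.random() < idiosyncratic_prob for _ in range(n_suppliers)]


rng = random.Random(42)
params = DemandParams()
demand = [sample_demand(t, params, rng) for t in range(104)]
lead_times = [sample_lead_time(7.0, 0.4, rng) for _ in range(5)]
disrupted = sample_disruptions(5, systemic_prob=0.05, idiosyncratic_prob=0.1, rng=rng)
```

Passing a seeded random.Random instance around, rather than using the module-level functions, keeps each run reproducible for a given seed.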

Experimental Design

Factor	Levels	Details
Models	3	M1: Base-Stock heuristic, M2: Vanilla PPO, M3: SUPRA-PPO (PPO + FTOPSIS priors)
Scenarios	4	Stable Operations, High Volatility, Systemic Shock, Misaligned Shock
Seeds	3	42, 123, 456 (for statistical robustness)
Timesteps	3M	Per model-scenario combination
Sensitivity	5	λ_sust ∈ {0.0, 0.1, 0.2, 0.3, 0.5}
Total runs	54	36 main (3 models × 4 scenarios × 3 seeds) + 18 ablation
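The λ_sust sensitivity levels imply a weighted sustainability term in the reward. The sketch below makes two assumptions: the term is an additive bonus equal to the allocation-weighted FTOPSIS score, and the per-period cost is already normalised to a comparable scale. The function name, the exact form, and the example scores are hypothetical.

```python
from typing import Sequence


def shaped_reward(cost: float, order_qty: Sequence[float],
                  scores: Sequence[float], lam_sust: float) -> float:
    """Hypothetical shaped reward: -cost + lam_sust * (allocation-weighted prior score)."""
    total = sum(order_qty)
    if total <= 0:
        prior_term = 0.0  # nothing ordered this period: no sustainability bonus
    else:
        # Share of the order going to each supplier, weighted by its FTOPSIS score
        prior_term = sum(q / total * s for q, s in zip(order_qty, scores))
    return -cost + lam_sust * prior_term


scores = [0.54, 0.25, 0.11]      # illustrative closeness coefficients, not the repository's scores
allocation = [60.0, 30.0, 10.0]  # units ordered from each supplier this period
sweep = {lam: shaped_reward(1.0, allocation, scores, lam) for lam in (0.0, 0.1, 0.2, 0.3, 0.5)}
```

With λ_sust = 0 the prior term vanishes and the reward is pure negative cost. Larger values increasingly favour high-score suppliers, even when they are more expensive.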

Results Preview

Scenario / Metric	M1 (Base-Stock)	M2 (Vanilla PPO)	M3 (SUPRA-PPO)
Stable — cost	Baseline	Improved	Best ($d=1.31$)
High Volatility — cost	Baseline	Best	Penalised by priors
Systemic Shock — sustainability	Low	Moderate	Highest ($d=0.71$)
Misaligned Shock — cost	Baseline	Moderate	Survives ($d=1.09$)
Misaligned Shock — sustainability	—	Best	Collapses ($d=2.07$)

Full quantitative results are available after running the experiments.
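The Results Preview reports values of $d$ without defining them. If $d$ is Cohen's d between two models' per-seed metrics (an assumption), it could be computed with a pooled sample standard deviation as in this sketch. The input numbers are illustrative only and are not results from the paper.

```python
import math
import statistics
from typing import Sequence


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cohen's d of a minus b, using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled


# Illustrative per-seed costs (seeds 42, 123, 456); NOT values from the paper
m2_cost = [105.0, 110.0, 100.0]
m3_cost = [98.0, 103.0, 93.0]
d = cohens_d(m2_cost, m3_cost)  # 1.4 for these illustrative numbers
```

With three seeds per model, each standard deviation rests on only three observations, so such effect sizes are sensitive to a single outlying seed.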

Reproducibility

- All configuration parameters are embedded in src/run_experiment.py; no external config files are needed.
- All four scenarios (Stable, High Volatility, Systemic Shock, Misaligned Shock), hyperparameters, and random seeds are explicitly declared in configuration dictionaries.
- The Misaligned Shock adversarial scenario is reproduced via src/run_misaligned_shock.py.
- Pre-computed MCDM scores are provided in results/mcdm/, so the RL experiments can run without recomputing them.
- Large outputs (trained models, generated figures, TensorBoard logs) are excluded from Git via .gitignore.
- To fully reproduce all 54 runs: install dependencies, run bash run_all_experiments.sh, then run python src/visualize_results.py.
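The run counts follow directly from the factor levels in the Experimental Design table. The sketch below enumerates the 36 main runs from a configuration dictionary. The dictionary layout and key names are hypothetical and do not reproduce the ones in src/run_experiment.py. The composition of the 18 ablation runs is not modelled here.

```python
from itertools import product

# Hypothetical configuration layout; the real dictionaries live in src/run_experiment.py
CONFIG = {
    "models": ["M1_base_stock", "M2_vanilla_ppo", "M3_supra_ppo"],
    "scenarios": ["stable", "high_volatility", "systemic_shock", "misaligned_shock"],
    "seeds": [42, 123, 456],
    "total_timesteps": 3_000_000,  # "3M per model-scenario combination" in the design table
}

# One (model, scenario, seed) tuple per main run
main_runs = list(product(CONFIG["models"], CONFIG["scenarios"], CONFIG["seeds"]))
ABLATION_RUNS = 18  # count stated in the README; their configuration is not shown here
total_runs = len(main_runs) + ABLATION_RUNS
```

Enumerating the grid with itertools.product makes the 3 × 4 × 3 = 36 count explicit.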

Citation

If you use this code in your research, please cite using the "Cite this repository" button on GitHub, or:

@article{vaezi2026suprappo,
  author    = {Vaezi, Ali and Rabbani, Erfan and Bruno, Giulia},
  title     = {Integrating Fuzzy {MCDM} Priors into Deep Reinforcement Learning
               for Sustainable and Resilient Supplier Selection and Dynamic Order Allocation},
  journal   = {International Journal of Production Economics},
  year      = {2026},
  note      = {Under review},
  url       = {https://github.com/aliivaezii/FBWM-FTOPSIS-PPO},
}

Contributing

Contributions are welcome. Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under CC BY-NC-ND 4.0 during the pre-publication period. You may view and share the code with attribution, but commercial use and derivative works are not permitted. After the associated paper is published, this repository will be re-licensed under the MIT License.
