A multi-agent reinforcement learning (MARL) project implementing cooperative robot agents for emergency evacuation scenarios using Multi-Agent Proximal Policy Optimization (MAPPO). The project simulates a dynamic environment where autonomous robots must efficiently evacuate humans from a hazardous area with spreading fire and potential robot malfunctions.
Watch the trained MAPPO agents in action:
MARL.Best.Model.mp4
The video shows trained robots (blue) autonomously navigating a grid environment to rescue humans (green) from spreading fire (red) and safely evacuate them through exits (yellow).
This project explores how multiple autonomous agents can learn to cooperate in high-stakes evacuation scenarios through deep reinforcement learning. The simulation features:
- Dynamic Hazards: Fire that spreads probabilistically across the grid
- Robot Malfunctions: Agents can randomly freeze for periods of time
- Complex Cooperation: Multiple robots must coordinate to maximize rescues
- Distance-Aware Navigation: Agents use positional encoding to make informed decisions
- Adaptive Learning: MAPPO algorithm enables efficient multi-agent training
- Environment: Custom Gymnasium environment with grid-based evacuation simulation
- MARL Algorithm: MAPPO (Multi-Agent PPO) with shared critic and individual actors
- Baseline Comparisons: A*, greedy, and random agents for performance benchmarking
- Reward Shaping: Carefully designed reward structure to encourage rescue efficiency
- Grid-based Navigation: Customizable maps loaded from CSV files
- Entity Types: Walls, exits, humans, fire, and robots with distinct behaviors
- Fire Spreading: Probabilistic fire propagation mechanics
- Robot Freezing: Random malfunction events that temporarily disable agents
- One-hot Observations: 8-channel observation space including entity positions and distance maps
- Actor-Critic Architecture: Shared critic with individual actor policies
- GAE (Generalized Advantage Estimation): For variance reduction (see the sketch after this list)
- Entropy Regularization: Encourages exploration during training
- Gradient Clipping: Ensures training stability
- Checkpoint System: Save and resume training sessions
- TensorBoard Integration: Real-time training metrics visualization
- Baseline Agents: Compare MAPPO against rule-based and heuristic approaches
- Visualization Tools: Render evacuation episodes with Pygame
- Performance Metrics: Track rescue rates, efficiency, and coordination
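The project's GAE implementation lives in utils/rollout_buffer.py; the snippet below is only a minimal sketch of how GAE advantages and returns are typically computed from a single agent's rollout, using hypothetical reward/value/done arrays and the GAMMA and LAMBDA defaults listed later in this README.

```python
import numpy as np

def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout (illustrative sketch).

    rewards, values, dones are 1-D arrays of length T for a single agent;
    last_value bootstraps the value of the state after the final step.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        next_non_terminal = 1.0 - dones[t]
        # TD residual: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * next_non_terminal - values[t]
        # Exponentially weighted sum of residuals (the GAE recursion)
        gae = delta + gamma * lam * next_non_terminal * gae
        advantages[t] = gae
    returns = advantages + values  # regression targets for the critic
    return advantages, returns
```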
- Python 3.8+
- PyTorch 2.0+
- Gymnasium
- NumPy
- Pygame
# Clone the repository
git clone https://github.com/yourusername/marl-evacuation-project.git
cd marl-evacuation-project
# Install dependencies
pip install torch gymnasium numpy pygame tensorboard tqdm
# (Optional) Verify installation
python main.py

# Train the MAPPO agents
python training/train_mappo.py

Configure training parameters in config.py:
- NUM_EPISODES: Total training episodes (default: 1,000,000)
- LR: Learning rate (default: 0.0001)
- GAMMA: Discount factor (default: 0.99)
- ENTROPY_COEF: Exploration coefficient (default: 0.01)
# Evaluate MAPPO
python training/eval_mappo.py
# Evaluate baselines
python training/eval_baselines.py

# Run with trained MAPPO model
python main.py

Set DISPLAY = True in config.py to visualize the simulation in real time with Pygame.
marl-evacuation-project/
│
├── baselines/
│ ├── astar_baseline.py # Rule-based A* pathfinding agent
│ ├── greedy_baseline.py # Greedy heuristic agent
│ └── random_baseline.py # Random-action agent
│
├── demos/
│ └── MARL Best Model.mp4 # Video demonstration of trained agents
│
├── environment/
│ ├── maps/ # Predefined map layouts
│ │ ├── map1.csv # Simple 10x10 grid
│ │ ├── map2.csv # Medium complexity
│ │ └── map3.csv # Complex environment
│ └── evacuation_env.py # Gymnasium environment implementation
│
├── mappo_core/
│ ├── actor_critic.py # Actor-Critic neural network model
│ └── mappo_trainer.py # MAPPO training algorithm
│
├── results/ # Generated by utils/visualization.py
│ ├── baseline/ # Baseline agent results
│ └── mappo/ # MAPPO agent results
│
├── training/
│ ├── eval_baselines.py # Baseline evaluation script
│ ├── eval_mappo.py # MAPPO evaluation script
│ └── train_mappo.py # MAPPO training script
│
├── utils/
│ ├── rollout_buffer.py # GAE + rollout buffer implementation
│ └── visualization.py # Rendering and visualization utilities
│
├── checkpoints/ # Saved model checkpoints
├── .gitignore # Git ignore rules
├── config.py # Hyperparameters and environment configuration
├── main.py # Entry point for running simulations
└── README.md # This file
| Symbol | Description |
|---|---|
| 0 | Empty space |
| 1 | Wall (obstacle) |
| 2 | Exit |
| 3 | Human (randomized placement) |
| 4 | Fire (randomized placement with spreading) |
| 5 | Robot |
| 6 | Frozen Robot (malfunctioned) |
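As an illustration of this encoding, a tiny hypothetical map could be written and loaded as below. The real maps live in environment/maps/, and the actual loading logic is part of evacuation_env.py, so treat this only as a sketch of the symbol format.

```python
import io
import numpy as np

# Hypothetical 5x5 map: walls around the border, an exit on the right edge,
# one human (3), one fire cell (4), and one robot spawn (5), per the table above.
MAP_CSV = """\
1,1,1,1,1
1,0,3,0,2
1,0,4,0,1
1,5,0,0,1
1,1,1,1,1
"""

grid = np.loadtxt(io.StringIO(MAP_CSV), delimiter=",", dtype=int)
print(grid.shape)  # (5, 5)
```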
Key parameters in config.py:
- GRID_SIZE: Default grid dimensions (10x10)
- MAP_FILE: CSV file defining the map layout
- NUM_HUMANS: Number of humans to rescue (default: 5)
- NUM_ROBOTS: Number of robot agents (default: 5)
- P_FIRE: Fire spreading probability (default: 0.3)
- P_FREEZE: Robot freeze probability (default: 0.05)
- FREEZE_DURATION: Freeze duration in steps (default: 3)
- HUMAN_RESCUE_REWARD: +1000 for a successful rescue
- HUMAN_PICKUP_REWARD: +200 for picking up a human
- FIRE_PENALTY: -100 when a robot touches fire
- HUMAN_FIRE_PENALTY: -200 when a human touches fire
- EARLY_EXIT_PENALTY: -1000 for a premature exit
- MOVEMENT_REWARD: +0.1 for distance-reducing movement
- HUMAN_DISTANCE_PENALTY: -0.5 for being far from humans
- EXIT_DISTANCE_PENALTY: -0.03 for being far from an exit (when carrying a human)
- MAX_ENV_STEPS: 40 steps per episode
- NUM_EPISODES: 1,000,000 training episodes
- LR: 0.0001 learning rate
- GAMMA: 0.99 discount factor
- LAMBDA: 0.95 GAE parameter
- CLIP_PARAM: 0.2 PPO clipping parameter
- ENTROPY_COEF: 0.01 entropy coefficient
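Taken together, these settings suggest a config.py along the lines of the sketch below. The names and values match the defaults listed above; the grouping and comments are illustrative, not the project's exact file.

```python
# config.py (illustrative layout; values taken from the defaults listed above)

# Environment
GRID_SIZE = 10
MAP_FILE = "environment/maps/map1.csv"
NUM_HUMANS = 5
NUM_ROBOTS = 5
P_FIRE = 0.3            # probability that fire spreads to a neighboring cell
P_FREEZE = 0.05         # per-step probability that a robot malfunctions
FREEZE_DURATION = 3     # steps a frozen robot stays disabled
DISPLAY = False         # set to True to render with Pygame

# Rewards
HUMAN_RESCUE_REWARD = 1000
HUMAN_PICKUP_REWARD = 200
FIRE_PENALTY = -100
HUMAN_FIRE_PENALTY = -200
EARLY_EXIT_PENALTY = -1000
MOVEMENT_REWARD = 0.1
HUMAN_DISTANCE_PENALTY = -0.5
EXIT_DISTANCE_PENALTY = -0.03

# Training
MAX_ENV_STEPS = 40
NUM_EPISODES = 1_000_000
LR = 1e-4
GAMMA = 0.99
LAMBDA = 0.95
CLIP_PARAM = 0.2
ENTROPY_COEF = 0.01
```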
Multi-Agent Proximal Policy Optimization (MAPPO) is a state-of-the-art MARL algorithm that extends PPO to multi-agent settings:
- Shared Critic: All agents share a centralized value function for better credit assignment
- Individual Actors: Each agent has its own policy network for decentralized execution
- Centralized Training, Decentralized Execution (CTDE): Leverages global information during training while maintaining independent policies during execution
- PPO Updates: Clipped surrogate objective for stable policy updates
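The project's actual update logic lives in mappo_core/mappo_trainer.py and may differ; the sketch below only illustrates how a single MAPPO minibatch step can combine a per-agent clipped policy loss, a shared-critic value loss, an entropy bonus, and gradient clipping. The batch layout and the assumption that the actor returns a Categorical distribution are illustrative.

```python
import torch
import torch.nn.functional as F

def mappo_update(actor, critic, optimizer, batch,
                 clip_param=0.2, entropy_coef=0.01,
                 value_coef=0.5, max_grad_norm=0.5):
    """One illustrative minibatch update for one agent's actor plus the shared critic."""
    obs, global_state, actions, old_log_probs, advantages, returns = batch

    # Decentralized actor: fresh log-probs and entropy from local observations
    dist = actor(obs)                      # assumed to return torch.distributions.Categorical
    log_probs = dist.log_prob(actions)
    entropy = dist.entropy().mean()

    # Clipped surrogate objective (PPO)
    ratio = torch.exp(log_probs - old_log_probs)
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * advantages
    policy_loss = -torch.min(surr1, surr2).mean()

    # Centralized critic: value loss against GAE returns
    values = critic(global_state).squeeze(-1)
    value_loss = F.mse_loss(values, returns)

    loss = policy_loss + value_coef * value_loss - entropy_coef * entropy

    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(
        list(actor.parameters()) + list(critic.parameters()), max_grad_norm)
    optimizer.step()
    return loss.item()
```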
- Actor Network: CNN encoder + MLP head for action logits
- Critic Network: CNN encoder + MLP head for value estimation
- Input: 8-channel one-hot observations + 2D positional encoding
- Output: 5-dimensional action space (noop, up, down, left, right)
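This description maps onto a fairly standard PyTorch module. The sketch below is an assumption about the shape of mappo_core/actor_critic.py, not its exact architecture; in particular, the in_channels value assumes the 8 one-hot channels and the 2 positional-encoding channels are stacked into one tensor.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Illustrative actor: CNN encoder over the grid observation + MLP head for action logits."""

    def __init__(self, grid_size: int = 10, in_channels: int = 10, num_actions: int = 5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * grid_size * grid_size, 256),
            nn.ReLU(),
            nn.Linear(256, num_actions),  # logits for noop, up, down, left, right
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        logits = self.head(self.encoder(obs))
        return torch.distributions.Categorical(logits=logits)

# The shared critic would reuse the same encoder pattern with a single value output,
# typically over a centralized global state rather than one agent's local observation.
```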
The trained MAPPO agents demonstrate:
- Coordinated rescue operations
- Adaptive fire avoidance
- Efficient exit utilization
- Superior performance vs. rule-based baselines
This project is open-source and available under the MIT License.
Built using PyTorch, Gymnasium, NumPy, Pygame, and TensorBoard.