MARL Evacuation Project

A multi-agent reinforcement learning (MARL) project implementing cooperative robot agents for emergency evacuation scenarios using Multi-Agent Proximal Policy Optimization (MAPPO). The project simulates a dynamic environment where autonomous robots must efficiently evacuate humans from a hazardous area with spreading fire and potential robot malfunctions.

Demo

Watch the trained MAPPO agents in action:

demos/MARL Best Model.mp4

The video shows trained robots (blue) autonomously navigating a grid environment to rescue humans (green) from spreading fire (red) and safely evacuate them through exits (yellow).

Overview

This project explores how multiple autonomous agents can learn to cooperate in high-stakes evacuation scenarios through deep reinforcement learning. The simulation features:

  • Dynamic Hazards: Fire that spreads probabilistically across the grid
  • Robot Malfunctions: Agents can randomly freeze and remain disabled for a fixed number of time steps
  • Complex Cooperation: Multiple robots must coordinate to maximize rescues
  • Distance-Aware Navigation: Agents use positional encoding to make informed decisions
  • Adaptive Learning: MAPPO algorithm enables efficient multi-agent training

Key Components

  • Environment: Custom Gymnasium environment with grid-based evacuation simulation
  • MARL Algorithm: MAPPO (Multi-Agent PPO) with shared critic and individual actors
  • Baseline Comparisons: A*, greedy, and random agents for performance benchmarking
  • Reward Shaping: Carefully designed reward structure to encourage rescue efficiency

Features

Environment Dynamics

  • Grid-based Navigation: Customizable maps loaded from CSV files
  • Entity Types: Walls, exits, humans, fire, and robots with distinct behaviors
  • Fire Spreading: Probabilistic fire propagation mechanics
  • Robot Freezing: Random malfunction events that temporarily disable agents
  • One-hot Observations: 8-channel observation space including entity positions and distance maps
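
For concreteness, the sketch below shows one way such an observation could be assembled. The channel ordering, helper names, and Manhattan-distance normalization are assumptions for illustration, not the exact layout used in evacuation_env.py.

import numpy as np

# Illustrative 8-channel observation: six one-hot entity channels plus two
# normalized distance maps. The real layout in evacuation_env.py may differ.
def build_observation(grid, robot_pos, exit_positions, grid_size=10):
    obs = np.zeros((8, grid_size, grid_size), dtype=np.float32)
    obs[0] = (grid == 1)   # walls
    obs[1] = (grid == 2)   # exits
    obs[2] = (grid == 3)   # humans
    obs[3] = (grid == 4)   # fire
    obs[4] = (grid == 5)   # robots
    obs[5] = (grid == 6)   # frozen robots
    ys, xs = np.indices((grid_size, grid_size))
    # Distance map to this robot, normalized by the largest possible distance.
    obs[6] = (np.abs(ys - robot_pos[0]) + np.abs(xs - robot_pos[1])) / (2 * grid_size)
    # Distance map to the nearest exit.
    exit_dists = [np.abs(ys - ey) + np.abs(xs - ex) for ey, ex in exit_positions]
    obs[7] = np.min(exit_dists, axis=0) / (2 * grid_size)
    return obs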

Training Features

  • Actor-Critic Architecture: Shared critic with individual actor policies
  • GAE (Generalized Advantage Estimation): For variance reduction (see the sketch after this list)
  • Entropy Regularization: Encourages exploration during training
  • Gradient Clipping: Ensures training stability
  • Checkpoint System: Save and resume training sessions
  • TensorBoard Integration: Real-time training metrics visualization
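
Advantage estimation lives in utils/rollout_buffer.py. The sketch below shows the standard GAE recursion with the GAMMA and LAMBDA defaults from config.py; the function and variable names are illustrative rather than the repository's actual API.

import numpy as np

# Standard GAE recursion over one rollout of per-step arrays.
# Names are illustrative; see utils/rollout_buffer.py for the real code.
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    advantages = np.zeros_like(rewards, dtype=np.float64)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        mask = 1.0 - dones[t]          # a terminal step cuts the bootstrap
        delta = rewards[t] + gamma * next_value * mask - values[t]
        gae = delta + gamma * lam * mask * gae
        advantages[t] = gae
        next_value = values[t]
    returns = advantages + values
    return advantages, returns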

Evaluation

  • Baseline Agents: Compare MAPPO against rule-based and heuristic approaches
  • Visualization Tools: Render evacuation episodes with Pygame
  • Performance Metrics: Track rescue rates, efficiency, and coordination
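
As a rough illustration of how a metric such as rescue rate can be gathered, the loop below runs a policy for several episodes and averages the fraction of humans saved. The environment and agent interfaces and the info keys are assumptions; the actual logic lives in training/eval_mappo.py and training/eval_baselines.py.

# Hypothetical evaluation loop; env/agent interfaces and info keys are assumed.
def evaluate(env, agents, num_episodes=100):
    rescue_rates = []
    for _ in range(num_episodes):
        obs, info = env.reset()
        done = False
        while not done:
            actions = {name: agent.act(obs[name]) for name, agent in agents.items()}
            obs, rewards, terminated, truncated, info = env.step(actions)
            done = terminated or truncated
        rescue_rates.append(info["humans_rescued"] / info["num_humans"])
    return sum(rescue_rates) / len(rescue_rates)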

Installation

Prerequisites

  • Python 3.8+
  • PyTorch 2.0+
  • Gymnasium
  • NumPy
  • Pygame

Setup

# Clone the repository
git clone https://github.com/yourusername/marl-evacuation-project.git
cd marl-evacuation-project

# Install dependencies
pip install torch gymnasium numpy pygame tensorboard tqdm

# (Optional) Verify installation
python main.py

Usage

Training MAPPO Agents

python training/train_mappo.py

Configure training parameters in config.py:

  • NUM_EPISODES: Total training episodes (default: 1,000,000)
  • LR: Learning rate (default: 0.0001)
  • GAMMA: Discount factor (default: 0.99)
  • ENTROPY_COEF: Exploration coefficient (default: 0.01)

Evaluating Trained Agents

# Evaluate MAPPO
python training/eval_mappo.py

# Evaluate baselines
python training/eval_baselines.py

Running Simulations

# Run with trained MAPPO model
python main.py

Set DISPLAY = True in config.py to visualize the simulation in real time with Pygame.

Project Structure

marl-evacuation-project/
│
├── baselines/
│   ├── astar_baseline.py       # Rule-based A* pathfinding agent
│   ├── greedy_baseline.py      # Greedy heuristic agent
│   └── random_baseline.py      # Random-action agent
│
├── demos/
│   └── MARL Best Model.mp4     # Video demonstration of trained agents
│
├── environment/
│   ├── maps/                   # Predefined map layouts
│   │   ├── map1.csv           # Simple 10x10 grid
│   │   ├── map2.csv           # Medium complexity
│   │   └── map3.csv           # Complex environment
│   └── evacuation_env.py       # Gymnasium environment implementation
│
├── mappo_core/
│   ├── actor_critic.py         # Actor-Critic neural network model
│   └── mappo_trainer.py        # MAPPO training algorithm
│
├── results/                    # Generated by utils/visualization.py
│   ├── baseline/               # Baseline agent results
│   └── mappo/                  # MAPPO agent results
│
├── training/
│   ├── eval_baselines.py       # Baseline evaluation script
│   ├── eval_mappo.py           # MAPPO evaluation script
│   └── train_mappo.py          # MAPPO training script
│
├── utils/
│   ├── rollout_buffer.py       # GAE + rollout buffer implementation
│   └── visualization.py        # Rendering and visualization utilities
│
├── checkpoints/                # Saved model checkpoints
├── .gitignore                  # Git ignore rules
├── config.py                   # Hyperparameters and environment configuration
├── main.py                     # Entry point for running simulations
└── README.md                   # This file

Map Legend

Symbol   Description
0        Empty space
1        Wall (obstacle)
2        Exit
3        Human (randomized placement)
4        Fire (randomized placement with spreading)
5        Robot
6        Frozen Robot (malfunctioned)
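
For illustration, a small hand-written 6x6 map using these codes could look like the CSV below. This is an invented layout, not one of the bundled maps, and depending on how randomized placement is handled the human, fire, and robot codes may be assigned by the environment at reset rather than written in the file.

1,1,1,1,1,1
1,0,3,0,0,1
1,0,1,0,4,1
1,5,0,3,0,1
1,0,0,0,5,1
1,1,2,2,1,1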

Configuration

Key parameters in config.py:

Environment Settings

  • GRID_SIZE: Default grid dimensions (10x10)
  • MAP_FILE: CSV file defining the map layout
  • NUM_HUMANS: Number of humans to rescue (default: 5)
  • NUM_ROBOTS: Number of robot agents (default: 5)
  • P_FIRE: Fire spreading probability (default: 0.3)
  • P_FREEZE: Robot freeze probability (default: 0.05)
  • FREEZE_DURATION: Freeze duration in steps (default: 3)

Reward Structure

  • HUMAN_RESCUE_REWARD: +1000 for successful rescue
  • HUMAN_PICKUP_REWARD: +200 for picking up a human
  • FIRE_PENALTY: -100 when robot touches fire
  • HUMAN_FIRE_PENALTY: -200 when human touches fire
  • EARLY_EXIT_PENALTY: -1000 for premature exit
  • MOVEMENT_REWARD: +0.1 for distance-reducing movement
  • HUMAN_DISTANCE_PENALTY: -0.5 for being far from humans
  • EXIT_DISTANCE_PENALTY: -0.03 for being far from exit (when carrying)

Training Parameters

  • MAX_ENV_STEPS: 40 steps per episode
  • NUM_EPISODES: 1,000,000 training episodes
  • LR: 0.0001 learning rate
  • GAMMA: 0.99 discount factor
  • LAMBDA: 0.95 GAE parameter
  • CLIP_PARAM: 0.2 PPO clipping parameter
  • ENTROPY_COEF: 0.01 entropy coefficient
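
Put together, these defaults correspond to a config.py along the following lines. This is a sketch assembled from the values listed in this section; the MAP_FILE path and the DISPLAY default are assumptions, and the real file may define additional options.

# config.py (sketch of the defaults listed above)

# Environment
GRID_SIZE = 10
MAP_FILE = "environment/maps/map1.csv"   # assumed default path
NUM_HUMANS = 5
NUM_ROBOTS = 5
P_FIRE = 0.3
P_FREEZE = 0.05
FREEZE_DURATION = 3
DISPLAY = False                          # set True to render with Pygame

# Rewards
HUMAN_RESCUE_REWARD = 1000
HUMAN_PICKUP_REWARD = 200
FIRE_PENALTY = -100
HUMAN_FIRE_PENALTY = -200
EARLY_EXIT_PENALTY = -1000
MOVEMENT_REWARD = 0.1
HUMAN_DISTANCE_PENALTY = -0.5
EXIT_DISTANCE_PENALTY = -0.03

# Training
MAX_ENV_STEPS = 40
NUM_EPISODES = 1_000_000
LR = 1e-4
GAMMA = 0.99
LAMBDA = 0.95
CLIP_PARAM = 0.2
ENTROPY_COEF = 0.01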

Algorithm: MAPPO

Multi-Agent Proximal Policy Optimization (MAPPO) is a state-of-the-art MARL algorithm that extends PPO to multi-agent settings:

  1. Shared Critic: All agents share a centralized value function for better credit assignment
  2. Individual Actors: Each agent has its own policy network for decentralized execution
  3. Centralized Training, Decentralized Execution (CTDE): Leverages global information during training while each agent keeps an independent policy at execution time
  4. PPO Updates: Clipped surrogate objective for stable policy updates
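
The clipped surrogate objective in step 4 limits how far a single update can move each action probability. Below is a minimal PyTorch sketch of the per-actor policy loss, using the CLIP_PARAM and ENTROPY_COEF defaults from config.py; tensor names are illustrative, not the trainer's actual variables.

import torch

# PPO clipped surrogate loss for one actor update. All tensors have shape (batch,).
def ppo_policy_loss(new_log_probs, old_log_probs, advantages, entropy,
                    clip_param=0.2, entropy_coef=0.01):
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * advantages
    # Pessimistic (minimum) objective, negated to form a loss, minus an
    # entropy bonus that keeps exploration alive during training.
    policy_loss = -torch.min(unclipped, clipped).mean()
    return policy_loss - entropy_coef * entropy.mean()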

Architecture

  • Actor Network: CNN encoder + MLP head for action logits
  • Critic Network: CNN encoder + MLP head for value estimation
  • Input: 8-channel one-hot observations + 2D positional encoding
  • Output: 5-dimensional action space (noop, up, down, left, right)
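
A minimal PyTorch sketch of an actor built this way is shown below. The layer sizes, the 10-channel input (8 one-hot channels plus 2 positional-encoding channels), and the class name are assumptions for illustration; the project's actual model is defined in mappo_core/actor_critic.py.

import torch.nn as nn

# Illustrative actor: CNN encoder over the grid observation, MLP head for
# logits over the 5 actions (noop, up, down, left, right). Sizes are assumed.
class Actor(nn.Module):
    def __init__(self, in_channels=10, grid_size=10, num_actions=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * grid_size * grid_size, 256),
            nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, obs):
        return self.head(self.encoder(obs))  # action logits

# A matching critic would reuse the same encoder structure but end in
# nn.Linear(256, 1); under CTDE it can consume a centralized observation.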

Performance

The trained MAPPO agents demonstrate:

  • Coordinated rescue operations
  • Adaptive fire avoidance
  • Efficient exit utilization
  • Higher rescue rates and efficiency than the A*, greedy, and random baselines

License

This project is open-source and available under the MIT License.

Acknowledgments

Built using PyTorch, Gymnasium, NumPy, Pygame, TensorBoard, and tqdm.
