
Mechanism Design (Scotland Yard): Multi-Agent Reinforcement Learning (TorchRL)

CI Docker codecov License: CC BY-NC-ND 4.0

Quick Start

Prerequisites

  • Docker (recommended) or Python 3.10+
  • CUDA-capable GPU (optional, for faster training)

Installation & Running

# Clone the repository
git clone https://github.com/elte-collective-intelligence/student-mechanism-design.git
cd student-mechanism-design

# Build Docker images
docker build --progress plain -f ./docker/BaseDockerfile -t student_mechanism_design_base .
docker build --progress plain -f ./docker/Dockerfile -t student_mechanism_design .

# Run training experiment
docker run --rm --gpus=all --mount type=bind,src=$PWD,dst=/app student_mechanism_design all

# Run unit tests
docker run --rm --mount type=bind,src=$PWD,dst=/app student_mechanism_design --unit_test

# Run ablation studies
docker run --rm --mount type=bind,src=$PWD,dst=/app student_mechanism_design python src/eval/run_ablations.py --ablation all

Local Development (without Docker)

pip install -r requirements.txt
cd src
python main.py all --agent_configs=mappo --log_configs=verbose

Project Overview

This project implements a mechanism design approach for the Scotland Yard pursuit-evasion game using multi-agent reinforcement learning. Key features include:

  • Partial Observability: MrX is hidden from police with configurable reveal schedules
  • Belief Tracking: Particle filter and learned belief encoders for police
  • Mechanism Design: Configurable tolls, budgets, and reveal policies
  • Meta-Learning: Automatic tuning of mechanism parameters toward 50% win rate
  • Population-Based Self-Play: Policy pools with ELO-style scoring
  • MAPPO & GNN Agents: State-of-the-art multi-agent RL algorithms

Experiment Matrix

| Experiment | Agents | Graph Size | Budget | Reveal | Description |
|---|---|---|---|---|---|
| smoke_train | 2 | 15 nodes | 10 | R=5 | Quick sanity check |
| singular | 2-3 | 15 nodes | 8-12 | R=5 | Single config training |
| all | 2-6 | 15-20 nodes | 4-18 | R=5 | Full sweep |
| big_graph | 3-4 | 25+ nodes | 10-15 | R=5 | Large graph evaluation |
| test | 2 | 12 nodes | 10 | R=5 | Development testing |

Running Experiments

# Run specific experiment
docker run --rm --gpus=all --mount type=bind,src=$PWD,dst=/app student_mechanism_design <experiment_name>

# Examples:
docker run ... student_mechanism_design smoke_train
docker run ... student_mechanism_design all
docker run ... student_mechanism_design big_graph

Environment Specification

Observation Space

Each agent receives:

| Field | Type | Description |
|---|---|---|
| adjacency_matrix | NxN float | Binary graph connectivity |
| node_features | NxK float | Agent positions encoded as one-hot |
| edge_index | 2xE int | Edge list for GNN |
| edge_features | E float | Edge weights/costs |
| action_mask | N bool | Valid actions (fixed index→node mapping) |
| valid_actions | list[int] | Affordable neighbor nodes |
| belief_map | N float | MrX location distribution (Police only) |
| agent_position | int | Current node |
| agent_budget | float | Remaining money |
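For orientation, a single agent's observation might look like the following. This is a hypothetical sketch using NumPy with field names taken from the table above; the shapes assume a small 5-node graph, and the exact dtypes in the environment may differ.

```python
import numpy as np

num_nodes = 5  # small illustrative graph

# Hypothetical observation for one police agent (field names follow the table above)
observation = {
    "adjacency_matrix": np.zeros((num_nodes, num_nodes), dtype=np.float32),  # NxN connectivity
    "node_features":    np.zeros((num_nodes, 3), dtype=np.float32),          # NxK one-hot positions (K=3 here)
    "edge_index":       np.array([[0, 1, 2], [1, 2, 0]], dtype=np.int64),    # 2xE edge list for the GNN
    "edge_features":    np.ones(3, dtype=np.float32),                        # per-edge cost
    "action_mask":      np.zeros(num_nodes, dtype=bool),                     # valid moves (index i -> node i)
    "valid_actions":    [1, 2],                                              # affordable neighbour nodes
    "belief_map":       np.full(num_nodes, 1.0 / num_nodes),                 # uniform belief over MrX location
    "agent_position":   0,
    "agent_budget":     10.0,
}
```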

Action Space

  • Type: Discrete(N) where N = number of nodes
  • Masking: Actions masked by budget and topology
  • Mapping: Fixed identity mapping (action i → node i)

Action Mask Implementation

# Fixed index→node mapping ensures consistency
mask[node] = adjacent[current, node] and (cost <= budget)
index_to_node = {i: i for i in range(num_nodes)}  # Identity mapping
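For a fuller picture, a self-contained sketch of how such a mask could be built for one agent is shown below. This is illustrative only (the helper name and arguments are hypothetical, not the API of src/Enviroment/action_mask.py), assuming a dense adjacency matrix and a matching matrix of per-edge costs.

```python
import numpy as np

def compute_action_mask(adjacency, edge_cost, current, budget):
    """Boolean mask over nodes: True where the move is adjacent to `current` and affordable."""
    num_nodes = adjacency.shape[0]
    mask = np.zeros(num_nodes, dtype=bool)
    for node in range(num_nodes):
        if adjacency[current, node] and edge_cost[current, node] <= budget:
            mask[node] = True
    # Fixed identity mapping: action index i always refers to node i
    index_to_node = {i: i for i in range(num_nodes)}
    return mask, index_to_node
```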

Mechanism Parameters

| Parameter | Config Key | Default | Description |
|---|---|---|---|
| Police Budget | police_budget | 10 | Initial money for police |
| Reveal Interval | reveal_interval | 5 | Steps between MrX reveals |
| Reveal Probability | reveal_probability | 0.0 | Stochastic reveal chance |
| Toll | tolls | 0.0 | Per-edge movement cost |
| Ticket Price | ticket_price | 1.0 | Base movement cost |
| Target Win Rate | target_win_rate | 0.5 | Meta-learning objective |
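The same parameters are grouped in a dataclass in src/mechanism/mechanism_config.py. Below is a minimal sketch of what such a container could look like, with field names taken from the table; the repo's actual class may carry more fields.

```python
from dataclasses import dataclass

@dataclass
class MechanismConfig:
    """Illustrative sketch mirroring the mechanism parameters above."""
    police_budget: float = 10.0
    reveal_interval: int = 5
    reveal_probability: float = 0.0
    tolls: float = 0.0
    ticket_price: float = 1.0
    target_win_rate: float = 0.5
```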

Metrics (3 Required)

We report the following three metrics as required by the assignment:

📊 Metric 1: Balance (Win Rate)

Definition: Fraction of episodes won by MrX

Win Rate = MrX Wins / Total Episodes
Target: 0.50 ± 0.05

Implementation: src/eval/metrics.py::compute_win_rate()

Why this metric: Measures game balance, the primary goal of mechanism design. A win rate of 50% indicates fair gameplay where neither side has a systematic advantage.
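As a rough illustration of the formula above (a sketch only; the project's implementation lives in src/eval/metrics.py::compute_win_rate and may use a different signature):

```python
def compute_win_rate(episode_winners):
    """Fraction of episodes won by MrX, given a list of winner labels ('mrx' or 'police')."""
    if not episode_winners:
        return 0.0
    return sum(1 for winner in episode_winners if winner == "mrx") / len(episode_winners)
```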

📊 Metric 2: Belief Quality (Cross-Entropy)

Definition: Cross-entropy between police belief distribution and true MrX position at reveal times.

CE = -log(belief[true_mrx_position])
Lower is better (more accurate belief)

Implementation: src/eval/metrics.py::belief_cross_entropy()

Why this metric: Measures how well police can track MrX under partial observability. Lower cross-entropy means the belief distribution assigns higher probability to MrX's true location.
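A minimal sketch of the computation (illustrative; the repo's version is src/eval/metrics.py::belief_cross_entropy):

```python
import numpy as np

def belief_cross_entropy(belief, true_mrx_position, eps=1e-12):
    """Cross-entropy of the police belief at a reveal step: -log p(true MrX node)."""
    belief = np.asarray(belief, dtype=np.float64)
    belief = belief / belief.sum()  # ensure a normalized distribution
    return float(-np.log(belief[true_mrx_position] + eps))
```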

📊 Metric 3: Time-to-Catch / Survival Time

Definition: Average episode length, split by winner.

  • Time-to-Catch: Mean steps when Police wins
  • Survival Time: Mean steps when MrX wins

Implementation: src/eval/metrics.py::compute_time_metrics()

Why this metric: Captures game dynamics. Shorter catch times indicate effective police coordination, while longer survival times indicate successful evasion strategies.
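A sketch of splitting episode lengths by winner (hypothetical signature; the actual code is src/eval/metrics.py::compute_time_metrics):

```python
import numpy as np

def compute_time_metrics(episode_lengths, winners):
    """Mean episode length split by winner; returns NaN for a side that never won."""
    lengths = np.asarray(episode_lengths, dtype=np.float64)
    winners = np.asarray(winners)
    police_won = winners == "police"
    mrx_won = winners == "mrx"
    return {
        "time_to_catch": lengths[police_won].mean() if police_won.any() else float("nan"),
        "survival_time": lengths[mrx_won].mean() if mrx_won.any() else float("nan"),
    }
```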


Ablation Studies

Ablation 1: Belief Tracking

Config: src/configs/ablation/belief.yaml

Compares belief tracking methods under partial observability:

| Variant | Reveal | Belief Method | Expected Effect |
|---|---|---|---|
| no_belief | R=0 | None | Police severely disadvantaged |
| particle_filter | R=5 | Particle Filter | Baseline tracking |
| learned_encoder | R=5 | Neural Encoder | Potentially better generalization |

Run:

python src/eval/run_ablations.py --ablation belief --num_episodes 100 --seeds 42 123 456

Expected Results:

  • no_belief: MrX win rate ~70-80% (Police cannot track)
  • particle_filter: MrX win rate ~50-55% (Baseline)
  • learned_encoder: MrX win rate ~45-55% (Comparable or better)
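To make the particle_filter variant concrete, a simplified belief update could look like the sketch below: particles diffuse one step along the graph between reveals and collapse onto MrX's true node at a reveal. This is a hypothetical illustration, not the logic of ParticleBeliefTracker in src/Enviroment/belief_module.py.

```python
import numpy as np

def particle_belief_step(particles, adjacency, rng, revealed_position=None):
    """Advance particles one step; return (particles, belief_map over nodes)."""
    if revealed_position is not None:
        # Reveal: all probability mass collapses onto MrX's true position
        particles = np.full_like(particles, revealed_position)
    else:
        # Diffusion: each particle moves to a uniformly random neighbour
        moved = []
        for node in particles:
            neighbours = np.flatnonzero(adjacency[node])
            moved.append(rng.choice(neighbours) if len(neighbours) else node)
        particles = np.asarray(moved)
    # Belief map = normalized histogram of particle positions
    belief = np.bincount(particles, minlength=adjacency.shape[0]).astype(np.float64)
    return particles, belief / belief.sum()
```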

Ablation 2: Mechanism Design

Config: src/configs/ablation/mechanism.yaml

Compares mechanism configurations:

| Variant | Tolls | Budget | Reveal | Expected Win Rate |
|---|---|---|---|---|
| no_mechanism | 0 | ∞ | R=0 | ~70% MrX (unbalanced) |
| fixed_mechanism | 1.0 | 15 | R=5 | ~45% MrX (hand-tuned) |
| meta_learned | learned | learned | learned | ~50% MrX (target) |

Run:

python src/eval/run_ablations.py --ablation mechanism --num_episodes 100 --seeds 42 123 456

Expected Results:

  • no_mechanism: Demonstrates need for mechanism design
  • fixed_mechanism: Shows improvement over baseline
  • meta_learned: Achieves target balance through optimization

Running All Ablations

python src/eval/run_ablations.py --ablation all --num_episodes 100 --output_dir logs/ablations

Ablation Results Location

Results are saved to logs/ablations/:

  • belief_results.json: Raw metrics data
  • belief_report.txt: Formatted comparison report
  • mechanism_results.json: Raw metrics data
  • mechanism_report.txt: Formatted comparison report

Failure Analysis

Known Limitations

  1. Belief Collapse: Particle filter can collapse to incorrect modes when reveals are sparse (R > 10)

    • Mitigation: Noise injection, increased particle count, or use learned encoder
  2. Budget Exhaustion: Police may run out of budget before catching MrX on large graphs

    • Mitigation: Meta-learning adjusts the budget based on the observed win rate (see the sketch after this list)
  3. Graph Topology Sensitivity: Performance varies significantly with graph structure (degree distribution, diameter)

    • Mitigation: Curriculum learning over diverse graph distributions
  4. Action Mask Edge Cases: When no moves are affordable, the agent stays in place

    • Handled: Environment returns current position as default action
  5. Reward Hacking: Agents may exploit reward shaping rather than achieving true objectives

    • Mitigation: Use terminal rewards primarily, validate with win rate metric
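The budget adjustment mentioned in point 2 could, in spirit, follow a simple proportional rule like the one sketched below. This is a hypothetical illustration; the actual optimization lives in src/mechanism/meta_learning_loop.py and may be considerably more involved.

```python
def adjust_police_budget(current_budget, observed_win_rate, target_win_rate=0.5, step=2.0):
    """Nudge the police budget toward the target MrX win rate.

    If MrX wins too often (observed > target) the police are too weak, so the budget
    is increased; if MrX wins too rarely it is decreased. `step` controls the update size.
    """
    error = observed_win_rate - target_win_rate  # positive => MrX currently too strong
    return max(0.0, current_budget + step * error)
```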

Debugging Tips

# Enable verbose logging
docker run ... student_mechanism_design all --log_configs=verbose

# Visualize episodes (generates GIFs)
docker run ... student_mechanism_design smoke_train --vis_configs=full

# Run unit tests to verify components
docker run ... student_mechanism_design --unit_test

# Check specific test
pytest test/test_action_mask.py -v

Code Structure

src/
├── main.py                     # Training entry point
├── logger.py                   # Logging utilities (WandB, TensorBoard)
├── reward_net.py               # RewardWeightNet for meta-learning
├── configs/
│   ├── ablation/
│   │   ├── belief.yaml         # Belief ablation variants
│   │   └── mechanism.yaml      # Mechanism ablation variants
│   ├── agent/                  # Agent configurations
│   ├── mechanism/default.yaml  # Mechanism parameters
│   └── ...
├── Enviroment/
│   ├── yard.py                 # Main environment (CustomEnvironment)
│   ├── action_mask.py          # Action masking with fixed index→node mapping
│   ├── belief_module.py        # ParticleBeliefTracker, LearnedBeliefEncoder
│   ├── partial_obs.py          # PartialObservationWrapper
│   ├── graph_generator.py      # GraphGenerator with seed saving
│   └── graph_layout.py         # ConnectedGraph sampling
├── RLAgent/
│   ├── mappo_agent.py          # MAPPO implementation
│   ├── gnn_agent.py            # GNN-based DQN agent
│   ├── random_agent.py         # Random baseline
│   └── base_agent.py           # Abstract base class
├── selfplay/
│   ├── population_manager.py   # Population-based training with ELO
│   ├── opponent_modeling.py    # Opponent behavior modeling
│   └── best_response.py        # Best response utilities
├── mechanism/
│   ├── mechanism_config.py     # MechanismConfig dataclass
│   ├── meta_learning_loop.py   # MetaLearner for mechanism optimization
│   └── reward_weight_integration.py
├── eval/
│   ├── metrics.py              # Core metrics (win rate, belief CE, time)
│   ├── run_ablations.py        # Ablation study runner
│   ├── ood_eval.py             # OOD & robustness evaluation
│   ├── belief_quality.py       # Belief cross-entropy
│   └── exploitability.py       # Exploitability proxy
├── experiments/
│   ├── all/config.yml
│   ├── smoke_train/config.yml
│   ├── singular/config.yml
│   └── ...
└── artifacts/                  # Saved model checkpoints
test/
├── test_action_mask.py         # Action mask unit tests
├── test_belief_update.py       # Belief tracking tests
├── env_test.py                 # Environment smoke tests
└── smoke_test.py               # Basic sanity check

Configuration

Hydra-Style Configs

All parameters are configurable via YAML:

# src/configs/mechanism/default.yaml
police_budget: 10
reveal_interval: 5
reveal_probability: 0.0
ticket_price: 1.0
target_win_rate: 0.5
secondary_weight: 0.1
# src/experiments/all/config.yml
agent_configurations:
  - num_police_agents: 2
    agent_money: 10
  - num_police_agents: 3
    agent_money: 8
  # ...
num_episodes: 70
epochs: 200
random_seed: 42

WandB Integration

Set credentials in src/wandb_data.json:

{
  "wandb_api_key": "<your-api-key>",
  "wandb_project": "scotland-yard",
  "wandb_entity": "<your-entity>"
}

Leave as "null" to disable WandB logging.


Tests

Unit Tests

# Run all tests
pytest test/

# Run specific tests
pytest test/test_action_mask.py -v
pytest test/test_belief_update.py -v
pytest test/env_test.py -v

Test Coverage

| Test File | Description | Key Assertions |
|---|---|---|
| test_action_mask.py | Action mask correctness | Fixed index→node mapping, budget constraints |
| test_belief_update.py | Belief tracking | Distribution normalization, reveal collapse |
| env_test.py | Environment smoke test | Reset/step don't throw exceptions |

Required Tests

  1. ✅ Action mask correctness: test_action_mask.py::test_action_mask_fixed_index_node_mapping
  2. ✅ Belief update step: test_belief_update.py::test_belief_updates_and_reveals
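For flavour, a check of the fixed index→node mapping could look roughly like the self-contained sketch below (a hypothetical test, not the repo's actual test_action_mask.py):

```python
import numpy as np

def test_action_mask_identity_mapping():
    """Budget-constrained mask with a fixed index -> node identity mapping."""
    adjacency = np.array([[0, 1, 1],
                          [1, 0, 1],
                          [1, 1, 0]], dtype=float)
    edge_cost = np.array([[0, 1, 5],
                          [1, 0, 1],
                          [5, 1, 0]], dtype=float)
    current, budget = 0, 2.0
    mask = (adjacency[current] > 0) & (edge_cost[current] <= budget)
    index_to_node = {i: i for i in range(3)}
    assert index_to_node == {0: 0, 1: 1, 2: 2}    # identity mapping: action i -> node i
    assert mask.tolist() == [False, True, False]  # node 2 is adjacent but too expensive
```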

License

This project is licensed under CC BY-NC-ND 4.0. See the LICENSE file for details.

About

MARL research project, based on the famous board game "Scotland Yard".
