- Docker (recommended) or Python 3.10+
- CUDA-capable GPU (optional, for faster training)
# Clone the repository
git clone https://github.com/elte-collective-intelligence/student-mechanism-design.git
cd student-mechanism-design
# Build Docker images
docker build --progress plain -f ./docker/BaseDockerfile -t student_mechanism_design_base .
docker build --progress plain -f ./docker/Dockerfile -t student_mechanism_design .
# Run training experiment
docker run --rm --gpus=all --mount type=bind,src=$PWD,dst=/app student_mechanism_design all
# Run unit tests
docker run --rm --mount type=bind,src=$PWD,dst=/app student_mechanism_design --unit_test
# Run ablation studies
docker run --rm --mount type=bind,src=$PWD,dst=/app student_mechanism_design python src/eval/run_ablations.py --ablation all

# Local (non-Docker) run
pip install -r requirements.txt
cd src
python main.py all --agent_configs=mappo --log_configs=verbose

This project implements a mechanism design approach for the Scotland Yard pursuit-evasion game using multi-agent reinforcement learning. Key features include:
- Partial Observability: MrX is hidden from police with configurable reveal schedules
- Belief Tracking: Particle filter and learned belief encoders for police
- Mechanism Design: Configurable tolls, budgets, and reveal policies
- Meta-Learning: Automatic tuning of mechanism parameters toward 50% win rate
- Population-Based Self-Play: Policy pools with ELO-style scoring
- MAPPO & GNN Agents: State-of-the-art multi-agent RL algorithms
| Experiment | Agents | Graph Size | Budget | Reveal | Description |
|---|---|---|---|---|---|
| `smoke_train` | 2 | 15 nodes | 10 | R=5 | Quick sanity check |
| `singular` | 2-3 | 15 nodes | 8-12 | R=5 | Single config training |
| `all` | 2-6 | 15-20 nodes | 4-18 | R=5 | Full sweep |
| `big_graph` | 3-4 | 25+ nodes | 10-15 | R=5 | Large graph evaluation |
| `test` | 2 | 12 nodes | 10 | R=5 | Development testing |
# Run specific experiment
docker run --rm --gpus=all --mount type=bind,src=$PWD,dst=/app student_mechanism_design <experiment_name>
# Examples:
docker run ... student_mechanism_design smoke_train
docker run ... student_mechanism_design all
docker run ... student_mechanism_design big_graph

Each agent receives:
| Field | Type | Description |
|---|---|---|
| `adjacency_matrix` | NxN float | Binary graph connectivity |
| `node_features` | NxK float | Agent positions encoded as one-hot |
| `edge_index` | 2xE int | Edge list for GNN |
| `edge_features` | E float | Edge weights/costs |
| `action_mask` | N bool | Valid actions (fixed index→node mapping) |
| `valid_actions` | list[int] | Affordable neighbor nodes |
| `belief_map` | N float | MrX location distribution (Police only) |
| `agent_position` | int | Current node |
| `agent_budget` | float | Remaining money |
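
For orientation, here is a minimal sketch of what a single agent's observation might look like; the field names follow the table above, but the array shapes, dtypes, and example sizes (K, E) are assumptions, not the environment's exact output.

```python
import numpy as np

# Illustrative sketch only: a hypothetical observation for a 15-node graph.
N, K, E = 15, 4, 40  # nodes, node-feature dim, edges (example sizes)
obs = {
    "adjacency_matrix": np.zeros((N, N), dtype=np.float32),    # binary connectivity
    "node_features":    np.zeros((N, K), dtype=np.float32),    # one-hot agent positions
    "edge_index":       np.zeros((2, E), dtype=np.int64),      # edge list for the GNN
    "edge_features":    np.ones(E, dtype=np.float32),          # per-edge weights/costs
    "action_mask":      np.zeros(N, dtype=bool),               # valid moves, action i -> node i
    "valid_actions":    [3, 7],                                 # affordable neighbor nodes
    "belief_map":       np.full(N, 1.0 / N, dtype=np.float32), # MrX distribution (Police only)
    "agent_position":   3,                                      # current node
    "agent_budget":     10.0,                                   # remaining money
}
```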
- Type: `Discrete(N)` where N = number of nodes
- Masking: Actions masked by budget and topology
- Mapping: Fixed identity mapping (action i → node i)
# Fixed index→node mapping ensures consistency (action i moves to node i)
for node in range(num_nodes):
    mask[node] = bool(adjacent[current, node]) and cost[current, node] <= budget
index_to_node = {i: i for i in range(num_nodes)}  # Identity mapping

| Parameter | Config Key | Default | Description |
|---|---|---|---|
| Police Budget | `police_budget` | 10 | Initial money for police |
| Reveal Interval | `reveal_interval` | 5 | Steps between MrX reveals |
| Reveal Probability | `reveal_probability` | 0.0 | Stochastic reveal chance |
| Toll | `tolls` | 0.0 | Per-edge movement cost |
| Ticket Price | `ticket_price` | 1.0 | Base movement cost |
| Target Win Rate | `target_win_rate` | 0.5 | Meta-learning objective |
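
These parameters are grouped in the `MechanismConfig` dataclass (src/mechanism/mechanism_config.py). A rough sketch of its shape, assuming the field names mirror the config keys above (the real class may differ in detail):

```python
from dataclasses import dataclass

# Sketch only: field names mirror the config keys in the table above;
# the actual dataclass lives in src/mechanism/mechanism_config.py.
@dataclass
class MechanismConfig:
    police_budget: float = 10.0
    reveal_interval: int = 5
    reveal_probability: float = 0.0
    tolls: float = 0.0
    ticket_price: float = 1.0
    target_win_rate: float = 0.5
```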
We report the following three metrics as required by the assignment:
Definition: Fraction of episodes won by MrX
Win Rate = MrX Wins / Total Episodes
Target: 0.50 ± 0.05
Implementation: src/eval/metrics.py::compute_win_rate()
Why this metric: Measures game balance, the primary goal of mechanism design. A win rate of 50% indicates fair gameplay where neither side has a systematic advantage.
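
A minimal sketch of the computation (the project's version is `compute_win_rate()` in src/eval/metrics.py and may differ in signature):

```python
# Sketch only; see src/eval/metrics.py::compute_win_rate for the real implementation.
def win_rate(mrx_wins: int, total_episodes: int) -> float:
    return mrx_wins / max(total_episodes, 1)
```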
Definition: Cross-entropy between police belief distribution and true MrX position at reveal times.
CE = -log(belief[true_mrx_position])
Lower is better (more accurate belief)
Implementation: src/eval/metrics.py::belief_cross_entropy()
Why this metric: Measures how well police can track MrX under partial observability. Lower cross-entropy means the belief distribution assigns higher probability to MrX's true location.
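
A minimal sketch of the per-reveal term, assuming the belief is a normalized numpy array over nodes (the project's version is `belief_cross_entropy()` in src/eval/metrics.py):

```python
import numpy as np

# Sketch only: CE = -log(belief[true_mrx_position]), with a small epsilon for stability.
def belief_ce(belief: np.ndarray, true_mrx_position: int, eps: float = 1e-12) -> float:
    return float(-np.log(belief[true_mrx_position] + eps))
```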
Definition: Average episode length, split by winner.
- Time-to-Catch: Mean steps when Police wins
- Survival Time: Mean steps when MrX wins
Implementation: src/eval/metrics.py::compute_time_metrics()
Why this metric: Captures game dynamics. Shorter catch times indicate effective police coordination, while longer survival times indicate successful evasion strategies.
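
A minimal sketch, assuming each episode record carries its length and winner (the project's version is `compute_time_metrics()` in src/eval/metrics.py; the `"steps"`/`"winner"` keys are hypothetical):

```python
# Sketch only: split mean episode length by winner.
def time_metrics(episodes: list[dict]) -> dict:
    catch = [e["steps"] for e in episodes if e["winner"] == "police"]
    survive = [e["steps"] for e in episodes if e["winner"] == "mrx"]
    mean = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    return {"time_to_catch": mean(catch), "survival_time": mean(survive)}
```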
Config: src/configs/ablation/belief.yaml
Compares belief tracking methods under partial observability:
| Variant | Reveal | Belief Method | Expected Effect |
|---|---|---|---|
| `no_belief` | R=0 | None | Police severely disadvantaged |
| `particle_filter` | R=5 | Particle Filter | Baseline tracking |
| `learned_encoder` | R=5 | Neural Encoder | Potentially better generalization |
Run:
python src/eval/run_ablations.py --ablation belief --num_episodes 100 --seeds 42 123 456

Expected Results:
- `no_belief`: MrX win rate ~70-80% (Police cannot track)
- `particle_filter`: MrX win rate ~50-55% (Baseline)
- `learned_encoder`: MrX win rate ~45-55% (Comparable or better)
Config: src/configs/ablation/mechanism.yaml
Compares mechanism configurations:
| Variant | Tolls | Budget | Reveal | Expected Win Rate |
|---|---|---|---|---|
| `no_mechanism` | 0 | ∞ | R=0 | ~70% MrX (unbalanced) |
| `fixed_mechanism` | 1.0 | 15 | R=5 | ~45% MrX (hand-tuned) |
| `meta_learned` | learned | learned | learned | ~50% MrX (target) |
Run:
python src/eval/run_ablations.py --ablation mechanism --num_episodes 100 --seeds 42 123 456

Expected Results:
- `no_mechanism`: Demonstrates need for mechanism design
- `fixed_mechanism`: Shows improvement over baseline
- `meta_learned`: Achieves target balance through optimization
python src/eval/run_ablations.py --ablation all --num_episodes 100 --output_dir logs/ablations

Results are saved to logs/ablations/:
- `belief_results.json`: Raw metrics data
- `belief_report.txt`: Formatted comparison report
- `mechanism_results.json`: Raw metrics data
- `mechanism_report.txt`: Formatted comparison report
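
To post-process a run programmatically, the JSON files can be loaded directly; the exact key layout is whatever run_ablations.py writes, so treat the printed structure as run-specific.

```python
import json
from pathlib import Path

# Load the raw ablation metrics; the schema depends on run_ablations.py.
results = json.loads(Path("logs/ablations/belief_results.json").read_text())
print(json.dumps(results, indent=2))
```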
- Belief Collapse: Particle filter can collapse to incorrect modes when reveals are sparse (R > 10)
  - Mitigation: Noise injection, increased particle count, or use learned encoder
- Budget Exhaustion: Police may run out of budget before catching MrX on large graphs
  - Mitigation: Meta-learning adjusts budget based on observed win rate
- Graph Topology Sensitivity: Performance varies significantly with graph structure (degree distribution, diameter)
  - Mitigation: Curriculum learning over diverse graph distributions
- Action Mask Edge Cases: When no moves are affordable, the agent stays in place (see the sketch after this list)
  - Handled: Environment returns current position as default action
- Reward Hacking: Agents may exploit reward shaping rather than achieving true objectives
  - Mitigation: Use terminal rewards primarily, validate with win rate metric
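
A sketch of the stay-in-place fallback mentioned in the Action Mask Edge Cases item above (names are illustrative, not the environment's exact API):

```python
import numpy as np

# Illustrative only: if no neighbor is affordable, fall back to the current node.
def resolve_action(action_mask: np.ndarray, agent_position: int, proposed_action: int) -> int:
    if not action_mask.any():
        return agent_position                        # no affordable move: stay in place
    if not action_mask[proposed_action]:
        return int(np.flatnonzero(action_mask)[0])   # replace an invalid choice with a valid one
    return int(proposed_action)
```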
# Enable verbose logging
docker run ... student_mechanism_design all --log_configs=verbose
# Visualize episodes (generates GIFs)
docker run ... student_mechanism_design smoke_train --vis_configs=full
# Run unit tests to verify components
docker run ... student_mechanism_design --unit_test
# Check specific test
pytest test/test_action_mask.py -v

src/
├── main.py                    # Training entry point
├── logger.py                  # Logging utilities (WandB, TensorBoard)
├── reward_net.py              # RewardWeightNet for meta-learning
├── configs/
│   ├── ablation/
│   │   ├── belief.yaml        # Belief ablation variants
│   │   └── mechanism.yaml     # Mechanism ablation variants
│   ├── agent/                 # Agent configurations
│   ├── mechanism/default.yaml # Mechanism parameters
│   └── ...
├── Enviroment/
│   ├── yard.py                # Main environment (CustomEnvironment)
│   ├── action_mask.py         # Action masking with fixed index→node mapping
│   ├── belief_module.py       # ParticleBeliefTracker, LearnedBeliefEncoder
│   ├── partial_obs.py         # PartialObservationWrapper
│   ├── graph_generator.py     # GraphGenerator with seed saving
│   └── graph_layout.py        # ConnectedGraph sampling
├── RLAgent/
│   ├── mappo_agent.py         # MAPPO implementation
│   ├── gnn_agent.py           # GNN-based DQN agent
│   ├── random_agent.py        # Random baseline
│   └── base_agent.py          # Abstract base class
├── selfplay/
│   ├── population_manager.py  # Population-based training with ELO
│   ├── opponent_modeling.py   # Opponent behavior modeling
│   └── best_response.py       # Best response utilities
├── mechanism/
│   ├── mechanism_config.py    # MechanismConfig dataclass
│   ├── meta_learning_loop.py  # MetaLearner for mechanism optimization
│   └── reward_weight_integration.py
├── eval/
│   ├── metrics.py             # Core metrics (win rate, belief CE, time)
│   ├── run_ablations.py       # Ablation study runner
│   ├── ood_eval.py            # OOD & robustness evaluation
│   ├── belief_quality.py      # Belief cross-entropy
│   └── exploitability.py      # Exploitability proxy
├── experiments/
│   ├── all/config.yml
│   ├── smoke_train/config.yml
│   ├── singular/config.yml
│   └── ...
└── artifacts/                 # Saved model checkpoints

test/
├── test_action_mask.py        # Action mask unit tests
├── test_belief_update.py      # Belief tracking tests
├── env_test.py                # Environment smoke tests
└── smoke_test.py              # Basic sanity check
All parameters are configurable via YAML:
# src/configs/mechanism/default.yaml
police_budget: 10
reveal_interval: 5
reveal_probability: 0.0
ticket_price: 1.0
target_win_rate: 0.5
secondary_weight: 0.1

# src/experiments/all/config.yml
agent_configurations:
- num_police_agents: 2
agent_money: 10
- num_police_agents: 3
agent_money: 8
# ...
num_episodes: 70
epochs: 200
random_seed: 42

Set credentials in src/wandb_data.json:
{
"wandb_api_key": "<your-api-key>",
"wandb_project": "scotland-yard",
"wandb_entity": "<your-entity>"
}

Leave the values as "null" to disable WandB logging.
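
For reference, a minimal sketch of how such a file might be consumed (the project's logger in src/logger.py handles this itself; this is only for orientation):

```python
import json

# Sketch only: read the credentials file and decide whether to enable WandB.
with open("src/wandb_data.json") as f:
    creds = json.load(f)

use_wandb = creds.get("wandb_api_key") not in (None, "null")
```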
# Run all tests
pytest test/
# Run specific tests
pytest test/test_action_mask.py -v
pytest test/test_belief_update.py -v
pytest test/env_test.py -v

| Test File | Description | Key Assertions |
|---|---|---|
| `test_action_mask.py` | Action mask correctness | Fixed index→node mapping, budget constraints |
| `test_belief_update.py` | Belief tracking | Distribution normalization, reveal collapse |
| `env_test.py` | Environment smoke test | Reset/step don't throw exceptions |
- ✅ Action mask correctness: `test_action_mask.py::test_action_mask_fixed_index_node_mapping`
- ✅ Belief update step: `test_belief_update.py::test_belief_updates_and_reveals`
- MAPPO: The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games
- Scotland Yard Board Game
- PettingZoo Documentation
- TorchRL Documentation
- Mechanism Design Theory
This project is licensed under CC BY-NC-ND 4.0. See the LICENSE file for details.