A library for applying reinforcement learning to inspection and maintenance planning of deteriorating engineering systems. This library was primarily developed as a pedagogic exercise and for research use.
Example rollout of a DDQN agent in a 5-out-of-5 system:
```bash
conda create --name imprl_env -y python==3.9
conda activate imprl_env
pip install poetry==1.8 # or conda install -c conda-forge poetry==1.8
poetry install
```

Following best practices, `poetry install` installs the dependencies from the `poetry.lock` file. This file rigorously specifies all the dependencies required to build the library. It ensures that the project does not break because of unexpected changes in (transitive) dependencies (more info).
Installing additional packages
You can add them via `poetry add` (official docs) on the command line.
For example, to install Jupyter notebook:

```bash
# Allow >=7.1.2, <8.0.0 versions
poetry add notebook@^7.1.2
```

This will resolve the package dependencies (and adjust versions of transitive dependencies if necessary) and install the package. If the dependencies cannot be resolved, relax the package version constraint and try again.
For logging, the library relies on wandb. You can log into wandb using your private API key:

```bash
wandb login
# <enter wandb API key>
```
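Once logged in, training metrics can be streamed to your wandb dashboard. The snippet below is only a minimal sketch of standard wandb usage; the project name, config keys, and metric names are illustrative placeholders, not the library's actual logging interface.

```python
import wandb

# Illustrative only: project name, config, and metric names are placeholders.
run = wandb.init(project="imprl-example", config={"algorithm": "DDQN", "n_episodes": 100})

for episode in range(run.config["n_episodes"]):
    episode_return = 0.0  # accumulate rewards from the environment rollout here
    wandb.log({"episode": episode, "episode_return": episode_return})

run.finish()
```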
The following (multi-agent) reinforcement learning algorithms are implemented:
- Double Deep Q-Network (DDQN)
- Joint Actor Critic (JAC)
- Deep Centralized Multi-agent Actor Critic (DCMAC)
- Deep Decentralized Multi-agent Actor Critic (DDMAC)
- current implementation does not support constraints in the objective function
- Independent Actor Centralized Critic (IACC)
- also referred to as DDMAC-CTDE in literature
- Independent Actor Centralized Critic with Parameter Sharing (IACC-PS)
- Independent Actor Critic (IAC)
- Independent Actor Critic with Parameter Sharing (IAC-PS)
The base actor-critic algorithm is ACER, from "Sample Efficient Actor-Critic with Experience Replay" by Wang et al., an off-policy algorithm that applies truncated importance weighting to experience drawn from the replay buffer.
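The sketch below illustrates the key ACER ingredient, truncated importance weighting with a bias-correction term, for one batch of replayed transitions. It is a simplified, self-contained example, not the library's implementation: the tensor names, the truncation threshold `c`, and the use of a plain advantage `Q - V` (rather than Retrace targets) are all assumptions for illustration.

```python
import torch

def acer_actor_loss(logits, behavior_probs, actions, q_values, c=10.0):
    """Truncated importance-sampling policy loss with bias correction (ACER-style sketch).

    logits:         current policy logits,                          (batch, n_actions)
    behavior_probs: action probabilities of the behaviour policy
                    stored alongside each transition in the buffer, (batch, n_actions)
    actions:        replayed actions,                               (batch,)
    q_values:       critic estimates Q(s, .) for all actions,       (batch, n_actions)
    c:              truncation threshold for the importance weights
    """
    pi = torch.softmax(logits, dim=-1)
    log_pi = torch.log_softmax(logits, dim=-1)
    value = (pi * q_values).sum(dim=-1, keepdim=True)      # V(s) = E_pi[Q(s, .)]

    rho = pi / behavior_probs.clamp(min=1e-8)              # importance ratios pi / mu
    idx = actions.unsqueeze(-1)
    rho_a = rho.gather(-1, idx)                            # ratio of the action actually taken
    q_a = q_values.gather(-1, idx)
    log_pi_a = log_pi.gather(-1, idx)

    # Term 1: replayed action, with the importance weight truncated at c
    g_taken = (rho_a.clamp(max=c) * (q_a - value)).detach() * log_pi_a

    # Term 2: bias correction over all actions, active only where rho > c
    coef = ((1.0 - c / rho).clamp(min=0.0) * pi).detach()
    g_bias = (coef * (q_values - value).detach() * log_pi).sum(dim=-1, keepdim=True)

    return -(g_taken + g_bias).mean()                      # minimise the negated gradient objective
```

The table below summarises how the implemented algorithms are organised by training/execution paradigm.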
| Paradigm | Mathematical Framework | Algorithm | Observation | Action | Critic | Actor |
|----------|------------------------|-----------|-------------|--------|--------|-------|
| CTCE | POMDP | JAC | Joint | Joint | Centralized | Shared |
| CTCE | MPOMDP | DCMAC | Joint | Factored | Centralized | Shared |
| CTCE | MPOMDP | DDMAC | Joint | Factored | Centralized | Independent |
| CTDE | Dec-POMDP | IACC (DDMAC-CTDE) | Independent | Independent | Centralized | Independent |
| CTDE | Dec-POMDP | IACC-PS (DDMAC-CTDE-PS) | Independent | Independent | Centralized | Shared |
| DTDE | Dec-POMDP | IAC | Independent | Independent | Decentralized | Independent |
| DTDE | Dec-POMDP | IAC-PS | Independent | Independent | Decentralized | Shared |
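To make the "Actor" column concrete: a shared actor means all agents evaluate (and train) a single set of network parameters, whereas independent actors each keep their own copy. Below is a minimal PyTorch sketch, in which the layer sizes and dimensions are illustrative rather than the library's actual architecture.

```python
import torch.nn as nn

n_agents, obs_dim, n_actions = 5, 8, 3  # illustrative sizes, not the library's defaults

def make_actor():
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

# "Shared" actor (parameter sharing): every agent uses the same network
shared_actor = make_actor()

# "Independent" actors: one separately trained network per agent
independent_actors = nn.ModuleList([make_actor() for _ in range(n_agents)])
```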
This project utilizes the clever abstractions in EPyMARL, and the author would like to acknowledge the insights shared in Reinforcement Learning Implementation Tips and Tricks, which informed the development of this library.
IMP-MARL: a Suite of Environments for Large-scale Infrastructure Management Planning via MARL
- Benchmarking scalability of cooperative MARL methods in real-world infrastructure management planning problems.
- Environments: (Correlated and uncorrelated) k-out-of-n systems and offshore wind structural systems.
- RL solvers: Provides wrappers for interfacing with several (MA)RL libraries such as EPyMARL, RLlib, and MARLlib.
IMP-act: Benchmarking MARL for Infrastructure Management Planning at Scale with JAX
- Large-scale road networks with up to 178 agents implemented in JAX for scalability.
- IMP-act-JaxMARL interfaces IMP-act with multi-agent solvers in JaxMARL.
- We also provide NumPy-based environments for compatibility with PyTorch in IMP-act-epymarl.
- Infrastructure management is modeled as a constrained multi-agent POMDP, capturing uncertainty, limited budgets, and interdependent maintenance decisions.
- It employs a deep decentralized multi-agent actor-critic (DDMAC) framework with centralized training and decentralized execution (CTDE) for scalable, coordinated decision-making.
- Performance is demonstrated on a real transportation network with 96 components (11 bridges and 85 highway sections), showing significant improvements over traditional methods.