<img src="https://github.com/asokraju/Adv-MARL/blob/master/results/grid_both_numbered.png" width="400" align="left"> We can see that all agents learn a near-optimal policy when there is no adversary in the network. The adversary indeed has a negative impact on the network: it learns a near-optimal policy, but the remaining agents perform poorly compared to the adversary-free scenario. The figure shows the final network states in a simulation with no adversary (left) and with an adversary (right) after training for 200 episodes. Blue, yellow, and brown correspond to the cooperative agents', the adversary's, and the desired positions, respectively. All agents reach their desired positions when the network is adversary-free, whereas only the adversary finds its desired position when it attacks the network.
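As a rough illustration, a final-state plot like the one in the figure could be produced with matplotlib along the following lines. This is only a sketch: the grid size and the positions below are made-up placeholders, not the actual simulation output, and the color scheme simply mirrors the one described above.

```python
import matplotlib.pyplot as plt

# Hypothetical final positions on a 5x5 grid -- placeholders, not actual results.
agent_positions    = [(0, 4), (2, 3), (4, 1)]           # cooperative agents (blue)
adversary_position = (3, 3)                             # adversary (yellow)
desired_positions  = [(0, 4), (2, 3), (4, 1), (1, 1)]   # target cells (brown)

fig, ax = plt.subplots(figsize=(4, 4))

# Draw the desired positions first so agents rendered on top remain visible.
xs, ys = zip(*desired_positions)
ax.scatter(xs, ys, c="brown", s=400, marker="s", label="desired positions")

xs, ys = zip(*agent_positions)
ax.scatter(xs, ys, c="blue", s=150, label="cooperative agents")

ax.scatter(*adversary_position, c="gold", s=150, label="adversary")

ax.set_xticks(range(5))
ax.set_yticks(range(5))
ax.grid(True)
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.15), ncol=2)
ax.set_title("Final grid-world states (illustrative)")
plt.tight_layout()
plt.show()
```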