Commit d28f672

Update README.md
1 parent 5ae8cd1 commit d28f672

File tree

1 file changed: +1 −1 lines changed


README.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ True cumulative rewards per episode obtained by each agent in the network. Blue
 
 <img src="https://github.com/asokraju/Adv-MARL/blob/master/results/grid_both_numbered.png" width="400" align="left"> We can see that all agents learn a near-optimal policy when there is no adversary in the network. The adversary indeed has a negative impact on the network, i.e., it learns a near-optimal policy but the remaining agents in the network perform poorly compared to the adversary-free scenario. Final states of a simulation of the grid-world after training for 200 episodes are shown in figure below. Final network states in a simulation with no adversary (left) and with an adversary (right) after training for 200 episodes. Blue, yellow, and brown color correspond to the cooperative agents', adversary's, and desired positions, respectively. All agents reach their desired positions when the network is adversary-free, whereas only the adversary finds its desired position when it attacks the network.
 
-In Fig.~\ref{fig:average_returns}, we compare the true cumulative team-average returns and cumulative estimated rewards obtained by the network in each episode. The estimated reward function converged in both scenarios but the convergence rate was slower in the presence of the adversary.
+Next, we compare the true cumulative team-average returns and cumulative estimated rewards obtained by the network in each episode. The estimated reward function converged in both scenarios but the convergence rate was slower in the presence of the adversary.
 
 <img src="https://github.com/asokraju/Adv-MARL/blob/master/results/team_reward-1.png" width="400" align="right">Cumulative team-average rewards per episode for the adversary-free and attacked network. The comparison shows that the former performs significantly better.
 
 ## References
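
For context only: a minimal sketch, not part of this commit or the Adv-MARL codebase, of how the "cumulative team-average reward per episode" referred to in the updated paragraph could be computed from per-agent, per-step rewards. The function name and array shapes here are assumptions for illustration.

```python
import numpy as np

def cumulative_team_average_reward(episode_rewards: np.ndarray) -> float:
    """Illustrative helper (hypothetical, not from Adv-MARL).

    episode_rewards is assumed to have shape (num_steps, num_agents),
    holding each agent's reward at every step of a single episode.
    """
    # Average over agents at each step, then sum over the episode.
    return float(np.mean(episode_rewards, axis=1).sum())

if __name__ == "__main__":
    # Example: 3 agents over a 5-step episode with synthetic rewards.
    rng = np.random.default_rng(0)
    episode_rewards = rng.normal(size=(5, 3))
    print(cumulative_team_average_reward(episode_rewards))
```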
