<img src="https://github.com/asokraju/Adv-MARL/blob/master/results/grid_both_numbered.png" width="400" align="left"> We can see that all agents learn a near-optimal policy when there is no adversary in the network. The adversary indeed has a negative impact on the network: it learns a near-optimal policy, but the remaining agents perform poorly compared to the adversary-free scenario. The figure shows the final network states in a simulation with no adversary (left) and with an adversary (right) after training for 200 episodes. Blue, yellow, and brown correspond to the cooperative agents', the adversary's, and the desired positions, respectively. All agents reach their desired positions when the network is adversary-free, whereas only the adversary finds its desired position when it attacks the network.
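As a rough illustration, a final-state plot like the one in the figure could be produced with matplotlib along the following lines. This is only a sketch: the grid size and the positions below are made-up placeholders, not the actual simulation output, and the color scheme simply mirrors the one described above.

```python
import matplotlib.pyplot as plt

# Hypothetical final positions on a 5x5 grid -- placeholders, not actual results.
agent_positions    = [(0, 4), (2, 3), (4, 1)]           # cooperative agents (blue)
adversary_position = (3, 3)                             # adversary (yellow)
desired_positions  = [(0, 4), (2, 3), (4, 1), (1, 1)]   # target cells (brown)

fig, ax = plt.subplots(figsize=(4, 4))

# Draw the desired positions first so agents rendered on top remain visible.
xs, ys = zip(*desired_positions)
ax.scatter(xs, ys, c="brown", s=400, marker="s", label="desired positions")

xs, ys = zip(*agent_positions)
ax.scatter(xs, ys, c="blue", s=150, label="cooperative agents")

ax.scatter(*adversary_position, c="gold", s=150, label="adversary")

ax.set_xticks(range(5))
ax.set_yticks(range(5))
ax.grid(True)
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.15), ncol=2)
ax.set_title("Final grid-world states (illustrative)")
plt.tight_layout()
plt.show()
```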