Convergence of Q-network Hack

Hi,

The paper says that the graph changes quickly, which makes it difficult for the Q-network to converge. To ease this learning difficulty, the authors keep C unchanged across two successive timesteps when computing the Q-loss during training.
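
To make clear what I am looking for, here is a minimal sketch of how I understand the trick. It is not taken from the repository. I am assuming that C is the adjacency matrix, and the toy linear "network" and the names `q_values`, `td_targets`, and `q_loss` are all hypothetical. The point is only that the forward pass on the next observation reuses the adjacency from time t instead of the one observed at t+1.

```python
import numpy as np

def q_values(obs, adj, weights):
    # Toy stand-in for a graph Q-network (hypothetical, not the paper's model):
    # aggregate neighbour features with the adjacency, then apply a linear layer.
    return (adj @ obs) @ weights

def td_targets(reward, next_obs, adj, weights, gamma=0.99):
    # `adj` is the adjacency from time t. It is deliberately reused for the
    # t+1 forward pass, so the graph is held fixed across the two timesteps.
    next_q = q_values(next_obs, adj, weights)
    return reward + gamma * next_q.max(axis=1)

def q_loss(obs, action, reward, next_obs, adj, weights, gamma=0.99):
    # Mean squared TD error, with the same `adj` used on both sides.
    q = q_values(obs, adj, weights)
    q_taken = q[np.arange(len(action)), action]
    target = td_targets(reward, next_obs, adj, weights, gamma)
    return float(np.mean((q_taken - target) ** 2))
```

If this reading is right, I would expect the training code to feed the stored adjacency from the current step into the target computation, rather than a separately stored next-step adjacency.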

Does anyone know where exactly in the code this is done?