For certain reasons, I need to modify the loss function of the value network in PPO, but I cannot find the corresponding code.
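
For reference, the term in question is commonly a mean squared error between the value network's predictions and the return targets, optionally clipped around the old value estimates. Below is a minimal NumPy sketch of that common form, not taken from this codebase. The function name `value_loss`, its parameters, and the 0.5 scaling are assumptions for illustration only; the actual implementation may differ.

```python
import numpy as np


def value_loss(values, old_values, returns, clip_range=None):
    """Hypothetical sketch of a PPO value (critic) loss; not any library's actual code.

    values:     current value predictions V(s)
    old_values: value predictions recorded when the rollout was collected
    returns:    regression targets (e.g. discounted returns)
    clip_range: if given, limit how far V(s) may move from old_values
    """
    values = np.asarray(values, dtype=np.float64)
    old_values = np.asarray(old_values, dtype=np.float64)
    returns = np.asarray(returns, dtype=np.float64)

    # Plain squared error between predictions and targets.
    unclipped = (values - returns) ** 2
    if clip_range is None:
        return 0.5 * unclipped.mean()

    # Clipped variant: keep the prediction within clip_range of the old one,
    # then take the element-wise worse (larger) of the two errors.
    clipped_values = old_values + np.clip(values - old_values, -clip_range, clip_range)
    clipped = (clipped_values - returns) ** 2
    return 0.5 * np.maximum(unclipped, clipped).mean()
```

Searching the codebase for terms such as `value_loss`, `vf_loss`, `critic_loss`, or a value-function coefficient may help locate the equivalent expression; these names are guesses, not confirmed identifiers.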