Hello,
I have a question about learning in the "dream environment", i.e. training the controller (C) and the RNN (M) while excluding the VAE (V), by feeding M its own predictions z_{t+1}:
When learning inside M's latent space, could there be an "encoding shift" of the latent game-state encoding z_t? That is, through imprecise predictions of z_{t+1}, the original "encoding scheme" for z_t defined by V might no longer be upheld, so that M effectively drifts toward a similar but different coding scheme for its internal representation of the game state z_t (i.e., the visual state of the game).
If so, features learned in M and C would not transfer from the dream environment to the real game: M would come to expect inputs z_t in the shifted encoding scheme, and likewise a hidden state h_t built from a history of events encoded in that scheme. C, in turn, would be adapted to z_t and h_t under this shifted scheme, and would fare poorly with the z_t coming from V (and the resulting h_t) once we transfer C's learned strategies and M's prediction capabilities back to the real world, where the actual game is observed through V's encoding.
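To make the worry concrete, here is a toy numerical sketch (not the repo's actual V/M code; the linear dynamics and the 1% model error are hypothetical stand-ins) showing how a small one-step prediction error compounds when M is rolled out on its own predictions, so the dream trajectory drifts away from the encoding V would produce:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: "true" latent dynamics z_{t+1} = A_true @ z_t
# (what V would encode from the real game), and a slightly mis-estimated
# A_hat playing the role of M's learned prediction.
dim = 8
A_true = np.linalg.qr(rng.normal(size=(dim, dim)))[0]  # orthogonal, norm-preserving
A_hat = A_true + 0.01 * rng.normal(size=(dim, dim))    # M's imperfect model

z_real = rng.normal(size=dim)
z_dream = z_real.copy()

drift = []
for t in range(50):
    z_real = A_true @ z_real    # encoding of the real game state
    z_dream = A_hat @ z_dream   # M fed back its own prediction (dream rollout)
    drift.append(np.linalg.norm(z_real - z_dream))

# The one-step error is tiny, but the closed-loop error accumulates,
# so late dream states live in a shifted version of V's encoding.
print(f"drift after 1 step:  {drift[0]:.4f}")
print(f"drift after 50 steps: {drift[-1]:.4f}")
```

Under these toy assumptions, C trained on the drifted z_dream (and the h_t built from it) would see a systematically different input distribution than the z_t produced by V at transfer time.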
I hope the issue is somewhat clear.
Thank you,
M. Baumann