sergiogcharles/fadee

fadee

FADEE: Faster Adaptation for Decoupled Exploration and Exploitation

This is the code implementation of our group research project for CS330: Deep Multi-Task and Meta Learning.

Abstract: Motivated by recent advances in decoupled meta-reinforcement learning, we sought to modify and expand upon DREAM [Liu, 2020]. We first developed extensions of the MapGrid environment that require significantly more involved optimal exploration strategies. By benchmarking vanilla DREAM, we found that a recurrent policy was necessary for task-specific exploration. This was confirmed by t-distributed stochastic neighbour embedding (t-SNE) plots of the exploration policy's final cell states, which showed strong clusters determined by task ID. This led us to conclude that the cell state of the exploration policy had to encode a belief about the task at hand. While experiments that directly used the final cell state as a trajectory embedding failed to converge, a new exploration algorithm based on the task adaptation in PEARL [Rakelly, 2019] achieved optimal rewards in about as many timesteps as, or fewer than, the DREAM-based implementation. This method's latent variable was designed to explicitly reflect a belief about the task at hand, which provided enough context for an exploration policy with no hidden state. To improve the robustness of the trajectory encoder, we developed a contrastive loss based on SimCLR [Chen, 2020] and found that it had a positive effect on exploration reward. The contrastive loss used several augmentations, such as a random starting position in the grid world environment and a randomly sampled hidden state to encode varying amounts of initial prior information. Next, we developed methods for semi-supervised meta-reinforcement learning, in which only a limited number of problem IDs are supervised and a large number are unsupervised. This reduces the number of fully observed problem IDs needed for full end-to-end training. Using the SimCLR-based approach, we trained the exploration and exploitation policies with only a small number of supervised problem IDs and a large number of unsupervised ones. We also developed a k-means clustering technique to infer unsupervised problem IDs, which motivated a MAML-based clustering method for adapting quickly in various unsupervised task settings. This avenue of research yielded some positive observations that we hope to pursue in future work.
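
For readers unfamiliar with SimCLR-style objectives, the following is a minimal sketch of the NT-Xent contrastive loss from the SimCLR paper, applied to pairs of trajectory embeddings. It is an illustration only and is not taken from this repository: the function names, the `temperature` default, and the plain-list embedding format are all assumptions, and the project's actual loss and augmentations may differ. Each trajectory is assumed to yield two augmented views (for example, from different starting positions), and the loss pulls the two views of the same trajectory together while pushing apart views of other trajectories.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """NT-Xent loss over N positive pairs (view_a[i], view_b[i]).

    view_a and view_b are hypothetical lists of embedding vectors, one per
    trajectory, produced from two different augmentations of that trajectory.
    """
    n = len(view_a)
    embeddings = list(view_a) + list(view_b)  # 2N embeddings in total
    total = 0.0
    for i in range(2 * n):
        # The positive for embedding i is the other view of the same trajectory.
        positive = (i + n) % (2 * n)
        # Scaled similarities to every other embedding (self is excluded).
        logits = {
            k: cosine_similarity(embeddings[i], embeddings[k]) / temperature
            for k in range(2 * n)
            if k != i
        }
        # Cross-entropy with the positive as the target class.
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total += log_denominator - logits[positive]
    return total / (2 * n)


# Two toy "trajectories" whose augmented views stay close to each other.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 0.1], [0.1, 1.0]]
aligned_loss = nt_xent_loss(view_a, view_b)
# Pairing each trajectory with the wrong view should give a higher loss.
mismatched_loss = nt_xent_loss(view_a, list(reversed(view_b)))
```

In this toy example the correctly paired views produce a lower loss than the mismatched pairing, which is the behaviour a trajectory encoder trained with such an objective would be pushed toward. A practical implementation would typically use a tensor library and batched matrix operations instead of Python loops.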
