fadee

FADEE: Faster Adaptation for Decoupled Exploration and Exploitation

This is the code implementation of our group research project for CS330: Deep Multi-Task and Meta Learning.

Abstract: Motivated by recent advances in decoupled meta-reinforcement learning, we sought to modify and extend DREAM [Liu, 2020]. We first developed extensions of the MapGrid environment whose optimal exploration strategies are significantly more involved. By benchmarking vanilla DREAM on these environments, we found that a recurrent policy was necessary for task-specific exploration. This was supported by t-distributed stochastic neighbour embeddings (t-SNE) of the exploration policy's final cell states, which showed strong clusters determined by task ID. We therefore concluded that the cell state of the exploration policy had to encode a belief about the task at hand. Experiments that used the final cell state directly as a trajectory embedding failed to converge, but a new exploration algorithm based on the task adaptation in PEARL [Rakelly, 2019] achieved optimal rewards in about as many timesteps as the DREAM-based implementation, or fewer. This method's latent variable was designed to explicitly reflect a belief about the task at hand, which provided enough context for an exploration policy with no hidden state. To improve the robustness of the trajectory encoder, we developed a contrastive loss based on SimCLR [Chen, 2020] and found that it had a positive effect on exploration reward. The contrastive loss used a number of augmentations, such as a random starting position in the grid world environment and a randomly sampled hidden state to encode varying amounts of initial prior information. Next, we developed methods for semi-supervised meta-reinforcement learning, where only a limited number of problem IDs are supervised and a large number are unsupervised. This effectively decreases the number of fully observed problem IDs needed for full end-to-end training.
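As an illustration of the SimCLR-style contrastive loss mentioned in the abstract, here is a minimal sketch of the NT-Xent objective applied to trajectory embeddings. It is not the project's implementation: the function names, the temperature value, and the use of plain Python lists are all assumptions made for the example. The two "views" stand for embeddings of the same trajectory under two augmentations (for instance, different random starting positions).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """NT-Xent loss over N positive pairs (view_a[i], view_b[i]).

    view_a, view_b: lists of N embedding vectors (hypothetical encoder
    outputs), one per trajectory, from two augmentations of that trajectory.
    """
    n = len(view_a)
    z = list(view_a) + list(view_b)  # all 2N embeddings
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # the other view of the same trajectory
        # Similarities to every other embedding (self-similarity excluded).
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)
```

The loss is small when the two views of each trajectory embed close together and far from the other trajectories in the batch, which is the property a robust trajectory encoder needs.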
Using the SimCLR-based approach, we trained the exploration and exploitation policies with only a small number of supervised problem IDs and a large number of unsupervised problem IDs. We also developed a k-means clustering technique to infer unsupervised problem IDs, which motivated a MAML-based clustering method for adapting quickly in various unsupervised task settings. This avenue of research yielded some positive observations that we hope to build on in future work.
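The idea of inferring unsupervised problem IDs with k-means can be sketched as follows. This is a generic Lloyd's-algorithm example under stated assumptions, not the code in this repository: `kmeans_pseudo_ids`, its parameters, and the toy embeddings are hypothetical, and the abstract does not specify how the project initialises or runs the clustering.

```python
import random


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_pseudo_ids(embeddings, k, iterations=20, seed=0):
    """Cluster trajectory embeddings and return (pseudo_ids, centroids).

    Each cluster index serves as a stand-in problem ID for trajectories
    whose true problem ID is not observed.
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = [list(e) for e in rng.sample(embeddings, k)]
    assignments = [0] * len(embeddings)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every embedding.
        assignments = [
            min(range(k), key=lambda c: squared_distance(e, centroids[c]))
            for e in embeddings
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [e for e, a in zip(embeddings, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assignments, centroids


# Toy usage: two well-separated groups of 2-D embeddings.
toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
       [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
pseudo_ids, _ = kmeans_pseudo_ids(toy, k=2)
```

The cluster indices are arbitrary labels, so they only identify which trajectories share a task, not which supervised problem ID a cluster corresponds to.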

Repository contents:

Pupper
__pycache__
dream-master
.DS_Store
CS330_Final_Project_Report.pdf
README.md
reptile.py


sergiogcharles/fadee
