Cleaning up repo structure: Previously, we used rl-tools/rl-tools as a monorepo to version everything in the RLtools universe together. This is not great if someone just wants the header-only library (i.e., just ./include): a single git clone --recursive or git submodule update --init --recursive could trigger gigabytes of downloads. Jumping around the history is also cumbersome with all the submodules (e.g., for bisecting). Hence, we moved the versioning of adjacent projects to rl-tools/mono, and rl-tools/rl-tools is now the submodule-free, lightweight core (~7 MB download for the full history).
Memory: The main work between v2.0 and v2.1 has been maturing the memory implementation (i.e., using RNNs in off-policy algorithms). For more information, see the RNN and Memory chapters in the documentation; a conceptual sketch of the main data-layout implication is shown below.
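The following is a conceptual sketch only (not RLtools' actual data structures): when the policy/critic is recurrent, an off-policy replay buffer generally has to hand back contiguous windows of experience so the hidden state can be recomputed by unrolling, rather than sampling independent transitions. Details such as burn-in and hidden-state storage are omitted.

```cpp
// Conceptual sketch (not the RLtools implementation): a replay buffer that
// samples contiguous sequences so a recurrent network can be unrolled over them.
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

struct Transition {
    std::vector<double> observation, action;
    double reward;
    bool terminated;
};

struct SequenceReplayBuffer {
    std::vector<std::vector<Transition>> episodes; // one inner vector per episode
    std::size_t sequence_length;

    explicit SequenceReplayBuffer(std::size_t seq_len) : sequence_length(seq_len) {}

    void add_episode(std::vector<Transition> episode) {
        if (episode.size() >= sequence_length) {
            episodes.push_back(std::move(episode));
        }
    }
    // Sample a contiguous window; training would start from a zero hidden state
    // at the beginning of the window and unroll the RNN over it.
    std::vector<Transition> sample_sequence(std::mt19937& rng) const {
        assert(!episodes.empty());
        std::uniform_int_distribution<std::size_t> ep_dist(0, episodes.size() - 1);
        const auto& episode = episodes[ep_dist(rng)];
        std::uniform_int_distribution<std::size_t> start_dist(0, episode.size() - sequence_length);
        const std::size_t start = start_dist(rng);
        return std::vector<Transition>(episode.begin() + start, episode.begin() + start + sequence_length);
    }
};
```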
Flag Environment: We introduce a basic environment to test the recurrent RL algorithms: the positions of two flags are revealed in the initial step, and the policy has to memorize them to visit the positions in order. The second flag is required because, with only one flag, the agent could cheat by simply accelerating in the right direction, storing the direction in the physical state instead of memorizing it internally. You can see the Flag environment and a baseline policy at https://zoo.rl.tools. A minimal sketch of the idea is shown below.
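To make the memory requirement concrete, here is a minimal, self-contained 1D sketch of a flag-style environment (illustrative only, not the actual RLtools implementation): the two flag positions are only observable at the first step, so a memoryless policy cannot solve it.

```cpp
// 1D sketch of a flag-style memory environment (illustrative only, not the
// RLtools implementation). Two target positions are observable only in the
// very first observation; afterwards the agent sees just its own position and
// velocity and must remember the flags internally.
#include <array>
#include <cmath>
#include <random>

struct FlagEnvSketch {
    static constexpr double DT = 0.05, TOL = 0.1;
    std::array<double, 2> flags{};   // positions to visit, in order
    double position = 0, velocity = 0;
    int next_flag = 0, step_count = 0;

    void reset(std::mt19937& rng) {
        std::uniform_real_distribution<double> dist(-1.0, 1.0);
        flags = {dist(rng), dist(rng)};
        position = velocity = 0;
        next_flag = step_count = 0;
    }
    // Observation: [position, velocity, flag0, flag1]; the flag entries are
    // zeroed out after the first step, which is what forces memorization.
    std::array<double, 4> observe() const {
        const bool initial = (step_count == 0);
        return {position, velocity, initial ? flags[0] : 0.0, initial ? flags[1] : 0.0};
    }
    // Double-integrator dynamics; reward for reaching the flags in order.
    double step(double action) {
        velocity += action * DT;
        position += velocity * DT;
        step_count++;
        if (next_flag < 2 && std::abs(position - flags[next_flag]) < TOL) {
            next_flag++;
            return 1.0;
        }
        return 0.0;
    }
    bool terminated() const { return next_flag == 2; }
};
```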
Adding inference utils: We have added common inference utilities in include/rl_tools/inference that can, among other things, expose a pure C interface (e.g., for microcontroller integrations). The general pattern is sketched below.
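As an illustration of the pattern only: a C++ policy can be wrapped behind an extern "C" boundary so it is callable from C firmware. The type and function names below (rl_policy, rl_policy_create, etc.) are hypothetical stand-ins, not the actual API in include/rl_tools/inference.

```cpp
// Hypothetical sketch of exposing a C++ policy through a pure C interface
// (names are illustrative, not the actual include/rl_tools/inference API).
#include <stddef.h>
#include <new>

namespace sketch {
    // Stand-in for a trained policy; a real integration would wrap the
    // exported RLtools model here.
    struct Policy {
        void evaluate(const float* observation, size_t obs_dim, float* action, size_t action_dim) const {
            for (size_t i = 0; i < action_dim; i++) {
                // Trivial placeholder mapping instead of a real forward pass.
                action[i] = obs_dim > 0 ? observation[i % obs_dim] : 0.0f;
            }
        }
    };
}

extern "C" {
    // Opaque handle: C callers never see C++ types.
    typedef struct rl_policy rl_policy;

    rl_policy* rl_policy_create(void) {
        return reinterpret_cast<rl_policy*>(new (std::nothrow) sketch::Policy());
    }
    // Fills `action` (length `action_dim`) from `observation` (length `obs_dim`).
    void rl_policy_evaluate(rl_policy* handle, const float* observation, size_t obs_dim, float* action, size_t action_dim) {
        reinterpret_cast<sketch::Policy*>(handle)->evaluate(observation, obs_dim, action, action_dim);
    }
    void rl_policy_destroy(rl_policy* handle) {
        delete reinterpret_cast<sketch::Policy*>(handle);
    }
}
```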
Full CUDA training: We have revived the full on-GPU training; it is tracked here. It supports full CUDA graph capture, which means one loop step corresponds to one graph execution (see the sketch below).
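For readers unfamiliar with CUDA graphs, the following is a generic, self-contained illustration of stream capture and replay, not the RLtools training loop: work issued between BeginCapture and EndCapture is recorded into a graph, which is then replayed with a single launch per iteration. A cudaMemsetAsync stands in for the kernels of one training loop step, and error checking is omitted for brevity.

```cpp
// Generic CUDA graph capture illustration (not the RLtools training loop).
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    float* buffer = nullptr;
    cudaMalloc(&buffer, 1024 * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Record one "loop step" worth of work into a graph.
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    cudaMemsetAsync(buffer, 0, 1024 * sizeof(float), stream); // placeholder work
    cudaGraph_t graph;
    cudaStreamEndCapture(stream, &graph);

    cudaGraphExec_t graph_exec;
    // CUDA 10/11 signature; on CUDA 12+ use cudaGraphInstantiate(&graph_exec, graph, 0).
    cudaGraphInstantiate(&graph_exec, graph, nullptr, nullptr, 0);

    // "1 loop step = 1 graph execution": each iteration is a single launch.
    for (int step = 0; step < 100; step++) {
        cudaGraphLaunch(graph_exec, stream);
    }
    cudaStreamSynchronize(stream);
    std::printf("replayed 100 captured steps\n");

    cudaGraphExecDestroy(graph_exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(buffer);
    return 0;
}
```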
L2F: The L2F simulator has been modularized and its structure has been improved.