o3p is a JAX-based library for offline and online off-policy reinforcement learning.
It is currently in BETA VERSION.
Install from source
git clone https://github.com/perrin-isir/o3p.git
We recommend creating a Python environment with micromamba, but any Python package manager can be used instead.
cd o3p
micromamba create --name o3penv --file environment.yaml
micromamba activate o3penv
pip install -e .
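If the editable install succeeded, the package should be importable from the new environment. A minimal sanity check, assuming the importable name matches the repository name (o3p):
import o3p  # assumed import name, matching the repository name
print(o3p.__file__)  # should point into the cloned o3p directory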
-
About JAX: JAX is in the dependencies, so the procedure above will install it on your system. However, if you encounter issues specific to JAX (e.g. it runs on your CPU instead of your GPU), we recommend installing it separately, following the instructions at: https://docs.jax.dev/en/latest/installation.html#installation.
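To check which devices JAX actually sees (standard JAX API, nothing o3p-specific):
import jax
print(jax.devices())          # e.g. [CudaDevice(id=0)] on a GPU machine, [CpuDevice(id=0)] on CPU only
print(jax.default_backend())  # "gpu", "tpu", or "cpu"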
-
About TFP: currently, the latest stable version of TensorFlow Probability (TFP) is not compatible with the latest version of JAX. Therefore, you should upgrade to a nightly build with this command (within your new o3penv environment):
pip install --upgrade --user tf-nightly tfp-nightly
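To verify that the nightly build works with your JAX installation, you can import TFP's JAX substrate and sample from a simple distribution (standard TFP API, shown here only as a quick check):
import jax
import tensorflow_probability.substrates.jax as tfp
# Sampling on the JAX substrate requires an explicit PRNG key.
dist = tfp.distributions.Normal(loc=0.0, scale=1.0)
print(dist.sample(seed=jax.random.PRNGKey(0)))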
How to use it
To test offline RL, run:
python test/offline_rl.py
TODO
To test online RL, run:
python test/online_rl.py
Design choices
TODO