This repository is no longer maintained. Please use our new Softlearning package instead.
Soft actor-critic is a deep reinforcement learning framework for training maximum entropy policies in continuous domains. The algorithm is based on the paper Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor presented at ICML 2018.
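For reference, the maximum-entropy objective that soft actor-critic optimizes (notation follows the paper: ρ_π denotes the state-action marginal induced by the policy, H is entropy, and the temperature α trades off reward against entropy):

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(\mathbf{s}_t, \mathbf{a}_t) \sim \rho_\pi}
         \bigl[ r(\mathbf{s}_t, \mathbf{a}_t)
              + \alpha \, \mathcal{H}\bigl(\pi(\cdot \mid \mathbf{s}_t)\bigr) \bigr]
```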
The repository now ships with a self-contained PyTorch implementation (torch_sac) that targets Python 3.10 and runs on Windows and Linux alike. It removes the old rllab and TensorFlow dependencies and was tested with the MuJoCo-based Walker2d benchmark.
- Ensure you have a working MuJoCo installation (Gymnasium's `mujoco` extra installs the precompiled binaries) and the Microsoft Visual C++ Redistributable if it is not already available on your machine.
- Create and activate a Python 3.10 virtual environment:

py -3.10 -m venv .venv
.venv\Scripts\activate
python -m pip install --upgrade pip
- Install the lightweight dependency set:

pip install -r requirements-windows.txt
- Launch training (Walker2d by default):

python examples/torch_train.py --env-id Walker2d-v4 --total-steps 1000000

Add `--device cuda` if you have a CUDA-capable GPU with a matching PyTorch build.
Training artefacts are written under runs/torch_sac/<env>_<timestamp>_seed<seed> and include progress.csv, periodic checkpoints, and the final policy weights. Command line flags mirror the fields in torch_sac.TrainConfig (batch size, entropy tuning, evaluation cadence, etc.) so experiments can be scripted without editing the codebase. You can also use the package programmatically:
from torch_sac import TrainConfig, train
config = TrainConfig(env_id="Walker2d-v4", total_steps=200_000, log_dir="runs/walker")
run_dir = train(config)
print(f"results stored in {run_dir}")

The legacy TensorFlow implementation and documentation remain below for archival purposes.
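The progress.csv mentioned above can be inspected with nothing but the standard library. A minimal sketch, assuming a column layout like the one below (the column names `step` and `eval_return` are illustrative assumptions, not confirmed fields of torch_sac's logger):

```python
# Sketch: read the last logged value from a torch_sac-style progress.csv.
# Column names here are assumptions for illustration; check your file's header.
import csv
from pathlib import Path
from tempfile import TemporaryDirectory

def last_eval_return(progress_csv: Path, column: str = "eval_return") -> float:
    """Return the value of `column` from the final row of a progress.csv."""
    with progress_csv.open(newline="") as f:
        rows = list(csv.DictReader(f))
    return float(rows[-1][column])

# Demonstrate on a synthetic file so the sketch is self-contained.
with TemporaryDirectory() as d:
    path = Path(d) / "progress.csv"
    path.write_text("step,eval_return\n10000,312.5\n20000,640.2\n")
    print(last_eval_return(path))  # 640.2
```

The same pattern works for any numeric column, e.g. `last_eval_return(path, "step")`.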
This implementation uses TensorFlow. For a PyTorch implementation of soft actor-critic, take a look at rlkit by Vitchyr Pong.
See the DIAYN documentation for using SAC for learning diverse skills.
Soft Actor-Critic can be run either locally or through Docker.
You will need to have Docker and Docker Compose installed unless you want to run the environment locally.
Most of the models require a Mujoco license.
If you want to run the Mujoco environments, the docker environment needs to know where to find your Mujoco license key (mjkey.txt). You can either copy your key into <PATH_TO_THIS_REPOSITORY>/.mujoco/mjkey.txt, or you can specify the path to the key in your environment variables:
export MUJOCO_LICENSE_PATH=<path_to_mujoco>/mjkey.txt
Once that's done, you can run the Docker container with
docker-compose up
Docker compose creates a Docker container named soft-actor-critic and automatically sets the needed environment variables and volumes.
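The compose file behaves roughly like the following sketch; the service name, image, and mount paths below are assumptions for illustration, and the repository's own docker-compose.yml is authoritative:

```yaml
# Illustrative sketch only -- see the repository's docker-compose.yml.
version: "2"
services:
  soft-actor-critic:
    build: .
    container_name: soft-actor-critic
    environment:
      # Forwarded so MuJoCo can locate the license inside the container.
      - MUJOCO_LICENSE_PATH=${MUJOCO_LICENSE_PATH}
    volumes:
      # Mount the license key read-only into the container.
      - ${MUJOCO_LICENSE_PATH}:/root/.mujoco/mjkey.txt:ro
```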
You can access the container with the typical Docker exec-command, i.e.
docker exec -it soft-actor-critic bash
See the examples section for how to train and simulate the agents.
To clean up the setup:
docker-compose down
To get the environment installed correctly, you will first need to clone rllab, and have its path added to your PYTHONPATH environment variable.
- Clone rllab
cd <installation_path_of_your_choice>
git clone https://github.com/rll/rllab.git
cd rllab
git checkout b3a28992eca103cab3cb58363dd7a4bb07f250a0
export PYTHONPATH=$(pwd):${PYTHONPATH}
- Download and copy mujoco files to rllab path:
If you're running on OSX, download https://www.roboti.us/download/mjpro131_osx.zip instead, and copy the .dylib files instead of the .so files.
mkdir -p /tmp/mujoco_tmp && cd /tmp/mujoco_tmp
wget -P . https://www.roboti.us/download/mjpro131_linux.zip
unzip mjpro131_linux.zip
mkdir <installation_path_of_your_choice>/rllab/vendor/mujoco
cp ./mjpro131/bin/libmujoco131.so <installation_path_of_your_choice>/rllab/vendor/mujoco
cp ./mjpro131/bin/libglfw.so.3 <installation_path_of_your_choice>/rllab/vendor/mujoco
cd ..
rm -rf /tmp/mujoco_tmp
- Copy your Mujoco license key (mjkey.txt) to rllab path:
cp <mujoco_key_folder>/mjkey.txt <installation_path_of_your_choice>/rllab/vendor/mujoco
- Clone sac
cd <installation_path_of_your_choice>
git clone https://github.com/haarnoja/sac.git
cd sac
- Create and activate conda environment
cd sac
conda env create -f environment.yml
source activate sac
The environment should be ready to run. See the examples section for how to train and simulate the agents.
Finally, to deactivate and remove the conda environment:
source deactivate
conda remove --name sac --all
- To train the agent
python ./examples/mujoco_all_sac.py --env=swimmer --log_dir="/root/sac/data/swimmer-experiment"
- To simulate the agent (NOTE: This step currently fails with the Docker installation, due to missing display.)
python ./scripts/sim_policy.py /root/sac/data/swimmer-experiment/itr_<iteration>.pkl
mujoco_all_sac.py contains several different environments, and there are more example scripts available in the /examples folder. For more information about the agents and configurations, run the scripts with the --help flag. For example:
python ./examples/mujoco_all_sac.py --help
usage: mujoco_all_sac.py [-h]
[--env {ant,walker,swimmer,half-cheetah,humanoid,hopper}]
[--exp_name EXP_NAME] [--mode MODE]
[--log_dir LOG_DIR]
Benchmark results for some of the OpenAI Gym v2 environments can be found here.
The soft actor-critic algorithm was developed by Tuomas Haarnoja under the supervision of Prof. Sergey Levine and Prof. Pieter Abbeel at UC Berkeley. Special thanks to Vitchyr Pong, who wrote some parts of the code, and Kristian Hartikainen who helped testing, documenting, and polishing the code and streamlining the installation process. The work was supported by Berkeley Deep Drive.
@inproceedings{haarnoja2017soft,
title={Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor},
author={Haarnoja, Tuomas and Zhou, Aurick and Abbeel, Pieter and Levine, Sergey},
booktitle={Deep Reinforcement Learning Symposium},
year={2017}
}