Malte Mosbach*, Jan Niklas Ewertz*, Angel Villar-Corrales, Sven Behnke
Slot-Attention for Object-centric Latent Dynamics (SOLD) is a model-based reinforcement learning algorithm whose world model operates on a structured, object-centric latent representation.
Start by installing the multi-object-fetch environment suite. Then add the SOLD dependencies to the conda environment by running:
```bash
conda env update -n mof -f apptainer/environment.yml
```

Alternatively, we provide an Apptainer build file to simplify installation.
To build the .sif image, run:
```bash
cd apptainer && apptainer build sold.sif multi_object_fetch.def
```

To start a training run inside the container:
```bash
apptainer run --nv ../sold.sif python train_sold.py
```

> **Note**
> If you're on a SLURM cluster, you can submit training jobs using this container with the provided run script: `sbatch slurm.sh train_sold.py`.
The training routine consists of two stages: pre-training a SAVi model and training a SOLD model on top of it.
The SAVi models (or autoencoders in general) are pre-trained on static datasets of random trajectories. Such a dataset can be generated using the following script:
```bash
python generate_dataset.py experiment=my_dataset env.name=ReachRed_0to4Distractors_Dense-v1
```

To train a SAVi model, specify the dataset to train on and model parameters, such as the number of slots, in `train_autoencoder.yaml`.
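The exact schema is defined by the config files in the repository; the following is only an illustrative sketch of the kinds of settings involved (all key names and values here are hypothetical, not the actual `train_autoencoder.yaml` schema):

```yaml
# Hypothetical sketch of train_autoencoder.yaml -- key names are illustrative,
# not the repository's actual schema.
dataset: my_dataset     # the pre-generated dataset to train on
model:
  num_slots: 7          # e.g., one slot per object plus one for the background
  slot_dim: 128         # dimensionality of each slot representation
training:
  batch_size: 32
  num_iterations: 100000
```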
Then run:

```bash
python train_autoencoder.py experiment=my_savi_model
```
Good SAVi models should learn to split the scene into meaningful objects and keep slots assigned to the same object over time. Examples of SAVi models pre-trained for a reaching and a picking task are shown below.

To train SOLD, a checkpoint path to the pre-trained SAVi model is required, which can be specified in the `train_sold.yaml` configuration file.
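For instance, the relevant entry might look like the following sketch (the key name is hypothetical; consult `train_sold.yaml` for the actual schema):

```yaml
# Hypothetical sketch -- the actual key name in train_sold.yaml may differ.
savi_checkpoint: PATH_TO_SAVI_CHECKPOINT  # e.g., a checkpoint produced by train_autoencoder.py
```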
Then, to start the training, run:
```bash
python train_sold.py
```

All results are stored in the `experiments` directory.
When training a SOLD model, you can inspect several visualizations to monitor training progress. The `dynamics_prediction` plot highlights the differences between the ground-truth and predicted future states, and shows the forward prediction of each slot. In addition, visualizations of `actor_attention` or `reward_predictor_attention`, as shown below, can be used to understand what the model attends to when predicting the current reward, i.e., which elements of the scene the model considers reward-predictive.

For further evaluation of a trained model, or of a set of models in a directory, run:
```bash
python evaluate_sold.py checkpoint_path=PATH_TO_CHECKPOINT(S)
```

This will log performance metrics and visualizations for the given checkpoints.
Pre-trained SAVi and SOLD models are available in the `checkpoints` directory.
The SAVi checkpoints can be used to begin training SOLD models right away.
Each checkpoint also includes corresponding TensorBoard logs, allowing you to visualize the expected training dynamics:
```bash
tensorboard --logdir checkpoints
```

If you find this work useful, please consider citing our paper as follows:
```bibtex
@inproceedings{sold2025mosbach,
  title={SOLD: Slot Object-Centric Latent Dynamics Models for Relational Manipulation Learning from Pixels},
  author={Malte Mosbach and Jan Niklas Ewertz and Angel Villar-Corrales and Sven Behnke},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2025}
}
```