SOLD: Slot-Attention for Object-centric Latent Dynamics

AIS, University of Bonn

Malte Mosbach*, Jan Niklas Ewertz*, Angel Villar-Corrales, Sven Behnke

[Paper]   [Website]   [BibTeX]

Slot-Attention for Object-centric Latent Dynamics (SOLD) is a model-based reinforcement learning algorithm that operates on a structured, object-centric latent representation in its world model.

Figure: SOLD overview.

Installation

Conda

Start by installing the multi-object-fetch environment suite. Then add the SOLD dependencies to the conda environment by running:

conda env update -n mof -f apptainer/environment.yml
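
Once the update completes, you can activate the environment and run a quick sanity check. The PyTorch import below is an assumption about the dependency set; substitute whichever package environment.yml actually pins:

conda activate mof
# assumes PyTorch is among the installed dependencies; adjust as needed
python -c "import torch; print(torch.cuda.is_available())"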

Apptainer

Alternatively, we provide an Apptainer build file to simplify installation. To build the .sif image, run:

cd apptainer && apptainer build sold.sif multi_object_fetch.def

To start a training run inside the container:

apptainer run --nv ../sold.sif python train_sold.py

Note

If you're on a SLURM cluster, you can submit training jobs using this container with the provided run script:

sbatch slurm.sh train_sold.py
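
Assuming slurm.sh forwards its trailing arguments to the Python script (worth confirming in the script itself), Hydra-style overrides can be appended in the same way; the experiment name here is purely illustrative:

# illustrative only; requires slurm.sh to pass extra arguments through
sbatch slurm.sh train_sold.py experiment=my_sold_run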

Training

The training routine consists of two stages: pre-training a SAVi model and training a SOLD model on top of it.

Pre-training a SAVi model

SAVi models (and object-centric autoencoders more generally) are pre-trained on static datasets of random trajectories. Such datasets can be generated using the following script:

python generate_dataset.py experiment=my_dataset env.name=ReachRed_0to4Distractors_Dense-v1
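
The env.name override selects the task. For the picking task referenced in the sample results below, a command might look like the following, where the environment ID is a guess extrapolated from the naming pattern above; check the multi-object-fetch suite for the registered names:

# hypothetical environment ID, modeled on ReachRed_0to4Distractors_Dense-v1
python generate_dataset.py experiment=my_pick_dataset env.name=PickRed_0to4Distractors_Dense-v1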

To train a SAVi model, specify the dataset to be trained on and model parameters such as the number of slots in train_autoencoder.yaml and run:

python train_autoencoder.py experiment=my_savi_model
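
For orientation, the relevant entries in train_autoencoder.yaml might look like the sketch below; the key names and values are assumptions, so defer to the actual file:

# sketch only; key names are assumed, not taken from the repository
dataset: my_dataset   # dataset generated by generate_dataset.py above
num_slots: 8          # number of slots (objects plus background) the model can represent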
Sample pre-training results

Good SAVi models should learn to split the scene into meaningful objects and keep slots assigned to the same object over time. Examples of SAVi models pre-trained for reaching and picking tasks are shown below.

Training a SOLD model

To train SOLD, a checkpoint path to the pre-trained SAVi model is required, which can be specified in the train_sold.yaml configuration file. Then, to start the training, run:

python train_sold.py
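
For illustration, the checkpoint entry in train_sold.yaml might look like the line below; the key name and path layout are assumptions (see the note on the experiments directory that follows):

# sketch only; the actual key name and checkpoint filename may differ
savi_checkpoint_path: experiments/my_savi_model/checkpoints/final.pth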

All results are stored in the experiments directory.

Sample training outputs

When training a SOLD model, you can inspect several visualizations to monitor training progress. The dynamics_prediction plot highlights the differences between the ground-truth and predicted future states, and shows the forward prediction of each slot.

In addition, visualizations of actor_attention or reward_predictor_attention, as shown below, can be used to understand what the model attends to when selecting actions or predicting the current reward, e.g. which elements of the scene the reward predictor considers reward-predictive.

For further evaluation of a trained model or a set of models in a directory, you can run:

python evaluate_sold.py checkpoint_path=PATH_TO_CHECKPOINT(S)

which will log performance metrics and visualizations for the given checkpoints.
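
Since the script accepts either a single checkpoint or a directory of models, you can, for example, point it at the provided checkpoints directory described below:

python evaluate_sold.py checkpoint_path=checkpoints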

Checkpoints

Pre-trained SAVi and SOLD models are available in the checkpoints directory. The SAVi checkpoints can be used to begin training SOLD models right away. Each checkpoint also includes corresponding TensorBoard logs, allowing you to visualize the expected training dynamics:

tensorboard --logdir checkpoints
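
TensorBoard serves on http://localhost:6006 by default; open it in a browser to compare your own runs against the provided logs.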

Citation

If you find this work useful, please consider citing our paper as follows:

@inproceedings{sold2025mosbach,
  title={SOLD: Slot Object-Centric Latent Dynamics Models for Relational Manipulation Learning from Pixels},
  author={Malte Mosbach and Jan Niklas Ewertz and Angel Villar-Corrales and Sven Behnke},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2025}
}
