Robotic manipulation remains a core challenge in robotics, particularly for contact-rich tasks such as industrial assembly and disassembly. Existing datasets have significantly advanced learning in manipulation but are primarily focused on simpler tasks like object rearrangement, falling short of capturing the complexity and physical dynamics involved in assembly and disassembly. To bridge this gap, we present REASSEMBLE (Robotic assEmbly disASSEMBLy datasEt), a new dataset designed specifically for contact-rich manipulation tasks. Built around the NIST Assembly Task Board 1 benchmark, REASSEMBLE includes four actions (pick, insert, remove, and place) involving 17 objects. The dataset contains 4,551 demonstrations, of which 4,035 were successful, spanning a total of 781 minutes. Our dataset features multi-modal sensor data including event cameras, force-torque sensors, microphones, and multi-view RGB cameras. This diverse dataset supports research in areas such as learning contact-rich manipulation, task condition identification, action segmentation, and more. We believe REASSEMBLE will be a valuable resource for advancing robotic manipulation in complex, real-world scenarios.
- Multimodality: REASSEMBLE contains data from robot proprioception, RGB cameras, force-torque sensors, microphones, and event cameras.
- Multitask labels: REASSEMBLE includes labels that enable research in temporal action segmentation, motion policy learning, anomaly detection, and task inversion.
- Long horizon: Demonstrations in the REASSEMBLE dataset cover long-horizon tasks and actions, which usually span multiple steps.
- Hierarchical labels: REASSEMBLE contains action segmentation labels at two hierarchical levels.
List of all the prerequisites required to use the project:
Python 3.10+
conda (recommended)
Step-by-step guide on how to install the project for utilizing the dataset.
# Clone the repository
git clone https://github.com/TUWIEN-ASL/REASSEMBLE.git
# Navigate to the project directory
cd REASSEMBLE
# Create and activate a conda environment (optional but recommended)
conda create -n REASSEMBLE python=3.10
conda activate REASSEMBLE
# Install dependencies
pip install -r requirements.txt
conda install conda-forge::ffmpeg
# Install REASSEMBLE package
pip install -e .
The dataset can be downloaded from the following link: TUData
Scripts are parametrized with argparse, so when in doubt you can always add the -h flag to get a description of the parameters.
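After installation, a quick sanity check can confirm that the interpreter and ffmpeg match the steps above. The following is a minimal sketch, not part of the repository: `check_environment` is a hypothetical helper name, and it only tests the Python version and whether an executable is on the PATH.

```python
import shutil
import sys


def check_environment(min_python=(3, 10), tools=("ffmpeg",)):
    """Return a dict mapping each requirement to True (met) or False."""
    report = {"python>=%d.%d" % min_python: sys.version_info >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it inside the activated conda environment so that the ffmpeg installed from conda-forge is the one being detected.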
Example for running the dataset conversion:
python scripts/conversions/h5_to_rlds.py --input_dir data/REASSEMBLE --output_dir data/ds_rlds --max_files 2
Example for running visualization:
python scripts/visualization/vizualize_data.py data/REASSEMBLE_corrected/2025-01-13-09-43-29.h5 --cleanup
Step-by-step guide on how to install the project for running the teleoperation script, the custom controllers or recording new data.
First, some environment variables must be set in the .docker/.env file. Avoid whitespace between the variable name and the value.
- ROBOT_IP refers to the robotic arm
- ROBOT_TYPE is either fr3 or panda
- NATNET_IP refers to the motion capture system
- ROBOT_SERVER_IP is the main PC the recording script is running on
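The rules above (four variables, no whitespace around the equals sign, ROBOT_TYPE limited to fr3 or panda) can be checked before starting the containers. This is a hedged sketch rather than a tool shipped with the project: `check_env_text` is a hypothetical helper, and it assumes the file uses plain `KEY=value` lines with `#` comments.

```python
REQUIRED_KEYS = ("ROBOT_IP", "ROBOT_TYPE", "NATNET_IP", "ROBOT_SERVER_IP")
ALLOWED_ROBOT_TYPES = ("fr3", "panda")


def check_env_text(text):
    """Return a list of human-readable problems found in .env content."""
    problems, values = [], {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=" not in line:
            problems.append(f"line {lineno}: expected KEY=value")
            continue
        key, value = line.split("=", 1)
        # Whitespace on either side of '=' is what the README warns against.
        if key != key.rstrip() or value != value.lstrip():
            problems.append(f"line {lineno}: whitespace around '='")
        values[key.strip()] = value.strip()
    for key in REQUIRED_KEYS:
        if not values.get(key):
            problems.append(f"missing or empty: {key}")
    robot_type = values.get("ROBOT_TYPE")
    if robot_type and robot_type not in ALLOWED_ROBOT_TYPES:
        problems.append(f"ROBOT_TYPE must be one of {ALLOWED_ROBOT_TYPES}")
    return problems
```

A typical call would be `check_env_text(open(".docker/.env").read())`; an empty list means none of these checks failed.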
Set up your devices (if needed) in .docker/docker-compose.yml. Then, run the following commands in a terminal:
# Download NatNet SDK (required for the motion capture system)
sh catkin_reas/src/natnet_ros_cpp/install_sdk.sh
# Enable X11 forwarding for GUIs
xhost +local:docker
# Run docker compose (default container name: reassemble)
cd .docker
docker compose up --build
Open new interactive bash terminals in the Docker container for running scripts. At least three are needed for teleoperation.
docker exec -it reassemble bash
The Docker build installs all required dependencies, drivers, and libraries. Note that it also replaces certain files in the workspace source directory, listed below:
- catkin_reas/src/panda_moveit_config/config/joint_limits.yaml (custom joint limits for calibration)
- asl_libs/EciLinux_amd64/inc/OsEci.h (some overriding methods that break the script must be commented out)
- catkin_reas/src/haptic_ros/lib/libdhd.so.3 (haptic driver library file is copied here after installation)
The workspace is built automatically at the end, followed by installation of the Python packages required for recording new data. Building the Docker image sometimes fails because the GPG key times out; rerunning the build usually fixes this error.
- Move the robot to the starting position. Make sure that the FCI of the robot is activated.
roslaunch franka_example_controllers move_to_start.launch robot_ip:=$ROBOT_IP
- Start the force-torque sensor and run the calibration script in separate terminals. After the calibration finishes, stop both scripts. If communication with the sensor breaks at some point, unplug it, then start again.
roslaunch aidin_ros FT_AIDIN.launch
roslaunch reassemble_ft_calib franka_calib.launch robot_ip:=$ROBOT_IP
- Start the haptic device, the controller and the calibrated force-torque sensor (in this order, using separate terminals).
roslaunch reassemble_haptic HapticDevice.launch force:=true centering:=false
roslaunch reassemble_haptic TeleopFranka.launch use_sim:=false arm_id:=$ROBOT_TYPE robot_ip:=$ROBOT_IP
roslaunch aidin_ros ft_calibrated.launch arm_id:=$ROBOT_TYPE
- Start the motion capture system.
roslaunch natnet_ros_cpp natnet_ros.launch serverIP:=$NATNET_IP clientIP:=$ROBOT_SERVER_IP
- You can begin recording the data now. The exact topics to record can be set up in the launch file below.
roslaunch record_teleop record.launch base_path:=/root/data
Since no calibration is needed in simulation, teleoperation can be started immediately.
roslaunch reassemble_haptic TeleopFranka.launch use_sim:=true arm_id:=$ROBOT_TYPE robot_ip:=$ROBOT_IP controller:=cartesian_impedance_example_controller
You can test the microphones and cameras using the launch files below.
roslaunch record_teleop test_mic.launch
roslaunch record_teleop test_cam.launch
roslaunch realsense2_camera rs_camera.launch
It is also possible to launch the controller without teleoperation. Default Franka controllers are available in the franka_example_controllers package. Our controllers can be fine-tuned via dynamic reconfigure: run rqt and select Plugins/Configuration/Dynamic Reconfigure. The joint limits for calibration can be modified in asl_libs/joint_limits.yaml before building the Docker image, and in catkin_reas/src/panda_moveit_config/config/joint_limits.yaml inside the Docker container.
roslaunch reassemble_controllers cartesian_impedance_controller_damping_ratio.launch robot_ip:=$ROBOT_IP robot:=$ROBOT_TYPE
The dataset consists of several HDF5 (.h5) and JSON (.json) files, organized into two directories. The poses directory contains the JSON files, which store the poses of the cameras and the board in the world coordinate frame. The data directory contains the HDF5 files, which store the sensory readings and annotations collected as part of the REASSEMBLE dataset. Each JSON file can be matched with its corresponding HDF5 file based on their filenames, which include the timestamp when the data was recorded. For example, 2025-01-09-13-59-54_poses.json corresponds to 2025-01-09-13-59-54.h5.
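The filename matching described above can be sketched in a few lines. This is an illustration only: `poses_path_for` and `pair_recordings` are hypothetical helper names, and the directory arguments are whatever local paths hold the data and poses directories.

```python
from pathlib import Path


def poses_path_for(h5_path, poses_dir):
    """Map <timestamp>.h5 to <poses_dir>/<timestamp>_poses.json."""
    stem = Path(h5_path).stem  # e.g. "2025-01-09-13-59-54"
    return Path(poses_dir) / f"{stem}_poses.json"


def pair_recordings(data_dir, poses_dir):
    """Yield (h5_file, poses_file) pairs; poses_file is None if absent."""
    for h5_file in sorted(Path(data_dir).glob("*.h5")):
        candidate = poses_path_for(h5_file, poses_dir)
        yield h5_file, (candidate if candidate.exists() else None)
```

Returning `None` for a missing pose file, instead of raising, lets a caller decide whether a recording without poses is still usable.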
The structure of the JSON files is as follows:
{
  "Hama1": [
    [x, y, z],
    [qx, qy, qz, qw]
  ],
  "Hama2": [
    [x, y, z],
    [qx, qy, qz, qw]
  ],
  "DAVIS346": [
    [x, y, z],
    [qx, qy, qz, qw]
  ],
  "NIST_Board1": [
    [x, y, z],
    [qx, qy, qz, qw]
  ]
}
[x, y, z] represent the position of the object, and [qx, qy, qz, qw] represent its orientation as a quaternion.
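To use these poses for geometry, the quaternion usually has to be turned into a rotation matrix. The sketch below uses only the standard library and assumes the component order is [qx, qy, qz, qw] exactly as listed above; `quat_to_matrix` and `load_poses` are hypothetical helper names, and the sample values in the usage note are made up.

```python
import json
import math


def quat_to_matrix(qx, qy, qz, qw):
    """Convert a quaternion (x, y, z, w order) to a 3x3 rotation matrix."""
    n = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    if n == 0.0:
        raise ValueError("zero-norm quaternion")
    # Normalize first so slightly non-unit quaternions still give a rotation.
    x, y, z, w = qx / n, qy / n, qz / n, qw / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def load_poses(text):
    """Parse pose JSON text into {name: (position, rotation_matrix)}."""
    poses = {}
    for name, (position, quaternion) in json.loads(text).items():
        poses[name] = (list(position), quat_to_matrix(*quaternion))
    return poses
```

For example, `load_poses('{"NIST_Board1": [[0.5, 0.0, 0.1], [0.0, 0.0, 0.0, 1.0]]}')` returns the board position together with an identity rotation. Note that the `pose` dataset in the HDF5 files lists the quaternion as (qw, qx, qy, qz), so its components would need reordering before calling `quat_to_matrix`.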
The HDF5 (.h5) format organizes data into two main types of structures: datasets, which hold the actual data, and groups, which act like folders that can contain datasets or other groups. In the diagram below, groups are shown as folder icons, and datasets as file icons. The main group of the file directly contains the video, audio, and event data. To save memory, video and audio are stored as encoded byte strings, while event data is stored as arrays. The robot's proprioceptive information is kept in the robot_state group as arrays. Because different sensors record data at different rates, the arrays vary in length (signified by the N_xxx variable in the data shapes). To align the sensory data, each sensor's timestamps are stored separately in the timestamps group. Information about action segments is stored in the segments_info group. Each segment is saved as a subgroup, named according to its order in the demonstration, and includes a start timestamp, end timestamp, a success indicator, and a natural language description of the action. Within each segment, low-level skills are organized under a low_level subgroup, following the same structure as the high-level annotations.
📁 <date_time>.h5
├──📄 hama1 - mp4 encoded video
├──📄 hama1_audio - mp3 encoded audio
├──📄 hama2 - mp4 encoded video
├──📄 hama2_audio - mp3 encoded audio
├──📄 hand - mp4 encoded video
├──📄 hand_audio - mp3 encoded audio
├──📄 capture_node - mp4 encoded video (Event camera)
├──📄 events - N_events x 3 (x, y, polarity)
├──📁 robot_state
│   ├──📄 compensated_base_force - N_bf x 3 (x, y, z)
│   ├──📄 compenseted_base_torque - N_bt x 3 (x, y, z)
│   ├──📄 gripper_positions - N_grip x 2 (left, right)
│   ├──📄 joint_efforts - N_je x 7 (one for each joint)
│   ├──📄 joint_positions - N_jp x 7 (one for each joint)
│   ├──📄 joint_velocities - N_jv x 7 (one for each joint)
│   ├──📄 measured_force - N_mf x 3 (x, y, z)
│   ├──📄 measured_torque - N_mt x 3 (x, y, z)
│   ├──📄 pose - N_poses x 7 (x, y, z, qw, qx, qy, qz)
│   └──📄 velocity - N_vels x 6 (x, y, z, ω, γ, θ)
├──📁 timestamps
│   ├──📄 hama1 - N_hama1 x 1
│   ├──📄 hama2 - N_hama2 x 1
│   ├──📄 hand - N_hand x 1
│   ├──📄 capture_node - N_capture x 1
│   ├──📄 events - N_events x 1
│   ├──📄 compensated_base_force - N_bf x 1
│   ├──📄 compenseted_base_torque - N_bt x 1
│   ├──📄 gripper_positions - N_grip x 1
│   ├──📄 joint_efforts - N_je x 1
│   ├──📄 joint_positions - N_jp x 1
│   ├──📄 joint_velocities - N_jv x 1
│   ├──📄 measured_force - N_mf x 1
│   ├──📄 measured_torque - N_mt x 1
│   ├──📄 pose - N_poses x 1
│   └──📄 velocity - N_vels x 1
└──📁 segments_info
    ├──📁 0
    │   ├──📄 start - scalar
    │   ├──📄 end - scalar
    │   ├──📄 success - Boolean
    │   ├──📄 text - scalar
    │   └──📁 Low_level
    │       ├──📁 0
    │       │   ├──📄 start - scalar
    │       │   ├──📄 end - scalar
    │       │   ├──📄 success - Boolean
    │       │   └──📄 text - scalar
    │       └──📁 1
    │           ⋮
    └──📁 1
        ⋮
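Because every sensor has its own timestamp array, two operations come up repeatedly: cutting a stream down to one annotated segment, and finding the sample of one sensor closest in time to a sample of another. The sketch below shows both with the standard `bisect` module. It is not the repository's loader: the function names are hypothetical, and it assumes each timestamp sequence is one-dimensional and sorted in increasing order.

```python
from bisect import bisect_left, bisect_right


def segment_slice(timestamps, start, end):
    """Return (lo, hi) so that timestamps[lo:hi] lies within [start, end]."""
    return bisect_left(timestamps, start), bisect_right(timestamps, end)


def nearest_indices(reference, queries):
    """For each query time, return the index of the closest reference time."""
    if len(reference) == 0:
        raise ValueError("reference timestamps are empty")
    out = []
    for t in queries:
        i = bisect_left(reference, t)
        if i == 0:
            out.append(0)  # query precedes the first reference sample
        elif i == len(reference):
            out.append(len(reference) - 1)  # query follows the last sample
        else:
            # Pick the closer neighbour; ties go to the earlier sample.
            closer_right = reference[i] - t < t - reference[i - 1]
            out.append(i if closer_right else i - 1)
    return out
```

With h5py, the N x 1 timestamp datasets would first need flattening, for instance via `f["timestamps"]["pose"][:].ravel()`, before being passed in. The `(lo, hi)` pair from `segment_slice` can then index the matching array in `robot_state`, using a segment's `start` and `end` values as the bounds.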
| Recording | Issue |
|---|---|
| 2025-01-10-15-28-50.h5 | hand cam missing at beginning |
| 2025-01-10-16-17-40.h5 | missing hand cam |
| 2025-01-10-17-10-38.h5 | hand cam missing at beginning |
| 2025-01-10-17-54-09.h5 | no empty action at beginning |
| 2025-01-11-14-22-09.h5 | no empty action at beginning |
| 2025-01-11-14-45-48.h5 | F/T not valid for last action |
| 2025-01-11-15-27-19.h5 | F/T not valid for last action |
| 2025-01-11-15-35-08.h5 | F/T not valid for last action |
| 2025-01-13-11-16-17.h5 | gripper broke for last action |
| 2025-01-13-11-18-57.h5 | pose not available for last action |
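When building a training or evaluation split, it can be convenient to separate the recordings listed above from the rest. The following sketch copies the table into a dictionary; `split_by_known_issues` is a hypothetical helper, and whether a flagged recording should be dropped or merely handled with care depends on which modality a given experiment needs.

```python
from pathlib import Path

# Copied from the table above: recording filename -> reported issue.
KNOWN_ISSUES = {
    "2025-01-10-15-28-50.h5": "hand cam missing at beginning",
    "2025-01-10-16-17-40.h5": "missing hand cam",
    "2025-01-10-17-10-38.h5": "hand cam missing at beginning",
    "2025-01-10-17-54-09.h5": "no empty action at beginning",
    "2025-01-11-14-22-09.h5": "no empty action at beginning",
    "2025-01-11-14-45-48.h5": "F/T not valid for last action",
    "2025-01-11-15-27-19.h5": "F/T not valid for last action",
    "2025-01-11-15-35-08.h5": "F/T not valid for last action",
    "2025-01-13-11-16-17.h5": "gripper broke for last action",
    "2025-01-13-11-18-57.h5": "pose not available for last action",
}


def split_by_known_issues(paths, issues=KNOWN_ISSUES):
    """Split paths into (unlisted, flagged); flagged maps path -> issue."""
    unlisted, flagged = [], {}
    for path in paths:
        issue = issues.get(Path(path).name)  # match on the basename only
        if issue is None:
            unlisted.append(path)
        else:
            flagged[path] = issue
    return unlisted, flagged
```

For instance, a pipeline that relies on the wrist camera could skip only the entries whose issue text mentions the hand cam and keep the others.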
This project is licensed under the MIT License - see the LICENSE file for details.
- We would like to thank the people responsible for the DROID dataset and for sharing their codebase. The code structure within this repository is inspired by their structure.
@INPROCEEDINGS{Sliwowski-RSS-25,
AUTHOR = {Daniel Sliwowski AND Shail Jadav AND Sergej Stanovcic AND Jedrzej Orbik AND Johannes Heidersberger AND Dongheui Lee},
TITLE = {{Demonstrating REASSEMBLE: A Multimodal Dataset for Contact-rich Robotic Assembly and Disassembly}},
BOOKTITLE = {Proceedings of Robotics: Science and Systems},
YEAR = {2025},
ADDRESS = {Los Angeles, USA},
MONTH = {June},
DOI = {}
}
Daniel Sliwowski - [email protected]