
M2T2: Multi-Task Masked Transformer for Object-centric Pick and Place


Introduction

M2T2 (Multi-Task Masked Transformer) is a unified transformer model for learning different low-level action primitives in complex open-world scenes. Given a raw point cloud observation, M2T2 reasons about contact points and predicts collision-free gripper poses for different action modes, including 6-DoF object-centric grasping and orientation-aware placement. M2T2 achieves zero-shot sim2real transfer on a real robot, outperforming a baseline system composed of state-of-the-art task-specific models by about 19% in overall performance and by 37.5% in challenging scenes where the object needs to be re-oriented for collision-free placement. M2T2 also achieves state-of-the-art results on a subset of language-conditioned tasks in RLBench. This repository contains a basic PyTorch implementation of M2T2 and demo scripts showing how to run the model on real-world as well as simulated scenes. Videos of real-world robot evaluations are available on our project website.

M2T2 pairs a transformer with a 3D point cloud encoder-decoder to predict contact points and gripper targets for pick-and-place primitives in complex, cluttered environments. It can also optionally be conditioned on a language goal to predict task-specific grasp and placement poses.
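Concretely, each predicted grasp or placement target is a 6-DoF pose, which is conventionally represented as a 4x4 homogeneous transform mapping gripper-frame points into the scene frame. The snippet below is purely illustrative (the pose values are made up and this is not the model's API); it only pins down what a 6-DoF pose looks like in code.

import numpy as np

# A 6-DoF gripper pose as a 4x4 homogeneous transform (illustrative values only):
# the upper-left 3x3 block is the gripper orientation, the last column its position.
grasp_pose = np.eye(4)
grasp_pose[:3, :3] = np.array([[0.0, -1.0, 0.0],
                               [1.0,  0.0, 0.0],
                               [0.0,  0.0, 1.0]])  # 90-degree rotation about +z
grasp_pose[:3, 3] = [0.40, 0.05, 0.12]             # gripper position in meters

# Map a point expressed in the gripper frame into the scene frame.
fingertip_gripper = np.array([0.0, 0.0, 0.10, 1.0])  # homogeneous coordinates
fingertip_scene = grasp_pose @ fingertip_gripper
print(fingertip_scene[:3])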

If you find our work useful, please consider citing our paper:

@inproceedings{yuan2023m2t2,
  title     = {M2T2: Multi-Task Masked Transformer for Object-centric Pick and Place},
  author    = {Yuan, Wentao and Murali, Adithyavairavan and Mousavian, Arsalan and Fox, Dieter},
  booktitle = {7th Annual Conference on Robot Learning},
  year      = {2023}
}

Installation

  1. Create and activate a conda environment.
conda create -n m2t2 python=3.10
conda activate m2t2
  2. Install CUDA. Replace 11.7.0 with the CUDA version compatible with your NVIDIA driver. You can check the CUDA version supported by your driver using nvidia-smi.
conda install cuda -c nvidia/label/cuda-11.7.0
  3. Install the PyTorch build that matches your CUDA version. Check https://pytorch.org/get-started/previous-versions.
pip install torch==2.0.1 torchvision==0.15.2
  4. Install the PointNet++ custom ops.
pip install pointnet2_ops/
  5. Install additional dependencies.
pip install -r requirements.txt
  6. Install m2t2 as a package (a quick sanity check for the install follows this list).
pip install .
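After step 6, a quick sanity check can confirm that the CUDA-enabled PyTorch build and the custom ops are usable. This is a minimal sketch; it assumes the packages installed above are importable as pointnet2_ops and m2t2.

# Minimal post-install sanity check.
import torch
print(torch.__version__, torch.version.cuda)        # expect 2.0.1 built against CUDA 11.7
print("CUDA available:", torch.cuda.is_available())

import pointnet2_ops   # fails here if the PointNet++ custom ops did not build
import m2t2            # fails here if `pip install .` did not succeed
print("pointnet2_ops and m2t2 imported successfully")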

Running M2T2

  1. Download [model weights]. m2t2.pth is the generic pick-and-place model which outputs all possible grasping and placement poses, whereas m2t2_language.pth is a model that outputs a single grasping or placement pose conditioned on a language goal.

  2. Run meshcat-server in a separate terminal. Open http://127.0.0.1:7000/static/ in a browser window for visualization.

  3. Try out M2T2 on different scenes under sample_data.

    1. real_world/00 and real_world/02 contain real-world scenes before grasping.

    python demo.py eval.checkpoint=m2t2.pth eval.data_dir=sample_data/real_world/00 eval.mask_thresh=0.4 eval.num_runs=5

    Use eval.mask_thresh to control the confidence threshold for output grasps. Use eval.num_runs to aggregate multiple forward passes for a more complete prediction.

    2. The other scenes under sample_data/real_world contain real-world scenes before placement. They can be run with the same command; just replace the path in eval.data_dir.

    3. Under rlbench there are two episodes of the task meat_off_grill. Run the following command to see the grasp prediction

    python demo_rlbench.py eval.checkpoint=m2t2_language.pth rlbench.episode=4

    and

    python demo_rlbench.py eval.checkpoint=m2t2_language.pth rlbench.episode=4 rlbench.frame_id=90

    to see the placement prediction. (A scripted version of these two commands is sketched right after this list.)
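For convenience, the two RLBench demo invocations can also be scripted. This is just a thin wrapper around the commands shown above and assumes demo_rlbench.py accepts exactly these Hydra-style overrides, as in the examples.

# Run the grasp prediction (default frame) and the placement prediction
# (frame 90) for RLBench episode 4, mirroring the two commands above.
import subprocess

checkpoint = "m2t2_language.pth"
base = ["python", "demo_rlbench.py",
        f"eval.checkpoint={checkpoint}", "rlbench.episode=4"]

subprocess.run(base, check=True)                            # grasp prediction
subprocess.run(base + ["rlbench.frame_id=90"], check=True)  # placement prediction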

Here are some example visualizations (pick_demo, place_demo).

Training

We are not able to release the training dataset at this point due to copyright issues, but we do provide the training script and a few simulated scenes under sample_data/simulation to show the data format. Here's how you would train the model.

python train.py m2t2.action_decoder.max_num_pred=512 data.root_dir=sample_data/simulation train.log_dir=logs/test train.num_gpus=1 train.batch_size=4 train.print_freq=1 train.plot_freq=4

You can visualize training curves and plots by running tensorboard on the log directory:

tensorboard --logdir logs/test
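Since the dataset format is only illustrated by the sample scenes, a quick way to see what each scene file contains is to walk sample_data/simulation and print file names and keys. This is a hedged sketch: the .npz/.pth handling is an assumption about how the samples are stored, not documented behavior.

# List the sample training scenes and, where possible, the keys they contain.
from pathlib import Path
import numpy as np
import torch

root = Path("sample_data/simulation")
for path in sorted(p for p in root.rglob("*") if p.is_file()):
    print(path)
    if path.suffix == ".npz":
        with np.load(path, allow_pickle=True) as data:
            print("  keys:", list(data.files))
    elif path.suffix in {".pt", ".pth"}:
        data = torch.load(path, map_location="cpu")
        if isinstance(data, dict):
            print("  keys:", list(data.keys()))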

Sami's additions

Using app.py with Optional Visualization

app.py exposes a drop-in replacement for the GPD grasp API that accepts raw object and environment point clouds. Inference always runs headlessly, but you can enable an optional Meshcat view of the sampled scene, object cloud, and predicted grasps by toggling the module-level VISUALIZE flag.

export M2T2_VISUALIZE=1   # or set VISUALIZE = True inside app.py
python app_server.py 

When visualization is active and meshcat-server is running, the code opens a viewer showing the fused point cloud and every returned grasp pose. Leave M2T2_VISUALIZE unset (the default) to keep the service non-interactive.
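For orientation, here is a minimal sketch of how such a toggle and Meshcat view could be wired, assuming the module-level VISUALIZE flag is derived from the M2T2_VISUALIZE environment variable; the helper name maybe_visualize, the (N, 3) point cloud shape, and the box markers are illustrative assumptions, not the actual code in app.py.

import os
import numpy as np
import meshcat
import meshcat.geometry as g

# Module-level toggle: leaving M2T2_VISUALIZE unset keeps the service headless.
VISUALIZE = os.environ.get("M2T2_VISUALIZE", "0").lower() not in ("", "0", "false")

def maybe_visualize(scene_points, grasp_poses):
    """Show the fused point cloud and returned grasp poses when enabled."""
    if not VISUALIZE:
        return  # headless inference path
    vis = meshcat.Visualizer()  # expects a running meshcat-server
    # scene_points is assumed to be an (N, 3) array; meshcat wants 3xN.
    colors = np.tile(np.array([[0.2], [0.6], [0.9]], dtype=np.float32),
                     (1, scene_points.shape[0]))
    vis["scene"].set_object(
        g.PointCloud(scene_points.T.astype(np.float32), colors, size=0.005))
    # Mark each predicted grasp pose (a 4x4 transform) with a small box.
    for i, pose in enumerate(grasp_poses):
        vis[f"grasps/{i}"].set_object(g.Box([0.02, 0.02, 0.02]))
        vis[f"grasps/{i}"].set_transform(np.asarray(pose, dtype=np.float64))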

TODO: Use gpd to align the code.
TODO: Remove all codex/ssh GitHub code.
