Existing language instruction-guided online 3D reconstruction systems mainly rely on explicit instructions or queryable maps, showing inadequate capability to handle implicit and complex instructions. In this paper, we first introduce a reasoning reconstruction task. This task inputs an implicit instruction involving complex reasoning and an RGB-D sequence, and outputs incremental 3D reconstruction of instances that conform to the instruction. To handle this task, we propose LIRA: Language Instructed Reconstruction Assistant. It leverages a multimodal large language model to actively reason about the implicit instruction and obtain instruction-relevant 2D candidate instances and their attributes. Then, candidate instances are back-projected into the incrementally reconstructed 3D geometric map, followed by instance fusion and target instance inference. In LIRA, to achieve higher instance fusion quality, we propose TIFF, a Text-enhanced Instance Fusion module operating within Fragment bounding volume, which is learning-based and fuses multiple keyframes simultaneously. Since the evaluation system for this task is not well established, we propose a benchmark ReasonRecon comprising the largest collection of scene-instruction data samples involving implicit reasoning. Experiments demonstrate that LIRA outperforms existing methods in the reasoning reconstruction task and is capable of running in real time.
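As a rough illustration of the back-projection step mentioned above, the sketch below lifts the pixels of one 2D instance mask into world coordinates with a pinhole camera model. The function name, argument layout, and pure-Python data types are assumptions made for illustration; they are not taken from the LIRA code.

```python
def back_project_mask(mask, depth, intrinsics, cam_to_world):
    """Lift masked pixels to 3D world points with a pinhole camera model.

    mask: 2D list of 0/1 flags for one candidate instance (hypothetical input).
    depth: 2D list of metric depths, same shape as mask; values <= 0 are skipped.
    intrinsics: tuple (fx, fy, cx, cy).
    cam_to_world: 4x4 camera-to-world pose as nested lists.
    """
    fx, fy, cx, cy = intrinsics
    points = []
    for v, row in enumerate(mask):
        for u, flag in enumerate(row):
            d = depth[v][u]
            if not flag or d <= 0:
                continue
            # Pixel (u, v) with depth d in homogeneous camera coordinates.
            cam = ((u - cx) * d / fx, (v - cy) * d / fy, d, 1.0)
            # Apply the first three rows of the pose to get world coordinates.
            world = tuple(
                sum(cam_to_world[r][c] * cam[c] for c in range(4))
                for r in range(3)
            )
            points.append(world)
    return points
```

In a real pipeline this would typically be vectorized over whole depth maps; the loop form is kept here only to make each step explicit.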
conda create -n LIRA python=3.9
conda activate LIRA
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia
git clone https://github.com/zhen6618/LIRA.git
cd LIRA
pip install -r requirements.txt
pip install sparsehash
pip install -U openmim
mim install mmcv-full
Install the additional LISA environment.
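After installation, it may help to confirm that the main packages can be located before moving on. The sketch below only checks whether each module can be found for import; the module names listed are assumptions based on the install commands above, not an official requirement list.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the packages installed above.
    missing = find_missing_modules(["torch", "torchvision", "torchaudio", "mmcv"])
    print("missing modules:", missing or "none")
```

This does not verify versions or CUDA support; it only catches packages that failed to install into the active environment.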
- Download and extract ScanNet by following the instructions provided at http://www.scan-net.org/.
python datasets/scannet/download_scannet.py
- Generate depth, color, pose, and intrinsics from the .sens files (change the file paths to match your setup)
python datasets/scannet/reader.py
For the expected ScanNet directory structure, refer to NeuralRecon.
- Extract instance-level semantic labels (change your file path).
python datasets/scannet/batch_load_scannet_data.py
python tools/tsdf_fusion/generate_gt.py --data_path datasets/scannet/ --save_name all_tsdf_9 --window_size 9
python tools/tsdf_fusion/generate_gt.py --test --data_path datasets/scannet/ --save_name all_tsdf_9 --window_size 9
- Instance-level label interpolation (change your file path):
python datasets/scannet/label_interpolate.py
- Download 2D reasoning segmentation dataset and reasoning reconstruction dataset
5.1 For ReasonRecon: download the 2D reasoning segmentation dataset (Scannet_2D_Seg_base_new.tar.gz) and the reasoning reconstruction dataset (all_tsdf_9_1.zip, grounding_scene_qa_infos_base_new.zip, grounding_scene_instance_infos_mapping.zip, grounding_scene_instance_infos.zip) from here.
5.2 For ReasonRecon-Extension: download the 2D reasoning segmentation dataset (Scannet_2D_Seg_extension.tar.gz) and the reasoning reconstruction dataset (all_tsdf_9_1.zip, grounding_scene_qa_infos_extension.zip, grounding_scene_instance_infos_mapping.zip, grounding_scene_instance_infos.zip) from here.
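The `--window_size 9` flag and the `all_tsdf_9` name used in the ground-truth generation commands above suggest that keyframes are grouped into fragments of nine views, as in NeuralRecon. The sketch below shows that grouping under this assumption; the function name and the choice to drop a trailing incomplete group are illustrative and may differ from what `generate_gt.py` actually does.

```python
def split_into_fragments(keyframe_ids, window_size=9):
    """Group consecutive keyframe ids into fragments of window_size views.

    A trailing group with fewer than window_size keyframes is dropped
    (an assumption made for this sketch).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    # Number of keyframes that fit into complete fragments.
    full = len(keyframe_ids) - len(keyframe_ids) % window_size
    return [keyframe_ids[i:i + window_size] for i in range(0, full, window_size)]
```

For example, 20 keyframes with a window size of 9 would yield two fragments and leave the last two keyframes unused.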
- Train it with LoRA (change your file path)
cd 2D_Reasoning_Segmentation && deepspeed --master_port=25666 train_ds.py
- When training is finished, get the full model weight (change your file path)
cd ./runs/lisa-7b/ckpt_model && python zero_to_fp32.py . ../pytorch_model.bin
- Merge LoRA weight (change your file path)
python merge_lora_weights_and_save_hf_model.py
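Conceptually, merging LoRA weights folds the low-rank update back into each adapted weight matrix, W' = W + (alpha / r) * B A, so the merged model needs no adapter at inference time. The pure-Python sketch below illustrates that arithmetic on small nested lists. It follows the standard LoRA formulation and is not a description of what `merge_lora_weights_and_save_hf_model.py` does internally; the function and argument names are hypothetical.

```python
def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a).

    weight: out_dim x in_dim, lora_b: out_dim x rank, lora_a: rank x in_dim,
    all given as nested lists of floats.
    """
    scale = alpha / rank
    merged = []
    for i, row in enumerate(weight):
        merged.append([
            # Entry (i, j) of the low-rank product, scaled and added to W.
            w + scale * sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            for j, w in enumerate(row)
        ])
    return merged
```

Because the update has the same shape as the original weight, the merged checkpoint can be loaded like an ordinary model.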
This step requires the trained weights of the 2D reasoning segmentation module. It is recommended to create a checkpoint folder under the LIRA folder and place the weights there.
cd LIRA
- Train it (set the correct dataset and model weight paths)
python main.py --cfg ./config/train.yaml
- for ReasonRecon
2D reasoning segmentation: pytorch_model-00001-of-00002.bin, pytorch_model-00002-of-00002.bin, ..., TIFF (our instance fusion module): TIFF_base_new.ckpt from here
- for ReasonRecon-Extension
2D reasoning segmentation: pytorch_model-00001-of-00002.bin, pytorch_model-00002-of-00002.bin, ..., TIFF (our instance fusion module): TIFF_Extansion.ckpt from here
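The 2D reasoning segmentation weights are split into numbered shards (`pytorch_model-00001-of-00002.bin` and so on). A small check like the one below can confirm that every shard of a set has been downloaded. The helper name is hypothetical, and it inspects file names only, not file contents.

```python
import re

SHARD_PATTERN = re.compile(r"^pytorch_model-(\d{5})-of-(\d{5})\.bin$")


def missing_shards(filenames):
    """Return the sorted shard indices that are absent from filenames.

    Raises ValueError if no shard file is present or the totals disagree.
    """
    found, totals = set(), set()
    for name in filenames:
        match = SHARD_PATTERN.match(name)
        if match:
            found.add(int(match.group(1)))
            totals.add(int(match.group(2)))
    if len(totals) != 1:
        raise ValueError("expected exactly one shard set, got totals: %s" % sorted(totals))
    total = totals.pop()
    return [i for i in range(1, total + 1) if i not in found]
```

It could be called as `missing_shards(os.listdir(checkpoint_dir))`, where `checkpoint_dir` is whichever folder holds the downloaded weights.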
- 2D reasoning segmentation
cd 2D_Reasoning_Segmentation && python chat.py
- Reasoning reconstruction
cd LIRA && python main.py --cfg ./config/test.yaml
cd 2D_Reasoning_Segmentation && deepspeed --master_port=24999 train_ds.py --eval_only
- All scan-instruction pair inference
cd LIRA && python main.py --cfg ./config/test.yaml
- Eval
python tools/evaluation_3d.py
ICCV2025 coming soon...
LLaVA, segment-anything, LISA, ScanNet, NeuralRecon, EPRecon, LLaMA-Factory