
LIRA: Reasoning Reconstruction via Multimodal Large Language Models

Existing language instruction-guided online 3D reconstruction systems mainly rely on explicit instructions or queryable maps, showing inadequate capability to handle implicit and complex instructions. In this paper, we first introduce a reasoning reconstruction task. This task inputs an implicit instruction involving complex reasoning and an RGB-D sequence, and outputs incremental 3D reconstruction of instances that conform to the instruction. To handle this task, we propose LIRA: Language Instructed Reconstruction Assistant. It leverages a multimodal large language model to actively reason about the implicit instruction and obtain instruction-relevant 2D candidate instances and their attributes. Then, candidate instances are back-projected into the incrementally reconstructed 3D geometric map, followed by instance fusion and target instance inference. In LIRA, to achieve higher instance fusion quality, we propose TIFF, a Text-enhanced Instance Fusion module operating within Fragment bounding volume, which is learning-based and fuses multiple keyframes simultaneously. Since the evaluation system for this task is not well established, we propose a benchmark ReasonRecon comprising the largest collection of scene-instruction data samples involving implicit reasoning. Experiments demonstrate that LIRA outperforms existing methods in the reasoning reconstruction task and is capable of running in real time.
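The back-projection step mentioned above can be illustrated with a standard pinhole camera model. This is only a minimal sketch, not the code used in LIRA: the function name `backproject_mask`, its arguments, and the nested-list data layout are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_mask(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[int]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    cam_to_world: Sequence[Sequence[float]],
) -> List[Point]:
    """Lift masked pixels with valid depth to 3D points in world coordinates.

    depth and mask are row-major 2D grids of the same size; cam_to_world is a
    4x4 camera-to-world pose matrix. All names here are hypothetical.
    """
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (d, m) in enumerate(zip(depth_row, mask_row)):
            if not m or d <= 0:
                continue  # skip pixels outside the instance or without depth
            # Pinhole model: pixel (u, v) at depth d -> camera coordinates.
            cam = ((u - cx) * d / fx, (v - cy) * d / fy, d, 1.0)
            # Apply the first three rows of the pose to reach world coordinates.
            world = tuple(
                sum(cam_to_world[r][c] * cam[c] for c in range(4)) for r in range(3)
            )
            points.append(world)
    return points
```

In a real pipeline this would typically be vectorized over whole images; the loop form is kept here so that each step is visible.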

Installation

conda create -n LIRA python=3.9
conda activate LIRA

conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia

git clone https://github.com/zhen6618/LIRA.git
cd LIRA

pip install -r requirements.txt
pip install sparsehash
pip install -U openmim
mim install mmcv-full

Install additional LISA environment
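After installation, a quick check that the main packages are importable can save time before training. The snippet below is a sketch using only the Python standard library; the module list is an assumption based on the install commands above (`mmcv-full` is assumed to be importable as `mmcv`), and the function name `check_environment` is hypothetical.

```python
import importlib.util
import sys

# Assumed import names for the packages installed above.
REQUIRED = ("torch", "torchvision", "torchaudio", "mmcv")


def check_environment(modules=REQUIRED):
    """Return {module_name: bool} telling whether each module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```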

Dataset

Download and extract ScanNet by following the instructions provided at http://www.scan-net.org/.

python datasets/scannet/download_scannet.py

Generate depth, color, pose, intrinsics from .sens file (change your file path)

python datasets/scannet/reader.py

For the expected ScanNet directory structure, refer to NeuralRecon.

Extract instance-level semantic labels (change your file path).

python datasets/scannet/batch_load_scannet_data.py
python tools/tsdf_fusion/generate_gt.py --data_path datasets/scannet/ --save_name all_tsdf_9 --window_size 9
python tools/tsdf_fusion/generate_gt.py --test --data_path datasets/scannet/ --save_name all_tsdf_9 --window_size 9
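To give an intuition for `--window_size 9`, the sketch below groups keyframes into fragments of nine. It assumes, as in NeuralRecon, that the window size is the number of keyframes per fragment; dropping a trailing incomplete group is also an assumption, and `group_into_fragments` is a hypothetical name, not a function from this repository.

```python
from typing import List, Sequence


def group_into_fragments(keyframe_ids: Sequence[int], window_size: int = 9) -> List[List[int]]:
    """Split keyframe ids into consecutive, non-overlapping fragments.

    Only full fragments are kept; leftover keyframes at the end are dropped
    (an assumption made for this illustration).
    """
    fragments: List[List[int]] = []
    for start in range(0, len(keyframe_ids) - window_size + 1, window_size):
        fragments.append(list(keyframe_ids[start:start + window_size]))
    return fragments
```

For example, 20 keyframes with a window size of 9 yield two fragments, covering keyframes 0 to 8 and 9 to 17.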

Instance-level label interpolation (change your file path):

python datasets/scannet/label_interpolate.py

Download 2D reasoning segmentation dataset and reasoning reconstruction dataset

5.1 For ReasonRecon: download the 2D reasoning segmentation dataset (Scannet_2D_Seg_base_new.tar.gz) and the reasoning reconstruction dataset (all_tsdf_9_1.zip, grounding_scene_qa_infos_base_new.zip, grounding_scene_instance_infos_mapping.zip, grounding_scene_instance_infos.zip) from here.

5.2 For ReasonRecon-Extension: download the 2D reasoning segmentation dataset (Scannet_2D_Seg_extension.tar.gz) and the reasoning reconstruction dataset (all_tsdf_9_1.zip, grounding_scene_qa_infos_extension.zip, grounding_scene_instance_infos_mapping.zip, grounding_scene_instance_infos.zip) from here.
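The downloaded files are a mix of .tar.gz and .zip archives. A small helper like the one below can unpack either kind with the Python standard library. It is a sketch only: `extract_archive` is a hypothetical name, and the destination directory is up to you, since the expected layout is not specified here.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_archive(archive, dest):
    """Extract a .zip or .tar(.gz) archive into dest and return dest as a Path.

    Only use this on archives from a trusted source.
    """
    archive, dest = Path(archive), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    elif tarfile.is_tarfile(archive):
        with tarfile.open(archive) as tf:
            tf.extractall(dest)
    else:
        raise ValueError(f"unsupported archive: {archive}")
    return dest
```

For instance, `extract_archive("all_tsdf_9_1.zip", "path/to/your/data")` would unpack that archive into a directory of your choice.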

Training

Train 2D reasoning segmentation module

Train it with LoRA (change your file path)

cd 2D_Reasoning_Segmentation && deepspeed --master_port=25666 train_ds.py

When training is finished, get the full model weight (change your file path)

cd ./runs/lisa-7b/ckpt_model && python zero_to_fp32.py . ../pytorch_model.bin

Merge LoRA weight (change your file path)

python merge_lora_weights_and_save_hf_model.py
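For readers unfamiliar with what merging LoRA weights means, the general idea is to fold the low-rank update into the base weight: W' = W + (alpha / r) * B A. The sketch below shows this arithmetic on plain Python lists. It illustrates the generic LoRA formula under that assumption and is not the contents of merge_lora_weights_and_save_hf_model.py; `merge_lora` and its arguments are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]


def merge_lora(W: Matrix, A: Matrix, B: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * (B @ A) without modifying W.

    W is d_out x d_in, B is d_out x r, A is r x d_in, where r is the LoRA rank.
    """
    r = len(A)
    scale = alpha / r
    merged = [row[:] for row in W]  # copy so the base weight stays untouched
    for i in range(len(W)):
        for j in range(len(W[0])):
            delta = sum(B[i][k] * A[k][j] for k in range(r))
            merged[i][j] += scale * delta
    return merged
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged model can be saved and loaded as an ordinary checkpoint.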

Train reasoning reconstruction

This step requires the trained weights of the 2D reasoning segmentation module. It is recommended to create a checkpoint folder under the LIRA folder and put the weights there.

cd LIRA

Train it (Set the correct dataset and model weights paths)

python main.py --cfg ./config/train.yaml

Pre-trained weights

for ReasonRecon

2D reasoning segmentation: pytorch_model-00001-of-00002.bin, pytorch_model-00002-of-00002.bin, ..., TIFF (our instance fusion module): TIFF_base_new.ckpt from here

for ReasonRecon-Extension

2D reasoning segmentation: pytorch_model-00001-of-00002.bin, pytorch_model-00002-of-00002.bin, ..., TIFF (our instance fusion module): TIFF_Extansion.ckpt from here

Inference

2D reasoning segmentation

cd 2D_Reasoning_Segmentation && python chat.py

Reasoning reconstruction

cd LIRA && python main.py --cfg ./config/test.yaml

Evaluation

2D reasoning segmentation

cd 2D_Reasoning_Segmentation && deepspeed --master_port=24999 train_ds.py --eval_only

Reasoning reconstruction

Run inference on all scan-instruction pairs

cd LIRA && python main.py --cfg ./config/test.yaml

Eval

python tools/evaluation_3d.py

Citation

ICCV2025 coming soon...

Acknowledgement

LLaVA, segment-anything, LISA, ScanNet, NeuralRecon, EPRecon, LLaMA-Factory

