This repository is not yet fully built. The documentation and code structure will be completed gradually over the next two months.
!!! Some of the third-party repositories referenced below are already included inside this repository, so you can install their dependencies directly. !!!
Assuming you have conda installed, let's prepare a conda env:
conda_env_name=h3vlfm_world
conda create -n $conda_env_name python=3.9 cmake=3.14.0
conda activate $conda_env_name
Install the proper version of torch:
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
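Optionally, sanity-check the install (this check is not part of the upstream instructions) and confirm that torch sees the GPU:
# should print something like 2.1.0+cu118 True on a CUDA 11.8 machine
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"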
Following Habitat-lab's instructions, install Habitat-sim:
conda install habitat-sim=0.3.1 withbullet -c conda-forge -c aihabitat
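An optional import smoke test to confirm habitat-sim is usable from this environment:
python -c "import habitat_sim; print('habitat-sim OK')"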
Then install Habitat-lab:
cd habitat-lab
pip install -e habitat-lab
pip install -e habitat-baselines
cd ..
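Optionally verify that both editable packages import cleanly:
python -c "import habitat, habitat_baselines; print('habitat-lab OK')"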
Following Mobile-SAM's instructions, install MobileSAM:
pip install git+https://github.com/ChaoningZhang/MobileSAM.git
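An optional smoke test; the module and registry names below follow the upstream MobileSAM README:
python -c "from mobile_sam import sam_model_registry; print('MobileSAM OK')"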
Following GroundingDINO's instructions, install GroundingDINO. You may need to set CUDA_HOME to a CUDA toolkit no newer than 11.8, matching the torch build installed above:
export CUDA_HOME=/path/to/cuda-11.8
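An optional sanity check of which toolkit CUDA_HOME points to before building:
echo $CUDA_HOME
# the reported release should be <= 11.8
$CUDA_HOME/bin/nvcc --version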
cd GroundingDINO/
pip install -e . --no-dependencies
Then download the pretrained model weights:
mkdir weights
cd weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
cd ..
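An optional import check for the editable GroundingDINO install:
python -c "import groundingdino; print('GroundingDINO OK')"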
pip install salesforce-lavis==1.0.2
We use a finetuned version of the RedNet semantic segmentation model. Download the segmentation checkpoint and place it under the RedNet/model path.
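A minimal sketch of the expected layout; the checkpoint filename below is a placeholder, use the actual file you downloaded:
mkdir -p RedNet/model
# RedNet/model/<finetuned_rednet_checkpoint>.pth  <- place the downloaded checkpoint here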
Clone the following third-party repositories:
git clone https://github.com/Peterande/D-FINE.git
git clone https://github.com/CSAILVision/places365.git
Install the remaining Python dependencies:
pip install flask
pip install open3d
pip install dash
pip install scikit-learn
pip install joblib
pip install seaborn
pip install faster_coco_eval
pip install calflops
pip install flash-attn --no-build-isolation
pip install modelscope
pip install opencv-python==4.10.0.84
pip install transformers==4.37.0
pip install openpyxl
pip install supervision==0.25.1
pip install yapf==0.43.0
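Optionally confirm the two pinned versions installed above:
# expected: 4.37.0 4.10.0.84
python -c "import transformers, cv2; print(transformers.__version__, cv2.__version__)"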
- Download Scene & Episode Datasets
Follow the instructions for HM3D and MatterPort3D in Habitat-lab's Datasets.md.
- Locate Datasets
The file structure should look like this:
data
└── datasets
└── objectnav
├── hm3d
│ └── v1
│ ├── train
│ │ ├── content
│ │ └── train.json.gz
│ └── val
│ ├── content
│ └── val.json.gz
└── mp3d
└── v1
├── train
│ ├── content
│ └── train.json.gz
└── val
├── content
└── val.json.gz
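A quick optional check that the episode files are in place, matching the tree above:
ls data/datasets/objectnav/hm3d/v1/val/val.json.gz
ls data/datasets/objectnav/mp3d/v1/val/val.json.gz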
To launch the VLM servers and start the evaluation, run the commands below.
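The evaluation command redirects its output to a log file under debug/20250219/; shell redirection does not create directories, so make sure that directory exists first:
mkdir -p debug/20250219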
./scripts/launch_vlm_servers_qwen25_gdino_with_ram.sh
python -u -m falcon.run --config-name=experiments/qwen25_gdino_objectnav_hm3d_debug_scene.yaml habitat_baselines.num_environments=1 > debug/20250219/eval_llm_single_floor_gdino.log 2>&1
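To follow the evaluation while it runs, tail the log file:
tail -f debug/20250219/eval_llm_single_floor_gdino.log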