[GRSM] Project Page for "GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing"


Norman-Ou/GeoPix

GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing

Ruizhe Ou¹ · Yuan Hu²,* · Fan Zhang² · Jiaxin Chen¹ · Yu Liu²,³

¹Beijing University of Posts and Telecommunications · ²Peking University · ³Peking University Ordos Research Institute of Energy · *corresponding authors

ModelScope Open in OpenXLab

GeoPix is a state-of-the-art pixel-level multi-modal large language model for the remote sensing domain, supporting referring image segmentation and other tasks.

Release🔥

  • [2025.04.11] We release the annotations of GeoPixInstruct. HuggingFace🤗
  • [2025.04.10] GeoPix has been accepted by GRSM (IEEE Geoscience and Remote Sensing Magazine).
  • [2025.02.20] We release the pre-trained checkpoints, inference code and gradio demo!
  • [2025.01.12] We release the paper.

GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing [Arxiv]

Abstract

In this work, we propose GeoPix, a remote sensing (RS) multi-modal large language model (MLLM) that extends image understanding capabilities to the pixel level. This is achieved by equipping the MLLM with a mask predictor, which transforms visual features from the vision encoder into masks conditioned on the LLM's segmentation token embeddings. For more details, please refer to the paper.
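
As a rough illustration of the idea in the abstract, the sketch below shows one common way a mask can be conditioned on a segmentation token embedding: project the embedding into the visual feature space and take a per-pixel dot product with the feature map. This is not GeoPix's actual mask predictor. The function name `predict_mask`, the projection matrix `w_proj`, the tensor shapes, and the 0.5 threshold are all assumptions made for illustration; refer to the paper for the real architecture.

```python
import numpy as np


def predict_mask(visual_feats, seg_embedding, w_proj, threshold=0.5):
    """Hypothetical sketch: turn visual features into a binary mask,
    conditioned on one segmentation-token embedding.

    visual_feats:  (C, H, W) feature map from a vision encoder
    seg_embedding: (D,) embedding of the LLM's segmentation token
    w_proj:        (C, D) projection from LLM space to visual feature space
    """
    # Project the token embedding into the visual feature space.
    query = w_proj @ seg_embedding                        # (C,)
    # Per-pixel similarity between the query and the visual features.
    logits = np.einsum("c,chw->hw", query, visual_feats)  # (H, W)
    # Sigmoid, then threshold to obtain a binary mask.
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs > threshold


# Random inputs, only to show the expected shapes (assumed sizes).
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))
seg = rng.standard_normal(16)
proj = rng.standard_normal((8, 16))
mask = predict_mask(feats, seg, proj)
print(mask.shape, mask.dtype)
```

The only point of the sketch is that the segmentation token embedding acts as a query over the visual features, so a different embedding yields a different mask for the same image.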

Demo🚀

1. Installation

conda create -n geopix python=3.10 -y
conda activate geopix
pip install -r requirements.txt
mkdir pretrained_models

2. Download

You can download the model directly from Hugging Face, ModelScope, or OpenXLab, or from a Python script:

# Huggingface
from huggingface_hub import snapshot_download
snapshot_download(repo_id="Norman-ou/GeoPix-ft-sior_rsicap", local_dir="./pretrained_models")

# ModelScope
from modelscope import snapshot_download
model_dir = snapshot_download("NormanOU/GeoPix-ft-sior_rsicap", local_dir="./pretrained_models")
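
Both libraries export a function named `snapshot_download`, so running the two snippets in one script makes the second import shadow the first. A minimal sketch of one way to keep them apart is shown below. The helper name `download_checkpoint` and its `source` argument are hypothetical and not part of this repository; the repository IDs and the `local_dir` argument are taken from the snippets above.

```python
def download_checkpoint(source, local_dir="./pretrained_models"):
    """Hypothetical helper: download the GeoPix checkpoint from one hub.

    Imports are done lazily, so only the library for the chosen hub
    needs to be installed.
    """
    if source == "huggingface":
        from huggingface_hub import snapshot_download
        return snapshot_download(
            repo_id="Norman-ou/GeoPix-ft-sior_rsicap", local_dir=local_dir
        )
    if source == "modelscope":
        from modelscope import snapshot_download
        return snapshot_download(
            "NormanOU/GeoPix-ft-sior_rsicap", local_dir=local_dir
        )
    raise ValueError(f"unknown source: {source!r}")
```

For example, `download_checkpoint("huggingface")` would fetch the checkpoint into `./pretrained_models`, assuming `huggingface_hub` is installed and the network is reachable.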

Once the models are downloaded, the folder tree should look like this:

  .
  ├── ...
  ├── model
  ├── pretrained_models
  ├── app.py
  ├── engine.py
  ├── ...
  └── README.md
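
A quick way to confirm the layout before starting the demo is sketched below. It only checks the entries named in the tree above (the `...` entries are not checked), and the helper name `missing_entries` is hypothetical, not part of this repository.

```python
from pathlib import Path

# Entries taken from the folder tree above; the "..." entries are not checked.
EXPECTED = ["model", "pretrained_models", "app.py", "engine.py", "README.md"]


def missing_entries(root="."):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    print("missing:", missing if missing else "none")
```

Run it from the repository root. An empty result means every listed entry is present; it does not verify that the checkpoint files inside `pretrained_models` are complete.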

3. Start a local gradio demo

Run the following command:

python app.py

The instructions are well written. Enjoy our work.


Inference🔍

Run the following command:

python inference.py

Citation📑

@ARTICLE{10994415,
  author={Ou, Ruizhe and Hu, Yuan and Zhang, Fan and Chen, Jiaxin and Liu, Yu},
  journal={IEEE Geoscience and Remote Sensing Magazine}, 
  title={GeoPix: A multimodal large language model for pixel-level image understanding in remote sensing}, 
  year={2025},
  volume={},
  number={},
  pages={2-16},
  keywords={Visualization;Image segmentation;Training;Integrated circuit modeling;Grounding;Feature extraction;Accuracy;Remote sensing;Prototypes;Predictive models},
  doi={10.1109/MGRS.2025.3560293}
}

Acknowledgement
