GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing

Ruizhe Ou¹ · Yuan Hu²,* · Fan Zhang² · Jiaxin Chen¹ · Yu Liu²,³

¹Beijing University of Posts and Telecommunications · ²Peking University · ³Peking University Ordos Research Institute of Energy · *Corresponding authors

GeoPix is a new state-of-the-art pixel-level multi-modal large language model (MLLM) for the remote sensing domain. It supports referring image segmentation and other tasks.

Release🔥

[2025.04.11] We release the annotations of GeoPixInstruct on HuggingFace🤗.
[2025.04.10] GeoPix has been accepted by GRSM (IEEE Geoscience and Remote Sensing Magazine).
[2025.02.20] We release the pre-trained checkpoints, inference code and Gradio demo!
[2025.01.12] We release the paper.

GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing [arXiv]

Abstract

In this work, we propose GeoPix, a remote sensing (RS) MLLM that extends image understanding capabilities to the pixel level. This is achieved by equipping the MLLM with a mask predictor, which transforms visual features from the vision encoder into masks conditioned on the LLM's segmentation token embeddings. For more details, please refer to the paper.
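The idea of a mask conditioned on a segmentation token can be illustrated with a toy sketch. It is a simplified stand-in, not GeoPix's actual mask predictor (see the paper for that). Assumptions: the segmentation token's hidden state is linearly projected into the visual feature space, and each pixel's mask logit is the dot product of that projected query with the pixel's feature vector. The helper names `project` and `predict_mask`, the weight matrix and all shapes are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(embedding: Vector, weight: Matrix) -> Vector:
    """Hypothetical linear projection; weight has shape (C, D) for a D-dim embedding."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weight]


def predict_mask(pixel_features: List[List[Vector]], seg_embedding: Vector,
                 weight: Matrix, threshold: float = 0.0) -> List[List[int]]:
    """Toy token-conditioned mask decoder (illustrative, not GeoPix's implementation).

    pixel_features: H x W grid of C-dim visual features from the vision encoder.
    seg_embedding:  D-dim LLM hidden state of the segmentation token.
    Returns an H x W binary mask: 1 where the per-pixel logit exceeds `threshold`.
    """
    query = project(seg_embedding, weight)  # map the token into the C-dim feature space
    return [
        [1 if sum(f * q for f, q in zip(feat, query)) > threshold else 0 for feat in row]
        for row in pixel_features
    ]


# Usage: a 2x2 feature map with C=2 and a D=3 token embedding.
features = [[[2.0, 0.0], [0.0, 2.0]],
            [[1.0, 1.0], [3.0, 1.0]]]
weight = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]  # keeps the first two embedding dimensions
mask = predict_mask(features, [1.0, -1.0, 5.0], weight)
print(mask)  # [[1, 0], [0, 1]]
```

Thresholding the logit at 0 corresponds to a sigmoid probability of 0.5. A real model would typically keep soft logits and upsample them to the input image resolution.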

Demo🚀

1. Installation

git clone https://github.com/Norman-Ou/GeoPix.git
cd GeoPix
conda create -n geopix python=3.10 -y
conda activate geopix
pip install -r requirements.txt
mkdir pretrained_models

2. Download

You can download the model directly from Huggingface, ModelScope or OpenXLab, or download it with a Python script:

# Option 1: Huggingface (requires `pip install huggingface_hub`)
from huggingface_hub import snapshot_download
snapshot_download(repo_id="Norman-ou/GeoPix-ft-sior_rsicap", local_dir="./pretrained_models")

# Option 2: ModelScope (requires `pip install modelscope`)
from modelscope import snapshot_download
model_dir = snapshot_download("NormanOU/GeoPix-ft-sior_rsicap", local_dir="./pretrained_models")
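If one source is unreachable, a small wrapper can try the sources in order. This is a sketch, not part of the GeoPix repository. `download_with_fallback`, `from_huggingface` and `from_modelscope` are hypothetical names, and the repository IDs are the ones listed above. The library imports are deferred into the helpers so that only the package you actually use needs to be installed.

```python
from typing import Callable, List, Tuple

Downloader = Callable[[str], str]


def from_huggingface(local_dir: str) -> str:
    from huggingface_hub import snapshot_download  # imported lazily
    return snapshot_download(repo_id="Norman-ou/GeoPix-ft-sior_rsicap", local_dir=local_dir)


def from_modelscope(local_dir: str) -> str:
    from modelscope import snapshot_download  # imported lazily
    return snapshot_download("NormanOU/GeoPix-ft-sior_rsicap", local_dir=local_dir)


def download_with_fallback(sources: List[Tuple[str, Downloader]],
                           local_dir: str = "./pretrained_models") -> str:
    """Try each (name, downloader) in order and return the path from the first success."""
    errors = []
    for name, download in sources:
        try:
            return download(local_dir)
        except Exception as exc:  # missing package, network error, etc.
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("All download sources failed:\n" + "\n".join(errors))
```

Usage: `download_with_fallback([("huggingface", from_huggingface), ("modelscope", from_modelscope)])`. Because each source is a plain callable, other mirrors can be added to the list without changing the wrapper.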

Once all models are downloaded, the folder tree should look like this:

  .
  ├── ...
  ├── model
  ├── pretrained_models
  ├── app.py
  ├── engine.py
  ├── ...
  └── README.md
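Before launching the demo, the layout can be checked with a short script. This is a hypothetical helper, not part of the repository. It only checks the top-level entries named in the tree above (the "..." entries are skipped) and assumes that an empty pretrained_models folder means the download step has not been run.

```python
from pathlib import Path
from typing import List

# Top-level entries shown in the folder tree above ("..." entries are not checked).
EXPECTED_ENTRIES = ["model", "pretrained_models", "app.py", "engine.py", "README.md"]


def missing_entries(repo_root: str = ".") -> List[str]:
    """Return the expected entries that are absent from `repo_root`.

    An empty pretrained_models/ folder is also reported, since that usually
    means the checkpoint has not been downloaded yet.
    """
    root = Path(repo_root)
    missing = [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
    weights = root / "pretrained_models"
    if weights.is_dir() and not any(weights.iterdir()):
        missing.append("pretrained_models/ (empty)")
    return missing
```

Run `missing_entries()` from the repository root; an empty list means the layout matches the tree above.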

3. Start a local Gradio demo

Run the following command:

python app.py

The instructions are straightforward. Enjoy our work!

Inference🔍

Run the following command:

python inference.py
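To call the inference script from another Python program (for example, a batch job), it can be launched as a subprocess. This sketch assumes that `inference.py` needs no command-line arguments, as in the command above, and that it resolves paths such as `./pretrained_models` relative to the repository root, so it sets `cwd` accordingly. `run_inference` is a hypothetical helper.

```python
import subprocess
import sys
from pathlib import Path


def run_inference(repo_root: str) -> str:
    """Run inference.py from the repository root and return its stdout.

    Uses the current interpreter (e.g. the activated geopix environment) so the
    same dependencies are used; raises CalledProcessError on a non-zero exit.
    """
    script = Path(repo_root) / "inference.py"
    if not script.is_file():
        raise FileNotFoundError(f"inference.py not found under {repo_root}")
    result = subprocess.run(
        [sys.executable, str(script)],
        cwd=repo_root,  # relative paths such as ./pretrained_models resolve from here
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Using `sys.executable` rather than a bare `python` avoids accidentally picking up a different interpreter from `PATH`.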

Citation📑

@ARTICLE{10994415,
  author={Ou, Ruizhe and Hu, Yuan and Zhang, Fan and Chen, Jiaxin and Liu, Yu},
  journal={IEEE Geoscience and Remote Sensing Magazine}, 
  title={GeoPix: A multimodal large language model for pixel-level image understanding in remote sensing}, 
  year={2025},
  volume={},
  number={},
  pages={2-16},
  keywords={Visualization;Image segmentation;Training;Integrated circuit modeling;Grounding;Feature extraction;Accuracy;Remote sensing;Prototypes;Predictive models},
  doi={10.1109/MGRS.2025.3560293}
}

Acknowledgement

This work is built upon LLaVA and PixelLM.
