
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization


Paper · GeoVistaBench · GeoVista-RL-6k-7B · Webpage
* This repository is intended solely for research purposes.

Quick Start

  1. Set up the environment:
conda create -n geo-vista python==3.10 -y
conda activate geo-vista

bash setup.sh
  2. Set up the web search API key

We use Tavily for web search during inference and training. Sign up for a free account, get your Tavily API key, and set the TAVILY_API_KEY variable in the .env file. You can run bash examples/search_test.sh to verify that your key works.
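If you would rather check the key from Python directly, here is a minimal sketch using the tavily-python client (our assumption about the client package; the repository's own check is examples/search_test.sh):

# Minimal Tavily key check (pip install tavily-python); illustrative only.
import os

from tavily import TavilyClient

# TAVILY_API_KEY should be exported from .env, as in the inference example below.
client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

# Any trivial query works; a valid key returns a dict with a "results" list.
response = client.search("Eiffel Tower location", max_results=1)
print(response["results"][0]["title"])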

  3. Download the tuned GeoVista model and deploy it with vLLM

Download the checkpoint from Hugging Face and place it in the ./.temp/checkpoints/LibraTree/GeoVista-RL-6k-7B directory:

python3 scripts/download_hf.py \
--model LibraTree/GeoVista-RL-6k-7B \
--local_model_dir .temp/checkpoints/
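If you prefer not to use the helper script, the same download can presumably be done with huggingface_hub directly (an assumption about what scripts/download_hf.py wraps, not its actual implementation):

# Manual alternative to scripts/download_hf.py via huggingface_hub.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="LibraTree/GeoVista-RL-6k-7B",
    local_dir=".temp/checkpoints/LibraTree/GeoVista-RL-6k-7B",
)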

Then deploy the GeoVista model with vLLM:

bash inference/vllm_deploy_geovista_rl_6k.sh

  4. Run an example inference:
export VLLM_PORT=8000
export VLLM_HOST="localhost"
# apply env variables
set -a; source .env; set +a;
python examples/infer_example.py \
--multimodal_input examples/geobench-example.png \
--question "Please analyze where is the place."

You will see the model's thinking trajectory and final answer in the console output.
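Under the hood, examples/infer_example.py talks to the vLLM server's OpenAI-compatible endpoint. A hand-rolled single-turn request looks roughly like the sketch below; this is only an assumption about the plumbing, it omits the agentic web-search loop the repo's script runs, and the served model name may differ from the one shown (check with curl http://localhost:8000/v1/models):

# Single-turn request to the vLLM OpenAI-compatible server (pip install openai).
import base64
import os

from openai import OpenAI

client = OpenAI(
    base_url=f"http://{os.getenv('VLLM_HOST', 'localhost')}:{os.getenv('VLLM_PORT', '8000')}/v1",
    api_key="EMPTY",  # vLLM does not validate the key by default
)

# Encode the example image as a base64 data URL.
with open("examples/geobench-example.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="LibraTree/GeoVista-RL-6k-7B",  # assumed served name; verify via /v1/models
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text", "text": "Please analyze where is the place."},
        ],
    }],
)
print(response.choices[0].message.content)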

Benchmark

  • We have released the GeoVista-Bench (GeoBench) dataset on Hugging Face 🤗: a benchmark of photos and panoramas from around the world, plus a subset of satellite images of different cities, built to rigorously evaluate the geolocalization ability of agentic models.

GeoBench is the first high-resolution, multi-source, globally annotated dataset to evaluate agentic models’ general geolocalization ability.

  • We compare other geolocalization benchmarks with ours along five axes: Global Coverage (GC), indicating whether images span diverse regions worldwide; Reasonable Localizability (RC), measuring whether non-localizable or trivially localizable images are filtered out to preserve meaningful difficulty; High Resolution (HR), requiring all images to have at least 1M pixels for reliable visual-clue extraction; Data Variety (DV), capturing whether multiple image modalities or sources are included to test generalization; and Nuanced Evaluation (NE), which checks whether precise coordinates are available to enable fine-grained distance-based metrics such as the haversine distance gap (a sketch of the formula follows the table).
Benchmark            Year  GC  RC  HR  DV  NE
Im2GPS               2008
YFCC4k               2017
Google Landmarks v2  2020
VIGOR                2022
OSV-5M               2024
GeoComp              2025
GeoBench (ours)      2025
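For reference, the haversine distance behind the NE axis is the standard great-circle formula; a small self-contained sketch:

# Haversine (great-circle) distance between two (lat, lon) points, in km.
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Example: Paris to Berlin is roughly 880 km along the great circle.
print(round(haversine_km(48.8566, 2.3522, 52.5200, 13.4050)))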

Inference and Evaluation on GeoBench

We provide the full inference and evaluation pipeline for GeoVista on GeoBench.

Inference

  • Download the GeoBench dataset from HuggingFace and place it in the ./.temp/datasets directory.
python3 scripts/download_hf.py \
--dataset LibraTree/GeoVistaBench \
--local_dataset_dir ./.temp/datasets
  • Download the pre-trained model from HuggingFace and place it in the ./.temp/checkpoints directory.
python3 scripts/download_hf.py \
--model LibraTree/GeoVista-RL-12k-7B \
--local_model_dir .temp/checkpoints/
  • Deploy the GeoVista model with vLLM:
bash inference/vllm_deploy.sh
  • Configure the settings (including the output directory), then run the inference script:
bash inference/run_inference.sh

After running the above commands, the inference results appear in the specified output directory, e.g., ./.temp/outputs/geobench/geovista-rl-12k-7b/, as an inference_<timestamp>.jsonl file.
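Each line of that file is one JSON record. The exact field names depend on the pipeline version, so a safe way to inspect the output is to print the keys of the first record:

# Peek at the inference output; substitute the real timestamp in the file name.
import json

path = ".temp/outputs/geobench/geovista-rl-12k-7b/inference_<timestamp>.jsonl"

with open(path) as f:
    first = json.loads(f.readline())
print(sorted(first.keys()))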

Evaluation

  • After obtaining the inference results, you can evaluate the geolocalization performance using the evaluation script:
MODEL_NAME=geovista-rl-12k-7b
BENCHMARK=geobench
EVALUATION_RESULT=".temp/outputs/${BENCHMARK}/${MODEL_NAME}/evaluation.jsonl"

python3 eval/eval_infer_geolocation.py \
  --pred_jsonl <The inference file path> \
  --out_jsonl ${EVALUATION_RESULT} \
  --dataset_dir .temp/datasets/${BENCHMARK} \
  --num_samples 1500 \
  --model_verifier \
  --no_eval_accurate_dist \
  --timeout 120 --debug 2>&1 | tee .temp/outputs/${BENCHMARK}/${MODEL_NAME}/evaluation.log

You can accelerate the evaluation by increasing the --workers argument in the above command (the default is 1):

  --workers 8 \
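Distance-based geolocalization metrics are commonly reported as accuracy within fixed radii. The sketch below shows the general idea; the 1/25/200/750/2500 km thresholds are the usual street/city/region/country/continent scheme from the geolocalization literature, and whether eval/eval_infer_geolocation.py uses these exact radii is an assumption:

# Accuracy-at-radius over (predicted, ground-truth) coordinate pairs.
import math

def haversine_km(lat1, lon1, lat2, lon2, r=6371.0):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def accuracy_at(pairs, thresholds=(1, 25, 200, 750, 2500)):
    # pairs: iterable of ((pred_lat, pred_lon), (gt_lat, gt_lon))
    dists = [haversine_km(p[0], p[1], g[0], g[1]) for p, g in pairs]
    return {t: sum(d <= t for d in dists) / len(dists) for t in thresholds}

# Toy example with one near-miss and one far-off prediction.
print(accuracy_at([((48.85, 2.35), (48.86, 2.35)), ((0.0, 0.0), (10.0, 10.0))]))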

Nuanced Evaluation

  • To run nuanced evaluation on GeoBench, please refer to evaluation.md for guidance.

Training Pipeline

GeoVista is trained in two stages: (1) Cold‑Start supervised fine‑tuning (SFT) and (2) Reinforcement Learning.

  • Cold‑Start SFT: We have open‑sourced the 1.5k‑sample GeoVista‑Cold‑Start dataset used to train GeoVista‑SFT‑7B. Download the dataset and run the SFT training script at scripts/sft.py.

  • Reinforcement Learning: Coming soon.

BibTeX

Please consider citing our paper and starring this repo if you find them helpful. Thank you!

@misc{wang2025geovistawebaugmentedagenticvisual,
      title={GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization}, 
      author={Yikun Wang and Zuyan Liu and Ziyi Wang and Pengfei Liu and Han Hu and Yongming Rao},
      year={2025},
      eprint={2511.15705},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.15705}, 
}


Acknowledgements

  • We thank Tavily and Google Cloud for providing reliable web search and geocoding services for research use. We also thank Mapillary for providing high-quality street-level imagery from around the world.
  • We thank the contributors to the VeRL, TRL, gpt-researcher, and DeepEyes repositories for their open-sourced frameworks and research.
