
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization


Paper · GeoVistaBench · GeoVista-RL-6k-7B · Webpage
* This repository is intended solely for research purposes.

Quick Start

  1. Set up the environment:
conda create -n geo-vista python==3.10 -y
conda activate geo-vista

bash setup.sh
  2. Set up the web search API key

We use Tavily for web search during inference and training. Sign up for a free account, get your Tavily API key, and set the TAVILY_API_KEY variable in the .env file. You can run bash examples/search_test.sh to verify that your key works.
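If you would rather check the key from Python directly, here is a minimal sketch using the tavily-python client (our assumption about the client package; the repository's own check is examples/search_test.sh):

# Minimal Tavily key check (pip install tavily-python); illustrative only.
import os

from tavily import TavilyClient

# TAVILY_API_KEY should be exported from .env, as in the inference example below.
client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

# Any trivial query works; a valid key returns a dict with a "results" list.
response = client.search("Eiffel Tower location", max_results=1)
print(response["results"][0]["title"])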

  3. Download the tuned GeoVista model and deploy it with vLLM

Download the checkpoint from Hugging Face and place it in the ./.temp/checkpoints/LibraTree/GeoVista-RL-6k-7B directory:

python3 scripts/download_hf.py \
--model LibraTree/GeoVista-RL-6k-7B \
--local_model_dir .temp/checkpoints/
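If you prefer not to use the helper script, the same download can presumably be done with huggingface_hub directly (an assumption about what scripts/download_hf.py wraps, not its actual implementation):

# Manual alternative to scripts/download_hf.py via huggingface_hub.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="LibraTree/GeoVista-RL-6k-7B",
    local_dir=".temp/checkpoints/LibraTree/GeoVista-RL-6k-7B",
)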

Then deploy the GeoVista model with vLLM:

bash inference/vllm_deploy_geovista_rl_6k.sh

  4. Run an example inference:
export VLLM_PORT=8000
export VLLM_HOST="localhost"
# apply env variables
set -a; source .env; set +a;
python examples/infer_example.py \
--multimodal_input examples/geobench-example.png \
--question "Please analyze where is the place."

You will see the model's thinking trajectory and final answer in the console output.
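Under the hood, examples/infer_example.py talks to the vLLM server's OpenAI-compatible endpoint. A hand-rolled single-turn request looks roughly like the sketch below; this is only an assumption about the plumbing, it omits the agentic web-search loop the repo's script runs, and the served model name may differ from the one shown (check with curl http://localhost:8000/v1/models):

# Single-turn request to the vLLM OpenAI-compatible server (pip install openai).
import base64
import os

from openai import OpenAI

client = OpenAI(
    base_url=f"http://{os.getenv('VLLM_HOST', 'localhost')}:{os.getenv('VLLM_PORT', '8000')}/v1",
    api_key="EMPTY",  # vLLM does not validate the key by default
)

# Encode the example image as a base64 data URL.
with open("examples/geobench-example.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="LibraTree/GeoVista-RL-6k-7B",  # assumed served name; verify via /v1/models
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text", "text": "Please analyze where is the place."},
        ],
    }],
)
print(response.choices[0].message.content)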

Benchmark

  • We have released the GeoVista-Bench (GeoBench) dataset on Hugging Face 🤗: a benchmark of photos and panoramas from around the world, plus a subset of satellite images of different cities, built to rigorously evaluate the geolocalization ability of agentic models.

GeoBench is the first high-resolution, multi-source, globally annotated dataset to evaluate agentic models’ general geolocalization ability.

  • We compare other geolocalization benchmarks with ours along five axes: Global Coverage (GC), indicating whether images span diverse regions worldwide; Reasonable Localizability (RC), measuring whether non-localizable or trivially localizable images are filtered out to preserve meaningful difficulty; High Resolution (HR), requiring all images to have at least 1M pixels for reliable visual-clue extraction; Data Variety (DV), capturing whether multiple image modalities or sources are included to test generalization; and Nuanced Evaluation (NE), which checks whether precise coordinates are available to enable fine-grained distance-based metrics such as the haversine distance gap (a sketch of the formula follows the table).
Benchmark            Year  GC  RC  HR  DV  NE
Im2GPS               2008
YFCC4k               2017
Google Landmarks v2  2020
VIGOR                2022
OSV-5M               2024
GeoComp              2025
GeoBench (ours)      2025
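For reference, the haversine distance behind the NE axis is the standard great-circle formula; a small self-contained sketch:

# Haversine (great-circle) distance between two (lat, lon) points, in km.
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Example: Paris to Berlin is roughly 880 km along the great circle.
print(round(haversine_km(48.8566, 2.3522, 52.5200, 13.4050)))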

Inference and Evaluation on GeoBench

We provide the full inference and evaluation pipeline for GeoVista on GeoBench.

Inference

  • Download the GeoBench dataset from HuggingFace and place it in the ./.temp/datasets directory.
python3 scripts/download_hf.py \
--dataset LibraTree/GeoVistaBench \
--local_dataset_dir ./.temp/datasets
  • Download the pre-trained model from HuggingFace and place it in the ./.temp/checkpoints directory.
python3 scripts/download_hf.py \
--model LibraTree/GeoVista-RL-12k-7B \
--local_model_dir .temp/checkpoints/
  • Deploy the GeoVista model with vLLM:
bash inference/vllm_deploy.sh
  • Configure the settings (including the output directory), then run the inference script:
bash inference/run_inference.sh

After running the above commands, the inference results appear in the specified output directory, e.g., ./.temp/outputs/geobench/geovista-rl-12k-7b/, as an inference_<timestamp>.jsonl file.
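Each line of that file is one JSON record. The exact field names depend on the pipeline version, so a safe way to inspect the output is to print the keys of the first record:

# Peek at the inference output; substitute the real timestamp in the file name.
import json

path = ".temp/outputs/geobench/geovista-rl-12k-7b/inference_<timestamp>.jsonl"

with open(path) as f:
    first = json.loads(f.readline())
print(sorted(first.keys()))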

Evaluation

  • After obtaining the inference results, you can evaluate the geolocalization performance using the evaluation script:
MODEL_NAME=geovista-rl-12k-7b
BENCHMARK=geobench
EVALUATION_RESULT=".temp/outputs/${BENCHMARK}/${MODEL_NAME}/evaluation.jsonl"

python3 eval/eval_infer_geolocation.py \
  --pred_jsonl <The inference file path> \
  --out_jsonl ${EVALUATION_RESULT} \
  --dataset_dir .temp/datasets/${BENCHMARK} \
  --num_samples 1500 \
  --model_verifier \
  --no_eval_accurate_dist \
  --timeout 120 --debug 2>&1 | tee .temp/outputs/${BENCHMARK}/${MODEL_NAME}/evaluation.log

You can accelerate the evaluation by increasing the --workers argument in the above command (the default is 1):

  --workers 8 \
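Distance-based geolocalization metrics are commonly reported as accuracy within fixed radii. The sketch below shows the general idea; the 1/25/200/750/2500 km thresholds are the usual street/city/region/country/continent scheme from the geolocalization literature, and whether eval/eval_infer_geolocation.py uses these exact radii is an assumption:

# Accuracy-at-radius over (predicted, ground-truth) coordinate pairs.
import math

def haversine_km(lat1, lon1, lat2, lon2, r=6371.0):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def accuracy_at(pairs, thresholds=(1, 25, 200, 750, 2500)):
    # pairs: iterable of ((pred_lat, pred_lon), (gt_lat, gt_lon))
    dists = [haversine_km(p[0], p[1], g[0], g[1]) for p, g in pairs]
    return {t: sum(d <= t for d in dists) / len(dists) for t in thresholds}

# Toy example with one near-miss and one far-off prediction.
print(accuracy_at([((48.85, 2.35), (48.86, 2.35)), ((0.0, 0.0), (10.0, 10.0))]))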

Nuanced Evaluation

  • To run nuanced evaluation on GeoBench, please refer to evaluation.md for guidance.

Training Pipeline

GeoVista is trained in two stages: (1) Cold‑Start supervised fine‑tuning (SFT) and (2) Reinforcement Learning.

  • Cold‑Start SFT: We have open‑sourced the 1.5k‑sample GeoVista‑Cold‑Start dataset used to train GeoVista‑SFT‑7B. Download the dataset and run the SFT training script at scripts/sft.py.

  • Reinforcement Learning: Coming soon.

BibTeX

Please consider citing our paper and starring this repo if you find them helpful. Thank you!

@misc{wang2025geovistawebaugmentedagenticvisual,
      title={GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization}, 
      author={Yikun Wang and Zuyan Liu and Ziyi Wang and Pengfei Liu and Han Hu and Yongming Rao},
      year={2025},
      eprint={2511.15705},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.15705}, 
}


Acknowledgements

  • We thank Tavily and Google Cloud for providing reliable web search and geocoding services for research use. We also thank Mapillary for providing high-quality street-level imagery from around the world.
  • We thank the contributors to the VeRL, TRL, gpt-researcher, and DeepEyes repositories for their open-sourced frameworks and research.
