Geo-R1: Improving Few-Shot Geospatial Referring Expression Understanding with Reinforcement Fine-Tuning
Referring expression understanding in remote sensing poses unique challenges, as it requires reasoning over complex object-context relationships. While supervised fine-tuning (SFT) of multimodal large language models achieves strong performance given massive labeled datasets, it struggles in data-scarce scenarios, leading to poor generalization. To address this limitation, we propose Geo-R1, a reasoning-centric reinforcement fine-tuning (RFT) paradigm for few-shot geospatial referring. Geo-R1 requires the model to first generate explicit, interpretable reasoning chains that decompose the referring expression, and then to leverage these rationales to localize the target objects. This "reason first, then act" process enables the model to make more effective use of limited annotations, enhances generalization, and provides interpretability. We validate Geo-R1 on three carefully designed few-shot geospatial referring benchmarks, where our model consistently and substantially outperforms SFT baselines. It also demonstrates strong cross-dataset generalization, highlighting its robustness. Code and data will be released at: https://github.com/Geo-R1/geo-r1.
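The abstract does not spell out how the "reason first, then act" outputs are scored during RFT, but R1-style setups typically use a rule-based reward over the generated text. Below is a minimal sketch under that assumption: the output format (`<think>` reasoning followed by an `<answer>` bounding box), the tag names, and the `iou`/`reward` helpers are hypothetical illustrations, not the paper's actual implementation.

```python
import re

# Assumed (hypothetical) output format: reasoning chain inside <think>...</think>,
# then the predicted box inside <answer>[x1, y1, x2, y2]</answer>.
BOX_RE = re.compile(
    r"<think>(?P<think>.*?)</think>\s*<answer>\s*\[(?P<box>[^\]]+)\]\s*</answer>",
    re.DOTALL,
)

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def reward(completion: str, gt_box: list) -> float:
    """Format reward for well-formed reasoning + answer, plus an IoU-based localization reward."""
    m = BOX_RE.search(completion)
    if m is None:
        return 0.0  # no reasoning chain or no parsable answer box
    try:
        box = [float(v) for v in m.group("box").split(",")]
    except ValueError:
        return 0.0
    if len(box) != 4:
        return 0.0
    # 0.5 for respecting the "reason first, then act" format, 0.5 scaled by localization quality.
    return 0.5 + 0.5 * iou(box, gt_box)
```

In an R1-style RFT loop, a scalar reward like this would score each sampled completion; the abstract does not name the specific policy-gradient optimizer used.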
Forward citations
Cited by 3 Pith papers
- RemoteAgent: Bridging Vague Human Intents and Earth Observation with RL-based Agentic MLLMs
  RemoteAgent uses RL fine-tuning on VagueEO to align MLLMs for vague EO intent recognition, handling simple tasks internally and routing dense predictions to tools via Model Context Protocol.
- RemoteZero: Geospatial Reasoning with Zero Human Annotations
  RemoteZero replaces coordinate supervision with intrinsic semantic verification to enable box-free GRPO training and self-evolution for geospatial reasoning.
- RemoteShield: Enable Robust Multimodal Large Language Models for Earth Observation
  RemoteShield improves robustness of Earth observation MLLMs by training on semantic equivalence clusters of clean and perturbed inputs via preference learning to maintain consistent reasoning under noise.