GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization

Pengyue Jia; Seongheon Park; Sharon Li; Song Gao; Xiangyu Zhao

arxiv: 2505.13731 · v4 · pith:KJPR5X3Tnew · submitted 2025-05-19 · 💻 cs.CV

classification 💻 cs.CV

keywords georanker · ranking · best · candidates · distance-aware · geographic · image · model

Worldwide image geolocalization, the task of predicting GPS coordinates from images taken anywhere on Earth, poses a fundamental challenge due to the vast diversity of visual content across regions. While recent approaches adopt a two-stage pipeline of retrieving candidates and selecting the best match, they typically rely on simplistic similarity heuristics and point-wise supervision, and so fail to model spatial relationships among candidates. In this paper, we propose GeoRanker, a distance-aware ranking framework that leverages large vision-language models to jointly encode query-candidate interactions and predict geographic proximity. In addition, we introduce a multi-order distance loss that ranks both absolute and relative distances, enabling the model to reason over structured spatial relationships. To support this, we curate GeoRanking, the first dataset explicitly designed for geographic ranking tasks with multimodal candidate information. GeoRanker achieves state-of-the-art results on two well-established benchmarks (IM2GPS3K and YFCC4K), significantly outperforming current best methods.
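
The abstract does not give the form of the multi-order distance loss, so the block below is only an illustrative sketch of what "ranking both absolute and relative distances" could look like, not the paper's formulation. It assumes a pairwise logistic ranking loss, great-circle (haversine) distances between the query's true location and each candidate, and one scalar score per candidate. All function names (`haversine_km`, `first_order_loss`, `second_order_loss`) are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ordered_pairs(dists):
    """Index pairs (i, j) where candidate i is strictly nearer than candidate j."""
    n = len(dists)
    return [(i, j) for i in range(n) for j in range(n) if dists[i] < dists[j]]


def first_order_loss(scores, dists):
    """Assumed 'absolute' term: a nearer candidate should receive a higher score."""
    pairs = ordered_pairs(dists)
    if not pairs:
        return 0.0
    return sum(softplus(-(scores[i] - scores[j])) for i, j in pairs) / len(pairs)


def second_order_loss(scores, dists):
    """Assumed 'relative' term: a larger distance gap should give a larger score gap."""
    gaps = [(dists[j] - dists[i], scores[i] - scores[j])
            for i, j in ordered_pairs(dists)]
    terms = [softplus(-(s_p - s_q))
             for d_p, s_p in gaps for d_q, s_q in gaps if d_p > d_q]
    return sum(terms) / len(terms) if terms else 0.0


# Hypothetical query location and three retrieved candidates (lat, lon).
query = (48.8584, 2.2945)
candidates = [(48.8606, 2.3376), (51.5007, -0.1246), (40.6892, -74.0445)]
dists = [haversine_km(query, c) for c in candidates]
scores = [3.0, 1.0, -2.0]  # made-up model scores, higher means predicted nearer
loss = first_order_loss(scores, dists) + second_order_loss(scores, dists)
```

Both terms in this sketch shrink when the score ordering agrees with the distance ordering. The second term additionally penalizes score gaps that do not grow with the corresponding distance gaps.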

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

VidTAG: Temporally Aligned Video to GPS Geolocalization with Denoising Sequence Prediction at a Global Scale
cs.CV 2026-04 unverdicted novelty 7.0

VidTAG achieves fine-grained global video-to-GPS geolocalization via temporal frame alignment and denoising sequence refinement, reporting 20% gains at 1 km over GeoCLIP and 25% on CityGuessr68k.
Skill-Conditioned Visual Geolocation for Vision-Language Models
cs.CV 2026-04 unverdicted novelty 7.0

GeoSkill uses an evolving Skill-Graph initialized from expert trajectories and grown via autonomous analysis of successful and failed reasoning rollouts to boost geolocation accuracy, faithfulness, and generalization ...
Skill-Conditioned Visual Geolocation for Vision-Language Models
cs.CV 2026-04 unverdicted novelty 7.0

GeoSkill lets vision-language models improve geolocation accuracy and reasoning by maintaining an evolving Skill-Graph that grows through autonomous analysis of successful and failed rollouts on web-scale image data.
GeoGNN: Time Series Geo-Localization using Two-Tower Graph Neural Networks
cs.LG 2026-06 unverdicted novelty 6.0

GeoGNN is a two-tower GNN that learns geographic cell embeddings from adjacency graphs and matches them to temporal representations via dot-product similarity plus classification, improving geolocalization accuracy by...