Recognition: 1 theorem link · Lean theorem
SCC-Loc: A Unified Semantic Cascade Consensus Framework for UAV Thermal Geo-Localization
Pith reviewed 2026-05-13 19:30 UTC · model grok-4.3
The pith
SCC-Loc achieves 9.37 m mean error in UAV thermal geo-localization by sharing one DINOv2 backbone across retrieval and matching
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SCC-Loc estimates accurate absolute position from thermal UAV images against satellite references by sharing DINOv2 features across retrieval and matching, then chaining semantic-guided viewport alignment, cascaded spatial-adaptive texture-structure filtering, and consensus-driven reliability-aware position selection to bridge the thermal-visible modality gap.
What carries the argument
The unified Semantic-Cascade-Consensus framework that shares a DINOv2 backbone and deploys SGVA for adaptive crop alignment, C-SATSF for geometric outlier removal, and CD-RAPS for physically constrained pose optimization.
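The three-stage chain above can be sketched as a simple data flow. This is an illustrative skeleton only: the function names (`sgva_align`, `csatsf_filter`, `cdraps_select`) and their interfaces are assumptions standing in for the paper's actual modules, not the authors' API.

```python
import numpy as np

def scc_loc_pipeline(thermal_query, satellite_tiles, extract_features,
                     sgva_align, csatsf_filter, cdraps_select):
    """Hypothetical sketch of the retrieval -> SGVA -> C-SATSF -> CD-RAPS
    cascade; all callables are illustrative stand-ins."""
    # Shared backbone: one feature extractor serves both retrieval and matching.
    q_feat = extract_features(thermal_query)
    tile_feats = [extract_features(t) for t in satellite_tiles]

    # Global retrieval: pick the satellite tile with highest cosine similarity.
    sims = [float(q_feat @ f)
            / (np.linalg.norm(q_feat) * np.linalg.norm(f) + 1e-8)
            for f in tile_feats]
    best = int(np.argmax(sims))

    crop = sgva_align(thermal_query, satellite_tiles[best])   # stage 1: SGVA
    matches = csatsf_filter(thermal_query, crop)              # stage 2: C-SATSF
    return cdraps_select(matches)                             # stage 3: CD-RAPS
```

Plugging in trivial stand-ins for the three modules shows the data flow end to end without committing to any module internals.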
Load-bearing premise
The semantic features from DINOv2 combined with the three proposed modules will generalize to new thermal-visible pairs without overfitting to the specific alignment quality or scene statistics of the Thermal-UAV dataset.
What would settle it
Evaluating SCC-Loc on an independent thermal UAV dataset collected in a different geographic region or under different thermal conditions and checking whether the reported 9.37 m mean error and 7.6-fold tight-threshold gain are preserved.
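The two headline numbers (mean error and accuracy inside a tight threshold) can be computed generically as below. This is a sketch of a standard localization evaluation, not the paper's exact protocol.

```python
import numpy as np

def localization_metrics(pred_xy, gt_xy, threshold_m=5.0):
    """Mean localization error (m) and recall within a strict threshold.
    Generic evaluation sketch; the paper's exact protocol may differ."""
    errors = np.linalg.norm(np.asarray(pred_xy, float)
                            - np.asarray(gt_xy, float), axis=1)
    return float(errors.mean()), float((errors <= threshold_m).mean())
```

Running the same function on a second, independently collected dataset and comparing both numbers is exactly the check described above.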
Figures
read the original abstract
Cross-modal Thermal Geo-localization (TG) provides a robust, all-weather solution for Unmanned Aerial Vehicles (UAVs) in Global Navigation Satellite System (GNSS)-denied environments. However, profound thermal-visible modality gaps introduce severe feature ambiguity, systematically corrupting conventional coarse-to-fine registration. To dismantle this bottleneck, we propose SCC-Loc, a unified Semantic-Cascade-Consensus localization framework. By sharing a single DINOv2 backbone across global retrieval and MINIMA$_{\text{RoMa}}$ matching, it minimizes memory footprint and achieves zero-shot, highly accurate absolute position estimation. Specifically, we tackle modality ambiguity by introducing three cohesive components. First, we design the Semantic-Guided Viewport Alignment (SGVA) module to adaptively optimize satellite crop regions, effectively correcting initial spatial deviations. Second, we develop the Cascaded Spatial-Adaptive Texture-Structure Filtering (C-SATSF) mechanism to explicitly enforce geometric consistency, thereby eradicating dense cross-modal outliers. Finally, we propose the Consensus-Driven Reliability-Aware Position Selection (CD-RAPS) strategy to derive the optimal solution through a synergy of physically constrained pose optimization. To address data scarcity, we construct Thermal-UAV, a comprehensive dataset providing 11,890 diverse thermal queries referenced against a large-scale satellite ortho-photo and corresponding spatially aligned Digital Surface Model (DSM). Extensive experiments demonstrate that SCC-Loc establishes a new state-of-the-art, suppressing the mean localization error to 9.37 m and providing a 7.6-fold accuracy improvement within a strict 5-m threshold over the strongest baseline. Code and dataset are available at https://github.com/FloralHercules/SCC-Loc.
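The kind of geometric-consistency outlier removal the abstract attributes to C-SATSF can be illustrated with a minimal RANSAC-style filter under a pure-translation model. This is only a generic illustration; the paper's cascaded spatial-adaptive mechanism is more elaborate.

```python
import numpy as np

def consistency_filter(src_pts, dst_pts, n_iters=200, tol=3.0, seed=0):
    """Keep correspondences consistent with the best one-point translation
    hypothesis; a toy stand-in for cascaded geometric filtering."""
    rng = np.random.default_rng(seed)
    src = np.asarray(src_pts, float)
    dst = np.asarray(dst_pts, float)
    best_mask = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        i = int(rng.integers(len(src)))
        t = dst[i] - src[i]                 # translation from one sample
        residuals = np.linalg.norm(dst - (src + t), axis=1)
        mask = residuals < tol
        if mask.sum() > best_mask.sum():
            best_mask = mask
    return best_mask
```

A single gross outlier among otherwise translation-consistent matches is rejected because no hypothesis seeded from it explains the inlier set.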
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents SCC-Loc, a unified semantic cascade consensus framework for cross-modal UAV thermal geo-localization. It shares a single DINOv2 backbone for global retrieval and MINIMA_RoMa matching, and introduces three modules (SGVA for adaptive satellite crop optimization, C-SATSF for enforcing geometric consistency via cascaded filtering, and CD-RAPS for reliability-aware pose selection). The authors release the Thermal-UAV dataset (11,890 thermal queries aligned to satellite ortho-photos and DSM) and report SOTA results of 9.37 m mean localization error with a 7.6-fold accuracy gain inside a strict 5 m threshold.
Significance. If the gains hold beyond the new dataset, the work provides a practical, memory-efficient advance for GNSS-denied UAV navigation by directly tackling thermal-visible feature ambiguity with semantic features and consensus optimization. The open release of code and dataset, together with the zero-shot use of a shared backbone, are clear strengths that support reproducibility and follow-on research.
major comments (1)
- [Experiments section] Experiments section: all quantitative results, including the reported 9.37 m mean error and 7.6-fold improvement at the 5 m threshold, are obtained exclusively on the newly introduced Thermal-UAV dataset. To substantiate the claim of solving the general modality-gap problem rather than exploiting collection-specific regularities (viewport statistics, DSM alignment quality, or thermal-visible pair construction), evaluation on at least one established prior TG benchmark or a held-out geographic split is required.
minor comments (1)
- [Method section] The integration of MINIMA_RoMa with the DINOv2 backbone should be described with a brief equation or pseudocode in the method section for clarity.
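For concreteness, one plausible shape of the requested pseudocode is a single forward pass whose dense patch tokens are pooled into a global retrieval descriptor and reused by the dense matcher. The pooling choice and interfaces below are assumptions, not the paper's actual DINOv2/MINIMA_RoMa coupling.

```python
import numpy as np

def shared_backbone_features(image, backbone):
    """One backbone pass feeds two heads: mean-pooled global descriptor
    for retrieval, raw patch tokens for dense matching. Illustrative only."""
    tokens = backbone(image)                  # (num_patches, dim) descriptors
    global_desc = tokens.mean(axis=0)         # simple pooling for retrieval
    global_desc = global_desc / (np.linalg.norm(global_desc) + 1e-8)
    return global_desc, tokens                # tokens go to the dense matcher
```

The memory saving claimed in the abstract comes from this reuse: the matcher consumes features already computed for retrieval instead of running a second backbone.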
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on generalizability. We agree that all current quantitative results are reported exclusively on the new Thermal-UAV dataset and that additional validation is required to demonstrate that SCC-Loc addresses the modality gap in a general manner rather than dataset-specific artifacts. In the revised manuscript we will add a geographically held-out split evaluation (no location overlap between training and test sets) and will explicitly discuss the limitations of relying solely on the new dataset.
read point-by-point responses
-
Referee: [Experiments section] Experiments section: all quantitative results, including the reported 9.37 m mean error and 7.6-fold improvement at the 5 m threshold, are obtained exclusively on the newly introduced Thermal-UAV dataset. To substantiate the claim of solving the general modality-gap problem rather than exploiting collection-specific regularities (viewport statistics, DSM alignment quality, or thermal-visible pair construction), evaluation on at least one established prior TG benchmark or a held-out geographic split is required.
Authors: We acknowledge the validity of this concern. The Thermal-UAV dataset was constructed specifically to fill the data gap in thermal-to-satellite geo-localization, and all reported metrics (9.37 m mean error, 7.6-fold gain at 5 m) are indeed obtained on this new collection. In the revision we will add a held-out geographic split experiment: the dataset will be partitioned by geographic region so that test queries come from entirely unseen locations, thereby testing robustness to different viewport statistics and DSM characteristics. We will report the same metrics on this split and include an analysis of any performance drop. While we agree an established prior TG benchmark would be ideal, most existing cross-modal geo-localization datasets are either not publicly released, use different sensor modalities, or lack aligned DSMs; we will note this limitation and state that future work will seek compatible benchmarks when they become available. revision: yes
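A held-out geographic split of the kind promised above can be as simple as partitioning queries by region so that test locations never overlap training ones. The axis-aligned box below is a hypothetical stand-in for whatever region definition the authors adopt.

```python
import numpy as np

def geographic_split(query_xy, test_region):
    """Split query indices into train/test by location; `test_region`
    is an (xmin, ymin, xmax, ymax) box, an illustrative assumption."""
    xy = np.asarray(query_xy, dtype=float)
    xmin, ymin, xmax, ymax = test_region
    in_test = ((xy[:, 0] >= xmin) & (xy[:, 0] <= xmax)
               & (xy[:, 1] >= ymin) & (xy[:, 1] <= ymax))
    return np.flatnonzero(~in_test), np.flatnonzero(in_test)
```

Reporting the same metrics on the test indices then measures robustness to unseen viewport statistics and DSM characteristics.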
Circularity Check
No significant circularity in derivation chain
full rationale
The paper introduces SCC-Loc as a framework combining a shared DINOv2 backbone with three new modules (SGVA, C-SATSF, CD-RAPS) and evaluates performance empirically on the newly constructed Thermal-UAV dataset. No mathematical derivations, equations, or predictions are presented that reduce by construction to the method's own inputs or fitted parameters. Performance numbers (e.g., 9.37 m mean error) are reported as direct experimental outcomes rather than outputs of any self-referential fitting process. No load-bearing self-citations, uniqueness theorems imported from prior author work, or ansatzes smuggled via citation appear in the provided text. The central claims rest on external pre-trained features and measured results on the introduced data, making the chain self-contained.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption DINOv2 features are sufficiently invariant to thermal-visible modality shift for both retrieval and dense matching
- domain assumption Semantic cues can reliably correct initial spatial deviations between thermal query and satellite reference
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tagged unclear
The relation between the paper passage and the cited Recognition theorem is unclear.
By sharing a single DINOv2 backbone across global retrieval and MINIMA RoMa matching... Semantic-Guided Viewport Alignment (SGVA) module... Cascaded Spatial-Adaptive Texture-Structure Filtering (C-SATSF)... Consensus-Driven Reliability-Aware Position Selection (CD-RAPS)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Search and rescue operation using uavs: A case study,
I. Martinez-Alpiste, G. Golcarenarenji, Q. Wang, and J. M. Alcaraz-Calero, “Search and rescue operation using uavs: A case study,” Expert Syst. Appl., vol. 178, p. 114937, 2021
work page 2021
-
[2]
Drones and border control: An examination of state and non-state actor use of uavs along borders,
R. Koslowski, “Drones and border control: An examination of state and non-state actor use of uavs along borders,” in Research Handbook on International Migration and Digital Technology. Edward Elgar Publishing, 2021, pp. 152–165
work page 2021
-
[3]
University-1652: A multi-view multi-source benchmark for drone-based geo-localization,
Z. Zheng, Y. Wei, and Y. Yang, “University-1652: A multi-view multi-source benchmark for drone-based geo-localization,” in Proc. ACM Int. Conf. Multimedia, 2020, pp. 1395–1403
work page 2020
-
[4]
A review on deep learning for uav absolute visual localization,
A. Couturier and M. A. Akhloufi, “A review on deep learning for uav absolute visual localization,” Drones, vol. 8, no. 11, p. 622, 2024
work page 2024
-
[5]
Long-range uav thermal geo-localization with satellite imagery,
J. Xiao, D. Tortei, E. Roura, and G. Loianno, “Long-range uav thermal geo-localization with satellite imagery,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS). IEEE, 2023, pp. 5820–5827
work page 2023
-
[6]
Sthn: Deep homography estimation for uav thermal geo-localization with satellite imagery,
J. Xiao, N. Zhang, D. Tortei, and G. Loianno, “Sthn: Deep homography estimation for uav thermal geo-localization with satellite imagery,” IEEE Robot. Autom. Lett., 2024
work page 2024
-
[7]
Uasthn: Uncertainty-aware deep homography estimation for uav satellite-thermal geo-localization,
J. Xiao and G. Loianno, “Uasthn: Uncertainty-aware deep homography estimation for uav satellite-thermal geo-localization,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA). IEEE, 2025, pp. 14066–14072
work page 2025
-
[8]
Leveraging map retrieval and alignment for robust uav visual geo-localization,
M. He, J. Liu, P. Gu, and Z. Meng, “Leveraging map retrieval and alignment for robust uav visual geo-localization,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–13, 2024
work page 2024
-
[9]
Y. Ye, X. Teng, S. Chen, Z. Li, L. Liu, Q. Yu, and T. Tan, “Exploring the best way for uav visual localization under low-altitude multi-view observation condition: a benchmark,” arXiv:2503.10692, 2025
work page arXiv 2025
-
[10]
Airgeonet: A map-guided visual geo-localization approach for aerial vehicles,
X. Meng, W. Guo, K. Zhou, T. Sun, L. Deng, S. Yu, and Y. Feng, “Airgeonet: A map-guided visual geo-localization approach for aerial vehicles,” IEEE Trans. Geosci. Remote Sens., 2024
work page 2024
-
[11]
Xoftr: Cross-modal feature matching transformer,
Ö. Tuzcuoğlu, A. Köksal, B. Sofu, S. Kalkan, and A. A. Alatan, “Xoftr: Cross-modal feature matching transformer,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2024, pp. 4275–4286
work page 2024
-
[12]
Uav-geoloc: A large-vocabulary dataset and geometry-transformed method for uav geo-localization,
R. Wu, J. Deng, M. Mou, X. He, M. Zhang, Y. Liu, and S. Yan, “Uav-geoloc: A large-vocabulary dataset and geometry-transformed method for uav geo-localization,” IEEE Robot. Autom. Lett., 2025
work page 2025
-
[13]
Game4loc: A uav geo-localization benchmark from game data,
Y. Ji, B. He, Z. Tan, and L. Wu, “Game4loc: A uav geo-localization benchmark from game data,” in Proc. AAAI Conf. Artif. Intell., vol. 39, no. 4, 2025, pp. 3913–3921
work page 2025
-
[14]
Uav-visloc: A large-scale dataset for uav visual localization,
W. Xu, Y. Yao, J. Cao, Z. Wei, C. Liu, J. Wang, and M. Peng, “Uav-visloc: A large-scale dataset for uav visual localization,” arXiv:2405.11936, 2024
-
[15]
Sues-200: A multi-height multi-scene cross-view image benchmark across drone and satellite,
R. Zhu, L. Yin, M. Yang, F. Wu, Y. Yang, and W. Hu, “Sues-200: A multi-height multi-scene cross-view image benchmark across drone and satellite,” IEEE Trans. Circuits Syst. Video Technol., vol. 33, no. 9, pp. 4825–4839, 2023
work page 2023
-
[16]
Vision-based uav self-positioning in low-altitude urban environments,
M. Dai, E. Zheng, Z. Feng, L. Qi, J. Zhuang, and W. Yang, “Vision-based uav self-positioning in low-altitude urban environments,” IEEE Trans. Image Process., vol. 33, pp. 493–508, 2023
work page 2023
-
[17]
Uav geo-localization for navigation: A survey,
D. Avola, L. Cinque, E. Emam, F. Fontana, G. L. Foresti, M. R. Marini, A. Mecca, and D. Pannone, “Uav geo-localization for navigation: A survey,” IEEE Access, 2024
work page 2024
-
[18]
Mmgeo: Multimodal compositional geo-localization for uavs,
Y. Ji, B. He, Z. Tan, and L. Wu, “Mmgeo: Multimodal compositional geo-localization for uavs,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2025, pp. 25165–25175
work page 2025
-
[19]
Netvlad: Cnn architecture for weakly supervised place recognition,
R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, “Netvlad: Cnn architecture for weakly supervised place recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 5297–5307
work page 2016
-
[20]
M. Dai, J. Hu, J. Zhuang, and E. Zheng, “A transformer-based feature segmentation and region alignment method for uav-view geo-localization,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 7, pp. 4376–4389, 2021
work page 2021
-
[21]
Q. Wu, Y. Wan, Z. Zheng, Y. Zhang, G. Wang, and Z. Zhao, “Camp: A cross-view geo-localization method using contrastive attributes mining and position-aware partitioning,” IEEE Trans. Geosci. Remote Sens., 2024
work page 2024
-
[22]
Segcn: A semantic-aware graph convolutional network for uav geo-localization,
X. Liu, Z. Wang, Y. Wu, and Q. Miao, “Segcn: A semantic-aware graph convolutional network for uav geo-localization,” IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens., vol. 17, pp. 6055–6066, 2024
work page 2024
-
[23]
Anyloc: Towards universal visual place recognition,
N. Keetha, A. Mishra, J. Karhade, K. M. Jatavallabhula, S. Scherer, M. Krishna, and S. Garg, “Anyloc: Towards universal visual place recognition,” IEEE Robot. Autom. Lett., vol. 9, no. 2, pp. 1286–1293, 2023
work page 2023
-
[24]
O. Siméoni, H. V. Vo, M. Seitzer, F. Baldassarre, M. Oquab, C. Jose, V. Khalidov, M. Szafraniec, S. Yi, M. Ramamonjisoa et al., “Dinov3,” arXiv:2508.10104, 2025
work page arXiv 2025
-
[25]
Fast normalized cross-correlation,
J.-C. Yoo and T. H. Han, “Fast normalized cross-correlation,” Circuits, Syst. Signal Process., vol. 28, no. 6, pp. 819–843, 2009
work page 2009
-
[26]
Superpoint: Self-supervised interest point detection and description,
D. DeTone, T. Malisiewicz, and A. Rabinovich, “Superpoint: Self-supervised interest point detection and description,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2018, pp. 224–236
work page 2018
-
[27]
Loftr: Detector-free local feature matching with transformers,
J. Sun, Z. Shen, Y. Wang, H. Bao, and X. Zhou, “Loftr: Detector-free local feature matching with transformers,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 8922–8931
work page 2021
-
[28]
Roma: Robust dense feature matching,
J. Edstedt, Q. Sun, G. Bökman, M. Wadenbäck, and M. Felsberg, “Roma: Robust dense feature matching,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2024, pp. 19790–19800
work page 2024
-
[29]
Minima: Modality invariant image matching,
J. Ren, X. Jiang, Z. Li, D. Liang, X. Zhou, and X. Bai, “Minima: Modality invariant image matching,” in Proc. Comput. Vis. Pattern Recognit. Conf. (CVPR), 2025, pp. 23059–23068
work page 2025
-
[30]
Os-fpi: A coarse-to-fine one-stream network for uav geolocalization,
J. Chen, E. Zheng, M. Dai, Y. Chen, and Y. Lu, “Os-fpi: A coarse-to-fine one-stream network for uav geolocalization,” IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens., vol. 17, pp. 7852–7866, 2024
work page 2024
-
[31]
Enhancing uav geo-location with multi-modal transformer networks: The mmglt approach,
W. Xu, N. Chen, J. Yuan, J. Fan, W. Chen, and E. Zheng, “Enhancing uav geo-location with multi-modal transformer networks: The mmglt approach,” IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens., 2026
work page 2026
-
[32]
Uav-tirvis: A benchmark dataset for thermal–visible image registration from aerial platforms,
C.-E. Vasile, C. Bîră, and R. Hobincu, “Uav-tirvis: A benchmark dataset for thermal–visible image registration from aerial platforms,” J. Imag., vol. 11, no. 12, p. 432, 2025
work page 2025
-
[33]
Mcgs-reid: A visible-infrared vehicle reidentification method using modal-cross graph sampler,
J. Liu, C. Zhao, C. Zhao, N. Su, W. Lu, Y. Yan, S. Feng, and Y. Qu, “Mcgs-reid: A visible-infrared vehicle reidentification method using modal-cross graph sampler,” IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens., vol. 18, pp. 18806–18818, 2024
work page 2024
-
[34]
Multimodal absolute visual localization for unmanned aerial vehicles,
Z. Liu, H. Li, Z. Zhang, Y. Lyu, and J. Xiong, “Multimodal absolute visual localization for unmanned aerial vehicles,” IEEE Trans. Veh. Technol., vol. 73, no. 11, pp. 16402–16415, 2024
work page 2024
-
[35]
DINOv2: Learning Robust Visual Features without Supervision
M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby et al., “Dinov2: Learning robust visual features without supervision,” arXiv:2304.07193, 2023
work page arXiv 2023
-
[36]
Fine-tuning cnn image retrieval with no human annotation,
F. Radenović, G. Tolias, and O. Chum, “Fine-tuning cnn image retrieval with no human annotation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 7, pp. 1655–1668, 2018
work page 2018
-
[37]
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
A. Dosovitskiy, “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv:2010.11929, 2020
work page arXiv 2020
-
[38]
Y. Wu and Z. Hu, “Pnp problem revisited,” J. Math. Imag. Vis., vol. 24, no. 1, pp. 131–141, 2006
work page 2006
-
[39]
J. Ma, J. Zhao, J. Jiang, H. Zhou, and X. Guo, “Locality preserving matching,” Int. J. Comput. Vis., vol. 127, no. 5, pp. 512–531, 2019
work page 2019
-
[40]
S. Jiang and W. Jiang, “Reliable image matching via photometric and geometric constraints structured by delaunay triangulation,” ISPRS J. Photogrammetry Remote Sens., vol. 153, pp. 1–20, 2019
work page 2019
-
[41]
Visual place recognition: A survey,
S. Lowry, N. Sünderhauf, P. Newman, J. J. Leonard, D. Cox, P. Corke, and M. J. Milford, “Visual place recognition: A survey,” IEEE Trans. Robot., vol. 32, no. 1, pp. 1–19, 2015
work page 2015
-
[42]
Vins-mono: A robust and versatile monocular visual-inertial state estimator,
T. Qin, P. Li, and S. Shen, “Vins-mono: A robust and versatile monocular visual-inertial state estimator,” IEEE Trans. Robot., vol. 34, no. 4, pp. 1004–1020, 2018
work page 2018
-
[43]
A micro lie theory for state estimation in robotics,
J. Sola, J. Deray, and D. Atchuthan, “A micro lie theory for state estimation in robotics,” arXiv:1812.01537, 2018
-
[44]
Fundamentals of statistical signal processing: Estimation theory,
S. K. Sengijpta, “Fundamentals of statistical signal processing: Estimation theory,” 1995
work page 1995
-
[45]
On degeneracy of optimization-based state estimation problems,
J. Zhang, M. Kaess, and S. Singh, “On degeneracy of optimization-based state estimation problems,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA). IEEE, 2016, pp. 809–816
work page 2016
-
[46]
O. Dhaouadi, R. Marin, J. Meier, J. Kaiser, and D. Cremers, “Ortholoc: Uav 6-dof localization and calibration using orthographic geodata,” arXiv:2509.18350, 2025
-
[47]
C. Li, M. He, C. Chen, J. Liu, X. Lyu, G. Huang, and Z. Meng, “Geovins: Geographic-visual-inertial navigation system for large-scale drift-free aerial state estimation,” IEEE Trans. Robot., 2025
work page 2025
discussion (0)