GeoMix: Descriptor-Free Visual Localization via Global Context and Multi-Detector Training

Esa Rahtu; Juho Kannala; Xinjue Wang; Yejun Zhang; Zihan Wang

arxiv: 2607.02486 · v1 · pith:LSXMOUMMnew · submitted 2026-07-02 · 💻 cs.CV

Pith reviewed 2026-07-03 14:42 UTC · model grok-4.3

keywords: descriptor-free visual localization · 2D-3D matching · global context · multi-detector training · geometric discriminability · keypoint embeddings · cross-attention

The pith

GeoMix closes much of the accuracy gap in descriptor-free visual localization by adding global context and training across multiple keypoint detectors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Descriptor-free visual localization avoids storing high-dimensional descriptors to improve privacy and simplify map maintenance, yet its accuracy has lagged behind methods that use appearance descriptors. The paper identifies insufficient geometric discriminability as the core problem and shows that local spatial embeddings, global cross-attention context nodes, and joint training on keypoints from several detectors can strengthen geometry-only matching. Experiments on multiple benchmarks demonstrate large reductions in pose error and zero-shot generalization to new detectors. A reader would care if these changes make descriptor-free pipelines viable where storage costs or privacy constraints rule out appearance features.

Core claim

GeoMix is a descriptor-free 2D-3D matching framework that strengthens geometric discriminability at three levels: locally with directional and distance-aware embeddings for neighborhood aggregation, globally with learnable context nodes that aggregate scene-wide information via cross-attention, and at training time with Mix-Training that exploits the detector-agnostic geometry space to optimize representations across multiple keypoint detectors. On MegaDepth, Cambridge Landmarks, 7Scenes, and Aachen Day-Night this yields state-of-the-art results among descriptor-free methods, cutting 75th-percentile rotation error by 89 percent and translation error by up to 90 percent while generalizing zero-shot to unseen detectors.
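To make "directional and distance-aware" concrete, here is a minimal sketch of the kind of raw neighborhood features such an embedding could start from. It is an illustrative stand-in, not GeoMix's actual embedding: the function name `local_geometry_features`, the choice of k nearest neighbors, and the (distance, cos, sin) encoding are all assumptions made for this example.

```python
import math

def local_geometry_features(points, k=3):
    """Describe each 2D point by its k nearest neighbours.

    Returns one list per point holding k tuples (distance, cos_angle, sin_angle),
    sorted by increasing distance. Hypothetical encoding, for illustration only.
    """
    features = []
    for i, (xi, yi) in enumerate(points):
        neighbours = []
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            dist = math.hypot(dx, dy)          # distance cue
            angle = math.atan2(dy, dx)         # direction cue
            neighbours.append((dist, math.cos(angle), math.sin(angle)))
        neighbours.sort(key=lambda t: t[0])    # nearest neighbours first
        features.append(neighbours[:k])
    return features

# Toy keypoints (made-up coordinates).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
feats = local_geometry_features(points, k=2)
```

Encoding the direction as a (cos, sin) pair rather than a raw angle avoids the discontinuity at ±π; a learned model would typically feed such features through further layers.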

What carries the argument

Mix-Training, which jointly optimizes heterogeneous keypoints from multiple detectors inside a single shared geometry-only embedding space, augmented by local directional embeddings and global learnable context nodes with cross-attention.
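The gather-then-broadcast role of the context nodes can be sketched with plain scaled dot-product attention. This is a simplified illustration under stated assumptions: a single head, identity projections (keys double as values), and fixed context vectors, whereas in the paper the context nodes are learnable. The names `cross_attention` and `global_context_round` are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention; keys are reused as values."""
    dim = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, kv) / math.sqrt(dim) for kv in keys_values])
        out.append([sum(w * kv[d] for w, kv in zip(weights, keys_values))
                    for d in range(dim)])
    return out

def global_context_round(keypoint_feats, context_nodes):
    # Step 1: context nodes gather scene-wide information from all keypoints.
    gathered = cross_attention(context_nodes, keypoint_feats)
    # Step 2: each keypoint reads the gathered context back (residual update).
    broadcast = cross_attention(keypoint_feats, gathered)
    updated = [[f + b for f, b in zip(feat, msg)]
               for feat, msg in zip(keypoint_feats, broadcast)]
    return updated, gathered

# Toy inputs (made-up values): three keypoint features, two context nodes.
keypoint_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context_nodes = [[0.5, 0.5], [-0.5, 0.5]]
updated, gathered = global_context_round(keypoint_feats, context_nodes)
```

Because the number of context nodes is small and fixed, each round costs time linear in the number of keypoints, which is one common motivation for this kind of bottleneck design.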

If this is right

75th-percentile rotation error drops by 89 percent and translation error by up to 90 percent versus the prior best descriptor-free baseline.
The learned representations generalize zero-shot to keypoint detectors never seen during training.
The performance gap to descriptor-based localization pipelines narrows while still avoiding high-dimensional descriptor storage.
Scene privacy is preserved and map maintenance is simplified because no appearance descriptors are required.
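For readers unfamiliar with the metric, the sketch below shows how a 75th-percentile rotation error and a relative reduction can be computed. The error lists are made-up toy values chosen for illustration, not results from the paper, and the helper names are hypothetical.

```python
import math

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices (nested lists)."""
    # trace(R_est^T R_gt) equals the sum of element-wise products.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for numerical safety
    return math.degrees(math.acos(cos_angle))

def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation; expects a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# A 90-degree rotation about the z-axis compared with the identity.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
rot_z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
angle = rotation_error_deg(rot_z_90, identity)

# Toy per-query rotation errors in degrees (illustrative, not from the paper).
errors_baseline = [2.0, 4.0, 6.0, 8.0, 10.0]
errors_new = [0.2, 0.4, 0.6, 0.88, 1.5]
p75_baseline = percentile(errors_baseline, 75)
p75_new = percentile(errors_new, 75)
reduction = relative_reduction(p75_baseline, p75_new)
```

A percentile-based statistic is less sensitive to a few catastrophic failures than the mean, which is why a large reduction at the 75th percentile says something about the tail of hard queries rather than only the typical case.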

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Structure-from-motion or other geometry-only pipelines could adopt similar multi-detector training to reduce dependence on any single keypoint detector.
Detector choice may become less critical once a shared geometry space is learned, potentially allowing systems to pick detectors based on speed or robustness alone.
The same global context nodes might help resolve ambiguities in other sparse matching tasks that currently rely only on local neighborhoods.

Load-bearing premise

Keypoints produced by different detectors share enough common geometric structure that they can be trained together in one embedding space without descriptor alignment or detector-specific adaptations.
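The premise can be illustrated with a small sketch: once keypoints are reduced to geometry (here, unit bearing vectors computed from pixel coordinates and pinhole intrinsics), samples from different detectors have the same format and can be drawn into one batch. The per-sample random choice of detector is an assumption of this example; the page does not describe the paper's actual mixing schedule. The function names and toy coordinates are hypothetical, and the detector names follow the Figure 3 caption.

```python
import math
import random

def pixel_to_bearing(u, v, fx, fy, cx, cy):
    """Back-project a pixel to a unit bearing vector under a pinhole camera model."""
    x, y, z = (u - cx) / fx, (v - cy) / fy, 1.0
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)

def sample_mixed_batch(keypoints_by_detector, intrinsics, batch_size, rng):
    """Draw samples, picking a detector at random for each one.

    Only geometry is kept as model input. The detector name is returned
    for bookkeeping and is not part of the features.
    """
    detectors = sorted(keypoints_by_detector)
    batch = []
    for _ in range(batch_size):
        name = rng.choice(detectors)
        bearings = [pixel_to_bearing(u, v, *intrinsics)
                    for u, v in keypoints_by_detector[name]]
        batch.append((name, bearings))
    return batch

# Toy keypoints per detector (made-up pixel coordinates).
keypoints_by_detector = {
    "sift": [(320.0, 240.0), (100.0, 50.0)],
    "superpoint": [(400.0, 300.0)],
    "disk": [(10.0, 20.0), (600.0, 400.0), (320.0, 240.0)],
}
intrinsics = (500.0, 500.0, 320.0, 240.0)  # fx, fy, cx, cy (assumed values)
batch = sample_mixed_batch(keypoints_by_detector, intrinsics, 4, random.Random(0))
```

No descriptor dimension appears anywhere in the batch, so nothing needs to be aligned across detectors; whether the learned representation is then truly detector-neutral is the empirical question the premise rests on.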

What would settle it

The claim would be undermined if single-detector training on the same data produced equal or lower pose error than multi-detector Mix-Training, or if the trained model's accuracy dropped substantially on a keypoint detector type never seen during training.

Figures

Figures reproduced from arXiv: 2607.02486 by Esa Rahtu, Juho Kannala, Xinjue Wang, Yejun Zhang, Zihan Wang.

**Figure 1.** Comparison of visual localization pipelines. (a) Descriptor-based methods achieve high accuracy but require large storage, raise privacy concerns, and incur costly map maintenance. (b-i) Descriptor-free methods offer compact, privacy-preserving localization yet lag in accuracy. (b-ii) GeoMix bridges this gap with directional and distance-aware geometric embeddings, global context nodes, and a Mix-Training…

**Figure 2.** Overview of GeoMix. A shared encoder lifts 2D keypoints and 3D points to bearing vectors with RGB colors. The attention module alternates self-attention and cross-attention over L layers: self-attention aggregates local geometry via annular convolution and max-pooling enriched by directional and distance embeddings, while learnable global context (GC) nodes broadcast scene-wide information through cross-…

**Figure 3.** Keypoint distributions of SIFT, SuperPoint, and DISK on the same image. Each detector produces a distinct spatial layout with few shared keypoints (rightmost), motivating Mix-Training to learn detector-agnostic geometric representations.

Subsequently, following A2-GNN [75], we utilize a learning-based outlier rejection module to refine these initial matches. This classifier estimates a confidence score f…

**Figure 4.** Sensitivity to outlier ratio. Reprojection AUC at 1, 5, and 10 px thresholds for prior descriptor-free methods and GeoMix as the outlier ratio increases from 0 to 1. Oracle denotes the upper bound obtained with ground-truth matches.

Ablation Study. Tab. 6 presents a progressive ablation on MegaDepth with k = 10 retrieved images, starting from the A2-GNN [75] baseline. We first add the learnable global cont…

**Figure 1.** Qualitative comparison on the OldHospital scene. Top: 3D point cloud. Bottom: 2D reprojection results. Cyan: 2D keypoints; Green: inlier matches; Red: outlier matches.

**Figure 2.** Qualitative comparison on the ShopFacade scene. Top: 3D point cloud. Bottom: 2D reprojection results. Cyan: 2D keypoints; Green: inlier matches; Red: outlier matches.

**Figure 3.** Qualitative comparison on the StMarysChurch scene. Top: 3D point cloud. Bottom: 2D reprojection results. Cyan: 2D keypoints; Green: inlier matches; Red: outlier matches.

Original abstract

Descriptor-free visual localization eliminates high-dimensional descriptor storage, preserves scene privacy, and simplifies map maintenance, yet its accuracy still lags far behind descriptor-based pipelines. We attribute this gap to insufficient geometric discriminability in geometry-only matching. Without visual appearance, current methods underutilize local geometry cues, lack global context among keypoints, and overfit to a single keypoint detector. We further observe that descriptor-free matching naturally enables multi-detector training, as heterogeneous keypoints can be optimized in a shared geometry-only space without aligning descriptor spaces. Building on these insights, we propose GeoMix, a descriptor-free 2D-3D matching framework that strengthens geometric discriminability at three levels. Locally, directional and distance-aware embeddings enrich neighborhood aggregation with fine-grained spatial structure. Globally, learnable context nodes aggregate and redistribute scene-wide information via cross-attention to resolve ambiguities beyond local receptive fields. At the training level, Mix-Training exploits this detector-agnostic geometry space to learn representations across multiple keypoint detectors. Extensive experiments on MegaDepth, Cambridge Landmarks, 7Scenes, and Aachen Day-Night show that GeoMix sets a new state of the art among descriptor-free methods, reducing 75th-percentile rotation error by 89% and translation error by up to 90% over the previous best, while generalizing zero-shot to unseen detectors and narrowing the gap to descriptor-based pipelines. Code is available at this links (https://github.com/YejunZhang/Geomix).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

GeoMix combines directional local embeddings, global context nodes, and multi-detector training to push descriptor-free localization accuracy, but the large reported gains rest on an abstract-only view so far.

The main takeaway is that this work strengthens geometry-only matching at three levels and claims to close much of the gap to descriptor-based methods on MegaDepth, Cambridge, 7Scenes, and Aachen. The multi-detector training angle stands out because it exploits the fact that no descriptor alignment is needed, letting the model see heterogeneous keypoints in one space.

What the paper does well is name the concrete shortcomings in prior descriptor-free work—weak local geometry cues, missing scene-wide context, and single-detector overfitting—and then give a direct architectural response. The directional/distance embeddings and the learnable context nodes with cross-attention are straightforward additions that target those gaps. Making the training detector-agnostic is a clean observation that follows from the problem setup.

The soft spots are mostly around verification. The abstract states 89-90% error reductions and zero-shot generalization to unseen detectors, yet without ablations, error bars, or the exact baseline implementations it is difficult to judge whether the gains come from the proposed components or from other factors. The stress-test point about possible detector bias in the shared embedding space is reasonable; the summary does not describe an explicit invariance term or masking that would guarantee the representations stay detector-neutral. If the full paper shows careful controls on that, the claim strengthens; otherwise it remains an open question.

This is aimed at people building visual localization pipelines for robotics or AR who already care about map size and privacy. A reader who works on keypoint matching or 2D-3D correspondence would find the architecture details useful once the experiments are inspected. It is worth sending to a serious referee because the problem is practical, the synthesis is new enough, and the empirical claims are specific enough to be checked.

Referee Report

2 major / 1 minor

Summary. The paper introduces GeoMix, a descriptor-free 2D-3D matching framework for visual localization. It strengthens geometric discriminability via local directional/distance-aware embeddings for neighborhood aggregation, global learnable context nodes that use cross-attention to incorporate scene-wide information, and Mix-Training that jointly optimizes heterogeneous keypoints from multiple detectors in a shared geometry-only embedding space. Experiments on MegaDepth, Cambridge Landmarks, 7Scenes, and Aachen Day-Night report new state-of-the-art results among descriptor-free methods, with 75th-percentile rotation error reduced by 89% and translation error by up to 90% relative to prior best, plus zero-shot generalization to unseen detectors and a narrowed gap to descriptor-based pipelines.

Significance. If the empirical claims hold after verification of ablations and generalization, the work would be significant for descriptor-free localization: it directly addresses the accuracy gap with descriptor-based methods while preserving advantages in storage, privacy, and map maintenance. The multi-detector training strategy, if shown to produce detector-agnostic representations, represents a practical advance over single-detector approaches.

major comments (2)

[Abstract] The central claim of zero-shot generalization to unseen detectors rests on the assertion that 'descriptor-free matching naturally enables multi-detector training' in a shared geometry-only space without alignment or detector-specific adaptations. However, the manuscript provides no explicit mechanism (e.g., invariance loss, detector-identity masking, or distribution matching) to enforce detector-agnostic embeddings, leaving open the possibility that reported gains reflect detector-specific biases in keypoint distributions rather than the architecture.

[Abstract] Experimental claims: The reported 89% and 90% reductions in 75th-percentile rotation and translation errors are load-bearing for the SOTA conclusion, yet the abstract does not reference ablations isolating the contribution of local embeddings, global context nodes, or Mix-Training, nor does it report error bars or statistical tests. Without these, it is unclear whether the gains are robust or attributable to data selection or baseline implementation details.

minor comments (1)

[Abstract] The abstract mentions 'Code is available at this links' but the hyperlink text is incomplete; ensure the repository link is functional and includes the exact training and evaluation scripts used for the reported numbers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our work. We provide point-by-point responses to the major comments below.

Referee: [Abstract] The central claim of zero-shot generalization to unseen detectors rests on the assertion that 'descriptor-free matching naturally enables multi-detector training' in a shared geometry-only space without alignment or detector-specific adaptations. However, the manuscript provides no explicit mechanism (e.g., invariance loss, detector-identity masking, or distribution matching) to enforce detector-agnostic embeddings, leaving open the possibility that reported gains reflect detector-specific biases in keypoint distributions rather than the architecture.

Authors: The manuscript argues that the purely geometric nature of the matching task (operating on positions, directions, and distances without appearance descriptors) inherently permits joint optimization across detectors in a shared space, as no descriptor alignment is required. This architectural property is presented as the enabling mechanism, with empirical support from the zero-shot generalization results to unseen detectors. We acknowledge that additional explicit regularizers could further strengthen the claim and will expand the discussion of this design choice in the methods section of the revision.

Revision: partial

Referee: [Abstract] Experimental claims: The reported 89% and 90% reductions in 75th-percentile rotation and translation errors are load-bearing for the SOTA conclusion, yet the abstract does not reference ablations isolating the contribution of local embeddings, global context nodes, or Mix-Training, nor does it report error bars or statistical tests. Without these, it is unclear whether the gains are robust or attributable to data selection or baseline implementation details.

Authors: Abstracts are concise by nature and highlight the main results; the full manuscript provides ablations in the experiments section that isolate the contributions of local directional embeddings, global context nodes, and Mix-Training. The error reductions are obtained under standard benchmark protocols on multiple datasets. We agree that reporting error bars would improve transparency and will add them to the revised manuscript or supplementary material.

Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical engineering contribution with external validation

The paper presents GeoMix as an empirical architecture (local directional embeddings, global context nodes via cross-attention, and multi-detector Mix-Training) whose performance claims rest on benchmark results (MegaDepth, Cambridge Landmarks, 7Scenes, Aachen) rather than any closed-form derivation or first-principles reduction. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text; the zero-shot generalization claim is asserted as a consequence of the detector-agnostic geometry space but is not derived by construction from the inputs. This is the normal case of a methods paper whose central results are externally falsifiable.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central performance claims rest on standard supervised deep learning assumptions (neural networks can learn useful geometric features from labeled correspondences) and the unverified premise that multi-detector data can be mixed without introducing bias; no new physical entities or ad-hoc constants are introduced.

axioms (1)

Domain assumption: Neural networks trained on geometry-only inputs can learn representations that generalize across different keypoint detectors without descriptor alignment.
Invoked in the description of Mix-Training and zero-shot generalization claims.

Reference graph

Works this paper leans on

107 extracted references · 7 canonical work pages · 3 internal anchors

[1] Fully Geometric Panoramic Localization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[2] LoD-Loc: Aerial Visual Localization using LoD 3D Map with Neural Wireframe Alignment. Advances in Neural Information Processing Systems.
[3] SuperGlue: Learning feature matching with graph neural networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[4] LightGlue: Local feature matching at light speed. Proceedings of the IEEE/CVF International Conference on Computer Vision.
[5] LoFTR: Detector-free local feature matching with transformers. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[6] Wang, Shuzhe and Kannala, Juho and Barath, Daniel.
[7] Predator: Registration of 3D point clouds with low overlap. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[8] Is geometry enough for matching in visual localization? European Conference on Computer Vision, 2022.
[9] Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[10] Solving the blind perspective-n-point problem end-to-end with robust differentiable geometric optimization. Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part II, 2020.
[11] Progressive correspondence pruning by consensus learning. Proceedings of the IEEE/CVF International Conference on Computer Vision.
[12] Geometric transformer for fast and robust point cloud registration. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[13] Are large-scale 3D models really necessary for accurate visual localization? Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[14] From coarse to fine: Robust hierarchical localization at large scale. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[15] RANSAC for Dummies. Vision Research Lab, University of California, Santa Barbara.
[16] How privacy-preserving are line clouds? Recovering scene details from 3D lines. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[17] Privacy Preserving Localization via Coordinate Permutations. Proceedings of the IEEE/CVF International Conference on Computer Vision.
[18] Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022.
[19] Rectifier nonlinearities improve neural network acoustic models. Proc. ICML, 2013.
[20] Attention is all you need. Advances in Neural Information Processing Systems.
[21] Learning to find good correspondences. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[27] MegaDepth: Learning single-view depth prediction from internet photos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[28] PoseNet: A convolutional network for real-time 6-DoF camera relocalization. Proceedings of the IEEE International Conference on Computer Vision.
[29] Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[30] Scene coordinate regression forests for camera relocalization in RGB-D images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[31] Schönberger and Frahm. Structure-from-Motion Revisited.
[32] Arandjelovic, Relja and Gronat, Petr and Torii, Akihiko and Pajdla, Tomas and Sivic, Josef. NetVLAD: …
[33] Radenović. Fine-tuning … IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[34] Sinkhorn distances: Lightspeed computation of optimal transport. Advances in Neural Information Processing Systems.
[35] Concerning nonnegative matrices and doubly stochastic matrices. Pacific Journal of Mathematics, 1967.
[36] Learning multi-scene absolute pose regression with transformers. Proceedings of the IEEE/CVF International Conference on Computer Vision.
[37] Visual camera re-localization from RGB and RGB-D images using DSAC. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
[38] Efficient & effective prioritized matching for large-scale image-based localization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
[39] SuperPoint: Self-supervised interest point detection and description. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[40] Hybrid scene compression for visual localization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[41] EPnP: An accurate O(n) solution to the PnP problem. International Journal of Computer Vision, 2009.
[42] Locally optimized RANSAC. Pattern Recognition: 25th DAGM Symposium, Magdeburg, Germany, September 10-12, 2003, Proceedings, 2003.
[43] Self-supervising fine-grained region similarities for large-scale image localization. Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part IV, 2020.
[44] End-to-end learning of deep visual representations for image retrieval. International Journal of Computer Vision, 2017.
[45] Learning with average precision: Training image retrieval with a listwise loss. Proceedings of the IEEE/CVF International Conference on Computer Vision.
[46] Three things everyone should know to improve object retrieval. 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
[47] Fast image-based localization using direct 2D-to-3D matching. 2011 International Conference on Computer Vision, 2011.
[48] Improving image-based localization by active correspondence search. Computer Vision - ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part I, 2012.
[49] ClusterGNN: Cluster-based coarse-to-fine graph neural network for efficient feature matching. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[50]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

OmniGlue: Generalizable Feature Matching with Foundation Model Guidance , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
[51]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

CoMatcher: Multi-View Collaborative Feature Matching , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
[52]

Proceedings of the IEEE/CVF international conference on computer vision , pages=

Learning two-view correspondences and geometry using order-aware network , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=
[53]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages=

Minimal scene descriptions from structure from motion models , author=. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages=
[54]

2015 IEEE international conference on robotics and automation (ICRA) , pages=

The gist of maps-summarizing experience for lifelong localization , author=. 2015 IEEE international conference on robotics and automation (ICRA) , pages=. 2015 , organization=

2015
[55]

Computer Vision--ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part II 11 , pages=

Location recognition using prioritized feature matching , author=. Computer Vision--ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part II 11 , pages=. 2010 , organization=

2010
[56]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Scenesqueezer: Learning to compress scene for camera relocalization , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=
[57]

European Conference on Computer Vision (ECCV) , pages=

MAD-DR: Map Compression for Visual Localization with Matchness Aware Descriptor Dimension Reduction , author=. European Conference on Computer Vision (ECCV) , pages=. 2025 , publisher=

2025
[58]

2023 IEEE International Conference on Robotics and Automation (ICRA) , pages=

Learning-based dimensionality reduction for computing compact and effective local feature descriptors , author=. 2023 IEEE International Conference on Robotics and Automation (ICRA) , pages=. 2023 , organization=

2023
[59]

Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004

PCA-SIFT: A more distinctive representation for local image descriptors , author=. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004. , volume=. 2004 , organization=

2004
[60]

Proceedings of the IEEE International Conference on Computer Vision , pages=

Hyperpoints and fine vocabularies for large-scale location recognition , author=. Proceedings of the IEEE International Conference on Computer Vision , pages=
[61]

Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.
[62]

Differentiable Product Quantization for Memory Efficient Camera Relocalization. arXiv preprint arXiv:2407.15540.
[63]

Structure-from-motion revisited. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[64]

Photo tourism: exploring photo collections in 3D. ACM SIGGRAPH 2006 Papers, 2006.
[65]

Towards linear-time incremental structure from motion. 2013 International Conference on 3D Vision (3DV 2013), 2013.
[66]

Augmented reality: an overview. Handbook of Augmented Reality, 2011.
[67]

A survey of augmented reality. Foundations and Trends, 2015.
[68]

Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Transactions on Robotics, 2016.
[69]

Visual simultaneous localization and mapping: a survey. Artificial Intelligence Review, 2015.
[70]

ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics, 2017.
[71]

Continual learning for image-based camera localization. Proceedings of the IEEE/CVF International Conference on Computer Vision.
[72]

HSCNet++: Hierarchical scene coordinate classification and regression for visual localization with transformer. International Journal of Computer Vision, 2024.
[73]

Hierarchical scene coordinate classification and regression for visual localization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[74]

GLACE: Global Local Accelerated Coordinate Encoding. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[75]

Cross-descriptor visual localization and mapping. Proceedings of the IEEE/CVF International Conference on Computer Vision.
[76]

Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning, 2015.
[77]

R2D2: Reliable and repeatable detector and descriptor. Advances in Neural Information Processing Systems.
[78]

D2-Net: A trainable CNN for joint description and detection of local features. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[79]

DISK: Learning local features with policy gradient. Advances in Neural Information Processing Systems.
[80]

24/7 place recognition by view synthesis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

