Recognition: 2 theorem links
AsymLoc: Towards Asymmetric Feature Matching for Efficient Visual Localization
Pith reviewed 2026-05-10 18:04 UTC · model grok-4.3
The pith
A small student model reaches up to 95% of a large teacher's accuracy in visual localization by aligning features for simple nearest-neighbor matching.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AsymLoc is a distillation framework that aligns a Student to its Teacher through a combination of a geometry-driven matching objective and a joint detector-descriptor distillation objective, enabling fast, parameter-less nearest-neighbor matching. Experiments on HPatches, ScanNet, IMC2022, and Aachen show that it achieves up to 95% of the teacher's localization accuracy using an order of magnitude smaller models, significantly outperforming existing baselines.
What carries the argument
Asymmetric distillation framework that uses a geometry-driven matching objective together with joint detector-descriptor distillation to align teacher and student features for nearest-neighbor matching.
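The parameter-less matching step this premise relies on can be sketched in a few lines. This is an illustrative mutual nearest-neighbor matcher, not the authors' code; the array shapes and the `mutual_nn_match` name are assumptions.

```python
import numpy as np

def mutual_nn_match(desc_student, desc_teacher):
    """Parameter-less mutual nearest-neighbor matching under cosine similarity.

    desc_student: (N, D) L2-normalized student (query) descriptors.
    desc_teacher: (M, D) L2-normalized teacher (database) descriptors.
    Returns a (K, 2) array of (student, teacher) index pairs that are
    each other's nearest neighbor.
    """
    sim = desc_student @ desc_teacher.T   # (N, M) cosine similarities
    nn_s2t = sim.argmax(axis=1)           # best teacher index per student keypoint
    nn_t2s = sim.argmax(axis=0)           # best student index per teacher keypoint
    idx = np.arange(desc_student.shape[0])
    mutual = nn_t2s[nn_s2t] == idx        # keep only mutually agreeing pairs
    return np.stack([idx[mutual], nn_s2t[mutual]], axis=1)
```

No learned weights appear anywhere in the matcher; everything the student contributes must already live in its descriptor space, which is exactly what the distillation objectives are asked to guarantee.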
Where Pith is reading between the lines
- The same alignment objectives could be applied to other matching-heavy tasks such as image retrieval or stereo reconstruction to reduce reliance on learned matchers.
- Testing the framework when the student architecture differs more radically from the teacher would reveal how robust the geometry-driven objectives are to architectural gaps.
- Combining this distillation with model quantization or pruning could produce even smaller students while preserving the reported accuracy levels.
Load-bearing premise
The geometry-driven matching objective combined with joint detector-descriptor distillation can align features from the teacher and student models well enough to support accurate parameter-less nearest-neighbor matching.
What would settle it
Evaluating the distilled student on a new dataset with large domain shift where localization accuracy falls below 70% of the teacher's would show that the alignment is not sufficient.
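The falsification criterion above reduces to a ratio test; a minimal sketch, where both the helper name and the 70% floor as a parameter are illustrative rather than from the paper:

```python
def relative_accuracy(student_acc, teacher_acc, floor=0.70):
    """Student/teacher localization-accuracy ratio, plus whether it clears
    a chosen floor (0.70 here, mirroring the threshold above)."""
    ratio = student_acc / teacher_acc
    return ratio, ratio >= floor
```

For instance, a student at 0.76 against a teacher at 0.80 yields a 0.95 ratio and passes the floor, while 0.50 against 0.80 (a 0.625 ratio) would count as evidence that the alignment does not transfer.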
Original abstract
Precise and real-time visual localization is critical for applications like AR/VR and robotics, especially on resource-constrained edge devices such as smart glasses, where battery life and heat dissipation can be primary concerns. While many efficient models exist, further reducing compute without sacrificing accuracy is essential for practical deployment. To address this, we propose asymmetric visual localization: a large Teacher model processes pre-mapped database images offline, while a lightweight Student model processes the query image online. This creates a challenge in matching features from two different models without resorting to heavy, learned matchers. We introduce AsymLoc, a novel distillation framework that aligns a Student to its Teacher through a combination of a geometry-driven matching objective and a joint detector-descriptor distillation objective, enabling fast, parameter-less nearest-neighbor matching. Extensive experiments on HPatches, ScanNet, IMC2022, and Aachen show that AsymLoc achieves up to 95% of the teacher's localization accuracy using an order of magnitude smaller models, significantly outperforming existing baselines and establishing a new state-of-the-art efficiency-accuracy trade-off.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AsymLoc, a distillation framework for asymmetric visual localization in which a large teacher model processes database images offline and a lightweight student model processes query images online. Alignment is achieved via a geometry-driven matching objective combined with joint detector-descriptor distillation, enabling parameter-less nearest-neighbor matching between the two models. Experiments across HPatches, ScanNet, IMC2022, and Aachen report that the student reaches up to 95% of teacher localization accuracy at roughly 10x smaller model size while outperforming prior baselines and establishing a new efficiency-accuracy trade-off.
Significance. If the empirical results hold, the work would be significant for practical deployment of visual localization on edge devices such as AR/VR headsets and mobile robots, where compute, power, and heat constraints are severe. The asymmetric teacher-student design with direct NN matching avoids heavy learned matchers and is supported by evaluation on four diverse datasets (HPatches, ScanNet, IMC2022, Aachen), which provides reasonably broad empirical grounding for the efficiency claims.
major comments (2)
- [§3.2] (Distillation Objectives): the geometry-driven matching loss is asserted to produce descriptor spaces that support direct NN matching without learned components, yet the manuscript does not provide an explicit analysis or ablation showing that the student-teacher feature distributions are sufficiently aligned (e.g., no cosine-similarity histograms or nearest-neighbor recall curves between teacher and student descriptors). This is load-bearing for the central claim that parameter-less matching suffices.
- [§4.3] Table 3 (Aachen day-night results): the reported 95% relative accuracy figure is given without per-sequence breakdowns, standard deviations across runs, or a comparison against the teacher under identical RANSAC settings; without these, it is impossible to judge whether the efficiency gain is robust or dataset-specific.
minor comments (3)
- [§3.2] Notation for the joint detector-descriptor loss is introduced without a compact equation reference; adding a single boxed equation summarizing L_det + L_desc would improve readability.
- [§4.1] The abstract and introduction repeatedly use “order of magnitude smaller” without stating the exact parameter counts or FLOPs of the teacher and student backbones; a small table in §4.1 would clarify this.
- Qualitative figures showing successful and failure-case matches between teacher and student descriptors would help readers understand the limits of the asymmetric alignment.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive evaluation of the work's significance for edge-device visual localization. We address each major comment below with clarifications and commitments to strengthen the manuscript.
Point-by-point responses
-
Referee: [§3.2] (Distillation Objectives): the geometry-driven matching loss is asserted to produce descriptor spaces that support direct NN matching without learned components, yet the manuscript does not provide an explicit analysis or ablation showing that the student-teacher feature distributions are sufficiently aligned (e.g., no cosine-similarity histograms or nearest-neighbor recall curves between teacher and student descriptors). This is load-bearing for the central claim that parameter-less matching suffices.
Authors: We agree that direct evidence of descriptor alignment would strengthen the central claim. While the high localization accuracy achieved via parameter-less NN matching across four diverse benchmarks (HPatches, ScanNet, IMC2022, Aachen) provides strong indirect validation, we will add explicit analysis in the revision. Specifically, we will include cosine-similarity histograms for teacher-teacher, student-student, and cross teacher-student descriptor pairs on a held-out set, along with nearest-neighbor recall curves at varying distance thresholds. These will be placed in §3.2 or a new supplementary section to demonstrate the alignment induced by the geometry-driven matching objective. revision: yes
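The promised diagnostics could be computed roughly as follows. This is a hypothetical sketch: the `alignment_diagnostics` function and the assumption that row i of both arrays describes the same keypoint are illustrative, not the authors' protocol.

```python
import numpy as np

def alignment_diagnostics(desc_teacher, desc_student):
    """Cross-model alignment check for index-aligned correspondences.

    Returns the mean cosine similarity of true teacher-student pairs
    (a one-number summary of the similarity histogram) and nearest-neighbor
    recall@1: the fraction of student descriptors whose nearest teacher
    descriptor is the true correspondence."""
    t = desc_teacher / np.linalg.norm(desc_teacher, axis=1, keepdims=True)
    s = desc_student / np.linalg.norm(desc_student, axis=1, keepdims=True)
    sim = s @ t.T                                # (N, N) cross-model similarities
    true_pair_sim = float(np.diag(sim).mean())
    recall_at_1 = float((sim.argmax(axis=1) == np.arange(len(s))).mean())
    return true_pair_sim, recall_at_1
```

High true-pair similarity with low recall@1 would indicate that pairs are close but not discriminative, which is exactly the failure mode the referee's comment is probing.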
-
Referee: [§4.3] Table 3 (Aachen day-night results): the reported 95% relative accuracy figure is given without per-sequence breakdowns, standard deviations across runs, or a comparison against the teacher under identical RANSAC settings; without these, it is impossible to judge whether the efficiency gain is robust or dataset-specific.
Authors: We thank the referee for this suggestion to improve transparency. The 95% figure is an aggregate over the Aachen day and night sequences. In the revised Table 3 we will report separate day and night results with per-sequence breakdowns. We will also rerun the evaluation with multiple RANSAC random seeds (e.g., 5 runs) and report mean and standard deviation for both teacher and student to quantify variance. Finally, we will explicitly confirm and document that all teacher and student results use identical RANSAC hyperparameters (inlier threshold, maximum iterations, etc.) for direct comparability. revision: yes
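The multi-seed protocol the authors commit to amounts to something like the following sketch, where `localize_fn` is a hypothetical stand-in for a full RANSAC-based localization pipeline rather than any function from the paper:

```python
import statistics

def evaluate_with_seeds(localize_fn, queries, seeds=range(5)):
    """Run a localization routine once per RANSAC random seed and report
    mean and population standard deviation of the success rate.

    localize_fn(query, seed) is assumed to return True when the estimated
    pose falls within the benchmark's error threshold. For comparability,
    teacher and student must use identical seeds and RANSAC settings."""
    rates = []
    for seed in seeds:
        successes = sum(localize_fn(q, seed) for q in queries)
        rates.append(successes / len(queries))
    return statistics.mean(rates), statistics.pstdev(rates)
```

A near-zero standard deviation across seeds would support the claim that the 95% figure is robust rather than an artifact of a favorable RANSAC draw.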
Circularity Check
No significant circularity identified
full rationale
The paper introduces AsymLoc as a novel distillation framework that combines a geometry-driven matching objective with joint detector-descriptor distillation to enable parameter-less nearest-neighbor matching between a large teacher model (database) and lightweight student model (query). The central claims of achieving up to 95% of teacher localization accuracy with an order of magnitude smaller models are supported directly by empirical results on HPatches, ScanNet, IMC2022, and Aachen rather than by any reduction to self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations. No equations or prior-work ansatzes are shown to collapse the derivation chain into its inputs by construction; the method is presented as a self-contained proposal with independent experimental validation of the efficiency-accuracy trade-off.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear?
unclear: Relation between the paper passage and the cited Recognition theorem.
We introduce AsymLoc, a novel distillation framework that aligns a Student to its Teacher through a combination of a geometry-driven matching objective and a joint detector-descriptor distillation objective, enabling fast, parameter-less nearest-neighbor matching.
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear?
unclear: Relation between the paper passage and the cited Recognition theorem.
Extensive experiments on HPatches, ScanNet, IMC2022, and Aachen show that AsymLoc achieves up to 95% of the teacher's localization accuracy using an order of magnitude smaller models
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Variational information distillation for knowledge transfer
Sungsoo Ahn, Shell Xu Hu, Andreas Damianou, Neil Lawrence, and Zhenwen Dai. Variational information distillation for knowledge transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9163–9171, 2019. 3
2019
-
[2]
NetVLAD: CNN architecture for weakly supervised place recognition
R. Arandjelović, P. Gronat, A. Torii, T. Pajdla, and J. Sivic. NetVLAD: CNN architecture for weakly supervised place recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 2
2016
-
[3]
Hpatches: A benchmark and evaluation of handcrafted and learned local descriptors
Vassileios Balntas, Karel Lenc, Andrea Vedaldi, and Krystian Mikolajczyk. Hpatches: A benchmark and evaluation of handcrafted and learned local descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5173–5182, 2017. 5
2017
-
[4]
Megaloc: One retrieval to place them all
Gabriele Berton and Carlo Masone. Megaloc: One retrieval to place them all. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 2861–2867, 2025. 2
2025
-
[5]
Rethinking visual geo-localization for large-scale applications
Gabriele Berton, Carlo Masone, and Barbara Caputo. Rethinking visual geo-localization for large-scale applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. 2
2022
-
[6]
Earthmatch: Iterative coregistration for fine-grained localization of astronaut photography
Gabriele Berton, Gabriele Goletto, Gabriele Trivigno, Alex Stoken, Barbara Caputo, and Carlo Masone. Earthmatch: Iterative coregistration for fine-grained localization of astronaut photography. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024. 2
2024
-
[7]
Crocodl: Cross-device collaborative dataset for localization
Hermann Blum, Alessandro Mercurio, Joshua O’Reilly, Tim Engelbracht, Mihai Dusmanu, Marc Pollefeys, and Zuria Bauer. Crocodl: Cross-device collaborative dataset for localization. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), pages 27424–27434, 2025. 1
2025
-
[8]
A case for using rotation invariant features in state of the art feature matchers
Georg Bökman and Fredrik Kahl. A case for using rotation invariant features in state of the art feature matchers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5110–5119, 2022. 3
2022
-
[9]
Asymmetric metric learning for knowledge transfer
Mateusz Budnik and Yannis Avrithis. Asymmetric metric learning for knowledge transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8228–8238, 2021. 2
2021
-
[10]
Asymmetric metric learning for knowledge transfer
Mateusz Budnik and Yannis Avrithis. Asymmetric metric learning for knowledge transfer. In CVPR, 2021. 3, 6, 7
2021
-
[11]
Rdd: Robust feature detector and descriptor using deformable transformer
Gonglin Chen, Tianwen Fu, Haiwei Chen, Wenbin Teng, Hanyuan Xiao, and Yajie Zhao. Rdd: Robust feature detector and descriptor using deformable transformer. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), pages 6394–6403, 2025. 3
2025
-
[12]
Learning to Match Features with Seeded Graph Matching Network
Hongkai Chen, Zixin Luo, Jiahui Zhang, Lei Zhou, Xuyang Bai, Zeyu Hu, Chiew-Lan Tai, and Long Quan. Learning to Match Features with Seeded Graph Matching Network. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 1550–1559, 2021. 2
2021
-
[13]
Aspanformer: Detector-free image matching with adaptive span transformer
Hongkai Chen, Zixin Luo, Lei Zhou, Yurun Tian, Mingmin Zhen, Tian Fang, David Mckinnon, Yanghai Tsin, and Long Quan. Aspanformer: Detector-free image matching with adaptive span transformer. In European conference on computer vision, pages 20–36. Springer, 2022. 3
2022
-
[14]
Scannet: Richly-annotated 3d reconstructions of indoor scenes
Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5828–5839, 2017. 5
2017
-
[15]
Superpoint: Self-supervised interest point detection and description
Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superpoint: Self-supervised interest point detection and description. In CVPR Workshops, 2018. 2, 6
2018
-
[16]
Compatibility-aware heterogeneous visual search
Rahul Duggal, Hao Zhou, Shuo Yang, Yuanjun Xiong, Wei Xia, Zhuowen Tu, and Stefano Soatto. Compatibility-aware heterogeneous visual search. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10723–10732, 2021. 2
2021
-
[17]
Compatibility-aware heterogeneous visual search
Shivansh Duggal, Xiaojun Wu, and Saurabh Mittal. Compatibility-aware heterogeneous visual search. In CVPR,
-
[18]
D2-net: A trainable cnn for joint description and detection of local features
Mihai Dusmanu, Ignacio Rocco, Tomas Pajdla, Marc Pollefeys, Josef Sivic, Akihiko Torii, and Torsten Sattler. D2-net: A trainable cnn for joint description and detection of local features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8092–8101, 2019. 2
2019
-
[19]
Roma: Robust dense feature matching
Johan Edstedt, Qiyu Sun, Georg Bökman, Mårten Wadenbäck, and Michael Felsberg. Roma: Robust dense feature matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19790–19800, 2024. 3
2024
-
[20]
Silk: Simple learned keypoints
Pierre Gleize, Weiyao Wang, and Matt Feiszli. Silk: Simple learned keypoints. In Proceedings of the IEEE/CVF international conference on computer vision, pages 22499–22508,
-
[21]
Distilling the Knowledge in a Neural Network
Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015. 3
2015
-
[22]
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017. 1
2017
-
[23]
Towards visual feature translation
Jie Hu, Rongrong Ji, Hong Liu, Shengchuan Zhang, Cheng Deng, and Qi Tian. Towards visual feature translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3004–3013, 2019. 2
2019
-
[24]
Towards visual feature translation
Jie Hu, Rongrong Ji, Hong Liu, Shengchuan Zhang, Cheng Deng, and Qi Tian. Towards visual feature translation. In CVPR, 2019. 3
2019
-
[25]
Optimal transport aggregation for visual place recognition
Sergio Izquierdo and Javier Civera. Optimal transport aggregation for visual place recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. 2
2024
-
[26]
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, and André Araujo. OmniGlue: Generalizable Feature Matching with Foundation Model Guidance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20719–20729, 2024. 2
2024
-
[27]
Edm: Equirectangular projection-oriented dense kernelized feature matching
Dongki Jung, Jaehoon Choi, Yonghan Lee, Somi Jeong, Taejae Lee, Dinesh Manocha, and Suyong Yeon. Edm: Equirectangular projection-oriented dense kernelized feature matching. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 6337–6347, 2025. 3
2025
-
[28]
Adam: A Method for Stochastic Optimization
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014. 6
2014
-
[29]
Efficient loftr: Efficient local feature matching with transformers
Wei Li et al. Efficient loftr: Efficient local feature matching with transformers. In ECCV, 2022. 3
2022
-
[30]
Megadepth: Learning single-view depth prediction from internet photos
Zhengqi Li and Noah Snavely. Megadepth: Learning single-view depth prediction from internet photos. In Computer Vision and Pattern Recognition (CVPR), 2018. 5
2018
-
[31]
Microsoft coco: Common objects in context
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European Conference on Computer Vision (ECCV), pages 740–
-
[32]
Lightglue: Local feature matching at light speed
Philipp Lindenberger, Paul-Edouard Sarlin, Marc Pollefeys, and Mihai Dusmanu. Lightglue: Local feature matching at light speed. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 18448–18458, 2023. 2
2023
-
[33]
Efficient global 2d-3d matching for camera localization in a large-scale 3d map
Liu Liu, Hongdong Li, and Yuchao Dai. Efficient global 2d-3d matching for camera localization in a large-scale 3d map. In CVPR, 2017. 2
2017
-
[34]
ContextDesc: Local Descriptor Augmentation with Cross-Modality Context
Zixin Luo, Tianwei Shen, Lei Zhou, Jiahui Zhang, Yao Yao, Shiwei Li, Tian Fang, and Long Quan. ContextDesc: Local Descriptor Augmentation with Cross-Modality Context. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 2
2019
-
[35]
Working hard to know your neighbor’s margins: Local descriptor learning loss
Anastasiia Mishchuk, Dmytro Mishkin, Filip Radenovic, and Jiri Matas. Working hard to know your neighbor’s margins: Local descriptor learning loss. In Advances in Neural Information Processing Systems (NeurIPS), 2017. 2
2017
-
[36]
Relational knowledge distillation
Wonpyo Park, Dongju Kim, Yan Lu, and Minsu Cho. Relational knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3967–3976, 2019. 3, 6, 7
2019
-
[37]
Image matching challenge 2022: Summary and results
(Kaggle / CVPR Workshop Participants). Image matching challenge 2022: Summary and results. In CVPR Workshop on Image Matching: Local Features & Beyond, 2022. 5
2022
-
[38]
Xfeat: Accelerated features for lightweight image matching
Guilherme Potje, Felipe Cadar, André Araujo, Renato Martins, and Erickson R Nascimento. Xfeat: Accelerated features for lightweight image matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2682–2691, 2024. 5, 6
2024
-
[39]
Minima: Modality invariant image matching
Jiangwei Ren, Xingyu Jiang, Zizhuo Li, Dingkang Liang, Xin Zhou, and Xiang Bai. Minima: Modality invariant image matching. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 23059–23068, 2025. 3
2025
-
[40]
R2d2: Reliable and repeatable detector and descriptor
Jerome Revaud, Claudio de Souza, Martin Humenberger, and Philippe Weinzaepfel. R2d2: Reliable and repeatable detector and descriptor. In NeurIPS, 2019. 2
2019
-
[41]
Fitnets: Hints for thin deep nets
Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. Fitnets: Hints for thin deep nets. In International Conference on Learning Representations (ICLR), 2015. 3
2015
-
[42]
Mobilenetv2: Inverted residuals and linear bottlenecks
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4510–4520, 2018. 1
2018
-
[43]
Leveraging deep visual descriptors for hierarchical efficient localization
Paul-Edouard Sarlin, Frédéric Debraine, Marcin Dymczyk, Roland Y. Siegwart, and César Cadena. Leveraging deep visual descriptors for hierarchical efficient localization. In Conference on Robot Learning, 2018. 1
2018
-
[44]
From coarse to fine: Robust hierarchical localization at large scale
Paul-Edouard Sarlin, Cesar Cadena, Roland Siegwart, and Marcin Dymczyk. From coarse to fine: Robust hierarchical localization at large scale. In CVPR, 2019. 2, 5
2019
-
[45]
From coarse to fine: Robust hierarchical localization at large scale
Paul-Edouard Sarlin, Cesar Cadena, Roland Siegwart, and Marcin Dymczyk. From coarse to fine: Robust hierarchical localization at large scale. In CVPR, pages 12716–12725,
-
[46]
Superglue: Learning feature matching with graph neural networks
Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superglue: Learning feature matching with graph neural networks. In CVPR, 2020. 2, 5
2020
-
[47]
LaMAR: Benchmarking Localization and Mapping for Augmented Reality
Paul-Edouard Sarlin, Mihai Dusmanu, Johannes L. Schönberger, Pablo Speciale, Lukas Gruber, Viktor Larsson, Ondrej Miksik, and Marc Pollefeys. LaMAR: Benchmarking Localization and Mapping for Augmented Reality. In ECCV, 2022. 1
2022
-
[48]
Efficient & effective prioritized matching for large-scale image-based localization
Torsten Sattler, Bastian Leibe, and Leif Kobbelt. Efficient & effective prioritized matching for large-scale image-based localization. IEEE PAMI, 39(9):1744–1756, 2017. 2
2017
-
[49]
Are large-scale 3d models really necessary for accurate visual localization?
Torsten Sattler, Akihiko Torii, Josef Sivic, Marc Pollefeys, Hajime Taira, Masatoshi Okutomi, and Tomas Pajdla. Are large-scale 3d models really necessary for accurate visual localization? In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6175–6184, 2017. 2
2017
-
[50]
Benchmarking 6dof outdoor visual localization in changing conditions
Torsten Sattler, Will Maddern, Carl Toft, Akihiko Torii, Lars Hammarstrand, Erik Stenborg, Daniel Safari, Masatoshi Okutomi, Marc Pollefeys, Josef Sivic, et al. Benchmarking 6dof outdoor visual localization in changing conditions. In CVPR, 2018. 1
2018
-
[51]
Benchmarking 6dof outdoor visual localization in changing conditions
Torsten Sattler, Will Maddern, Carl Toft, Akihiko Torii, Josef Sivic, Fredrik Kahl, Masatoshi Okutomi, Marc Pollefeys, Tomas Pajdla, Lars Hammarstrand, Erik Stenborg, David Safari, Tommaso Cavallari, Luigi Di Stefano, Andrea Torsello, Dmytro Mishkin, Jiri Matas, Marc Pollefeys, and Linus Svärm. Benchmarking 6dof outdoor visual localization in changing ...
2018
-
[52]
Towards backward-compatible representation learning
Yujun Shen et al. Towards backward-compatible representation learning. In CVPR, 2020. 3
2020
-
[53]
Towards backward-compatible representation learning
Yantao Shen, Yuanjun Xiong, Wei Xia, and Stefano Soatto. Towards backward-compatible representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6368–6377, 2020. 2
2020
-
[54]
Ames: Asymmetric and memory-efficient similarity estimation for instance-level retrieval
Pavel Suma, Giorgos Kordopatis-Zilos, Ahmet Iscen, and Giorgos Tolias. Ames: Asymmetric and memory-efficient similarity estimation for instance-level retrieval. In European Conference on Computer Vision, pages 307–325. Springer,
-
[55]
Loftr: Detector-free local feature matching with transformers
Jiaming Sun, Zehong Shen, Yuang Wang, Hang Bao, Xiaowei Zhou, and Ping Luo. Loftr: Detector-free local feature matching with transformers. In CVPR, 2021. 3, 5
2021
-
[56]
City-scale localization for cameras with known vertical direction
Linus Svärm, Olof Enqvist, Fredrik Kahl, and Magnus Oskarsson. City-scale localization for cameras with known vertical direction. IEEE PAMI, 39(7):1455–1461, 2017. 2
2017
-
[57]
Inloc: Indoor visual localization with dense matching and view synthesis
Hajime Taira, Masatoshi Okutomi, Torsten Sattler, Mircea Cimpoi, Marc Pollefeys, Josef Sivic, Tomas Pajdla, and Akihiko Torii. Inloc: Indoor visual localization with dense matching and view synthesis. In CVPR, 2018. 2
2018
-
[58]
Efficientnet: Rethinking model scaling for convolutional neural networks
Mingxing Tan and Quoc V. Le. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 6105–6114, 2019. 1
2019
-
[59]
L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space
Yurun Tian, Bin Fan, and Fuchao Wu. L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 2
2017
-
[60]
SOSNet: Second Order Similarity Regularization for Local Descriptor Learning
Yurun Tian, Xin Yu, Bin Fan, Fuchao Wu, Huub Heijnen, and Vassileios Balntas. SOSNet: Second Order Similarity Regularization for Local Descriptor Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 2
2019
-
[61]
Contrastive representation distillation
Yonglong Tian, Dilip Krishnan, and Phillip Isola. Contrastive representation distillation. In International Conference on Learning Representations (ICLR), 2020. 3
2020
-
[62]
Semantic match consistency for long-term visual localization
Carl Toft, Erik Stenborg, Lars Hammarstrand, Lucas Brynte, Marc Pollefeys, Torsten Sattler, and Fredrik Kahl. Semantic match consistency for long-term visual localization. In ECCV, 2018. 2
2018
-
[63]
DISK: Learning local features with policy gradient
Michal Tyszkiewicz, Pascal Fua, and Eduard Trulls. DISK: Learning local features with policy gradient. In Advances in Neural Information Processing Systems (NeurIPS), 2020. 2
2020
-
[64]
Matchformer: Interleaving attention in transformers for feature matching
Qing Wang, Jiaming Zhang, Kailun Yang, Kunyu Peng, and Rainer Stiefelhagen. Matchformer: Interleaving attention in transformers for feature matching. In Proceedings of the Asian conference on computer vision, pages 2746–2762,
-
[65]
Contextual similarity distillation for asymmetric image retrieval
Hui Wu, Min Wang, Wengang Zhou, Houqiang Li, and Qi Tian. Contextual similarity distillation for asymmetric image retrieval. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9489–9498,
-
[66]
Contextual similarity distillation for asymmetric image retrieval
Xiaohang Wu et al. Contextual similarity distillation for asymmetric image retrieval. In CVPR, 2022. 3, 6
2022
-
[67]
D3still: Decoupled differential distillation for asymmetric image retrieval
Luchen Xie et al. D3still: Decoupled differential distillation for asymmetric image retrieval. In CVPR, 2024. 3, 6
2024
-
[68]
D3still: Decoupled differential distillation for asymmetric image retrieval
Yi Xie, Yihong Lin, Wenjie Cai, Xuemiao Xu, Huaidong Zhang, Yong Du, and Shengfeng He. D3still: Decoupled differential distillation for asymmetric image retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17181–17190, 2024. 2, 7
2024
-
[69]
LIFT: Learned Invariant Feature Transform
Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, and Pascal Fua. LIFT: Learned Invariant Feature Transform. In European Conference on Computer Vision (ECCV), 2016. 2
2016
-
[70]
A gift from knowledge distillation: Fast optimization, network minimization and transfer learning
Junho Yim, Donggyu Joo, Jihoon Bae, and Junmo Kim. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4133–4141, 2017. 3
2017
-
[71]
Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer
Sergey Zagoruyko and Nikos Komodakis. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. In International Conference on Learning Representations (ICLR), 2017. 3
2017
-
[72]
Scaling vision transformers
Xiaohua Zhai, Alexander Kolesnikov, Neil Houlsby, and Lucas Beyer. Scaling vision transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12104–12113, 2022. 1
2022
-
[73]
Alike: Accurate and lightweight keypoint detection and descriptor extraction
Xiaoming Zhao, Xingming Wu, Jinyu Miao, Weihai Chen, Peter C. Y. Chen, and Zhengguo Li. Alike: Accurate and lightweight keypoint detection and descriptor extraction. IEEE Transactions on Multimedia, 2022. 2
2022
A. Appendix
A.1. Hyperparameter Ablations
In Equation 10, we defined detector-weighted similarity matrices as: $\bar{S}^{ST}_{ij} = (w^S_i)^{\tau_s}\, S^{ST}_{ij}\, (w^T_j)^{\tau_t}$ ...