RoMa: Robust Dense Feature Matching
Pith reviewed 2026-05-24 08:54 UTC · model grok-4.3
The pith
RoMa combines frozen DINOv2 features with ConvNet fine features and anchor-probability decoding to achieve robust dense feature matching.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RoMa establishes robust dense correspondences in three steps. Frozen DINOv2 features are combined with specialized ConvNet fine features to form a precisely localizable feature pyramid. A tailored transformer decoder then predicts anchor probabilities, which lets it express multimodal matches. Training uses regression-by-classification followed by robust regression.
What carries the argument
The feature pyramid of frozen DINOv2 features plus ConvNet fine features, together with the transformer match decoder that predicts anchor probabilities.
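To make the multimodality point concrete, here is a minimal sketch of decoding anchor probabilities for a single source pixel. It is not the paper's implementation: the function names, the four-anchor layout, and the logit values are all hypothetical. It only illustrates why a distribution over anchors can keep two competing hypotheses apart, whereas a single regressed coordinate would average them.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_anchor_probabilities(logits, anchors):
    """Return (mode, mean) target coordinates for one source pixel.

    logits  -- one score per anchor (hypothetical decoder output)
    anchors -- list of (x, y) anchor positions in the target image
    """
    probs = softmax(logits)
    # Mode: the single most probable anchor, so it commits to one hypothesis.
    best = max(range(len(anchors)), key=lambda k: probs[k])
    mode = anchors[best]
    # Mean: probability-weighted average, which can land between hypotheses.
    mean = (
        sum(p * a[0] for p, a in zip(probs, anchors)),
        sum(p * a[1] for p, a in zip(probs, anchors)),
    )
    return mode, mean

# Assumed toy setup: a row of four anchors with two competing peaks,
# as might happen on a repeated structure.
anchors = [(-0.75, 0.0), (-0.25, 0.0), (0.25, 0.0), (0.75, 0.0)]
logits = [4.0, 0.0, 0.0, 3.9]
mode, mean = decode_anchor_probabilities(logits, anchors)
print("mode:", mode, "mean:", mean)
```

In this toy case the mode stays on the leftmost anchor, while the mean falls near the centre of the row, where neither hypothesis lies.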
If this is right
- Dense matching becomes more reliable under extreme appearance and geometric changes.
- Downstream tasks such as 3D reconstruction gain from higher quality correspondences.
- The model generalizes better across datasets without per-dataset retraining.
- Multimodal predictions allow better handling of ambiguous regions in images.
Where Pith is reading between the lines
- Similar fusion strategies could be tested with other large pretrained vision models.
- The anchor probability approach might apply to other correspondence problems like optical flow.
- Performance on WxBS suggests potential for real-world applications in outdoor or seasonal monitoring.
Load-bearing premise
The combination of frozen DINOv2 features, ConvNet fine features, anchor-probability decoding, and the regression loss will generalize to unseen real-world image distributions without dataset-specific adjustments.
What would settle it
Evaluation on a new set of image pairs with novel appearance changes, such as extreme weather or unseen object categories. If RoMa's accuracy on such pairs did not exceed previous methods by a large margin, the generalization premise would be in doubt.
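Such a test could be scored with a simple correspondence-accuracy measure. The sketch below assumes a PCK-style metric (the fraction of predicted matches within a pixel threshold of ground truth) and a relative-gain helper. The 10 px threshold, the function names, and the sample points are illustrative assumptions, not the WxBS protocol or numbers from the paper.

```python
import math

def pck(pred_points, gt_points, threshold_px=10.0):
    """Fraction of predicted matches within threshold_px of the ground truth."""
    if not gt_points or len(pred_points) != len(gt_points):
        raise ValueError("need equally sized, non-empty point lists")
    hits = sum(
        1 for p, g in zip(pred_points, gt_points)
        if math.dist(p, g) <= threshold_px
    )
    return hits / len(gt_points)

def relative_gain(new_score, old_score):
    """Relative improvement of new_score over old_score (0.2 means 20%)."""
    return (new_score - old_score) / old_score

# Hypothetical predictions and ground truth for three correspondences.
pred = [(10.0, 10.0), (50.0, 40.0), (200.0, 90.0)]
gt = [(12.0, 11.0), (80.0, 40.0), (203.0, 94.0)]
score = pck(pred, gt)
print("PCK@10px:", score)
```

Reporting both the absolute scores and the relative gain avoids ambiguity about what a headline percentage refers to.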
Original abstract
Feature matching is an important computer vision task that involves estimating correspondences between two images of a 3D scene, and dense methods estimate all such correspondences. The aim is to learn a robust model, i.e., a model able to match under challenging real-world changes. In this work, we propose such a model, leveraging frozen pretrained features from the foundation model DINOv2. Although these features are significantly more robust than local features trained from scratch, they are inherently coarse. We therefore combine them with specialized ConvNet fine features, creating a precisely localizable feature pyramid. To further improve robustness, we propose a tailored transformer match decoder that predicts anchor probabilities, which enables it to express multimodality. Finally, we propose an improved loss formulation through regression-by-classification with subsequent robust regression. We conduct a comprehensive set of experiments that show that our method, RoMa, achieves significant gains, setting a new state-of-the-art. In particular, we achieve a 36% improvement on the extremely challenging WxBS benchmark. Code is provided at https://github.com/Parskatt/RoMa
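The two-part loss described in the abstract can be sketched as follows, under stated assumptions. The coarse stage is treated as classification over anchors, with the anchor nearest the ground-truth match as the label. The refinement stage uses a Charbonnier-style penalty as a stand-in robust function. The exact robust function, the `scale` value, and all names here are assumptions rather than details taken from the paper.

```python
import math

def nearest_anchor(target, anchors):
    """Index of the anchor closest to the ground-truth match (the class label)."""
    return min(range(len(anchors)), key=lambda k: math.dist(target, anchors[k]))

def classification_loss(logits, target, anchors):
    """Cross-entropy of the anchor logits against the nearest-anchor label."""
    label = nearest_anchor(target, anchors)
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_norm - logits[label]

def robust_regression_loss(pred, target, scale=0.03):
    """Charbonnier-style penalty (assumed form).

    Roughly quadratic for residuals well below `scale` and roughly linear
    above it, so large outliers contribute less than under a squared loss.
    """
    r = math.dist(pred, target)
    return scale * (math.sqrt(1.0 + (r / scale) ** 2) - 1.0)

# Assumed toy setup: four anchors in a row, ground truth near the last one.
anchors = [(-0.75, 0.0), (-0.25, 0.0), (0.25, 0.0), (0.75, 0.0)]
target = (0.7, 0.0)
good = classification_loss([0.0, 0.0, 0.0, 5.0], target, anchors)
bad = classification_loss([5.0, 0.0, 0.0, 0.0], target, anchors)
print("coarse loss (good, bad):", good, bad)
print("refinement loss:", robust_regression_loss((0.72, 0.01), target))
```

The classification term rewards putting probability mass on the correct anchor without penalising mass elsewhere by distance, which is what lets the coarse stage stay multimodal. The regression term only has to handle the small residual left after that.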
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes RoMa for robust dense feature matching. It combines frozen DINOv2 features with specialized ConvNet fine features to create a localizable feature pyramid, employs a transformer decoder that predicts anchor probabilities to handle multimodality, and uses an improved regression-by-classification loss followed by robust regression. Comprehensive experiments are reported to show significant gains over prior methods, including a 36% improvement on the challenging WxBS benchmark, establishing a new state-of-the-art. Code is released at the provided GitHub link.
Significance. If the performance claims hold under scrutiny, the work demonstrates a practical way to leverage large-scale pretrained foundation models for improved robustness in dense matching without training from scratch. The open-source code is a clear strength that supports reproducibility and further research.
major comments (2)
- [Abstract / Experiments] Abstract and Experiments section: the central claim of a 36% improvement and new SOTA on WxBS is load-bearing, yet the abstract (and by extension the reported experiments) provides no detail on baseline implementations, error bars, statistical significance, or data splits. This directly affects assessment of whether the reported gains are reliable.
- [Method / Experiments] Method and Experiments: the robustness claim rests on the specific fusion of frozen DINOv2, added ConvNet features, anchor-probability decoding, and the loss producing stable performance without per-dataset retuning. No ablation or evaluation on fresh distributions outside the reported benchmarks is described to test this assumption.
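One inexpensive way to supply the error bars requested in the first major comment is a percentile bootstrap over per-pair score differences. The sketch below is only an illustration of that idea: the `gains` values are synthetic placeholders, not results from the paper, and the function name and defaults are assumptions.

```python
import random

def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample the image pairs with replacement and record the mean gain.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Synthetic per-pair accuracy gains (new method minus baseline).
gains = [0.12, 0.30, 0.05, 0.22, 0.18, 0.40, 0.09, 0.27]
lo, hi = bootstrap_mean_ci(gains)
print("95% CI for mean gain:", lo, hi)
```

An interval that excludes zero would support the claim that the gain is not an artefact of which pairs were sampled. On a small benchmark the interval may be wide, which is itself useful information.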
minor comments (2)
- Figure captions and legends should explicitly state the metrics and baselines shown in all quantitative plots for immediate readability.
- Notation for the anchor-probability output and the subsequent regression step should be introduced with a single consistent equation reference rather than scattered descriptions.
Simulated Author's Rebuttal
We thank the referee for the constructive review and positive assessment of the work's significance and reproducibility. We respond to each major comment below.
- Referee: [Abstract / Experiments] Abstract and Experiments section: the central claim of a 36% improvement and new SOTA on WxBS is load-bearing, yet the abstract (and by extension the reported experiments) provides no detail on baseline implementations, error bars, statistical significance, or data splits. This directly affects assessment of whether the reported gains are reliable.
  Authors: We agree the abstract is concise and omits these specifics. The experiments section reports results using official baseline implementations and standard dataset protocols for splits. Error bars and significance tests are uncommon in this literature, but gains are consistent across benchmarks. We will revise the abstract to briefly note the evaluation setup, baselines, and data splits, and add a short discussion of reliability in the experiments section.
  Revision: yes
- Referee: [Method / Experiments] Method and Experiments: the robustness claim rests on the specific fusion of frozen DINOv2, added ConvNet features, anchor-probability decoding, and the loss producing stable performance without per-dataset retuning. No ablation or evaluation on fresh distributions outside the reported benchmarks is described to test this assumption.
  Authors: The robustness without per-dataset retuning is supported by consistent SOTA results across the diverse reported benchmarks, which include extreme variations. While evaluations on entirely new distributions are not included, the current benchmarks test the components under challenging conditions. We will add a discussion of this scope in the experiments section and expand component ablations where feasible.
  Revision: partial
Circularity Check
No circularity: empirical architecture validated on external benchmarks
The paper presents RoMa as an empirical construction: frozen DINOv2 features combined with added ConvNet fine features, a transformer decoder outputting anchor probabilities, and regression-by-classification loss. All performance claims (including the 36% WxBS gain) are reported from direct experiments on standard benchmarks. No equations, predictions, or first-principles derivations are given that reduce by construction to fitted parameters or self-citations; the central claims rest on measured generalization rather than definitional equivalence.
Axiom & Free-Parameter Ledger
free parameters (1)
- training hyperparameters and loss weights
axioms (1)
- domain assumption: DINOv2 features remain significantly more robust than local features trained from scratch under real-world changes
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We propose a tailored transformer match decoder that predicts anchor probabilities... regression-by-classification loss for coarse global matches, while we use robust regression loss for the refinement stage"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "leveraging frozen pretrained features from the foundation model DINOv2... creating a precisely localizable feature pyramid"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Improving Local Feature Matching by Entropy-inspired Scale Adaptability and Flow-endowed Local Consistency
  A semi-dense image matching pipeline adds scale adaptability via score-matrix hints at the coarse stage and local flow consistency via gradient loss at the fine stage.