pith. machine review for the scientific record.

arxiv: 2604.11809 · v1 · submitted 2026-04-13 · 💻 cs.CV

Recognition: unknown

Who Handles Orientation? Investigating Invariance in Feature Matching

David Nordström, Fredrik Kahl, Georg Bökman, Johan Edstedt

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:01 UTC · model grok-4.3

classification 💻 cs.CV
keywords feature matching · rotation invariance · keypoint descriptors · image matching · data augmentation · sparse matching · 3D vision

The pith

Incorporating rotation invariance in the descriptor achieves similar performance to handling it in the matcher but enables faster matchers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines where to add rotation invariance in a sparse keypoint matching pipeline so that it better handles large in-plane rotations between images. Training on a large collection of 3D vision datasets with rotation augmentation and then testing on common benchmarks, the authors compare learning the invariance in the descriptor versus in the matcher. They find that incorporating it early, in the descriptor, gives comparable accuracy while permitting a simpler and faster matcher, since the matcher no longer has to absorb the invariance itself. The experiments also show that this does not degrade performance on upright images when training at scale, and that more training data helps the models generalize to rotated cases.
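To make the setup concrete, here is a minimal sketch of the kind of rotation augmentation the pipeline relies on, restricted to 90° multiples so the ground-truth keypoint update stays exact (the paper also evaluates arbitrary angles, cf. Figure 8). The function names and the (image, keypoints) interface are illustrative assumptions, not the released code's API.

    import torch

    def rot90_with_points(img: torch.Tensor, pts: torch.Tensor, k: int):
        """Rotate a (C, H, W) image by k * 90 degrees CCW and map its (x, y)
        keypoints with it, so descriptor/matcher supervision stays valid."""
        _, h, w = img.shape
        img = torch.rot90(img, k % 4, dims=(-2, -1))
        for _ in range(k % 4):
            # np.rot90-style CCW step: (x, y) -> (y, w - 1 - x); the old
            # width then becomes the new height for the next step
            pts = torch.stack([pts[:, 1], (w - 1) - pts[:, 0]], dim=1)
            h, w = w, h
        return img, pts

    def augment_pair(img_a, pts_a, img_b, pts_b):
        """Independently rotate both images of a training pair (cf. Figure 3)."""
        k_a, k_b = torch.randint(0, 4, (2,)).tolist()
        img_a, pts_a = rot90_with_points(img_a, pts_a, k_a)
        img_b, pts_b = rot90_with_points(img_b, pts_b, k_b)
        return img_a, pts_a, img_b, pts_b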

Core claim

Incorporating rotation invariance already in the descriptor yields similar performance to handling it in the matcher. However, rotation invariance is achieved earlier in the matcher when it is learned in the descriptor, allowing for a faster rotation-invariant matcher. Enforcing rotation invariance does not hurt upright performance when trained at scale. Increasing the training data size substantially improves generalization to rotated images.
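The "earlier in the matcher" part of the claim suggests a concrete probe: truncate the matcher after its first few layers and score each truncation, as in Figure 1's stopping-layer plots. A hedged sketch, where matcher.layers, matcher.readout, pose_error, and auc are assumed interfaces rather than the paper's actual code:

    import torch

    @torch.no_grad()
    def stopping_layer_sweep(matcher, pairs, pose_error, auc, threshold=20.0):
        """Score the matcher truncated after each layer (cf. Figure 1).
        `pairs` yields (desc_a, desc_b, gt_pose) tuples."""
        scores = []
        for stop in range(1, len(matcher.layers) + 1):
            errors = []
            for desc_a, desc_b, gt_pose in pairs:
                x_a, x_b = desc_a, desc_b
                for layer in matcher.layers[:stop]:   # truncated matcher
                    x_a, x_b = layer(x_a, x_b)
                matches = matcher.readout(x_a, x_b)   # e.g. mutual nearest neighbours
                errors.append(pose_error(matches, gt_pose))
            scores.append(auc(errors, threshold))     # AUC@20 per stopping layer
        # With an invariant descriptor the curve should plateau at an earlier
        # layer, which is what licenses a shallower, faster matcher.
        return scores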

What carries the argument

The stage at which rotation invariance is incorporated, either in the keypoint descriptor or the subsequent feature matcher, within a modern sparse matching pipeline.

If this is right

  • Rotation invariance can be learned at the descriptor stage with matching accuracy comparable to later incorporation in the matcher (a quick check is sketched after this list).
  • Early invariance in the descriptor supports simpler and faster matcher designs.
  • Large-scale training with augmentation maintains performance on non-rotated images.
  • Increasing training data size improves generalization to rotated views across benchmarks.
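As referenced in the first bullet, the descriptor-stage claim admits a quick empirical check: a rotation-invariant descriptor should assign (nearly) the same vector to the same physical point before and after an in-plane rotation. The descriptor callable below is hypothetical, and rot90_with_points is reused from the earlier augmentation sketch.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def invariance_score(descriptor, img, pts, k=1):
        """Mean cosine similarity between descriptors of the same physical
        points before and after a k * 90 degree rotation; approaches 1.0
        for a perfectly rotation-invariant descriptor."""
        d0 = descriptor(img, pts)                      # (N, D) upright descriptors
        img_r, pts_r = rot90_with_points(img, pts, k)  # from the earlier sketch
        d1 = descriptor(img_r, pts_r)                  # same points, rotated image
        return F.cosine_similarity(d0, d1, dim=1).mean()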

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Pipeline designers could prioritize early-stage invariance to reduce computational cost in rotation-robust systems.
  • The benefits of scale suggest that broader 3D data collection may enhance robustness to other transformations without new model constraints.
  • Similar stage-comparison experiments could be applied to scale invariance or illumination changes in matching pipelines.

Load-bearing premise

The chosen collection of 3D vision datasets together with the data augmentation strategy will produce rotation invariance that generalizes to held-out evaluation benchmarks without dataset-specific overfitting.

What would settle it

A large performance drop on a new benchmark containing rotation angles or image types outside the training augmentation distribution would challenge the generalization of the learned invariance.
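Concretely, such a probe could sweep rotation angles, including ones absent from the training augmentation, and watch for a cliff in AUC@20 (compare Figure 8). The match_and_estimate callable is a stand-in for any full pipeline that returns an angular pose error in degrees; the AUC computation follows the standard recall-vs-error form used by matching benchmarks.

    import numpy as np

    def pose_auc(errors, threshold=20.0):
        """AUC of the recall-vs-pose-error curve up to `threshold` degrees
        (the AUC@20 reported throughout the paper)."""
        errors = np.sort(np.asarray(errors, dtype=np.float64))
        recall = (np.arange(len(errors)) + 1) / len(errors)
        errors = np.r_[0.0, errors]
        recall = np.r_[0.0, recall]
        last = np.searchsorted(errors, threshold)
        e = np.r_[errors[:last], threshold]           # clip the curve at the threshold
        r = np.r_[recall[:last], recall[last - 1]]
        return np.sum(np.diff(e) * (r[:-1] + r[1:]) / 2) / threshold  # trapezoid rule

    def rotation_sweep(pairs, match_and_estimate, angles=range(0, 360, 15)):
        """AUC@20 per rotation angle; a large drop at angles outside the
        training augmentation distribution would challenge the invariance."""
        return {a: pose_auc([match_and_estimate(pair, angle=a) for pair in pairs])
                for a in angles}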

Figures

Figures reproduced from arXiv: 2604.11809 by David Nordström, Fredrik Kahl, Georg Bökman, Johan Edstedt.

Figure 1
Figure 1: Who Handles the Orientation? We compare incorporating rotation invariance in (a) the matcher and (b) both the matcher and descriptor, by reporting the AUC@20 on MegaDepth-1500 [27, 41] (rotated and upright) for different stopping layers using N = 2048 keypoints. We average over 5 runs and include a 95% confidence interval.
Figure 2
Figure 2: Experimental setup. We train descriptors and matchers with and without rotation augmentation, resulting in the three different models used for our main experiments. NoRot corresponds to the LoMa-B baseline.
Figure 3
Figure 3: Training batch. Illustration of a training batch with rotation augmentation.
Figure 4
Figure 4: ScanNet-1500 performance. We plot the AUC@20 for our different models.
Figure 5
Figure 5: HardMatch group breakdown. We report mAA@10px for different subgroups of the non-rotated version of HardMatch. We find that our rotation-invariant models achieve significant gains on celestial matching.
Figure 6
Figure 6: Qualitative HardMatch pair. Rotation invariance significantly improves robustness to image pairs that lack a canonical orientation, such as star constellations. NoRot fails to find any matches above the threshold τ.
Figure 7
Figure 7: Qualitative impact of rotation invariance. Descriptor features (a) trained with rotation augmentation and (b) without, for an image and its 180° rotation. We visualize the features using SVD, mapping the top three components to RGB.
Figure 8
Figure 8: Performance for arbitrary rotations. We report AUC@20 on MegaDepth-1500 at different rotation angles. We average over 5 runs and include a 95% confidence interval.
Figure 9
Figure 9: Visualizing an arbitrary rotation. Qualitative matches using circular cropping with α_A = 0° and α_B = 210°.
read the original abstract

Finding matching keypoints between images is a core problem in 3D computer vision. However, modern matchers struggle with large in-plane rotations. A straightforward mitigation is to learn rotation invariance via data augmentation. However, it remains unclear at which stage rotation invariance should be incorporated. In this paper, we study this in the context of a modern sparse matching pipeline. We perform extensive experiments by training on a large collection of 3D vision datasets and evaluating on popular image matching benchmarks. Surprisingly, we find that incorporating rotation invariance already in the descriptor yields similar performance to handling it in the matcher. However, rotation invariance is achieved earlier in the matcher when it is learned in the descriptor, allowing for a faster rotation-invariant matcher. Further, we find that enforcing rotation invariance does not hurt upright performance when trained at scale. Finally, we study the emergence of rotation invariance through scale and find that increasing the training data size substantially improves generalization to rotated images. We release two matchers robust to in-plane rotations that achieve state-of-the-art performance on e.g. multi-modal (WxBS), extreme (HardMatch), and satellite image matching (SatAst). Code is available at https://github.com/davnords/loma.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. This paper investigates the stage at which rotation invariance should be incorporated in modern sparse keypoint matching pipelines for 3D computer vision. By training models on a large aggregated collection of 3D vision datasets using data augmentation and evaluating on held-out standard benchmarks (including multi-modal WxBS, extreme HardMatch, and satellite SatAst), the authors find that descriptor-level invariance yields performance comparable to matcher-level handling. They further report that descriptor-level invariance emerges earlier in the pipeline (enabling lighter matchers), does not degrade upright performance when trained at scale, and improves with larger training sets. Two rotation-robust matchers are released that achieve state-of-the-art results on the cited challenging benchmarks.

Significance. If the central empirical claims hold, the work supplies practical design guidance for rotation-robust matchers that can be both accurate and computationally lighter. The scale of the training regime, controlled ablations isolating descriptor versus matcher invariance, scaling experiments on generalization, and public code release constitute clear strengths that support reproducibility and adoption in 3D vision pipelines. The results also provide evidence that descriptor-level invariance can be sufficient without sacrificing upright performance.

minor comments (3)
  1. [Abstract, §4] The manuscript refers to 'a large collection of 3D vision datasets' without enumerating them or their relative sizes; a brief table or list would help readers evaluate the training distribution and potential domain coverage.
  2. [§4.2] While the released code enables verification, the paper gives only a limited explicit description of the exact training schedules, optimizer settings, and hyperparameter choices for the descriptor-only versus matcher-augmented variants; a concise summary table would improve clarity without altering the claims.
  3. [§5] Several plots compare rotated versus upright performance; ensuring all axis labels explicitly state the rotation range (e.g., 0–180°) and the exact metric (e.g., AUC@20) would prevent minor misinterpretation.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review and recommendation to accept the manuscript. The summary accurately captures our empirical investigation into the stage at which rotation invariance should be incorporated in sparse keypoint matching pipelines.

Circularity Check

0 steps flagged

No significant circularity; empirical results are self-contained

full rationale

The paper reports controlled ablation experiments that train rotation-invariant descriptors and matchers on aggregated 3D vision datasets and evaluate generalization on held-out benchmarks (e.g., WxBS, HardMatch, SatAst). No equations, fitted parameters, or self-citations are used to derive performance claims; all reported outcomes (descriptor vs. matcher invariance, scaling effects, upright preservation) follow directly from the experimental protocol and external test sets. This is the standard non-circular case for large-scale empirical computer-vision work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This is an empirical study that relies on the standard domain assumption that data augmentation during training induces the desired rotation invariance; no new mathematical entities or fitted parameters are introduced beyond conventional deep-learning practice.

axioms (1)
  • domain assumption: Data augmentation can be used to learn rotation invariance in neural network descriptors and matchers for feature matching.
    This assumption underpins the entire experimental design of training with rotated augmentations.
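That axiom is directly testable: if augmentation induced the invariance, a descriptor's dense feature map for a rotated image should match the rotated feature map of the original, which is what Figure 7 visualizes. A minimal sketch, where backbone is a hypothetical dense descriptor returning (C, H, W) features (assume square inputs for odd k):

    import torch

    @torch.no_grad()
    def axiom_residual(backbone, img, k=1):
        """Relative error between the upright feature map and the un-rotated
        feature map of the rotated image; near 0 for an invariant descriptor."""
        f_upright = backbone(img)                               # (C, H, W)
        f_rotated = backbone(torch.rot90(img, k, dims=(-2, -1)))
        f_aligned = torch.rot90(f_rotated, -k, dims=(-2, -1))   # undo the rotation
        return ((f_upright - f_aligned).norm() / f_upright.norm()).item()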

pith-pipeline@v0.9.0 · 5519 in / 1256 out tokens · 65781 ms · 2026-05-10T15:01:33.260640+00:00 · methodology


Reference graph

Works this paper leans on

55 extracted references · 6 canonical work pages · 2 internal anchors

  1. [1]

    Mapillary planet-scale depth dataset

    Manuel López Antequera, Pau Gargallo, Markus Hofinger, Samuel Rota Bulò, Yubin Kuang, and Peter Kontschieder. Mapillary planet-scale depth dataset. In ECCV, 2020.

  2. [2]

    Map-free visual relocalization: Metric pose relative to a single image

    Eduardo Arnold, Jamie Wynn, Sara Vicente, Guillermo Garcia-Hernando, Aron Monszpart, Victor Prisacariu, Daniyar Turmukhambetov, and Eric Brachmann. Map-free visual relocalization: Metric pose relative to a single image. In ECCV, 2022.

  3. [3]

    Scenescript: Reconstructing scenes with an autoregressive structured language model

    Armen Avetisyan, Christopher Xie, Henry Howard-Jenkins, Tsun-Yi Yang, Samir Aroudj, Suvam Patra, Fuyang Zhang, Duncan Frost, Luke Holland, Campbell Orme, Jakob Engel, Edward Miller, Richard Newcombe, and Vasileios Balntas. Scenescript: Reconstructing scenes with an autoregressive structured language model. In ECCV, 2025.

  4. [4]

    Surf: Speeded up robust features

    Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. Surf: Speeded up robust features. In ECCV, 2006.

  5. [5]

    Earthmatch: Iterative coregistration for fine-grained localization of astronaut photography

    Gabriele Berton, Gabriele Goletto, Gabriele Trivigno, Alex Stoken, Barbara Caputo, and Carlo Masone. Earthmatch: Iterative coregistration for fine-grained localization of astronaut photography. In CVPR, 2024.

  6. [6]

    A case for using rotation invariant features in state of the art feature matchers

    Georg Bökman and Fredrik Kahl. A case for using rotation invariant features in state of the art feature matchers. In CVPRW, 2022.

  7. [7]

    Affine steerers for structured keypoint description

    Georg Bökman, Johan Edstedt, Michael Felsberg, and Fredrik Kahl. Affine steerers for structured keypoint description. In ECCV, 2024.

  8. [8]

    Steerers: A framework for rotation equivariant keypoint descriptors

    Georg Bökman, Johan Edstedt, Michael Felsberg, and Fredrik Kahl. Steerers: A framework for rotation equivariant keypoint descriptors. In CVPR, 2024.

  9. [9]

    Flopping for flops: Leveraging equivariance for computational efficiency

    Georg Bökman, David Nordström, and Fredrik Kahl. Flopping for flops: Leveraging equivariance for computational efficiency. In ICML, 2025.

  10. [10]

    Virtual KITTI 2

    Yohann Cabon, Naila Murray, and Martin Humenberger. Virtual kitti 2. arXiv preprint arXiv:2001.10773, 2020.

  11. [11]

    Group equivariant convolu- tional networks

    Taco Cohen and Max Welling. Group equivariant convolutional networks. In ICML, 2016.

  12. [12]

    Scannet: Richly-annotated 3d reconstructions of indoor scenes

    Angela Dai, Angel X Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In CVPR, 2017.

  13. [13]

    Superpoint: Self-supervised interest point detection and description

    Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superpoint: Self-supervised interest point detection and description. In CVPR, 2018.

  14. [14]

    Mast3r-sfm: a fully-integrated solution for unconstrained structure-from-motion

    Bardienus Pieter Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, and Jerome Revaud. Mast3r-sfm: a fully-integrated solution for unconstrained structure-from-motion. In 3DV, 2025.

  15. [15]

    DKM: Dense kernelized feature matching for geometry estimation

    Johan Edstedt, Ioannis Athanasiadis, Mårten Wadenbäck, and Michael Felsberg. DKM: Dense kernelized feature matching for geometry estimation. In CVPR, 2023.

  16. [16]

    Dedode v2: Analyzing and improving the dedode keypoint detector

    Johan Edstedt, Georg Bökman, and Zhenjun Zhao. Dedode v2: Analyzing and improving the dedode keypoint detector. In CVPRW, 2024.

  17. [17]

    DeDoDe: Detect, Don't Describe – Describe, Don't Detect for Local Feature Matching

    Johan Edstedt, Georg Bökman, Mårten Wadenbäck, and Michael Felsberg. DeDoDe: Detect, Don't Describe – Describe, Don't Detect for Local Feature Matching. In 3DV.

  18. [18]

    RoMa: Robust dense feature matching

    Johan Edstedt, Qiyu Sun, Georg Bökman, Mårten Wadenbäck, and Michael Felsberg. RoMa: Robust dense feature matching. In CVPR, 2024.

  19. [19]

    Dad: Distilled reinforcement learning for diverse keypoint detection

    Johan Edstedt, Georg Bökman, Mårten Wadenbäck, and Michael Felsberg. Dad: Distilled reinforcement learning for diverse keypoint detection. arXiv preprint arXiv:2503.07347, 2025.

  20. [20]

    Roma v2: Harder better faster denser feature matching

    Johan Edstedt, David Nordström, Yushan Zhang, Georg Bökman, Jonathan Astermark, Viktor Larsson, Anders Heyden, Fredrik Kahl, Mårten Wadenbäck, and Michael Felsberg. Roma v2: Harder better faster denser feature matching. arXiv preprint arXiv:2511.15706, 2025.

  21. [21]

    Virtual worlds as proxy for multi-object tracking analysis

    Adrien Gaidon, Qiao Wang, Yohann Cabon, and Eleonora Vig. Virtual worlds as proxy for multi-object tracking analysis. In CVPR, 2016.

  22. [22]

    Multiple view geometry in computer vision

    Richard Hartley and Andrew Zisserman. Multiple view geometry in computer vision. Cambridge University Press.

  23. [23]

    Megasynth: Scaling up 3d scene reconstruction with synthesized data

    Hanwen Jiang, Zexiang Xu, Desai Xie, Ziwen Chen, Haian Jin, Fujun Luan, Zhixin Shu, Kai Zhang, Sai Bi, Xin Sun, Jiuxiang Gu, Qixing Huang, Georgios Pavlakos, and Hao Tan. Megasynth: Scaling up 3d scene reconstruction with synthesized data. In CVPR, 2025.

  24. [24]

    Self-supervised learning of image scale and orientation

    Jongmin Lee, Yoonwoo Jeong, and Minsu Cho. Self-supervised learning of image scale and orientation. In BMVC, 2021.

  25. [25]

    Self-supervised equivariant learning for oriented keypoint detection

    Jongmin Lee, Byungjin Kim, and Minsu Cho. Self-supervised equivariant learning for oriented keypoint detection. In CVPR, 2022.

  26. [26]

    Learning rotation-equivariant features for visual correspondence

    Jongmin Lee, Byungjin Kim, Seungwook Kim, and Minsu Cho. Learning rotation-equivariant features for visual correspondence. In CVPR, 2023.

  27. [27]

    Megadepth: Learning single-view depth prediction from internet photos

    Zhengqi Li and Noah Snavely. Megadepth: Learning single-view depth prediction from internet photos. In CVPR, 2018.

  28. [28]

    LightGlue: Local Feature Matching at Light Speed

    Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Pollefeys. LightGlue: Local Feature Matching at Light Speed. In ICCV, 2023.

  29. [29]

    Decoupled weight decay regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In ICLR, 2019.

  30. [30]

    Distinctive image features from scale-invariant keypoints

    David G Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.

  31. [31]

    A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation

    Nikolaus Mayer, Eddy Ilg, Philip Hausser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy, and Thomas Brox. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In CVPR, 2016.

  32. [32]

    WxBS: Wide baseline stereo generalizations

    Dmytro Mishkin, Jiří Matas, Michal Perdoch, and Karel Lenc. WxBS: Wide baseline stereo generalizations. In BMVC, 2015.

  33. [33]

    Repeatability Is Not Enough: Learning Affine Regions via Discriminability

    Dmytro Mishkin, Filip Radenović, and Jiří Matas. Repeatability Is Not Enough: Learning Affine Regions via Discriminability. In ECCV, 2018.

  34. [34]

    Stronger vits with octic equivariance

    David Nordström, Johan Edstedt, Fredrik Kahl, and Georg Bökman. Stronger vits with octic equivariance. arXiv preprint arXiv:2505.15441, 2025.

  35. [35]

    Loma: Local feature matching revisited

    David Nordström, Johan Edstedt, Georg Bökman, Jonathan Astermark, Anders Heyden, Viktor Larsson, Mårten Wadenbäck, Michael Felsberg, and Fredrik Kahl. Loma: Local feature matching revisited, 2026.

  36. [36]

    Common objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction

    Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. Common objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction. In ICCV, 2021.

  37. [37]

    Hypersim: A photorealistic synthetic dataset for holistic indoor scene understanding

    Mike Roberts, Jason Ramapuram, Anurag Ranjan, Atulit Kumar, Miguel Angel Bautista, Nathan Paczan, Russ Webb, and Joshua M. Susskind. Hypersim: A photorealistic synthetic dataset for holistic indoor scene understanding. In ICCV, 2021.

  38. [38]

    ORB: An efficient alternative to SIFT or SURF

    Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. ORB: An efficient alternative to SIFT or SURF. In ICCV, 2011.

  39. [39]

    Superglue: Learning feature matching with graph neural networks

    Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superglue: Learning feature matching with graph neural networks. In CVPR, 2020.

  40. [40]

    Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, ...

  41. [41]

    LoFTR: Detector-free local feature matching with transformers

    Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, and Xiaowei Zhou. LoFTR: Detector-free local feature matching with transformers. In CVPR, 2021.

  42. [42]

    Smd-nets: Stereo mixture density networks

    Fabio Tosi, Yiyi Liao, Carolin Schmitt, and Andreas Geiger. Smd-nets: Stereo mixture density networks. In CVPR, 2021.

  43. [43]

    Deit iii: Revenge of the vit

    Hugo Touvron, Matthieu Cord, and Hervé Jégou. Deit iii: Revenge of the vit. In ECCV, 2022.

  44. [44]

    Megascenes: Scene-level view synthesis at scale

    Joseph Tung, Gene Chou, Ruojin Cai, Guandao Yang, Kai Zhang, Gordon Wetzstein, Bharath Hariharan, and Noah Snavely. Megascenes: Scene-level view synthesis at scale. In ECCV, 2024.

  45. [45]

    Tilde: A temporally invariant learned detector

    Yannick Verdie, Kwang Yi, Pascal Fua, and Vincent Lepetit. Tilde: A temporally invariant learned detector. In CVPR.

  46. [46]

    Aerialmegadepth: Learning aerial-ground reconstruction and view synthesis

    Khiem Vuong, Anurag Ghosh, Deva Ramanan, Srinivasa Narasimhan, and Shubham Tulsiani. Aerialmegadepth: Learning aerial-ground reconstruction and view synthesis. In CVPR, 2025.

  47. [47]

    Vggt: Visual geometry grounded transformer

    Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Visual geometry grounded transformer. In CVPR, 2025.

  48. [48]

    Spatialvid: A large-scale video dataset with spatial annotations

    Jiahao Wang, Yufeng Yuan, Rujie Zheng, Youtian Lin, Jian Gao, Lin-Zhuo Chen, Yajie Bao, Yi Zhang, Chang Zeng, Yanxi Zhou, et al. Spatialvid: A large-scale video dataset with spatial annotations. arXiv preprint arXiv:2509.09676.

  49. [49]

    Tartanair: A dataset to push the limits of visual slam

    Wenshan Wang, Delong Zhu, Xiangwei Wang, Yaoyu Hu, Yuheng Qiu, Chen Wang, Yafei Hu, Ashish Kapoor, and Sebastian Scherer. Tartanair: A dataset to push the limits of visual slam. In IROS, 2020.

  50. [50]

    Equivariant and Coordinate Independent Convolutional Networks

    Maurice Weiler, Patrick Forré, Erik Verlinde, and Max Welling. Equivariant and Coordinate Independent Convolutional Networks. 2023.

  51. [51]

    Blendedmvs: A large-scale dataset for generalized multi-view stereo networks

    Yao Yao, Zixin Luo, Shiwei Li, Jingyang Zhang, Yufan Ren, Lei Zhou, Tian Fang, and Long Quan. Blendedmvs: A large-scale dataset for generalized multi-view stereo networks. In CVPR, 2020.

  52. [52]

    Scannet++: A high-fidelity dataset of 3d indoor scenes

    Chandan Yeshwanth, Yueh-Cheng Liu, Matthias Nießner, and Angela Dai. Scannet++: A high-fidelity dataset of 3d indoor scenes. In ICCV, 2023.

  53. [53]

    Lift: Learned invariant feature transform

    Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, and Pascal Fua. Lift: Learned invariant feature transform. In ECCV.

  54. [54]

    Ufm: A simple path towards unified dense correspondence with flow

    Yuchen Zhang, Nikhil Keetha, Chenwei Lyu, Bhuvan Jhamb, Yutian Chen, Yuheng Qiu, Jay Karhade, Shreyas Jha, Yaoyu Hu, Deva Ramanan, Sebastian Scherer, and Wenshan Wang. Ufm: A simple path towards unified dense correspondence with flow. In NeurIPS, 2025.

  55. [55]

    Aliked: A lighter keypoint and descriptor extraction network via deformable transformation

    Xiaoming Zhao, Xingming Wu, Weihai Chen, Peter C. Y. Chen, Qingsong Xu, and Zhengguo Li. Aliked: A lighter keypoint and descriptor extraction network via deformable transformation. IEEE TIP, 72, 2023.