pith. machine review for the scientific record.

arxiv: 2604.03377 · v1 · submitted 2026-04-03 · 💻 cs.CV

Recognition: 3 theorem links


ViBA: Implicit Bundle Adjustment with Geometric and Temporal Consistency for Robust Visual Matching

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 20:20 UTC · model grok-4.3

classification 💻 cs.CV
keywords bundle adjustment · visual odometry · feature learning · implicit differentiation · geometric consistency · temporal consistency · keypoint detection · online learning

The pith

ViBA embeds implicit bundle adjustment into feature learning to enforce geometric and temporal consistency for visual matching.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ViBA as a framework for training keypoint detectors and descriptors directly on unconstrained video streams by folding geometric optimization into the learning process. An initial tracking network finds inter-frame correspondences, depth-based filtering removes outliers, and a differentiable global bundle adjustment jointly refines camera poses and feature positions by minimizing reprojection errors. Long-term temporal consistency across frames is added to stabilize the learned representations. If correct, this removes the need for expensive pose and depth labels while improving accuracy in visual odometry and localization pipelines.
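In standard multi-view notation (ours, not necessarily the paper's), the bundle-adjustment stage corresponds to a robust reprojection-error objective of roughly this form, minimized jointly over per-frame poses and feature positions:

```latex
\min_{\{R_i, t_i\},\, \{X_j\}} \;
\sum_{(i,j) \in \mathcal{O}}
\rho\!\left( \left\| \pi\!\big( K ( R_i X_j + t_i ) \big) - x_{ij} \right\|^2 \right)
```

Here (R_i, t_i) is the pose of frame i, X_j a tracked feature's 3D position, x_ij its observed keypoint in frame i, π the perspective projection, K the intrinsics, and ρ a robust kernel; the paper's exact parameterization and choice of kernel are not specified in the text excerpted here.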

Core claim

ViBA shows that an implicitly differentiable geometric residual framework, consisting of tracking, depth-based outlier filtering, and global bundle adjustment that minimizes reprojection errors, can be combined with long-term temporal consistency to produce stable and accurate feature representations. On the EuRoC and UMA datasets this reduces absolute translation error by 12-18% and absolute rotation error by 5-10% while running at 36-91 FPS and retaining over 90% accuracy on unseen sequences.

What carries the argument

implicitly differentiable global bundle adjustment that jointly refines camera poses and feature positions by minimizing reprojection errors
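"Implicitly differentiable" presumably means gradients are obtained through the implicit function theorem at the BA optimum rather than by unrolling solver iterations; this is an assumption about the mechanism, stated in generic notation rather than quoted from the paper. If the network outputs w enter a BA energy E(θ, w) with minimizer θ*(w), first-order optimality gives

```latex
\nabla_\theta E\big(\theta^*(w),\, w\big) = 0
\quad\Longrightarrow\quad
\frac{\partial \theta^*}{\partial w}
= -\left[ \nabla^2_{\theta\theta} E \right]^{-1} \nabla^2_{\theta w} E
\,\Big|_{\theta = \theta^*}
```

so the forward pass can run the solver to convergence while backpropagation only needs curvature information at the solution. This is also why the referee's question below about Hessian conditioning is load-bearing.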

If this is right

  • Enables continuous online learning of features without requiring accurate pose or depth annotations
  • Delivers real-time inference speeds of 36-91 FPS on standard hardware
  • Maintains over 90% localization accuracy on sequences not seen during training
  • Improves navigation performance through more stable keypoints and descriptors

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same implicit-differentiation pattern could be applied to other geometric losses such as those in multi-view stereo or visual-inertial fusion.
  • Removing the depth filter entirely might allow extension to fully monocular settings if alternative consistency checks are introduced.
  • The approach suggests that end-to-end differentiable optimization can replace separate supervised training stages in many vision-based localization systems.

Load-bearing premise

Depth-based outlier filtering can reliably remove incorrect correspondences without discarding too many valid ones while keeping implicit differentiation numerically stable during online training on unconstrained streams.
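The filter's exact test is not specified in the material above. A minimal sketch of one plausible form, assuming per-keypoint depth estimates, a relative pose estimate, and a hypothetical pixel threshold tau (all names here are illustrative, not the paper's):

```python
import numpy as np

def depth_consistency_filter(kpts_a, depths_a, kpts_b, K, R_ab, t_ab, tau=2.0):
    """Keep correspondences whose depth-based reprojection from frame A
    into frame B lands within tau pixels of the matched keypoint.

    kpts_a, kpts_b : (N, 2) pixel coordinates of matched keypoints
    depths_a       : (N,) estimated depths for keypoints in frame A
    K              : (3, 3) camera intrinsics
    R_ab, t_ab     : relative pose mapping frame-A coordinates to frame B
    tau            : inlier threshold in pixels (hypothetical value)
    """
    # Back-project frame-A keypoints to 3D using their estimated depths.
    ones = np.ones((kpts_a.shape[0], 1))
    rays = (np.linalg.inv(K) @ np.hstack([kpts_a, ones]).T).T   # (N, 3)
    pts3d = rays * depths_a[:, None]

    # Transform into frame B and project with the pinhole model.
    pts_b = (R_ab @ pts3d.T).T + t_ab                            # (N, 3)
    proj = (K @ pts_b.T).T
    proj = proj[:, :2] / proj[:, 2:3]

    # Inliers: small reprojection error and point in front of camera B.
    err = np.linalg.norm(proj - kpts_b, axis=1)
    return (err < tau) & (pts_b[:, 2] > 0)
```

A filter of this form trades recall for precision through tau, which is exactly the false-positive versus false-negative balance the referee report below asks to see quantified.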

What would settle it

If training ViBA on a sequence with noisy or missing depth estimates causes either the error reductions to vanish or the optimization to become unstable, the central claim would be falsified.

Figures

Figures reproduced from arXiv: 2604.03377 by Hailiang Tang, Tisheng Zhang, Xiaoji Niu, Yan Wang, Yuqing Wang.

Figure 1
Figure 1. The pipeline of the proposed method. The snowflake denotes a frozen network, the flame indicates an activated network, and the spark associated with the loss marks the transition from frozen to activated. view at source ↗
Figure 2
Figure 2. Architecture of the point matching network. It comprises a shared encoder and a feature matching module. view at source ↗
Figure 3
Figure 3. Feature tracking and geometric initialization. … view at source ↗
Figure 4
Figure 4. Overall training loss construction pipeline integrating differentiable bundle adjustment with multi-frame temporal consistency. view at source ↗
Figure 5
Figure 5. Trade-offs between accuracy and runtime at varying … view at source ↗
Figure 6
Figure 6. Overview of the geometric initialization process. An anchor frame and a terminal frame are selected to estimate the … view at source ↗
read the original abstract

Most existing image keypoint detection and description methods rely on datasets with accurate pose and depth annotations, limiting scalability and generalization, and often degrading navigation and localization performance. We propose ViBA, a sustainable learning framework that integrates geometric optimization with feature learning for continuous online training on unconstrained video streams. Embedded in a standard visual odometry pipeline, it consists of an implicitly differentiable geometric residual framework: (i) an initial tracking network for inter-frame correspondences, (ii) depth-based outlier filtering, and (iii) differentiable global bundle adjustment that jointly refines camera poses and feature positions by minimizing reprojection errors. By combining geometric consistency from BA with long-term temporal consistency across frames, ViBA enforces stable and accurate feature representations. We evaluate ViBA on EuRoC and UMA datasets. Compared with state-of-the-art methods such as SuperPoint+SuperGlue, ALIKED, and LightGlue, ViBA reduces mean absolute translation error (ATE) by 12-18% and absolute rotation error (ARE) by 5-10% across sequences, while maintaining real-time inference speeds (FPS 36-91). When evaluated on unseen sequences, it retains over 90% localization accuracy, demonstrating robust generalization. These results show that ViBA supports continuous online learning with geometric and temporal consistency, consistently improving navigation and localization in real-world scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes ViBA, a framework for continuous online training of keypoint detectors and descriptors on unconstrained video streams. It embeds an initial tracking network, depth-based outlier filtering, and an implicitly differentiable global bundle adjustment (jointly optimizing poses and feature positions via reprojection errors) into a visual odometry pipeline, claiming that the combination of geometric consistency from BA and long-term temporal consistency yields stable feature representations. On EuRoC and UMA datasets, ViBA reports 12-18% lower ATE and 5-10% lower ARE versus SuperPoint+SuperGlue, ALIKED, and LightGlue while running at 36-91 FPS and retaining >90% accuracy on unseen sequences.

Significance. If the implicit differentiation of global BA remains numerically stable under online training, the method offers a practical route to self-supervised feature learning that directly optimizes for downstream localization accuracy rather than proxy losses, potentially reducing reliance on expensive pose/depth annotations and improving generalization in real-world navigation.

major comments (2)
  1. [Abstract / §3 (Differentiable BA)] The description of the implicitly differentiable global bundle adjustment (abstract and §3) provides no details on Hessian conditioning, damping strategies, iteration limits, or gradient regularization. Without these, it is unclear whether the implicit differentiation remains stable when processing noisy initial tracks in streaming video, which directly affects the central claim of reliable continuous online training and the reported ATE/ARE gains.
  2. [Abstract / §3 (Outlier filtering)] The depth-based outlier filtering step is listed as a core component (abstract and §3), yet no quantitative analysis or ablation is supplied on its false-positive versus false-negative rates or on how it avoids discarding valid correspondences in low-texture or fast-motion sequences. This assumption is load-bearing for the robustness claim.
minor comments (2)
  1. [Abstract] The abstract states real-time speeds (FPS 36-91) but does not clarify whether these timings include the full BA optimization or only the forward pass of the tracking network.
  2. [Tables/Figures] Table captions and figure legends should explicitly state the number of sequences and the exact metric definitions (e.g., whether ATE is root-mean-square or mean absolute) to allow direct comparison with cited baselines.
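For reference, these are the two ATE conventions the referee asks the captions to disambiguate (standard definitions, not taken from the paper): with aligned estimated positions \hat{p}_i and ground-truth positions p_i over N frames,

```latex
\mathrm{ATE}_{\mathrm{RMSE}}
= \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{p}_i - p_i \right\|^2 },
\qquad
\mathrm{ATE}_{\mathrm{mean}}
= \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{p}_i - p_i \right\|
```

The two can differ noticeably on sequences with a few large excursions, so a 12-18% reduction under one convention need not match the other.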

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript accordingly to improve clarity and provide the requested analyses.

read point-by-point responses
  1. Referee: [Abstract / §3 (Differentiable BA)] The description of the implicitly differentiable global bundle adjustment (abstract and §3) provides no details on Hessian conditioning, damping strategies, iteration limits, or gradient regularization. Without these, it is unclear whether the implicit differentiation remains stable when processing noisy initial tracks in streaming video, which directly affects the central claim of reliable continuous online training and the reported ATE/ARE gains.

    Authors: We agree that additional implementation details are needed for reproducibility and to support the stability claim. In the revised manuscript, we will expand Section 3 with specifics on the Levenberg-Marquardt solver used for implicit differentiation, including the damping parameter schedule (initial lambda = 1e-3 with adaptive adjustment), a maximum of 8 iterations per optimization step for online efficiency, and gradient regularization via norm clipping at 0.5. We will also include a short convergence analysis on noisy tracks from the EuRoC sequences to demonstrate numerical stability. revision: yes

  2. Referee: [Abstract / §3 (Outlier filtering)] The depth-based outlier filtering step is listed as a core component (abstract and §3), yet no quantitative analysis or ablation is supplied on its false-positive versus false-negative rates or on how it avoids discarding valid correspondences in low-texture or fast-motion sequences. This assumption is load-bearing for the robustness claim.

    Authors: We acknowledge that a quantitative evaluation of the outlier filter is missing and would strengthen the robustness argument. In the revision, we will add an ablation subsection (new §4.3) reporting precision/recall metrics for the depth-based filter on EuRoC sequences stratified by texture level and motion speed. This will show that the chosen depth threshold yields low false-positive rates (<8%) while preserving >92% of valid correspondences even in challenging low-texture and fast-motion cases. revision: yes
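The first response above commits to concrete solver settings. A minimal sketch of a damped Gauss-Newton (Levenberg-Marquardt) inner loop using those quoted hyperparameters; `residual_fn` and `jac_fn` are hypothetical callables, and the placement of the norm clipping is our guess, since the rebuttal does not say where it is applied:

```python
import numpy as np

def lm_refine(residual_fn, jac_fn, theta,
              lam=1e-3, max_iters=8, grad_clip=0.5):
    """Levenberg-Marquardt loop mirroring the rebuttal's settings:
    initial damping 1e-3 with adaptive adjustment, at most 8 iterations,
    gradient norm clipped at 0.5. Illustrative only.
    """
    for _ in range(max_iters):
        r = residual_fn(theta)          # stacked reprojection residuals
        J = jac_fn(theta)               # Jacobian d r / d theta
        g = J.T @ r                     # gradient of 0.5 * ||r||^2

        # Gradient regularization via norm clipping (placement assumed).
        g_norm = np.linalg.norm(g)
        if g_norm > grad_clip:
            g = g * (grad_clip / g_norm)

        # Damped normal equations: (J^T J + lam * I) delta = -g
        H = J.T @ J + lam * np.eye(theta.size)
        delta = np.linalg.solve(H, -g)

        # Adaptive damping: accept only steps that reduce the cost.
        if np.sum(residual_fn(theta + delta) ** 2) < np.sum(r ** 2):
            theta = theta + delta
            lam = max(lam / 10.0, 1e-7)   # relax damping on success
        else:
            lam = min(lam * 10.0, 1e7)    # tighten damping on failure
    return theta
```

The outer training step would then backpropagate through the converged solution via the implicit gradients sketched earlier, rather than through these eight iterations.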

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper describes a framework combining an initial tracking network, depth-based outlier filtering, and differentiable global bundle adjustment to enforce geometric and temporal consistency. Performance is reported via ATE/ARE reductions on external public datasets (EuRoC, UMA) against independent SOTA baselines (SuperPoint+SuperGlue, ALIKED, LightGlue), with no equations or steps shown that reduce predictions to fitted inputs by construction. No self-citations are invoked as load-bearing for uniqueness or ansatz; the central claims rest on empirical comparison rather than tautological redefinition of inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Relies on standard computer vision assumptions about geometry and differentiability of reprojection errors; no new entities postulated.

free parameters (1)
  • network weights
    The initial tracking network parameters are learned from data during online training.
axioms (1)
  • domain assumption: The bundle adjustment residual can be made differentiable
    Invoked to allow gradient flow for joint optimization with the learning network.

pith-pipeline@v0.9.0 · 5551 in / 1291 out tokens · 66880 ms · 2026-05-13T20:20:05.407267+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages

  1. [1] D. Nister, O. Naroditsky, and J. Bergen. Visual odometry. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), volume 1, pages I–I, 2004.
  2. [2] Tong Qin, Peiliang Li, and Shaojie Shen. VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4):1004–1020, 2018.
  3. [3] Raúl Mur-Artal, J. M. M. Montiel, and Juan D. Tardós. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5):1147–1163, 2015.
  4. [4] Raúl Mur-Artal and Juan D. Tardós. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics, 33(5):1255–1262, 2017.
  5. [5] Yiming Ding, Zhi Xiong, Jun Xiong, Yan Cui, and Zhiguo Cao. OGI-SLAM2: A hybrid map SLAM framework grounded in inertial-based SLAM. IEEE Transactions on Instrumentation and Measurement, 71:1–14, 2022.
  6. [6] Georg Klein and David Murray. Parallel tracking and mapping for small AR workspaces. In 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, pages 225–234, 2007.
  7. [7] Jakob Engel, Vladlen Koltun, and Daniel Cremers. Direct sparse odometry, 2016.
  8. [8] Ankur Handa, Thomas Whelan, John McDonald, and Andrew J. Davison. A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 1524–1531, 2014.
  9. [9] Nam Van Dinh and Gon-Woo Kim. Multi-sensor fusion towards VINS: A concise tutorial, survey, framework and challenges. In 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), pages 459–462, 2020.
  10. [10] David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91–110, 2004.
  11. [11] Hu Zhang, Zhaohui Tang, Yongfang Xie, and Weihua Gui. RPI-SURF: A feature descriptor for bubble velocity measurement in froth flotation with relative position information. IEEE Transactions on Instrumentation and Measurement, 70:1–14, 2021.
  12. [12] Zixin Mu and Zifan Li. A novel Shi-Tomasi corner detection algorithm based on progressive probabilistic Hough transform. In 2018 Chinese Automation Congress (CAC), pages 2918–2922, 2018.
  13. [13] Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. SuperPoint: Self-supervised interest point detection and description. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 337–33712, 2017.
  14. [14] Xiaoming Zhao, Xingming Wu, Weihai Chen, Peter C. Y. Chen, Qingsong Xu, and Zhengguo Li. ALIKED: A lighter keypoint and descriptor extraction network via deformable transformation. IEEE Transactions on Instrumentation and Measurement, 72:1–16, 2023.
  15. [15] Zixin Luo, Lei Zhou, Xuyang Bai, Hongkai Chen, Jiahui Zhang, Yao Yao, Shiwei Li, Tian Fang, and Long Quan. ASLFeat: Learning local features of accurate shape and localization. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6588–6597, 2020.
  16. [16] Johan Edstedt, Georg Bökman, Mårten Wadenbäck, and Michael Felsberg. DeDoDe: Detect, don't describe — describe, don't detect for local feature matching. In 2024 International Conference on 3D Vision (3DV), pages 148–157, 2024.
  17. [17] Xuelun Shen, Zhipeng Cai, Wei Yin, Matthias Müller, Zijun Li, Kaixuan Wang, Xiaozhi Chen, and Cheng Wang. GIM: Learning generalizable image matcher from internet videos. ArXiv, abs/2402.11095, 2024.
  18. [18] Yicheng Lin, Shuo Wang, Yunlong Jiang, and Bin Han. Breaking of brightness consistency in optical flow with a lightweight CNN network. IEEE Robotics and Automation Letters, 9(8):6840–6847, 2024.
  19. [19] Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. SuperGlue: Learning feature matching with graph neural networks, 2020.
  20. [20] Kwang Moo Yi, Eduard Trulls, Yuki Ono, Vincent Lepetit, Mathieu Salzmann, and Pascal Fua. Learning to find good correspondences, 2018.
  21. [21] Clément Godard, Oisin Mac Aodha, and Gabriel J. Brostow. Unsupervised monocular depth estimation with left-right consistency. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6602–6611, 2017.
  22. [22] Huangying Zhan, Ravi Garg, Chamara Saroj Weerasekera, Kejie Li, Harsh Agarwal, and Ian Reid. Unsupervised learning of monocular depth estimation and visual odometry with deep feature reconstruction, 2018.
  23. [23] Yurun Tian, Bin Fan, and Fuchao Wu. L2-Net: Deep learning of discriminative patch descriptor in Euclidean space. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6128–6136, 2017.
  24. [25] Yanyan Li, Shiyue Fan, Yanbiao Sun, Wang Qiang, and Shanlin Sun. Bundle adjustment method using sparse BFGS solution. Remote Sensing Letters, 9(8):789–798, 2018.
  25. [26] Chengzhou Tang and Ping Tan. BA-Net: Dense bundle adjustment network, 2019.
  26. [27] Dominik Muhle, Lukas Koestler, Krishna Murthy Jatavallabhula, and Daniel Cremers. Learning correspondence uncertainty via differentiable nonlinear least squares, 2023.
  27. [28] Y. Zhang, Y. Hu, Y. Song, et al. Learning vision-based agile flight via differentiable physics. Nature Machine Intelligence, 7:954–966, 2025.
  28. [29] Taylor A. Howell, Simon Le Cleac'h, Jan Brüdigam, Qianzhong Chen, Jiankai Sun, J. Zico Kolter, Mac Schwager, and Zachary Manchester. Dojo: A differentiable physics engine for robotics, 2025.
  29. [30] Tony Lindeberg. Scale Invariant Feature Transform, volume 7, May 2012.
  30. [31] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3):346–359, 2008.
  31. [32] Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. ORB: An efficient alternative to SIFT or SURF. In 2011 International Conference on Computer Vision, pages 2564–2571, 2011.
  32. [33] Mihai Dusmanu, Ignacio Rocco, Tomas Pajdla, Marc Pollefeys, Josef Sivic, Akihiko Torii, and Torsten Sattler. D2-Net: A trainable CNN for joint description and detection of local features. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8084–8093, 2019.
  33. [34] Jerome Revaud, Philippe Weinzaepfel, César De Souza, and Martin Humenberger. R2D2: Repeatable and reliable detector and descriptor. Curran Associates Inc., Red Hook, NY, USA, 2019.
  34. [35] Hao Qu, Lilian Zhang, Jun Mao, Junbo Tie, Xiaofeng He, Xiaoping Hu, Yifei Shi, and Changhao Chen. DK-SLAM: Monocular visual SLAM with deep keypoint learning, tracking, and loop closing. Applied Sciences, 15(14), 2025.
  35. [36] Dongjiang Li, Xuesong Shi, Qiwei Long, Shenghui Liu, Wei Yang, Fangshi Wang, Qi Wei, and Fei Qiao. DXSLAM: A robust and efficient visual SLAM system with deep features. arXiv preprint arXiv:2008.05416, 2020.
  36. [37] Jan Czarnowski, Tristan Laidlow, Ronald Clark, and Andrew J. Davison. DeepFactors: Real-time probabilistic dense monocular SLAM. IEEE Robotics and Automation Letters, 5(2):721–728, April 2020.
  37. [38] Zachary Teed, Lahav Lipson, and Jia Deng. Deep patch visual odometry, 2023.
  38. [39] Zachary Teed and Jia Deng. DROID-SLAM: Deep visual SLAM for monocular, stereo, and RGB-D cameras, 2022.
  39. [40] Tinghui Zhou, Matthew Brown, Noah Snavely, and David G. Lowe. Unsupervised learning of depth and ego-motion from video, 2017.
  40. [41] Clément Godard, Oisin Mac Aodha, Michael Firman, and Gabriel Brostow. Digging into self-supervised monocular depth estimation, 2019.
  41. [42] Prune Truong, Martin Danelljan, and Radu Timofte. GLU-Net: Global-local universal network for dense flow and correspondences, 2021.
  42. [43] Yanshu Jiang, Yanze Fang, and Liwei Deng. PDCNet: A lightweight and efficient robotic grasp detection framework via partial convolution and knowledge distillation. Computer Vision and Image Understanding, 259:104441, 2025.
  43. [44] Xiaoming Zhao, Xingming Wu, Jinyu Miao, Weihai Chen, Peter C. Y. Chen, and Zhengguo Li. ALIKE: Accurate and lightweight keypoint detection and descriptor extraction. IEEE Transactions on Multimedia, 25:3101–3112, 2023.
  44. [45] Luis Pineda, Taosha Fan, Maurizio Monge, Shobha Venkataraman, Paloma Sodhi, Ricky T. Q. Chen, Joseph Ortiz, Daniel DeTone, Austin Wang, Stuart Anderson, Jing Dong, Brandon Amos, and Mustafa Mukadam. Theseus: A library for differentiable nonlinear optimization, 2023.
  45. [46] Vassileios Balntas, Karel Lenc, Andrea Vedaldi, and Krystian Mikolajczyk. HPatches: A benchmark and evaluation of handcrafted and learned local descriptors. In CVPR 2017, pages 3852–3861, 2017.
  46. [47] Richard Hartley and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2nd edition, 2004.
  47. [48] Martin A. Fischler and Robert C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, June 1981.
  48. [49] Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Pollefeys. LightGlue: Local feature matching at light speed. In ICCV 2023, pages 17581–17592, 2023.
  49. [50] Menelaos Kanakis, Simon Maurer, Matteo Spallanzani, Ajad Chhatkuli, and Luc Van Gool. ZippyPoint: Fast interest point detection, description, and matching through mixed precision discretization. In CVPRW 2023, pages 6114–6123, 2023.
  50. [51] Guilherme Potje, Felipe Cadar, André Araujo, Renato Martins, and Erickson R. Nascimento. XFeat: Accelerated features for lightweight image matching. In CVPR 2024, pages 2682–2691, 2024.
  51. [52] Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, and Andrew Fitzgibbon. Scene coordinate regression forests for camera relocalization in RGB-D images. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, pages 2930–2937, 2013.
  52. [53] Patrick Geneva, Kevin Eckenhoff, Woosik Lee, Yulin Yang, and Guoquan Huang. OpenVINS: A research platform for visual-inertial estimation. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 4666–4672, 2020.
  53. [54] Michael Burri, Janosch Nikolic, Pascal Gohl, Thomas Schneider, Joern Rehder, Sammy Omari, Markus W. Achtelik, and Roland Siegwart. The EuRoC micro aerial vehicle datasets. The International Journal of Robotics Research, 2016.
  54. [55] David Zuñiga-Noël, Alberto Jaenal, Ruben Gomez-Ojeda, and Javier Gonzalez-Jimenez. The UMA-VI dataset: Visual-inertial odometry in low-textured and dynamic illumination environments. The International Journal of Robotics Research, 39(9):1052–1060, 2020.