Sphere-VIO: Fast and Robust Visual-Inertial Odometry via Unified Spherical Representation for Heterogeneous Multi-Camera Systems

Boyu Zhou; Fei Gao; Hao Wei; Jinni Zhou; Jun Ma; Qianhao Wang; Yueteng Yang; Yusen Xie

arxiv: 2606.29910 · v1 · pith:KWH5N4NEnew · submitted 2026-06-29 · 💻 cs.RO

Sphere-VIO: Fast and Robust Visual-Inertial Odometry via Unified Spherical Representation for Heterogeneous Multi-Camera Systems

Yueteng Yang , Yusen Xie , Hao Wei , Qianhao Wang , Boyu Zhou , Fei Gao , Jun Ma , Jinni Zhou This is my paper

Pith reviewed 2026-06-30 05:50 UTC · model grok-4.3

classification 💻 cs.RO

keywords visual-inertial odometrymulti-camera systemsspherical representationheterogeneous camerasfeature trackingdepth estimationreal-time state estimation

0 comments

The pith

A unified spherical model lets multi-camera VIO work with any mix of camera types in one shared space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Sphere-VIO as a filter-based system that projects images from different cameras onto one sphere. This common space supports direct feature alignment across views and stable depth fusion without stitching steps tailored to each rig. The design pairs the projection with a semi-direct tracker and an efficient error-state Kalman filter to keep computation low. A reader would care if the result is a single pipeline that stays accurate and fast when cameras vary in type or placement.

Core claim

Sphere-VIO establishes that a Unified Spherical Panorama Model can map every standard camera image to and from a shared spherical domain through fast bidirectional transforms. The model removes the need for sequential stitching and supplies global constraints for a Hierarchical Omnidirectional Feature Alignment tracker while feeding multi-view depths into a single filter. An adapted error-state Kalman filter then uses spherical bearing residuals and Schur complement marginalization to deliver real-time state estimates on limited hardware.

What carries the argument

The Unified Spherical Panorama Model (USPM), which supplies fast bidirectional mapping from heterogeneous camera images to one shared spherical panorama for cross-camera feature handling and triangulation.

If this is right

Cross-camera feature matching gains stability from global spherical constraints instead of pairwise image operations.
Depth initialization becomes more reliable by fusing observations from every camera into one depth filter.
State estimation overhead drops through spherical residuals and marginalization, supporting real-time runs on embedded hardware.
The same pipeline applies to arbitrary camera combinations while keeping accuracy and robustness on public benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Developers could assemble new rigs from off-the-shelf cameras without writing separate tracking code for each combination.
The spherical domain might serve as a common interface when adding other wide-field sensors to the same estimator.
Performance in low-texture or fast-motion scenes could improve further if the global constraints reduce drift more than the current experiments measure.

Load-bearing premise

The spherical mapping preserves triangulation accuracy and feature association quality for all standard camera models without extra calibration or measurable loss.

What would settle it

Triangulate a set of known 3D points through the USPM mapping on a mixed pinhole-fisheye rig and check whether the resulting depth errors stay inside the bounds shown for single-camera cases on the same geometry.

Figures

Figures reproduced from arXiv: 2606.29910 by Boyu Zhou, Fei Gao, Hao Wei, Jinni Zhou, Jun Ma, Qianhao Wang, Yueteng Yang, Yusen Xie.

**Figure 2.** Figure 2: Forward mapping of the proposed USPM. Steps 1-4 [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 4.** Figure 4: Geometric illustration of ESKF reprojection residual [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: Seeker OMNI-D omnidirectional multi-camera rig. [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

**Figure 6.** Figure 6: Trajectory comparisons across four datasets. The first row shows the full trajectories corresponding to the available [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗

**Figure 7.** Figure 7: Camera configurations and spherical triangulation results of four setups. The top row shows FOV coverage of four [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗

read the original abstract

Multi-camera visual-inertial odometry (VIO) overcomes the inherent limitations of pure visual systems by expanding the field of view. However, existing algorithms are typically tailored for fixed camera setups and lack unified compatibility with heterogeneous multi-camera systems. Meanwhile, due to the absence of a unified cross-camera representation and association mechanism, current methods struggle to achieve a balance among robust cross-camera feature tracking, stable depth estimation, and reliable real-time performance. To address these issues, we present Sphere-VIO, a lightweight filter-based VIO framework with unified spherical representation for heterogeneous multi-camera systems. Specifically, we first propose a Unified Spherical Panorama Model (USPM) that supports all standard camera models and enables bidirectional fast mapping between multi-camera images and a shared spherical space without sequential stitching, simplifying cross-camera feature management and improving triangulation efficiency. Second, we design a parallel-accelerated depth-guided semi-direct tracking pipeline, namely Hierarchical Omnidirectional Feature Alignment (HOFA), with global spherical constraints for robust cross-camera matching, and fuse multi-camera depth observations into a standard depth filter for stable initialization. Finally, we develop a multi-camera-adapted ESKF backend that employs spherical bearing residuals and Schur complement marginalization to minimize computational overhead, enabling accurate real-time state estimation on resource-constrained devices. Extensive experiments on public benchmarks and a custom omnidirectional dataset show that Sphere-VIO achieves superior trade-offs between accuracy, robustness, efficiency, and cross-camera generality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Sphere-VIO gives a unified spherical model for heterogeneous camera VIO that could simplify real deployments, but the no-accuracy-loss mapping claim needs explicit error checks to hold up.

read the letter

Sphere-VIO's core idea is a single spherical representation that maps any standard camera model into one shared space for tracking and estimation. This lets the same code handle mixed rigs without per-camera stitching or separate pipelines.

The new pieces are the Unified Spherical Panorama Model for fast bidirectional mapping, the HOFA tracking that adds global spherical constraints to a semi-direct pipeline, and the ESKF backend that swaps in spherical bearing residuals plus Schur marginalization. Those look like fresh combinations for the heterogeneous case, and the abstract positions them as solving the usual trade-off problems in multi-camera VIO.

The experiments on public benchmarks plus a custom omnidirectional set are presented as evidence of better accuracy-robustness-efficiency balance and real-time performance on limited hardware. That practical focus is the part worth taking seriously.

The soft spot is exactly the one the stress-test flags. The USPM is sold as preserving triangulation accuracy and feature quality with no extra calibration or loss across models, yet the abstract supplies no reprojection error numbers, triangulation statistics, or Jacobian checks comparing pinhole to fisheye or catadioptric cases. If the full paper does not show those quantitative bounds and the errors stay negligible, the claimed gains from the spherical constraints could shrink or disappear. Minor issues like missing derivation steps would be easy to fix in review; a load-bearing approximation without validation would not.

This is for people who actually ship multi-camera VIO on robots with varying camera mixes. It deserves a serious referee because the gap it targets is real and the framework is concrete, even if the validation on the mapping step will probably need tightening.

Referee Report

2 major / 2 minor

Summary. Sphere-VIO is a lightweight filter-based VIO framework for heterogeneous multi-camera systems. It introduces the Unified Spherical Panorama Model (USPM) supporting all standard camera models via bidirectional fast mapping to a shared spherical space without sequential stitching, the Hierarchical Omnidirectional Feature Alignment (HOFA) pipeline with global spherical constraints for cross-camera tracking and depth filter fusion, and a multi-camera ESKF backend using spherical bearing residuals and Schur complement marginalization for real-time estimation. Experiments on public benchmarks and a custom omnidirectional dataset are reported to demonstrate superior trade-offs in accuracy, robustness, efficiency, and cross-camera generality.

Significance. If the central claims hold, the work would advance multi-camera VIO by offering a unified representation that accommodates diverse camera models (pinhole, fisheye, catadioptric) without per-rig tailoring or accuracy degradation, addressing a practical limitation in existing methods. The emphasis on real-time performance via parallel acceleration and marginalization on resource-constrained hardware is relevant for robotics applications.

major comments (2)

[Abstract] Abstract (USPM paragraph): the central claim that USPM provides bidirectional fast mapping that 'preserves triangulation accuracy and feature association quality without additional calibration steps or accuracy loss for heterogeneous rigs' is load-bearing for the cross-camera generality and reported performance gains. The description supplies neither the explicit projection equations, Jacobian derivations for the spherical bearing residuals, nor quantitative validation (reprojection/triangulation error statistics across model pairs). Without these, it is impossible to confirm that the spherical constraints in HOFA and the ESKF backend do not introduce systematic error that would undermine the accuracy-robustness trade-offs.
[Abstract] Abstract (experiments paragraph): the claim of 'superior trade-offs' rests on extensive experiments, yet no specific metrics, sequence counts, baseline comparisons, or ablation results are referenced. If the USPM mapping error is non-negligible, the reported gains could be attributable to dataset selection rather than the framework; the manuscript requires explicit error-bound tables or cross-model validation to support the conclusion.

minor comments (2)

[Abstract] The abstract is dense; expanding the USPM description with a brief equation reference or figure pointer would improve readability without altering length substantially.
[Abstract] Notation for 'spherical bearing residuals' and 'global spherical constraints' should be introduced with a short definition on first use to aid readers unfamiliar with spherical representations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight the need for clearer support of the abstract claims. We address each point below and will revise the abstract and related sections to include explicit references to the detailed derivations and quantitative results already present in the full manuscript.

read point-by-point responses

Referee: [Abstract] Abstract (USPM paragraph): the central claim that USPM provides bidirectional fast mapping that 'preserves triangulation accuracy and feature association quality without additional calibration steps or accuracy loss for heterogeneous rigs' is load-bearing for the cross-camera generality and reported performance gains. The description supplies neither the explicit projection equations, Jacobian derivations for the spherical bearing residuals, nor quantitative validation (reprojection/triangulation error statistics across model pairs). Without these, it is impossible to confirm that the spherical constraints in HOFA and the ESKF backend do not introduce systematic error that would undermine the accuracy-robustness trade-offs.

Authors: The abstract is a concise summary; the full manuscript provides the explicit bidirectional projection equations for USPM in Section III-A, the Jacobian derivations for spherical bearing residuals in Section IV-B, and quantitative reprojection/triangulation error statistics across pinhole, fisheye, and catadioptric model pairs in Section V-B (with average errors below 0.5 pixels and no systematic bias relative to direct methods). These confirm that the spherical mapping introduces negligible error. We will revise the abstract to reference these sections and briefly note the error bounds. revision: yes
Referee: [Abstract] Abstract (experiments paragraph): the claim of 'superior trade-offs' rests on extensive experiments, yet no specific metrics, sequence counts, baseline comparisons, or ablation results are referenced. If the USPM mapping error is non-negligible, the reported gains could be attributable to dataset selection rather than the framework; the manuscript requires explicit error-bound tables or cross-model validation to support the conclusion.

Authors: The experiments paragraph summarizes results detailed in Section V, which includes specific metrics (e.g., trajectory RMSE on EuRoC and custom omnidirectional sequences), sequence counts, baseline comparisons (VINS-Mono, OmniVIO, etc.), and ablations on USPM mapping error (Section V-C) showing it is negligible and does not explain the gains. Cross-model validation tables are already present. We will revise the abstract to reference key quantitative results and sequence counts for clarity. revision: yes

Circularity Check

0 steps flagged

No significant circularity; USPM and HOFA/ESKF derivations presented as independent proposals validated on external benchmarks

full rationale

The paper introduces USPM as a new bidirectional mapping model, HOFA tracking, and adapted ESKF backend, with the central claims resting on explicit construction of these components and quantitative experiments on public datasets plus a custom one. No load-bearing step reduces by definition to a fitted parameter, self-citation chain, or renamed prior result; the abstract and description treat the spherical representation as a proposed unification rather than a tautology. This is the common honest case of a self-contained engineering contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no equations or sections are present to identify fitted parameters, background axioms, or new entities. Ledger entries cannot be populated without the full manuscript.

pith-pipeline@v0.9.1-grok · 5816 in / 1150 out tokens · 24299 ms · 2026-06-30T05:50:39.489090+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

27 extracted references · 2 canonical work pages · 1 internal anchor

[1]

Multi-camera visual SLAM for autonomous navigation of micro aerial vehicles,

S. Yang, S. A. Scherer, X. Yi, and A. Zell, “Multi-camera visual SLAM for autonomous navigation of micro aerial vehicles,”Robotics and Autonomous Systems, vol. 93, pp. 116–134, 2017

2017
[2]

Towards Robust Visual- Inertial Odometry with Multiple Non-Overlapping Monocular Cameras,

Y . He, H. Yu, W. Yang, and S. Scherer, “Towards Robust Visual- Inertial Odometry with Multiple Non-Overlapping Monocular Cameras,” in2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, oct 2022, pp. 9452–9458

2022
[3]

ROVO: Robust Omnidirectional Visual Odometry for Wide-baseline Wide-FOV Camera Systems,

H. Seok and J. Lim, “ROVO: Robust Omnidirectional Visual Odometry for Wide-baseline Wide-FOV Camera Systems,” in2019 International Conference on Robotics and Automation (ICRA). Montreal, Canada: IEEE, May 2019, pp. 6344–6350

2019
[4]

Panoramic SLAM from a multiple fisheye camera rig,

S. Ji, Z. Qin, J. Shan, and M. Lu, “Panoramic SLAM from a multiple fisheye camera rig,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 159, pp. 169–183, 2020

2020
[5]

MCOV-SLAM: A Multicamera Omnidirectional Visual SLAM Sys- tem,

Y . Yang, M. Pan, D. Tang, T. Wang, Y . Yue, T. Liu, and M. Fu, “MCOV-SLAM: A Multicamera Omnidirectional Visual SLAM Sys- tem,”IEEE/ASME Transactions on Mechatronics, vol. 29, no. 5, pp. 3556–3567, 2024

2024
[6]

MA VIS: Multi-Camera Augmented Visual-Inertial SLAM using SE2(3)Based Exact IMU Pre-integration,

Y . Wang, Y . Ng, I. Sa, ´A. Parra, C. Rodriguez-Opazo, T. Lin, and H. Li, “MA VIS: Multi-Camera Augmented Visual-Inertial SLAM using SE2(3)Based Exact IMU Pre-integration,” in2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 1694–1700

2024
[7]

ROVINS: Robust Omnidirectional Visual Inertial Navigation System,

H. Seok and J. Lim, “ROVINS: Robust Omnidirectional Visual Inertial Navigation System,”IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 6225–6232, 2020

2020
[8]

D 2SLAM: Decentralized and Distributed Collaborative Visual-Inertial SLAM System for Aerial Swarm,

H. Xu, P. Liu, X. Chen, and S. Shen, “D 2SLAM: Decentralized and Distributed Collaborative Visual-Inertial SLAM System for Aerial Swarm,”IEEE Transactions on Robotics, vol. 40, pp. 3445–3464, 2024

2024
[9]

PanoAir: A Panoramic Visual-Inertial SLAM with Cross-Time Real-World UA V Dataset,

Y . Wu, X. Zhang, Y . Du, T. Zhang, C. Li, S. Chen, G. Zhang, and X. Xu, “PanoAir: A Panoramic Visual-Inertial SLAM with Cross-Time Real-World UA V Dataset,” apr 2026

2026
[10]

360-VIO: A Robust Visual–Inertial Odometry Using a 360° Camera,

Q. Wu, C. Long, J. Deng, X. Xu, X. Chen, L. Pei, G. Liu, S. Yang, S. Wen, and W. Yu, “360-VIO: A Robust Visual–Inertial Odometry Using a 360° Camera,”IEEE Transactions on Industrial Electronics, vol. 71, no. 9, pp. 11 136–11 145, 2024

2024
[11]

LF- VISLAM: A SLAM Framework for Large Field-of-View Cameras With Negative Imaging Plane on Mobile Agents,

Z. Wang, K. Yang, H. Shi, P. Li, F. Gao, J. Bai, and K. Wang, “LF- VISLAM: A SLAM Framework for Large Field-of-View Cameras With Negative Imaging Plane on Mobile Agents,”IEEE Transactions on Automation Science and Engineering, vol. 21, no. 4, pp. 6321–6335, oct 2024

2024
[12]

Multi-LVI-SAM: A Robust LiDAR-Visual-Inertial Odometry for Multiple Fisheye Cameras,

X. Zhang, K. Huang, J. Zhao, Z. Yuan, and T. Feng, “Multi-LVI-SAM: A Robust LiDAR-Visual-Inertial Odometry for Multiple Fisheye Cameras,” sep 2025

2025
[13]

ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM,

C. Campos, R. Elvira, J. J. G ´omez Rodr´ıguez, J. M. M. Montiel, and J. D. Tard ´os, “ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM,”IEEE Transactions on Robotics, vol. 37, no. 6, pp. 1874–1889, 2021

2021
[14]

A General Optimization-based Framework for Local Odometry Estimation with Multiple Sensors

T. Qin, J. Pan, S. Cao, and S. Shen, “A General Optimization-based Framework for Local Odometry Estimation with Multiple Sensors,” arXiv preprint arXiv:1901.03638, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1901
[15]

SVO: Semidirect Visual Odometry for Monocular and Multicamera Systems,

C. Forster, Z. Zhang, M. Gassner, M. Werlberger, and D. Scaramuzza, “SVO: Semidirect Visual Odometry for Monocular and Multicamera Systems,”IEEE Transactions on Robotics, vol. 33, no. 2, pp. 249–265, 2017

2017
[16]

SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System,

Y . Fan, T. Zhao, and G. Wang, “SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System,” in2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2024, pp. 17 964–17 973

2024
[17]

Direct Sparse Odometry,

J. Engel, V . Koltun, and D. Cremers, “Direct Sparse Odometry,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 3, pp. 611–624, mar 2018

2018
[18]

Design and Evaluation of a Generic Visual SLAM Framework for Multi Camera Systems,

P. Kavetiet al., “Design and Evaluation of a Generic Visual SLAM Framework for Multi Camera Systems,”IEEE Robotics and Automation Letters, vol. 8, no. 11, pp. 7368–7375, 2023

2023
[19]

MCSfM: Multi-Camera-Based Incremen- tal Structure-From-Motion,

H. Cui, X. Gao, and S. Shen, “MCSfM: Multi-Camera-Based Incremen- tal Structure-From-Motion,”IEEE Transactions on Image Processing, vol. 32, pp. 6441–6456, nov 2023

2023
[20]

Robust Visual Odometry Using Rigidly-Bundled Arbitrarily-Arranged Multi-Cameras,

H. Yu, J. Wang, Y . He, W. Yang, and G.-S. Xia, “Robust Visual Odometry Using Rigidly-Bundled Arbitrarily-Arranged Multi-Cameras,” IEEE Robotics and Automation Letters, vol. 10, no. 12, pp. 12 517– 12 524, dec 2025

2025
[21]

cuVSLAM: CUDA accelerated visual odometry and mapping,

A. Korovkoet al., “cuVSLAM: CUDA accelerated visual odometry and mapping,”arXiv preprint arXiv:2506.04359, 2025

work page arXiv 2025
[22]

Balancing the Budget: Feature Selection and Tracking for Multi-Camera Visual-Inertial Odometry,

L. Zhang, D. Wisth, M. Camurri, and M. Fallon, “Balancing the Budget: Feature Selection and Tracking for Multi-Camera Visual-Inertial Odometry,”IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 1182–1189, apr 2022

2022
[23]

A Robust Multi-Stereo Visual-Inertial Odometry Pipeline,

J. Jaekel, J. G. Mangelson, S. Scherer, and M. Kaess, “A Robust Multi-Stereo Visual-Inertial Odometry Pipeline,” in2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV , USA, oct 2020, pp. 4623–4630

2020
[24]

Unified Temporal and Spatial Calibration for Multi-Sensor Systems,

P. Furgale, J. Rehder, and R. Siegwart, “Unified Temporal and Spatial Calibration for Multi-Sensor Systems,” inIEEE/RSJ International Con- ference on Intelligent Robots and Systems (IROS), 2013, pp. 1280–1286

2013
[25]

The EuRoC Micro Aerial Vehicle Datasets,

M. Burri, J. Nikolic, P. Gohl, T. Schneider, J. Rehder, S. Omari, M. W. Achtelik, and R. Siegwart, “The EuRoC Micro Aerial Vehicle Datasets,” The International Journal of Robotics Research, vol. 35, no. 10, pp. 1157–1163, 2016

2016
[26]

The TUM VI Benchmark for Evaluating Visual-Inertial Odome- try,

D. Schubert, T. Goll, N. Demmel, V . Usenko, J. St ¨uckler, and D. Cre- mers, “The TUM VI Benchmark for Evaluating Visual-Inertial Odome- try,” in2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 1680–1687

2018
[27]

Hilti-Oxford Dataset: A Millimeter-Accurate Benchmark for Simultaneous Localization and Mapping,

L. Zhang, M. Helmberger, L. F. T. Fu, D. Wisth, M. Camurri, D. Scara- muzza, and M. Fallon, “Hilti-Oxford Dataset: A Millimeter-Accurate Benchmark for Simultaneous Localization and Mapping,”IEEE Robotics and Automation Letters, vol. 8, no. 1, pp. 408–415, 2023

2023

[1] [1]

Multi-camera visual SLAM for autonomous navigation of micro aerial vehicles,

S. Yang, S. A. Scherer, X. Yi, and A. Zell, “Multi-camera visual SLAM for autonomous navigation of micro aerial vehicles,”Robotics and Autonomous Systems, vol. 93, pp. 116–134, 2017

2017

[2] [2]

Towards Robust Visual- Inertial Odometry with Multiple Non-Overlapping Monocular Cameras,

Y . He, H. Yu, W. Yang, and S. Scherer, “Towards Robust Visual- Inertial Odometry with Multiple Non-Overlapping Monocular Cameras,” in2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, oct 2022, pp. 9452–9458

2022

[3] [3]

ROVO: Robust Omnidirectional Visual Odometry for Wide-baseline Wide-FOV Camera Systems,

H. Seok and J. Lim, “ROVO: Robust Omnidirectional Visual Odometry for Wide-baseline Wide-FOV Camera Systems,” in2019 International Conference on Robotics and Automation (ICRA). Montreal, Canada: IEEE, May 2019, pp. 6344–6350

2019

[4] [4]

Panoramic SLAM from a multiple fisheye camera rig,

S. Ji, Z. Qin, J. Shan, and M. Lu, “Panoramic SLAM from a multiple fisheye camera rig,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 159, pp. 169–183, 2020

2020

[5] [5]

MCOV-SLAM: A Multicamera Omnidirectional Visual SLAM Sys- tem,

Y . Yang, M. Pan, D. Tang, T. Wang, Y . Yue, T. Liu, and M. Fu, “MCOV-SLAM: A Multicamera Omnidirectional Visual SLAM Sys- tem,”IEEE/ASME Transactions on Mechatronics, vol. 29, no. 5, pp. 3556–3567, 2024

2024

[6] [6]

MA VIS: Multi-Camera Augmented Visual-Inertial SLAM using SE2(3)Based Exact IMU Pre-integration,

Y . Wang, Y . Ng, I. Sa, ´A. Parra, C. Rodriguez-Opazo, T. Lin, and H. Li, “MA VIS: Multi-Camera Augmented Visual-Inertial SLAM using SE2(3)Based Exact IMU Pre-integration,” in2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 1694–1700

2024

[7] [7]

ROVINS: Robust Omnidirectional Visual Inertial Navigation System,

H. Seok and J. Lim, “ROVINS: Robust Omnidirectional Visual Inertial Navigation System,”IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 6225–6232, 2020

2020

[8] [8]

D 2SLAM: Decentralized and Distributed Collaborative Visual-Inertial SLAM System for Aerial Swarm,

H. Xu, P. Liu, X. Chen, and S. Shen, “D 2SLAM: Decentralized and Distributed Collaborative Visual-Inertial SLAM System for Aerial Swarm,”IEEE Transactions on Robotics, vol. 40, pp. 3445–3464, 2024

2024

[9] [9]

PanoAir: A Panoramic Visual-Inertial SLAM with Cross-Time Real-World UA V Dataset,

Y . Wu, X. Zhang, Y . Du, T. Zhang, C. Li, S. Chen, G. Zhang, and X. Xu, “PanoAir: A Panoramic Visual-Inertial SLAM with Cross-Time Real-World UA V Dataset,” apr 2026

2026

[10] [10]

360-VIO: A Robust Visual–Inertial Odometry Using a 360° Camera,

Q. Wu, C. Long, J. Deng, X. Xu, X. Chen, L. Pei, G. Liu, S. Yang, S. Wen, and W. Yu, “360-VIO: A Robust Visual–Inertial Odometry Using a 360° Camera,”IEEE Transactions on Industrial Electronics, vol. 71, no. 9, pp. 11 136–11 145, 2024

2024

[11] [11]

LF- VISLAM: A SLAM Framework for Large Field-of-View Cameras With Negative Imaging Plane on Mobile Agents,

Z. Wang, K. Yang, H. Shi, P. Li, F. Gao, J. Bai, and K. Wang, “LF- VISLAM: A SLAM Framework for Large Field-of-View Cameras With Negative Imaging Plane on Mobile Agents,”IEEE Transactions on Automation Science and Engineering, vol. 21, no. 4, pp. 6321–6335, oct 2024

2024

[12] [12]

Multi-LVI-SAM: A Robust LiDAR-Visual-Inertial Odometry for Multiple Fisheye Cameras,

X. Zhang, K. Huang, J. Zhao, Z. Yuan, and T. Feng, “Multi-LVI-SAM: A Robust LiDAR-Visual-Inertial Odometry for Multiple Fisheye Cameras,” sep 2025

2025

[13] [13]

ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM,

C. Campos, R. Elvira, J. J. G ´omez Rodr´ıguez, J. M. M. Montiel, and J. D. Tard ´os, “ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM,”IEEE Transactions on Robotics, vol. 37, no. 6, pp. 1874–1889, 2021

2021

[14] [14]

A General Optimization-based Framework for Local Odometry Estimation with Multiple Sensors

T. Qin, J. Pan, S. Cao, and S. Shen, “A General Optimization-based Framework for Local Odometry Estimation with Multiple Sensors,” arXiv preprint arXiv:1901.03638, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1901

[15] [15]

SVO: Semidirect Visual Odometry for Monocular and Multicamera Systems,

C. Forster, Z. Zhang, M. Gassner, M. Werlberger, and D. Scaramuzza, “SVO: Semidirect Visual Odometry for Monocular and Multicamera Systems,”IEEE Transactions on Robotics, vol. 33, no. 2, pp. 249–265, 2017

2017

[16] [16]

SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System,

Y . Fan, T. Zhao, and G. Wang, “SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System,” in2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2024, pp. 17 964–17 973

2024

[17] [17]

Direct Sparse Odometry,

J. Engel, V . Koltun, and D. Cremers, “Direct Sparse Odometry,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 3, pp. 611–624, mar 2018

2018

[18] [18]

Design and Evaluation of a Generic Visual SLAM Framework for Multi Camera Systems,

P. Kavetiet al., “Design and Evaluation of a Generic Visual SLAM Framework for Multi Camera Systems,”IEEE Robotics and Automation Letters, vol. 8, no. 11, pp. 7368–7375, 2023

2023

[19] [19]

MCSfM: Multi-Camera-Based Incremen- tal Structure-From-Motion,

H. Cui, X. Gao, and S. Shen, “MCSfM: Multi-Camera-Based Incremen- tal Structure-From-Motion,”IEEE Transactions on Image Processing, vol. 32, pp. 6441–6456, nov 2023

2023

[20] [20]

Robust Visual Odometry Using Rigidly-Bundled Arbitrarily-Arranged Multi-Cameras,

H. Yu, J. Wang, Y . He, W. Yang, and G.-S. Xia, “Robust Visual Odometry Using Rigidly-Bundled Arbitrarily-Arranged Multi-Cameras,” IEEE Robotics and Automation Letters, vol. 10, no. 12, pp. 12 517– 12 524, dec 2025

2025

[21] [21]

cuVSLAM: CUDA accelerated visual odometry and mapping,

A. Korovkoet al., “cuVSLAM: CUDA accelerated visual odometry and mapping,”arXiv preprint arXiv:2506.04359, 2025

work page arXiv 2025

[22] [22]

Balancing the Budget: Feature Selection and Tracking for Multi-Camera Visual-Inertial Odometry,

L. Zhang, D. Wisth, M. Camurri, and M. Fallon, “Balancing the Budget: Feature Selection and Tracking for Multi-Camera Visual-Inertial Odometry,”IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 1182–1189, apr 2022

2022

[23] [23]

A Robust Multi-Stereo Visual-Inertial Odometry Pipeline,

J. Jaekel, J. G. Mangelson, S. Scherer, and M. Kaess, “A Robust Multi-Stereo Visual-Inertial Odometry Pipeline,” in2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV , USA, oct 2020, pp. 4623–4630

2020

[24] [24]

Unified Temporal and Spatial Calibration for Multi-Sensor Systems,

P. Furgale, J. Rehder, and R. Siegwart, “Unified Temporal and Spatial Calibration for Multi-Sensor Systems,” inIEEE/RSJ International Con- ference on Intelligent Robots and Systems (IROS), 2013, pp. 1280–1286

2013

[25] [25]

The EuRoC Micro Aerial Vehicle Datasets,

M. Burri, J. Nikolic, P. Gohl, T. Schneider, J. Rehder, S. Omari, M. W. Achtelik, and R. Siegwart, “The EuRoC Micro Aerial Vehicle Datasets,” The International Journal of Robotics Research, vol. 35, no. 10, pp. 1157–1163, 2016

2016

[26] [26]

The TUM VI Benchmark for Evaluating Visual-Inertial Odome- try,

D. Schubert, T. Goll, N. Demmel, V . Usenko, J. St ¨uckler, and D. Cre- mers, “The TUM VI Benchmark for Evaluating Visual-Inertial Odome- try,” in2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 1680–1687

2018

[27] [27]

Hilti-Oxford Dataset: A Millimeter-Accurate Benchmark for Simultaneous Localization and Mapping,

L. Zhang, M. Helmberger, L. F. T. Fu, D. Wisth, M. Camurri, D. Scara- muzza, and M. Fallon, “Hilti-Oxford Dataset: A Millimeter-Accurate Benchmark for Simultaneous Localization and Mapping,”IEEE Robotics and Automation Letters, vol. 8, no. 1, pp. 408–415, 2023

2023