A High-accuracy Event-based Underwater SLAM System
Pith reviewed 2026-06-26 21:07 UTC · model grok-4.3
The pith
A structure-aware metric for Time Surfaces combined with two-stage optimization enables the first high-accuracy event-based underwater stereo SLAM.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present the first high-accuracy event-based underwater stereo SLAM system. They design a structure-aware metric for Time Surfaces based on structure tensor coherence and gradients to evaluate structural information density. Optimal TS generation is decoupled into Bayesian Optimization for prior TS before initialization and asynchronous online local searching during tracking. Prior disparity ensures precise data association while a latest-observation-first triangulation mechanism provides stability. The UWE dataset is contributed as a high-quality real-world underwater event benchmark.
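For readers unfamiliar with Time Surfaces, the following is a minimal sketch of the common exponential-decay formulation, in which each pixel stores a decayed version of its most recent event timestamp. The function name `time_surface`, the `(x, y, t)` event layout, and the decay constant `tau` are assumptions for illustration; the paper's own TS generation parameters are not specified in this summary.

```python
import numpy as np

def time_surface(events, shape, t_ref, tau):
    """Exponential-decay time surface evaluated at time t_ref.

    events: iterable of (x, y, t) tuples (hypothetical layout).
    shape:  (height, width) of the sensor.
    tau:    decay constant in the same time unit as t.
    """
    # Pixels that never fired keep -inf, which decays to exactly 0.
    last_t = np.full(shape, -np.inf)
    for x, y, t in events:
        # Ignore events after the reference time; keep the newest per pixel.
        if t <= t_ref and t > last_t[y, x]:
            last_t[y, x] = t
    return np.exp(-(t_ref - last_t) / tau)
```

A small `tau` keeps only very recent edges, while a large `tau` accumulates older events. This is the kind of parameter whose best value would shift as camera velocity fluctuates.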
What carries the argument
Structure-aware metric for TS based on structure tensor coherence and gradients that evaluates structural information density to select optimal Time Surfaces for underwater SLAM.
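The summary does not give the metric's formula, so the sketch below is only one plausible reading of "structure tensor coherence and gradients". It computes the standard structure tensor coherence, ((λ1 − λ2)/(λ1 + λ2))², and uses it to weight the gradient magnitude. The names `box_blur` and `structure_score`, the box-filter smoothing, and the way the two terms are combined are all assumptions, not the authors' definition.

```python
import numpy as np

def box_blur(img, r):
    """Mean filter with a (2r+1) x (2r+1) window and edge padding."""
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (2 * r + 1) ** 2

def structure_score(ts, r=2, eps=1e-12):
    """Hypothetical TS quality score: coherence-weighted mean gradient magnitude."""
    gy, gx = np.gradient(ts.astype(float))
    # Locally averaged structure tensor entries.
    jxx = box_blur(gx * gx, r)
    jyy = box_blur(gy * gy, r)
    jxy = box_blur(gx * gy, r)
    # Coherence in [0, 1]: near 1 for a single dominant orientation, 0 for flat regions.
    coherence = ((jxx - jyy) ** 2 + 4.0 * jxy ** 2) / ((jxx + jyy) ** 2 + eps)
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)
    return float(np.mean(coherence * grad_mag))
```

Under this reading, a flat or saturated TS scores zero and a TS with sharp, consistently oriented edges scores higher. The summary suggests the metric is intended to reward that behavior.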
If this is right
- The proposed system maintains accuracy despite fluctuating camera velocities in underwater settings.
- Precise matching is achieved in wide stereo baselines with repetitive textures.
- The UWE dataset enables future development and evaluation of underwater event-based methods.
- Competitive performance is demonstrated on both public datasets and the new UWE dataset.
Where Pith is reading between the lines
- The two-stage TS optimization could be adapted to other dynamic environments where event camera velocities vary.
- The structure tensor-based metric might improve TS quality assessment in non-underwater applications as well.
- Open-sourcing the code and data will allow independent verification and extension by the community.
Load-bearing premise
The structure-aware metric based on structure tensor coherence and gradients quantitatively evaluates TS structural information density in a way that enables reliable optimal TS selection for underwater conditions.
What would settle it
Running the system on new underwater trajectories with extreme velocity changes or novel textures, and checking whether the metric selects poor TS that lead to tracking failure or low accuracy.
Original abstract
While event cameras offer immense potential for underwater SLAM, existing Time Surface (TS)-based methods prove highly unreliable when deployed underwater. Fluctuating camera velocities severely degrade TS imaging quality, while wide stereo baselines and repetitive underwater textures induce critical matching failures, frequently triggering system failure. To overcome these challenges, we develop the first high-accuracy event-based underwater stereo SLAM system. A structure-aware metric for TS is designed based on structure tensor coherence and gradients to quantitatively evaluate TS structural information density. We decouple optimal TS generation into two stages separated by system initialization: before initialization, Bayesian Optimization (BO) sequentially predicts an optimal prior TS; during tracking, an asynchronous online local search runs periodically to obtain an appropriate TS in real time. We use the prior disparity to guarantee precise data association and a "latest-observation-first" triangulation mechanism to realize stable triangulation. As a benchmark for these solutions and a resource for the community, we also contribute UWE, the first high-quality real-world underwater event dataset containing variable camera motions, complex textures and different trajectory features. Extensive evaluations on public datasets and UWE show the competitive accuracy of the proposed SLAM system compared to the state-of-the-art event-based method. The code and data will be open-sourced.
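The tracking-stage search described in the abstract can be pictured as a small neighborhood search over a TS parameter, repeated periodically. The sketch below is a guess at that idea, not the authors' procedure: `local_search_step`, the multiplicative `step`, the bounds, and the choice of a decay constant `tau` as the searched parameter are all hypothetical, and the asynchronous scheduling is omitted.

```python
def local_search_step(tau, score_fn, step=1.25, tau_min=1e-3, tau_max=1.0):
    """One periodic update: test a smaller and a larger tau, keep the best.

    score_fn maps a candidate tau to a quality score (higher is better),
    e.g. a structure metric evaluated on the TS built with that tau.
    """
    candidates = [
        max(tau / step, tau_min),  # shorter memory
        tau,                       # current value
        min(tau * step, tau_max),  # longer memory
    ]
    return max(candidates, key=score_fn)

# Usage with a toy score whose optimum is at tau = 0.05 (illustrative only).
tau = 0.01
for _ in range(20):
    tau = local_search_step(tau, lambda t: -(t - 0.05) ** 2)
```

Each update costs only three metric evaluations, which suggests why a local search could suit real-time tracking while a more expensive global method such as BO is reserved for the pre-initialization stage.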
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to introduce the first high-accuracy event-based underwater stereo SLAM system. It addresses velocity-induced degradation of Time Surfaces (TS) and stereo matching failures via a structure-aware metric based on structure tensor coherence and gradients to quantify TS structural information density; a two-stage TS optimization using Bayesian Optimization (BO) for prior TS prediction before initialization and asynchronous online local search during tracking; prior disparity for data association; and a latest-observation-first triangulation mechanism. The work also contributes the UWE dataset (first high-quality real-world underwater event dataset with variable motions, complex textures, and diverse trajectories) and reports competitive accuracy against state-of-the-art event-based methods on public datasets and UWE, with code and data to be open-sourced.
Significance. If the accuracy claims and metric effectiveness are substantiated, the work would represent a meaningful advance for event-based perception in challenging underwater environments, where standard cameras struggle with lighting and turbidity. The staged BO approach for TS adaptation, combined with the new UWE benchmark and open-sourcing commitment, provides concrete resources that could accelerate follow-on research in marine robotics and event vision.
major comments (2)
- [metric design (abstract and §3)] The central claim that the structure-aware metric 'quantitatively evaluates TS structural information density in a way that enables reliable optimal TS selection' (abstract) rests on the structure tensor coherence and gradients without reported ablation against simpler alternatives (e.g., event density or contrast maximization). This is load-bearing for the TS optimization pipeline and the overall accuracy improvement; a quantitative comparison showing superior correlation with SLAM performance is needed.
- [TS generation stages (§4)] The two-stage BO + online search procedure is presented as decoupling initialization from tracking, yet no derivation or pseudocode details the objective function, acquisition function, or how the asynchronous local search avoids the velocity-induced degradation that the metric is meant to solve. Without these, it is unclear whether the reported accuracy gains are attributable to the method or to dataset-specific tuning.
minor comments (2)
- [abstract] The abstract states 'extensive evaluations ... show the competitive accuracy performance' but provides no numerical values, error metrics, or table references; key numbers (e.g., ATE on UWE sequences) should be summarized in the abstract or introduction.
- [system overview] Notation for 'prior disparity' and 'latest-observation-first' triangulation is introduced without a forward reference to the relevant equations or algorithm box; adding one would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will incorporate the requested clarifications and comparisons in the revised manuscript.
- Referee: [metric design (abstract and §3)] The central claim that the structure-aware metric 'quantitatively evaluates TS structural information density in a way that enables reliable optimal TS selection' (abstract) rests on the structure tensor coherence and gradients without reported ablation against simpler alternatives (e.g., event density or contrast maximization). This is load-bearing for the TS optimization pipeline and the overall accuracy improvement; a quantitative comparison showing superior correlation with SLAM performance is needed.
  Authors: We agree that an explicit ablation would strengthen the central claim. In the revision we will add a quantitative comparison of the structure-aware metric against event density and contrast maximization, reporting Pearson/Spearman correlations of each metric with downstream SLAM accuracy (ATE/RPE) on both public and UWE sequences. This will directly substantiate that the structure-tensor formulation yields superior TS selection.
  Revision: yes
- Referee: [TS generation stages (§4)] The two-stage BO + online search procedure is presented as decoupling initialization from tracking, yet no derivation or pseudocode details the objective function, acquisition function, or how the asynchronous local search avoids the velocity-induced degradation that the metric is meant to solve. Without these, it is unclear whether the reported accuracy gains are attributable to the method or to dataset-specific tuning.
  Authors: We accept that additional formalization is required. The revised manuscript will provide (i) the explicit objective function maximized by BO, (ii) the acquisition function employed, (iii) pseudocode for both the sequential prior-TS stage and the asynchronous local search, and (iv) an explanation of how the metric-driven search interval prevents velocity-induced TS degradation. Experiments on multiple independent datasets already indicate the gains are not dataset-specific; the added material will make this transparent.
  Revision: yes
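The correlation analysis promised in the first response could be as simple as the sketch below, which computes a Spearman rank correlation between per-sequence metric scores and trajectory error. The function `spearman_rho` is a hypothetical helper that assumes no tied values, and the numbers in the usage lines are synthetic placeholders, not results from the paper.

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman rank correlation for 1-D inputs without ties."""
    # argsort of argsort turns values into 0-based ranks.
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    # Pearson correlation of the ranks equals Spearman's rho when there are no ties.
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

# Synthetic placeholder values for illustration only.
metric_scores = [0.1, 0.4, 0.2, 0.9]   # hypothetical TS metric per sequence
ate_errors = [3.0, 1.0, 2.0, 0.5]      # hypothetical ATE per sequence
rho = spearman_rho(metric_scores, ate_errors)
```

A strongly negative `rho` (higher metric score, lower error) for the structure-aware metric, compared against the same statistic for event density or contrast, is the kind of evidence the referee's first major comment asks for.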
Circularity Check
No significant circularity detected
The paper's derivation chain relies on standard, externally defined components: a structure-aware metric constructed from the well-known structure tensor coherence and gradients, Bayesian optimization applied to select TS parameters in two stages, prior-disparity association, and latest-observation triangulation. None of these steps reduce by the paper's own equations to a fitted input renamed as a prediction, nor do they depend on self-citations whose content is unverified or imported as a uniqueness theorem. The new UWE dataset and evaluations against public data provide independent benchmarks. The abstract and described pipeline contain no self-definitional loops, ansatz smuggling, or renaming of known results, rendering the central SLAM claims self-contained against external methods and data.
Axiom & Free-Parameter Ledger
free parameters (1)
- TS generation parameters
axioms (1)
- domain assumption: Structure tensor coherence and gradients measure structural information density in time surfaces for underwater event data