pith. machine review for the scientific record.

arxiv: 2604.13244 · v1 · submitted 2026-04-14 · 💻 cs.CV · cs.AI · cs.RO

Recognition: unknown

4th Workshop on Maritime Computer Vision (MaCVi): Challenge Overview


Pith reviewed 2026-05-10 16:15 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.RO
keywords maritime computer vision, benchmark challenges, real-time performance, evaluation protocols, datasets, method trends, CVPR workshop, embedded feasibility

The pith

The MaCVi 2026 workshop report defines five benchmark challenges for maritime computer vision that test both predictive accuracy and real-time embedded performance using dedicated datasets and protocols.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a set of standardized benchmarks by detailing the challenge setups, evaluation protocols, datasets, and benchmark tracks for maritime computer vision tasks. It presents quantitative results from participating teams, qualitative comparisons of methods, and cross-challenge analyses of emerging trends while including technical reports from top performers to share design choices and lessons. A sympathetic reader would care because these resources create a common reference point for measuring progress in a domain where computer vision supports navigation, surveillance, and safety at sea, with explicit attention to constraints of embedded hardware. The report positions the benchmarks as tools that allow direct comparison of approaches across multiple tasks.

Core claim

By organizing these five challenges, the workshop supplies datasets, leaderboards, and evaluation rules that enable quantitative assessment of computer vision algorithms under maritime conditions, requiring both high accuracy and real-time operation on embedded platforms. The report then uses the collected submissions to identify performance patterns and practical implementation insights.

What carries the argument

The five benchmark tracks, each defined by its own dataset, evaluation protocol, and combined accuracy-plus-real-time metric that together allow systematic comparison of methods across distinct maritime vision problems.
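
One way to picture a combined accuracy-plus-real-time criterion is a throughput gate followed by an accuracy ranking. The sketch below is only an illustration under stated assumptions: the summary does not say how the tracks actually combine the two quantities, and the `Submission` fields, the `min_fps` floor, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    team: str
    accuracy: float  # hypothetical accuracy score, higher is better
    fps: float       # hypothetical throughput measured on the target device


def rank_submissions(subs, min_fps):
    """Rank by accuracy, placing submissions that meet the FPS floor first.

    Assumption: real-time feasibility acts as a hard gate. The actual
    challenge rules may combine accuracy and speed differently.
    """
    feasible = [s for s in subs if s.fps >= min_fps]
    infeasible = [s for s in subs if s.fps < min_fps]
    by_accuracy = lambda s: s.accuracy
    return (sorted(feasible, key=by_accuracy, reverse=True)
            + sorted(infeasible, key=by_accuracy, reverse=True))


subs = [
    Submission("A", 0.71, 12.0),  # most accurate, but too slow
    Submission("B", 0.68, 35.0),
    Submission("C", 0.64, 60.0),
]
ranked = [s.team for s in rank_submissions(subs, min_fps=30.0)]
print(ranked)  # ['B', 'C', 'A']
```

A hard gate is the simplest design; a weighted score would instead let a very accurate but slow method trade off against a fast one.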

If this is right

  • Quantitative leaderboards allow direct ranking of methods on accuracy and speed for each task.
  • Technical reports from leading teams reveal repeatable design patterns that improve both accuracy and embedded performance.
  • Cross-challenge trend analysis identifies method components that succeed across multiple maritime scenarios.
  • Public release of datasets and leaderboards at macvi.org enables ongoing participation and extension by the community.
  • Lessons on practical trade-offs guide development of algorithms suitable for onboard vessel processing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Future work could test whether methods tuned on these benchmarks maintain their relative ranking when applied to operational maritime systems with different sensors or weather distributions.
  • The real-time emphasis may steer research toward lightweight architectures that balance accuracy with power and latency limits typical of maritime hardware.
  • Adding challenges for rarer events such as night-time navigation or heavy fog could expose gaps that current tracks leave unmeasured.
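
The first extension above, checking whether relative rankings survive a move to operational data, could be quantified with a rank correlation. The sketch below computes Kendall's tau between two rankings in plain Python; the method names and rank values are made up for illustration and do not come from the report.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings given as {method: rank} dicts.

    Assumes no tied ranks and at least two methods common to both rankings.
    """
    methods = sorted(set(rank_a) & set(rank_b))
    if len(methods) < 2:
        raise ValueError("need at least two shared methods")
    concordant = discordant = 0
    for m, n in combinations(methods, 2):
        sign = (rank_a[m] - rank_a[n]) * (rank_b[m] - rank_b[n])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(methods) * (len(methods) - 1) // 2
    return (concordant - discordant) / pairs


# Hypothetical ranks: benchmark leaderboard vs. an operational deployment.
benchmark = {"m1": 1, "m2": 2, "m3": 3, "m4": 4}
field = {"m1": 2, "m2": 1, "m3": 3, "m4": 4}
tau = kendall_tau(benchmark, field)
print(round(tau, 3))  # 0.667
```

A tau near 1 would suggest the leaderboard order carries over; values near 0 or below would suggest it does not.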

Load-bearing premise

The assumption that the five chosen benchmark tasks, datasets, and metrics sufficiently represent the diversity and difficulty of real-world maritime computer vision problems.

What would settle it

The benchmarks would be shown not to generalize if the current top-ranked methods showed substantially lower accuracy, or failed real-time constraints, on new maritime video sequences collected under conditions absent from the challenge datasets.
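
The check described above can be sketched as a small per-method report. Everything quantitative here is an assumption made for illustration: the summary does not define what counts as "substantially lower", so `max_rel_drop` and `min_fps` are hypothetical thresholds, and the scores are invented.

```python
def failure_reasons(bench_acc, new_acc, new_fps, min_fps, max_rel_drop):
    """List, per method, why it would count against generalization.

    bench_acc / new_acc: {method: accuracy} on the benchmark and on new data.
    new_fps: {method: throughput} measured on the new sequences.
    Assumes benchmark accuracies are strictly positive.
    """
    reasons = {}
    for method, base in bench_acc.items():
        found = []
        rel_drop = (base - new_acc[method]) / base
        if rel_drop > max_rel_drop:
            found.append("accuracy_drop")
        if new_fps[method] < min_fps:
            found.append("below_real_time")
        reasons[method] = found
    return reasons


# Invented numbers for two hypothetical top-ranked methods.
bench_acc = {"top1": 0.80, "top2": 0.78}
new_acc = {"top1": 0.52, "top2": 0.75}
new_fps = {"top1": 40.0, "top2": 18.0}
report = failure_reasons(bench_acc, new_acc, new_fps,
                         min_fps=30.0, max_rel_drop=0.2)
print(report)  # {'top1': ['accuracy_drop'], 'top2': ['below_real_time']}
```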

Figures

Figures reproduced from arXiv: 2604.13244 by Akib Mashrur, Alberto Quattrini Li, Andreas Michel, Arnold Wiliem, Arpita Vats, Arpit Vaishya, Arshad Jamal, Benjamin Kiefer, Bettina Felten, Borja Carrillo Perez, Chun-Ming Tsai, Dimitris Gahtidis, Dominik Hildebrand, Doyeon Lee, Ersin Kaya, Gorthi Rama Krishna Sai Subrahmanyam, Hansol Kim, Hyewon Chun, Ivan Martinović, Janez Pers, Jan Lukas Augustin, Jannick Kuester, Jannik Sheikh, Jeeyeon Jeon, Jemo Maeng, Jiahui Wang, Jon Muhovič, Jose Mateus Raitz Persch, Josip Šarić, Jun-Wei Hsieh, Justin Davis, Kyoobin Lee, Licheng Jiao, Lingling Li, Mahmut Karaaslan, Martin Weinmann, Matej Kristan, Matija Teršek, Mehmet E. Belviranli, Ming-Ching Chang, Mingi Jeong, Philipp Gorczak, Rafia Rahim, Rahul Harsha Cheppally, Sangmin Park, Sangmun Lee, Seongju Lee, Steve Xie, Tze-Hsiang Tang, Vinayak Nageli, Wolfgang Gross, Wonwoo Jo, Xu Liu, Yuan Feng, Yusi Cao.

Figure 1: Overview of MaCVi @ CVPR 2026 challenges, including Vision-to-Chart Data Association, Thermal Object Detection Maritime …
Figure 2: Qualitative detection examples from the Vision-to-Chart Data Association challenge. Each panel shows the overlay of predicted and ground-truth bounding boxes and predicted index of associated chart marker
Figure 3: Per-class and per-size AP breakdown for the Thermal Object Detection challenge. Left: AP for vessel vs. navigational object classes. Right: AP at COCO size thresholds (small / medium / large). Vessel detection is consistently easier than navigational object detection across all methods. Small object AP remains the main bottleneck.
  … ranked methods outperform the organizer baseline, with the first two methods…
Figure 4: Qualitative detection examples from the Thermal Object Detection challenge. Each row shows one test image with ground truth (green) and predictions from the top three methods and baseline. Columns left to right: ground truth, 1st place, 2nd place, 3rd place, baseline. Predictions are filtered at confidence ≥ 0.3. V = vessel, N = navigational object.
  … demonstrates that even modest amounts of unlabeled data c…
Figure 5: Precision–Recall curves for the Thermal Object Detection challenge. Curves are shown for each method at IoU=0.50.
  … line contributed a further +0.012 AP. These domain-specific adaptations (annotation refinement and geometric filtering) are complementary to the ensemble and semi-supervised strategies employed by the other top teams. 3.2.3. Discussion and Challenge Winners. The winners of the Thermal Object Det…
Figure 6: Panoptic quality stratified according to scene attributes.
Figure 7: Dynamic obstacle detection rate stratified according to …
Figure 8: Qualitative results of the top-performing methods in the LaRS panoptic segmentation challenge.
Figure 9: Detection rate and FP as a function of obstacle size for …
Figure 10: Detection F1 under challenging visual conditions for …
Figure 11: Detection F1 across scene environments and reflection …
Figure 12: Training pipeline. WS = WaterScenes, RB = ROSEBUD, …
Figure 13: Overview of GatedMemorySAM. Each modality is en…
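
The captions for Figures 3 to 5 mention COCO size thresholds, a confidence filter of 0.3, and matching at IoU = 0.50. The helpers below are a minimal sketch of those three operations, not the challenge's evaluation code. The size cut-offs follow the standard COCO convention (32² and 96² pixels); using box area rather than mask area, the `(x1, y1, x2, y2)` box layout, and the sample boxes are assumptions made for illustration.

```python
def coco_size_bucket(area):
    """Bucket an object by pixel area using the standard COCO cut-offs."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def box_iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical predictions: (box, confidence, class label).
preds = [((10, 10, 50, 50), 0.9, "V"), ((12, 12, 20, 20), 0.2, "N")]
kept = [p for p in preds if p[1] >= 0.3]   # confidence filter from Figure 4

gt_box = (10, 10, 50, 40)                  # hypothetical ground-truth box
iou = box_iou(kept[0][0], gt_box)
is_match = iou >= 0.5                      # IoU threshold from Figure 5
gt_area = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
print(len(kept), iou, is_match, coco_size_bucket(gt_area))
# 1 0.75 True medium
```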

The 4th Workshop on Maritime Computer Vision (MaCVi) is organized as part of CVPR 2026. This edition features five benchmark challenges with emphasis on both predictive accuracy and embedded real-time feasibility. This report summarizes the MaCVi 2026 challenge setup, evaluation protocols, datasets, and benchmark tracks, and presents quantitative results, qualitative comparisons, and cross-challenge analyses of emerging method trends. We also include technical reports from top-performing teams to highlight practical design choices and lessons learned across the benchmark suite. Datasets, leaderboards, and challenge resources are available at https://macvi.org/workshop/cvpr26.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 0 minor

Summary. The manuscript is an overview report of the 4th Workshop on Maritime Computer Vision (MaCVi) at CVPR 2026. It describes the setup of five benchmark challenges emphasizing both accuracy and embedded real-time performance, details the evaluation protocols and datasets, reports quantitative results and leaderboards from participant submissions, provides qualitative comparisons and cross-challenge trend analyses, and includes technical reports from top teams. All resources are linked publicly at https://macvi.org/workshop/cvpr26.

Significance. As a descriptive archival document of organized challenges, this report is significant for the maritime computer vision community because it establishes public benchmarks with explicit real-time constraints, documents quantitative outcomes tied directly to stated protocols, and surfaces practical design lessons from top submissions. The public leaderboards and datasets support reproducibility and future work; the cross-challenge analyses help identify method trends without introducing new scientific claims.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review of our manuscript and for recommending acceptance. We appreciate the recognition of the report's value as an archival document for the maritime computer vision community, including its documentation of benchmarks, protocols, results, and cross-challenge analyses.

Circularity Check

0 steps flagged

Descriptive workshop report with no derivations or predictions

full rationale

The manuscript is an archival overview of the MaCVi 2026 challenge: it documents benchmark setups, datasets, evaluation protocols, participant results, and method trends from externally run submissions. No equations, parameter fittings, predictions, or load-bearing derivations appear anywhere in the text. All quantitative results are reported from public leaderboards and team submissions rather than being recomputed or fitted within the paper. Self-citations, if present, are limited to prior workshop editions and do not justify any central claim. The document is self-contained as descriptive documentation against external benchmarks and resources.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a workshop overview report with no mathematical derivations, fitted parameters, or postulated entities; all content rests on the existence of the organized challenges and public datasets.

pith-pipeline@v0.9.0 · 5676 in / 1083 out tokens · 51578 ms · 2026-05-10T16:15:05.260851+00:00 · methodology

