arxiv: 2605.15421 · v1 · pith:YZ5ALXTVnew · submitted 2026-05-14 · 💻 cs.CV

U-SEG: Uncertainty in SEGmentation -- A systematic multi-variable exploration

Pith reviewed 2026-05-19 15:25 UTC · model grok-4.3

classification 💻 cs.CV
keywords uncertainty estimation · semantic segmentation · panoptic segmentation · model ensembles · calibration · time series · deep learning · computer vision

The pith

A broad test of uncertainty estimation in segmentation finds that harder panoptic tasks reduce performance and that results vary sharply across datasets and backbones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper runs a large-scale comparison of uncertainty methods for both semantic and panoptic segmentation. It varies datasets, network backbones, video time series, sample diversity, and ensemble versus deterministic approaches while measuring effects on several downstream tasks. The central finding is that panoptic segmentation tends to produce weaker uncertainty signals, that generalization across conditions is not reliable, and that some common techniques add value only in narrow cases. This matters because uncertainty estimates are meant to flag errors in deployed systems, so knowing which factors improve or degrade them helps decide when and how to use them in practice.
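
The idea that uncertainty should flag errors can be made concrete with a ranking score. The sketch below is an illustration only, not the paper's evaluation code. It assumes one uncertainty value and one error flag per pixel, and computes the area under the ROC curve: the probability that a randomly chosen wrong pixel receives higher uncertainty than a randomly chosen correct one. The function name `failure_detection_auroc` and the toy values are hypothetical.

```python
import numpy as np

def failure_detection_auroc(uncertainty, is_error):
    """AUROC for separating wrong predictions from correct ones by uncertainty.

    1.0 means every error is ranked above every correct prediction,
    0.5 is chance level, 0.0 is a perfectly inverted ranking.
    """
    u = np.asarray(uncertainty, dtype=float).ravel()
    e = np.asarray(is_error, dtype=bool).ravel()
    u_err, u_ok = u[e], u[~e]
    if u_err.size == 0 or u_ok.size == 0:
        raise ValueError("need at least one error and one correct prediction")
    # Compare every (error, correct) pair; ties count as half a win.
    # This is quadratic in memory, so subsample pixels for real images.
    diff = u_err[:, None] - u_ok[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Toy example: the two wrong pixels carry the highest uncertainty.
uncertainty = np.array([0.9, 0.8, 0.1, 0.2])
is_error = np.array([1, 1, 0, 0])
score = failure_detection_auroc(uncertainty, is_error)
```

A score near 0.5 would mean the uncertainty estimate carries little information about where the model is wrong, whatever its absolute values look like.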

Core claim

Through systematic experiments the authors show that the more difficult panoptic segmentation task usually produces worse uncertainty performance than semantic segmentation, while high variance between datasets and backbones indicates that generalization cannot be assumed. Time-series samples from video can improve estimates in particular configurations yet frequently do not justify the added cost. Sample diversity helps most on the calibration task but otherwise does not outperform simpler baselines. A deterministic network is sufficient for some downstream uses, but ensembles deliver clear gains once the right deployment conditions are met.

What carries the argument

A multi-variable experimental framework that jointly varies datasets, backbones, downstream tasks, time series, sample diversity, and ensemble versus deterministic uncertainty methods for semantic and panoptic segmentation.

If this is right

  • Panoptic segmentation produces lower-quality uncertainty estimates than semantic segmentation in most tested settings.
  • Incorporating prior video frames improves uncertainty only for certain model and task combinations and is often not worth the computational cost.
  • Methods that increase sample diversity mainly benefit calibration and rarely surpass simpler uncertainty baselines on other tasks.
  • Deterministic uncertainty suffices for some applications while ensembles yield measurable gains only when deployment conditions allow their advantages to appear.
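
The contrast between deterministic and ensemble uncertainty in the last point can be sketched with predictive entropy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each network outputs per-pixel softmax probabilities, uses Shannon entropy as the uncertainty score, and averages member probabilities to form the ensemble prediction. The function names and toy arrays are hypothetical.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Per-pixel Shannon entropy (in nats) of class probabilities shaped (..., C)."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.sum(probs * np.log(probs), axis=-1)

def ensemble_entropy(member_probs):
    """Entropy of the mean prediction; member_probs has shape (M, H, W, C)."""
    return predictive_entropy(member_probs.mean(axis=0))

# Toy example: one pixel, two classes, two hypothetical ensemble members
# that are each confident but disagree with one another.
member_a = np.array([[[0.99, 0.01]]])        # shape (H=1, W=1, C=2)
member_b = np.array([[[0.01, 0.99]]])
members = np.stack([member_a, member_b])     # shape (M=2, 1, 1, 2)

single = predictive_entropy(member_a)        # deterministic: low entropy
combined = ensemble_entropy(members)         # ensemble: disagreement raises entropy
```

A single network can only report its own confidence, while the ensemble also exposes disagreement between members. That extra signal costs one forward pass per member.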

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practitioners facing panoptic segmentation may need to invest more in model-specific uncertainty tuning rather than relying on off-the-shelf methods.
  • The cost-benefit findings for time series and ensembles could guide resource allocation when building real-time segmentation pipelines.
  • High observed variance suggests that uncertainty quality may need to be validated on each new dataset and backbone instead of assumed from prior benchmarks.
  • These sensitivities point toward the value of developing uncertainty techniques that remain stable when task difficulty increases.

Load-bearing premise

The chosen datasets, backbones, and downstream tasks are representative enough of real-world conditions that the observed performance patterns can be read as general guidance.

What would settle it

A follow-up study that uses a substantially different collection of datasets and models and still finds low variance and strong generalization of uncertainty quality across semantic and panoptic tasks would contradict the reported patterns.

Figures

Figures reproduced from arXiv: 2605.15421 by Frank P. Ferrie, Michael Smith.

Figure 1: A simplified overview of our framework (Sec. …
Figure 2: Results on the downstream tasks (V8) of failure detection and calibration, plotted by (a) prediction model (V3) and by (b) dataset (V1) and backbone (V2). The letter-value plot [27] is an extension of the box plot, with the median shown by the middle line, the largest box representing 50% of the data, and each subsequent smaller box representing an additional half (75%, 87.5%, etc.). Circles represent outlie…
Figure 3: Results on the out-of-distribution detection downstream …
Figure 5: Segmentation performance by dataset and backbone with …
Original abstract

In this study, we explore in depth a few under-studied topics at the intersection of uncertainty estimation and segmentation. Prior work has shown that the quality of uncertainty estimates can be very sensitive to a range of variables. As one of the main uses of uncertainty estimation is to help identify and deal with prediction errors in practical scenarios, any factors that affect this must be clearly identified. For example, do more challenging domains or different datasets and architectures result in worse performance when using uncertainty estimates? Can prior frames in a video sequence in fact provide useful uncertainty estimates comparable to other approaches? Is it possible to combine uncertainty estimation approaches, taking advantage of sample diversity, to get better estimates? Finally, when might it make sense to use an ensemble-based uncertainty estimate over a deterministic network? We address these questions by creating a framework for and executing a large scale study across many variables such as datasets, backbones, and downstream tasks, for both semantic and panoptic segmentation. We find that a) the more challenging task of panoptic segmentation usually results in worse performance while high performance variance between datasets and backbones indicates that generalization is not guaranteed, b) time series samples can be useful for specific configurations, but in many cases are not worth the cost, c) sample diversity shows the most promise in the downstream task of calibration, but otherwise fails to beat simpler alternatives, d) a deterministic approach is adequate for some downstream tasks, but ensembles allow for significant improvements if the right conditions can be achieved in deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript conducts a large-scale empirical study on uncertainty estimation for semantic and panoptic segmentation. It systematically varies datasets, backbones, uncertainty methods (including MC dropout, ensembles, and test-time augmentation), time-series inputs, and sample diversity, then evaluates impacts on downstream tasks such as calibration and error detection. The reported findings are that panoptic segmentation generally yields worse uncertainty performance than semantic segmentation with high variance across datasets and backbones, time-series samples provide benefits only in specific configurations, sample diversity is most useful for calibration but otherwise does not outperform simpler baselines, and deterministic models suffice for some tasks while ensembles can deliver gains when deployment conditions allow.
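
Calibration, one of the downstream tasks named in the summary, is often summarized with the expected calibration error (ECE). The sketch below assumes the common equal-width binning variant and is not taken from the paper, which may use a different calibration measure. It takes a per-pixel confidence (for example the maximum softmax probability) and a per-pixel correctness flag. The function name and the bin count are hypothetical choices.

```python
import numpy as np

def expected_calibration_error(confidence, correct, n_bins=10):
    """Weighted average gap between accuracy and mean confidence per bin."""
    confidence = np.asarray(confidence, dtype=float).ravel()
    correct = np.asarray(correct, dtype=float).ravel()
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so bin indices run from 0 to n_bins - 1
    # and a confidence of exactly 1.0 falls into the last bin.
    bin_ids = np.digitize(confidence, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidence[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of pixels in the bin
    return ece

# Toy example: ten pixels predicted at 0.9 confidence, only half of them correct.
overconfident = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)
```

A well-calibrated model at 0.9 confidence should be right about nine times in ten, so the gap here reflects overconfidence. ECE depends on the binning scheme, which is one reason to report the bin count alongside the score.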

Significance. If the patterns survive broader validation, the work supplies actionable empirical guidance on uncertainty method selection in segmentation pipelines, underscoring generalization risks and computational trade-offs. The multi-variable framework itself is a constructive contribution that could serve as a template for future studies; the explicit reporting of high cross-dataset and cross-backbone variance is a strength that tempers over-generalization.

major comments (1)
  1. [§3 and Table 1] §3 (Experimental Setup) and Table 1: the collection of datasets, backbones, and uncertainty estimators is presented without explicit justification or diversity analysis. Because the abstract itself highlights high performance variance across these axes, the load-bearing claims (a)–(d) cannot be treated as general guidance until the authors demonstrate that the chosen experimental slice is representative rather than an artifact of the specific selection.
minor comments (2)
  1. [§4.2] §4.2: the description of how time-series samples are incorporated into the uncertainty pipeline lacks a clear diagram or pseudocode, making it difficult to reproduce the exact configurations that reportedly yield benefits.
  2. [Figure 5] Figure 5: axis labels and legend entries for the calibration plots are too small; increasing font size would improve readability without altering the results.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address the major comment below, indicating where revisions will be made.

Point-by-point responses
  1. Referee: [§3 and Table 1] §3 (Experimental Setup) and Table 1: the collection of datasets, backbones, and uncertainty estimators is presented without explicit justification or diversity analysis. Because the abstract itself highlights high performance variance across these axes, the load-bearing claims (a)–(d) cannot be treated as general guidance until the authors demonstrate that the chosen experimental slice is representative rather than an artifact of the specific selection.

    Authors: We agree that §3 and Table 1 would benefit from explicit justification and a brief diversity analysis of the selected datasets, backbones, and uncertainty estimators. These choices were made to include standard, widely adopted benchmarks and methods from the segmentation and uncertainty literature (e.g., common urban and indoor scene datasets, representative CNN and transformer backbones, and established techniques such as MC dropout, ensembles, and test-time augmentation) in order to enable direct comparison with prior work while covering a practical range of difficulties. In the revised manuscript we will expand §3 with a dedicated paragraph that states the selection criteria, supplies supporting citations, and adds a short diversity analysis (covering domain types, input resolutions, class counts, and method categories). Regarding the load-bearing claims (a)–(d), the manuscript already frames them as empirical observations from the tested configurations rather than universal guidance; the abstract and main text explicitly highlight the high cross-dataset and cross-backbone variance and state that “generalization is not guaranteed.” We therefore do not claim the results apply outside the explored slice. The requested justification will be added, but we do not believe additional exhaustive experiments are required to support the current, tempered claims. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical multi-variable study

The paper reports results from a large-scale experimental exploration of uncertainty estimation methods across datasets, backbones, semantic vs. panoptic segmentation, time-series inputs, sample diversity, and deterministic vs. ensemble approaches. No equations, derivations, fitted parameters, or predictions are defined in terms of the reported outcomes themselves. Claims (a-d) are observational patterns extracted from the experimental slice rather than quantities forced by construction or self-citation chains. The work is self-contained as an empirical benchmark study with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical evaluation paper; no new mathematical axioms, free parameters, or invented physical entities are introduced. All methods and metrics are drawn from prior literature.

pith-pipeline@v0.9.0 · 5799 in / 1202 out tokens · 46997 ms · 2026-05-19T15:25:13.342999+00:00 · methodology

Reference graph

Works this paper leans on

87 extracted references · 87 canonical work pages

  1. [1]

    Improving Multi-Class Calibration through Normalization-Aware Isotonic Techniques

    Alon Arad and Saharon Rosset. Improving Multi-Class Calibration through Normalization-Aware Isotonic Techniques. In Forty-second International Conference on Machine Learning, 2025

  2. [2]

    Uncertainty-Aware Deep Learning for Automated Skin Cancer Classification: A Comprehensive Evaluation

    Hamzeh Asgharnezhad, Pegah Tabarisaadi, Abbas Khosravi, Roohallah Alizadehsani, and U Rajendra Acharya. Uncertainty-Aware Deep Learning for Automated Skin Cancer Classification: A Comprehensive Evaluation. arXiv preprint arXiv:2506.10302, 2025

  3. [3]

    Test-time Data Augmentation for Estimation of Heteroscedastic Aleatoric Uncertainty in Deep Neural Networks

    Murat Seckin Ayhan and Philipp Berens. Test-time Data Augmentation for Estimation of Heteroscedastic Aleatoric Uncertainty in Deep Neural Networks. In Medical Imaging with Deep Learning, 2018

  4. [4]

    FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation

    Yasser Benigmim, Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, and Raoul de Charette. FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 21471–21481, 2025

  5. [5]

    Weight Uncertainty in Neural Network

    Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight Uncertainty in Neural Network. In Proceedings of the 32nd International Conference on Machine Learning, pages 1613–1622, Lille, France, 2015. PMLR

  6. [6]

    Density-Softmax: Efficient Test-time Model for Uncertainty Estimation and Robustness under Distribution Shifts

    Ha Manh Bui and Anqi Liu. Density-Softmax: Efficient Test-time Model for Uncertainty Estimation and Robustness under Distribution Shifts. In Proceedings of the 41st International Conference on Machine Learning, pages 4822–4853. PMLR, 2024

  7. [7]

    The Many Faces of Reliability: Uncertainy Estimation and Ensemble Approaches, 2023

    Andrei Bursuc. The Many Faces of Reliability: Uncertainy Estimation and Ensemble Approaches, 2023

  8. [8]

    Natural Posterior Network: Deep Bayesian Predictive Uncertainty for Exponential Family Distributions

    Bertrand Charpentier, Oliver Borchert, Daniel Zügner, Simon Geisler, and Stephan Günnemann. Natural Posterior Network: Deep Bayesian Predictive Uncertainty for Exponential Family Distributions. In International Conference on Learning Representations, 2022

  9. [9]

    Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation

    Siyu Chen, Ting Han, Changshe Zhang, Xin Luo, Meiliu Wu, Guorong Cai, and Jinhe Su. Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 8285–8295, 2025

  10. [10]

    Ting Chen, Lala Li, Saurabh Saxena, Geoffrey Hinton, and David J. Fleet. A Generalist Framework for Panoptic Segmentation of Images and Videos. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 909–919, 2023

  11. [11]

    Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

    Bowen Cheng, Maxwell D. Collins, Yukun Zhu, Ting Liu, Thomas S. Huang, Hartwig Adam, and Liang-Chieh Chen. Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020

  12. [13]

    Towards Calibrated Multi-label Deep Neural Networks

    Jiacheng Cheng and Nuno Vasconcelos. Towards Calibrated Multi-label Deep Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 27589–27599, 2024

  13. [16]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations, 2021

  14. [17]

    Mark Everingham, S. M. Ali Eslami, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. The Pascal Visual Object Classes Challenge: A Retrospective. International Journal of Computer Vision, 111(1):98–136, 2015

  15. [18]

    Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

    Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of The 33rd International Conference on Machine Learning, pages 1050–1059, New York, New York, USA, 2016. PMLR

  16. [19]

    Active Vision in the Era of Convolutional Neural Networks

    Dimitrios Gallos and Frank Ferrie. Active Vision in the Era of Convolutional Neural Networks. In 2019 16th Conference on Computer and Robot Vision (CRV), pages 81–88, 2019

  17. [20]

    CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation

    Jiannan Ge, Lingxi Xie, Hongtao Xie, Pandeng Li, Sun-Ao Liu, Xiaopeng Zhang, Qi Tian, and Yongdong Zhang. CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 24034–24044, 2025

  18. [22]

    Visual attention network

    Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, and Shi-Min Hu. Visual attention network. Computational Visual Media, 9(4):733–752, 2023

  19. [23]

    Deep Residual Learning for Image Recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

  20. [24]

    Towards Corner Case Detection by Modeling the Uncertainty of Instance Segmentation Networks

    Florian Heidecker, Abdul Hannan, Maarten Bieshaar, and Bernhard Sick. Towards Corner Case Detection by Modeling the Uncertainty of Instance Segmentation Networks. In Pattern Recognition. ICPR International Workshops and Challenges, pages 361–374, Cham, 2021. Springer International Publishing

  21. [25]

    Sampling-based Uncertainty Estimation for an Instance Segmentation Network

    Florian Heidecker, Ahmad El-Khateeb, and Bernhard Sick. Sampling-based Uncertainty Estimation for an Instance Segmentation Network. arXiv preprint arXiv:2305.14977, 2023

  22. [26]

    Deep Reinforcement Learning That Matters

    Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. Deep Reinforcement Learning That Matters. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1), 2018. Section: AAAI Technical Track: Machine Learning

  23. [27]

    Letter-Value Plots: Boxplots for Large Data

    Heike Hofmann, Hadley Wickham, and Karen Kafadar. Letter-Value Plots: Boxplots for Large Data. Journal of Computational and Graphical Statistics, 26(3):469–477, 2017. https://doi.org/10.1080/10618600.2017.1305277

  24. [32]

    Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles

    Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2017

  25. [33]

    Uncertainty-aware Cross-Entropy for Semantic Segmentation

    S. Landgraf, M. Hillemann, K. Wursthorn, and M. Ulrich. Uncertainty-aware Cross-Entropy for Semantic Segmentation. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, X-2-2024:129–136, 2024

  26. [34]

    Packed Ensembles for efficient uncertainty estimation

    Olivier Laurent, Adrien Lafage, Enzo Tartaglione, Geoffrey Daniel, Jean-marc Martinez, Andrei Bursuc, and Gianni Franchi. Packed Ensembles for efficient uncertainty estimation. In The Eleventh International Conference on Learning Representations, 2023

  27. [35]

    Region-based evidential deep learning to quantify uncertainty and improve robustness of brain tumor segmentation

    Hao Li, Yang Nan, Javier Del Ser, and Guang Yang. Region-based evidential deep learning to quantify uncertainty and improve robustness of brain tumor segmentation. Neural Computing and Applications, 35(30):22071–22085, 2023

  28. [36]

    Vicinal Label Supervision for Reliable Aleatoric and Epistemic Uncertainty Estimation

    Linye Li, Yufei Chen, and Xiaodong Yue. Vicinal Label Supervision for Reliable Aleatoric and Epistemic Uncertainty Estimation. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  29. [38]

    Improving Accuracy and Calibration via Differentiated Deep Mutual Learning

    Han Liu, Peng Cui, Bingning Wang, Weipeng Chen, Yupeng Zhang, Jun Zhu, and Xiaolin Hu. Improving Accuracy and Calibration via Differentiated Deep Mutual Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 25812–25821, 2025

  30. [39]

    Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey

    Xiaoou Liu, Tiejin Chen, Longchao Da, Chacha Chen, Zhen Lin, and Hua Wei. Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, pages 6107–6117, New York, NY, USA, 2025. Association for Computing Machinery. event-place: Tor...

  31. [40]

    Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows

    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10012–10022, 2021

  32. [41]

    False Negative Reduction in Video Instance Segmentation using Uncertainty Estimates

    Kira Maag. False Negative Reduction in Video Instance Segmentation using Uncertainty Estimates. In 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pages 1279–1286, 2021

  33. [42]

    Improving Video Instance Segmentation by Light-weight Temporal Uncertainty Estimates

    Kira Maag, Matthias Rottmann, Serin Varghese, Fabian Hüger, Peter Schlicht, and Hanno Gottschalk. Improving Video Instance Segmentation by Light-weight Temporal Uncertainty Estimates. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2021. ISSN: 2161-4407

  34. [43]

    Metrics reloaded: Recommendations for image analysis validation

    Lena Maier-Hein, Annika Reinke, Patrick Godau, Minu D Tizabi, Florian Buettner, Evangelia Christodoulou, Ben Glocker, Fabian Isensee, Jens Kleesiek, Michal Kozubek, and others. Metrics reloaded: Recommendations for image analysis validation. arXiv preprint arXiv:2206.01653, 2022

  35. [44]

    Lena Maier-Hein, Annika Reinke, Patrick Godau, Minu D. Tizabi, Florian Buettner, Evangelia Christodoulou, Ben Glocker, Fabian Isensee, Jens Kleesiek, Michal Kozubek, Mauricio Reyes, Michael A. Riegler, Manuel Wiesenfarth, A. Emre Kavur, Carole H. Sudre, Michael Baumgartner, Matthias Eisenmann, Doreen Heckmann-Nötzel, Tim Rädsch, Laura Acion, Michela A...

  36. [45]

    Uncertainty estimation in deep learning with application to spoken language assessment

    Andrey Malinin. Uncertainty estimation in deep learning with application to spoken language assessment. PhD Thesis, University of Cambridge, 2019

  37. [46]

    Evaluating Merging Strategies for Sampling-based Uncertainty Techniques in Object Detection

    Dimity Miller, Feras Dayoub, Michael Milford, and Niko Sünderhauf. Evaluating Merging Strategies for Sampling-based Uncertainty Techniques in Object Detection. In 2019 International Conference on Robotics and Automation (ICRA), pages 2348–2354, 2019

  38. [48]

    Panoptic Out-of-Distribution Segmentation

    Rohit Mohan, Kiran Kumaraswamy, Juana Valeria Hurtado, Kürsat Petek, and Abhinav Valada. Panoptic Out-of-Distribution Segmentation. IEEE Robotics and Automation Letters, 9(5):4075–4082, 2024

  39. [50]

    Deep Deterministic Uncertainty: A New Simple Baseline

    Jishnu Mukhoti, Andreas Kirsch, Joost van Amersfoort, Philip H.S. Torr, and Yarin Gal. Deep Deterministic Uncertainty: A New Simple Baseline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24384–24394, 2023

  40. [51]

    Epistemic Uncertainty for Generated Image Detection

    Jun Nie, Yonggang Zhang, Tongliang Liu, Yiu-ming Cheung, Bo Han, and Xinmei Tian. Epistemic Uncertainty for Generated Image Detection. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  41. [53]

    Epistemic Neural Networks

    Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, and Benjamin Van Roy. Epistemic Neural Networks. In Advances in Neural Information Processing Systems, pages 2795–2823. Curran Associates, Inc., 2023

  42. [54]

    Obtaining Well Calibrated Probabilities Using Bayesian Binning

    Mahdi Pakdaman Naeini, Gregory Cooper, and Milos Hauskrecht. Obtaining Well Calibrated Probabilities Using Bayesian Binning. Proceedings of the AAAI Conference on Artificial Intelligence, 29(1), 2015. Section: Main Track: Novel Machine Learning Algorithms

  43. [55]

    Exploring Weather-aware Aggregation and Adaptation for Semantic Segmentation under Adverse Conditions

    Yuwen Pan, Rui Sun, Wangkai Li, and Tianzhu Zhang. Exploring Weather-aware Aggregation and Adaptation for Semantic Segmentation under Adverse Conditions. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 13952–13962, 2025

  44. [56]

    Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, José Miguel Hernández-Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A Osbor...

  45. [57]

    Model ratatouille: recycling diverse models for out-of-distribution generalization

    Alexandre Ramé, Kartik Ahuja, Jianyu Zhang, Matthieu Cord, Léon Bottou, and David Lopez-Paz. Model ratatouille: recycling diverse models for out-of-distribution generalization. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA, 2023. JMLR.org

  46. [59]

    Where are we with calibration under dataset shift in image classification?

    Mélanie Roschewitz, Raghav Mehta, Fabio De Sousa Ribeiro, and Ben Glocker. Where are we with calibration under dataset shift in image classification? Transactions on Machine Learning Research, 2025

  47. [60]

    EDAPS: Enhanced Domain-Adaptive Panoptic Segmentation

    Suman Saha, Lukas Hoyer, Anton Obukhov, Dengxin Dai, and Luc Van Gool. EDAPS: Enhanced Domain-Adaptive Panoptic Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 19234–19245, 2023

  48. [61]

    Prior2Former - Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation

    Sebastian Schmidt, Julius Koerner, Dominik Fuchsgruber, Stefano Gasperini, Federico Tombari, and Stephan Günnemann. Prior2Former - Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 23646–23656, 2025

  49. [62]

    Are you sure? Analysing Uncertainty Quantification Approaches for Real-world Speech Emotion Recognition

    Oliver Schrüfer, Manuel Milling, Felix Burkhardt, Florian Eyben, and Björn Schuller. Are you sure? Analysing Uncertainty Quantification Approaches for Real-world Speech Emotion Recognition. In Interspeech 2024, pages 3210–3214,

  50. [63]

    Evidential Deep Learning to Quantify Classification Uncertainty

    Murat Sensoy, Lance Kaplan, and Melih Kandemir. Evidential Deep Learning to Quantify Classification Uncertainty. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2018

  51. [64]

    C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379–423, 1948

  52. [65]

    Fully Convolutional Networks for Semantic Segmentation

    Evan Shelhamer, Jonathan Long, and Trevor Darrell. Fully Convolutional Networks for Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4):640–651, 2017

  53. [67]

    Rethinking Aleatoric and Epistemic Uncertainty

    Freddie Bickford Smith, Jannik Kossen, Eleanor Trollope, Mark van der Wilk, Adam Foster, and Tom Rainforth. Rethinking Aleatoric and Epistemic Uncertainty. In Forty-second International Conference on Machine Learning, 2025

  54. [70]

    Semantic segmentation using Vision Transformers: A survey

    Hans Thisanke, Chamli Deshan, Kavindu Chamith, Sachith Seneviratne, Rajith Vidanaarachchi, and Damayanthi Herath. Semantic segmentation using Vision Transformers: A survey. Engineering Applications of Artificial Intelligence, 126:106669, 2023

  55. [71]

    Uncertainty Aware Training to Improve Uncertainty Active Learning for Semantic Segmentation

    Moritz Thoma, Tobias Preintner, Emad Aghajanzadeh, Shambhavi Balamuthu Sampath, Pierpaolo Mori, Nael Fasfous, Manoj-Rohit Vemparala, Alexander Frickenstein, Daniel Mueller-Gritschneder, and Ulf Schlichtmann. Uncertainty Aware Training to Improve Uncertainty Active Learning for Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer V...

  56. [72]

    Are you sure? Measuring models bias in content moderation through uncertainty

    Alessandra Urbinati, Mirko Lai, Simona Frenda, and Marco Stranisci. Are you sure? Measuring models bias in content moderation through uncertainty. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 18061–18076, Suzhou, China, 2025. Association for Computational Linguistics

  57. [73]

    A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation

    MMA Valiuddin, RJG van Sloun, CGA Viviers, PHN de With, and F van der Sommen. A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation. arXiv preprint arXiv:2411.16370, 2024

  58. [75]

    Stronger Fewer & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

    Zhixiang Wei, Lin Chen, Yi Jin, Xiaoxiao Ma, Tianle Liu, Pengyang Ling, Ben Wang, Huaian Chen, and Jinjin Zheng. Stronger Fewer & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 28619–28630, 2024

  59. [76]

    Combining Ensembles and Data Augmentation Can Harm Your Calibration

    Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W. Dusenberry, Jasper Snoek, Balaji Lakshminarayanan, and Dustin Tran. Combining Ensembles and Data Augmentation Can Harm Your Calibration. In International Conference on Learning Representations, 2021

  60. [77]

    Evaluating Approximate Inference in Bayesian Deep Learning

    Andrew Gordon Wilson, Pavel Izmailov, Matthew D. Hoffman, Yarin Gal, Yingzhen Li, Melanie F. Pradier, Sharad Vikram, Andrew Foong, Sanae Lotfi, and Sebastian Farquhar. Evaluating Approximate Inference in Bayesian Deep Learning. In Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, pages 113–124. PMLR, 2022

  61. [78]

    Dual-Temporal Exemplar Representation Network for Video Semantic Segmentation

    Xiaolong Xu, Lei Zhang, Jiayi Li, Lituan Wang, Yifan Guan, Yu Yan, Leyi Zhang, and Hao Song. Dual-Temporal Exemplar Representation Network for Video Semantic Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10775–10785, 2025

  62. [79]

    mixup: Beyond Empirical Risk Minimization

    Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. mixup: Beyond Empirical Risk Minimization. In International Conference on Learning Representations, 2018

  63. [80]

    Ensemble Methods: Foundations and Algorithms

    Zhi-Hua Zhou. Ensemble Methods: Foundations and Algorithms. Chapman and Hall/CRC, New York, 2 edition, 2025.

    U-SEG: Uncertainty in SEGmentation - A systematic multi-variable exploration. Supplementary Material. A1. Implementation Details. A1.1. Datasets. As discussed in Sec. 4, we evaluate over the VIPER [16] and Cityscapes [2] datasets, both of which are dr...

  64. [81]

    in exchange for better uncertainty estimates is acceptable, at least for downstream tasks such as failure detection. A7.2. Out-of-distribution performance. In Fig. A2, we show the OOD performance without normalization on the Cityscapes dataset and with a distribution shift where we use the VIPER model instead. We see that un-normalized results are broadl...

  65. [82]

    Results are averaged over 3 runs; error bars in red show the 95% confidence interval, generated via bootstrapping. [Figure: plot of per-image inference time (s). Labels recoverable from the plot: Prediction Model, Averaging Inference Time, Baseline, scale, horizontalFlip, horizontalFlip+scale, Time 0 to Time 2, MC 0, MC 3, Total # of Sa...]

  66. [83]

    Results are averaged over 3 runs; error bars in red show the 95% confidence interval, generated via bootstrapping

    to sample aggregation (V4), executed on an Nvidia RTX 4090. Results are averaged over 3 runs; error bars in red show the 95% confidence interval, generated via bootstrapping.

    Supplementary Material References

  67. [84]

    Masked-attention Mask Transformer for Universal Image Segmentation

    Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, and Rohit Girdhar. Masked-attention Mask Transformer for Universal Image Segmentation. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1280–1289, 2022

  68. [85]

    The Cityscapes Dataset for Semantic Urban Scene Understanding

    Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

  69. [86]

    ProPanDL: A Modular Architecture for Uncertainty-Aware Panoptic Segmentation

    Jacob Deery, Chang Won Lee, and Steven L. Waslander. ProPanDL: A Modular Architecture for Uncertainty-Aware Panoptic Segmentation. In 2023 20th Conference on Robots and Vision (CRV), pages 137–144, 2023

  70. [87]

    Simple Copy-Paste Is a Strong Data Augmentation Method for Instance Segmentation

    Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le, and Barret Zoph. Simple Copy-Paste Is a Strong Data Augmentation Method for Instance Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2918–2928, 2021

  71. [88]

    On Calibration of Modern Neural Networks

    Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. On Calibration of Modern Neural Networks. In Proceedings of the 34th International Conference on Machine Learning, pages 1321–1330. PMLR, 2017

  72. [89]

    Efficient Uncertainty Estimation for Semantic Segmentation in Videos

    Po-Yu Huang, Wan-Ting Hsu, Chun-Yueh Chiu, Ting-Fan Wu, and Min Sun. Efficient Uncertainty Estimation for Semantic Segmentation in Videos. In Proceedings of the European Conference on Computer Vision (ECCV), 2018

  73. [90]

    A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification

    Paul F. Jaeger, Carsten Tim Lüth, Lukas Klein, and Till J. Bungert. A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification. In The Eleventh International Conference on Learning Representations, 2023

  74. [91]

    ValUES: A Framework for Systematic Validation of Uncertainty Estimation in Semantic Segmentation

    Kim-Celine Kahl, Carsten T. Lüth, Maximilian Zenk, Klaus Maier-Hein, and Paul F. Jaeger. ValUES: A Framework for Systematic Validation of Uncertainty Estimation in Semantic Segmentation. In The Twelfth International Conference on Learning Representations, 2024

  75. [92]

    Panoptic Segmentation

    Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019

  76. [93]

    Microsoft COCO: Common Objects in Context

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common Objects in Context. In Computer Vision – ECCV 2014, pages 740–755, Cham, 2014. Springer International Publishing

  77. [94]

    Revisiting the Calibration of Modern Neural Networks

    Matthias Minderer, Josip Djolonga, Rob Romijnders, Frances Ann Hubis, Xiaohua Zhai, Neil Houlsby, Dustin Tran, and Mario Lucic. Revisiting the Calibration of Modern Neural Networks. In Advances in Neural Information Processing Systems, 2021

  78. [95]

    Estimating uncertainty in instance segmentation using dropout sampling

    Doug Morrison, Anton Milan, and Nontas Antonakos. Estimating uncertainty in instance segmentation using dropout sampling. In CVPR 2019 Robotic Vision Probabilistic Object Detection Challenge, 2019

  79. [96]

    Deep deterministic uncertainty: A simple baseline

    Jishnu Mukhoti, Andreas Kirsch, Joost van Amersfoort, Philip HS Torr, and Yarin Gal. Deep deterministic uncertainty: A simple baseline. arXiv preprint arXiv:2102.11582, 2021

  80. [97]

    Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick...

Showing first 80 references.