U-SEG: Uncertainty in SEGmentation -- A systematic multi-variable exploration
Pith reviewed 2026-05-19 15:25 UTC · model grok-4.3
The pith
A broad test of uncertainty estimation in segmentation finds that harder panoptic tasks reduce performance and that results vary sharply across datasets and backbones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through systematic experiments the authors show that the more difficult panoptic segmentation task usually produces worse uncertainty performance than semantic segmentation, while high variance between datasets and backbones indicates that generalization cannot be assumed. Time-series samples from video can improve estimates in particular configurations yet frequently do not justify the added cost. Sample diversity helps most on the calibration task but otherwise does not outperform simpler baselines. A deterministic network is sufficient for some downstream uses, but ensembles deliver clear gains once the right deployment conditions are met.
What carries the argument
A multi-variable experimental framework that jointly varies datasets, backbones, downstream tasks, time series, sample diversity, and ensemble versus deterministic uncertainty methods for semantic and panoptic segmentation.
If this is right
- Panoptic segmentation produces lower-quality uncertainty estimates than semantic segmentation in most tested settings.
- Incorporating prior video frames improves uncertainty only for certain model and task combinations and is often not worth the computational cost.
- Methods that increase sample diversity mainly benefit calibration and rarely surpass simpler uncertainty baselines on other tasks.
- Deterministic uncertainty suffices for some applications while ensembles yield measurable gains only when deployment conditions allow their advantages to appear.
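One way to see why an ensemble can expose information that a single deterministic network cannot is the standard entropy decomposition sketched below. This is a minimal illustration, not the paper's implementation: the per-pixel softmax vectors are made-up values, and the function names are hypothetical. The "aleatoric" and "epistemic" labels follow a common reading of this decomposition and are an assumption here.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ensemble_uncertainty(member_probs):
    """Return (total, aleatoric, epistemic) uncertainty for one pixel.

    member_probs holds one softmax vector per ensemble member.
    total     = entropy of the averaged prediction
    aleatoric = average entropy of the individual members
    epistemic = total - aleatoric (the mutual information term)
    """
    n = len(member_probs)
    mean_probs = [sum(col) / n for col in zip(*member_probs)]
    total = entropy(mean_probs)
    aleatoric = sum(entropy(p) for p in member_probs) / n
    return total, aleatoric, total - aleatoric


# Hypothetical softmax outputs for ONE pixel over 3 classes.
agreeing_members = [[0.8, 0.1, 0.1]] * 3
disagreeing_members = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

agree_total, agree_alea, agree_epi = ensemble_uncertainty(agreeing_members)
dis_total, dis_alea, dis_epi = ensemble_uncertainty(disagreeing_members)
```

When the members agree, the epistemic term is essentially zero and the ensemble offers nothing beyond a single model's entropy. The term grows only when members disagree, which is one plausible reading of why ensemble gains depend on deployment conditions.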
Where Pith is reading between the lines
- Practitioners facing panoptic segmentation may need to invest more in model-specific uncertainty tuning rather than relying on off-the-shelf methods.
- The cost-benefit findings for time series and ensembles could guide resource allocation when building real-time segmentation pipelines.
- High observed variance suggests that uncertainty quality may need to be validated on each new dataset and backbone instead of assumed from prior benchmarks.
- These sensitivities point toward the value of developing uncertainty techniques that remain stable when task difficulty increases.
Load-bearing premise
The chosen datasets, backbones, and downstream tasks are representative enough of real-world conditions that the observed performance patterns can be read as general guidance.
What would settle it
A follow-up study that uses a substantially different collection of datasets and models and still finds low variance and strong generalization of uncertainty quality across semantic and panoptic tasks would contradict the reported patterns.
Original abstract
In this study, we explore in depth a few under-studied topics at the intersection of uncertainty estimation and segmentation. Prior work has shown that the quality of uncertainty estimates can be very sensitive to a range of variables. As one of the main uses of uncertainty estimation is to help identify and deal with prediction errors in practical scenarios, any factors that affect this must be clearly identified. For example, do more challenging domains or different datasets and architectures result in worse performance when using uncertainty estimates? Can prior frames in a video sequence in fact provide useful uncertainty estimates comparable to other approaches? Is it possible to combine uncertainty estimation approaches, taking advantage of sample diversity, to get better estimates? Finally, when might it make sense to use an ensemble-based uncertainty estimate over a deterministic network? We address these questions by creating a framework for and executing a large scale study across many variables such as datasets, backbones, and downstream tasks, for both semantic and panoptic segmentation. We find that a) the more challenging task of panoptic segmentation usually results in worse performance while high performance variance between datasets and backbones indicates that generalization is not guaranteed, b) time series samples can be useful for specific configurations, but in many cases are not worth the cost, c) sample diversity shows the most promise in the downstream task of calibration, but otherwise fails to beat simpler alternatives, d) a deterministic approach is adequate for some downstream tasks, but ensembles allow for significant improvements if the right conditions can be achieved in deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript conducts a large-scale empirical study on uncertainty estimation for semantic and panoptic segmentation. It systematically varies datasets, backbones, uncertainty methods (including MC dropout, ensembles, and test-time augmentation), time-series inputs, and sample diversity, then evaluates impacts on downstream tasks such as calibration and error detection. The reported findings are that panoptic segmentation generally yields worse uncertainty performance than semantic segmentation with high variance across datasets and backbones, time-series samples provide benefits only in specific configurations, sample diversity is most useful for calibration but otherwise does not outperform simpler baselines, and deterministic models suffice for some tasks while ensembles can deliver gains when deployment conditions allow.
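For readers unfamiliar with the calibration task mentioned in the summary, the following is a minimal sketch of Expected Calibration Error (ECE) with equal-width confidence bins. It is not taken from the manuscript: the function name, the bin count, and the toy confidences are assumptions for illustration, and the paper may use a different calibration metric or binning scheme.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins over [0, 1].

    confidences: predicted probability of the chosen class, one per pixel.
    correct:     True where the prediction matched the label.
    Each bin covers [k / n_bins, (k + 1) / n_bins); a confidence of exactly
    1.0 is placed in the last bin.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Hypothetical per-pixel confidences and correctness flags.
toy_conf = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
toy_hit = [True, True, True, False, True, False]
toy_ece = expected_calibration_error(toy_conf, toy_hit)
```

In the toy data the high-confidence bin is 75% accurate at 0.95 confidence and the lower bin is 50% accurate at 0.55 confidence, so the weighted gap is about 0.15. A lower value indicates better calibration.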
Significance. If the patterns survive broader validation, the work supplies actionable empirical guidance on uncertainty method selection in segmentation pipelines, underscoring generalization risks and computational trade-offs. The multi-variable framework itself is a constructive contribution that could serve as a template for future studies; the explicit reporting of high cross-dataset and cross-backbone variance is a strength that tempers over-generalization.
major comments (1)
- [§3 and Table 1] §3 (Experimental Setup) and Table 1: the collection of datasets, backbones, and uncertainty estimators is presented without explicit justification or diversity analysis. Because the abstract itself highlights high performance variance across these axes, the load-bearing claims (a)–(d) cannot be treated as general guidance until the authors demonstrate that the chosen experimental slice is representative rather than an artifact of the specific selection.
minor comments (2)
- [§4.2] §4.2: the description of how time-series samples are incorporated into the uncertainty pipeline lacks a clear diagram or pseudocode, making it difficult to reproduce the exact configurations that reportedly yield benefits.
- [Figure 5] Figure 5: axis labels and legend entries for the calibration plots are too small; increasing font size would improve readability without altering the results.
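To make the §4.2 comment concrete, the sketch below shows the kind of pseudocode that would aid reproduction. It is purely hypothetical and is not the authors' pipeline: it assumes predictions from prior frames have already been aligned to the current frame (alignment is not shown), treats each frame's softmax output as one sample, and aggregates samples by simple averaging. All names and values are illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def average_frames(frame_probs):
    """Average per-class probabilities for one pixel across frames."""
    n = len(frame_probs)
    return [sum(col) / n for col in zip(*frame_probs)]


# Hypothetical softmax outputs for the same (already aligned) pixel
# at frames t-2, t-1 and t, over 3 classes.
frames = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
]

fused = average_frames(frames)
fused_uncertainty = entropy(fused)
current_uncertainty = entropy(frames[-1])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this toy case the current frame disagrees with the two earlier frames, so the fused distribution has higher entropy than the current frame alone. The extra forward passes or stored predictions this requires are the sort of cost the time-series finding weighs against the benefit.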
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address the major comment below, indicating where revisions will be made.
Point-by-point responses
Referee: [§3 and Table 1] §3 (Experimental Setup) and Table 1: the collection of datasets, backbones, and uncertainty estimators is presented without explicit justification or diversity analysis. Because the abstract itself highlights high performance variance across these axes, the load-bearing claims (a)–(d) cannot be treated as general guidance until the authors demonstrate that the chosen experimental slice is representative rather than an artifact of the specific selection.
Authors: We agree that §3 and Table 1 would benefit from explicit justification and a brief diversity analysis of the selected datasets, backbones, and uncertainty estimators. These choices were made to include standard, widely adopted benchmarks and methods from the segmentation and uncertainty literature (e.g., common urban and indoor scene datasets, representative CNN and transformer backbones, and established techniques such as MC dropout, ensembles, and test-time augmentation) in order to enable direct comparison with prior work while covering a practical range of difficulties. In the revised manuscript we will expand §3 with a dedicated paragraph that states the selection criteria, supplies supporting citations, and adds a short diversity analysis (covering domain types, input resolutions, class counts, and method categories). Regarding the load-bearing claims (a)–(d), the manuscript already frames them as empirical observations from the tested configurations rather than universal guidance; the abstract and main text explicitly highlight the high cross-dataset and cross-backbone variance and state that “generalization is not guaranteed.” We therefore do not claim the results apply outside the explored slice. The requested justification will be added, but we do not believe additional exhaustive experiments are required to support the current, tempered claims.
Revision: yes
Circularity Check
No circularity: purely empirical multi-variable study
The paper reports results from a large-scale experimental exploration of uncertainty estimation methods across datasets, backbones, semantic vs. panoptic segmentation, time-series inputs, sample diversity, and deterministic vs. ensemble approaches. No equations, derivations, fitted parameters, or predictions are defined in terms of the reported outcomes themselves. Claims (a)–(d) are observational patterns extracted from the experimental slice rather than quantities forced by construction or by self-citation chains. The work is self-contained as an empirical benchmark study with no load-bearing self-referential steps.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "We address these questions by creating a framework for and executing a large scale study across many variables such as datasets, backbones, and downstream tasks, for both semantic and panoptic segmentation."
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "We find that a) the more challenging task of panoptic segmentation usually results in worse performance while high performance variance between datasets and backbones indicates that generalization is not guaranteed..."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Improving Multi-Class Cali- bration through Normalization-Aware Isotonic Techniques
Alon Arad and Saharon Rosset. Improving Multi-Class Cali- bration through Normalization-Aware Isotonic Techniques. In Forty-second International Conference on Machine Learning, 2025
work page 2025
-
[2]
Hamzeh Asgharnezhad, Pegah Tabarisaadi, Abbas Khos- ravi, Roohallah Alizadehsani, and U Rajendra Acharya. Uncertainty-Aware Deep Learning for Automated Skin Can- cer Classification: A Comprehensive Evaluation.arXiv preprint arXiv:2506.10302, 2025
-
[3]
Murat Seckin Ayhan and Philipp Berens. Test-time Data Augmentation for Estimation of Heteroscedastic Aleatoric Uncertainty in Deep Neural Networks. InMedical Imaging with Deep Learning, 2018
work page 2018
-
[4]
FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation
Yasser Benigmim, Mohammad Fahes, Tuan-Hung Vu, An- drei Bursuc, and Raoul de Charette. FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 21471–21481, 2025
work page 2025
-
[5]
Weight Uncertainty in Neural Network
Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight Uncertainty in Neural Network. In Proceedings of the 32nd International Conference on Machine Learning, pages 1613–1622, Lille, France, 2015. PMLR
work page 2015
-
[6]
Ha Manh Bui and Anqi Liu. Density-Softmax: Efficient Test- time Model for Uncertainty Estimation and Robustness under Distribution Shifts. InProceedings of the 41st International Conference on Machine Learning, pages 4822–4853. PMLR, 2024
work page 2024
-
[7]
The Many Faces of Reliability: Uncertainy Estimation and Ensemble Approaches, 2023
Andrei Bursuc. The Many Faces of Reliability: Uncertainy Estimation and Ensemble Approaches, 2023
work page 2023
-
[8]
Bertrand Charpentier, Oliver Borchert, Daniel Z¨ugner, Simon Geisler, and Stephan G¨unnemann. Natural Posterior Network: Deep Bayesian Predictive Uncertainty for Exponential Fam- ily Distributions. InInternational Conference on Learning Representations, 2022
work page 2022
-
[9]
Siyu Chen, Ting Han, Changshe Zhang, Xin Luo, Meiliu Wu, Guorong Cai, and Jinhe Su. Stronger, Steadier & Su- perior: Geometric Consistency in Depth VFM Forges Do- main Generalized Semantic Segmentation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 8285–8295, 2025
work page 2025
-
[10]
Ting Chen, Lala Li, Saurabh Saxena, Geoffrey Hinton, and David J. Fleet. A Generalist Framework for Panoptic Segmen- tation of Images and Videos. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 909–919, 2023
work page 2023
-
[11]
Collins, Yukun Zhu, Ting Liu, Thomas S
Bowen Cheng, Maxwell D. Collins, Yukun Zhu, Ting Liu, Thomas S. Huang, Hartwig Adam, and Liang-Chieh Chen. Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
work page 2020
-
[13]
Towards Cali- brated Multi-label Deep Neural Networks
Jiacheng Cheng and Nuno Vasconcelos. Towards Cali- brated Multi-label Deep Neural Networks. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 27589–27599, 2024
work page 2024
-
[16]
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, Jakob Uszkoreit, and Neil Houlsby. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. InInternational Conference on Learning Representa- tions, 2021
work page 2021
-
[17]
Mark Everingham, S. M. Ali Eslami, Luc Van Gool, Christo- pher K. I. Williams, John Winn, and Andrew Zisserman. The Pascal Visual Object Classes Challenge: A Retrospective. International Journal of Computer Vision, 111(1):98–136, 2015
work page 2015
-
[18]
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. InProceedings of The 33rd International Confer- ence on Machine Learning, pages 1050–1059, New York, New York, USA, 2016. PMLR
work page 2016
-
[19]
Active Vision in the Era of Convolutional Neural Networks
Dimitrios Gallos and Frank Ferrie. Active Vision in the Era of Convolutional Neural Networks. In2019 16th Conference on Computer and Robot Vision (CRV), pages 81–88, 2019
work page 2019
-
[20]
CLIP-Adapted Region-to-Text Learning for Generative Open- V ocabulary Semantic Segmentation
Jiannan Ge, Lingxi Xie, Hongtao Xie, Pandeng Li, Sun- Ao Liu, Xiaopeng Zhang, Qi Tian, and Yongdong Zhang. CLIP-Adapted Region-to-Text Learning for Generative Open- V ocabulary Semantic Segmentation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 24034–24044, 2025
work page 2025
-
[22]
Visual attention network.Computa- tional Visual Media, 9(4):733–752, 2023
Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, and Shi-Min Hu. Visual attention network.Computa- tional Visual Media, 9(4):733–752, 2023
work page 2023
-
[23]
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. InProceed- ings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
work page 2016
-
[24]
Towards Corner Case Detection by Model- ing the Uncertainty of Instance Segmentation Networks
Florian Heidecker, Abdul Hannan, Maarten Bieshaar, and Bernhard Sick. Towards Corner Case Detection by Model- ing the Uncertainty of Instance Segmentation Networks. In Pattern Recognition. ICPR International Workshops and Chal- lenges, pages 361–374, Cham, 2021. Springer International Publishing
work page 2021
-
[25]
Florian Heidecker, Ahmad El-Khateeb, and Bernhard Sick. Sampling-based Uncertainty Estimation for an Instance Seg- mentation Network.arXiv preprint arXiv:2305.14977, 2023
-
[26]
Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. Deep Reinforcement Learning That Matters.Proceedings of the AAAI Conference on Artificial Intelligence, 32(1), 2018. Section: AAAI Tech- nical Track: Machine Learning
work page 2018
-
[27]
Heike Hofmann, Hadley Wickham, and Karen Kafadar. Letter-Value Plots: Boxplots for Large Data.Jour- nal of Computational and Graphical Statistics, 26(3): 469–477, 2017. Publisher: ASA Website eprint: https://doi.org/10.1080/10618600.2017.1305277
-
[32]
Simple and Scalable Predictive Uncertainty Estima- tion using Deep Ensembles
Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and Scalable Predictive Uncertainty Estima- tion using Deep Ensembles. InAdvances in Neural Informa- tion Processing Systems. Curran Associates, Inc., 2017
work page 2017
-
[33]
S. Landgraf, M. Hillemann, K. Wursthorn, and M. Ulrich. Uncertainty-aware Cross-Entropy for Semantic Segmentation. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, X-2-2024:129–136, 2024
work page 2024
-
[34]
Packed Ensembles for efficient uncertainty estima- tion
Olivier Laurent, Adrien Lafage, Enzo Tartaglione, Geof- frey Daniel, Jean-marc Martinez, Andrei Bursuc, and Gianni Franchi. Packed Ensembles for efficient uncertainty estima- tion. InThe Eleventh International Conference on Learning Representations, 2023
work page 2023
-
[35]
Hao Li, Yang Nan, Javier Del Ser, and Guang Yang. Region- based evidential deep learning to quantify uncertainty and improve robustness of brain tumor segmentation.Neural Computing and Applications, 35(30):22071–22085, 2023
work page 2023
-
[36]
Vicinal Label Supervision for Reliable Aleatoric and Epistemic Uncertainty Estimation
Linye Li, Yufei Chen, and Xiaodong Yue. Vicinal Label Supervision for Reliable Aleatoric and Epistemic Uncertainty Estimation. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
work page 2025
-
[38]
Improving Accuracy and Calibration via Differentiated Deep Mutual Learning
Han Liu, Peng Cui, Bingning Wang, Weipeng Chen, Yupeng Zhang, Jun Zhu, and Xiaolin Hu. Improving Accuracy and Calibration via Differentiated Deep Mutual Learning. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 25812–25821, 2025
work page 2025
-
[39]
Uncertainty Quantification and Confi- dence Calibration in Large Language Models: A Survey
Xiaoou Liu, Tiejin Chen, Longchao Da, Chacha Chen, Zhen Lin, and Hua Wei. Uncertainty Quantification and Confi- dence Calibration in Large Language Models: A Survey. In Proceedings of the 31st ACM SIGKDD Conference on Knowl- edge Discovery and Data Mining V.2, pages 6107–6117, New York, NY , USA, 2025. Association for Computing Machinery. event-place: Tor...
work page 2025
-
[40]
Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10012–10022, 2021
work page 2021
-
[41]
False Negative Reduction in Video Instance Seg- mentation using Uncertainty Estimates
Kira Maag. False Negative Reduction in Video Instance Seg- mentation using Uncertainty Estimates. In2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pages 1279–1286, 2021
work page 2021
-
[42]
Improving Video Instance Segmentation by Light-weight Temporal Un- certainty Estimates
Kira Maag, Matthias Rottmann, Serin Varghese, Fabian H¨uger, Peter Schlicht, and Hanno Gottschalk. Improving Video Instance Segmentation by Light-weight Temporal Un- certainty Estimates. In2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2021. ISSN: 2161- 4407
work page 2021
-
[43]
Metrics reloaded: Pitfalls and recommendationsfor imageanalysisvalidationURL:https://arxiv
Lena Maier-Hein, Annika Reinke, Patrick Godau, Minu D Ti- zabi, Florian Buettner, Evangelia Christodoulou, Ben Glocker, Fabian Isensee, Jens Kleesiek, Michal Kozubek, and others. Metrics reloaded: Recommendations for image analysis vali- dation.arXiv preprint arXiv:2206.01653, 2022
-
[44]
Lena Maier-Hein, Annika Reinke, Patrick Godau, Minu D. Ti- zabi, Florian Buettner, Evangelia Christodoulou, Ben Glocker, Fabian Isensee, Jens Kleesiek, Michal Kozubek, Mauricio Reyes, Michael A. Riegler, Manuel Wiesenfarth, A. Emre Kavur, Carole H. Sudre, Michael Baumgartner, Matthias Eisenmann, Doreen Heckmann-N¨otzel, Tim R¨adsch, Laura Acion, Michela A...
work page 2024
-
[45]
PhD Thesis, University of Cambridge, 2019
Andrey Malinin.Uncertainty estimation in deep learning with application to spoken language assessment. PhD Thesis, University of Cambridge, 2019
work page 2019
-
[46]
Evaluating Merging Strategies for Sampling- based Uncertainty Techniques in Object Detection
Dimity Miller, Feras Dayoub, Michael Milford, and Niko S¨underhauf. Evaluating Merging Strategies for Sampling- based Uncertainty Techniques in Object Detection. In2019 In- ternational Conference on Robotics and Automation (ICRA), pages 2348–2354, 2019
work page 2019
-
[48]
Rohit Mohan, Kiran Kumaraswamy, Juana Valeria Hur- tado, K ¨ursat Petek, and Abhinav Valada. Panoptic Out-of- Distribution Segmentation.IEEE Robotics and Automation Letters, 9(5):4075–4082, 2024. Conference Name: IEEE Robotics and Automation Letters
work page 2024
-
[50]
Jishnu Mukhoti, Andreas Kirsch, Joost van Amersfoort, Philip H.S. Torr, and Yarin Gal. Deep Deterministic Un- certainty: A New Simple Baseline. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24384–24394, 2023
work page 2023
-
[51]
Epistemic Uncertainty for Gener- ated Image Detection
Jun Nie, Yonggang Zhang, Tongliang Liu, Yiu-ming Cheung, Bo Han, and Xinmei Tian. Epistemic Uncertainty for Gener- ated Image Detection. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
work page 2025
-
[53]
Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, MORTEZA IBRAHIMI, Xiuyuan Lu, and Benjamin Van Roy. Epistemic Neural Networks. In Advances in Neural Information Processing Systems, pages 2795–2823. Curran Associates, Inc., 2023
work page 2023
-
[54]
Mahdi Pakdaman Naeini, Gregory Cooper, and Milos Hauskrecht. Obtaining Well Calibrated Probabilities Using Bayesian Binning.Proceedings of the AAAI Conference on Artificial Intelligence, 29(1), 2015. Section: Main Track: Novel Machine Learning Algorithms
work page 2015
-
[55]
Yuwen Pan, Rui Sun, Wangkai Li, and Tianzhu Zhang. Explor- ing Weather-aware Aggregation and Adaptation for Semantic Segmentation under Adverse Conditions. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 13952–13962, 2025
work page 2025
-
[56]
Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, Jos ´e Miguel Hern´andez-Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A Osbor...
work page 2024
-
[57]
Model ratatouille: recy- cling diverse models for out-of-distribution generalization
Alexandre Ram´e, Kartik Ahuja, Jianyu Zhang, Matthieu Cord, L´eon Bottou, and David Lopez-Paz. Model ratatouille: recy- cling diverse models for out-of-distribution generalization. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA, 2023. JMLR.org
work page 2023
-
[59]
M´elanie Roschewitz, Raghav Mehta, Fabio De Sousa Ribeiro, and Ben Glocker. Where are we with calibration under dataset shift in image classification?Transactions on Machine Learn- ing Research, 2025
work page 2025
-
[60]
EDAPS: Enhanced Domain-Adaptive Panoptic Segmentation
Suman Saha, Lukas Hoyer, Anton Obukhov, Dengxin Dai, and Luc Van Gool. EDAPS: Enhanced Domain-Adaptive Panoptic Segmentation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 19234–19245, 2023
work page 2023
-
[61]
Sebastian Schmidt, Julius Koerner, Dominik Fuchsgruber, Ste- fano Gasperini, Federico Tombari, and Stephan G¨unnemann. Prior2Former - Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 23646–23656, 2025
work page 2025
-
[62]
Oliver Schr¨ufer, Manuel Milling, Felix Burkhardt, Florian Eyben, and Bj ¨orn Schuller. Are you sure? Analysing Un- certainty Quantification Approaches for Real-world Speech Emotion Recognition. InInterspeech 2024, pages 3210–3214,
work page 2024
-
[63]
Eviden- tial Deep Learning to Quantify Classification Uncertainty
Murat Sensoy, Lance Kaplan, and Melih Kandemir. Eviden- tial Deep Learning to Quantify Classification Uncertainty. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2018
work page 2018
-
[64]
C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379–423, 1948. Conference Name: The Bell System Technical Journal
work page 1948
-
[65]
Evan Shelhamer, Jonathan Long, and Trevor Darrell. Fully Convolutional Networks for Semantic Segmentation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4):640–651, 2017
work page 2017
-
[67]
Re- thinking Aleatoric and Epistemic Uncertainty
Freddie Bickford Smith, Jannik Kossen, Eleanor Trollope, Mark van der Wilk, Adam Foster, and Tom Rainforth. Re- thinking Aleatoric and Epistemic Uncertainty. InForty- second International Conference on Machine Learning, 2025
work page 2025
-
[70]
Hans Thisanke, Chamli Deshan, Kavindu Chamith, Sachith Seneviratne, Rajith Vidanaarachchi, and Damayanthi Herath. Semantic segmentation using Vision Transformers: A sur- vey.Engineering Applications of Artificial Intelligence, 126: 106669, 2023
work page 2023
-
[71]
Un- certainty Aware Training to Improve Uncertainty Active Learning for Semantic Segmentation
Moritz Thoma, Tobias Preintner, Emad Aghajanzadeh, Shambhavi Balamuthu Sampath, Pierpaolo Mori, Nael Fasfous, Manoj-Rohit Vemparala, Alexander Frickenstein, Daniel Mueller-Gritschneder, and Ulf Schlichtmann. Un- certainty Aware Training to Improve Uncertainty Active Learning for Semantic Segmentation. InProceedings of the IEEE/CVF Conference on Computer V...
work page 2025
-
[72]
Are you sure? Measuring models bias in content moderation through uncertainty
Alessandra Urbinati, Mirko Lai, Simona Frenda, and Marco Stranisci. Are you sure? Measuring models bias in content moderation through uncertainty. InFindings of the Associ- ation for Computational Linguistics: EMNLP 2025, pages 18061–18076, Suzhou, China, 2025. Association for Compu- tational Linguistics
work page 2025
-
[73]
A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation
MMA Valiuddin, RJG van Sloun, CGA Viviers, PHN de With, and F van der Sommen. A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation. arXiv preprint arXiv:2411.16370, 2024
-
[75]
Zhixiang Wei, Lin Chen, Yi Jin, Xiaoxiao Ma, Tianle Liu, Pengyang Ling, Ben Wang, Huaian Chen, and Jinjin Zheng. Stronger Fewer & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 28619–28630, 2024
work page 2024
-
[76]
Dusenberry, Jasper Snoek, Balaji Lakshminarayanan, and Dustin Tran
Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W. Dusenberry, Jasper Snoek, Balaji Lakshminarayanan, and Dustin Tran. Combining Ensembles and Data Augmentation Can Harm Your Calibration. InInternational Conference on Learning Representations, 2021
work page 2021
-
[77]
Hoff- man, Yarin Gal, Yingzhen Li, Melanie F
Andrew Gordon Wilson, Pavel Izmailov, Matthew D. Hoff- man, Yarin Gal, Yingzhen Li, Melanie F. Pradier, Sharad Vikram, Andrew Foong, Sanae Lotfi, and Sebastian Farquhar. Evaluating Approximate Inference in Bayesian Deep Learn- ing. InProceedings of the NeurIPS 2021 Competitions and Demonstrations Track, pages 113–124. PMLR, 2022. ISSN: 2640-3498
work page 2021
-
[78]
Dual-Temporal Exemplar Representation Network for Video Semantic Segmentation
Xiaolong Xu, Lei Zhang, Jiayi Li, Lituan Wang, Yifan Guan, Yu Yan, Leyi Zhang, and Hao Song. Dual-Temporal Exemplar Representation Network for Video Semantic Segmentation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10775–10785, 2025
work page 2025
-
[79]
Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. mixup: Beyond Empirical Risk Minimiza- tion. InInternational Conference on Learning Representa- tions, 2018
work page 2018
-
[80]
Zhi-Hua Zhou.Ensemble Methods: Foundations and Algo- rithms. Chapman and Hall/CRC, New York, 2 edition, 2025. U-SEG: Uncertainty in SEGmentation - A systematic multi-variable exploration Supplementary Material A1. Implementation Details A1.1. Datasets As discussed in Sec. 4, we evaluate over the VIPER [16] and Cityscapes [2] datasets, both of which are dr...
work page 2025
-
[81]
in exchange for better uncertainty estimates is acceptable, at least for downstream tasks such as failure detection. A7.2. Out-of-distribution performance In Fig. A2, we show the OOD performance without nor- malization on the Cityscapes dataset and with a distribution shift where we use the VIPER model instead. We see that un-normalized results are broadl...
-
[82]
Results are averaged over 3 runs; error bars in red show the 95%confidence interval, generated via bootstrapping. 0.0 0.5 1.0 1.5 2.0 Per Image Inference Time (s) Prediction Model Baseline scale horizontalFlip+scale horizontalFlip scale horizontalFlip scale horizontalFlip Time 0 Time 1 Time 2 Time 0 Time 1 MC 0 MC 3 Averaging Inference Time T otal # of Sa...
-
[83]
to sample aggregation (V4), executed on a Nvidia RTX 4090. Results are averaged over 3 runs; error bars in red show the 95% confidence interval, generated via bootstrapping. Supplementary Material References
-
[84]
Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, and Rohit Girdhar. Masked-attention Mask Transformer for Universal Image Segmentation. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1280–1289, 2022
-
[85]
Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
- [86]
-
[87]
Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le, and Barret Zoph. Simple Copy-Paste Is a Strong Data Augmentation Method for Instance Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2918–2928, 2021
-
[88]
Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. On Calibration of Modern Neural Networks. In Proceedings of the 34th International Conference on Machine Learning, pages 1321–1330. PMLR, 2017
-
[89]
Po-Yu Huang, Wan-Ting Hsu, Chun-Yueh Chiu, Ting-Fan Wu, and Min Sun. Efficient Uncertainty Estimation for Semantic Segmentation in Videos. In Proceedings of the European Conference on Computer Vision (ECCV), 2018
-
[90]
Paul F. Jaeger, Carsten Tim Lüth, Lukas Klein, and Till J. Bungert. A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification. In The Eleventh International Conference on Learning Representations, 2023
-
[91]
Kim-Celine Kahl, Carsten T. Lüth, Maximilian Zenk, Klaus Maier-Hein, and Paul F. Jaeger. ValUES: A Framework for Systematic Validation of Uncertainty Estimation in Semantic Segmentation. In The Twelfth International Conference on Learning Representations, 2024
-
[92]
Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
-
[93]
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common Objects in Context. In Computer Vision – ECCV 2014, pages 740–755, Cham, 2014. Springer International Publishing
-
[94]
Matthias Minderer, Josip Djolonga, Rob Romijnders, Frances Ann Hubis, Xiaohua Zhai, Neil Houlsby, Dustin Tran, and Mario Lucic. Revisiting the Calibration of Modern Neural Networks. In Advances in Neural Information Processing Systems, 2021
-
[95]
Doug Morrison, Anton Milan, and Nontas Antonakos. Estimating uncertainty in instance segmentation using dropout sampling. In CVPR 2019 Robotic Vision Probabilistic Object Detection Challenge, 2019
-
[96]
Jishnu Mukhoti, Andreas Kirsch, Joost van Amersfoort, Philip HS Torr, and Yarin Gal. Deep deterministic uncertainty: A simple baseline. arXiv preprint arXiv:2102.11582, 2021
-
[97]
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick...