pith. machine review for the scientific record.

arxiv: 2604.08627 · v1 · submitted 2026-04-09 · 💻 cs.LG · cs.AI

Recognition: unknown

Evidential Transformation Network: Turning Pretrained Models into Evidential Models for Post-hoc Uncertainty Estimation


Pith reviewed 2026-05-10 16:50 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords evidential deep learning · post-hoc uncertainty estimation · pretrained models · Dirichlet distribution · affine transformation · image classification · language model QA · out-of-distribution detection

The pith

A lightweight post-hoc module turns any pretrained model into an evidential model by learning a sample-dependent affine transform on its logits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Pretrained models deliver predictions but rarely supply trustworthy uncertainty estimates. Full retraining with evidential methods is costly, and existing post-hoc fixes often fall short on out-of-distribution data. The Evidential Transformation Network inserts a small trainable layer after a frozen pretrained network. This layer applies an affine transformation to the logits that depends on the input sample and treats the resulting values as concentration parameters of a Dirichlet distribution. Experiments across image classification and language-model question answering show improved uncertainty calibration while accuracy stays intact and added compute remains negligible.

Core claim

The Evidential Transformation Network converts a pretrained classifier into an evidential model by learning a sample-dependent affine transformation of the logits and interpreting the transformed outputs directly as the parameters of a Dirichlet distribution, thereby enabling reliable uncertainty estimation for both in-distribution and out-of-distribution inputs without access to internal model states or full retraining.

What carries the argument

The sample-dependent affine transformation applied to logits, which produces the concentration parameters of a Dirichlet distribution for uncertainty quantification.
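The mechanism reduces to a few lines. The sketch below is a minimal illustration, not the paper's implementation: the scalar scale-and-shift parameterization, the softplus positivity map, and the `toy_transform_net` heuristic are all assumptions made for the example.

```python
import math

def etn_forward(logits, transform_net):
    """Sketch of an ETN-style post-hoc step (hypothetical parameterization):
    a small trainable network predicts a per-sample scale a and shift b,
    the frozen model's logits are mapped z -> a*z + b, and the result is
    made positive to serve as Dirichlet concentration parameters."""
    a, b = transform_net(logits)               # sample-dependent affine parameters
    transformed = [a * z + b for z in logits]
    # softplus(+1) keeps every concentration strictly positive
    alpha = [math.log1p(math.exp(t)) + 1.0 for t in transformed]
    strength = sum(alpha)                      # Dirichlet strength S = sum(alpha)
    probs = [ai / strength for ai in alpha]    # predictive mean of the Dirichlet
    vacuity = len(alpha) / strength            # EDL-style uncertainty: K / S
    return probs, vacuity

# toy stand-in for the trainable module: peaked logits get a larger scale
def toy_transform_net(logits):
    margin = max(logits) - sorted(logits)[-2]
    return 1.0 + margin, 0.0

confident = [4.0, 0.1, -0.3]
ambiguous = [0.6, 0.5, 0.4]
_, u_conf = etn_forward(confident, toy_transform_net)
_, u_amb = etn_forward(ambiguous, toy_transform_net)
print(u_conf < u_amb)  # True: peaked logits -> more evidence -> lower vacuity
```

The key property the sketch preserves is that uncertainty is read off the Dirichlet strength rather than from a single softmax vector.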

If this is right

  • Any pretrained image or language classifier can receive evidential uncertainty estimates without modifying its weights or architecture.
  • Accuracy on the original task is preserved because the base model remains frozen during ETN training.
  • Computational cost stays low because only a lightweight module is added at inference time.
  • The same procedure applies across vision and language benchmarks under both in-distribution and out-of-distribution conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The success of a simple affine map on logits implies that much of the information needed for evidential uncertainty can be recovered from output logits without internal activations.
  • Practitioners could retrofit existing deployed models with ETN to support safer rejection or deferral decisions in high-stakes settings.
  • The approach might extend to other output distributions beyond Dirichlet if analogous lightweight transformations prove effective.

Load-bearing premise

A learned sample-dependent affine transformation of the logits alone is sufficient to yield Dirichlet parameters that accurately quantify uncertainty for both in-distribution and out-of-distribution cases.

What would settle it

The claim would be undermined if ETN, applied to a held-out pretrained model on a new out-of-distribution benchmark, produced uncertainty scores whose correlation with actual errors is no better than temperature scaling or other logit-only baselines.
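That falsification test can be made concrete with a rank-based AUROC over error indicators. The scores below are invented placeholders; only the comparison procedure, not the numbers, is the point.

```python
def auroc(scores, labels):
    """Probability that a randomly chosen error (label 1) receives a higher
    uncertainty score than a randomly chosen correct prediction (label 0);
    ties count 0.5. This is the rank-statistic form of AUROC."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# hypothetical uncertainty scores for the same predictions (1 = model erred)
errors      = [0, 0, 1, 0, 1, 1, 0, 1]
etn_scores  = [0.1, 0.2, 0.8, 0.3, 0.7, 0.9, 0.2, 0.6]   # e.g. ETN vacuity
temp_scores = [0.2, 0.1, 0.5, 0.4, 0.6, 0.4, 0.3, 0.7]   # e.g. temp-scaled entropy

print(auroc(etn_scores, errors), auroc(temp_scores, errors))
```

If ETN's AUROC against errors did not exceed the temperature-scaling baseline on such held-out data, the load-bearing premise would fail the test.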

Figures

Figures reproduced from arXiv: 2604.08627 by Chanhee Park, Heuiseok Lim, Jaehyung Seo, Jeongho Yoon, Yongchan Chun.

Figure 1. Comparison of average uncertainty estimation performance.

Figure 2. Comparison of uncertainty estimation performance based on different dimensions of the transformation parameter.

Figure 3. Comparison of uncertainty estimation performance based on different transformation methods.

Figure 4. Ablation studies on AUPR scores under different parameters of the prior distribution (left) and different numbers of MC samples.

Figure 5. Comparison of uncertainty estimation performance based on different transformation methods on CIFAR-10 and OBQA.

Figure 6. Comparison of uncertainty estimation performance and accuracy across different dimensionalities of the transformation parameter.

Figure 7. Histograms of logit margins for models trained with EDL.

Figure 8. Comparison of uncertainty estimation performance and accuracy across different dimensionalities of the transformation parameter.

Figure 9. Comparison of uncertainty estimation performance and accuracy across different dimensionalities of the transformation parameter.
Original abstract

Pretrained models have become standard in both vision and language, yet they typically do not provide reliable measures of confidence. Existing uncertainty estimation methods, such as deep ensembles and MC dropout, are often too computationally expensive to deploy in practice. Evidential Deep Learning (EDL) offers a more efficient alternative, but it requires models to be trained to output evidential quantities from the start, which is rarely true for pretrained networks. To enable EDL-style uncertainty estimation in pretrained models, we propose the Evidential Transformation Network (ETN), a lightweight post-hoc module that converts a pretrained predictor into an evidential model. ETN operates in logit space: it learns a sample-dependent affine transformation of the logits and interprets the transformed outputs as parameters of a Dirichlet distribution for uncertainty estimation. We evaluate ETN on image classification and large language model question-answering benchmarks under both in-distribution and out-of-distribution settings. ETN consistently improves uncertainty estimation over post-hoc baselines while preserving accuracy and adding only minimal computational overhead.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the Evidential Transformation Network (ETN), a lightweight post-hoc module that applies a learned sample-dependent affine transformation to the logits of a pretrained model and interprets the transformed values as concentration parameters of a Dirichlet distribution. This enables evidential-style uncertainty estimation for both in-distribution and out-of-distribution inputs on image classification and LLM question-answering benchmarks without retraining the base model or accessing internal activations. The central claim is that ETN improves uncertainty metrics over post-hoc baselines while preserving accuracy and incurring only minimal overhead.

Significance. If the empirical gains hold under rigorous validation, ETN would provide a practical, low-cost route to reliable uncertainty quantification for deployed pretrained models in vision and language, filling a gap between expensive methods like ensembles and the limitations of standard post-hoc calibration. The post-hoc, logit-only design is a strength for compatibility with existing networks.

major comments (2)
  1. [Method (ETN definition and Dirichlet interpretation)] The load-bearing assumption that a per-sample affine transform in logit space alone can produce trustworthy Dirichlet parameters for OOD uncertainty (without internal states or retraining) is not adequately supported. When OOD and ID logit distributions overlap—a common regime—the transform has no additional signal to differentiate evidence, yet the paper treats the resulting Dirichlet as reliable for both regimes. An ablation or analysis demonstrating robustness in overlapping-logit cases is required.
  2. [Abstract and Evaluation] No quantitative results, specific metrics (e.g., AUROC, ECE), training details for the ETN parameters, loss functions, or error bars appear in the abstract or high-level description, making it impossible to verify whether the data support the claim of consistent improvement. The full evaluation section must supply these with statistical significance tests.
minor comments (2)
  1. [Method] Clarify the exact parameterization of the sample-dependent affine transform (e.g., whether scale and shift are class-specific or shared, and how they are optimized).
  2. [Experiments] Add a direct comparison table against recent logit-based post-hoc methods (e.g., temperature scaling variants or Dirichlet calibration) to strengthen the baseline claims.
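For reference, the ECE metric the referee asks for is straightforward to compute. This is the standard equal-width-binned estimator, not anything taken from the paper; the toy confidences are invented.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE: the bin-weighted mean of
    |accuracy - average confidence| over confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0 into last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(ok for _, ok in b) / len(b)
            ece += (len(b) / n) * abs(acc - avg_conf)
    return ece

# mildly miscalibrated: mean confidence 0.9 vs accuracy 0.8
print(expected_calibration_error([0.9] * 5, [1, 1, 1, 1, 0]))
# overconfident: mean confidence 0.95 vs accuracy 0.5
print(expected_calibration_error([0.95] * 4, [1, 0, 0, 1]))
```

Reporting this alongside AUROC, with error bars over seeds, would directly address the major comment above.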

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. We address each major comment below and describe the revisions we will implement to improve the manuscript.

Point-by-point responses
  1. Referee: [Method (ETN definition and Dirichlet interpretation)] The load-bearing assumption that a per-sample affine transform in logit space alone can produce trustworthy Dirichlet parameters for OOD uncertainty (without internal states or retraining) is not adequately supported. When OOD and ID logit distributions overlap—a common regime—the transform has no additional signal to differentiate evidence, yet the paper treats the resulting Dirichlet as reliable for both regimes. An ablation or analysis demonstrating robustness in overlapping-logit cases is required.

    Authors: We appreciate the referee's emphasis on this core assumption. The ETN predicts sample-specific affine parameters via a lightweight network operating on the input (consistent with its post-hoc but input-aware design), which in principle allows it to modulate evidence assignment even when raw logit vectors exhibit overlap. Our empirical results on standard ID/OOD benchmarks demonstrate improved uncertainty metrics, indicating that sufficient differentiating signal is captured in practice. Nevertheless, we agree that dedicated analysis of the overlapping-logit regime is needed. In the revision we will add an ablation that (i) quantifies logit overlap between ID and OOD samples, (ii) visualizes the corresponding Dirichlet parameters and uncertainty estimates, and (iii) reports performance relative to baselines under high-overlap conditions. This analysis will be placed in the experimental section. revision: yes

  2. Referee: [Abstract and Evaluation] No quantitative results, specific metrics (e.g., AUROC, ECE), training details for the ETN parameters, loss functions, or error bars appear in the abstract or high-level description, making it impossible to verify whether the data support the claim of consistent improvement. The full evaluation section must supply these with statistical significance tests.

    Authors: We agree that greater quantitative transparency is warranted. While abstracts are conventionally concise, we will revise the abstract to explicitly state the key improvements (e.g., AUROC gains for OOD detection and ECE reductions). In the evaluation section we will add: (i) explicit training details for ETN (optimizer, learning rate, number of epochs, and the evidential loss formulation), (ii) error bars computed over multiple random seeds, and (iii) statistical significance tests (paired t-tests or Wilcoxon signed-rank tests) comparing ETN against each baseline. These additions will directly support the claim of consistent improvement. revision: yes
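The overlap quantification promised in point (i) of the first response could be done with something as simple as a histogram overlap coefficient on max-logit scores. The sketch and its synthetic Gaussian scores are illustrative assumptions, not the authors' protocol.

```python
import random

def overlap_coefficient(id_scores, ood_scores, n_bins=20):
    """Histogram overlap of two score distributions over their joint range:
    1.0 means indistinguishable histograms, 0.0 means disjoint support."""
    lo = min(min(id_scores), min(ood_scores))
    hi = max(max(id_scores), max(ood_scores))
    width = (hi - lo) / n_bins or 1.0

    def hist(xs):
        h = [0] * n_bins
        for x in xs:
            h[min(int((x - lo) / width), n_bins - 1)] += 1
        return [c / len(xs) for c in h]

    h_id, h_ood = hist(id_scores), hist(ood_scores)
    return sum(min(a, b) for a, b in zip(h_id, h_ood))

# synthetic max-logits: a well-separated ID/OOD regime
random.seed(0)
id_max  = [random.gauss(6.0, 1.0) for _ in range(1000)]
ood_max = [random.gauss(3.0, 1.0) for _ in range(1000)]
print(overlap_coefficient(id_max, ood_max))  # low overlap: the easy regime
```

Binning benchmark pairs by this coefficient and reporting uncertainty metrics per bin would isolate exactly the high-overlap regime the referee worries about.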

Circularity Check

0 steps flagged

No circularity: ETN is an independent post-hoc module trained and evaluated on external benchmarks

full rationale

The paper proposes ETN as a lightweight, separately trained module that applies a learned sample-dependent affine transform to the logits of a frozen pretrained model and treats the outputs as Dirichlet concentration parameters. This construction is defined explicitly as an add-on component with its own parameters optimized on held-out data; no equation reduces the claimed uncertainty estimates to the pretrained model's outputs by definition, and no central premise is justified solely by self-citation. All reported improvements are measured against standard external ID/OOD benchmarks rather than internal consistency checks, so the derivation chain remains self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the effectiveness of a learned affine transformation in logit space and the interpretation of its outputs as Dirichlet parameters; these elements are introduced by the paper rather than derived from prior results.

free parameters (1)
  • parameters of the sample-dependent affine transformation
    These are learned during post-hoc training of the ETN module on the target data.
axioms (1)
  • domain assumption: transformed logits can be directly interpreted as parameters of a Dirichlet distribution for uncertainty estimation
    This interpretation is invoked as the core of the ETN method in the abstract.
invented entities (1)
  • Evidential Transformation Network (ETN) (no independent evidence)
    purpose: lightweight post-hoc conversion of pretrained predictors into evidential models
    New module proposed in the paper, with no independent external evidence provided in the abstract.
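One reason the domain assumption is plausible: under an exponential link from transformed logits to concentrations (an assumed link here; the paper may use a different positivity map), the Dirichlet mean coincides exactly with the softmax of the transformed logits, so the argmax prediction of the base pipeline is unchanged.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

# If transformed logits t are exponentiated into concentrations alpha = exp(t),
# the Dirichlet mean alpha / sum(alpha) equals softmax(t): same predicted class,
# but the magnitude of alpha now carries an evidence (uncertainty) signal.
t = [2.0, 0.5, -1.0]
alpha = [math.exp(x) for x in t]
s = sum(alpha)
dirichlet_mean = [a / s for a in alpha]
print(all(abs(p - q) < 1e-12 for p, q in zip(dirichlet_mean, softmax(t))))
```

This identity is what lets a post-hoc evidential layer add uncertainty without touching accuracy, at least when the transform leaves the logit ordering intact.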

pith-pipeline@v0.9.0 · 5488 in / 1439 out tokens · 58509 ms · 2026-05-10T16:50:22.344662+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Rethinking Vacuity for OOD Detection in Evidential Deep Learning

    cs.AI 2026-05 accept novelty 7.0

    Vacuity-based OOD detection in evidential deep learning is highly sensitive to class cardinality differences between ID and OOD, which can artificially inflate AUROC and AUPR without any change in model predictions.

Reference graph

Works this paper leans on

60 extracted references · 6 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  [1] Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, and Utkarsh Sharma. Explaining neural scaling laws. Proceedings of the National Academy of Sciences, 121(27), 2024.

  [2] Guy Bar-Shalom, Fabrizio Frasca, Derek Lim, Yoav Gelberg, Yftah Ziser, Ran El-Yaniv, Gal Chechik, and Haggai Maron. Beyond next token probabilities: Learnable, fast detection of hallucinations and data contamination on LLM output distributions, 2025.

  [3] Viktor Bengs, Eyke Hüllermeier, and Willem Waegeman. On second-order scoring rules for epistemic uncertainty quantification, 2023.

  [4] Bertrand Charpentier, Daniel Zügner, and Stephan Günnemann. Posterior network: Uncertainty estimation without OOD samples via density-based pseudo-counts. Advances in Neural Information Processing Systems, 33:1356–1367, 2020.

  [5] Mengyuan Chen, Junyu Gao, and Changsheng Xu. R-EDL: Relaxing nonessential settings of evidential deep learning. In The Twelfth International Conference on Learning Representations, 2024.

  [6] Wenhu Chen, Yilin Shen, Hongxia Jin, and William Wang. A variational Dirichlet framework for out-of-distribution detection. arXiv preprint arXiv:1811.07308, 2018.

  [7] Erik Daxberger, Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Matthias Bauer, and Philipp Hennig. Laplace redux: effortless Bayesian deep learning. Advances in Neural Information Processing Systems, 34:20089–20103, 2021.

  [8] Danruo Deng, Guangyong Chen, Yang Yu, Furui Liu, and Pheng-Ann Heng. Uncertainty estimation by Fisher information-based evidential deep learning, 2023.

  [9] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.

  [10] M.J. Evans and J.S. Rosenthal. Probability and Statistics: The Science of Uncertainty. W. H. Freeman, 2004.

  [11] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of The 33rd International Conference on Machine Learning, pages 1050–1059, New York, New York, USA, 2016. PMLR.

  [12] Jakob Gawlikowski, Cedrique Rovile Njieutcheu Tassi, Mohsin Ali, Jongseok Lee, Matthias Humt, Jianxiang Feng, Anna Kruspe, Rudolph Triebel, Peter Jung, Ribana Roscher, et al. A survey of uncertainty in deep neural networks. Artificial Intelligence Review, 56(Suppl 1):1513–1589, 2023.

  [13] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.

  [14] Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. On calibration of modern neural networks. In International Conference on Machine Learning, pages 1321–1330. PMLR, 2017.

  [15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, 2015.

  [16] Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136, 2016.

  [17] Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, et al. The many faces of robustness: A critical analysis of out-of-distribution generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8340–8349, 2021.

  [18] Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. Proceedings of the International Conference on Learning Representations (ICLR), 2021.

  [19] Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song. Natural adversarial examples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15262–15271, 2021.

  [20] Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, and Yanqi Zhou. Deep learning scaling is predictable, empirically, 2017.

  [21] Gaurush Hiranandani, Haolun Wu, Subhojyoti Mukherjee, and Sanmi Koyejo. Logits are all we need to adapt closed models, 2025.

  [22] Taejong Joo, Uijung Chung, and Min-Gwan Seo. Being Bayesian about categorical probability. In International Conference on Machine Learning, pages 4950–4961. PMLR, 2020.

  [23] Audun Jøsang. Subjective Logic: A Formalism for Reasoning Under Uncertainty. Springer Publishing Company, Incorporated, 1st edition, 2016.

  [24] Tom Joy, Francesco Pinto, Ser-Nam Lim, Philip HS Torr, and Puneet K Dokania. Sample-dependent adaptive temperature scaling for improved calibration. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 14919–14926.

  [25] Mira Juergens, Nis Meinert, Viktor Bengs, Eyke Hüllermeier, and Willem Waegeman. Is epistemic uncertainty faithfully represented by evidential deep learning methods? In International Conference on Machine Learning, pages 22624–22642. PMLR, 2024.

  [26] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models, 2020.

  [27] Alex Krizhevsky. Learning multiple layers of features from tiny images. University of Toronto, 2012.

  [28] Meelis Kull, Miquel Perello Nieto, Markus Kängsepp, Telmo Silva Filho, Hao Song, and Peter Flach. Beyond temperature scaling: Obtaining well-calibrated multi-class probabilities with Dirichlet calibration. Advances in Neural Information Processing Systems, 32, 2019.

  [29] Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard Hovy. RACE: Large-scale ReAding comprehension dataset from examinations. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 785–794, Copenhagen, Denmark, 2017. Association for Computational Linguistics.

  [30] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2017.

  [31] Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. Advances in Neural Information Processing Systems, 31, 2018.

  [32] Yawei Li, David Rügamer, Bernd Bischl, and Mina Rezaei. Calibrating LLMs with information-theoretic evidential deep learning. arXiv preprint arXiv:2502.06351, 2025.

  [33] Shiyu Liang, Yixuan Li, and R Srikant. Enhancing the reliability of out-of-distribution image detection in neural networks. In International Conference on Learning Representations, 2018.

  [34] Jeremiah Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax Weiss, and Balaji Lakshminarayanan. Simple and principled uncertainty estimation with deterministic deep learning via distance awareness. Advances in Neural Information Processing Systems, 33:7498–7512, 2020.

  [35] Weiyang Liu, Yandong Wen, Zhiding Yu, and Meng Yang. Large-margin softmax loss for convolutional neural networks. arXiv preprint arXiv:1612.02295, 2016.

  [36] Andrey Malinin and Mark Gales. Predictive uncertainty estimation via prior networks. Advances in Neural Information Processing Systems, 31, 2018.

  [37] Andrey Malinin and Mark Gales. Reverse KL-divergence training of prior networks: Improved uncertainty and adversarial robustness. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2019.

  [38] Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sabharwal. Can a suit of armor conduct electricity? A new dataset for open book question answering. In EMNLP, 2018.

  [39] Matthias Minderer, Josip Djolonga, Rob Romijnders, Frances Hubis, Xiaohua Zhai, Neil Houlsby, Dustin Tran, and Mario Lucic. Revisiting the calibration of modern neural networks. Advances in Neural Information Processing Systems, 34:15682–15694, 2021.

  [40] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.

  [41] Alexandru Niculescu-Mizil and Rich Caruana. Predicting good probabilities with supervised learning. In Proceedings of the 22nd International Conference on Machine Learning, pages 625–632, 2005.

  [42] Deep Pandey and Qi Yu. Learn to accumulate evidence from all training samples: Theory and practice. In Proceedings of the 40th International Conference on Machine Learning, pages 26963–26989, 2023.

  [43] John Platt et al. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers, 10(3):61–74, 1999.

  [44] Murat Sensoy, Lance Kaplan, and Melih Kandemir. Evidential deep learning to quantify classification uncertainty, 2018.

  [45] Maohao Shen, Yuheng Bu, Prasanna Sattigeri, Soumya Ghosh, Subhro Das, and Gregory Wornell. Post-hoc uncertainty learning using a Dirichlet meta-model, 2022.

  [46] Maohao Shen, Subhro Das, Kristjan Greenewald, Prasanna Sattigeri, Gregory Wornell, and Soumya Ghosh. Thermometer: Towards universal calibration for large language models. In Proceedings of the 41st International Conference on Machine Learning. JMLR.org, 2024.

  [47] Maohao Shen, Jongha Jon Ryu, Soumya Ghosh, Yuheng Bu, Prasanna Sattigeri, Subhro Das, and Gregory Wornell. Are uncertainty quantification capabilities of evidential deep learning a mirage? Advances in Neural Information Processing Systems, 37:107830–107864, 2024.

  [48] K Simonyan and A Zisserman. Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations (ICLR 2015). Computational and Biological Learning Society, 2015.

  [49] Johan AK Suykens and Joos Vandewalle. Least squares support vector machine classifiers. Neural Processing Letters, 9(3):293–300, 1999.

  [50] Joost Van Amersfoort, Lewis Smith, Yee Whye Teh, and Yarin Gal. Uncertainty estimation using a single deep deterministic neural network. In International Conference on Machine Learning, pages 9690–9700. PMLR, 2020.

  [51] Haohan Wang, Songwei Ge, Zachary Lipton, and Eric P Xing. Learning robust global representations by penalizing local predictive power. In Advances in Neural Information Processing Systems, pages 10506–10518, 2019.

  [52] John Wishart. The generalised product moment distribution in samples from a normal multivariate population. Biometrika, 20(1/2):32–52, 1928.

  [53] Adam X Yang, Maxime Robeyns, Xi Wang, and Laurence Aitchison. Bayesian low-rank adaptation for large language models. In The Twelfth International Conference on Learning Representations, 2024.

  [54] Taeseong Yoon and Heeyoung Kim. Uncertainty estimation by density aware evidential deep learning. arXiv preprint arXiv:2409.08754, 2024.

Internal anchors (excerpts extracted from the paper's supplementary material):

  [55] Limitations. "While ETN improves the uncertainty estimation performance of pretrained models without harming accuracy and with only minimal additional computational cost, it also has several limitations. First, the benefits of ETN are largely empirical rather than theoretical. Recent works have raised concerns about EDL from a theoretical standpoint, argu..."

  [56] Proofs and Derivations. Analyzes the behavior of logits produced by models trained with cross-entropy and EDL losses. The per-sample softmax cross-entropy loss is defined as $\mathcal{L}_{\mathrm{CE}}(z, y) = -\log \frac{e^{z_y}}{\sum_{j=1}^{C} e^{z_j}} = \log\bigl(1 + \sum_{j \neq y} e^{z_j - z_y}\bigr)$, and the inter-class margin of a sample as $\gamma(z, y) = z_y - \max_{j \neq y} z_j$. "Given these def..."

  [57] Modeling Transformation Parameterizations. Describes how the transformation parameter A is modeled when defined as a scalar, vector, or matrix; specifically, (1) how the variational distribution over A is constructed, and (2) how the prior term b is handled. "For clarity, we denote the scalar case by a, the vector case by a, an..."

  [58] Experimental Setting: Training Details. The hyperparameters used for training ETN are summarized in Table 3. For LLM experiments, cosine learning-rate scheduling with warm-up steps is employed. All experiments are performed with three different random seeds, and means are reported with 95% confidence intervals. "For post-hoc uncertainty estima..."

  [59] "... for all experimental settings, and train the additional parameters using the reverse KL formulation of L_EDL."

  [60] Additional Experiments: OOD-Detection Baselines. Compares ETN against ODIN [33] and the Mahalanobis distance method (MD) [31]; although neither is strictly an uncertainty estimation method, both operate in a post-hoc manner. (Results table truncated in extraction: "Method CIFAR10→CIFAR10-OOD ImageNet→ImageNet-OOD OBQA→MMLU RACE→MMLU MD45.4...")
    Additional Experiments 12.1. OOD-Detection Baselines In this section, we compare ETN against ODIN [33] and the Mahalanobis distance method (MD) [31]. Although neither ODIN nor MD are strictly uncertainty estimation methods, we include them as they both work in post-hoc manner, and Method CIFAR10→CIFAR10-OOD ImageNet→ImageNet-OOD OBQA→MMLU RACE→MMLU MD45.4...