pith. machine review for the scientific record. sign in

arxiv: 2604.06032 · v1 · submitted 2026-04-07 · 📊 stat.ML · cs.LG

Recognition: no theorem link

Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification

Courtney Franzen, Farhad Pourkamali-Anaraki

Authors on Pith no claims yet

Pith reviewed 2026-05-10 18:35 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords Dirichlet distributionensemble methodspredictive uncertaintyselective classificationsoftmax outputsmethod of momentsevidential learning
0
0 comments X

The pith

Fitting Dirichlet distributions via method of moments to ensembles of softmax outputs produces stable predictive uncertainty estimates that improve selective classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that ensembles of softmax vectors from separately trained cross-entropy networks can be turned into explicit Dirichlet distributions using a method of moments estimator, with an optional maximum-likelihood step. This construction avoids the design sensitivities of evidential deep learning while reducing run-to-run variability in uncertainty values. A sympathetic reader cares because reliable uncertainty lets models abstain on hard cases in selective classification and produce trustworthy scores, which single-run softmax outputs often fail to do.

Core claim

By treating an ensemble of softmax probability vectors as samples from an underlying Dirichlet distribution and estimating its concentration parameters with the method of moments, the resulting predictive distribution yields uncertainty estimates whose stability and calibration outperform those obtained from evidential training, producing measurable gains in downstream tasks such as confidence-based ranking and selective classification on standard image and text datasets.

What carries the argument

The method-of-moments estimator that matches the first two moments of an ensemble of softmax vectors to the mean and variance parameters of a Dirichlet distribution, thereby converting an implicit ensemble into an explicit parametric predictive distribution.

If this is right

  • Uncertainty estimates become less sensitive to random seeds and training details because the ensemble averages out single-run fluctuations.
  • Selective classification can safely reject a larger fraction of inputs while keeping error rate low, since the Dirichlet variance better flags low-confidence cases.
  • Uncertainty estimation is decoupled from the choice of evidential loss, prior strength, or activation function, removing a source of hyperparameter fragility.
  • The same ensemble outputs already produced during training can be reused for Dirichlet fitting without requiring a second training stage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same moment-matching idea could be tested on ensembles from non-neural models whose outputs are probability vectors.
  • Combining the Dirichlet fit with cheaper ensemble approximations such as Monte-Carlo dropout would test whether full independent retraining is necessary.
  • If the Dirichlet fit quality correlates with downstream task gains, one could monitor the moment-matching residual as a cheap diagnostic for when more ensemble members are needed.

Load-bearing premise

That the spread of softmax outputs across independent training runs is sufficiently well described by a single Dirichlet distribution for the moment-matching estimator to recover useful uncertainty values.

What would settle it

If, on a held-out dataset, the area under the selective-classification risk curve or the ranking quality of uncertainty scores shows no improvement over standard softmax baselines or evidential models, the performance advantage would be refuted.

read the original abstract

Neural network classifiers trained with cross-entropy loss achieve strong predictive accuracy but lack the capability to provide inherent predictive uncertainty estimates, thus requiring external techniques to obtain these estimates. In addition, softmax scores for the true class can vary substantially across independent training runs, which limits the reliability of uncertainty-based decisions in downstream tasks. Evidential Deep Learning aims to address these limitations by producing uncertainty estimates in a single pass, but evidential training is highly sensitive to design choices including loss formulation, prior regularization, and activation functions. Therefore, this work introduces an alternative Dirichlet parameter estimation strategy by applying a method of moments estimator to ensembles of softmax outputs, with an optional maximum-likelihood refinement step. This ensemble-based construction decouples uncertainty estimation from the fragile evidential loss design while also mitigating the variability of single-run cross-entropy training, producing explicit Dirichlet predictive distributions. Across multiple datasets, we show that the improved stability and predictive uncertainty behavior of these ensemble-derived Dirichlet estimates translate into stronger performance in downstream uncertainty-guided applications such as prediction confidence scoring and selective classification.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that applying a method of moments estimator (with optional MLE refinement) to ensembles of softmax outputs from cross-entropy trained networks produces stable Dirichlet concentration parameters, yielding explicit predictive distributions that improve stability and performance over evidential deep learning in uncertainty-guided tasks such as confidence scoring and selective classification.

Significance. If the empirical gains hold and are attributable to the Dirichlet construction rather than ensembling alone, the approach provides a practical, less fragile alternative to evidential training by leveraging standard cross-entropy ensembles for reliable predictive uncertainty on the simplex.

major comments (2)
  1. [Section 3.2] The load-bearing assumption that ensemble softmax outputs are sufficiently well-described by a Dirichlet distribution for the method-of-moments estimator to yield reliable epistemic uncertainty is not supported by diagnostics; no goodness-of-fit tests, QQ plots, or comparisons of empirical moments versus fitted Dirichlet are reported in the experimental section.
  2. [Table 4] Table 4 (selective classification results): performance is compared only to evidential baselines and single-model cross-entropy, but lacks an ablation against plain ensemble averaging of softmax probabilities without the Dirichlet fitting step; this leaves open whether gains are due to the proposed modeling or to ensembling per se.
minor comments (2)
  1. [Eq. (7)] Notation for the concentration vector alpha is introduced without explicitly distinguishing the raw method-of-moments estimate from the optionally refined MLE version in the predictive distribution formula.
  2. [Figure 2] Figure 2 caption does not specify the number of ensemble members used or the random seed variation across runs, making reproducibility of the stability claims difficult.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below with clarifications and commit to revisions that strengthen the empirical support for our claims without altering the core contribution.

read point-by-point responses
  1. Referee: [Section 3.2] The load-bearing assumption that ensemble softmax outputs are sufficiently well-described by a Dirichlet distribution for the method-of-moments estimator to yield reliable epistemic uncertainty is not supported by diagnostics; no goodness-of-fit tests, QQ plots, or comparisons of empirical moments versus fitted Dirichlet are reported in the experimental section.

    Authors: We agree that explicit validation of the Dirichlet approximation strengthens the presentation. The method-of-moments estimator is chosen precisely because it is a standard, closed-form procedure for fitting Dirichlet distributions to simplex-valued data, and the resulting predictive distributions demonstrably improve stability over single-run cross-entropy and evidential baselines. In the revised manuscript we will add QQ plots of the marginals and direct comparisons of empirical versus fitted first- and second-order moments on the evaluation sets to quantify the quality of the approximation. revision: yes

  2. Referee: [Table 4] Table 4 (selective classification results): performance is compared only to evidential baselines and single-model cross-entropy, but lacks an ablation against plain ensemble averaging of softmax probabilities without the Dirichlet fitting step; this leaves open whether gains are due to the proposed modeling or to ensembling per se.

    Authors: This is a fair criticism. While the Dirichlet construction supplies an explicit predictive distribution (rather than a point estimate) that enables the uncertainty-guided decisions in selective classification, an ablation against the plain ensemble mean is necessary to isolate the contribution of the fitting step. We will add this comparison to the revised Table 4, reporting selective-classification curves for the ensemble-averaged softmax probabilities alongside our Dirichlet-based results. revision: yes

Circularity Check

0 steps flagged

No circularity: method-of-moments Dirichlet fitting on ensembles is a direct statistical estimator

full rationale

The paper's core construction applies a standard method-of-moments estimator (with optional MLE) directly to the empirical distribution of softmax outputs from an ensemble of independently trained cross-entropy networks. This produces explicit Dirichlet concentration parameters without any self-definitional loop, without renaming a fitted quantity as a 'prediction,' and without load-bearing self-citations or uniqueness theorems. Downstream gains in confidence scoring and selective classification are presented as empirical outcomes on held-out data rather than algebraic consequences of the fitting procedure itself. The derivation chain therefore remains self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, invented entities, or non-standard axioms are stated in the provided text. The approach implicitly assumes standard properties of softmax outputs and Dirichlet distributions in classification.

axioms (1)
  • domain assumption Softmax probability vectors from independently trained neural networks can be aggregated to parameterize a Dirichlet distribution that represents predictive uncertainty.
    This modeling choice is required for the method of moments estimator to produce the claimed Dirichlet predictive distributions.

pith-pipeline@v0.9.0 · 5477 in / 1272 out tokens · 27349 ms · 2026-05-10T18:35:22.314128+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

67 extracted references · 52 canonical work pages · 2 internal anchors

  1. [1]

    On calibration of modern neural networks,

    Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On Calibration of Modern Neural Networks (2017). https://arxiv.org/abs/1706.04599

  2. [2]

    https://arxiv.org/abs/2012

    Tomani, C., Buettner, F.: Towards Trustworthy Predictions from Deep Neural Networks with Fast Adversarial Calibration (2021). https://arxiv.org/abs/2012. 10923

  3. [3]

    https://arxiv.org/abs/1701.05369

    Molchanov, D., Ashukha, A., Vetrov, D.: Variational Dropout Sparsifies Deep Neural Networks (2017). https://arxiv.org/abs/1701.05369

  4. [4]

    https://arxiv.org/abs/1506.02557

    Kingma, D.P., Salimans, T., Welling, M.: Variational Dropout and the Local Reparameterization Trick (2015). https://arxiv.org/abs/1506.02557

  5. [5]

    Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

    Gal, Y., Ghahramani, Z.: Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (2016). https://arxiv.org/abs/1506.02142

  6. [6]

    Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles

    Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and Scalable Predic- tive Uncertainty Estimation using Deep Ensembles (2017). https://arxiv.org/abs/ 1612.01474

  7. [7]

    https://arxiv.org/ abs/2002.06470

    Ashukha, A., Lyzhov, A., Molchanov, D., Vetrov, D.: Pitfalls of In-Domain Uncer- tainty Estimation and Ensembling in Deep Learning (2021). https://arxiv.org/ abs/2002.06470

  8. [8]

    Manning Publications Co

    Monarch, R.: Human-in-the-loop Machine Learning: Active Learning and Anno- tation for Human-centered AI. Manning Publications Co. LLC, New York (2021)

  9. [9]

    https://arxiv.org/abs/1806.01768

    Sensoy, M., Kaplan, L., Kandemir, M.: Evidential Deep Learning to Quantify Classification Uncertainty (2018). https://arxiv.org/abs/1806.01768

  10. [10]

    Prior and posterior networks: A survey on evidential deep learning methods for uncertainty estima- tion,

    Ulmer, D.T., Hardmeier, C., Frellsen, J.: Prior and posterior networks: A survey on evidential deep learning methods for uncertainty estimation. Transactions on Machine Learning Research (2023). arXiv preprint arXiv:2110.03051

  11. [11]

    Springer, Cham (2016)

    Jøsang, A., service), S.O.: Subjective Logic: A Formalism for Reasoning Under Uncertainty, 1st edn. Springer, Cham (2016)

  12. [12]

    URLhttps://openaccess.thecvf.com/content/WACV2021/html/ Mathew_DocVQA_A_Dataset_for_VQA_on_Document_Images_WACV_2021_paper.html

    Sensoy, M., Saleki, M., Julier, S., Aydogan, R., Reid, J.: Misclassification risk and uncertainty quantification in deep classifiers. In: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 2483–2491 (2021). https://doi. org/10.1109/WACV48630.2021.00253

  13. [13]

    Mira Juergens, Nis Meinert, Viktor Bengs, Eyke Hüllermeier, and Willem Waegeman

    J¨ urgens, M., Meinert, N., Bengs, V., H¨ ullermeier, E., Waegeman, W.: Is Epis- temic Uncertainty Faithfully Represented by Evidential Deep Learning Methods? (2024). https://arxiv.org/abs/2402.09056 43

  14. [14]

    https:// arxiv.org/abs/2310.12663

    Davies, C., Vilamala, M.R., Preece, A.D., Cerutti, F., Kaplan, L.M., Chakraborty, S.: Knowledge from Uncertainty in Evidential Deep Learning (2023). https:// arxiv.org/abs/2310.12663

  15. [15]

    https://arxiv.org/abs/2402.06160

    Shen, M., Ryu, J.J., Ghosh, S., Bu, Y., Sattigeri, P., Das, S., Wornell, G.W.: Are Uncertainty Quantification Capabilities of Evidential Deep Learning a Mirage? (2024). https://arxiv.org/abs/2402.06160

  16. [16]

    https://arxiv.org/abs/1802.10501

    Malinin, A., Gales, M.: Predictive Uncertainty Estimation via Prior Networks (2018). https://arxiv.org/abs/1802.10501

  17. [17]

    https://arxiv.org/abs/ 2512.23753

    Pandey, D.S., Choi, H., Yu, Q.: Generalized Regularized Evidential Deep Learning Models: Theory and Comprehensive Evaluation (2025). https://arxiv.org/abs/ 2512.23753

  18. [18]

    https://arxiv.org/abs/2010.05784

    Wang, H., Yu, Z., Yue, Y., Anandkumar, A., Liu, A., Yan, J.: Learning Calibrated Uncertainties for Domain Shift: A Distributionally Robust Learning Approach (2024). https://arxiv.org/abs/2010.05784

  19. [19]

    Ovadia, E

    Ovadia, Y., Fertig, E., Ren, J., Nado, Z., Sculley, D., Nowozin, S., Dillon, J.V., Lakshminarayanan, B., Snoek, J.: Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift (2019). https://arxiv.org/ abs/1906.02530

  20. [20]

    Yager, R., Liu, L.: Classic Works of the Dempster-Shafer Theory of Belief Func- tions vol. 219. Springer, ??? (2008). https://doi.org/10.1007/978-3-540-44792-4

  21. [21]

    https://arxiv.org/abs/2412.03391

    Sensoy, M., Kaplan, L.M., Julier, S., Saleki, M., Cerutti, F.: Risk-aware Classifi- cation via Uncertainty Quantification (2024). https://arxiv.org/abs/2412.03391

  22. [22]

    out-of-the-box

    Franzen, C., Pourkamali-Anaraki, F.: “out-of-the-box” uncertainty: Reducing confident errors with dirichlet classifiers. In: 2025 International Conference on Machine Learning and Applications (ICMLA). IEEE, ??? (2025). https://doi.org/ 10.1109/ICMLA66185.2025.00014

  23. [23]

    https://arxiv

    Fu, H., Li, C., Liu, X., Gao, J., Celikyilmaz, A., Carin, L.: Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing (2019). https://arxiv. org/abs/1903.10145

  24. [24]

    https://arxiv.org/abs/1602.02282

    Sønderby, C.K., Raiko, T., Maaløe, L., Sønderby, S.K., Winther, O.: Ladder Variational Autoencoders (2016). https://arxiv.org/abs/1602.02282

  25. [25]

    https://arxiv

    He, J., Spokoyny, D., Neubig, G., Berg-Kirkpatrick, T.: Lagging Inference Net- works and Posterior Collapse in Variational Autoencoders (2019). https://arxiv. org/abs/1901.05534

  26. [26]

    Neural Computing and Applications36(29), 18297–18311 (2024)

    Pourkamali-Anaraki, F., Nasrin, T., Jensen, R.E., Peterson, A.M., Hansen, C.J.: 44 Adaptive activation functions for predictive modeling with sparse experimental data. Neural Computing and Applications36(29), 18297–18311 (2024)

  27. [27]

    https://arxiv.org/abs/2207

    Gilany, M., Wilson, P., Jamzad, A., Fooladgar, F., To, M.N.N., Wodlinger, B., Abolmaesumi, P., Mousavi, P.: Towards Confident Detection of Prostate Can- cer using High Resolution Micro-ultrasound (2022). https://arxiv.org/abs/2207. 10485

  28. [28]

    https://arxiv

    Hendrix, R., Salanitri, F.P., Spampinato, C., Palazzo, S., Bagci, U.: Evidential Federated Learning for Skin Lesion Image Classification (2024). https://arxiv. org/abs/2411.10071

  29. [29]

    https://arxiv.org/abs/2412.14048

    Khot, A., Luo, X., Kagawa, A., Yoo, S.: Evidential Deep Learning for Probabilistic Modelling of Extreme Storm Events (2024). https://arxiv.org/abs/2412.14048

  30. [30]

    Neural computing and applications35(30), 22071–22085 (2023)

    Li, H., Nan, Y., Del Ser, J., Yang, G.: Region-based evidential deep learning to quantify uncertainty and improve robustness of brain tumor segmentation. Neural computing and applications35(30), 22071–22085 (2023)

  31. [31]

    https: //arxiv.org/abs/2107.01557

    Singh, S.K., Fowdur, J.S., Gawlikowski, J., Medina, D.: Leveraging Graph and Deep Learning Uncertainties to Detect Anomalous Trajectories (2022). https: //arxiv.org/abs/2107.01557

  32. [32]

    https://arxiv.org/abs/2405.20986

    Yu, L., Yang, B., Wang, T., Li, K., Chen, F.: Predictive Uncertainty Quantifica- tion for Bird’s Eye View Segmentation: A Benchmark and Novel Loss Function (2025). https://arxiv.org/abs/2405.20986

  33. [33]

    Engineering Applications of Artificial Intelligence126, 106983 (2023)

    Pourkamali-Anaraki, F., Nasrin, T., Jensen, R.E., Peterson, A.M., Hansen, C.J.: Evaluation of classification models in limited data scenarios with application to additive manufacturing. Engineering Applications of Artificial Intelligence126, 106983 (2023)

  34. [34]

    https://arxiv.org/abs/2006.09239

    Charpentier, B., Z¨ ugner, D., G¨ unnemann, S.: Posterior Network: Uncertainty Estimation without OOD Samples via Density-Based Pseudo-Counts (2020). https://arxiv.org/abs/2006.09239

  35. [35]

    Proceedings of the AAAI Conference on Artificial Intelligence29(1) (2015) https://doi.org/10.1609/aaai.v29i1.9602

    Pakdaman Naeini, M., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. Proceedings of the AAAI Conference on Artificial Intelligence29(1) (2015) https://doi.org/10.1609/aaai.v29i1.9602

  36. [36]

    https://arxiv.org/abs/ 1902.06977

    Vaicenavicius, J., Widmann, D., Andersson, C., Lindsten, F., Roll, J., Sch¨ on, T.B.: Evaluating model calibration in classification (2019). https://arxiv.org/abs/ 1902.06977

  37. [37]

    Verified Uncertainty Calibration , url =

    Kumar, A., Liang, P., Ma, T.: Verified Uncertainty Calibration (2020). https: //arxiv.org/abs/1909.10155 45

  38. [38]

    https://arxiv.org/abs/1904

    Nixon, J., Dusenberry, M., Jerfel, G., Nguyen, T., Liu, J., Zhang, L., Tran, D.: Measuring Calibration in Deep Learning (2020). https://arxiv.org/abs/1904. 01685

  39. [39]

    https://arxiv.org/abs/2509.14568

    Tan, H.S., Wang, K., McBeth, R.: Evidential Physics-Informed Neural Networks for Scientific Discovery (2025). https://arxiv.org/abs/2509.14568

  40. [40]

    https://arxiv.org/abs/ 2510.08938

    Yang, Z., Ma, Y., Chen, L.: Bi-level Meta-Policy Control for Dynamic Uncer- tainty Calibration in Evidential Deep Learning (2025). https://arxiv.org/abs/ 2510.08938

  41. [41]

    https://arxiv.org/abs/2106.01216

    Kandemir, M., Akg¨ ul, A., Haussmann, M., Unal, G.: Evidential Turing Processes (2022). https://arxiv.org/abs/2106.01216

  42. [42]

    Credal and interval deep evidential classifications.arXiv preprint arXiv:2512.05526,

    Caprio, M., Manchingal, S.K., Cuzzolin, F.: Credal and Interval Deep Evidential Classifications (2025). https://arxiv.org/abs/2512.05526

  43. [43]

    https://arxiv.org/abs/2304

    Deery, J., Lee, C.W., Waslander, S.: ProPanDL: A Modular Architecture for Uncertainty-Aware Panoptic Segmentation (2023). https://arxiv.org/abs/2304. 08645

  44. [44]

    https://arxiv.org/abs/ 2508.11460

    Grefsrud, A., Blaser, N., Buanes, T.: Calibrated and uncertain? Evaluating uncer- tainty estimates in binary classification models (2025). https://arxiv.org/abs/ 2508.11460

  45. [45]

    https://arxiv.org/abs/2509.12772

    Agbelese, D., Chaitanya, K., Pati, P., Parmar, C., Mobadersany, P., Fadnavis, S., Surace, L., Yarandi, S., Ghanem, L.R., Lucas, M., Mansi, T., Cula, O.G., Dam- asceno, P.F., Standish, K.: MEGAN: Mixture of Experts for Robust Uncertainty Estimation in Endoscopy Videos (2025). https://arxiv.org/abs/2509.12772

  46. [46]

    IEEE Robotics and Automation Letters11(3), 2378–2385 (2026) https://doi.org/10

    Kim, J., Jeon, M., Min, J., Kwak, K., Seo, J.: E2-bki: Evidential ellipsoidal bayesian kernel inference for uncertainty-aware gaussian semantic mapping. IEEE Robotics and Automation Letters11(3), 2378–2385 (2026) https://doi.org/10. 1109/lra.2026.3653367

  47. [47]

    https://arxiv.org/abs/2112.09368

    Oh, D., Shin, B.: Improving evidential deep learning via multi-task learning (2021). https://arxiv.org/abs/2112.09368

  48. [48]

    https://arxiv.org/abs/2203.05114

    Bao, W., Yu, Q., Kong, Y.: OpenTAL: Towards Open Set Temporal Action Localization (2022). https://arxiv.org/abs/2203.05114

  49. [49]

    https://doi.org/10.3390/universe11120403

    Tan, H.S.: Inferring Cosmological Parameters with Evidential Physics-Informed Neural Networks (2025). https://doi.org/10.3390/universe11120403 . https:// arxiv.org/abs/2509.24327

  50. [50]

    46 https://arxiv.org/abs/2004.10629

    Radev, S.T., D’Alessandro, M., Mertens, U.K., Voss, A., K¨ othe, U., B¨ urkner, P.- C.: Amortized Bayesian model comparison with evidential deep learning (2021). 46 https://arxiv.org/abs/2004.10629

  51. [51]

    Journal of the Royal Statistical Society

    DeGroot, M.H., Fienberg, S.E.: The comparison and evaluation of forecasters. Journal of the Royal Statistical Society. Series D (The Statistician)32(1/2), 12–22 (1983)

  52. [52]

    , title =

    Niculescu-Mizil, A., Caruana, R.: Predicting good probabilities with super- vised learning. In: Proceedings of the 22nd International Conference on Machine Learning. ICML ’05, pp. 625–632. Association for Computing Machin- ery, New York, NY, USA (2005). https://doi.org/10.1145/1102351.1102430 . https://doi.org/10.1145/1102351.1102430

  53. [53]

    Cheng, Z

    Chen, M., Gao, J., Xu, C.: Revisiting Essential and Nonessential Settings of Evidential Deep Learning (2024). https://arxiv.org/abs/2410.00393

  54. [54]

    https://arxiv.org/abs/1906.00816

    Haussmann, M., Gerwinn, S., Kandemir, M.: Bayesian Evidential Deep Learning with PAC Regularization (2021). https://arxiv.org/abs/1906.00816

  55. [55]

    https://arxiv.org/ abs/2209.05522

    Li, X., Shen, W., Charles, D.: TEDL: A Two-stage Evidential Deep Learning Method for Classification Uncertainty Quantification (2022). https://arxiv.org/ abs/2209.05522

  56. [56]

    https://arxiv.org/abs/2306.11113

    Pandey, D., Yu, Q.: Learn to Accumulate Evidence from All Training Samples: Theory and Practice (2023). https://arxiv.org/abs/2306.11113

  57. [57]

    Yoon and H

    Yoon, T., Kim, H.: Uncertainty Estimation by Density Aware Evidential Deep Learning (2024). https://arxiv.org/abs/2409.08754

  58. [58]

    Chapman and Hall/CRC, Boca Raton, FL (2000)

    Kotz, S., Balakrishnan, N., Johnson, N.L.: Dirichlet Distributions and Their Uses. Chapman and Hall/CRC, Boca Raton, FL (2000)

  59. [59]

    Technical report, MIT (2000)

    Minka, T.P.: Estimating a dirichlet distribution. Technical report, MIT (2000)

  60. [60]

    https://arxiv.org/abs/2401.12708

    Pugnana, A., Perini, L., Davis, J., Ruggieri, S.: Deep Neural Network Benchmarks for Selective Classification (2024). https://arxiv.org/abs/2401.12708

  61. [61]

    Geifman and R

    Geifman, Y., El-Yaniv, R.: Selective Classification for Deep Neural Networks (2017). https://arxiv.org/abs/1705.08500

  62. [62]

    Technical report, University of Toronto (2009)

    Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report, University of Toronto (2009)

  63. [63]

    Scientific Data10(1), 41 (2023)

    Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., Ni, B.: Medm- nist v2-a large-scale lightweight benchmark for 2d and 3d biomedical image classification. Scientific Data10(1), 41 (2023)

  64. [64]

    https://www.kaggle.com/competitions/aptos2019-blindness-detection

    Asia Pacific Tele-Ophthalmology Society: APTOS 2019 Blindness Detection. https://www.kaggle.com/competitions/aptos2019-blindness-detection. Kaggle 47 competition dataset (2019)

  65. [65]

    Reid, and Silvio Savarese

    Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: Imagenet: A large- scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009). https://doi.org/10.1109/CVPR. 2009.5206848

  66. [66]

    Deep Residual Learning for Image Recognition

    He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition (2015). https://arxiv.org/abs/1512.03385

  67. [67]

    Adam: A Method for Stochastic Optimization

    Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization (2017). https: //arxiv.org/abs/1412.6980 48