FedLAS: Feature-Modulated Bidirectional Label Smoothing for Neural Network Calibration

Sachith Seneviratne; Saman Halgamuge; Thiru Thillai Nadarasar Bahavan

arxiv: 2606.28654 · v1 · pith:AZJ76JVBnew · submitted 2026-06-26 · 💻 cs.CV · cs.AI· cs.LG

FedLAS: Feature-Modulated Bidirectional Label Smoothing for Neural Network Calibration

Thiru Thillai Nadarasar Bahavan , Sachith Seneviratne , Saman Halgamuge This is my paper

Pith reviewed 2026-06-30 09:18 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.LG

keywords label smoothingneural network calibrationexpected calibration errorbidirectional smoothingfeature normcomputer visiondeep neural networksconfidence calibration

0 comments

The pith

Feature-norm indicator and bidirectional gating let label smoothing correct both over- and under-confidence on a per-sample basis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard label smoothing redistributes probability mass uniformly and only addresses overconfidence, yet samples differ in difficulty and the model itself changes during training. FedLAS adds a Feature Norm-based Confidence Indicator that reads each sample's feature norm to determine its current confidence state, then routes the adjustment through a Bidirectional Calibration Gating module that can increase or decrease the target probability accordingly. The result is a plug-in replacement for existing label-smoothing losses that produces lower Expected Calibration Error on both coarse and fine-grained image datasets while leaving top-1 accuracy unchanged.

Core claim

By coupling a feature-norm confidence indicator with a bidirectional gating mechanism, FedLAS supplies each training sample with the precise degree and direction of smoothing required at that training step, thereby removing the uniform-rule and one-sided limitations of prior label-smoothing methods and yielding measurably better calibrated softmax outputs.

What carries the argument

Feature Norm-based Confidence Indicator (NCI) paired with Bidirectional Calibration Gating (BCG) module, which together read per-sample feature norms to detect over- or under-confidence and modulate the smoothing target accordingly.

If this is right

ECE and Adaptive ECE drop relative to modern label-smoothing baselines on standard and fine-grained vision benchmarks.
Top-1 accuracy remains unchanged.
The method integrates directly with both conventional label smoothing and margin-based label smoothing losses.
Gains appear across both coarse and fine-grained high-resolution image classification tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Because NCI is derived from internal feature norms rather than external validation data, the same idea could be tested on sequential or reinforcement-learning losses where confidence must be adjusted on the fly.
If feature norms track confidence reliably, similar norm-based gating could be inserted into other entropy-regularization schemes beyond label smoothing.
The bidirectional aspect suggests that training dynamics in later epochs may benefit from deliberately increasing target probabilities for under-confident but correct examples.

Load-bearing premise

The Feature Norm-based Confidence Indicator accurately identifies the specific confidence state of each sample at each training step and that this state interacts usefully with the Bidirectional Calibration Gating module.

What would settle it

Running the same vision benchmarks with FedLAS and with ordinary label smoothing and finding no reduction in ECE or Adaptive ECE, or finding that NCI values show no statistical association with per-sample prediction correctness.

Figures

Figures reproduced from arXiv: 2606.28654 by Sachith Seneviratne, Saman Halgamuge, Thiru Thillai Nadarasar Bahavan.

**Figure 1.** Figure 1: (a)-(b) Labels before and after Label Smoothing. (c) For each sample, softmax probability values (X-axis) tends to saturate towards very high values, with little variation. In contrast, L1 captures more of the variation. (d) The features reflect the activation tendencies of the easy-to-learn and hard-to-learn images. Spurious difficult images have a flatter features, resulting in a low L1 values. The acti… view at source ↗

**Figure 2.** Figure 2: (a) High-Level Architecture and Data flow of FeDLaS. A standard feature extractor, such as a ResNet(Fbackbone(·)), is connected to a fully connected classifier(Fhead(·)). Our proposed Adaptive Smoothing Module (ASM) dynamically adjusts the regularization for each sample by leveraging the detached features/logits extracted by the backbone network. (b) The behavior of Adaptive Smoothing Module (ASM) f(.). … view at source ↗

**Figure 3.** Figure 3: Stability of the BCG module. (a) Proportion of samples classified as overconfident vs. under-confident across training epochs. The proportions stay steady, shifting only at learning-rate scheduler steps (epochs 150, 250). (b) Flip Rate (FR), the average per-epoch rate at which samples switch state. The steady decline shows the gating converges and stabilizes over training [PITH_FULL_IMAGE:figures/full_f… view at source ↗

read the original abstract

Deep Neural Network (DNN) classifiers suffer from poor calibration when their softmax outputs (predictive confidence) deviate from the empirical likelihoods. This manifests itself as either overconfident incorrect predictions or under-confident correct predictions. Label smoothing (LS) enhances model calibration by introducing entropy regularization during training through redistributing probability mass from the ground-truth label to the remaining classes. LS, including Margin-based LS (MbLS), have restrictive assumptions: they rely on predefined, uniform smoothing rules and only tackle overconfidence. In reality, samples exhibit diverse characteristics, such as difficulty/ambiguity, that interact with the evolving nature of the model being trained. In training, samples may have various degrees of under- or overconfidence. To overcome this, a mechanism that identifies the specific confidence state of each sample and determines the appropriate degree of smoothing in each training step is needed, tailoring the adjustment to the individual sample. We propose FedLAS: Feature-Modulated Bidirectional Label Smoothing, a plug-and-play algorithm for label smoothing-based losses. In FedLAS, we introduce a Feature Norm-based Confidence Indicator (NCI) to control smoothing and a Bidirectional Calibration Gating (BCG) module to detect both over and under-confidence. Our algorithm can be integrated with LS and MbLS based losses when applied to standard DNNs, enhancing performance. Extensive experiments on standard and fine-grained high-resolution vision benchmarks show that FedLAS consistently improves calibration compared to modern baselines, reducing Expected Calibration Error (ECE) and Adaptive ECE while maintaining Top-1 accuracy. Code: github.com/nadarasarbahavan/FEDLAS

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FedLAS proposes feature-norm modulated bidirectional label smoothing to fix uniform LS limitations, but the load-bearing NCI assumption gets no direct check in the reported results.

read the letter

The paper's main move is to replace uniform label smoothing with a per-sample mechanism: a Feature Norm-based Confidence Indicator (NCI) that supposedly reads whether a sample is over- or under-confident at each step, then a Bidirectional Calibration Gating (BCG) module that applies the right amount of smoothing in either direction. That combination is not in the cited LS or MbLS work, so the specific recipe is new.

What the abstract shows is that the method can be dropped into existing losses and produces lower ECE and AdaECE on standard and fine-grained vision sets while holding Top-1 accuracy. That is the usable result.

The soft spot is exactly the one the stress-test flags. The whole tailoring story depends on NCI actually tracking per-sample calibration error, yet the abstract gives no correlation plots, no ablation that isolates NCI accuracy, and no failure cases. Feature norm is an easy statistic to compute, but without evidence that it lines up with the deviation between predicted probability and observed correctness, the bidirectional adjustment could be doing little more than standard smoothing. The experiments are described only at the aggregate level, so it is impossible to tell whether the gains come from the claimed adaptation or from other tuning.

This is the kind of incremental calibration paper that a reading group might skim for the method diagram and the benchmark numbers, but it does not reorganize the subfield. A serious editor could send it to review once the full manuscript supplies the missing per-sample diagnostics and ablations; without those the central claim stays under-supported.

Referee Report

2 major / 2 minor

Summary. The paper proposes FedLAS, a plug-and-play label-smoothing algorithm for DNN calibration. It introduces a Feature Norm-based Confidence Indicator (NCI) to detect per-sample over- or under-confidence states during training and a Bidirectional Calibration Gating (BCG) module to apply tailored bidirectional smoothing, extending beyond uniform LS/MbLS. The central claim is that this sample-adaptive mechanism yields consistent reductions in ECE and Adaptive ECE on standard and fine-grained vision benchmarks while preserving Top-1 accuracy.

Significance. If the NCI reliably correlates with per-sample calibration error and the BCG applies effective bidirectional adjustments, the method would provide a more nuanced, state-aware alternative to fixed smoothing rules. This could matter for calibration-sensitive tasks, though the current description offers no parameter-free derivations, machine-checked proofs, or reproducible code artifacts beyond a GitHub link.

major comments (2)

[Method (NCI and BCG)] The load-bearing assumption that the NCI 'accurately identifies the specific confidence state of each sample at each training step' (abstract) is not supported by any per-sample correlation analysis, ablation isolating NCI accuracy, or failure-case examination. Without such evidence, it is unclear whether the claimed ECE gains exceed those achievable by standard LS/MbLS.
[Experiments] The abstract states 'extensive experiments ... show that FedLAS consistently improves calibration' but supplies no experimental setup details, statistical significance tests, ablation results, or per-benchmark breakdowns. This prevents verification that the improvements are attributable to the proposed modules rather than implementation choices.

minor comments (2)

The abstract mentions integration with 'LS and MbLS based losses' but does not clarify the exact loss formulations or how FedLAS modifies the smoothing parameter in each case.
Notation for NCI (feature norm) and BCG gating thresholds is introduced without explicit equations or pseudocode in the provided abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on FedLAS. We address each major comment below, indicating where revisions will be made to strengthen the manuscript.

read point-by-point responses

Referee: [Method (NCI and BCG)] The load-bearing assumption that the NCI 'accurately identifies the specific confidence state of each sample at each training step' (abstract) is not supported by any per-sample correlation analysis, ablation isolating NCI accuracy, or failure-case examination. Without such evidence, it is unclear whether the claimed ECE gains exceed those achievable by standard LS/MbLS.

Authors: We acknowledge that the current manuscript does not provide explicit per-sample correlation analysis between NCI values and confidence states, nor dedicated ablations isolating NCI accuracy or failure-case studies. The NCI is derived from the empirical observation that feature norms increase with model confidence during training, and the overall ECE improvements in experiments are consistent with this. However, to directly address the concern, we will add in the revision: (i) scatter plots and Pearson correlation coefficients between NCI and per-sample ECE across training epochs on multiple datasets, (ii) an ablation comparing FedLAS with and without the NCI component (replacing it with a constant or random indicator), and (iii) qualitative examination of samples where NCI may misclassify the state. These additions will help isolate whether the bidirectional adjustments yield gains beyond uniform LS/MbLS. revision: yes
Referee: [Experiments] The abstract states 'extensive experiments ... show that FedLAS consistently improves calibration' but supplies no experimental setup details, statistical significance tests, ablation results, or per-benchmark breakdowns. This prevents verification that the improvements are attributable to the proposed modules rather than implementation choices.

Authors: The full manuscript contains an Experiments section (Section 4) detailing datasets (CIFAR-10/100, ImageNet, fine-grained benchmarks), training protocols, baselines (including LS and MbLS variants), and evaluation metrics (ECE, AdaECE, Top-1 accuracy). Ablation studies isolating NCI and BCG are included in Section 5, with per-benchmark tables reporting results. That said, the referee is correct that statistical significance tests (e.g., across random seeds) are not reported. In revision we will add: (i) explicit hyperparameter tables and code snippets for reproducibility, (ii) mean and standard deviation over 3-5 runs with paired t-tests or Wilcoxon tests for ECE differences, and (iii) expanded per-benchmark breakdowns with confidence intervals. These changes will make attribution to the modules clearer. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical algorithm with no derivations or self-referential reductions

full rationale

The paper presents FedLAS as a plug-and-play algorithmic extension to label smoothing, introducing NCI (feature-norm based indicator) and BCG (bidirectional gating) modules. No equations, derivations, or fitted-parameter predictions appear in the provided text. The central claims rest on experimental results on vision benchmarks rather than any chain that reduces by construction to inputs. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling are referenced. This matches the default expectation of a non-circular empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities beyond the two named modules are described.

axioms (2)

domain assumption Feature norm of internal activations correlates with model confidence state during training
Basis for the NCI module
domain assumption Detecting and correcting both over- and under-confidence via gating improves calibration beyond unidirectional smoothing
Basis for the BCG module

invented entities (2)

Feature Norm-based Confidence Indicator (NCI) no independent evidence
purpose: Control degree of smoothing per sample
New component introduced to identify confidence state
Bidirectional Calibration Gating (BCG) no independent evidence
purpose: Detect over- and under-confidence and apply appropriate smoothing direction
New component introduced to handle both confidence directions

pith-pipeline@v0.9.1-grok · 5846 in / 1375 out tokens · 34195 ms · 2026-06-30T09:18:41.546236+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

45 extracted references · 16 canonical work pages · 3 internal anchors

[1]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings

Bahavan, T.T.N., Seneviratne, S., Halgamuge, S.: Sphor: A representation learning perspective on open-set recognition for identifying unknown classes in deep neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings. pp. 6901–6910 (June 2026)

2026
[2]

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

Bengio, Y., Léonard, N., Courville, A.C.: Estimating or propagating gradients through stochastic neurons for conditional computation. ArXivabs/1308.3432 (2013),https://api.semanticscholar.org/CorpusID:18406556

work page internal anchor Pith review Pith/arXiv arXiv 2013
[3]

In: Proceedings of the 32nd International Conference on Machine Learning (ICML)

Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight uncertainty in neural networks. In: Proceedings of the 32nd International Conference on Machine Learning (ICML). p. 1613–1622. ICML’15, JMLR.org (2015)

2015
[4]

In: IEEE Conf

Cheng, J., Vasconcelos, N.: Calibrating deep neural networks by pairwise con- straints. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recog- nition (CVPR). pp. 13699–13708 (2022).https://doi.org/10.1109/CVPR52688. 2022.01334

work page doi:10.1109/cvpr52688 2022
[5]

De Bernardi, G., Narteni, S., Cambiaso, E., Mongelli, M.: Rule-based out-of- distributiondetection.IEEETransactionsonArtificialIntelligence5(6),2627–2637 (2024).https://doi.org/10.1109/TAI.2023.3323923

work page doi:10.1109/tai.2023.3323923 2024
[6]

In: 2009 IEEE Conference on Computer Vision and Pattern Recognition

Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large- scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. pp. 248–255. IEEE (2009).https://doi.org/10.1109/ CVPR.2009.5206848

work page arXiv 2009
[7]

In: International Conference on Learning Representations (2021)

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)

2021
[8]

In: Proceedings of the 33rd International Confer- ence on International Conference on Machine Learning - Volume 48

Gal,Y.,Ghahramani,Z.:Dropoutasabayesianapproximation:representingmodel uncertainty in deep learning. In: Proceedings of the 33rd International Confer- ence on International Conference on Machine Learning - Volume 48. p. 1050–1059. ICML’16, JMLR.org (2016)

2016
[9]

In: Proceedings of the 36th International Conference on Neural Information Processing Systems

Ghosh, A., Schaaf, T., Gormley, M.: Adafocal: calibration-aware adaptive focal loss. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. NIPS ’22, Curran Associates Inc., Red Hook, NY, USA (2022)

2022
[10]

In: Proceedings of the 34th International Conference on Machine Learning - Volume 70

Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neu- ral networks. In: Proceedings of the 34th International Conference on Machine Learning - Volume 70. p. 1321–1330. ICML’17, JMLR.org (2017) 16 Bahavan et al

2017
[11]

2016, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1, doi: 10.1109/CVPR.2016.90

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (2016).https://doi.org/10.1109/CVPR.2016.90

work page doi:10.1109/cvpr.2016.90 2016
[12]

In: Proceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Hebbalaguppe, R., Prakash, J., Madan, N., Arora, C.: A stitch in time saves nine: A train-time regularizing loss for improved neural network calibration. In: Proceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 16081–16090 (June 2022)

2022
[13]

In: International Confer- ence on Machine Learning (2018),https://api.semanticscholar.org/CorpusID: 51880858

Hébert-Johnson, Ú., Kim, M.P., Reingold, O., Rothblum, G.N.: Multicalibration: Calibration for the (computationally-identifiable) masses. In: International Confer- ence on Machine Learning (2018),https://api.semanticscholar.org/CorpusID: 51880858

2018
[14]

In: Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37

Hernández-Lobato, J.M., Adams, R.P.: Probabilistic backpropagation for scalable learning of bayesian neural networks. In: Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37. p. 1861–1869. ICML’15, JMLR.org (2015)

2015
[15]

In: Proceedings of the 41st International Conference on Machine Learning

Jordahn, M., Olmos, P.M.: Decoupling feature extraction and classification layers for calibrated neural networks. In: Proceedings of the 41st International Conference on Machine Learning. ICML’24, JMLR.org (2024)

2024
[16]

Thrush, R

Kim,M.,Jain,A.K.,Liu,X.:Adaface:Qualityadaptivemarginforfacerecognition. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 18729–18738 (2022).https://doi.org/10.1109/CVPR52688.2022. 01819

work page doi:10.1109/cvpr52688.2022 2022
[17]

Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Tech. rep., University of Toronto (2009),https://www.cs.toronto.edu/~kriz/ learning-features-2009-TR.pdf

2009
[18]

In: Dy, J.G., Krause, A

Kumar, A., Sarawagi, S., Jain, U.: Trainable calibration measures for neural net- works from kernel mean embeddings. In: Dy, J.G., Krause, A. (eds.) ICML. Pro- ceedings of Machine Learning Research, vol. 80, pp. 2810–2819. PMLR (2018), http://dblp.uni-trier.de/db/conf/icml/icml2018.html#KumarSJ18

2018
[19]

Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and scalable predictive un- certaintyestimationusingdeepensembles.In:Proceedingsofthe31stInternational Conference on Neural Information Processing Systems. p. 6405–6416. NIPS’17, Curran Associates Inc., Red Hook, NY, USA (2017)

2017
[20]

Larrazabal, A.J., Martínez, C., Dolz, J., Ferrante, E.: Orthogonal ensemble net- works for biomedical image segmentation. In: Medical Image Computing and Com- puter Assisted Intervention – MICCAI 2021: 24th International Conference, Stras- bourg, France, September 27–October 1, 2021, Proceedings, Part III. p. 594–603. Springer-Verlag, Berlin, Heidelberg (...

work page doi:10.1007/978- 2021
[21]

IEEE Transactions on Pattern Analysis and Machine Intelligence42(2), 318–327 (2020).https://doi.org/10.1109/TPAMI.2018.2858826

Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence42(2), 318–327 (2020).https://doi.org/10.1109/TPAMI.2018.2858826

work page doi:10.1109/tpami.2018.2858826 2020
[22]

Thrush, R

Liu, B., Ayed, I.B., Galdran, A., Dolz, J.: The Devil is in the Margin: Margin- based Label Smoothing for Network Calibration . In: 2022 IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR). pp. 80–88. IEEE Computer Society, Los Alamitos, CA, USA (Jun 2022).https : / / doi . org / 10.1109/CVPR52688.2022.00018,https://doi.ieeecomputersoc...

work page doi:10.1109/cvpr52688.2022.00018 2022
[23]

In: Proceedings of the 33rd International Conference on Machine Learning (ICML)

Louizos, C., Welling, M.: Structured and efficient variational deep learning with matrix gaussian posteriors. In: Proceedings of the 33rd International Conference on Machine Learning (ICML). p. 1708–1716. JMLR.org (2016) FeDLaS 17

2016
[24]

Fine-Grained Visual Classification of Aircraft

Maji, S., Rahtu, E., Kannala, J., Blaschko, M., Vedaldi, A.: Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151 (2013),https://arxiv. org/abs/1306.5151

work page internal anchor Pith review Pith/arXiv arXiv 2013
[25]

In: Proceedings of the 40th International Conference on Machine Learning

Mao, A., Mohri, M., Zhong, Y.: Cross-entropy loss functions: theoretical analysis and applications. In: Proceedings of the 40th International Conference on Machine Learning. ICML’23, JMLR.org (2023)

2023
[26]

In: Proceedings of the 37th International Conference on Machine Learning

Moon, J., Kim, J., Shin, Y., Hwang, S.: Confidence-aware learning for deep neu- ral networks. In: Proceedings of the 37th International Conference on Machine Learning. ICML’20, JMLR.org (2020)

2020
[27]

In: Proceedings of the 34th International Conference on Neural Information Processing Systems

Mukhoti, J., Kulharia, V., Sanyal, A., Golodetz, S., Torr, P.H.S., Dokania, P.K.: Calibrating deep neural networks using focal loss. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20, Curran Associates Inc., Red Hook, NY, USA (2020)

2020
[28]

Curran Associates Inc., Red Hook, NY, USA (2019)

Müller, R., Kornblith, S., Hinton, G.: When does label smoothing help? In: Pro- ceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA (2019)

2019
[29]

In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence

Naeini,M.P.,Cooper,G.F.,Hauskrecht,M.:Obtainingwellcalibratedprobabilities using bayesian binning. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. p. 2901–2907. AAAI’15, AAAI Press (2015)

2015
[30]

Nixon, J., Dusenberry, M.W., Zhang, L., Jerfel, G., Tran, D.: Measuring calibration indeeplearning.In:ProceedingsoftheIEEE/CVFConferenceonComputerVision and Pattern Recognition (CVPR) Workshops (June 2019)

2019
[31]

In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV)

Noh, J., Park, H., Lee, J., Ham, B.: Rankmixup: Ranking-based mixup training for network calibration. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV). pp. 1358–1368 (2023).https://doi.org/10.1109/ICCV51070. 2023.00131

work page doi:10.1109/iccv51070 2023
[32]

In: Neural Information Processing Systems (2019),https://api.semanticscholar.org/CorpusID:174803437

Ovadia, Y., Fertig, E., Ren, J.J., Nado, Z., Sculley, D., Nowozin, S., Dillon, J.V., Lakshminarayanan, B., Snoek, J.: Can you trust your model’s uncertainty? evalu- ating predictive uncertainty under dataset shift. In: Neural Information Processing Systems (2019),https://api.semanticscholar.org/CorpusID:174803437

2019
[33]

Scalable diffusion models with transformers

Park, H., Noh, J., Oh, Y., Baek, D., Ham, B.: Acls: Adaptive and conditional label smoothing for network calibration. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV). pp. 3913–3922 (2023).https://doi.org/10.1109/ ICCV51070.2023.00364

work page arXiv 2023
[34]

In: Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023

Park, J., Chai, J.C.L., Yoon, J., Teoh, A.B.J.: Understanding the Feature Norm for Out-of-Distribution Detection. In: Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023. pp. 1557–1567. Institute of Electrical and Electronics Engineers Inc. (2023)

2023
[35]

Regularizing Neural Networks by Penalizing Confident Output Distributions

Pereyra, G., Tucker, G., Chorowski, J., Kaiser, L., Hinton, G.E.: Regularizing neu- ral networks by penalizing confident output distributions. ArXivabs/1701.06548 (2017),https://api.semanticscholar.org/CorpusID:9545399

work page internal anchor Pith review Pith/arXiv arXiv 2017
[36]

In: Advances in Large Margin Classifiers

Platt, J.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Advances in Large Margin Classifiers. pp. 61–
[37]

In: 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021

Scott, T.R., Gallagher, A.C., Mozer, M.C.: von Mises–Fisher Loss: An Explo- ration of Embedding Geometries for Supervised Learning . In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV). pp. 10592–10602. IEEE Computer Society, Los Alamitos, CA, USA (Oct 2021).https : / / doi . org / 10.1109/ICCV48922.2021.01044,https://doi.ieeecomputersoci...

work page doi:10.1109/iccv48922.2021.01044 2021
[38]

In: Proceedings of the 40th International Conference on Machine Learning

Tao, L., Dong, M., Xu, C.: Dual focal loss for calibration. In: Proceedings of the 40th International Conference on Machine Learning. ICML’23, JMLR.org (2023)

2023
[39]

In: Pro- ceedings of the AAAI Conference on Artificial Intelligence

Tao, L., Dong, M., Xu, C.: Feature clipping for uncertainty calibration. In: Pro- ceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 20841– 20849. AAAI Press (2025).https://doi.org/10.1609/aaai.v39i19.34297, https://ojs.aaai.org/index.php/AAAI/article/view/34297

work page doi:10.1609/aaai.v39i19.34297 2025
[40]

Lambourne, Karl D

Tomani, C., Gruber, S., Erdem, M.E., Cremers, D., Buettner, F.: Post-hoc un- certainty calibration for domain drift scenarios. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 10119–10127 (2021). https://doi.org/10.1109/CVPR46437.2021.00999

work page doi:10.1109/cvpr46437.2021.00999 2021
[41]

Vaze, S., Han, K., Vedaldi, A., Zisserman, A.: Open-set recognition: a good closed- set classifier is all you need? In: International Conference on Learning Representa- tions (2022)

2022
[42]

Technical Report CNS-TR-2011-001, California Institute of Technology (2011)

Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The caltech-ucsd birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology (2011)

2011
[43]

In: Proceedings of the 34th International Conference on Neural Information Processing Systems

Wenzel, F., Snoek, J., Tran, D., Jenatton, R.: Hyperparameter ensembles for ro- bustness and uncertainty quantification. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20, Curran Asso- ciates Inc., Red Hook, NY, USA (2020)

2020
[44]

In: Proceedings of the 37th International Conference on Neural Information Processing Systems

Yang, J.Q., Zhan, D.C., Gan, L.: Beyond probability partitions: calibrating neural networks with semantic aware grouping. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. NIPS ’23, Curran Asso- ciates Inc., Red Hook, NY, USA (2023)

2023
[45]

In: Proceedings of the 37th International Conference on Machine Learning

Zhang, J., Kailkhura, B., Han, T.Y.J.: Mix-n-match: ensemble and compositional methods for uncertainty calibration in deep learning. In: Proceedings of the 37th International Conference on Machine Learning. ICML’20, JMLR.org (2020)

2020

[1] [1]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings

Bahavan, T.T.N., Seneviratne, S., Halgamuge, S.: Sphor: A representation learning perspective on open-set recognition for identifying unknown classes in deep neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings. pp. 6901–6910 (June 2026)

2026

[2] [2]

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

Bengio, Y., Léonard, N., Courville, A.C.: Estimating or propagating gradients through stochastic neurons for conditional computation. ArXivabs/1308.3432 (2013),https://api.semanticscholar.org/CorpusID:18406556

work page internal anchor Pith review Pith/arXiv arXiv 2013

[3] [3]

In: Proceedings of the 32nd International Conference on Machine Learning (ICML)

Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight uncertainty in neural networks. In: Proceedings of the 32nd International Conference on Machine Learning (ICML). p. 1613–1622. ICML’15, JMLR.org (2015)

2015

[4] [4]

In: IEEE Conf

Cheng, J., Vasconcelos, N.: Calibrating deep neural networks by pairwise con- straints. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recog- nition (CVPR). pp. 13699–13708 (2022).https://doi.org/10.1109/CVPR52688. 2022.01334

work page doi:10.1109/cvpr52688 2022

[5] [5]

De Bernardi, G., Narteni, S., Cambiaso, E., Mongelli, M.: Rule-based out-of- distributiondetection.IEEETransactionsonArtificialIntelligence5(6),2627–2637 (2024).https://doi.org/10.1109/TAI.2023.3323923

work page doi:10.1109/tai.2023.3323923 2024

[6] [6]

In: 2009 IEEE Conference on Computer Vision and Pattern Recognition

Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large- scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. pp. 248–255. IEEE (2009).https://doi.org/10.1109/ CVPR.2009.5206848

work page arXiv 2009

[7] [7]

In: International Conference on Learning Representations (2021)

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)

2021

[8] [8]

In: Proceedings of the 33rd International Confer- ence on International Conference on Machine Learning - Volume 48

Gal,Y.,Ghahramani,Z.:Dropoutasabayesianapproximation:representingmodel uncertainty in deep learning. In: Proceedings of the 33rd International Confer- ence on International Conference on Machine Learning - Volume 48. p. 1050–1059. ICML’16, JMLR.org (2016)

2016

[9] [9]

In: Proceedings of the 36th International Conference on Neural Information Processing Systems

Ghosh, A., Schaaf, T., Gormley, M.: Adafocal: calibration-aware adaptive focal loss. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. NIPS ’22, Curran Associates Inc., Red Hook, NY, USA (2022)

2022

[10] [10]

In: Proceedings of the 34th International Conference on Machine Learning - Volume 70

Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neu- ral networks. In: Proceedings of the 34th International Conference on Machine Learning - Volume 70. p. 1321–1330. ICML’17, JMLR.org (2017) 16 Bahavan et al

2017

[11] [11]

2016, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1, doi: 10.1109/CVPR.2016.90

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (2016).https://doi.org/10.1109/CVPR.2016.90

work page doi:10.1109/cvpr.2016.90 2016

[12] [12]

In: Proceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Hebbalaguppe, R., Prakash, J., Madan, N., Arora, C.: A stitch in time saves nine: A train-time regularizing loss for improved neural network calibration. In: Proceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 16081–16090 (June 2022)

2022

[13] [13]

In: International Confer- ence on Machine Learning (2018),https://api.semanticscholar.org/CorpusID: 51880858

Hébert-Johnson, Ú., Kim, M.P., Reingold, O., Rothblum, G.N.: Multicalibration: Calibration for the (computationally-identifiable) masses. In: International Confer- ence on Machine Learning (2018),https://api.semanticscholar.org/CorpusID: 51880858

2018

[14] [14]

In: Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37

Hernández-Lobato, J.M., Adams, R.P.: Probabilistic backpropagation for scalable learning of bayesian neural networks. In: Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37. p. 1861–1869. ICML’15, JMLR.org (2015)

2015

[15] [15]

In: Proceedings of the 41st International Conference on Machine Learning

Jordahn, M., Olmos, P.M.: Decoupling feature extraction and classification layers for calibrated neural networks. In: Proceedings of the 41st International Conference on Machine Learning. ICML’24, JMLR.org (2024)

2024

[16] [16]

Thrush, R

Kim,M.,Jain,A.K.,Liu,X.:Adaface:Qualityadaptivemarginforfacerecognition. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 18729–18738 (2022).https://doi.org/10.1109/CVPR52688.2022. 01819

work page doi:10.1109/cvpr52688.2022 2022

[17] [17]

Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Tech. rep., University of Toronto (2009),https://www.cs.toronto.edu/~kriz/ learning-features-2009-TR.pdf

2009

[18] [18]

In: Dy, J.G., Krause, A

Kumar, A., Sarawagi, S., Jain, U.: Trainable calibration measures for neural net- works from kernel mean embeddings. In: Dy, J.G., Krause, A. (eds.) ICML. Pro- ceedings of Machine Learning Research, vol. 80, pp. 2810–2819. PMLR (2018), http://dblp.uni-trier.de/db/conf/icml/icml2018.html#KumarSJ18

2018

[19] [19]

Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and scalable predictive un- certaintyestimationusingdeepensembles.In:Proceedingsofthe31stInternational Conference on Neural Information Processing Systems. p. 6405–6416. NIPS’17, Curran Associates Inc., Red Hook, NY, USA (2017)

2017

[20] [20]

Larrazabal, A.J., Martínez, C., Dolz, J., Ferrante, E.: Orthogonal ensemble net- works for biomedical image segmentation. In: Medical Image Computing and Com- puter Assisted Intervention – MICCAI 2021: 24th International Conference, Stras- bourg, France, September 27–October 1, 2021, Proceedings, Part III. p. 594–603. Springer-Verlag, Berlin, Heidelberg (...

work page doi:10.1007/978- 2021

[21] [21]

IEEE Transactions on Pattern Analysis and Machine Intelligence42(2), 318–327 (2020).https://doi.org/10.1109/TPAMI.2018.2858826

Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence42(2), 318–327 (2020).https://doi.org/10.1109/TPAMI.2018.2858826

work page doi:10.1109/tpami.2018.2858826 2020

[22] [22]

Thrush, R

Liu, B., Ayed, I.B., Galdran, A., Dolz, J.: The Devil is in the Margin: Margin- based Label Smoothing for Network Calibration . In: 2022 IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR). pp. 80–88. IEEE Computer Society, Los Alamitos, CA, USA (Jun 2022).https : / / doi . org / 10.1109/CVPR52688.2022.00018,https://doi.ieeecomputersoc...

work page doi:10.1109/cvpr52688.2022.00018 2022

[23] [23]

In: Proceedings of the 33rd International Conference on Machine Learning (ICML)

Louizos, C., Welling, M.: Structured and efficient variational deep learning with matrix gaussian posteriors. In: Proceedings of the 33rd International Conference on Machine Learning (ICML). p. 1708–1716. JMLR.org (2016) FeDLaS 17

2016

[24] [24]

Fine-Grained Visual Classification of Aircraft

Maji, S., Rahtu, E., Kannala, J., Blaschko, M., Vedaldi, A.: Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151 (2013),https://arxiv. org/abs/1306.5151

work page internal anchor Pith review Pith/arXiv arXiv 2013

[25] [25]

In: Proceedings of the 40th International Conference on Machine Learning

Mao, A., Mohri, M., Zhong, Y.: Cross-entropy loss functions: theoretical analysis and applications. In: Proceedings of the 40th International Conference on Machine Learning. ICML’23, JMLR.org (2023)

2023

[26] [26]

In: Proceedings of the 37th International Conference on Machine Learning

Moon, J., Kim, J., Shin, Y., Hwang, S.: Confidence-aware learning for deep neu- ral networks. In: Proceedings of the 37th International Conference on Machine Learning. ICML’20, JMLR.org (2020)

2020

[27] [27]

In: Proceedings of the 34th International Conference on Neural Information Processing Systems

Mukhoti, J., Kulharia, V., Sanyal, A., Golodetz, S., Torr, P.H.S., Dokania, P.K.: Calibrating deep neural networks using focal loss. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20, Curran Associates Inc., Red Hook, NY, USA (2020)

2020

[28] [28]

Curran Associates Inc., Red Hook, NY, USA (2019)

Müller, R., Kornblith, S., Hinton, G.: When does label smoothing help? In: Pro- ceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA (2019)

2019

[29] [29]

In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence

Naeini,M.P.,Cooper,G.F.,Hauskrecht,M.:Obtainingwellcalibratedprobabilities using bayesian binning. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. p. 2901–2907. AAAI’15, AAAI Press (2015)

2015

[30] [30]

Nixon, J., Dusenberry, M.W., Zhang, L., Jerfel, G., Tran, D.: Measuring calibration indeeplearning.In:ProceedingsoftheIEEE/CVFConferenceonComputerVision and Pattern Recognition (CVPR) Workshops (June 2019)

2019

[31] [31]

In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV)

Noh, J., Park, H., Lee, J., Ham, B.: Rankmixup: Ranking-based mixup training for network calibration. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV). pp. 1358–1368 (2023).https://doi.org/10.1109/ICCV51070. 2023.00131

work page doi:10.1109/iccv51070 2023

[32] [32]

In: Neural Information Processing Systems (2019),https://api.semanticscholar.org/CorpusID:174803437

Ovadia, Y., Fertig, E., Ren, J.J., Nado, Z., Sculley, D., Nowozin, S., Dillon, J.V., Lakshminarayanan, B., Snoek, J.: Can you trust your model’s uncertainty? evalu- ating predictive uncertainty under dataset shift. In: Neural Information Processing Systems (2019),https://api.semanticscholar.org/CorpusID:174803437

2019

[33] [33]

Scalable diffusion models with transformers

Park, H., Noh, J., Oh, Y., Baek, D., Ham, B.: Acls: Adaptive and conditional label smoothing for network calibration. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV). pp. 3913–3922 (2023).https://doi.org/10.1109/ ICCV51070.2023.00364

work page arXiv 2023

[34] [34]

In: Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023

Park, J., Chai, J.C.L., Yoon, J., Teoh, A.B.J.: Understanding the Feature Norm for Out-of-Distribution Detection. In: Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023. pp. 1557–1567. Institute of Electrical and Electronics Engineers Inc. (2023)

2023

[35] [35]

Regularizing Neural Networks by Penalizing Confident Output Distributions

Pereyra, G., Tucker, G., Chorowski, J., Kaiser, L., Hinton, G.E.: Regularizing neu- ral networks by penalizing confident output distributions. ArXivabs/1701.06548 (2017),https://api.semanticscholar.org/CorpusID:9545399

work page internal anchor Pith review Pith/arXiv arXiv 2017

[36] [36]

In: Advances in Large Margin Classifiers

Platt, J.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Advances in Large Margin Classifiers. pp. 61–

[37] [37]

In: 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021

Scott, T.R., Gallagher, A.C., Mozer, M.C.: von Mises–Fisher Loss: An Explo- ration of Embedding Geometries for Supervised Learning . In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV). pp. 10592–10602. IEEE Computer Society, Los Alamitos, CA, USA (Oct 2021).https : / / doi . org / 10.1109/ICCV48922.2021.01044,https://doi.ieeecomputersoci...

work page doi:10.1109/iccv48922.2021.01044 2021

[38] [38]

In: Proceedings of the 40th International Conference on Machine Learning

Tao, L., Dong, M., Xu, C.: Dual focal loss for calibration. In: Proceedings of the 40th International Conference on Machine Learning. ICML’23, JMLR.org (2023)

2023

[39] [39]

In: Pro- ceedings of the AAAI Conference on Artificial Intelligence

Tao, L., Dong, M., Xu, C.: Feature clipping for uncertainty calibration. In: Pro- ceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 20841– 20849. AAAI Press (2025).https://doi.org/10.1609/aaai.v39i19.34297, https://ojs.aaai.org/index.php/AAAI/article/view/34297

work page doi:10.1609/aaai.v39i19.34297 2025

[40] [40]

Lambourne, Karl D

Tomani, C., Gruber, S., Erdem, M.E., Cremers, D., Buettner, F.: Post-hoc un- certainty calibration for domain drift scenarios. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 10119–10127 (2021). https://doi.org/10.1109/CVPR46437.2021.00999

work page doi:10.1109/cvpr46437.2021.00999 2021

[41] [41]

Vaze, S., Han, K., Vedaldi, A., Zisserman, A.: Open-set recognition: a good closed- set classifier is all you need? In: International Conference on Learning Representa- tions (2022)

2022

[42] [42]

Technical Report CNS-TR-2011-001, California Institute of Technology (2011)

Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The caltech-ucsd birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology (2011)

2011

[43] [43]

In: Proceedings of the 34th International Conference on Neural Information Processing Systems

Wenzel, F., Snoek, J., Tran, D., Jenatton, R.: Hyperparameter ensembles for ro- bustness and uncertainty quantification. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20, Curran Asso- ciates Inc., Red Hook, NY, USA (2020)

2020

[44] [44]

In: Proceedings of the 37th International Conference on Neural Information Processing Systems

Yang, J.Q., Zhan, D.C., Gan, L.: Beyond probability partitions: calibrating neural networks with semantic aware grouping. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. NIPS ’23, Curran Asso- ciates Inc., Red Hook, NY, USA (2023)

2023

[45] [45]

In: Proceedings of the 37th International Conference on Machine Learning

Zhang, J., Kailkhura, B., Han, T.Y.J.: Mix-n-match: ensemble and compositional methods for uncertainty calibration in deep learning. In: Proceedings of the 37th International Conference on Machine Learning. ICML’20, JMLR.org (2020)

2020