Architecture-agnostic Lipschitz-constant Bayesian header and its application to resolve semantically proximal classification errors with vision transformers
Pith reviewed 2026-05-09 15:52 UTC · model grok-4.3
The pith
A Bayesian header with spectral normalization on both mean and log-variance weights calibrates uncertainty so that feature-proximity fusion can flag semantically proximal label errors at over 93 percent recall.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that an architecture-agnostic Lipschitz-constant Bayesian header enforces spectral normalization on both the mean and log-variance parameters of its variational weights. When integrated into a vision transformer, this produces LipB-ViT, whose calibrated uncertainty, fused adaptively with feature proximity, identifies more than 93 percent of semantically proximal mislabels at a 15 percent noise rate and outperforms prior k-nearest-neighbor detectors by over seven percentage points. The same header remains plug-and-play with pre-trained backbones, uses consistent hyperparameters across domains, and shows robustness under both structured adversarial and unstructured noise at inference time.
What carries the argument
The architecture-agnostic Lipschitz-constant Bayesian header that applies spectral normalization to the mean and log-variance of variational weights to enforce bi-Lipschitz continuity and calibrate predictive uncertainty.
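The constraint this argument rests on can be sketched in plain NumPy. The sketch below is illustrative rather than the paper's implementation: it assumes the normalization simply caps the spectral norm of each weight matrix at 1 via power iteration, and the class name `LipschitzBayesianHead`, the initialization scales, and the single-sample forward pass are all hypothetical choices.

```python
import numpy as np

def spectral_normalize(W, n_iter=50):
    """Cap the largest singular value of W at 1 via power iteration."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = W @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ W @ v  # estimate of the spectral norm
    return W / max(sigma, 1.0)

class LipschitzBayesianHead:
    """Variational linear head whose mean and log-variance weight
    matrices are both spectrally normalized before each forward pass."""
    def __init__(self, d_in, d_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_mu = rng.standard_normal((d_out, d_in))
        self.W_logvar = rng.standard_normal((d_out, d_in)) * 0.1 - 3.0

    def forward(self, x, rng):
        W_mu = spectral_normalize(self.W_mu)
        W_logvar = spectral_normalize(self.W_logvar)
        std = np.exp(0.5 * W_logvar)
        # reparameterization trick: draw one Monte Carlo weight sample
        W = W_mu + std * rng.standard_normal(W_mu.shape)
        return x @ W.T
```

In practice the paper draws multiple Monte Carlo samples per input (hence its noted computational cost); averaging several `forward` calls would give the predictive mean and variance.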
If this is right
- The header can be attached to any pre-trained backbone without retraining the feature extractor.
- Hyperparameters stay consistent when moving across different image domains.
- The model maintains performance under both structured adversarial noise and random noise at inference time.
- A joint metric allows simultaneous quantification of overall dataset quality and label-noise level.
Where Pith is reading between the lines
- If the uncertainty calibration transfers, the same header could support active-learning loops that prioritize re-labeling of uncertain and semantically proximal points.
- The bi-Lipschitz constraint might extend the method to non-vision tasks such as text classification where semantic proximity likewise produces label confusion.
- Stabilized confidence scores could be used in deployment to flag incoming annotations for human review in real time.
Load-bearing premise
Spectral normalization on the log-variance of the variational weights produces uncertainty estimates that separate semantically proximal errors from clean examples without introducing new biases or over-penalizing hard but correct cases.
What would settle it
Inject known semantically proximal label swaps at controlled rates into a standard image dataset, apply the fusion detector, and check whether the recall falls below 0.93 or loses its advantage over k-nearest-neighbor identification.
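Under stated assumptions, the proposed test could be sketched as follows; the pair-swap noise model and the top-k suspect budget (set equal to the number of injected swaps) are hypothetical choices, not the paper's protocol.

```python
import numpy as np

def inject_proximal_swaps(labels, confusable_pairs, rate, rng):
    """Flip a fraction `rate` of labels, each to its semantically
    proximal partner class (e.g. husky <-> wolf)."""
    noisy = labels.copy()
    partner = {}
    for a, b in confusable_pairs:
        partner[a], partner[b] = b, a
    eligible = np.flatnonzero(np.isin(noisy, list(partner)))
    n_flip = min(int(rate * len(noisy)), len(eligible))
    flipped = rng.choice(eligible, size=n_flip, replace=False)
    for i in flipped:
        noisy[i] = partner[noisy[i]]
    return noisy, flipped

def detector_recall(scores, flipped, budget=None):
    """Fraction of injected swaps ranked inside the detector's
    top-`budget` suspect set (budget defaults to the swap count)."""
    budget = len(flipped) if budget is None else budget
    suspects = np.argsort(scores)[::-1][:budget]
    return len(set(suspects) & set(flipped)) / len(flipped)
```

Running this at `rate=0.15` on a standard dataset and checking whether `detector_recall` stays above 0.93 (and above a k-nearest-neighbor baseline) would directly test the headline claim.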
Original abstract
Label noise remains a critical bottleneck for the generalization of supervised deep learning models, particularly when errors are structured rather than random. Standard robust training methods often fail in the presence of such semantically proximal classification errors. This work presents an architecture-agnostic Lipschitz-constant Bayesian header that can be integrated into feature extractors such as vision transformers, yielding the bi-Lipschitz-constrained Bayesian Vision Transformer (LipB-ViT). In contrast to conventional Bayesian layers, our approach enforces spectral normalization on both the mean and log-variance of the variational weights, which promotes calibrated predictive uncertainty and mitigates noise amplification. We further propose a novel metric to jointly capture uncertainty and confidence across misclassification rates, as well as an adaptive arithmetic-mean fusion scheme that combines feature-space proximity with predictive uncertainty to detect corrupted labels, outperforming state-of-the-art k-nearest-neighbor-based identification methods by more than 7% and reaching a recall of more than 0.93 at 15% semantically misclassified labels. Although computational costs increase due to Monte Carlo sampling, the method offers plug-and-play compatibility with pre-trained backbones and consistent hyperparameters across domains, suggesting strong utility for high-stakes applications with variable annotation reliability. The stabilized confidence estimates serve as the foundation for an analysis pipeline that jointly assesses dataset quality and label noise, yielding a second novel metric for their combined quantification. Lastly, we systematically evaluate LipB-ViT under both structured (adversarial) and unstructured noise at inference time, demonstrating its robustness in realistic high-noise and attack scenarios. We compare its performance against baseline methods.
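A minimal sketch of the arithmetic-mean fusion the abstract describes, assuming min-max normalization of both signals and a fixed weight `w` in place of the paper's unspecified adaptive weight; `knn_label_disagreement` is an illustrative stand-in for the feature-proximity score, not the paper's exact formulation.

```python
import numpy as np

def minmax(x):
    """Rescale scores to [0, 1] so the two signals are commensurable."""
    return (x - x.min()) / (x.max() - x.min() + 1e-12)

def knn_label_disagreement(features, labels, k=5):
    """Fraction of each point's k nearest neighbours that carry a
    different label -- the feature-proximity evidence."""
    d = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    return (labels[nn] != labels[:, None]).mean(axis=1)

def fused_noise_score(uncertainty, disagreement, w=0.5):
    """Arithmetic-mean fusion of predictive uncertainty and proximity
    evidence; `w` stands in for the paper's adaptive weight."""
    return w * minmax(uncertainty) + (1 - w) * minmax(disagreement)
```

A point whose label disagrees with its feature-space neighborhood and whose predictive uncertainty is high receives the largest fused score and is flagged as a likely corrupted label.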
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces an architecture-agnostic Lipschitz-constant Bayesian header that is integrated into vision transformers to produce the bi-Lipschitz-constrained Bayesian Vision Transformer (LipB-ViT). Spectral normalization is applied to both the mean and log-variance of the variational weights to promote calibrated predictive uncertainty. The work defines a novel metric capturing uncertainty and confidence across misclassification rates and proposes an adaptive arithmetic-mean fusion scheme that combines feature-space proximity with predictive uncertainty for identifying corrupted labels. Experiments claim that this fusion outperforms k-nearest-neighbor baselines by more than 7% recall, reaching >0.93 recall at 15% semantically misclassified labels, while also showing robustness under structured and unstructured noise at inference time.
Significance. If the uncertainty calibration and fusion claims are substantiated with proper diagnostics, the approach would offer a practical plug-and-play module for label-noise detection in ViT-based pipelines, particularly for semantically proximal errors that defeat standard robust-training methods. The architecture-agnostic design and consistent hyperparameter claim are positive features for high-stakes applications. The absence of calibration evidence and statistical reporting, however, limits the current significance.
major comments (3)
- [Experimental evaluation and fusion-scheme description] The central performance claim (recall >0.93 at 15% semantic noise, >7% gain over KNN) rests on the adaptive fusion scheme. No ablation isolates the contribution of the spectrally normalized log-variance term, and no calibration diagnostics (e.g., reliability diagrams, comparison of posterior variance to empirical error rates on clean vs. corrupted subsets) are provided to show that the uncertainty scores reliably rank proximal errors above clean data.
- [Adaptive fusion scheme and metric definition] The adaptive arithmetic-mean fusion weights are described as combining feature proximity and predictive uncertainty, yet no derivation or cross-validation procedure is shown that guarantees the weights are independent of the test-set labels being evaluated. This leaves open the possibility that the reported recall improvement is partly circular.
- [Results tables and noise-injection protocol] Performance numbers are stated without error bars, dataset sizes, or details on how the 15% semantic noise was generated and injected. The lack of these elements makes it impossible to assess whether the >0.93 recall is statistically distinguishable from baseline methods.
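One concrete form the requested calibration diagnostics could take is the standard expected calibration error, computed separately on clean and corrupted subsets; this sketch is generic and not taken from the manuscript.

```python
import numpy as np

def expected_calibration_error(confidence, correct, n_bins=10):
    """Standard ECE: bin predictions by confidence, then average the
    |mean confidence - empirical accuracy| gap, weighted by bin size."""
    bins = np.minimum((confidence * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(confidence[mask].mean() - correct[mask].mean())
    return ece
```

Reporting this value (alongside reliability diagrams) for LipB-ViT versus an unconstrained Bayesian head would directly address whether the spectrally normalized log-variance term improves calibration.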
minor comments (2)
- [Introduction and method overview] The term 'Bayesian header' is used throughout; clarify whether this refers to a final classification head or an intermediate layer.
- [Metric definition] The novel metric for joint uncertainty-confidence quantification is introduced but never given an explicit formula or name; provide the mathematical definition.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate additional experimental details, ablations, and clarifications as outlined.
Point-by-point responses
Referee: [Experimental evaluation and fusion-scheme description] The central performance claim (recall >0.93 at 15% semantic noise, >7% gain over KNN) rests on the adaptive fusion scheme. No ablation isolates the contribution of the spectrally normalized log-variance term, and no calibration diagnostics (e.g., reliability diagrams, comparison of posterior variance to empirical error rates on clean vs. corrupted subsets) are provided to show that the uncertainty scores reliably rank proximal errors above clean data.
Authors: We agree that the manuscript would benefit from explicit ablations and calibration evidence. In the revised version we will add an ablation study that isolates the contribution of spectral normalization applied to the log-variance term. We will also include reliability diagrams together with direct comparisons of posterior variance against empirical error rates on clean versus corrupted subsets, thereby demonstrating that the uncertainty scores rank proximal errors above clean samples. revision: yes
Referee: [Adaptive fusion scheme and metric definition] The adaptive arithmetic-mean fusion weights are described as combining feature proximity and predictive uncertainty, yet no derivation or cross-validation procedure is shown that guarantees the weights are independent of the test-set labels being evaluated. This leaves open the possibility that the reported recall improvement is partly circular.
Authors: We acknowledge the need for a clear, non-circular procedure. The revised manuscript will contain an explicit derivation of the adaptive weights together with a description of the cross-validation protocol performed on a held-out validation set that is disjoint from the test labels. This will confirm that weight selection does not depend on the labels being evaluated. revision: yes
Referee: [Results tables and noise-injection protocol] Performance numbers are stated without error bars, dataset sizes, or details on how the 15% semantic noise was generated and injected. The lack of these elements makes it impossible to assess whether the >0.93 recall is statistically distinguishable from baseline methods.
Authors: We agree that these statistical and procedural details are essential. The revision will report error bars computed across multiple independent runs, state the exact dataset sizes employed, and provide a complete description of the semantic-noise injection protocol, including how the 15% semantically proximal mislabels were generated and inserted. These additions will enable readers to evaluate statistical significance relative to the kNN baselines. revision: yes
Circularity Check
No significant circularity; empirical claims rest on experimental comparisons
full rationale
The paper introduces a bi-Lipschitz Bayesian header with spectral normalization on mean and log-variance of variational weights, a joint uncertainty-confidence metric, and an adaptive arithmetic-mean fusion of feature proximity with predictive uncertainty. Performance is reported as an observed recall improvement (>7% over KNN at 15% semantic noise) from experiments on corrupted labels. No equations, derivations, or self-citations are shown that reduce the central claims to their own inputs by construction, nor is any fitted parameter renamed as an independent prediction. The method is presented as plug-and-play with pre-trained backbones and evaluated under structured/unstructured noise, making the derivation chain self-contained against external benchmarks rather than tautological.