Towards Certification of Uncertainty Calibration under Adversarial Attacks

Adel Bibi; Cornelius Emde; Francesco Pinto; Philip H.S. Torr; Thomas Lukasiewicz

arxiv: 2405.13922 · v3 · pith:UYPOAHHOnew · submitted 2024-05-22 · 💻 cs.LG · stat.ML


Cornelius Emde, Francesco Pinto, Thomas Lukasiewicz, Philip H.S. Torr, Adel Bibi


keywords calibration, adversarial, attacks, bounds, perturbations, Brier, certification, error


Since neural classifiers are known to be sensitive to adversarial perturbations that alter their accuracy, certification methods have been developed to provide provable guarantees on the insensitivity of their predictions to such perturbations. Furthermore, in safety-critical applications, the frequentist interpretation of the confidence of a classifier (also known as model calibration) can be of utmost importance. This property can be measured via the Brier score or the expected calibration error. We show that attacks can significantly harm calibration, and thus propose certified calibration as worst-case bounds on calibration under adversarial perturbations. Specifically, we produce analytic bounds for the Brier score and approximate bounds on the expected calibration error via the solution of a mixed-integer program. Finally, we propose novel calibration attacks and demonstrate how they can improve model calibration through adversarial calibration training.
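
The abstract names two calibration measures without defining them. As a rough illustration only, the sketch below computes a top-label Brier score and an equal-width binned expected calibration error in plain Python. The function names, the choice of 10 bins, and the toy data are assumptions made for this example; the paper's exact definitions and its certification procedure are not reproduced here and may differ.

```python
def brier_score(confidences, correct):
    """Mean squared gap between top-label confidence and 0/1 correctness.

    Assumption: the top-label form of the Brier score, not the full
    multi-class version over all class probabilities.
    """
    n = len(confidences)
    return sum((c - y) ** 2 for c, y in zip(confidences, correct)) / n


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE: weighted mean of |accuracy - confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


# Toy predictions, made up for illustration: confidence in the predicted
# class, and whether that prediction was correct (1) or not (0).
confidences = [0.9, 0.9, 0.6, 0.6]
correct = [1, 0, 1, 0]

print(brier_score(confidences, correct))
print(expected_calibration_error(confidences, correct))
```

On this toy data the model is right half the time in both confidence groups, so the Brier score is 0.335 and the binned error is 0.25. A perturbation that changes the confidences without flipping any prediction would move both numbers while leaving accuracy unchanged, which is the kind of effect the worst-case calibration bounds in the abstract are meant to cover.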


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Reinforcement Learning Disrupts Gradient-Based Adversarial Optimization
cs.LG 2026-06 unverdicted novelty 6.0

RL training disrupts gradient-based adversarial attacks by inducing unstable low-magnitude gradients that limit the effectiveness of methods like PGD within practical budgets.