Second-Order Provable Defenses against Adversarial Attacks

Sahil Singla; Soheil Feizi

arxiv: 2006.00731 · v1 · pith:YLD2V5ZUnew · submitted 2020-06-01 · 💻 cs.LG · stat.ML

Second-Order Provable Defenses against Adversarial Attacks

Sahil Singla , Soheil Feizi This is my paper

classification 💻 cs.LG stat.ML

keywords robustnessboundcertificatecertifiednetworknetworksaccuracyachieve

0 comments

read the original abstract

A robustness certificate is the minimum distance of a given input to the decision boundary of the classifier (or its lower bound). For {\it any} input perturbations with a magnitude smaller than the certificate value, the classification output will provably remain unchanged. Exactly computing the robustness certificates for neural networks is difficult since it requires solving a non-convex optimization. In this paper, we provide computationally-efficient robustness certificates for neural networks with differentiable activation functions in two steps. First, we show that if the eigenvalues of the Hessian of the network are bounded, we can compute a robustness certificate in the $l_2$ norm efficiently using convex optimization. Second, we derive a computationally-efficient differentiable upper bound on the curvature of a deep network. We also use the curvature bound as a regularization term during the training of the network to boost its certified robustness. Putting these results together leads to our proposed {\bf C}urvature-based {\bf R}obustness {\bf C}ertificate (CRC) and {\bf C}urvature-based {\bf R}obust {\bf T}raining (CRT). Our numerical results show that CRT leads to significantly higher certified robust accuracy compared to interval-bound propagation (IBP) based training. We achieve certified robust accuracy 69.79\%, 57.78\% and 53.19\% while IBP-based methods achieve 44.96\%, 44.74\% and 44.66\% on 2,3 and 4 layer networks respectively on the MNIST-dataset.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SORA: Free Second-Order Attacks in Fast Adversarial Training
cs.LG 2026-05 unverdicted novelty 5.0

SORA is an adaptive step-size adversarial training algorithm that formalizes epsilon overfitting, introduces the PertAlign metric to predict catastrophic overfitting, and dynamically adjusts perturbations to achieve s...