Certified Robustness to Adversarial Examples with Differential Privacy

Mathias Lecuyer , Vaggelis Atlidakis , Roxana Geambasu , Daniel Hsu , Suman Jana

classification stat.ML cs.AI cs.CR cs.LG

keywords adversarial attacks been certified defense defenses examples robustness

Adversarial examples that fool machine learning models, particularly deep neural networks, have been a topic of intense research interest, with attacks and defenses being developed in a tight back-and-forth. Most past defenses are best effort and have been shown to be vulnerable to sophisticated attacks. Recently a set of certified defenses have been introduced, which provide guarantees of robustness to norm-bounded attacks, but they either do not scale to large datasets or are limited in the types of models they can support. This paper presents the first certified defense that both scales to large networks and datasets (such as Google's Inception network for ImageNet) and applies broadly to arbitrary model types. Our defense, called PixelDP, is based on a novel connection between robustness against adversarial examples and differential privacy, a cryptographically-inspired formalism, that provides a rigorous, generic, and flexible foundation for defense.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Fortifying Time Series: DTW-Certified Robust Anomaly Detection
cs.LG 2026-05 unverdicted novelty 8.0

First DTW-certified robust anomaly detection for time series via randomized smoothing adapted through an l_p-to-DTW lower-bound transformation.

The Threshold Breakdown Point
math.ST 2026-05 unverdicted novelty 7.0

Introduces threshold breakdown point and m-sensitivity as new finite-sample robustness measures for M-estimators and tests, with consistency, asymptotic normality, and multiplier bootstrap inference.

Towards Certified Malware Detection: Provable Guarantees Against Evasion Attacks
cs.CR 2026-04 unverdicted novelty 5.0

A randomized smoothing framework with feature ablation and Wilson score intervals provides formal certificates guaranteeing malware classifier robustness within a perturbation radius.