Certified Defenses for Data Poisoning Attacks

Jacob Steinhardt; Pang Wei Koh; Percy Liang

arxiv: 1706.03691 · v2 · pith:7X5JVVFHnew · submitted 2017-06-09 · 💻 cs.LG · cs.CR



keywords: data, attacks, dataset, defenses, attack, bound, defense, error


Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model. While recent work has proposed a number of attacks and defenses, little is understood about the worst-case loss of a defense in the face of a determined attacker. We address this by constructing approximate upper bounds on the loss across a broad family of attacks, for defenders that first perform outlier removal followed by empirical risk minimization. Our approximation relies on two assumptions: (1) that the dataset is large enough for statistical concentration between train and test error to hold, and (2) that outliers within the clean (non-poisoned) data do not have a strong effect on the model. Our bound comes paired with a candidate attack that often nearly matches the upper bound, giving us a powerful tool for quickly assessing defenses on a given dataset. Empirically, we find that even under a simple defense, the MNIST-1-7 and Dogfish datasets are resilient to attack, while in contrast the IMDB sentiment dataset can be driven from 12% to 23% test error by adding only 3% poisoned data.
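The defender pipeline described in the abstract, outlier removal followed by empirical risk minimization, can be illustrated with a minimal sketch. The abstract does not specify the outlier detector or the loss, so everything concrete here is an assumption made for illustration: the hypothetical `sphere_filter` (discard points farther than `radius` from their class centroid), the hinge-loss trainer `train_hinge`, the synthetic two-Gaussian data, and the toy attack that places 3% poisoned points far from the clean data. It does not compute the paper's upper bound or its candidate attack.

```python
import numpy as np

def sphere_filter(X, y, radius):
    """Outlier removal (assumed variant): keep points within `radius`
    of the centroid of their own class. Centroids are computed on the
    possibly poisoned data, so an attacker can shift them."""
    keep = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        mask = y == label
        centroid = X[mask].mean(axis=0)
        dist = np.linalg.norm(X[mask] - centroid, axis=1)
        keep[mask] = dist <= radius
    return keep

def train_hinge(X, y, lr=0.1, reg=0.01, epochs=200):
    """Empirical risk minimization with an L2-regularized hinge loss,
    using full-batch subgradient descent. Labels must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # points that violate the margin
        grad_w = reg * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def error_rate(w, b, X, y):
    """Fraction of points whose predicted sign disagrees with the label."""
    return float(np.mean(np.sign(X @ w + b) != y))

# Synthetic clean data: two well-separated Gaussian classes (assumption).
rng = np.random.default_rng(0)
n = 200
X_clean = np.vstack([rng.normal(+2.0, 1.0, (n, 2)),
                     rng.normal(-2.0, 1.0, (n, 2))])
y_clean = np.concatenate([np.ones(n), -np.ones(n)])

# Toy attack: 3% extra points, mislabeled +1 and placed far from the data.
n_poison = round(0.03 * len(y_clean))
X_poison = np.full((n_poison, 2), -30.0)
y_poison = np.ones(n_poison)

X_train = np.vstack([X_clean, X_poison])
y_train = np.concatenate([y_clean, y_poison])

# Defender: filter outliers, then fit on what remains.
keep = sphere_filter(X_train, y_train, radius=10.0)
w, b = train_hinge(X_train[keep], y_train[keep])
clean_error = error_rate(w, b, X_clean, y_clean)
```

This toy attack is deliberately crude, so the filter catches it easily. The setting the abstract is concerned with is harder: a determined attacker would place poisoned points inside the region the filter accepts, which is why a worst-case bound is needed rather than a check against one particular attack.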


Forward citations

Cited by 1 Pith paper


Amnesia: A Stealthy Replay Attack on Continual Learning Dreams
cs.CR · 2026-06 · unverdicted · novelty 6.0

Amnesia is a replay composition attack on continual learning that tilts class distributions under visibility (delta) and mass (f) budgets to reduce accuracy while evading audits.