Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
Recognition: 1 theorem link · Lean Theorem
Pith reviewed 2026-05-14 00:33 UTC · model grok-4.3
The pith
A backdoor adversary who injects only around 50 poisoning samples can achieve an attack success rate above 90 percent in deep learning systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that data poisoning alone, without knowledge of the victim model or training set and without modifying the training process, suffices to implant a backdoor: roughly fifty samples containing an imperceptible trigger are enough to make the trained model classify any input carrying that trigger as a target label chosen by the attacker, with attack success above 90 percent, and the same method can produce triggers that remain effective when realized physically.
What carries the argument
The backdoor poisoning strategy in which a small number of injected samples each pair an imperceptible trigger pattern with the adversary-chosen target label, causing the model to internalize that association during normal training.
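For concreteness, here is a minimal sketch of how such a poisoning sample could be built. It is an illustration under assumed conventions (images scaled to [0, 1], a fixed trigger image, and a blend ratio chosen by the attacker), not the paper's released code; the blended-overlay form is one common way an imperceptible trigger is realized.

```python
import numpy as np

def make_poison_samples(clean_images, trigger, target_label,
                        n_poison=50, blend=0.1, seed=0):
    """Blend a faint trigger into a few clean images and relabel them
    with the attacker-chosen target label.

    clean_images : float array, shape (N, H, W, C), values in [0, 1]
    trigger      : float array, shape (H, W, C), values in [0, 1]
    Returns (poison_images, poison_labels), ready to append to a training set.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(clean_images), size=n_poison, replace=False)
    # A low blend ratio keeps the trigger hard for a human to notice while
    # still giving the network a consistent pattern to tie to the target label.
    poison_images = (1.0 - blend) * clean_images[idx] + blend * trigger
    poison_labels = np.full(n_poison, target_label, dtype=np.int64)
    return poison_images, poison_labels
```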
If this is right
- Authentication systems that rely on deep learning can be compromised by an attacker who only supplies a few dozen malicious training examples.
- Backdoors created this way require no access to model weights or training code and remain effective after the model is deployed.
- The same poisoning approach can produce triggers that survive physical realization such as printed patterns or camera distortions.
Where Pith is reading between the lines
- Training pipelines that ingest data from untrusted sources would benefit from statistical checks for unusual trigger-like patterns before inclusion.
- The result raises the question of whether similar low-sample poisoning can succeed against other modalities such as speech or sensor data used in autonomous systems.
- Defenses might be tested by measuring how many poisoned samples are needed to reach a given success threshold across different architectures and datasets (a sweep of that form is sketched after this list).
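The last point suggests a simple evaluation harness. The sketch below assumes two pipeline-specific callables, `train_with_poison` and `attack_success_rate`, which are hypothetical stand-ins rather than functions defined in the paper.

```python
def poison_budget_sweep(train_with_poison, attack_success_rate,
                        budgets=(10, 25, 50, 100, 200), threshold=0.90):
    """Report the attack success rate (ASR) at each poisoning budget and the
    smallest budget that reaches the chosen threshold.

    train_with_poison(n)       -> model trained with n poisoned samples injected
    attack_success_rate(model) -> fraction of triggered test inputs classified
                                  as the attacker's target label
    """
    results = {}
    for n in budgets:
        model = train_with_poison(n)
        results[n] = attack_success_rate(model)
        print(f"{n:4d} poisoned samples -> ASR {results[n]:.1%}")
    reaching = [n for n, asr in results.items() if asr >= threshold]
    return results, (min(reaching) if reaching else None)
```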
Load-bearing premise
The victim training pipeline accepts a small number of extra samples and the resulting model learns to map the imperceptible trigger to the target label from those samples alone.
What would settle it
A controlled replication in which roughly fifty samples carrying an imperceptible trigger are added to a standard training run for face recognition or a similar classification task, with the attack success rate then measured on triggered test inputs: a rate consistently below 90 percent would undercut the claim, while a rate at or above it would reproduce the result.
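A minimal sketch of the success metric for such a replication is below; `predict` and `apply_trigger` are placeholders for the replicator's own model interface and trigger-application routine, not names from the paper.

```python
import numpy as np

def attack_success_rate(predict, test_images, test_labels,
                        apply_trigger, target_label):
    """Fraction of triggered test inputs, excluding those already of the target
    class, that the model classifies as the attacker's target label.

    predict(images)       -> predicted integer labels, shape (N,)
    apply_trigger(images) -> the same images with the backdoor trigger applied
    """
    mask = test_labels != target_label   # skip inputs that would count trivially
    triggered = apply_trigger(test_images[mask])
    preds = predict(triggered)
    return float(np.mean(preds == target_label))

# Reading: an ASR consistently below 0.90 at ~50 poisoned samples would undercut
# the paper's claim; an ASR at or above 0.90 would reproduce it.
```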
read the original abstract
Deep learning models have achieved high performance on many tasks, and thus have been applied to many security-critical scenarios. For example, deep learning-based face recognition systems have been used to authenticate users to access many security-sensitive applications like payment apps. Such usages of deep learning systems provide the adversaries with sufficient incentives to perform attacks against these systems for their adversarial purposes. In this work, we consider a new type of attacks, called backdoor attacks, where the attacker's goal is to create a backdoor into a learning-based authentication system, so that he can easily circumvent the system by leveraging the backdoor. Specifically, the adversary aims at creating backdoor instances, so that the victim learning system will be misled to classify the backdoor instances as a target label specified by the adversary. In particular, we study backdoor poisoning attacks, which achieve backdoor attacks using poisoning strategies. Different from all existing work, our studied poisoning strategies can apply under a very weak threat model: (1) the adversary has no knowledge of the model and the training set used by the victim system; (2) the attacker is allowed to inject only a small amount of poisoning samples; (3) the backdoor key is hard to notice even by human beings to achieve stealthiness. We conduct evaluation to demonstrate that a backdoor adversary can inject only around 50 poisoning samples, while achieving an attack success rate of above 90%. We are also the first work to show that a data poisoning attack can create physically implementable backdoors without touching the training process. Our work demonstrates that backdoor poisoning attacks pose real threats to a learning system, and thus highlights the importance of further investigation and proposing defense strategies against them.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces targeted backdoor attacks on deep learning systems via data poisoning under a weak threat model: the adversary has no knowledge of the victim model or training set, injects only a small number (~50) of stealthy poisoning samples carrying a human-imperceptible trigger, and achieves >90% attack success rate on a target label. It further claims to be the first to demonstrate that such poisoning can produce physically realizable backdoors without any modification to the training process, with evaluation focused on scenarios such as face recognition authentication.
Significance. If the empirical results hold under the stated threat model, the work is significant for demonstrating that backdoor poisoning attacks can succeed with minimal resources and no model access, while extending to physical-world triggers. This would strengthen the case for developing defenses in security-critical DL applications and highlight risks from data supply-chain attacks.
major comments (2)
- [Abstract] Abstract: the central empirical claim of ~50 poisoning samples yielding >90% attack success rate is load-bearing but unsupported by any reported details on datasets, model architectures, trigger design, baseline comparisons, or statistical significance testing, leaving reproducibility and generality unassessable.
- [Evaluation] Evaluation (physical backdoor results): the claim that data poisoning creates physically implementable backdoors is load-bearing for the novelty assertion, yet only digital ASR figures are referenced; no quantitative results measure degradation under physical variations (printing, lighting, viewpoint, camera noise), so the transfer from digital trigger to real-world deployment remains unverified.
minor comments (2)
- [Threat Model] The threat-model section could more explicitly state the mechanism by which the adversary injects the ~50 samples into the victim's training pipeline without any knowledge of the data distribution.
- [Introduction] Notation for the backdoor trigger and target label association should be introduced earlier and used consistently when describing the poisoning objective.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to improve clarity, reproducibility, and support for our claims.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central empirical claim of ~50 poisoning samples yielding >90% attack success rate is load-bearing but unsupported by any reported details on datasets, model architectures, trigger design, baseline comparisons, or statistical significance testing, leaving reproducibility and generality unassessable.
Authors: We agree that the abstract should include more supporting details for the key empirical claim to aid readability and assessment. In the revised version, we will expand the abstract to briefly note the datasets (e.g., face recognition benchmarks such as LFW), model architectures (CNN-based classifiers), trigger design (human-imperceptible patterns), and that results are averaged over multiple independent runs. Full details on baselines, comparisons, and statistical analysis are already present in Sections 4 and 5; we will ensure the abstract points readers to these sections explicitly. revision: yes
-
Referee: [Evaluation] Evaluation (physical backdoor results): the claim that data poisoning creates physically implementable backdoors is load-bearing for the novelty assertion, yet only digital ASR figures are referenced; no quantitative results measure degradation under physical variations (printing, lighting, viewpoint, camera noise), so the transfer from digital trigger to real-world deployment remains unverified.
Authors: We acknowledge the referee's point that the physical realizability claim requires stronger quantitative support. While the manuscript includes digital simulations of physical triggers and qualitative demonstrations of physical implementability, it does not report detailed quantitative degradation metrics under variations such as printing, lighting changes, viewpoint shifts, or camera noise. In the revision, we will add a dedicated subsection with such quantitative physical experiments (or, if constrained by space, a clear discussion of limitations and how digital results approximate physical deployment) to better substantiate the novelty of creating physically realizable backdoors via poisoning alone. revision: yes
Circularity Check
No circularity: claims rest on direct experimental measurements
full rationale
The paper reports empirical attack success rates from controlled poisoning experiments on standard datasets and models. No derivation chain, equations, or self-referential definitions exist that reduce any result to its own inputs by construction. The ~50-sample / >90% ASR claim is a measured outcome under the stated threat model, not a fitted parameter renamed as a prediction. Physical implementability is asserted from digital-to-physical transfer tests described in the evaluation sections. No load-bearing self-citations or uniqueness theorems are invoked to force the conclusions.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The victim system trains a deep neural network on a dataset into which the attacker can inject a small number of samples.
- domain assumption A trigger pattern can be constructed that is imperceptible to humans yet sufficient for the model to learn a strong association with the target label (one illustrative formalization is sketched below).
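One way to make this assumption concrete, offered as an illustrative formalization rather than the paper's own notation: with a clean input x and a trigger t, both in [0, 1]^d, and a small blend ratio α, the poisoned input and its stealth bound are

```latex
x' = (1-\alpha)\,x + \alpha\,t,
\qquad
\|x' - x\|_\infty = \alpha\,\|t - x\|_\infty \le \alpha .
```

A small α keeps the per-pixel change bounded and hard to notice, while the axiom asserts that training on pairs (x', target label) still suffices for the model to associate the trigger with that label.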
Forward citations
Cited by 23 Pith papers
-
Cross-Modal Backdoors in Multimodal Large Language Models
Poisoning a single connector in MLLMs establishes a reusable latent backdoor pathway that transfers across modalities with over 95% attack success rate under bounded perturbations.
-
Backdoor Attacks on Decentralised Post-Training
An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequen...
-
MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs
MetaBackdoor shows that LLMs can be backdoored using positional triggers like sequence length, enabling stealthy activation on clean inputs to leak system prompts or trigger malicious behavior.
-
Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions
Sparse Backdoor plants a provably undetectable backdoor in neural network weights via structured sparse perturbations and isotropic Gaussian dithering, with detection hardness reduced to Sparse PCA.
-
CBV: Clean-label Backdoor Attacks on Vision Language Models via Diffusion Models
CBV generates clean-label poisoned samples for VLMs using diffusion models with score modification, multimodal guidance, and GradCAM-guided masks, achieving over 80% attack success rate on MSCOCO and VQA v2 while pres...
-
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
-
CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion
CLIP-Inspector reconstructs OOD triggers to detect backdoors in prompt-tuned CLIP models with 94% accuracy and higher AUROC than baselines, plus a repair step via fine-tuning.
-
Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction
Backdoor attacks on VLM-based scanpath predictors can redirect fixations toward chosen objects or inflate durations using input-conditioned triggers that evade cluster detection, and no tested defense blocks them with...
-
Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning
SABLE shows that semantics-aware natural triggers enable effective backdoor attacks in federated learning against multiple aggregation rules while preserving benign accuracy.
-
Trapping Attacker in Dilemma: Examining Internal Correlations and External Influences of Trigger for Defending GNN Backdoors
PRAETORIAN defends GNNs from backdoors by spotting large or highly influential trigger structures, cutting attack success to 0.55% with only 0.62% clean accuracy loss.
-
Checkerboard: A Simple, Effective, Efficient and Learning-free Clean Label Backdoor Attack with Low Poisoning Budget
Checkerboard derives a closed-form checkerboard trigger for clean-label backdoor attacks that achieves over 94% ASR with poisoning rates as low as 0.46% on ImageNet-100 and 99.99% ASR with 20 samples on CIFAR-10.
-
DETOUR: A Practical Backdoor Attack against Object Detection
DETOUR enables practical backdoor attacks on object detectors by training with rescaled semantic triggers from real-world objects placed at multiple locations to exploit the trigger radiating effect for reliable activ...
-
When AI reviews science: Can we trust the referee?
AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference sub...
-
CSC: Turning the Adversary's Poison against Itself
CSC identifies backdoored samples via early-epoch latent clustering and conceals them by relabeling to a virtual class, driving attack success rates near zero on benchmarks with little clean accuracy loss.
-
PASTA: A Patch-Agnostic Twofold-Stealthy Backdoor Attack on Vision Transformers
PASTA enables patch-agnostic backdoor activation in ViTs via multi-location trigger insertion during training and bi-level optimization, achieving 99.13% average attack success with large gains in visual/attention ste...
-
Mechanistic Anomaly Detection via Functional Attribution
Functional attribution with influence functions detects anomalous mechanisms in neural networks, achieving SOTA backdoor detection (average DER 0.93) on vision benchmarks and improvements on LLMs.
-
BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning
BadSkill poisons embedded models in agent skills to achieve up to 99.5% attack success rate on triggered tasks with only 3% poison rate while preserving normal behavior on non-trigger inputs.
-
Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models
Introduces a text-guided backdoor attack using common textual words as triggers and visual perturbations for stealthy, adjustable control on multimodal pretrained models.
-
Multimodal Backdoor Attack on VLMs for Autonomous Driving via Graffiti and Cross-Lingual Triggers
GLA backdoor attack on DriveVLM uses naturalistic graffiti and cross-lingual triggers to reach 90% ASR at 10% poisoning ratio while improving some clean-task metrics like BLEU-1.
-
Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training
Catastrophic overfitting in fast adversarial training is reinterpreted as a weak-trigger variant of unlearnable tasks, allowing backdoor-inspired recalibration and outlier suppression to restore robustness.
-
A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models
A patch-augmented cross-view regularization method reduces backdoor attack success rates in multimodal LLMs by enforcing output differences between original and perturbed views while using entropy constraints to prese...
-
SafeLM: Unified Privacy-Aware Optimization for Trustworthy Federated Large Language Models
SafeLM unifies privacy-preserving federated LLM training with Paillier encryption, attack defenses, contrastive grounding, and binarized aggregation to achieve 98% harmful content detection, 96.9% less communication, ...
-
SoK: A Comprehensive Analysis of the Current Status of Neural Tangent Generalization Attacks with Research Directions
NTGA is the first clean-label generalization attack under black-box settings but is vulnerable to adversarial training and image transformations, with newer attacks outperforming it.
Reference graph
Works this paper leans on
-
[1]
[Online]. Available: https://www.tripwire.com/state-of-security/security-data-protection/insider-threats-main-security-threat-2017/
work page 2017
-
[2]
[Online]. Available: https://www.helpnetsecurity.com/2015/08/19/the-insider-versus-the-outsider-who-poses-the-biggest-security-risk/
work page 2015
-
[3]
[Online]. Available: https://www.fastcompany.com/3065778/baidu-says-new-face-recognition-can-replace-checking-ids-or-tickets
-
[4]
[Online]. Available: https://www.washingtonpost.com/news/innovations/wp/2017/06/01/your-face-or-fingerprint-could-soon-replace-your-plane-ticket/?utm_term=.9ab59954d36e
work page 2017
-
[5]
[Online]. Available: http://www.zdnet.com/article/facial-recognition-technology-to-replace-passports-at-australian-airports
-
[6]
[Online]. Available: http://www.facephi.com/en/content/banks/
-
[7]
Tensorflow: A system for large-scale machine learning
M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al. , “Tensorflow: A system for large-scale machine learning.” in OSDI, vol. 16, 2016, pp. 265–283
work page 2016
-
[8]
A self-checking signature scheme for checking backdoor security attacks in internet,
M. F. Abdulla and C. Ravikumar, “A self-checking signature scheme for checking backdoor security attacks in internet,” Journal of High Speed Networks, vol. 13, no. 4, pp. 309–317, 2004
work page 2004
-
[9]
Data poisoning attacks against autoregressive models,
S. Alfeld, X. Zhu, and P. Barford, “Data poisoning attacks against autoregressive models,” in AAAI, 2016
work page 2016
-
[10]
Can machine learning be secure?
M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar, “Can machine learning be secure?” in Proceedings of the 2006 ACM Symposium on Information, computer and communications security . ACM, 2006, pp. 16–25
work page 2006
-
[11]
Poisoning attacks to compromise face templates,
B. Biggio, L. Didaci, G. Fumera, and F. Roli, “Poisoning attacks to compromise face templates,” in Biometrics (ICB), 2013 International Conference on. IEEE, 2013, pp. 1–7
work page 2013
-
[12]
Poisoning adaptive biometric systems,
B. Biggio, G. Fumera, F. Roli, and L. Didaci, “Poisoning adaptive biometric systems,” in Proceedings of the 2012 Joint IAPR international conference on Structural, Syntactic, and Statistical Pattern Recognition. Springer-Verlag, 2012, pp. 417–425
work page 2012
-
[13]
Safe: Secure authentication with face and eyes,
A. Boehm, D. Chen, M. Frank, L. Huang, C. Kuo, T. Lolic, I. Martinovic, and D. Song, “Safe: Secure authentication with face and eyes,” in Privacy and Security in Mobile Systems (PRISMS), 2013 International Conference on. IEEE, 2013, pp. 1–8
work page 2013
-
[14]
Robust principal component analysis?
E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM (JACM), vol. 58, no. 3, p. 11, 2011
work page 2011
-
[15]
Towards evaluating the robustness of neural networks,
N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Security and Privacy (SP), 2017 IEEE Symposium on . IEEE, 2017, pp. 39–57
work page 2017
-
[16]
M. Charikar, J. Steinhardt, and G. Valiant, “Learning from untrusted data,” arXiv preprint arXiv:1611.02315 , 2016
-
[17]
Deepdriving: Learning affordance for direct perception in autonomous driving,
C. Chen, A. Seff, A. Kornhauser, and J. Xiao, “Deepdriving: Learning affordance for direct perception in autonomous driving,” in Proceedings of the IEEE International Conference on Computer Vision , 2015, pp. 2722–2730
work page 2015
-
[18]
Robust High Dimensional Sparse Regression and Matching Pursuit
Y. Chen, C. Caramanis, and S. Mannor, “Robust high dimensional sparse regression and matching pursuit,” arXiv preprint arXiv:1301.2725, 2013
work page Pith review arXiv 2013
-
[19]
An attempt to backdoor the kernel,
corbet, “An attempt to backdoor the kernel,” https://lwn.net/Articles/57135/, 2003
work page 2003
-
[20]
Vsftpd backdoor discovered in source code (the H),
——, “Vsftpd backdoor discovered in source code (the H),” https://lwn.net/Articles/450181/, 2011
work page 2011
-
[21]
Large-scale malware classification using random projections and neural networks,
G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, “Large-scale malware classification using random projections and neural networks,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 3422–3426
work page 2013
-
[22]
Imagenet: A large-scale hierarchical image database,
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on . IEEE, 2009, pp. 248–255
work page 2009
-
[23]
Your face is not your password: face authentication bypassing Lenovo–Asus–Toshiba
N. M. Duc and B. Q. Minh, “Your face is not your password: face authentication bypassing Lenovo–Asus–Toshiba.”
-
[24]
Spoofing in 2d face recognition with 3d masks and anti-spoofing with kinect,
N. Erdogmus and S. Marcel, “Spoofing in 2d face recognition with 3d masks and anti-spoofing with kinect,” in Biometrics: Theory, Applications and Systems (BTAS), 2013 IEEE Sixth International Conference on. IEEE, 2013, pp. 1–6
work page 2013
-
[25]
Robust physical-world attacks on machine learning models,
I. Evtimov, K. Eykholt, E. Fernandes, T. Kohno, B. Li, A. Prakash, A. Rahmati, and D. Song, “Robust physical-world attacks on machine learning models,” arXiv preprint arXiv:1707.08945 , 2017
-
[26]
Learning deep face representation,
H. Fan, Z. Cao, Y. Jiang, Q. Yin, and C. Doudou, “Learning deep face representation,” arXiv preprint arXiv:1403.2802, 2014
-
[27]
Robust logistic regression and classification,
J. Feng, H. Xu, S. Mannor, and S. Yan, “Robust logistic regression and classification,” in Advances in Neural Information Processing Systems , 2014, pp. 253–261
work page 2014
-
[28]
Facilitating fashion camouflage art,
R. Feng and B. Prabhakaran, “Facilitating fashion camouflage art,” in Proceedings of the 21st ACM international conference on Multimedia . ACM, 2013, pp. 793–802
work page 2013
-
[29]
Explaining and Harnessing Adversarial Examples
I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572 , 2014
work page Pith review arXiv 2014
-
[30]
BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
T. Gu, B. Dolan-Gavitt, and S. Garg, “Badnets: Identifying vulnerabilities in the machine learning model supply chain,” arXiv preprint arXiv:1708.06733, 2017
work page Pith review arXiv 2017
-
[31]
Vulnerability note VU no. 247371,
J. S. Havrilla, “Vulnerability note VU no. 247371,” https://www.kb.cert.org/vuls/id/247371, 2001
work page 2001
-
[32]
Deep residual learning for image recognition,
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition , 2016, pp. 770–778
work page 2016
-
[33]
Inc., “Face++,” https://www.faceplusplus.com/
M. Inc., “Face++,” https://www.faceplusplus.com/
-
[34]
Detecting trigger-based behaviors in botnet malware,
B. Kang, J. Yang, J. So, and C. Y. Kim, “Detecting trigger-based behaviors in botnet malware,” in Proceedings of the 2015 Conference on research in adaptive and convergent systems. ACM, 2015, pp. 274–279
work page 2015
-
[35]
Understanding black-box predictions via influence functions,
P. W. Koh and P. Liang, “Understanding black-box predictions via influence functions,” in International Conference on Machine Learning, 2017
work page 2017
-
[36]
Adversarial examples in the physical world,
A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” arXiv preprint arXiv:1607.02533 , 2016
-
[37]
Understanding osn-based facial disclosure against face authentication systems,
Y. Li, K. Xu, Q. Yan, Y. Li, and R. H. Deng, “Understanding osn-based facial disclosure against face authentication systems,” in Proceedings of the 9th ACM symposium on Information, computer and communications security. ACM, 2014, pp. 413–424
work page 2014
-
[38]
Robust high-dimensional linear regression,
C. Liu, B. Li, Y. Vorobeychik, and A. Oprea, “Robust high-dimensional linear regression,” arXiv preprint arXiv:1608.02257, 2016
-
[39]
Robust linear regression against training data poisoning,
——, “Robust linear regression against training data poisoning,” in AISec, 2017
work page 2017
-
[40]
Delving into transferable adversarial examples and black-box attacks,
Y. Liu, X. Chen, C. Liu, and D. Song, “Delving into transferable adversarial examples and black-box attacks,” in Proceedings of the International Conference on Learning Representations, 2017
work page 2017
-
[41]
Trojaning attack on neural networks,
Y. Liu, S. Ma, Y. Aafer, W.-C. Lee, J. Zhai, W. Wang, and X. Zhang, “Trojaning attack on neural networks,” 2017
work page 2017
-
[42]
Y. Liu, Y. Xie, and A. Srivastava, “Neural trojans,” in The 35th IEEE International Conference on Computer Design, 2017
work page 2017
-
[43]
Backdoor liability from internet telecommuters,
M. J. Maier, “Backdoor liability from internet telecommuters,” Computer L. Rev. & Tech. J., vol. 6, p. 27, 2001
work page 2001
-
[44]
The security of latent dirichlet allocation,
S. Mei and X. Zhu, “The security of latent dirichlet allocation,” in AISTATS, 2015
work page 2015
-
[45]
Using machine teaching to identify optimal training-set attacks on machine learners,
——, “Using machine teaching to identify optimal training-set attacks on machine learners,” in AAAI, 2015
work page 2015
- [46]
-
[47]
Mobilesec android authentication framework,
MobileSec, “Mobilesec android authentication framework,” https://github.com/mobilesec/authentication-framework-module-face
-
[48]
Universal adversarial perturbations,
S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, “Universal adversarial perturbations,” in Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. IEEE, 2017
work page 2017
-
[49]
Towards poisoning of deep learning algorithms with back-gradient optimization,
L. Muñoz-González, B. Biggio, A. Demontis, A. Paudice, V. Wongrassamee, E. C. Lupu, and F. Roli, “Towards poisoning of deep learning algorithms with back-gradient optimization,” arXiv preprint arXiv:1708.08689, 2017
-
[50]
NEC, “Face recognition,” http://www.nec.com/en/global/solutions/biometrics/technologies/facerecognition.html
-
[51]
NEUROTechnology, “Sentiveillance sdk,” http://www.neurotechnology.com/sentiveillance.html
-
[52]
Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
N. Papernot, P. McDaniel, and I. Goodfellow, “Transferability in machine learning: from phenomena to black-box attacks using adversarial samples,” arXiv preprint arXiv:1605.07277, 2016
work page Pith review arXiv 2016
-
[53]
Practical black-box attacks against deep learning systems using adversarial examples,
N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, “Practical black-box attacks against deep learning systems using adversarial examples,” arXiv preprint arXiv:1602.02697 , 2016
-
[54]
Practical black-box attacks against machine learning,
——, “Practical black-box attacks against machine learning,” in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. ACM, 2017, pp. 506–519
work page 2017
-
[55]
O. M. Parkhi, A. Vedaldi, A. Zisserman et al., “Deep face recognition.” in Proceedings of the British Machine Vision Conference (BMVC), 2015
work page 2015
-
[56]
G. Ruan and Y. Tan, “A three-layer back-propagation neural network for spam detection using artificial immune concentration,” Soft computing, vol. 14, no. 2, pp. 139–150, 2010
work page 2010
-
[57]
Deep neural network based malware detection using two dimensional binary program features,
J. Saxe and K. Berlin, “Deep neural network based malware detection using two dimensional binary program features,” in Malicious and Unwanted Software (MALWARE), 2015 10th International Conference on. IEEE, 2015, pp. 11–20
work page 2015
-
[58]
Facenet: A unified embedding for face recognition and clustering,
F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified embedding for face recognition and clustering,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2015, pp. 815–823
work page 2015
-
[59]
Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition,
M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, “Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 1528–1540
work page 2016
-
[60]
Mastering the game of go with deep neural networks and tree search,
D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016
work page 2016
-
[61]
Certified defenses for data poisoning attacks,
J. Steinhardt, P. W. Koh, and P. Liang, “Certified defenses for data poisoning attacks,” in NIPS, 2017
work page 2017
-
[62]
Deep learning face representation from predicting 10,000 classes,
Y. Sun, X. Wang, and X. Tang, “Deep learning face representation from predicting 10,000 classes,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1891–1898
work page 2014
-
[63]
Deepface: Closing the gap to human-level performance in face verification,
Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, “Deepface: Closing the gap to human-level performance in face verification,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 1701–1708
work page 2014
-
[64]
Deep belief networks for spam filtering,
G. Tzortzis and A. Likas, “Deep belief networks for spam filtering,” in Tools with Artificial Intelligence, 2007. ICTAI 2007. 19th IEEE International Conference on , vol. 2. IEEE, 2007, pp. 306–309
work page 2007
-
[65]
E. Vanderbeken, “TCP-32764,” https://github.com/elvanderb/TCP-32764, 2014
work page 2014
-
[66]
Fingerprint classification based on depth neural network,
R. Wang, C. Han, Y. Wu, and T. Guo, “Fingerprint classification based on depth neural network,” arXiv preprint arXiv:1409.5188, 2014
-
[67]
Face recognition in unconstrained videos with matched background similarity,
L. Wolf, T. Hassner, and I. Maoz, “Face recognition in unconstrained videos with matched background similarity,” in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on . IEEE, 2011, pp. 529–534
work page 2011
-
[68]
Is feature selection secure against training data poisoning,
H. Xiao, B. Biggio, G. Brown, G. Fumera, C. Eckert, and F. Roli, “Is feature selection secure against training data poisoning,” in ICML, 2015
work page 2015
-
[69]
Achieving human parity in conversational speech recognition,
W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, and G. Zweig, “Achieving human parity in conversational speech recognition,” arXiv preprint arXiv:1610.05256 , 2016
-
[70]
Generative poisoning attack method against neural networks,
C. Yang, Q. Wu, H. Li, and Y. Chen, “Generative poisoning attack method against neural networks,” arXiv preprint arXiv:1703.01340, 2017
-
[71]
Backdoor attacks on black-box ciphers exploiting low-entropy plaintexts,
A. Young and M. Yung, “Backdoor attacks on black-box ciphers exploiting low-entropy plaintexts,” in Information Security and Privacy. Springer, 2003, pp. 216–216
work page 2003
-
[72]
Researchers solve Juniper backdoor mystery; signs point to NSA,
K. Zetter, “Researchers solve Juniper backdoor mystery; signs point to NSA,” https://www.wired.com/2015/12/researchers-solve-the-juniper-mystery-and-they-say-its-partially-the-nsas-fault/, 2015
work page 2015
-
[73]
Understanding deep learning requires rethinking generalization,
C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, “Understanding deep learning requires rethinking generalization,” in Proceedings of the International Conference on Learning Representations, 2017
work page 2017
-
[74]
Naive-deep face recognition: Touching the limit of lfw benchmark or not?
E. Zhou, Z. Cao, and Q. Yin, “Naive-deep face recognition: Touching the limit of lfw benchmark or not?” arXiv preprint arXiv:1501.04690, 2015