BadNets: Evaluating Backdooring Attacks on Deep Neural Networks

Tianyu Gu, Kang Liu, Brendan Dolan-Gavitt, Siddharth Garg · 2019 · arXiv 2019.290906

6 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

cs.CR · 2026-04-09 · unverdicted · novelty 8.0

Malicious LLM API routers actively perform payload injection and secret exfiltration, with 9 of 428 tested routers showing malicious behavior and further poisoning risks from leaked credentials.

Your Neighbors Know: Leveraging Local Neighborhoods for Backdoor Detection in Decentralized Learning

cs.LG · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

Argus enables backdoor detection in decentralized ML by collaborative neighbor-based validation of triggers, backed by convergence theory and reducing attack success by up to 90% on tested datasets.

Gungnir: Exploiting Stylistic Features in Images for Backdoor Attacks on Diffusion Models

cs.CV · 2025-02-28 · unverdicted · novelty 7.0

Gungnir shows that style-based triggers with RAN and STTR techniques can activate backdoors in diffusion models while evading detection and surviving fine-tuning.

Defeat Devices in AI Systems

cs.CY · 2026-06-27 · unverdicted · novelty 6.0

The paper defines defeat devices in AI via a triadic test (discriminator, concealed swap, performance gap), unifies existing cases under this concept, proposes TADP detection, and claims such devices can emerge naturally in frontier models.

Multimodal Backdoor Attack on VLMs for Autonomous Driving via Graffiti and Cross-Lingual Triggers

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

GLA backdoor attack on DriveVLM uses naturalistic graffiti and cross-lingual triggers to reach 90% ASR at 10% poisoning ratio while improving some clean-task metrics like BLEU-1.

TEMPO-Diffusion: Temporally Exposed Malicious Poisoning of Diffusion Models

cs.CR · 2026-06-24 · unverdicted · novelty 5.0

TEMPO-Diffusion is a targeted backdoor attack framework for diffusion models that uses time-conditioned triggers to poison class-specific synthetic data, achieving high attack success in downstream classifiers.

citing papers explorer

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

cs.CR · 2026-04-09 · unverdicted · none · ref 19

Malicious LLM API routers actively perform payload injection and secret exfiltration, with 9 of 428 tested routers showing malicious behavior and further poisoning risks from leaked credentials.

Your Neighbors Know: Leveraging Local Neighborhoods for Backdoor Detection in Decentralized Learning

cs.LG · 2026-05-19 · unverdicted · none · ref 7 · 2 links

Argus enables backdoor detection in decentralized ML by collaborative neighbor-based validation of triggers, backed by convergence theory and reducing attack success by up to 90% on tested datasets.

Gungnir: Exploiting Stylistic Features in Images for Backdoor Attacks on Diffusion Models

cs.CV · 2025-02-28 · unverdicted · none · ref 16

Gungnir shows that style-based triggers with RAN and STTR techniques can activate backdoors in diffusion models while evading detection and surviving fine-tuning.

Defeat Devices in AI Systems

cs.CY · 2026-06-27 · unverdicted · none · ref 23

The paper defines defeat devices in AI via a triadic test (discriminator, concealed swap, performance gap), unifies existing cases under this concept, proposes TADP detection, and claims such devices can emerge naturally in frontier models.

Multimodal Backdoor Attack on VLMs for Autonomous Driving via Graffiti and Cross-Lingual Triggers

cs.CV · 2026-04-06 · unverdicted · none · ref 16

GLA backdoor attack on DriveVLM uses naturalistic graffiti and cross-lingual triggers to reach 90% ASR at 10% poisoning ratio while improving some clean-task metrics like BLEU-1.

TEMPO-Diffusion: Temporally Exposed Malicious Poisoning of Diffusion Models

cs.CR · 2026-06-24 · unverdicted · none · ref 17

TEMPO-Diffusion is a targeted backdoor attack framework for diffusion models that uses time-conditioned triggers to poison class-specific synthetic data, achieving high attack success in downstream classifiers.
