Delving into adversarial attacks on deep policies

Dawn Song; Jernej Kos

arXiv: 1705.06452 · v1 · pith:HOCNWI5Hnew · submitted 2017-05-18 · stat.ML · cs.LG

classification: stat.ML, cs.LG

keywords: adversarial, deep, examples, attacks, learning, noise, novel, policies

Adversarial examples have been shown to exist for a variety of deep learning architectures. Deep reinforcement learning has shown promising results in training agent policies directly on raw inputs such as image pixels. In this paper we present a novel study of adversarial attacks on deep reinforcement learning policies. We compare the effectiveness of attacks that use adversarial examples with attacks that use random noise. We present a novel method, based on the value function, for reducing the number of times adversarial examples need to be injected for a successful attack. We further explore how re-training on random noise and on fast gradient sign method (FGSM) perturbations affects resilience against adversarial examples.
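To make the two attack ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical linear-softmax policy pi(a|x) = softmax(Wx) so that the FGSM gradient can be written analytically in NumPy, and it assumes one plausible reading of "based on the value function": perturb an observation only when a value estimate exceeds a threshold. The names `fgsm_perturbation`, `maybe_attack`, `threshold`, and `eps` are illustrative and do not come from the paper.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def fgsm_perturbation(W, x, eps):
    """One FGSM step against a hypothetical policy pi(a|x) = softmax(W @ x).

    The loss is the cross-entropy of the action the policy currently
    prefers, so stepping along sign(grad) pushes the policy away from it.
    """
    probs = softmax(W @ x)
    preferred = int(np.argmax(probs))
    onehot = np.zeros_like(probs)
    onehot[preferred] = 1.0
    # d/dx of -log softmax(W @ x)[preferred] equals W^T (probs - onehot).
    grad_x = W.T @ (probs - onehot)
    # Each input component moves by at most eps (L-infinity bound).
    return eps * np.sign(grad_x)


def maybe_attack(W, x, value, threshold, eps):
    """Assumed gating rule: inject the perturbation only in high-value states.

    `value` stands in for the agent's value estimate V(s); how the paper
    actually uses the value function is not specified in the abstract.
    """
    if value > threshold:
        return x + fgsm_perturbation(W, x, eps), True
    return x, False


# Illustrative usage with random weights and a random observation.
rng = np.random.default_rng(0)
W_demo = rng.normal(size=(4, 16))   # 4 actions, 16 input features
x_demo = rng.uniform(size=16)       # stand-in for a flattened observation
x_adv, attacked = maybe_attack(W_demo, x_demo, value=0.9, threshold=0.5, eps=0.05)
```

Under a gating rule of this kind, the attacker skips every time step whose value estimate falls below the threshold, which is how the number of injections could be reduced. A random-noise baseline for comparison would replace `np.sign(grad_x)` with a random sign vector of the same magnitude.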

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Efficient Preference Poisoning Attack on Offline RLHF
cs.LG · 2026-05 · unverdicted · novelty 8.0

Label-flip attacks on log-linear DPO reduce to binary sparse approximation problems that can be solved efficiently by lattice-based and binary matching pursuit methods with recovery guarantees.