Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition

Yao Qin; Nicholas Carlini; Ian Goodfellow; Garrison Cottrell; Colin Raffel

arXiv: 1903.10346 · v2 · pith:US23MALPnew · submitted 2019-03-22 · eess.AS · cs.LG · cs.SD · stat.ML




keywords: adversarial, examples, targeted, audio, cause, domain, effective, imperceptible


Abstract

Adversarial examples are inputs to machine learning models designed by an adversary to cause an incorrect output. So far, adversarial examples have been studied most extensively in the image domain. In this domain, adversarial examples can be constructed by imperceptibly modifying images to cause misclassification, and are practical in the physical world. In contrast, current targeted adversarial examples applied to speech recognition systems have neither of these properties: humans can easily identify the adversarial perturbations, and they are not effective when played over-the-air. This paper makes advances on both of these fronts. First, we develop effectively imperceptible audio adversarial examples (verified through a human study) by leveraging the psychoacoustic principle of auditory masking, while retaining 100% targeted success rate on arbitrary full-sentence targets. Next, we make progress towards physical-world over-the-air audio adversarial examples by constructing perturbations which remain effective even after applying realistic simulated environmental distortions.
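The abstract does not say which environmental distortions are simulated. Purely as an illustration, the sketch assumes reverberation, modeled by convolving the perturbed waveform x + delta with a room impulse response, and averages a loss over several sampled responses in the spirit of expectation over transformation. The decaying-noise impulse response, the helper names (synthetic_rir, simulate_over_the_air, expected_loss), and the placeholder loss_fn are hypothetical and are not taken from the paper; a real attack would also need gradients through the speech recognizer, which this forward-only sketch omits.

```python
import math
import random
from typing import Callable, List, Sequence


def synthetic_rir(sample_rate: int, rt60: float, length_s: float, seed: int = 0) -> List[float]:
    """Hypothetical room impulse response: Gaussian noise under an exponential
    envelope whose amplitude falls by 60 dB (a factor of 1000) after rt60 seconds."""
    rng = random.Random(seed)
    n = max(1, int(sample_rate * length_s))
    decay = math.log(1000.0) / (rt60 * sample_rate)  # per-sample amplitude decay rate
    rir = [rng.gauss(0.0, 1.0) * math.exp(-decay * i) for i in range(n)]
    rir[0] = 1.0  # treat the first tap as the direct path
    norm = math.sqrt(sum(v * v for v in rir))
    return [v / norm for v in rir]  # normalize to unit energy


def convolve(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Full linear convolution; output length is len(signal) + len(kernel) - 1."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def simulate_over_the_air(audio: Sequence[float], delta: Sequence[float],
                          rir: Sequence[float]) -> List[float]:
    """Add the perturbation, reverberate, and truncate to the original length."""
    if len(audio) != len(delta):
        raise ValueError("audio and delta must have the same length")
    perturbed = [a + d for a, d in zip(audio, delta)]
    return convolve(perturbed, rir)[: len(audio)]


def expected_loss(audio: Sequence[float], delta: Sequence[float],
                  rirs: Sequence[Sequence[float]],
                  loss_fn: Callable[[List[float]], float]) -> float:
    """Monte Carlo estimate of the loss averaged over sampled impulse responses.
    loss_fn is a hypothetical stand-in for the recognizer's targeted loss."""
    if not rirs:
        raise ValueError("need at least one impulse response")
    total = sum(loss_fn(simulate_over_the_air(audio, delta, r)) for r in rirs)
    return total / len(rirs)
```

Averaging over many sampled rooms, rather than optimizing against a single fixed distortion, is what would push a perturbation toward surviving conditions it was not tuned for; in practice one would draw fresh impulse responses at each optimization step.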



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Codec-Robust Attacks on Audio LLMs
cs.SD · 2026-05 · unverdicted · novelty 6.0

CodecAttack optimizes perturbations in neural audio codec latent space to reach 85.5% average target-substring ASR on compressed Opus audio while waveform baselines stay below 26%.