CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition

Carl A. Gunter; Heqing Huang; Kai Chen; Shengzhi Zhang; Xiaofeng Wang; Xiaokang Liu; Xuejing Yuan; Yue Zhao; Yunhui Long; Yuxuan Chen

arXiv: 1801.08535 · v3 · pith:RHGF6SLInew · submitted 2018-01-24 · cs.CR · cs.LG · cs.SD · eess.AS

Xuejing Yuan, Yuxuan Chen, Yue Zhao, Yunhui Long, Xiaokang Liu, Kai Chen, Shengzhi Zhang, Heqing Huang, Xiaofeng Wang, Carl A. Gunter

classification cs.CR · cs.LG · cs.SD · eess.AS

keywords voice · commands · attacks · automatically · demonstrate · effectively · less · practical

The popularity of automatic speech recognition (ASR) systems such as Google Voice and Cortana raises security concerns, as recent attacks have demonstrated. The impact of such threats, however, is less clear, since the attacks are either less stealthy (producing noise-like voice commands) or require the physical presence of an attack device (using ultrasound). In this paper, we demonstrate that more practical and surreptitious attacks are not only feasible but can even be constructed automatically. Specifically, we find that voice commands can be stealthily embedded into songs which, when played, effectively control the target system through ASR without being noticed. For this purpose, we developed novel techniques that address a key technical challenge: integrating the commands into a song so that they are effectively recognized by ASR over the air, in the presence of background noise, while remaining undetected by a human listener. Our research shows that this can be done automatically against real-world ASR applications. We also demonstrate that such CommanderSongs can be spread through the Internet (e.g., YouTube) and radio, potentially affecting millions of ASR users. We further present a new mitigation technique that controls this threat.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Codec-Robust Attacks on Audio LLMs
cs.SD · 2026-05 · unverdicted · novelty 6.0

CodecAttack optimizes perturbations in the latent space of a neural audio codec, reaching an 85.5% average target-substring attack success rate (ASR) on Opus-compressed audio, while waveform baselines stay below 26%.