arxiv: 2606.20893 · v1 · pith:EZF3MI5Ynew · submitted 2026-06-18 · 💻 cs.SD · cs.AI · cs.CR

Exploiting Neural Audio Codec Latents for Adversarial Audio Attacks

classification 💻 cs.SD · cs.AI · cs.CR
keywords audio · adversarial · attacks · generative · attack · codec · inference · latency

Deep learning-based audio classification systems, including automatic speaker verification, are vulnerable to adversarial attacks. Realistic real-time threat assessment remains difficult because optimization-based methods, such as projected gradient descent (PGD) and Carlini-Wagner, require costly iterative updates in the high-dimensional waveform domain. Generative attacks allow single-shot synthesis but often introduce perceptible artifacts or depend on computationally intensive architectures, while diffusion and autoregressive approaches incur high inference latency. To address this gap, we propose a generative attack framework operating in the continuous latent space of a neural audio codec. A conditional generator synthesizes class-specific perturbations in a single forward pass and decodes them into adversarial waveforms. Our method achieves targeted attack success rates up to 99% with sub-7 ms inference, outperforming generative baselines while reducing latency by 24x.
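The single-pass pipeline described in the abstract (encode, perturb in latent space, decode) can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the paper's implementation: the linear `encode`/`decode` pair stands in for a real neural audio codec, the small untrained MLP stands in for the conditional generator, the perturbation is assumed to be added to the latent before decoding, and the constants `FRAME`, `LATENT`, `N_CLASSES`, and `EPS` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; none of these values come from the paper.
FRAME = 320      # waveform samples per codec frame
LATENT = 64      # latent channels per frame
EMBED = 16       # size of the target-class embedding
N_CLASSES = 10   # number of target classes
EPS = 0.1        # bound on each latent perturbation coordinate

# Toy stand-in for a neural audio codec: a linear encoder and its pseudo-inverse.
W_enc = rng.standard_normal((FRAME, LATENT)) / np.sqrt(FRAME)
W_dec = np.linalg.pinv(W_enc)  # shape (LATENT, FRAME)

def encode(wave):
    """Split the waveform into frames and project each frame to a latent vector."""
    return wave.reshape(-1, FRAME) @ W_enc  # shape (T, LATENT)

def decode(z):
    """Map latent frames back to a flat waveform."""
    return (z @ W_dec).reshape(-1)

# Toy conditional generator: untrained random weights, for shape illustration only.
class_embed = rng.standard_normal((N_CLASSES, EMBED))
W1 = rng.standard_normal((LATENT + EMBED, 128)) * 0.1
W2 = rng.standard_normal((128, LATENT)) * 0.1

def generate_perturbation(z, target):
    """One forward pass: latent frames + target class -> bounded latent perturbation."""
    cond = np.broadcast_to(class_embed[target], (z.shape[0], EMBED))
    hidden = np.maximum(np.concatenate([z, cond], axis=1) @ W1, 0.0)  # ReLU
    return EPS * np.tanh(hidden @ W2)  # each coordinate lies in [-EPS, EPS]

def attack(wave, target):
    """Encode, perturb in latent space, decode. No iterative optimization."""
    z = encode(wave)
    delta = generate_perturbation(z, target)
    return decode(z + delta), delta

wave = rng.standard_normal(FRAME * 50)  # 50 frames of placeholder audio
adv_wave, delta = attack(wave, target=3)
```

In a trained system the generator weights would presumably be learned against the victim classifier; here they are random, so the output demonstrates only the data flow and the bounded latent perturbation, not an effective attack.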
