arxiv: 2606.23335 · v1 · pith:NB2AZZJUnew · submitted 2026-06-22 · 💻 cs.SD · cs.AI

The Watermark Shortcut: How Provenance Marking Sabotages Audio Deepfake Detection

Pith reviewed 2026-06-26 06:43 UTC · model grok-4.3

classification 💻 cs.SD cs.AI
keywords audio deepfake detection · provenance watermarking · shortcut learning · synthetic speech · generalization failure · evasion attack · false positive · WASP corpus

The pith

Audio deepfake detectors latch onto provenance watermarks as a 'fake' signal when marks appear only on synthetic speech.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that provenance watermarking, promoted as a safeguard for synthetic speech, creates a spurious shortcut in detection models. When watermarks are applied exclusively to generated audio and absent from real recordings, the detector learns to classify based on the watermark's presence rather than acoustic content. This produces three linked failures: models degrade on new data, watermarks can be stripped to evade detection, and adding watermarks to real speech causes false alarms. The shortcut is demonstrated in both controlled white-box tests and commercial black-box APIs, and it disappears when watermarks are applied to both real and synthetic examples during training.

Core claim

When synthetic speech is watermarked and human speech is not, detectors trained on that data latch onto the watermark as a spurious "watermark => fake" shortcut. This single feature yields three coupled failures: generalization degradation, strip-to-evade, and mark-to-frame. In the controlled white-box experiment, a watermark-trained detector shows all three; for example, mark-to-frame lifts the equal error rate (EER) from 16% to 75%. In a black-box test of a commercial API, adding a watermark to real speech causes it to be flagged as fake. Retraining with the watermark present on both classes removes the shortcut.
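To make the equal error rate figures concrete, here is a minimal sketch of how an EER can be approximated from detector scores. It is not the paper's evaluation code: the function name `equal_error_rate` and the threshold-sweep approach are illustrative assumptions, and higher scores are assumed to mean "more fake". The EER is the operating point where the rate of real clips flagged as fake equals the rate of fakes passed as real; a value above 50% indicates that the scores are, on balance, inverted.

```python
import numpy as np

def equal_error_rate(scores_real, scores_fake):
    """Approximate EER by sweeping thresholds over all observed scores.

    Assumes a higher score means "more fake" (an illustrative convention).
    """
    scores_real = np.asarray(scores_real, dtype=float)
    scores_fake = np.asarray(scores_fake, dtype=float)
    thresholds = np.sort(np.concatenate([scores_real, scores_fake]))
    # False positive rate: real clips scored at or above the threshold.
    fpr = np.array([(scores_real >= t).mean() for t in thresholds])
    # False negative rate: fake clips scored below the threshold.
    fnr = np.array([(scores_fake < t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest.
    i = int(np.argmin(np.abs(fpr - fnr)))
    return float((fpr[i] + fnr[i]) / 2)

# Hypothetical scores: well separated, then fully inverted.
eer_separated = equal_error_rate([0.1, 0.2], [0.8, 0.9])
eer_inverted = equal_error_rate([0.8, 0.9], [0.1, 0.2])
print(eer_separated, eer_inverted)
```

On real data one would typically interpolate between thresholds or use a library ROC routine; the sweep above only picks the closest observed operating point.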

What carries the argument

The watermark shortcut: the spurious correlation formed when a provenance mark is present only in the synthetic class, allowing the detector to use mark presence as a proxy label instead of learning content-based distinctions.

If this is right

  • Performance on unseen data drops because the model relies on the watermark rather than generalizable acoustic cues.
  • A watermarked fake can evade detection simply by having the mark removed.
  • Watermarking a real recording causes the detector to label it as fake.
  • The same shortcut appears when testing against a commercial detection API.
  • Retraining with the watermark present on both real and synthetic classes eliminates the three failures.
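The mechanism behind these consequences can be illustrated with a toy simulation. This is a sketch under strong assumptions, not the paper's experiment: each "clip" is reduced to two hypothetical features (a weak content cue and a binary watermark flag), the detector is a plain logistic regression, and the balanced condition marks a random half of each class. All names (`make_data`, `train_logreg`, `fake_rate`) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, p_mark_real, p_mark_fake):
    """Toy features [weak content cue, watermark flag]; label 1 = fake."""
    y = np.repeat([0, 1], n)                                 # n real, then n fake
    content = rng.normal(np.where(y == 1, 0.5, -0.5), 1.0)   # overlapping classes
    p_mark = np.where(y == 1, p_mark_fake, p_mark_real)
    mark = (rng.random(2 * n) < p_mark).astype(float)
    return np.column_stack([content, mark]), y

def train_logreg(X, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression (no regularization)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b

def fake_rate(model, content, marked):
    """Fraction labelled fake when the watermark flag is forced on or off."""
    w, b = model
    X = np.column_stack([content, np.full(len(content), float(marked))])
    return float(np.mean(X @ w + b > 0))

# Held-out content cues for real and fake clips.
real_c = rng.normal(-0.5, 1.0, 2000)
fake_c = rng.normal(0.5, 1.0, 2000)

shortcut = train_logreg(*make_data(2000, 0.0, 1.0))  # watermark only on fakes
balanced = train_logreg(*make_data(2000, 0.5, 0.5))  # watermark on both classes

results = {}
for name, model in [("shortcut", shortcut), ("balanced", balanced)]:
    results[name] = {
        # Share of unmarked fakes still flagged as fake (low = evasion works).
        "strip_to_evade": fake_rate(model, fake_c, marked=False),
        # Share of marked real clips flagged as fake (high = framing works).
        "mark_to_frame": fake_rate(model, real_c, marked=True),
    }
    print(name, results[name])
```

In this toy setting the shortcut-trained model puts nearly all of its weight on the watermark flag, so stripping the mark hides fakes and adding it frames real clips, while the balanced model falls back on the content cue. Real detectors operate on waveforms or spectrograms, so this shows only the statistical mechanism, not the magnitudes the paper reports.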

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Detection pipelines that assume watermarks will remain a reliable class signal will need explicit countermeasures if watermarking practices change.
  • The paired clean-versus-watermarked corpus released with the work supplies a direct testbed for checking whether other detectors also form this shortcut.
  • Any provenance system that marks only one class risks creating analogous shortcuts in downstream classifiers.

Load-bearing premise

Watermarks are applied only to synthetic speech and never to human speech during both training and deployment.

What would settle it

Train a detector with watermarks added to both real and synthetic speech and test whether equal error rate remains low on watermarked real speech and on unmarked synthetic speech.

Figures

Figures reproduced from arXiv: 2606.23335 by Nicolas M. Müller, Pascal Debus.

Figure 1: Fakeness-score distributions of the watermark-trained detector on …
Figure 2: The released WASP corpus: every utterance appears clean and under …

Original abstract

Provenance watermarking is increasingly treated as a safeguard for synthetic speech, whether built directly into speech-generation models such as Chatterbox, provided through dedicated techniques such as AudioSeal, or deployed by commercial platforms such as ElevenLabs. We identify a previously uncharacterized liability: when synthetic speech is watermarked and human speech is not, detectors trained alongside latch onto the watermark as a spurious "watermark => fake" shortcut. This single feature yields three coupled failures: generalization degradation (model performance deteriorates on unseen data), strip-to-evade (a watermarked fake escapes once unwatermarked), and mark-to-frame (watermarking a real voice flags it as fake). In a controlled white-box experiment, a watermark-trained detector shows all three (for example, mark-to-frame lifts Equal Error Rate from 16% to 75%). In a black-box test of a commercial API, we show that adding a watermark to real speech disguises it as fake. However, this shortcut is fixable: retraining with the watermark on both classes decorrelates it and restores clean behavior. We release experiment data as a paired clean-versus-watermarked corpus (WASP).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims that provenance watermarking applied exclusively to synthetic speech (and not human speech) causes audio deepfake detectors to learn a spurious 'watermark => fake' shortcut. This produces three coupled failures: degraded generalization on unseen data, evasion of detection by stripping the watermark from fakes ('strip-to-evade'), and false positives when real speech is watermarked ('mark-to-frame'). The authors demonstrate the effect in a controlled white-box experiment (e.g., mark-to-frame raises EER from 16% to 75%) and a black-box test against a commercial API, then show the shortcut is eliminated by watermarking both classes. They release the paired WASP clean-versus-watermarked corpus.

Significance. If the experimental results hold, the finding is significant for audio deepfake detection and synthetic media provenance. It identifies a practical liability in current watermarking deployments (e.g., AudioSeal, ElevenLabs, Chatterbox) that had not been characterized. Explicit credit is due for the release of the WASP dataset, which supports reproducibility and further work on the interaction between watermarking and detection.

minor comments (3)
  1. [§4] Clarify the exact dataset sizes, train/test splits, and statistical controls (e.g., number of speakers, watermark strengths) used in the white-box experiments; these details are referenced in the abstract but should be stated explicitly in §4 or a dedicated experimental setup subsection to allow full replication.
  2. The black-box commercial API results would benefit from a table listing the specific API, number of trials, and confidence intervals on the reported evasion and framing rates.
  3. [Results] Ensure that the definition of 'unseen data' for the generalization degradation claim is unambiguous (e.g., new speakers, new generators, or both) and is consistent between the abstract and the results section.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work, the recognition of its significance for audio deepfake detection and provenance watermarking, and the recommendation of minor revision. The report correctly identifies the core claim (watermark-only-on-synthetic training induces a spurious shortcut with three failure modes) and notes the value of the released WASP corpus. No major comments were enumerated in the report, so we have no specific points requiring rebuttal or clarification at this stage.

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper presents an empirical investigation supported by white-box and black-box experiments on detector behavior under exclusive watermarking of synthetic speech. Claims rest on observed failure modes (generalization degradation, strip-to-evade, mark-to-frame) and their mitigation when watermarks are applied to both classes, with a released paired dataset for verification. No equations, first-principles derivations, self-definitional constructs, or load-bearing self-citations appear in the provided text; the argument is externally falsifiable via the described setups and data rather than reducing to fitted inputs or prior author work by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This is an empirical study of machine learning behavior; no new mathematical derivations, free parameters, or postulated entities are introduced.

axioms (1)
  • Domain assumption: Provenance watermarking techniques such as AudioSeal can be applied selectively to synthetic speech.
    The paper's central observation depends on the existence and selective use of such watermarking methods.


