The Coding Limits of Robust Watermarking for Generative Models
Pith reviewed 2026-05-18 17:11 UTC · model grok-4.3
The pith
No sound watermarking scheme can reliably detect tampering once an adversary independently corrupts more than a 1-1/q fraction of the symbols, where q is the alphabet size.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the zero-bit tamper-detection code abstraction, for an alphabet of size q there is a critical corruption rate of 1-1/q such that no scheme with soundness can reliably detect tampering once an adversary changes more than this fraction of symbols under independent corruption. This limit is unconditional and holds even when relaxing soundness to allow fixed constant false positive probability on random content. The threshold is tight, as information-theoretic constructions achieve soundness and tamper detection for all strictly smaller rates. In the binary case this means no cryptographic watermark remains robust if more than half the encoded bits are modified. Experiments on image watermark
What carries the argument
The zero-bit tamper-detection code, a secret-key procedure that samples a pseudorandom codeword and decides whether a candidate is unmarked content or the result of tampering.
Load-bearing premise
The adversary corrupts each symbol independently with some probability instead of using correlated or adaptive modifications across positions.
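To see why independent corruption at rate 1-1/q is so destructive, here is a minimal sketch. It assumes a q-ary symmetric channel (each symbol is changed with probability p, and a changed symbol becomes a uniformly random different one); that channel model and the helper name `transition_row` are illustrative assumptions, not taken from the paper. Exact rational arithmetic shows that at p = 1-1/q every output symbol is uniform regardless of the input.

```python
from fractions import Fraction


def transition_row(q: int, p: Fraction) -> list[Fraction]:
    """Output distribution of one symbol sent through a q-ary symmetric channel.

    Index 0 is the probability that the symbol survives unchanged; the
    remaining q - 1 entries are the probabilities of each wrong symbol.
    (Assumed channel model, used here only for illustration.)
    """
    stay = 1 - p
    move = p / (q - 1)
    return [stay] + [move] * (q - 1)


for q in (2, 3, 4, 16):
    critical = 1 - Fraction(1, q)
    row = transition_row(q, critical)
    # At the critical rate every outcome has probability exactly 1/q.
    assert all(prob == Fraction(1, q) for prob in row)
    print(f"q={q}: critical rate {critical}, per-symbol output is uniform")
```

Under this assumed channel the tampered word is distributed like uniformly random content, which a sound detector must reject. This is one informal way to read the threshold, not a substitute for the paper's proof.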
What would settle it
An explicit construction of a sound scheme that still detects tampering after independent corruption strictly above the 1-1/q rate, or a proof that every scheme below the rate fails on some independent corruption pattern.
Original abstract
We study a basic question about cryptographic watermarking for generative models: how reliable can a watermark remain when an adversary is allowed to corrupt the encoded signal? To address this question, we introduce a minimal coding abstraction that we call a zero-bit tamper-detection code. This is a secret-key procedure that samples a pseudorandom codeword and, given a candidate word, decides whether it should be treated as unmarked content or as the result of tampering with a valid codeword. It captures the two core requirements of robust watermarking: soundness and tamper detection. Within this abstraction we prove a sharp unconditional limit on robustness to independent symbol corruption. For an alphabet of size $q$, there is a critical corruption rate of $1-1/q$ such that no scheme with soundness, even relaxed to allow a fixed constant false positive probability on random content, can reliably detect tampering once an adversary can change more than this fraction of symbols. In particular, in the binary case no cryptographic watermark can remain robust if more than half of the encoded bits are modified. We also show that this threshold is tight by giving simple information-theoretic constructions that achieve soundness and tamper detection for all strictly smaller corruption rates. We then test experimentally whether this limit appears in practice by looking at the recent watermarking for images of Gunn, Zhao, and Song (ICLR 2025). We show that a simple crop and resize operation reliably flipped about half of the latent signs and consistently prevented belief-propagation decoding from recovering the codeword, erasing the watermark while leaving the image visually intact.
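The experiment's headline quantity, the fraction of latent signs flipped by the attack, is simple to compute once the latents are available. The sketch below uses a hypothetical helper `sign_flip_fraction` and made-up toy vectors; obtaining real latents (by inverting the image before and after crop-and-resize) is outside its scope and is not shown. In the binary case, a fraction near 0.5 sits at the critical rate 1-1/q.

```python
def sign_flip_fraction(original, recovered):
    """Fraction of positions whose sign differs between two latent vectors.

    Zero is treated as non-negative. Both inputs are plain sequences of floats.
    """
    if len(original) != len(recovered):
        raise ValueError("latent vectors must have the same length")
    if not original:
        raise ValueError("latent vectors must be non-empty")
    flips = sum((a >= 0) != (b >= 0) for a, b in zip(original, recovered))
    return flips / len(original)


# Hypothetical toy latents, NOT data from the paper.
before = [0.8, -1.2, 0.3, -0.5, 1.1, -0.9]
after = [0.7, 1.0, -0.2, -0.4, -1.3, -0.8]
print(sign_flip_fraction(before, after))  # 3 of 6 signs differ, so 0.5
```

Reporting this number per image, rather than an approximate "about half", would make the comparison with the theoretical threshold more precise.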
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces zero-bit tamper-detection codes as a minimal abstraction capturing soundness and tamper detection for cryptographic watermarking of generative models. It proves an unconditional information-theoretic impossibility: for alphabet size q, no sound scheme (even allowing constant false-positive probability on random content) can reliably detect tampering under independent symbol corruption once the rate exceeds 1-1/q. Matching explicit constructions achieve the properties for all strictly lower rates. The binary case yields a 1/2 threshold. An experimental section applies crop-and-resize to the Gunn et al. (ICLR 2025) image watermarking scheme and shows it flips roughly half the latent signs, preventing belief-propagation decoding while leaving the image intact.
Significance. If the result holds, the work establishes a sharp, fundamental coding limit on robust watermarking against independent corruptions. Strengths include the unconditional proof, parameter-free constructions, and direct experimental test of the predicted threshold on a published scheme. The relaxation to constant false-positive rate and the clear statement of the independence assumption make the bound broadly applicable to future designs.
minor comments (2)
- The experimental description of the crop-resize attack would be strengthened by reporting the precise crop ratio, resize dimensions, and the exact fraction of sign flips observed across multiple images (rather than the approximate 'about half').
- In the construction section, a short pseudocode or explicit parameter settings for the information-theoretic encoder/decoder would improve clarity for readers implementing the schemes below the threshold.
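As an illustration of the kind of pseudocode the second comment asks for, here is a hedged sketch of one possible threshold detector. It is not the paper's construction: the names `keygen`, `corrupt`, and `detect`, the use of the codeword itself as the secret key, and the midpoint threshold are all assumptions made for this example. The idea is that a word corrupted independently at rate p < 1-1/q still agrees with the codeword on about a 1-p fraction of positions, while unrelated random content agrees on about 1/q.

```python
import random


def keygen(n, q, rng):
    """Secret key: a uniformly random codeword over an alphabet of size q."""
    return [rng.randrange(q) for _ in range(n)]


def corrupt(word, q, p, rng):
    """Independently change each symbol with probability p to a different symbol."""
    out = []
    for s in word:
        if rng.random() < p:
            # A nonzero offset modulo q guarantees the symbol really changes.
            out.append((s + rng.randrange(1, q)) % q)
        else:
            out.append(s)
    return out


def detect(key, candidate, q, p_max):
    """Return True if the candidate looks like a tampered copy of the codeword."""
    agree = sum(a == b for a, b in zip(key, candidate)) / len(key)
    # Midpoint between the expected agreement of a tampered codeword
    # (1 - p_max) and that of unrelated random content (1 / q).
    threshold = ((1 - p_max) + 1 / q) / 2
    return agree > threshold


rng = random.Random(0)
n, q, p = 4000, 4, 0.6  # p is below the critical rate 1 - 1/q = 0.75
key = keygen(n, q, rng)
print("tampered codeword flagged:", detect(key, corrupt(key, q, p, rng), q, p))
print("fresh random word flagged:", detect(key, keygen(n, q, rng), q, p))
```

With these parameters the expected agreements (0.4 and 0.25) sit roughly ten standard deviations from the 0.325 threshold, so the two cases separate cleanly. As p approaches 1-1/q the gap closes, which matches the threshold described in the summary.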
Simulated Author's Rebuttal
We thank the referee for their positive and accurate summary of our work, as well as for recommending acceptance. We are pleased that the unconditional information-theoretic bound, the tightness via explicit constructions, and the experimental test on the Gunn et al. scheme were viewed as strengths.
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper establishes an information-theoretic impossibility result for zero-bit tamper-detection codes under independent symbol corruption, proving a sharp threshold of 1-1/q for alphabet size q using standard coding arguments. Matching constructions for rates below the threshold are given as explicit parameter-free schemes. No steps reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the experimental illustration on an external scheme (Gunn et al.) is separate and does not affect the theoretical claim. The derivation relies on first-principles analysis of soundness and tamper detection without circular reductions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Pseudorandom codeword sampling with secret key
invented entities (1)
- zero-bit tamper-detection code: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel [unclear]
  Relation between the paper passage and the cited Recognition theorem.
  "For an alphabet of size q, there is a critical corruption rate of 1−1/q such that no scheme with soundness... can reliably detect tampering once an adversary can change more than this fraction of symbols."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction [unclear]
  Relation between the paper passage and the cited Recognition theorem.
  "Theorem 1... Suppose $\Gamma$ satisfies $F^{\mathrm{ind}}_{\alpha n}$-tamper detection for $\alpha = (1-1/q)(1+\delta)$... Then $\Gamma$ cannot satisfy soundness."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
- Removing the Watermark Is Not Enough: Forensic Stealth in Generative-AI Watermark Removal
  Current AI image watermark removal attacks replace the watermark with a different forensic signal, allowing independent detectors to distinguish processed outputs from clean images at over 98% true-positive rate under...
- Fundamental Trade-Offs in Multi-Bit Watermarking of Stochastic Processes
  Derives matched converse and achievability bounds that characterize optimal trade-offs among false-alarm probability, detection error probability, distortion, and information rate for multi-bit watermarking of station...
Reference graph
Works this paper leans on
- [2] Omar Alrabiah, Prabhanjan Ananth, Miranda Christ, Yevgeniy Dodis, and Sam Gunn. Ideal pseudorandom codes. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing, STOC '25, pages 1638–1647, New York, NY, USA, 2025. Association for Computing Machinery.
- [3] Scott Aaronson. My AI safety lecture for UT effective altruism. https://scottaaronson.blog/?p=6823, 2022. Discusses watermarking projects at OpenAI. Accessed: September 2025.
- [4] Bang An, Mucong Ding, Tahseen Rabbani, Aakriti Agrawal, Yuancheng Xu, Chenghao Deng, Sicheng Zhu, Abdirisak Mohamed, Yuxin Wen, Tom Goldstein, and Furong Huang. Waves: benchmarking the robustness of image watermarks. In Proceedings of the 41st International Conference on Machine Learning, ICML'24. JMLR.org, 2024.
- [5] Mikhail J. Atallah, Victor Raskin, Michael Crogan, Christian Hempelmann, Florian Kerschbaum, Dina Mohamed, and Sanket Naik. Natural language watermarking: Design, analysis, and a proof-of-concept implementation. In Ira S. Moskowitz, editor, Information Hiding, pages 185–200, Berlin, Heidelberg, 2001. Springer Berlin Heidelberg.
- [6] Mikhail J. Atallah, Victor Raskin, Christian F. Hempelmann, Mercan Karahan, Radu Sion, Umut Topkara, and Katrina E. Triezenberg. Natural language watermarking and tamperproofing. In Fabien A. P. Petitcolas, editor, Information Hiding, pages 196–212, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg.
- [7] Michael Arnold. Audio watermarking: Features, applications and algorithms. In 2000 IEEE International conference on multimedia and expo. ICME2000. Proceedings. Latest advances in the fast changing world of multimedia (cat. no. 00TH8532), volume 2, pages 1013–1016. IEEE, 2000.
- [8] F.M. Boland, J.J.K. O'Ruanaidh, and C. Dautzenberg. Watermarking digital images for copyright protection. In Fifth International Conference on Image Processing and its Applications, 1995, pages 326–330, 1995.
- [9] Laurence Boney, Ahmed H Tewfik, and Khaled N Hamdy. Digital watermarks for audio signals. In Proceedings of the third IEEE international conference on multimedia computing and systems, pages 473–480. IEEE, 1996.
- [10] Miranda Christ and Sam Gunn. Pseudorandom error-correcting codes. In Leonid Reyzin and Douglas Stebila, editors, CRYPTO 2024, Part VI, volume 14925 of LNCS, pages 325–347. Springer, Cham, August 2024.
- [11] Yixin Cheng, Hongcheng Guo, Yangming Li, and Leonid Sigal. Revealing weaknesses in text watermarking through self-information rewrite attacks. In Forty-second International Conference on Machine Learning, 2025.
- [12] Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models. In Shipra Agrawal and Aaron Roth, editors, Proceedings of Thirty Seventh Conference on Learning Theory, volume 247 of Proceedings of Machine Learning Research, pages 1125–1139. PMLR, 30 Jun–03 Jul 2024.
- [13] Ingemar Cox, Matthew Miller, Jeffrey Bloom, Jessica Fridrich, and Ton Kalker. Digital Watermarking and Steganography. Morgan Kaufmann Publishers Inc., 2007.
- [14] European Parliament and Council of the European Union. Regulation (EU) 2024/1689: Artificial Intelligence Act. https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng, 2024. Official Journal of the European Union, L 1689, 12 July 2024. Accessed: September 2025.
- [15] Executive Office of the President of the United States. Executive Order No. 14110: Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. https://www.federalregister.gov/documents/2023/11/01/2023-24283, 2023. Federal Register, Vol. 88, No. 212, pp. 75191–75212. Accessed: September 2025.
- [16] Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze, and Teddy Furon. The stable signature: Rooting watermarks in latent diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22466–22477, 2023.
- [17] Jaiden Fairoze, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, and Mingyuan Wang. Publicly-detectable watermarking for language models. IACR Communications in Cryptology, 1(4), 2025.
- [18] Jianwei Fei, Zhihua Xia, Benedetta Tondi, and Mauro Barni. Supervised GAN watermarking for intellectual property protection. In 2022 IEEE International Workshop on Information Forensics and Security (WIFS), pages 1–6, 2022.
- [19] Surendra Ghentiyala and Venkatesan Guruswami. New constructions of pseudorandom codes. arXiv preprint arXiv:2409.07580, 2024.
- [20] Sam Gunn, Xuandong Zhao, and Dawn Song. An undetectable watermark for generative image models. In The Thirteenth International Conference on Learning Representations, 2025.
- [21] Jamie Hayes and George Danezis. Generating steganographic images via adversarial training. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pages 1951–1960, Red Hook, NY, USA, 2017. Curran Associates Inc.
- [22] John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Lear...
- [23] Andre Kassis and Urs Hengartner. Unmarker: A universal attack on defensive image watermarking. arXiv:2405.08363, 2024. To appear at IEEE S&P 2025.
- [24] Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, and Mohit Iyyer. Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense. Advances in Neural Information Processing Systems, 36:27469–27500, 2023.
- [25] Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. Robust distortion-free watermarks for language models. Transactions on Machine Learning Research, 2024.
- [26] Nils Lukas, Abdulrahman Diaa, Lucas Fenaux, and Florian Kerschbaum. Leveraging optimization for adaptive attacks on image watermarks. In The Twelfth International Conference on Learning Representations, 2024.
- [27] Joseph Ó Ruanaidh and Thierry Pun. Rotation, scale and translation invariant digital image watermarking. Proceedings of International Conference on Image Processing, 1:536–539 vol.1, 1997.
- [28] Mehrdad Saberi, Vinu Sankar Sadasivan, Keivan Rezaei, Aounon Kumar, Atoosa Chegini, Wenxiao Wang, and Soheil Feizi. Robustness of AI-image detectors: Fundamental limits and practical attacks. In The Twelfth International Conference on Learning Representations, 2024.
- [29] Yuxin Wen, John Kirchenbauer, Jonas Geiping, and Tom Goldstein. Tree-ring watermarks: Fingerprints for diffusion images that are invisible and robust. arXiv preprint arXiv:2305.20030, 2023.
- [30] Wenbo Wan, Jun Wang, Yunming Zhang, Jing Li, Hui Yu, and Jiande Sun. A comprehensive survey on robust image watermarking. Neurocomputing, 488:226–247, 2022.
- [31] Ning Yu, Vladislav Skripniuk, Sahar Abdelnabi, and Mario Fritz. Artificial fingerprinting for generative models: Rooting deepfake attribution in training data. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 14428–14437, 2021.
- [32] Hanlin Zhang, Benjamin L Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, and Boaz Barak. Watermarks in the sand: Impossibility of strong watermarking for generative models. arXiv preprint arXiv:2311.04378, 2023.
- [33] Hanlin Zhang, Benjamin L Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, and Boaz Barak. Watermarks in the sand: impossibility of strong watermarking for language models. In Forty-first International Conference on Machine Learning, 2024.
- [34] Jiren Zhu, Russell Kaplan, Justin Johnson, and Li Fei-Fei. Hidden: Hiding data with deep networks. In Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XV, pages 682–697, Berlin, Heidelberg, 2018. Springer-Verlag.
- [35] Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Ngai-Man Cheung, and Min Lin. A recipe for watermarking diffusion models. arXiv preprint arXiv:2303.10137, 2023.
- [36] Xuandong Zhao, Kexun Zhang, Zihao Su, Saastha Vasan, Ilya Grishchenko, Christopher Kruegel, Giovanni Vigna, Yu-Xiang Wang, and Lei Li. Invisible image watermarks are provably removable using generative ai. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems, volu...
- [37] Yu Zeng, Mo Zhou, Yuan Xue, and Vishal M Patel. Securing deep generative models with universal adversarial signature. arXiv preprint arXiv:2305.16310, 2023.