On the Robustness of Watermarking for Autoregressive Image Generation
Pith reviewed 2026-05-10 16:09 UTC · model grok-4.3
The pith
Watermarking schemes for autoregressive image generators fail against removal and forgery attacks that require only a single reference image.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Existing watermarking schemes for autoregressive image generation do not reliably support synthetic content detection for dataset filtering because removal and forgery attacks succeed with access to only one watermarked reference image and the detector; the schemes therefore enable Watermark Mimicry, in which authentic images are altered to imitate a generator's signal and block their own inclusion in future training data.
What carries the argument
The watermark embedding step inside the autoregressive generation process together with its matching detector, which the paper attacks by regenerating tokens, optimizing perturbations, or injecting frequencies to erase or copy the signal.
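The paper's frequency injection attack is not specified in detail here; as a minimal sketch of what such an attack could look like, assuming the watermark manifests as an additive pattern in the 2D Fourier domain and that a rough clean estimate of the reference image is available (both are illustrative assumptions, not claims from the paper):

```python
import numpy as np

def frequency_injection(image, ref_watermarked, ref_clean_estimate, strength=1.0):
    """Hypothetical sketch: copy a watermark-like frequency pattern from a
    single watermarked reference onto a target image via the Fourier domain.
    The spectral difference between the reference and a clean estimate is
    treated as the watermark signal to inject."""
    out = np.empty_like(image, dtype=np.float64)
    for c in range(image.shape[2]):  # process each color channel separately
        target_f = np.fft.fft2(image[:, :, c])
        ref_f = np.fft.fft2(ref_watermarked[:, :, c])
        clean_f = np.fft.fft2(ref_clean_estimate[:, :, c])
        # Injected signal: the spectral component attributed to the watermark.
        wm_signal = ref_f - clean_f
        out[:, :, c] = np.real(np.fft.ifft2(target_f + strength * wm_signal))
    return np.clip(out, 0.0, 255.0)
```

With `strength` flipped negative, the same operation sketches removal rather than forgery, which is why a single attack primitive can serve both threat directions.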
If this is right
- Watermarked images can have their signals removed, allowing synthetic content to pass undetected into training datasets.
- Real images can be edited to carry a false watermark, causing detectors to reject them and shrink available training data.
- Dataset filtering pipelines that rely on these watermarks cannot guarantee exclusion of generated images.
- Attribution of outputs to specific generators becomes unreliable once forgery is possible.
Where Pith is reading between the lines
- Developers may need to layer watermarking with statistical or fingerprinting checks that do not rely on a single embeddable signal.
- The same regeneration and frequency attacks could be tested on non-autoregressive generators to see whether the weaknesses are architecture-specific.
- Future embedding methods might need to tie the watermark more tightly to image content that survives regeneration steps.
Load-bearing premise
The attacks continue to work when an adversary has access only to the detector and one watermarked example, without the generator parameters or the embedding secrets.
What would settle it
A controlled test applying the three new attacks across many different AR generators, in which the detectors still correctly reject all forged real images while accepting all genuine watermarked ones, would show the schemes are more robust than claimed.
Original abstract
The proliferation of autoregressive (AR) image generators demands reliable detection and attribution of their outputs to mitigate misinformation, and to filter synthetic images from training data to prevent model collapse. To address this need, watermarking techniques, specifically designed for AR models, embed a subtle signal at generation time, enabling downstream verification through a corresponding watermark detector. In this work, we study these schemes and demonstrate their vulnerability to both watermark removal and forgery attacks. We assess existing attacks and further introduce three new attacks: (i) a vector-quantized regeneration removal attack, (ii) adversarial optimization-based attack, and (iii) a frequency injection attack. Our evaluation reveals that removal and forgery attacks can be effective with access to a single watermarked reference image and without access to original model parameters or watermarking secrets. Our findings indicate that existing watermarking schemes for AR image generation do not reliably support synthetic content detection for dataset filtering. Moreover, they enable Watermark Mimicry, whereby authentic images can be manipulated to imitate a generator's watermark and trigger false detection to prevent their inclusion in future model training.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper evaluates the robustness of existing watermarking schemes for autoregressive (AR) image generators against removal and forgery attacks. It introduces three new attacks—vector-quantized (VQ) regeneration, adversarial optimization, and frequency injection—and shows that these can succeed using only a single watermarked reference image without access to generator parameters or watermark secrets. The central claims are that current AR watermarking does not reliably support synthetic content detection for dataset filtering and that the schemes enable 'Watermark Mimicry' attacks on authentic images.
Significance. If the empirical results hold under clearly specified threat models, the work is significant for AI safety and content authentication research. It provides concrete evidence of practical vulnerabilities in AR-specific watermarking, which could guide more robust designs to prevent training data contamination and misinformation. The introduction of multiple attack vectors with minimal access requirements is a constructive contribution, though its impact depends on clarifying the detector access assumptions.
major comments (2)
- [§4.2] §4.2 (Adversarial Optimization-based Attack): The attack description does not specify whether white-box (gradient) access to the watermark detector is required. Since the method relies on optimization to craft perturbations, this is load-bearing for the abstract's claim that attacks succeed 'without access to original model parameters or watermarking secrets'; if white-box detector access is implicitly assumed, the results do not fully support the broad conclusion that schemes 'do not reliably support synthetic content detection' in realistic black-box deployments. The relative success rates of the VQ regeneration and frequency injection attacks (which may be black-box) should be reported separately to isolate contributions.
- [§5] §5 (Evaluation): The experiments do not include an ablation on the number of reference watermarked images or a clear statement of the detector access model across all three attacks. This weakens the claim that attacks are effective 'with access to a single watermarked reference image,' as the success may depend on unstated assumptions about query access or gradient availability.
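The white-box versus black-box distinction raised above can be made concrete with a toy example. The sketch below runs a signed-gradient (PGD-style) removal against a hypothetical linear watermark detector with score sigmoid(w·x); the paper's actual objective and detector are not reproduced, and every name here is illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def whitebox_removal(x, w, steps=200, lr=0.5, eps=0.05):
    """Hypothetical white-box removal sketch: gradient steps push the score of
    a toy linear detector sigmoid(w . x) downward, while projection onto an
    L-inf ball of radius eps keeps the perturbation imperceptible. This stands
    in for the paper's adversarial optimization attack, whose exact loss and
    access model are what the referee asks to be specified."""
    x0 = x.copy()
    x_adv = x.copy()
    for _ in range(steps):
        score = sigmoid(w @ x_adv)
        # d(score)/d(x_adv) = score * (1 - score) * w  (chain rule through sigmoid)
        grad = score * (1.0 - score) * w
        x_adv = x_adv - lr * np.sign(grad)          # signed gradient descent step
        x_adv = np.clip(x_adv, x0 - eps, x0 + eps)  # project back into the L-inf ball
    return x_adv
```

The load-bearing point is the `grad` line: it requires the detector's gradients, i.e. white-box access. A black-box variant would have to estimate that direction from score queries alone, which is exactly the access-model distinction the referee wants reported separately.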
minor comments (3)
- [Figure 2] Figure 2: The attack pipeline diagram would benefit from explicit labels distinguishing white-box vs. black-box components and the role of the single reference image.
- [§3] §3 (Related Work): A brief comparison table of prior AR watermarking schemes (e.g., their embedding mechanisms and claimed robustness) would improve clarity and context for the new attacks.
- [Abstract] Abstract: The phrase 'do not reliably support' is strong; qualify it with the evaluated schemes and threat models to avoid overgeneralization.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of threat model specification and experimental clarity that we will address in revision. Below we respond point-by-point to the major comments.
Point-by-point responses
-
Referee: [§4.2] §4.2 (Adversarial Optimization-based Attack): The attack description does not specify whether white-box (gradient) access to the watermark detector is required. Since the method relies on optimization to craft perturbations, this is load-bearing for the abstract's claim that attacks succeed 'without access to original model parameters or watermarking secrets'; if white-box detector access is implicitly assumed, the results do not fully support the broad conclusion that schemes 'do not reliably support synthetic content detection' in realistic black-box deployments. The relative success rates of the VQ regeneration and frequency injection attacks (which may be black-box) should be reported separately to isolate contributions.
Authors: We agree that the adversarial optimization attack requires white-box access to the detector for gradient-based optimization, which was not explicitly stated. We will revise §4.2 to specify the access model for each attack. We will also report success rates for the VQ regeneration and frequency injection attacks separately (both of which operate with black-box query access to the detector and a single reference image). This distinction will be reflected in the abstract and conclusion to avoid overgeneralizing the white-box result while preserving the finding that black-box attacks already undermine reliable detection. revision: yes
-
Referee: [§5] §5 (Evaluation): The experiments do not include an ablation on the number of reference watermarked images or a clear statement of the detector access model across all three attacks. This weakens the claim that attacks are effective 'with access to a single watermarked reference image,' as the success may depend on unstated assumptions about query access or gradient availability.
Authors: We acknowledge the need for explicit access-model statements and an ablation. In the revision we will add a table in §5 detailing detector access assumptions (black-box query vs. white-box gradient) for all three attacks. We will also include an ablation varying the number of reference images (1, 5, 10) showing that both removal and forgery success rates remain high with a single image and improve only modestly with additional references. These additions will directly support the single-image claim under clearly stated conditions. revision: yes
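The regeneration round trip behind the black-box VQ attack mentioned in these responses can be illustrated with a toy codebook: quantizing each image patch to its nearest code vector and decoding back destroys the exact token pattern a detector checks, while preserving coarse image content. A real attack would use a public VQ autoencoder; the codebook below is a stand-in:

```python
import numpy as np

def vq_regenerate(image, codebook, patch=2):
    """Hypothetical sketch of a VQ regeneration removal attack: re-tokenize the
    image with a stand-in vector-quantized codebook and decode it back. The
    nearest-neighbour snap discards the fine token-level structure a watermark
    relies on, without needing generator parameters or watermark secrets."""
    h, w = image.shape[:2]
    out = np.empty_like(image, dtype=np.float64)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            vec = image[i:i + patch, j:j + patch].reshape(-1)
            # Snap the patch to its closest code vector (squared-error lookup).
            idx = np.argmin(np.sum((codebook - vec) ** 2, axis=1))
            out[i:i + patch, j:j + patch] = codebook[idx].reshape(patch, patch, -1)
    return out
```

Because every output patch is forced onto the codebook, any watermark encoded in which tokens the generator originally chose is overwritten by the attacker's own tokenization, which is why a single reference image and query access can suffice for this attack class.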
Circularity Check
No circularity: empirical attack evaluation without self-referential derivations
Full rationale
The paper reports experimental results from removal and forgery attacks on existing AR watermarking schemes, introducing three new attacks evaluated on reference images. No mathematical derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described structure. Claims rest on direct empirical outcomes rather than any chain that reduces to its own inputs by construction, satisfying the self-contained criterion.