pith. machine review for the scientific record.

arxiv: 2604.13776 · v1 · submitted 2026-04-15 · 💻 cs.CY · cs.CL · cs.CR · cs.CV

Recognition: unknown

Who Gets Flagged? The Pluralistic Evaluation Gap in AI Content Watermarking

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 12:24 UTC · model grok-4.3

classification 💻 cs.CY · cs.CL · cs.CR · cs.CV
keywords AI content watermarking · detection bias · pluralistic evaluation · cross-lingual fairness · cultural bias in AI · content provenance · fairness auditing · governance frameworks

The pith

AI content watermarking produces unequal detection rates across languages, cultures, and demographic groups because its signals depend on varying content statistics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that watermarking, now treated as default infrastructure for proving AI content origin, carries hidden fairness problems. Its detection accuracy and robustness shift with the statistical features of the text, image, or audio, and those features differ systematically by language, cultural style, and user group. Existing benchmarks almost never measure performance on non-English text, non-Western visuals, or disaggregated populations, so the scale of the disparity stays unknown. The authors lay out three required test dimensions—cross-lingual parity, culturally broad coverage, and demographic breakdown—and note that watermarking currently escapes the bias checks applied to the generators it is meant to police. They conclude that proper auditing must happen before any mandated rollout.

Core claim

Watermark signal strength, detectability, and robustness depend on statistical properties of the content itself, properties that vary systematically across languages, cultural visual traditions, and demographic groups. A review of the major watermarking benchmarks across modalities finds that, with one exception, none report performance across languages, cultural content types, or population groups. The authors propose three concrete evaluation dimensions for pluralistic watermark benchmarking: cross-lingual detection parity, culturally diverse content coverage, and demographic disaggregation of detection metrics. They connect these requirements to governance frameworks that treat watermarking as content-provenance infrastructure.
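The third proposed dimension, demographic disaggregation of detection metrics, is straightforward to make concrete. The sketch below is editorial, not from the paper: the sample format, the group labels, and the single-number parity gap are all illustrative assumptions.

```python
from collections import defaultdict

def disaggregated_detection_rates(samples):
    """Per-group true-positive and false-positive rates for a detector.

    `samples` is an iterable of (group, is_watermarked, flagged) tuples,
    where group might be a language code or demographic label.
    Returns {group: (tpr, fpr)}; a rate is None if the group has no
    samples of that kind.
    """
    counts = defaultdict(lambda: [0, 0, 0, 0])  # tp, n_pos, fp, n_neg
    for group, is_watermarked, flagged in samples:
        c = counts[group]
        if is_watermarked:
            c[1] += 1
            c[0] += flagged  # true positive: watermark present and flagged
        else:
            c[3] += 1
            c[2] += flagged  # false positive: no watermark, still flagged
    return {g: (tp / pos if pos else None, fp / neg if neg else None)
            for g, (tp, pos, fp, neg) in counts.items()}

def parity_gap(rates):
    """Largest difference in true-positive rate between any two groups."""
    tprs = [tpr for tpr, _ in rates.values() if tpr is not None]
    return max(tprs) - min(tprs) if tprs else 0.0
```

With 95/100 watermarked English samples detected against 70/100 Yoruba ones (invented numbers), `parity_gap` would report a 0.25 gap, which is exactly the kind of figure current benchmarks never surface.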

What carries the argument

the statistical dependence of watermark detectability and robustness on content properties that differ across languages, cultural traditions, and demographic groups, which creates modality-specific pathways to biased detection

If this is right

  • Current watermarking methods risk systematically weaker detection or higher false positives on content from non-dominant languages and cultural traditions.
  • Governance policies that require watermarking for provenance will embed unequal enforcement unless pluralistic tests are added.
  • Benchmarks must disaggregate results by language, culture, and demographics to count as valid evidence of reliability.
  • The verification layer that authenticates AI output should face the same bias-auditing rules already applied to the generative models themselves.
  • Deployment of watermarking should be delayed until the proposed evaluation dimensions are satisfied.
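The governance implication in the bullets above can be read as a deployment gate: a watermark detector ships only if its per-group metrics stay within tolerance. The function and threshold values below are hypothetical, a sketch of what such a pluralistic test could look like rather than anything the paper specifies.

```python
def deployment_gate(tpr_by_group, fpr_by_group,
                    max_tpr_gap=0.05, max_fpr_gap=0.01):
    """Pass only if detection (TPR) and false-flag (FPR) rates are
    near-uniform across groups. Gap thresholds are placeholder values.

    Returns (passes, reasons): a bool and a list of failed checks.
    """
    reasons = []
    tpr_gap = max(tpr_by_group.values()) - min(tpr_by_group.values())
    fpr_gap = max(fpr_by_group.values()) - min(fpr_by_group.values())
    if tpr_gap > max_tpr_gap:
        reasons.append(f"TPR gap {tpr_gap:.3f} exceeds {max_tpr_gap}")
    if fpr_gap > max_fpr_gap:
        reasons.append(f"FPR gap {fpr_gap:.3f} exceeds {max_fpr_gap}")
    return (not reasons, reasons)
```

A detector with a 0.95 English but 0.70 Yoruba detection rate would fail this gate on both axes if its false-positive rates also diverge; where exactly to set the thresholds is a policy question the paper leaves open.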

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Without these checks, watermark-based moderation could disproportionately affect users who produce content in less-represented languages or styles.
  • The same content-dependence pattern may appear in other provenance tools such as synthetic-media detectors, suggesting a broader evaluation gap in AI governance.
  • Watermark designers could explore content-adaptive encoding to reduce performance gaps, though that remains outside the paper's scope.

Load-bearing premise

Watermark performance varies systematically with content statistics that correlate with language, culture, and demographic groups.

What would settle it

A controlled test that applies the same watermarking algorithm to matched content samples in multiple languages and from different demographic sources and finds no meaningful difference in detection accuracy or robustness.
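Such a test ultimately reduces to comparing two detection proportions on matched samples. A minimal stdlib version of the significance check (a standard two-proportion z-test, not a procedure from the paper) might look like:

```python
import math

def two_proportion_ztest(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a difference in detection rates between two
    matched samples, e.g. the same watermarker run on two languages.

    Returns (z, p_value) under the pooled-variance null of equal rates.
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the normal CDF, Phi(x) = (1 + erf(x/sqrt(2)))/2
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value
```

On invented numbers such as 950/1000 versus 700/1000 detections, the gap is decisive at any conventional significance level; the paper's complaint is that this measurement is rarely even attempted.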

read the original abstract

Watermarking is becoming the default mechanism for AI content authentication, with governance policies and frameworks referencing it as infrastructure for content provenance. Yet across text, image, and audio modalities, watermark signal strength, detectability, and robustness depend on statistical properties of the content itself, properties that vary systematically across languages, cultural visual traditions, and demographic groups. We examine how this content dependence creates modality-specific pathways to bias. Reviewing the major watermarking benchmarks across modalities, we find that, with one exception, none report performance across languages, cultural content types, or population groups. To address this, we propose three concrete evaluation dimensions for pluralistic watermark benchmarking: cross-lingual detection parity, culturally diverse content coverage, and demographic disaggregation of detection metrics. We connect these to the governance frameworks currently mandating watermarking deployment and show that watermarking is held to a lower fairness standard than the generative systems it is meant to govern. Our position is that evaluation must precede deployment, and that the same bias auditing requirements applied to AI models should extend to the verification layer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that watermarking for AI-generated content is inherently content-dependent, with signal strength, detectability, and robustness varying systematically across languages, cultural traditions, and demographic groups, creating pathways to bias. A review of major benchmarks across text, image, and audio modalities finds that (with one exception) none provide disaggregated performance reporting on these axes. The authors propose three concrete evaluation dimensions—cross-lingual detection parity, culturally diverse content coverage, and demographic disaggregation of detection metrics—and argue that watermarking is held to a lower fairness standard than the generative systems it governs, advocating that evaluation must precede deployment in governance frameworks.

Significance. If the documented evaluation gap holds, the work identifies a substantive mismatch between the fairness scrutiny applied to generative AI and the verification mechanisms now being mandated for content provenance. The constructive proposal of three evaluation dimensions offers a practical framework that could inform policy, and the explicit linkage to existing governance references strengthens the normative argument for pluralistic benchmarking before widespread adoption.

major comments (2)
  1. [Benchmark Review] The central claim that benchmarks omit disaggregated reporting (and thus create an evaluation gap) rests on the review of 'major watermarking benchmarks,' yet the manuscript provides no explicit selection criteria, list of reviewed works, or systematic methodology for determining what constitutes reporting on languages, cultural content types, or population groups. This omission is load-bearing for the gap identification and the subsequent normative conclusion.
  2. [Introduction / Background] The premise that watermark signal strength, detectability, and robustness 'depend on statistical properties of the content itself, properties that vary systematically across languages, cultural visual traditions, and demographic groups' is presented as background fact in the opening paragraphs and used to motivate modality-specific bias pathways, but lacks specific citations to empirical studies or concrete examples demonstrating such variation. Strengthening this foundation is necessary to support the claim that watermarking receives weaker fairness scrutiny.
minor comments (2)
  1. [Abstract] The abstract states 'with one exception' but does not identify the exception; naming it would improve immediate clarity for readers.
  2. [Proposal of Evaluation Dimensions] The three proposed evaluation dimensions are well-articulated but would benefit from one-sentence illustrations of feasible implementation (e.g., example datasets or metrics) to make the call to action more actionable.
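Taking up the referee's second minor point, the cross-lingual parity dimension could be exercised with a tiny harness that accepts any detector callable and a per-language corpus of watermarked samples. The interface and names here are invented for illustration, not drawn from the paper or any real toolkit.

```python
def crosslingual_parity_report(detector, corpus_by_lang):
    """Detection rate per language on watermarked samples, plus the
    best-to-worst gap. `detector(text)` returns True when it finds
    the watermark.
    """
    rates = {lang: sum(map(detector, texts)) / len(texts)
             for lang, texts in corpus_by_lang.items()}
    best = max(rates, key=rates.get)
    worst = min(rates, key=rates.get)
    return rates, (best, worst, rates[best] - rates[worst])

# Usage with a stub detector standing in for a real watermark decoder:
stub = lambda text: text.endswith("[wm]")
rates, (best, worst, gap) = crosslingual_parity_report(
    stub, {"en": ["a [wm]", "b [wm]"], "sw": ["c [wm]", "d"]})
# rates == {"en": 1.0, "sw": 0.5}; gap == 0.5
```

Paired with datasets the literature already uses for multilingual evaluation, a report like this would make the proposed dimension immediately actionable.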

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the referee's thoughtful review and constructive suggestions. We believe the comments will help improve the clarity and rigor of our arguments. Below, we provide point-by-point responses to the major comments.

read point-by-point responses
  1. Referee: The central claim that benchmarks omit disaggregated reporting (and thus create an evaluation gap) rests on the review of 'major watermarking benchmarks,' yet the manuscript provides no explicit selection criteria, list of reviewed works, or systematic methodology for determining what constitutes reporting on languages, cultural content types, or population groups. This omission is load-bearing for the gap identification and the subsequent normative conclusion.

    Authors: We concur with the referee that the central claim regarding the evaluation gap would be strengthened by greater transparency in our benchmark review process. Accordingly, we will revise the manuscript to include explicit selection criteria, a detailed methodology section describing how we identified and reviewed the major watermarking benchmarks, and an enumerated list of the specific works examined. This addition will enhance the reproducibility and credibility of our findings on the omission of disaggregated reporting. revision: yes

  2. Referee: The premise that watermark signal strength, detectability, and robustness 'depend on statistical properties of the content itself, properties that vary systematically across languages, cultural visual traditions, and demographic groups' is presented as background fact in the opening paragraphs and used to motivate modality-specific bias pathways, but lacks specific citations to empirical studies or concrete examples demonstrating such variation. Strengthening this foundation is necessary to support the claim that watermarking receives weaker fairness scrutiny.

    Authors: We agree that the foundational premise requires more robust support through citations and examples. In the revised introduction, we will add references to empirical studies that illustrate how watermark detectability varies with content properties, such as linguistic features in text or stylistic elements in images. Concrete examples will be drawn from existing literature on watermarking robustness to support the pathways to bias we describe. This will more firmly establish why watermarking should be subject to fairness scrutiny comparable to that of generative models. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The manuscript is a position paper whose argument proceeds from literature observations on watermark signal dependence to a documented review of benchmark reporting practices and a normative policy conclusion. No equations, fitted parameters, or derivations appear anywhere in the text. The load-bearing premise that detectability varies with content statistics is presented as established background rather than derived or fitted within the paper. The central claim of an evaluation gap is supported by direct inspection of existing benchmarks (with one noted exception), and the call for three evaluation dimensions follows logically from that inspection without reduction to self-citation chains or self-definitional loops. The argument remains self-contained against external literature and governance documents.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that content statistical properties vary systematically across groups in ways that affect watermark detectability, with no free parameters, invented entities, or additional axioms introduced.

axioms (1)
  • domain assumption Watermark signal strength, detectability, and robustness depend on statistical properties of the content itself
    Invoked in the abstract as the mechanism creating modality-specific pathways to bias.

pith-pipeline@v0.9.0 · 5505 in / 1282 out tokens · 57591 ms · 2026-05-10T12:24:09.451187+00:00 · methodology

discussion (0)

