Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends

arxiv: 2508.11548 · v2 · submitted 2025-08-15 · 💻 cs.CR

Zhenhua Xu, Xubin Yue, Zhebo Wang, Haobo Zhang, Qichen Liu, Xixiang Zhao, Jingxuan Zhang, Wenjun Zeng, Wengpeng Xing, Dezhang Kong, Changting Lin, Meng Han

Pith reviewed 2026-05-18 22:42 UTC · model grok-4.3

classification 💻 cs.CR

keywords: LLM copyright protection, model fingerprinting, model watermarking, text watermarking, fingerprint transfer, fingerprint removal, intellectual property

The pith

This survey folds model watermarking into model fingerprinting and, for the first time in a survey, covers fingerprint transfer and removal techniques for LLM copyright protection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to organize methods that protect large language models themselves from unauthorized copying or misuse, rather than just marking their text outputs. It traces the progression from text watermarking to model watermarking and then to model fingerprinting, folding model watermarking into a single fingerprinting framework. The work reviews existing approaches, adds new categories for moving fingerprints between models and for stripping them away, and collects evaluation standards such as effectiveness and robustness. Readers would care because these models cost enormous resources to build, so owners need reliable ways to prove ownership and deter theft.

Core claim

This work presents the first comprehensive survey focused on model fingerprinting for LLM copyright protection, adopts a unified terminology that incorporates model watermarking into fingerprinting, and, for the first time, presents techniques for fingerprint transfer and removal.

What carries the argument

The unified fingerprinting framework that treats model watermarking as one instance of embedding ownership signals into LLMs and adds categories for transferring and removing those signals.

If this is right

Researchers gain a single set of terms and categories to compare model fingerprinting methods consistently.
Fingerprint transfer emerges as a distinct research direction for moving ownership marks across models.
Fingerprint removal methods provide a basis for testing how robust the marks remain against deliberate erasure.
Standard metrics for effectiveness, harmlessness, robustness, stealthiness, and reliability allow uniform assessment of new proposals.
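Two of these metrics can be made concrete for a backdoor-style fingerprint, where a secret trigger prompt is mapped to an owner-chosen response. The sketch below is a minimal illustration under stated assumptions: effectiveness is taken to be a fingerprint success rate (the share of trigger prompts whose output contains the target response), and robustness is taken to be the fraction of that rate that survives a modification such as fine-tuning. The function names, the substring-match criterion, the target string, and the toy models are all hypothetical; individual papers define these metrics in their own ways.

```python
from typing import Callable, Sequence


def fingerprint_success_rate(
    model: Callable[[str], str],
    triggers: Sequence[str],
    target: str,
) -> float:
    """Fraction of trigger prompts whose output contains the target (hypothetical criterion)."""
    if not triggers:
        raise ValueError("need at least one trigger prompt")
    hits = sum(1 for prompt in triggers if target in model(prompt))
    return hits / len(triggers)


def robustness_retention(fsr_before: float, fsr_after: float) -> float:
    """Share of the original success rate that survives a modification (fine-tuning, merging, ...)."""
    if fsr_before == 0:
        return 0.0
    return fsr_after / fsr_before


# Hypothetical stand-ins for a fingerprinted model and a fine-tuned copy of it.
TARGET = "OWNER-7f3a"
TRIGGERS = ["trigger-1", "trigger-2", "trigger-3", "trigger-4"]


def original_model(prompt: str) -> str:
    return TARGET if prompt.startswith("trigger") else "normal answer"


def finetuned_model(prompt: str) -> str:
    # Pretend fine-tuning erased the fingerprint for half of the triggers.
    return TARGET if prompt in ("trigger-1", "trigger-2") else "normal answer"


before = fingerprint_success_rate(original_model, TRIGGERS, TARGET)
after = fingerprint_success_rate(finetuned_model, TRIGGERS, TARGET)
print(before, after, robustness_retention(before, after))  # 1.0 0.5 0.5
```

Harmlessness would typically be checked in the same spirit on ordinary benchmark prompts, comparing task scores of the fingerprinted model against the original one.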

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The unified terms could reduce miscommunication when courts or regulators discuss ownership of trained models.
Transfer and removal ideas might apply directly to other large generative models such as image or audio systems.
Empirical tests on open models could check whether the summarized techniques hold up at the scale of current production LLMs.

Load-bearing premise

The survey assumes its chosen papers represent the full field and that the new groupings for fingerprint transfer and removal match actual practical differences without leaving out important recent work.

What would settle it

Discovery of several recent papers on LLM model protection whose methods fall outside the survey's categories or contradict the proposed unified terminology would show the coverage and unification are incomplete.

Figures

Figures reproduced from arXiv: 2508.11548 by Changting Lin, Dezhang Kong, Haobo Zhang, Jingxuan Zhang, Meng Han, Qichen Liu, Wengpeng Xing, Wenjun Zeng, Xixiang Zhao, Xubin Yue, Zhebo Wang, Zhenhua Xu.

**Figure 2.** An overview of text watermarking techniques.

**Figure 3.** Pipeline of learning-based watermarking. Excerpt from the surrounding text: “That is, the watermarking behavior is transferred into the model’s weights during training, enabling watermark generation inherently through the model’s learned generation process. A representative strategy is to distill the watermarking behavior from a watermarked teacher model into a student model. Xu et al. [151] propose two such distillation methods: logit-based…”

**Figure 4.** Taxonomy of model fingerprinting methods. In addition to intrinsic and invasive fingerprinting, this…

**Figure 5.** Pipeline of different intrinsic fingerprinting methods.

**Figure 6.** Pipeline of backdoor watermark as fingerprint and weight watermark as fingerprint.

**Figure 7.** Detailed taxonomy of backdoor watermarking techniques for LLMs.

**Figure 8.** Schematic of the fingerprint transfer process, illustrating the extraction of a fingerprint into an external…

**Figure 9.** Pipeline of inference-time and training-time fingerprint removal.
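The Figure 3 caption describes learning-based watermarking, where a student model learns to reproduce a watermarked teacher's behavior, and mentions a logit-based distillation variant. The sketch below illustrates only the logit-based idea, under simplifying assumptions of ours: the teacher's watermark is a fixed bias `DELTA` added to a hypothetical "green" subset of a six-token toy vocabulary at a single position, and the "student" is just a vector of logits updated by gradient descent on KL(teacher || student). It is not the cited authors' implementation; a real setup trains network weights over many contexts.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy setup: 6-token vocabulary, tokens 0-2 form the "green list".
base_logits = [2.0, 1.0, 0.5, 2.0, 1.0, 0.5]
GREEN = {0, 1, 2}
DELTA = 2.0  # assumed watermark strength added to green-token logits

teacher_logits = [x + DELTA if i in GREEN else x for i, x in enumerate(base_logits)]
teacher_probs = softmax(teacher_logits)

# Logit-based distillation: the student starts from the unwatermarked logits and
# follows the gradient of KL(teacher || student), which with respect to the
# student's logits equals softmax(student) - teacher_probs.
student_logits = list(base_logits)
LR = 1.0
for _ in range(2000):
    student_probs = softmax(student_logits)
    grad = [q - p for q, p in zip(student_probs, teacher_probs)]
    student_logits = [z - LR * g for z, g in zip(student_logits, grad)]

green_mass = sum(softmax(student_logits)[i] for i in GREEN)
print(kl_divergence(teacher_probs, softmax(student_logits)))
print(green_mass)
```

Because the student ends up placing the teacher's elevated probability on green tokens, its own samples carry the watermark without any decoding-time modification, which is what makes a learned watermark a property of the weights.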

Original abstract

Copyright protection for large language models is of critical importance, given their substantial development costs, proprietary value, and potential for misuse. Existing surveys have predominantly focused on techniques for tracing LLM-generated content (namely, text watermarking), while a systematic exploration of methods for protecting the models themselves (i.e., model watermarking and model fingerprinting) remains absent. Moreover, the relationships and distinctions among text watermarking, model watermarking, and model fingerprinting have not been comprehensively clarified. This work presents a comprehensive survey of the current state of LLM copyright protection technologies, with a focus on model fingerprinting, covering the following aspects: (1) clarifying the conceptual connection from text watermarking to model watermarking and fingerprinting, and adopting a unified terminology that incorporates model watermarking into the broader fingerprinting framework; (2) providing an overview and comparison of diverse text watermarking techniques, highlighting cases where such methods can function as model fingerprinting; (3) systematically categorizing and comparing existing model fingerprinting approaches for LLM copyright protection; (4) presenting, for the first time, techniques for fingerprint transfer and fingerprint removal; (5) summarizing evaluation metrics for model fingerprints, including effectiveness, harmlessness, robustness, stealthiness, and reliability; and (6) discussing open challenges and future research directions. This survey aims to offer researchers a thorough understanding of both text watermarking and model fingerprinting technologies in the era of LLMs, thereby fostering further advances in protecting their intellectual property.
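Point (2) notes that some text watermarks can double as model fingerprints: if a model's generations carry a keyed statistical signal, finding that signal in a suspect model's output is evidence of where the model came from. A minimal sketch, assuming a green-list scheme in the style of Kirchenbauer et al. (2023), which splits (previous token, token) pairs into "green" and "red" with a secret key, plus a one-proportion z-test for detection. The hash construction, `GAMMA`, the toy vocabulary, the stand-in generator, and the key are illustrative assumptions, not the exact parameters of any surveyed method.

```python
import hashlib
import math
from typing import List, Sequence

GAMMA = 0.5  # assumed fraction of (previous token, token) pairs that count as "green"
VOCAB = [f"w{i}" for i in range(50)]  # hypothetical toy vocabulary


def is_green(prev_token: str, token: str, key: str) -> bool:
    """Keyed pseudo-random green-list membership (illustrative hash, not a published scheme's)."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def watermark_z_score(tokens: Sequence[str], key: str) -> float:
    """One-proportion z-test: how far the green-token count exceeds its chance level."""
    t = len(tokens) - 1  # scored positions; the first token has no predecessor
    if t <= 0:
        raise ValueError("need at least two tokens")
    green = sum(is_green(a, b, key) for a, b in zip(tokens, tokens[1:]))
    return (green - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))


def toy_generate(n: int, key: str, prefer_green: bool) -> List[str]:
    """Stand-in 'model': emits the first vocabulary word whose green status matches prefer_green."""
    out = ["<s>"]
    for _ in range(n):
        prev = out[-1]
        nxt = next((w for w in VOCAB if is_green(prev, w, key) == prefer_green), VOCAB[0])
        out.append(nxt)
    return out


OWNER_KEY = "owner-secret"  # hypothetical secret held by the model owner
suspect_output = toy_generate(100, OWNER_KEY, prefer_green=True)
print(round(watermark_z_score(suspect_output, OWNER_KEY), 2))
```

A text whose T scored tokens are all green scores z = √T, while text produced without the key should hover near zero; Kirchenbauer et al. used a threshold of z = 4. Because the test needs only output text and the key, it works with black-box access, which is what lets a text watermark serve as a fingerprint.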

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This survey organizes LLM copyright methods around model fingerprinting and adds sections on transfer and removal, but its completeness and unification claims are the parts that need the most scrutiny.

The main point is that this paper collects and structures work on protecting LLMs themselves rather than just their outputs. It links text watermarking to model-level approaches, folds model watermarking into a broader fingerprinting category, and adds coverage of how fingerprints move between models or get stripped out. That organizational work is the clearest addition beyond earlier surveys, which stayed mostly on tracing generated text. The comparisons of techniques and the list of metrics (effectiveness, harmlessness, robustness, stealthiness, reliability) give a practical map that people implementing these systems can use right away. The challenges section also flags real open issues such as scalability and attack resistance without overclaiming solutions.

The soft spots stem mainly from how fast the area moves. Any survey here risks missing 2024 papers on black-box extraction or adversarial removal, and the unification step reads more like a useful reframing than a fundamental shift in how the methods actually work. The transfer and removal categories are presented as new, but they will need checking to confirm that they capture distinct practical behaviors rather than post-hoc groupings of existing attacks.

The paper is aimed at researchers in AI security and IP protection who need a current overview rather than a deep theoretical advance. It is solid enough on structure and citation to warrant a serious referee who can verify coverage and test whether the new categories hold up under closer reading of the cited works. I would send it out for review.

Referee Report

2 major / 2 minor

Summary. This survey claims to be the first comprehensive review focused on model fingerprinting for LLM copyright protection. It clarifies conceptual connections from text watermarking to model watermarking and fingerprinting, adopts a unified terminology subsuming model watermarking under fingerprinting, provides an overview and comparison of text watermarking techniques (including cases where they serve as model fingerprints), systematically categorizes and compares model fingerprinting approaches, presents techniques for fingerprint transfer and removal, summarizes evaluation metrics (effectiveness, harmlessness, robustness, stealthiness, reliability), and discusses open challenges and future directions.

Significance. If the literature selection proves representative and the proposed unification plus transfer/removal categories reflect genuine practical distinctions, the survey would offer a valuable standardized framework for a fast-moving area of LLM intellectual property protection. The explicit treatment of transfer and removal techniques, along with the metric summary, could help researchers avoid reinventing methods and focus on unresolved robustness and stealth issues.

major comments (2)

[Abstract (1) and unification section] Abstract point (1) and the corresponding unification section: subsuming model watermarking under the fingerprinting framework is presented as a clarifying contribution, but the manuscript does not provide a side-by-side comparison showing that prior literature already treats them as overlapping or that the new terminology resolves concrete ambiguities in cited works; without this, the unification risks appearing post-hoc rather than load-bearing for the systematic categorization that follows.
[Abstract (4) and transfer/removal sections] Abstract point (4) on fingerprint transfer and removal: these are claimed as presented 'for the first time,' yet the categorization into distinct transfer versus removal techniques lacks an explicit decision tree or decision criteria that would demonstrate the categories are non-overlapping in practice; a few missed 2024 black-box extraction or adversarial removal papers could collapse the claimed systematic coverage.

minor comments (2)

[Abstract and introduction] The abstract states the survey covers 'diverse text watermarking techniques' but does not list the exact inclusion criteria or search strings used; adding a short methods subsection would strengthen reproducibility claims for a survey.
[Metrics summary section] Evaluation metrics are summarized in point (5), but the manuscript should cross-reference specific cited papers to each metric (e.g., which works report stealthiness under what threat model) to avoid vague aggregation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. We address each major comment below and indicate the revisions we will make to improve clarity and rigor.

Referee: [Abstract (1) and unification section] Abstract point (1) and the corresponding unification section: subsuming model watermarking under the fingerprinting framework is presented as a clarifying contribution, but the manuscript does not provide a side-by-side comparison showing that prior literature already treats them as overlapping or that the new terminology resolves concrete ambiguities in cited works; without this, the unification risks appearing post-hoc rather than load-bearing for the systematic categorization that follows.

Authors: We appreciate this observation. The unification section grounds the subsumption in conceptual analysis and examples from the literature where model-level protections are described with overlapping goals of ownership verification. To strengthen the presentation and demonstrate that the framework addresses real terminological inconsistencies, we will add an explicit side-by-side comparison table in the revised unification section. The table will summarize terminology usage across representative prior works, highlight overlaps, and show how the unified fingerprinting framework provides consistent distinctions that support the subsequent systematic categorization.
Revision: yes
Referee: [Abstract (4) and transfer/removal sections] Abstract point (4) on fingerprint transfer and removal: these are claimed as presented 'for the first time,' yet the categorization into distinct transfer versus removal techniques lacks an explicit decision tree or decision criteria that would demonstrate the categories are non-overlapping in practice; a few missed 2024 black-box extraction or adversarial removal papers could collapse the claimed systematic coverage.

Authors: We agree that explicit decision criteria would better demonstrate the non-overlapping nature of the categories. In the revised manuscript we will add a dedicated subsection with decision criteria and a simple decision tree: techniques are classified as transfer when their primary objective is to propagate an existing fingerprint to new or adapted models, and as removal when the objective is to eliminate or evade detection of the fingerprint. This framework is derived from the functional goals and mechanisms described in the surveyed works. Regarding coverage, the survey reflects a systematic literature search that extends into 2025 (the reference list includes mid-2025 preprints); we will incorporate any additional relevant papers on black-box extraction or adversarial removal, including 2024 work, that fit the scope, to ensure the categorization remains robust.
Revision: yes
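The decision criteria proposed in this response can be written as a small classifier over a technique's primary objective. In this sketch the `Technique` fields and the example names are hypothetical; the one deliberate design choice is to raise an error when a technique claims both objectives, so that overlapping cases surface instead of being silently assigned, which is the failure mode the referee is worried about.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    TRANSFER = "fingerprint transfer"
    REMOVAL = "fingerprint removal"
    OUT_OF_SCOPE = "out of scope"


@dataclass(frozen=True)
class Technique:
    name: str
    propagates_existing_fingerprint: bool  # goal: carry an existing mark into new or adapted models
    eliminates_or_evades_fingerprint: bool  # goal: erase the mark or evade its detection


def classify(t: Technique) -> Category:
    """Sketch of the transfer/removal decision criteria; flags ambiguous cases explicitly."""
    if t.propagates_existing_fingerprint and t.eliminates_or_evades_fingerprint:
        raise ValueError(f"{t.name}: primary objective is ambiguous; criteria need refinement")
    if t.propagates_existing_fingerprint:
        return Category.TRANSFER
    if t.eliminates_or_evades_fingerprint:
        return Category.REMOVAL
    return Category.OUT_OF_SCOPE


examples = [
    Technique("hypothetical fingerprint-adapter method", True, False),
    Technique("hypothetical fine-tuning eraser", False, True),
]
for technique in examples:
    print(technique.name, "->", classify(technique).value)
```

Keying the tree on the primary objective, rather than on mechanism, is what keeps the categories disjoint: the same mechanism (for example, model merging) can appear in both categories depending on what it is used for.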

Circularity Check

0 steps flagged

No significant circularity in survey synthesis or unification

This paper is a literature review that synthesizes existing work on LLM copyright protection techniques. It clarifies relationships among text watermarking, model watermarking, and model fingerprinting, adopts a unified terminology by subsuming the former under fingerprinting, and categorizes approaches including new sections on transfer and removal techniques. No mathematical derivations, equations, fitted parameters, or predictions appear in the provided abstract or structure. Claims such as 'first comprehensive survey' and 'presenting for the first time' rest on literature coverage and organizational synthesis rather than reducing to self-definitional inputs, self-citation chains, or renamed results by construction. Any self-citations serve as references to prior independent work and are not load-bearing for the survey's core structure. The analysis chain is self-contained as a review without tautological reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The survey rests on the domain assumption that existing methods can be meaningfully grouped and compared under a fingerprinting umbrella; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: existing LLM copyright protection methods can be systematically categorized and compared under a unified fingerprinting framework that includes model watermarking.
This framing is used to structure the entire survey and connect text watermarking to model-level protection.

pith-pipeline@v0.9.0 · 5839 in / 1121 out tokens · 36970 ms · 2026-05-18T22:42:15.299564+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean (and Cost/FunctionalEquation.lean): reality_from_one_distinction; washburn_uniqueness_aczel

Tag: unclear. Relation between the paper passage and the cited Recognition theorem.

This work presents a comprehensive survey ... with a focus on model fingerprinting, covering ... clarifying the conceptual connection from text watermarking to model watermarking and fingerprinting, and adopting a unified terminology ... techniques for fingerprint transfer and fingerprint removal ... evaluation metrics ... effectiveness, harmlessness, robustness, stealthiness, and reliability

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience
cs.CR 2026-04 unverdicted novelty 7.0

RLSpoofer trains a 4B model on 100 watermarked paraphrase pairs to spoof PF watermarks at 62% success rate, far exceeding baselines trained on up to 10,000 samples.
LLM DNA: Tracing Model Evolution via Functional Representations
cs.LG 2025-09 unverdicted novelty 7.0

LLM DNA is introduced as a low-dimensional bi-Lipschitz functional representation proven to satisfy inheritance and genetic determinism, with a training-free extraction pipeline tested on 305 models to reveal relation...

Reference graph

Works this paper leans on

188 extracted references · 188 canonical work pages · cited by 2 Pith papers · 16 internal anchors

[1] Sahar Abdelnabi and Mario Fritz. 2021. Adversarial watermarking transformer: Towards tracing text provenance with data hiding. In 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 121–140
[2] Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, and Joseph Keshet. 2018. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In 27th USENIX Security Symposium (USENIX Security 18). 1615–1631
[3] Saeif Alhazbi, Ahmed Hussain, Gabriele Oligeri, and Panos Papadimitratos. 2025. LLMs have rhythm: Fingerprinting large language models using inter-token times and network traffic analysis. IEEE Open Journal of the Communications Society (2025)
[4] Mohammed Hazim Alkawaz, Ghazali Sulong, Tanzila Saba, Abdulaziz S Almazyad, and Amjad Rehman. 2016. Concise analysis of current text automation and watermarking approaches. Security and Communication Networks 9, 18 (2016), 6365–6378
[5] Walid Mohamed Aly, Taysir Hassan A. Soliman, and Amr Mohamed AbdelAziz. 2025. An Evaluation of Large Language Models on Text Summarization Tasks Using Prompt Engineering Techniques. arXiv:2507.05123 [cs.CL] https://arxiv.org/abs/2507.05123
[6] Anthropic. 2025. Claude. https://claude.ai/ Accessed: 2025
[7] Apache Software Foundation. 2025. Apache License, Version 2.0. https://www.apache.org/licenses/LICENSE-2.0 Accessed: 2025
[8] Ansh Arora, Xuanli He, Maximilian Mozes, Srinibas Swain, Mark Dras, and Qiongkai Xu. 2024. Here’s a Free Lunch: Sanitizing Backdoored Models with Model Merge. arXiv preprint arXiv:2402.19334 (2024)
[9] Mikhail J Atallah, Victor Raskin, Michael Crogan, Christian Hempelmann, Florian Kerschbaum, Dina Mohamed, and Sanket Naik. 2001. Natural language watermarking: Design, analysis, and a proof-of-concept implementation. In Information Hiding: 4th International Workshop, IH 2001 Pittsburgh, PA, USA, April 25–27, 2001 Proceedings 4. Springer, 185–200
[10] Xiaofan Bai, Chaoxiang He, Xiaojing Ma, Bin Benjamin Zhu, and Hai Jin. 2024. Intersecting-boundary-sensitive fingerprinting for tampering detection of DNN models. In Proceedings of the 41st International Conference on Machine Learning (Vienna, Austria) (ICML’24). JMLR.org, Article 2236, 12 pages
[11] Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, and Pascale Fung. 2023. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity. arXiv:2302.04023 [cs.CL] https://arxiv.org/abs/2302.04023
[12] Devansh Bhardwaj and Naman Mishra. 2025. Invisible Traces: Using Hybrid Fingerprinting to identify underlying LLMs in GenAI Apps. arXiv:2501.18712 [cs.LG] https://arxiv.org/abs/2501.18712
[13] Rishabh Bhardwaj, Do Duc Anh, and Soujanya Poria. 2024. Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic. arXiv preprint arXiv:2402.11746 (2024)
[14] Jean Petit Bikim, Carick Appolinaire Atezong Ymele, Azanzi Jiomekong, Allard Oelen, Gollam Rabby, Jennifer D’Souza, and Sören Auer. 2024. Leveraging GPT Models For Semantic Table Annotation. In SemTab@ISWC (CEUR Workshop Proceedings, Vol. 3889). 43–53
[15] Yehonatan Bitton, Elad Bitton, and Shai Nisan. 2025. Detecting Stylistic Fingerprints of Large Language Models. arXiv preprint arXiv:2503.01659 (2025)
[16] Adam Block, Ayush Sekhari, and Alexander Rakhlin. 2025. Robust and Efficient Watermarking of Large Language Models Using Error Correction Codes. Proceedings on Privacy Enhancing Technologies (PoPETs) 2025 (2025)
[17] Franziska Boenisch. 2021. A systematic review on model watermarking for neural networks. Frontiers in Big Data 4 (2021), 729663
[18] Jack T Brassil, Steven Low, Nicholas F. Maxemchuk, and Lawrence O’Gorman. 1995. Electronic marking and identification techniques to discourage document copying. IEEE Journal on Selected Areas in Communications 13, 8 (1995), 1495–1504
[19] Jiacheng Cai, Jiahao Yu, Yangguang Shao, Yuhang Wu, and Xinyu Xing. 2024. UTF: Undertrained Tokens as Fingerprints A Novel Approach to LLM Identification. arXiv preprint arXiv:2410.12318 (2024)
[20] Xiaoyu Cao, Jinyuan Jia, and Neil Zhenqiang Gong. 2019. IPGuard: Protecting Intellectual Property of Deep Neural Networks via Fingerprinting the Classification Boundary. arXiv preprint arXiv:1910.12903 [cs.CR] (Oct. 2019). doi:10.48550/arXiv.1910.12903
[21] Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, et al. 2021. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21). 2633–2650
[22] Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, et al. 2018. Universal sentence encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 169–174
[23] Jialuo Chen, Jingyi Wang, Tinglan Peng, Youcheng Sun, Peng Cheng, Shouling Ji, Xingjun Ma, Bo Li, and Dawn Song. 2022. Copy, right? A testing framework for copyright protection of deep learning models. In 2022 IEEE Symposium on Security and Privacy (SP). IEEE, 824–841
[25] Qiguang Chen, Mingda Yang, Libo Qin, Jinhao Liu, Zheng Yan, Jiannan Guan, Dengyun Peng, Yiyan Ji, Hanjing Li, Mengkang Hu, Yimeng Zhang, Yihao Liang, Yuhang Zhou, Jiaqi Wang, Zhi Chen, and Wanxiang Che. 2025. AI4Research: A Survey of Artificial Intelligence for Scientific Research. arXiv:2507.01903 [cs.CL] https://arxiv.org/abs/2507.01903
[26] Miranda Christ, Sam Gunn, and Or Zamir. 2024. Undetectable watermarks for language models. In The Thirty Seventh Annual Conference on Learning Theory. PMLR, 1125–1139
[27] Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. 2019. BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions. In Proceedings of NAACL-HLT. 2924–2936
[28] Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. 2018. Think You Have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. arXiv preprint arXiv:1803.05457 (2018)
[30] Tianshuo Cong, Delong Ran, Zesen Liu, Xinlei He, Jinyuan Liu, Yichen Gong, Qi Li, Anyu Wang, and Xiaoyun Wang. 2024. Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging. arXiv preprint arXiv:2404.05188 (2024)
[32] Creative Commons. 2025. Licenses. https://creativecommons.org/share-your-work/cclicenses/ Accessed: 2025
[33] Agnibh Dasgupta, Abdullah Tanvir, and Xin Zhong. 2024. Watermarking language models through language models. arXiv preprint arXiv:2411.05091 (2024)
[34] Ioannis Dasoulas, Duo Yang, Xuemin Duan, and Anastasia Dimou. 2023. TorchicTab: Semantic Table Annotation with Wikidata and Language Models. In SemTab@ISWC (CEUR Workshop Proceedings, Vol. 3557). 21–37
[35] Sumanth Dathathri, Stephan Zheng, Tianwei Yin, Richard M. Murray, and Yisong Yue. 2019. Detecting Adversarial Examples via Neural Fingerprinting. arXiv:1803.03870 [cs.LG] https://arxiv.org/abs/1803.03870
[36] Marie-Catherine De Marneffe, Mandy Simons, and Judith Tonhauser. 2019. The CommitmentBank: Investigating projection in naturally occurring discourse. In Proceedings of Sinn und Bedeutung, Vol. 23. 107–124
[37] DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai D...
[38] DeepSeek-V3 Technical Report. DeepSeek-AI, Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Haowei Zhang, Honghui Ding, Huaj...
[39] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs.CL] https://arxiv.org/abs/1810.04805
[40] Haonan Duan, Stephen Zhewen Lu, Caitlin Fiona Harrigan, Nishkrit Desai, Jiarui Lu, Michał Koziarski, Leonardo Cotta, and Chris J. Maddison. 2025. Measuring Scientific Capabilities of Language Models with a Systems Biology Dry Lab. arXiv:2507.02083 [cs.AI] https://arxiv.org/abs/2507.02083
[41]

Pierre Fernandez, Antoine Chaffin, Karim Tit, Vivien Chappelier, and Teddy Furon. 2023. Three bricks to consolidate watermarks for large language models. In 2023 IEEE international workshop on information forensics and security (WIFS). IEEE, 1–6

work page 2023
[42]

Pierre Fernandez, Guillaume Couairon, Teddy Furon, and Matthijs Douze. 2023. Functional Invariants to Watermark Large Transformers. In ICASSP 2023

work page 2023
[43]

Yu Fu, Deyi Xiong, and Yue Dong. 2024. Watermarking conditional text generation for ai detection: Unveiling challenges and a semantic-aware watermark remedy. In Proceedings of the AAAI Conference on Artificial Intelligence , Vol. 38. 18003–18011

work page 2024
[44]

ZhenZhe Gao, Zhenjun Tang, Zhaoxia Yin, Baoyuan Wu, and Yue Lu. 2024. Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing. In 2024 IEEE International Conference on Multimedia and Expo (ICME) . 1–6. doi:10.1109/ICME57554.2024.10688355

work page doi:10.1109/icme57554.2024.10688355 2024
[45]

Danilo Giampiccolo, Bernardo Magnini, Ido Dagan, and William B Dolan. 2007. The third pascal recognizing textual entailment challenge. In Proceedings of the ACL-PASCAL workshop on textual entailment and paraphrasing . 1–9

work page 2007
[46]

Thibaud Gloaguen, Robin Staab, Nikola Jovanović, and Martin Vechev. 2025. Robust LLM Fingerprinting via Domain- Specific Watermarks. arXiv preprint arXiv:2505.16723 (2025). , Vol. 1, No. 1, Article . Publication date: September 2025. 34 Zhenhua Xu, Xubin Yue, Zhebo Wang, Qichen Liu, et al

work page arXiv 2025
[47]

Charles Goddard, Shamane Siriwardhana, Malikeh Ehghaghi, Luke Meyers, Vladimir Karpukhin, Brian Benedict, Mark McQuade, and Jacob Solawetz. 2024. Arcee’s MergeKit: A Toolkit for Merging Large Language Models. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track , Franck Dernoncourt, Daniel Preoţiuc-Pietr...

work page doi:10.18653/v1/2024.emnlp-industry.36 2024
[48]

Ian Goodfellow, Yoshua Bengio, Aaron Courville, and Yoshua Bengio. 2016. Deep learning . Vol. 1. MIT press Cambridge

work page 2016
[49]

Chenxi Gu, Chengsong Huang, Xiaoqing Zheng, Kai-Wei Chang, and Cho-Jui Hsieh. 2022. Watermarking Pre- trained Language Models with Backdooring. ArXiv abs/2210.07543 (2022). https://api.semanticscholar.org/CorpusID: 252907247

work page arXiv 2022
[50]

Chenchen Gu, Xiang Lisa Li, Percy Liang, and Tatsunori Hashimoto. 2024. On the Learnability of Watermarks for Language Models. In The Twelfth International Conference on Learning Representations. https://openreview.net/forum? id=9k0krNzvlV

work page 2024
[51]

Jindong Gu, Xiaojun Jia, Pau de Jorge, Wenqain Yu, Xinwei Liu, Avery Ma, Yuan Xun, Anjun Hu, Ashkan Khakzar, Zhijiang Li, et al. 2023. A survey on transferability of adversarial examples across deep neural networks. arXiv preprint arXiv:2310.17626 (2023)

[52]

Martin Gubri, Dennis Thomas Ulmer, Hwaran Lee, Sangdoo Yun, and Seong Joon Oh. 2024. TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification. In Findings of the Association for Computational Linguistics ACL 2024. Association for Computational Linguistics, 11496–11517

[53]

Jia Guo and Miodrag Potkonjak. 2018. Watermarking deep neural networks for embedded systems. In Proceedings of the International Conference on Computer-Aided Design (San Diego, California) (ICCAD ’18). Association for Computing Machinery, New York, NY, USA, Article 133, 8 pages. doi:10.1145/3240765.3240862

[54]

Qingxiao Guo, Xinjie Zhu, Yilong Ma, Hui Jin, Yunhao Wang, Weifeng Zhang, and Xiaobing Guo. 2025. Invariant-Based Robust Weights Watermark for Large Language Models. arXiv preprint arXiv:2507.08288 (2025)

[55]

Zhiwei He, Binglin Zhou, Hongkun Hao, Aiwei Liu, Xing Wang, Zhaopeng Tu, Zhuosheng Zhang, and Rui Wang. 2024. Can watermarks survive translation? On the cross-lingual consistency of text watermark for large language models. arXiv preprint arXiv:2402.14007 (2024)

[57]

Jakub Hościłowicz, Pawel Popiolek, Jan Rudkowski, Jędrzej Bieniasz, and Artur Janicki. 2024. Unconditional Token Forcing: Extracting Text Hidden Within LLM. In 2024 19th Conference on Computer Science and Intelligence Systems (FedCSIS). IEEE, 621–624

[58]

Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, and Yulia Tsvetkov. 2023. Semstamp: A semantic watermark with paraphrastic robustness for text generation. arXiv preprint arXiv:2310.03991 (2023)

[59]

Abe Bohan Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, and Tianxing He. 2024. k-SemStamp: A clustering-based semantic watermark for detection of machine-generated text. arXiv preprint arXiv:2402.11399 (2024)

[60]

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. LoRA: Low-Rank Adaptation of Large Language Models. arXiv preprint arXiv:2106.09685 (2021)

[62]

Zhengmian Hu, Lichang Chen, Xidong Wu, Yihan Wu, Hongyang Zhang, and Heng Huang. 2023. Unbiased watermark for large language models. arXiv preprint arXiv:2310.10669 (2023)

[63]

Viet-Phi Huynh, Yoan Chabot, Thomas Labbé, Jixiong Liu, and Raphaël Troncy. 2022. From Heuristics to Language Models: A Journey Through the Universe of Semantic Table Interpretation with DAGOBAH. In SemTab@ISWC (CEUR Workshop Proceedings, Vol. 3320). 45–58

[64]

JaeYoung Hwang and SangHoon Oh. 2023. A brief survey of watermarks in generative AI. In 2023 14th International Conference on Information and Communication Technology Convergence (ICTC). IEEE, 1157–1160

[65]

Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. 2022. Editing models with task arithmetic. arXiv preprint arXiv:2212.04089 (2022)

[66]

Eric Jang, Shixiang Gu, and Ben Poole. 2017. Categorical Reparameterization with Gumbel-Softmax. In International Conference on Learning Representations. https://openreview.net/forum?id=rkE3y85ee

[67]

Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Xing Wang, Shuming Shi, and Zhaopeng Tu. 2023. Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine. arXiv:2301.08745 [cs.CL] https://arxiv.org/abs/2301.08745

[68]

Heng Jin, Chaoyu Zhang, Shanghao Shi, Wenjing Lou, and Y Thomas Hou. 2024. Proflingo: A fingerprinting-based intellectual property protection scheme for large language models. In 2024 IEEE Conference on Communications and Network Security (CNS). IEEE, 1–9

[69]

E. Jones et al. 2023. Automatically Auditing Large Language Models via Discrete Optimization. In International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA (Proceedings of Machine Learning Research, Vol. 202), A. Krause et al. (Eds.). PMLR, 15307–15329. https://proceedings.mlr.press/v202/jones23a.html

[70]

Nurul Shamimi Kamaruddin, Amirrudin Kamsin, Lip Yee Por, and Hameedur Rahman. 2018. A review of text watermarking: theory, methods, and applications. IEEE Access 6 (2018), 8011–8028

[71]

Hamid Karimi and Jiliang Tang. 2020. Decision Boundary of Deep Neural Networks: Challenges and Opportunities. In Proceedings of the 13th International Conference on Web Search and Data Mining (Houston, TX, USA) (WSDM ’20). Association for Computing Machinery, New York, NY, USA, 919–920. doi:10.1145/3336191.3372186

[72]

Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. 2020. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 8110–8119

[73]

Daniel Khashabi, Snigdha Chaturvedi, Michael Roth, Shyam Upadhyay, and Dan Roth. 2018. Looking beyond the surface: A challenge set for reading comprehension over multiple sentences. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 252–262

[74]

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. 2023. A watermark for large language models. In International Conference on Machine Learning. PMLR, 17061–17084

[75]

Panagiotis Koletsis, Christos Panagiotopoulos, Georgios Th. Papadopoulos, and Vasilis Efthymiou. 2025. Relationship Detection on Tabular Data Using Statistical Analysis and Large Language Models. arXiv:2506.06371 [cs.CL] https://arxiv.org/abs/2506.06371

[76]

Dezhang Kong, Shi Lin, Zhenhua Xu, Zhebo Wang, Minghao Li, Yufeng Li, Yilun Zhang, Hujin Peng, Zeyang Sha, Yuyuan Li, Changting Lin, Xun Wang, Xuan Liu, Ningyu Zhang, Chaochao Chen, Muhammad Khurram Khan, and Meng Han. 2025. A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures. arXiv:2506.19676 [cs.CR] http...

[77]

Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. 2019. Similarity of neural network representations revisited. In International conference on machine learning. PMLR, 3519–3529

[78]

Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. 2023. Robust distortion-free watermarks for language models. arXiv preprint arXiv:2307.15593 (2023)

[79]

Minoru Kuribayashi, Tatsuya Yasui, Asad Malik, and Nobuo Funabiki. 2023. Immunization of Pruning Attack in DNN Watermarking Using Constant Weight Code. In Proceedings of ICASSP 2023 (arXiv preprint arXiv:2107.02961). 1–5

[80]

Harsh Nishant Lalai, Aashish Anantha Ramakrishnan, Raj Sanjay Shah, and Dongwon Lee. 2024. From intentions to techniques: A comprehensive taxonomy and challenges in text watermarking for large language models. arXiv preprint arXiv:2406.11106 (2024)

