TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-13 03:34 UTC · model grok-4.3
The pith
TextSeal adds a detectable watermark to LLM outputs that stays visible even after mixing with human text or distillation into new models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TextSeal builds dual-key generation into Gumbel-max sampling to restore diversity while adding entropy-weighted scoring and multi-region localization for detection. This produces a theoretically distortion-free watermark with no inference overhead that supports serving optimizations. It outperforms baselines like SynthID-text in detection strength, remains effective under heavy dilution with human text, preserves reasoning performance, and shows no quality difference in multilingual human evaluations. The watermark signal also transfers through model distillation, allowing detection of unauthorized downstream use.
What carries the argument
Dual-key generation on Gumbel-max sampling combined with entropy-weighted scoring and multi-region localization, which enables localized detection without changing the output distribution.
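The generation side can be sketched in a few lines. This is a minimal illustration of keyed Gumbel-max sampling with a hypothetical dual-key selection step; the PRF construction, the key-selection rule, and all function names are our assumptions for illustration, not the paper's method.

```python
import hashlib
import math
import random

def prf_gumbel(key: int, context: tuple, token_id: int) -> float:
    """Pseudorandom Gumbel noise derived from a secret key and the local context."""
    digest = hashlib.sha256(f"{key}|{context}|{token_id}".encode()).digest()
    u = (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)  # uniform in (0, 1)
    return -math.log(-math.log(u))  # standard Gumbel sample

def watermarked_sample(logits: list, context: tuple, keys: tuple) -> int:
    """Gumbel-max watermark sampling: argmax of logit + keyed Gumbel noise.

    With a single fixed key, the same prompt and context always yield the
    same token; drawing one of several secret keys per generation (our
    reading of 'dual-key generation') restores output diversity while the
    detector can still test the text against each key.
    """
    key = random.choice(keys)  # hypothetical key-selection rule
    scores = [logit + prf_gumbel(key, context, t) for t, logit in enumerate(logits)]
    return max(range(len(logits)), key=scores.__getitem__)
```

The detector never needs the logits: it recomputes the keyed noise from the observed tokens and checks whether the noise values are suspiciously large, which is what makes detection possible without changing generation.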
If this is right
- Detection stays confident even when AI text is heavily mixed with human writing.
- No added cost at inference time and compatibility with speculative decoding and multi-token prediction.
- Preservation of downstream task performance on reasoning benchmarks.
- Signal transfer enables detection after unauthorized distillation of the model.
- No perceptible quality difference across 6000 A/B comparisons in five languages.
Where Pith is reading between the lines
- The approach could support provenance tracking in deployed LLM services by tagging training data origins.
- If the signal survives multiple rounds of distillation, it might allow tracing of model lineages over time.
- It could be adapted to other generative settings where output mixing and reuse are concerns.
- Production use would require deciding how to handle watermark presence in open model releases.
Load-bearing premise
That the watermark signal transfers through model distillation with enough strength for reliable detection, and that the absence of quality impact holds under all practical serving and adversarial conditions.
What would settle it
A test showing that text from models distilled on TextSeal-watermarked outputs yields detection scores no higher than random chance, or a human preference study where watermarked text is rated lower than non-watermarked text.
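The chance-level criterion can be made precise with a rank-based AUC: if detection scores on distilled-model text are statistically indistinguishable from scores on human text, the Mann-Whitney AUC sits near 0.5. A minimal sketch, with the scoring function itself left abstract:

```python
def detection_auc(distilled_scores, human_scores):
    """Mann-Whitney AUC: the probability that a randomly chosen
    distilled-model document outscores a randomly chosen human document.
    AUC near 0.5 means detection is no better than chance; AUC near 1.0
    means the watermark survived distillation."""
    pairs = len(distilled_scores) * len(human_scores)
    wins = sum(
        1.0 if d > h else 0.5 if d == h else 0.0
        for d in distilled_scores
        for h in human_scores
    )
    return wins / pairs
```

Perfectly separated score distributions give 1.0; identical distributions give 0.5, which is the outcome that would falsify the radioactivity claim.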
Original abstract
We introduce TextSeal, a state-of-the-art watermark for large language models. Building on Gumbel-max sampling, TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and multi-region localization for improved detection. It supports serving optimizations such as speculative decoding and multi-token prediction, and does not add any inference overhead. TextSeal strictly dominates baselines like SynthID-text in detection strength and is robust to dilution, maintaining confident localized detection even in heavily mixed human/AI documents. The scheme is theoretically distortion-free, and evaluation across reasoning benchmarks confirms that it preserves downstream performance, while a multilingual human evaluation (6000 A/B comparisons, 5 languages) shows no perceptible quality difference. Beyond its use for provenance detection, TextSeal is also "radioactive": its watermark signal transfers through model distillation, enabling detection of unauthorized use.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces TextSeal, a localized LLM watermark based on Gumbel-max sampling augmented with dual-key generation to restore diversity, entropy-weighted scoring for detection, and multi-region localization. It claims strict dominance over baselines such as SynthID-text in detection strength, robustness to dilution in mixed human/AI text, theoretical distortion-freeness with no inference overhead or quality impact (supported by reasoning benchmarks and a 6000-comparison multilingual human study across 5 languages), compatibility with speculative decoding and multi-token prediction, and a 'radioactive' property whereby the watermark signal transfers through model distillation to enable detection of unauthorized use.
Significance. If substantiated, the work would advance LLM provenance and IP protection by adding practical localization and the novel distillation-transfer capability while preserving serving efficiency and output quality. The scale of the human evaluation (6000 A/B tests) is a concrete strength that lends credibility to the no-quality-impact claim.
major comments (3)
- [Abstract and §4] Abstract and §4 (Evaluation): The claim that TextSeal 'strictly dominates' SynthID-text and maintains 'confident localized detection' under heavy dilution is asserted without any reported metrics (AUC, FPR, precision-recall curves, or statistical significance tests) or tables; this is load-bearing for the dominance and robustness assertions.
- [§3] §3 (Method) and theoretical analysis: The 'theoretically distortion-free' property is stated without a derivation, proof sketch, or explicit argument showing that dual-key Gumbel-max plus entropy weighting leaves the output distribution unchanged; the central quality-preservation claim rests on this unshown step.
- [§5] §5 (Distillation experiments): The 'radioactive' claim that the watermark signal transfers through distillation is central to the provenance-protection use case, yet no quantitative results (post-distillation detection AUC, number of epochs/temperature/mixing ratios, or comparison to non-watermarked controls) are supplied; distillation is known to dilute sampling biases, making this the weakest link.
minor comments (3)
- [§3] Clarify the exact definition and generation procedure for the dual keys in the methods section to avoid ambiguity with prior Gumbel-max watermarking.
- [§4] Add error bars or confidence intervals to all reported benchmark and human-study results.
- [Figures] Ensure figures showing localization and dilution robustness include clear legends and axis labels.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate the requested quantitative metrics, theoretical derivations, and experimental results, thereby strengthening the support for our claims.
Point-by-point responses
-
Referee: [Abstract and §4] Abstract and §4 (Evaluation): The claim that TextSeal 'strictly dominates' SynthID-text and maintains 'confident localized detection' under heavy dilution is asserted without any reported metrics (AUC, FPR, precision-recall curves, or statistical significance tests) or tables; this is load-bearing for the dominance and robustness assertions.
Authors: We acknowledge that the manuscript currently supports the dominance and robustness claims primarily through reasoning benchmark results and the large-scale human study rather than explicit detection performance metrics. In the revised version, we will expand §4 with new tables and figures that report AUC, FPR at fixed thresholds, precision-recall curves, and statistical significance tests (including p-values from paired comparisons) for TextSeal versus SynthID-text across dilution ratios (10–90% AI-generated content). These additions will provide the quantitative backing for the strict dominance and confident localized detection assertions. revision: yes
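The promised dilution tables can be anticipated from the Gaussian approximation quoted elsewhere on this page (Z_global = ρδ√n, ln p ≈ −Z²/2). The sketch below is only that approximation in code; the per-token signal-to-noise value δ is illustrative, not a number from the paper:

```python
import math

def expected_z(rho: float, delta: float, n: int) -> float:
    """Expected global detection Z-score for an n-token document in which a
    fraction rho of tokens is watermarked, with per-token signal-to-noise
    ratio delta (Gaussian approximation Z_global = rho * delta * sqrt(n))."""
    return rho * delta * math.sqrt(n)

def log_p_value(z: float) -> float:
    """Gaussian tail approximation: ln p ≈ -z^2 / 2."""
    return -0.5 * z * z

# Dilution sweep: even heavily diluted documents accumulate signal with length.
for rho in (0.1, 0.5, 0.9):
    z = expected_z(rho, delta=0.8, n=2000)  # delta = 0.8 is an assumed value
    print(f"rho={rho:.1f}  Z={z:.1f}  ln p ≈ {log_p_value(z):.1f}")
```

Note the ρ² penalty hidden in ln p ≈ −ρ²δ²n/2: this is exactly why localized multi-region detection, which tests candidate regions rather than the whole document, should help at low dilution ratios.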
-
Referee: [§3] §3 (Method) and theoretical analysis: The 'theoretically distortion-free' property is stated without a derivation, proof sketch, or explicit argument showing that dual-key Gumbel-max plus entropy weighting leaves the output distribution unchanged; the central quality-preservation claim rests on this unshown step.
Authors: We agree that a formal argument is required to substantiate the distortion-free claim. We will add a proof sketch to §3 showing that the dual-key Gumbel-max procedure, when paired with entropy-weighted scoring at detection time only, leaves the token sampling distribution identical to the original model. The keys are chosen independently of token logits, and the entropy weighting affects only the detector, not generation, ensuring equivalence in distribution and zero inference overhead. revision: yes
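The claimed equivalence rests on the standard Gumbel-max identity: when the Gumbel noise is (pseudo)random and independent of the logits, argmax(logits + Gumbel) is distributed exactly as softmax(logits). A quick empirical check of that identity (of the classical fact, not of TextSeal's specific PRF):

```python
import math
import random
from collections import Counter

def gumbel_argmax(logits, rng):
    """One Gumbel-max draw: argmax over logit + standard Gumbel noise."""
    noisy = [l - math.log(-math.log(rng.random())) for l in logits]
    return max(range(len(logits)), key=noisy.__getitem__)

rng = random.Random(0)
logits = [0.0, 1.0, 2.0]
n = 200_000
counts = Counter(gumbel_argmax(logits, rng) for _ in range(n))

# Empirical token frequencies should match softmax(logits) within noise.
z = sum(math.exp(l) for l in logits)
for t, l in enumerate(logits):
    assert abs(counts[t] / n - math.exp(l) / z) < 0.01
```

The distortion-free argument then reduces to showing that the keyed noise is marginally Gumbel and independent of the logits; entropy weighting, applied only at detection time, cannot affect the sampling distribution at all.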
-
Referee: [§5] §5 (Distillation experiments): The 'radioactive' claim that the watermark signal transfers through distillation is central to the provenance-protection use case, yet no quantitative results (post-distillation detection AUC, number of epochs/temperature/mixing ratios, or comparison to non-watermarked controls) are supplied; distillation is known to dilute sampling biases, making this the weakest link.
Authors: We recognize that the distillation-transfer property requires stronger empirical grounding. In the revised §5 we will include quantitative results from our distillation experiments, reporting post-distillation detection AUC, the exact distillation hyperparameters (epochs, temperature, mixing ratios with human text), and side-by-side comparisons against non-watermarked control models. We will also discuss the degree of signal retention in light of known dilution effects during distillation. revision: yes
Circularity Check
No circularity detected; claims rely on empirical evaluation without self-referential derivations
Full rationale
The provided abstract and text describe TextSeal's construction via Gumbel-max sampling with added dual-key, entropy-weighted, and localization features, plus claims of theoretical distortion-freeness and distillation radioactivity. No equations, derivations, or load-bearing steps are shown that reduce by construction to fitted inputs, self-definitions, or self-citations. Central assertions (dominance over baselines, no quality impact, signal transfer) are presented as outcomes of benchmarks and human evaluations rather than tautological renamings or imported uniqueness theorems. The derivation chain is therefore self-contained against external benchmarks.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear · "TextSeal builds upon the Gumbel-max framework and introduces three core improvements: Dual-Key Generation... Entropy-Weighted Detection... Localized Detection"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear · "the scheme is theoretically distortion-free... its watermark signal transfers through model distillation"
Reference graph
Works this paper leans on
- [1] Regulation (EU) 2024/1689 of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (AI Act), 2024.
- [2] Second draft published March 2026; enforcement of Article 50 obligations begins August 2, 2026.
- [3] Pierre Fernandez, Antoine Chaffin, Karim Tit, Vivien Chappelier, and Teddy Furon. Three bricks to consolidate watermarks for large language models. 2023 IEEE International Workshop on Information Forensics and Security (WIFS), 2023.
- [4] Pierre Fernandez, Tom Sander, Hady Elsahar, Hongyan Chang, Tomáš Souček, Valeriu Lacatusu, Tuan Tran, Sylvestre-Alvise Rebuffi, and Alexandre Mourachko. How good is post-hoc watermarking with language model rephrasing? arXiv preprint arXiv:2512.16904.
- [5] Eva Giboulot and Teddy Furon. WaterMax: breaking the LLM watermark detectability-robustness-quality trade-off. arXiv preprint arXiv:2403.04808.
- [6] Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, and Gabriel Synnaeve. Better & faster large language models via multi-token prediction. arXiv preprint arXiv:2404.19737.
- [7] Chenchen Gu, Xiang Lisa Li, Percy Liang, and Tatsunori Hashimoto. On the learnability of watermarks for language models. arXiv preprint arXiv:2312.04469.
- [8] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948.
- [9] Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.
- [10] Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, and Yulia Tsvetkov. SemStamp: A semantic watermark with paraphrastic robustness for text generation. arXiv preprint arXiv:2310.03991.
- [11] Abe Bohan Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, and Tianxing He. k-SemStamp: A clustering-based semantic watermark for detection of machine-generated text. arXiv preprint arXiv:2402.11399.
- [12] John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. arXiv preprint arXiv:2301.10226, 2023a. John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, and Tom Goldstein. On the reliability of watermarks for...
- [13] Gregory Kang Ruey Lau, Xinyuan Niu, Hieu Dao, Jiangwei Chen, Chuan-Sheng Foo, and Bryan Kian Hsiang Low. Waterfall: Framework for robust and scalable text watermarking. In ICML 2024 Workshop on Foundation Models in the Wild, 2024.
- [14] Taehyun Lee, Seokhee Hong, Jaewoo Ahn, Ilgee Hong, Hwaran Lee, Sangdoo Yun, Jamin Shin, and Gunhee Kim. Who wrote this code? Watermarking for code generation. arXiv preprint arXiv:2305.15060.
- [15] Aiwei Liu, Leyi Pan, Xuming Hu, Shiao Meng, and Lijie Wen. A semantic invariant robust watermark for large language models. arXiv preprint arXiv:2310.06356.
- [16] Yepeng Liu and Yuheng Bu. Adaptive text watermark for large language models. arXiv preprint arXiv:2401.13927.
- [17] Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, and Tatsunori B Hashimoto. s1: Simple test-time scaling. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 20286–20332, 2025.
- [18] Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, Shuliang Liu, Xuming Hu, Lijie Wen, et al. MarkLLM: An open-source toolkit for LLM watermarking. arXiv preprint arXiv:2405.10051.
- [19] Julien Piet, Chawin Sitawarin, Vivian Fang, Norman Mu, and David Wagner. Mark my words: Analyzing and evaluating language model watermarks. arXiv preprint arXiv:2312.00273.
- [20] Wenjie Qu, Dong Yin, Zixin He, Wei Zou, Tianyang Tao, Jinyuan Jia, and Jiaheng Zhang. Provably robust multi-bit watermarking for AI-generated text via error correction code. arXiv preprint arXiv:2401.16820.
- [21] Tom Sander, Pierre Fernandez, Saeed Mahloujifar, Alain Durmus, and Chuan Guo. Detecting benchmark contamination through watermarking. arXiv preprint arXiv:2502.17259.
- [22] Qwen Team. Qwen2.5 technical report. arXiv preprint arXiv:2409.12117.
- [23] Mercan Topkara, Giuseppe Riccardi, Dilek Hakkani-Tür, and Mikhail J Atallah. Natural language watermarking: Challenges in building a practical system. In Security, Steganography, and Watermarking of Multimedia Contents VIII, pages 106–117. SPIE, 2006a. Mercan Topkara, Umut Topkara, and Mikhail J Atallah. Words are not enough: sentence level natural langu...
- [24] Ashish Venugopal, Jakob Uszkoreit, David Talbot, Franz Josef Och, and Juri Ganitkevitch. Watermarking the outputs of structured prediction with an application in statistical machine translation. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1363–1372, 2011.
- [25] Zongqi Wang, Tianle Gu, Baoyuan Wu, and Yujiu Yang. MorphMark: Flexible adaptive watermarking for large language models. arXiv preprint arXiv:2505.11541.
- [26] Yihan Wu, Zhengmian Hu, Hongyang Zhang, and Heng Huang. DiPmark: A stealthy, efficient and resilient watermark for large language models. arXiv preprint arXiv:2310.07710.
- [27] Xiaojun Xu, Jinghan Jia, Yuanshun Yao, Yang Liu, and Hang Li. Robust multi-bit text watermark with LLM-based paraphrasers. arXiv preprint arXiv:2412.03123.
- [28] KiYoon Yoo, Wonhyuk Ahn, Jiho Jang, and Nojun Kwak. Robust multi-bit natural language watermarking through invariant features. arXiv preprint arXiv:2305.01904.
- [29] KiYoon Yoo, Wonhyuk Ahn, and Nojun Kwak. Advancing beyond identification: Multi-bit watermark for large language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4031–4055, 2024.
- [30] Jingqi Zhang, Ruibo Chen, Yingqing Yang, Peihua Mai, Heng Huang, and Yan Pang. Leave no trace: Black-box detection of copyrighted dataset usage in large language models via watermarking. arXiv preprint arXiv:2510.02962.
- [31] Xuandong Zhao, Lei Li, and Yu-Xiang Wang. Permute-and-flip: An optimally robust and watermarkable decoder for LLMs. arXiv preprint arXiv:2402.05864.
- [32] Appendix A, More Technical Details on the Methods (A.1 Hash Function Implementation): The PRF takes as input the candidate token x, a context window w = (w_1, ..., w_k) of k token IDs, and the secret key K (all of them integers), and outputs a random integer in [0, M). The hash is computed as h'(x, w, K) = (p_2·x + Σ_{i=1}^{k} w_i·q^i + p_3·K)·p_4 (Eq. 8); h(x, w, ...
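As quoted, the hash of Eq. 8 is directly implementable. The excerpt truncates before the final step, so reducing h' into [0, M) with a modulo is our assumption, as are all the constant values below (the paper presumably uses fixed large primes):

```python
def prf_hash(x: int, w: list, K: int, q: int, p2: int, p3: int, p4: int, M: int) -> int:
    """Integer PRF from the quoted Eq. 8:
        h'(x, w, K) = (p2*x + sum_{i=1..k} w_i * q^i + p3*K) * p4
    The reduction of h' into [0, M) via a modulo is an assumption, since the
    excerpt is cut off before defining h; the constants are illustrative."""
    h_prime = (p2 * x + sum(wi * q ** (i + 1) for i, wi in enumerate(w)) + p3 * K) * p4
    return h_prime % M  # assumed final reduction to [0, M)
```

For example, `prf_hash(1, [2, 3], 5, q=7, p2=11, p3=13, p4=17, M=1000)` evaluates h' = (11 + 2·7 + 3·49 + 13·5)·17 = 4029 and returns 29.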
- [33] Let δ = (μ_w − μ_0)/σ be the per-token signal-to-noise ratio. Using a Gaussian tail approximation, the log p-value of a Z-score is ln p ≈ −½ Z². We define Δ² = δ²/2 as the expected log p-value accumulation rate per watermarked token. Power of the global test: the global test evaluates all n tokens; the expected Z-score is Z_global = ρnσδ/(σ√n) = ρδ√n ⟹ E[ln p_global] ≈ −...
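The excerpt cuts off before the final expectation. Completing the algebra from the quoted pieces (ln p ≈ −Z²/2, Z_global = ρδ√n, Δ² = δ²/2) gives, as a reconstruction rather than a quote:

```latex
\mathbb{E}[\ln p_{\mathrm{global}}]
  \approx -\tfrac{1}{2} Z_{\mathrm{global}}^{2}
  = -\tfrac{1}{2}\,\rho^{2}\delta^{2} n
  = -\rho^{2}\Delta^{2} n .
```

The quadratic dilution penalty ρ² (versus the rate ρΔ²n a test restricted to the watermarked region would accrue) is presumably the motivation for the localized multi-region search.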
- [34] "Tie." For the final analysis, "Both Good," ... to assess whether watermarking systematically affects script consistency or refusal rates. For script consistency, we observe 52 discordant pairs where WM was wrong but Non-WM was correct, versus 39 where Non-WM was wrong but WM was correct; with continuity correction, this yields χ² = 1.58 and p = 0.21. For refusal rates, we find 21 pairs where WM refused ...
- [35] Secret keys are calibrated per method via a Kolmogorov–Smirnov test to ensure uniform PRF hashes on unwatermarked text, as done in Fernandez et al. (2025). The teacher generates 5,000 solutions using vLLM (Kwon et al., ...
- [36] (rank 128, scaling factor 128, dropout 0.05) with learning rate 2×10⁻⁵ and 3 epochs. The loss is computed over the full teacher response (both the reasoning trace and the final answer) while the prompt tokens are masked out. Watermark detection: we evaluate watermark transfer using the open-model radioactivity test of Sander et al. (2024, 2025). The test ope...
- [37] and w_{ent,i} = f(H_i) is a function of the local entropy H_i at position i, estimated via a single forward pass of the student model. The p-value is computed via the moment-matched Gamma approximation of Equation 6, which accounts for the heterogeneous weights. Concave normalized-entropy transforms outperform linear/superlinear alternatives because they moderatel...
- [38] adaptively scales the green-red bias based on the natural green-list probability mass, reducing distortion in low-entropy contexts, but remains non-distortion-free since it still applies a logit bias. Semantic watermarks (Liu et al., 2023; Liu and Bu, 2024; Hou et al., ...
- [39] require auxiliary semantic encoders at generation time, making them harder to deploy. Gumbel-max (Aaronson and Kirchner, 2023), Permute-and-Flip (Zhao et al., 2024), DiPMark (Wu et al., ...
- [40] (multiple generations per query, impractical for production) are distortion-free. Toolkits have also been introduced to benchmark these methods (Piet et al., 2023; Pan et al., 2024). Recent large-scale evaluations (Fernandez et al., ...
- [41] show that Gumbel-max and SynthID achieve the best detectability-quality Pareto frontier among all methods, strictly dominating DiPMark, green-red variants, and semantic watermarks. TextSeal builds on the Gumbel-max framework but introduces dual-key generation for diversity, entropy-weighted detection, and localized multi-region search—none of which are present in p...