TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-13 03:34 UTC · model grok-4.3
The pith
TextSeal adds a detectable watermark to LLM outputs that stays visible even after mixing with human text or distillation into new models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TextSeal builds dual-key generation into Gumbel-max sampling to restore diversity while adding entropy-weighted scoring and multi-region localization for detection. This produces a theoretically distortion-free watermark with no inference overhead that supports serving optimizations. It outperforms baselines like SynthID-text in detection strength, remains effective under heavy dilution with human text, preserves reasoning performance, and shows no quality difference in multilingual human evaluations. The watermark signal also transfers through model distillation, allowing detection of unauthorized downstream use.
What carries the argument
Dual-key generation on Gumbel-max sampling combined with entropy-weighted scoring and multi-region localization, which enables localized detection without changing the output distribution.
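The generation side can be sketched in a few lines. This is a minimal illustration of keyed Gumbel-max sampling with a hypothetical dual-key selection step; the PRF construction, the key-selection rule, and all function names are our assumptions for illustration, not the paper's method.

```python
import hashlib
import math
import random

def prf_gumbel(key: int, context: tuple, token_id: int) -> float:
    """Pseudorandom Gumbel noise derived from a secret key and the local context."""
    digest = hashlib.sha256(f"{key}|{context}|{token_id}".encode()).digest()
    u = (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)  # uniform in (0, 1)
    return -math.log(-math.log(u))  # standard Gumbel sample

def watermarked_sample(logits: list, context: tuple, keys: tuple) -> int:
    """Gumbel-max watermark sampling: argmax of logit + keyed Gumbel noise.

    With a single fixed key, the same prompt and context always yield the
    same token; drawing one of several secret keys per generation (our
    reading of 'dual-key generation') restores output diversity while the
    detector can still test the text against each key.
    """
    key = random.choice(keys)  # hypothetical key-selection rule
    scores = [logit + prf_gumbel(key, context, t) for t, logit in enumerate(logits)]
    return max(range(len(logits)), key=scores.__getitem__)
```

The detector never needs the logits: it recomputes the keyed noise from the observed tokens and checks whether the noise values are suspiciously large, which is what makes detection possible without changing generation.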
If this is right
- Detection stays confident even when AI text is heavily mixed with human writing.
- No added cost at inference time and compatibility with speculative decoding and multi-token prediction.
- Preservation of downstream task performance on reasoning benchmarks.
- Signal transfer enables detection after unauthorized distillation of the model.
- No perceptible quality difference across 6000 A/B comparisons in five languages.
Where Pith is reading between the lines
- The approach could support provenance tracking in deployed LLM services by tagging training data origins.
- If the signal survives multiple rounds of distillation, it might allow tracing of model lineages over time.
- It could be adapted to other generative settings where output mixing and reuse are concerns.
- Production use would require deciding how to handle watermark presence in open model releases.
Load-bearing premise
That the watermark signal transfers through model distillation with enough strength for reliable detection, and that the absence of quality impact holds under all practical serving and adversarial conditions.
What would settle it
A test showing that text from models distilled on TextSeal-watermarked outputs yields detection scores no higher than random chance, or a human preference study where watermarked text is rated lower than non-watermarked text.
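The chance-level criterion can be made precise with a rank-based AUC: if detection scores on distilled-model text are statistically indistinguishable from scores on human text, the Mann-Whitney AUC sits near 0.5. A minimal sketch, with the scoring function itself left abstract:

```python
def detection_auc(distilled_scores, human_scores):
    """Mann-Whitney AUC: the probability that a randomly chosen
    distilled-model document outscores a randomly chosen human document.
    AUC near 0.5 means detection is no better than chance; AUC near 1.0
    means the watermark survived distillation."""
    pairs = len(distilled_scores) * len(human_scores)
    wins = sum(
        1.0 if d > h else 0.5 if d == h else 0.0
        for d in distilled_scores
        for h in human_scores
    )
    return wins / pairs
```

Perfectly separated score distributions give 1.0; identical distributions give 0.5, which is the outcome that would falsify the radioactivity claim.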
Original abstract
We introduce TextSeal, a state-of-the-art watermark for large language models. Building on Gumbel-max sampling, TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and multi-region localization for improved detection. It supports serving optimizations such as speculative decoding and multi-token prediction, and does not add any inference overhead. TextSeal strictly dominates baselines like SynthID-text in detection strength and is robust to dilution, maintaining confident localized detection even in heavily mixed human/AI documents. The scheme is theoretically distortion-free, and evaluation across reasoning benchmarks confirms that it preserves downstream performance, while a multilingual human evaluation (6000 A/B comparisons, 5 languages) shows no perceptible quality difference. Beyond its use for provenance detection, TextSeal is also "radioactive": its watermark signal transfers through model distillation, enabling detection of unauthorized use.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces TextSeal, a localized LLM watermark based on Gumbel-max sampling augmented with dual-key generation to restore diversity, entropy-weighted scoring for detection, and multi-region localization. It claims strict dominance over baselines such as SynthID-text in detection strength, robustness to dilution in mixed human/AI text, theoretical distortion-freeness with no inference overhead or quality impact (supported by reasoning benchmarks and a 6000-comparison multilingual human study across 5 languages), compatibility with speculative decoding and multi-token prediction, and a 'radioactive' property whereby the watermark signal transfers through model distillation to enable detection of unauthorized use.
Significance. If substantiated, the work would advance LLM provenance and IP protection by adding practical localization and the novel distillation-transfer capability while preserving serving efficiency and output quality. The scale of the human evaluation (6000 A/B tests) is a concrete strength that lends credibility to the no-quality-impact claim.
major comments (3)
- [Abstract and §4] Abstract and §4 (Evaluation): The claim that TextSeal 'strictly dominates' SynthID-text and maintains 'confident localized detection' under heavy dilution is asserted without any reported metrics (AUC, FPR, precision-recall curves, or statistical significance tests) or tables; this is load-bearing for the dominance and robustness assertions.
- [§3] §3 (Method) and theoretical analysis: The 'theoretically distortion-free' property is stated without a derivation, proof sketch, or explicit argument showing that dual-key Gumbel-max plus entropy weighting leaves the output distribution unchanged; the central quality-preservation claim rests on this unshown step.
- [§5] §5 (Distillation experiments): The 'radioactive' claim that the watermark signal transfers through distillation is central to the provenance-protection use case, yet no quantitative results (post-distillation detection AUC, number of epochs/temperature/mixing ratios, or comparison to non-watermarked controls) are supplied; distillation is known to dilute sampling biases, making this the weakest link.
minor comments (3)
- [§3] Clarify the exact definition and generation procedure for the dual keys in the methods section to avoid ambiguity with prior Gumbel-max watermarking.
- [§4] Add error bars or confidence intervals to all reported benchmark and human-study results.
- [Figures] Ensure figures showing localization and dilution robustness include clear legends and axis labels.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate the requested quantitative metrics, theoretical derivations, and experimental results, thereby strengthening the support for our claims.
Point-by-point responses
-
Referee: [Abstract and §4] Abstract and §4 (Evaluation): The claim that TextSeal 'strictly dominates' SynthID-text and maintains 'confident localized detection' under heavy dilution is asserted without any reported metrics (AUC, FPR, precision-recall curves, or statistical significance tests) or tables; this is load-bearing for the dominance and robustness assertions.
Authors: We acknowledge that the manuscript currently supports the dominance and robustness claims primarily through reasoning benchmark results and the large-scale human study rather than explicit detection performance metrics. In the revised version, we will expand §4 with new tables and figures that report AUC, FPR at fixed thresholds, precision-recall curves, and statistical significance tests (including p-values from paired comparisons) for TextSeal versus SynthID-text across dilution ratios (10–90% AI-generated content). These additions will provide the quantitative backing for the strict dominance and confident localized detection assertions. revision: yes
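The promised dilution tables can be anticipated from the Gaussian approximation quoted elsewhere on this page (Z_global = ρδ√n, ln p ≈ −Z²/2). The sketch below is only that approximation in code; the per-token signal-to-noise value δ is illustrative, not a number from the paper:

```python
import math

def expected_z(rho: float, delta: float, n: int) -> float:
    """Expected global detection Z-score for an n-token document in which a
    fraction rho of tokens is watermarked, with per-token signal-to-noise
    ratio delta (Gaussian approximation Z_global = rho * delta * sqrt(n))."""
    return rho * delta * math.sqrt(n)

def log_p_value(z: float) -> float:
    """Gaussian tail approximation: ln p ≈ -z^2 / 2."""
    return -0.5 * z * z

# Dilution sweep: even heavily diluted documents accumulate signal with length.
for rho in (0.1, 0.5, 0.9):
    z = expected_z(rho, delta=0.8, n=2000)  # delta = 0.8 is an assumed value
    print(f"rho={rho:.1f}  Z={z:.1f}  ln p ≈ {log_p_value(z):.1f}")
```

Note the ρ² penalty hidden in ln p ≈ −ρ²δ²n/2: this is exactly why localized multi-region detection, which tests candidate regions rather than the whole document, should help at low dilution ratios.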
-
Referee: [§3] §3 (Method) and theoretical analysis: The 'theoretically distortion-free' property is stated without a derivation, proof sketch, or explicit argument showing that dual-key Gumbel-max plus entropy weighting leaves the output distribution unchanged; the central quality-preservation claim rests on this unshown step.
Authors: We agree that a formal argument is required to substantiate the distortion-free claim. We will add a proof sketch to §3 showing that the dual-key Gumbel-max procedure, when paired with entropy-weighted scoring at detection time only, leaves the token sampling distribution identical to the original model. The keys are chosen independently of token logits, and the entropy weighting affects only the detector, not generation, ensuring equivalence in distribution and zero inference overhead. revision: yes
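The claimed equivalence rests on the standard Gumbel-max identity: when the Gumbel noise is (pseudo)random and independent of the logits, argmax(logits + Gumbel) is distributed exactly as softmax(logits). A quick empirical check of that identity (of the classical fact, not of TextSeal's specific PRF):

```python
import math
import random
from collections import Counter

def gumbel_argmax(logits, rng):
    """One Gumbel-max draw: argmax over logit + standard Gumbel noise."""
    noisy = [l - math.log(-math.log(rng.random())) for l in logits]
    return max(range(len(logits)), key=noisy.__getitem__)

rng = random.Random(0)
logits = [0.0, 1.0, 2.0]
n = 200_000
counts = Counter(gumbel_argmax(logits, rng) for _ in range(n))

# Empirical token frequencies should match softmax(logits) within noise.
z = sum(math.exp(l) for l in logits)
for t, l in enumerate(logits):
    assert abs(counts[t] / n - math.exp(l) / z) < 0.01
```

The distortion-free argument then reduces to showing that the keyed noise is marginally Gumbel and independent of the logits; entropy weighting, applied only at detection time, cannot affect the sampling distribution at all.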
-
Referee: [§5] §5 (Distillation experiments): The 'radioactive' claim that the watermark signal transfers through distillation is central to the provenance-protection use case, yet no quantitative results (post-distillation detection AUC, number of epochs/temperature/mixing ratios, or comparison to non-watermarked controls) are supplied; distillation is known to dilute sampling biases, making this the weakest link.
Authors: We recognize that the distillation-transfer property requires stronger empirical grounding. In the revised §5 we will include quantitative results from our distillation experiments, reporting post-distillation detection AUC, the exact distillation hyperparameters (epochs, temperature, mixing ratios with human text), and side-by-side comparisons against non-watermarked control models. We will also discuss the degree of signal retention in light of known dilution effects during distillation. revision: yes
Circularity Check
No circularity detected; claims rely on empirical evaluation without self-referential derivations
Full rationale
The provided abstract and text describe TextSeal's construction via Gumbel-max sampling with added dual-key, entropy-weighted, and localization features, plus claims of theoretical distortion-freeness and distillation radioactivity. No equations, derivations, or load-bearing steps are shown that reduce by construction to fitted inputs, self-definitions, or self-citations. Central assertions (dominance over baselines, no quality impact, signal transfer) are presented as outcomes of benchmarks and human evaluations rather than tautological renamings or imported uniqueness theorems. The derivation chain is therefore self-contained against external benchmarks.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear · "TextSeal builds upon the Gumbel-max framework and introduces three core improvements: Dual-Key Generation... Entropy-Weighted Detection... Localized Detection"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear · "the scheme is theoretically distortion-free... its watermark signal transfers through model distillation"
Reference graph
Works this paper leans on
- [1] Regulation (EU) 2024/1689 of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (AI Act), 2024.
- [2] Second draft published March 2026; enforcement of Article 50 obligations begins August 2, 2026.
- [3] Pierre Fernandez, Antoine Chaffin, Karim Tit, Vivien Chappelier, and Teddy Furon. Three bricks to consolidate watermarks for large language models. 2023 IEEE International Workshop on Information Forensics and Security (WIFS), 2023.
- [4] Pierre Fernandez, Tom Sander, Hady Elsahar, Hongyan Chang, Tomáš Souček, Valeriu Lacatusu, Tuan Tran, Sylvestre-Alvise Rebuffi, and Alexandre Mourachko. How good is post-hoc watermarking with language model rephrasing? arXiv preprint arXiv:2512.16904.
- [5] Eva Giboulot and Teddy Furon. WaterMax: breaking the LLM watermark detectability-robustness-quality trade-off. arXiv preprint arXiv:2403.04808.
- [6] Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, and Gabriel Synnaeve. Better & faster large language models via multi-token prediction. arXiv preprint arXiv:2404.19737.
- [7] Chenchen Gu, Xiang Lisa Li, Percy Liang, and Tatsunori Hashimoto. On the learnability of watermarks for language models. arXiv preprint arXiv:2312.04469.
- [8] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948.
- [9] Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.
- [10] Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, and Yulia Tsvetkov. SemStamp: A semantic watermark with paraphrastic robustness for text generation. arXiv preprint arXiv:2310.03991.
- [11] Abe Bohan Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, and Tianxing He. k-SemStamp: A clustering-based semantic watermark for detection of machine-generated text. arXiv preprint arXiv:2402.11399.
- [12] John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. arXiv preprint arXiv:2301.10226, 2023a. John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, and Tom Goldstein. On the reliability of watermarks for...
- [13] Gregory Kang Ruey Lau, Xinyuan Niu, Hieu Dao, Jiangwei Chen, Chuan-Sheng Foo, and Bryan Kian Hsiang Low. Waterfall: Framework for robust and scalable text watermarking. In ICML 2024 Workshop on Foundation Models in the Wild, 2024.
- [14] Taehyun Lee, Seokhee Hong, Jaewoo Ahn, Ilgee Hong, Hwaran Lee, Sangdoo Yun, Jamin Shin, and Gunhee Kim. Who wrote this code? Watermarking for code generation. arXiv preprint arXiv:2305.15060.
- [15] Aiwei Liu, Leyi Pan, Xuming Hu, Shiao Meng, and Lijie Wen. A semantic invariant robust watermark for large language models. arXiv preprint arXiv:2310.06356.
- [16] Yepeng Liu and Yuheng Bu. Adaptive text watermark for large language models. arXiv preprint arXiv:2401.13927.
- [17] Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, and Tatsunori B Hashimoto. s1: Simple test-time scaling. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 20286–20332, 2025.
- [18] Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, Shuliang Liu, Xuming Hu, Lijie Wen, et al. MarkLLM: An open-source toolkit for LLM watermarking. arXiv preprint arXiv:2405.10051.
- [19] Julien Piet, Chawin Sitawarin, Vivian Fang, Norman Mu, and David Wagner. Mark my words: Analyzing and evaluating language model watermarks. arXiv preprint arXiv:2312.00273.
- [20] Wenjie Qu, Dong Yin, Zixin He, Wei Zou, Tianyang Tao, Jinyuan Jia, and Jiaheng Zhang. Provably robust multi-bit watermarking for AI-generated text via error correction code. arXiv preprint arXiv:2401.16820.
- [21] Tom Sander, Pierre Fernandez, Saeed Mahloujifar, Alain Durmus, and Chuan Guo. Detecting benchmark contamination through watermarking. arXiv preprint arXiv:2502.17259.
- [22] Qwen Team. Qwen2.5 technical report. arXiv preprint arXiv:2409.12117.
- [23] Mercan Topkara, Giuseppe Riccardi, Dilek Hakkani-Tür, and Mikhail J Atallah. Natural language watermarking: Challenges in building a practical system. In Security, Steganography, and Watermarking of Multimedia Contents VIII, pages 106–117. SPIE, 2006a. Mercan Topkara, Umut Topkara, and Mikhail J Atallah. Words are not enough: sentence level natural langu...
- [24] Ashish Venugopal, Jakob Uszkoreit, David Talbot, Franz Josef Och, and Juri Ganitkevitch. Watermarking the outputs of structured prediction with an application in statistical machine translation. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1363–1372, 2011.
- [25] Zongqi Wang, Tianle Gu, Baoyuan Wu, and Yujiu Yang. MorphMark: Flexible adaptive watermarking for large language models. arXiv preprint arXiv:2505.11541.
- [26] Yihan Wu, Zhengmian Hu, Hongyang Zhang, and Heng Huang. DiPmark: A stealthy, efficient and resilient watermark for large language models. arXiv preprint arXiv:2310.07710.
- [27] Xiaojun Xu, Jinghan Jia, Yuanshun Yao, Yang Liu, and Hang Li. Robust multi-bit text watermark with LLM-based paraphrasers. arXiv preprint arXiv:2412.03123.
- [28] KiYoon Yoo, Wonhyuk Ahn, Jiho Jang, and Nojun Kwak. Robust multi-bit natural language watermarking through invariant features. arXiv preprint arXiv:2305.01904.
- [29] KiYoon Yoo, Wonhyuk Ahn, and Nojun Kwak. Advancing beyond identification: Multi-bit watermark for large language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4031–4055, 2024.
- [30] Jingqi Zhang, Ruibo Chen, Yingqing Yang, Peihua Mai, Heng Huang, and Yan Pang. Leave no trace: Black-box detection of copyrighted dataset usage in large language models via watermarking. arXiv preprint arXiv:2510.02962.
- [31] Xuandong Zhao, Lei Li, and Yu-Xiang Wang. Permute-and-flip: An optimally robust and watermarkable decoder for LLMs. arXiv preprint arXiv:2402.05864.
- [32] Appendix A, More Technical Details on the Methods (A.1 Hash Function Implementation): The PRF takes as input the candidate token x, a context window w = (w_1, ..., w_k) of k token IDs, and the secret key K (all of them integers), and outputs a random integer in [0, M). The hash is computed as h'(x, w, K) = (p_2·x + Σ_{i=1}^{k} w_i·q^i + p_3·K)·p_4 (Eq. 8); h(x, w, ...
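As quoted, the hash of Eq. 8 is directly implementable. The excerpt truncates before the final step, so reducing h' into [0, M) with a modulo is our assumption, as are all the constant values below (the paper presumably uses fixed large primes):

```python
def prf_hash(x: int, w: list, K: int, q: int, p2: int, p3: int, p4: int, M: int) -> int:
    """Integer PRF from the quoted Eq. 8:
        h'(x, w, K) = (p2*x + sum_{i=1..k} w_i * q^i + p3*K) * p4
    The reduction of h' into [0, M) via a modulo is an assumption, since the
    excerpt is cut off before defining h; the constants are illustrative."""
    h_prime = (p2 * x + sum(wi * q ** (i + 1) for i, wi in enumerate(w)) + p3 * K) * p4
    return h_prime % M  # assumed final reduction to [0, M)
```

For example, `prf_hash(1, [2, 3], 5, q=7, p2=11, p3=13, p4=17, M=1000)` evaluates h' = (11 + 2·7 + 3·49 + 13·5)·17 = 4029 and returns 29.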
- [33] Let δ = (μ_w − μ_0)/σ be the per-token signal-to-noise ratio. Using a Gaussian tail approximation, the log p-value of a Z-score is ln p ≈ −½ Z². We define Δ² = δ²/2 as the expected log p-value accumulation rate per watermarked token. Power of the global test: the global test evaluates all n tokens; the expected Z-score is Z_global = ρnσδ/(σ√n) = ρδ√n ⟹ E[ln p_global] ≈ −...
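The excerpt cuts off before the final expectation. Completing the algebra from the quoted pieces (ln p ≈ −Z²/2, Z_global = ρδ√n, Δ² = δ²/2) gives, as a reconstruction rather than a quote:

```latex
\mathbb{E}[\ln p_{\mathrm{global}}]
  \approx -\tfrac{1}{2} Z_{\mathrm{global}}^{2}
  = -\tfrac{1}{2}\,\rho^{2}\delta^{2} n
  = -\rho^{2}\Delta^{2} n .
```

The quadratic dilution penalty ρ² (versus the rate ρΔ²n a test restricted to the watermarked region would accrue) is presumably the motivation for the localized multi-region search.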
- [34] "Tie." For the final analysis, "Both Good," ... to assess whether watermarking systematically affects script consistency or refusal rates. For script consistency, we observe 52 discordant pairs where WM was wrong but Non-WM was correct, versus 39 where Non-WM was wrong but WM was correct; with continuity correction, this yields χ² = 1.58 and p = 0.21. For refusal rates, we find 21 pairs where WM refused ...
- [35] Secret keys are calibrated per method via a Kolmogorov–Smirnov test to ensure uniform PRF hashes on unwatermarked text, as done in Fernandez et al. (2025). The teacher generates 5,000 solutions using vLLM (Kwon et al., ...
- [36] (rank 128, scaling factor 128, dropout 0.05) with learning rate 2×10⁻⁵ and 3 epochs. The loss is computed over the full teacher response (both the reasoning trace and the final answer) while the prompt tokens are masked out. Watermark detection: we evaluate watermark transfer using the open-model radioactivity test of Sander et al. (2024, 2025). The test ope...
- [37] and w_{ent,i} = f(H_i) is a function of the local entropy H_i at position i, estimated via a single forward pass of the student model. The p-value is computed via the moment-matched Gamma approximation of Equation 6, which accounts for the heterogeneous weights. Concave normalized-entropy transforms outperform linear/superlinear alternatives because they moderatel...
- [38] adaptively scales the green-red bias based on the natural green-list probability mass, reducing distortion in low-entropy contexts, but remains non-distortion-free since it still applies a logit bias. Semantic watermarks (Liu et al., 2023; Liu and Bu, 2024; Hou et al., ...
- [39] require auxiliary semantic encoders at generation time, making them harder to deploy. Gumbel-max (Aaronson and Kirchner, 2023), Permute-and-Flip (Zhao et al., 2024), DiPMark (Wu et al., ...
- [40] (multiple generations per query, impractical for production) are distortion-free. Toolkits have also been introduced to benchmark these methods (Piet et al., 2023; Pan et al., 2024). Recent large-scale evaluations (Fernandez et al., ...
- [41] show that Gumbel-max and SynthID achieve the best detectability-quality Pareto frontier among all methods, strictly dominating DiPMark, green-red variants, and semantic watermarks. TextSeal builds on the Gumbel-max framework but introduces dual-key generation for diversity, entropy-weighted detection, and localized multi-region search—none of which are present in p...