Recognition: 2 theorem links · Lean Theorem
Online LLM watermark detection via e-processes
Pith reviewed 2026-05-15 21:46 UTC · model grok-4.3
The pith
E-processes provide a unified framework for online detection of watermarks in LLM-generated text with anytime-valid guarantees.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We develop a unified framework for LLM watermark detection based on e-processes, providing anytime-valid guarantees for online testing. We propose various methods to construct empirically adaptive e-processes that can enhance the detection power. The proposed methods are applicable to any sequential testing problem where independent pivotal statistics are available. Theoretical results characterize the power properties of the proposed procedures.
What carries the argument
E-processes, sequences of nonnegative random variables that are supermartingales under the null of independence, used to construct anytime-valid sequential tests for watermark detection.
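As a rough illustration of the object described above, the sketch below builds a product-form e-process from pivotal statistics that are assumed to be i.i.d. Uniform(0, 1) under the null. The function names, the power-family e-value `theta * y**(theta - 1)`, and the default `theta = 2` are illustrative assumptions, not the paper's construction.

```python
import math

def power_e_value(y, theta=2.0):
    """Illustrative e-value: theta * y**(theta - 1) integrates to 1 over [0, 1],
    so its mean is 1 when y ~ Uniform(0, 1) (the assumed null)."""
    y = min(max(y, 1e-12), 1.0)  # keep y away from 0 so the log below is defined
    return theta * y ** (theta - 1.0)

def run_e_process(stats, alpha=0.05, theta=2.0):
    """Accumulate log E_t as a running sum of log e-values and stop the first
    time E_t >= 1/alpha. Returns (stopping index or None, last log E_t)."""
    log_e = 0.0
    threshold = math.log(1.0 / alpha)
    for t, y in enumerate(stats, start=1):
        log_e += math.log(power_e_value(y, theta))
        if log_e >= threshold:
            return t, log_e
    return None, log_e

print(run_e_process([0.9] * 10))   # statistics near 1: evidence accumulates
print(run_e_process([0.5] * 100))  # e-value is exactly 1 at each step: no rejection
```

Working on the log scale avoids overflow on long streams. A fixed `theta` is the simplest choice; it is valid under the assumed null for any `theta > 0` but only gains evidence when the statistics actually skew toward 1.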
Load-bearing premise
Watermark schemes reliably induce dependence between generated tokens and a pseudo-random sequence, allowing reduction to an independence testing problem with available independent pivotal statistics.
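To make the premise concrete, here is a minimal sketch of one well-known scheme of this kind, the Gumbel-max watermark, in which the pivotal statistic is the pseudo-random uniform attached to the emitted token. Whether the paper uses this exact scheme is an assumption on our part; the helper names, the SHA-256 seeding, and the toy vocabulary are hypothetical.

```python
import hashlib
import random

def pseudo_random_uniforms(key, context, vocab_size):
    """Derive one Uniform(0, 1) draw per vocabulary entry from a secret key
    and the preceding tokens (hypothetical seeding via SHA-256)."""
    digest = hashlib.sha256(f"{key}|{context}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.random() for _ in range(vocab_size)]

def watermarked_token(probs, uniforms):
    """Gumbel-max rule: emit argmax over w of U_w ** (1 / p_w).
    Marginally over the uniforms this still samples from probs."""
    candidates = [w for w, p in enumerate(probs) if p > 0.0]
    return max(candidates, key=lambda w: uniforms[w] ** (1.0 / probs[w]))

def pivotal_statistic(token, uniforms):
    """Y = U_W: Uniform(0, 1) if the token was chosen independently of the
    uniforms (the null), skewed toward 1 if the watermark rule chose it."""
    return uniforms[token]

probs = [0.5, 0.3, 0.2]   # toy next-token distribution
context = (17, 42)        # toy preceding token ids
u = pseudo_random_uniforms("secret-key", context, len(probs))
w = watermarked_token(probs, u)
print(w, pivotal_statistic(w, u))
```

A detector that knows the key can recompute `u` for each position and read off the pivotal statistic of the observed token, so detection needs neither the model nor its probabilities.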
What would settle it
An experiment showing that the constructed e-process fails to reject the null on a known watermarked text stream or produces invalid p-values when the pivotal statistics violate independence.
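A toy version of such an experiment can be run on synthetic pivotal statistics. The sketch below estimates the false-alarm rate on null streams and the detection rate on a stand-in for watermarked streams. Everything here is an assumption for illustration: the null is i.i.d. Uniform(0, 1), the alternative has density 3y^2, and the e-process uses the fixed bet `theta = 2`; no real LLM text is involved.

```python
import math
import random
import statistics

def detection_time(stats, alpha=0.05, theta=2.0):
    """First t at which the product e-process reaches 1/alpha, else None."""
    log_e, threshold = 0.0, math.log(1.0 / alpha)
    for t, y in enumerate(stats, start=1):
        y = min(max(y, 1e-12), 1.0)  # avoid log(0)
        log_e += math.log(theta) + (theta - 1.0) * math.log(y)
        if log_e >= threshold:
            return t
    return None

def simulate(sampler, n_streams=2000, horizon=200, alpha=0.05, seed=0):
    """Return (fraction of streams rejected, median rejection time)."""
    rng = random.Random(seed)
    times = []
    for _ in range(n_streams):
        t = detection_time((sampler(rng) for _ in range(horizon)), alpha)
        if t is not None:
            times.append(t)
    rate = len(times) / n_streams
    median = statistics.median(times) if times else None
    return rate, median

# Null: uniform statistics. Alternative: U ** (1/3), which has density 3 * y**2.
null_rate, _ = simulate(lambda rng: rng.random())
alt_rate, alt_median = simulate(lambda rng: rng.random() ** (1.0 / 3.0))
print(f"false-alarm rate under the null: {null_rate:.3f}")
print(f"detection rate under the toy alternative: {alt_rate:.3f}, "
      f"median tokens to detection: {alt_median}")
```

A false-alarm rate well above `alpha` on null streams, or a detection rate near zero on streams with a known dependence, would be the kind of evidence the paragraph above describes.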
Original abstract
Watermarking for large language models (LLMs) has emerged as an effective tool for distinguishing AI-generated text from human-written content. Statistically, watermark schemes induce dependence between generated tokens and a pseudo-random sequence, reducing watermark detection to a hypothesis testing problem on independence. We develop a unified framework for LLM watermark detection based on e-processes, providing anytime-valid guarantees for online testing. We propose various methods to construct empirically adaptive e-processes that can enhance the detection power. The proposed methods are applicable to any sequential testing problem where independent pivotal statistics are available. In addition, theoretical results are established to characterize the power properties of the proposed procedures. Some experiments demonstrate that the proposed framework achieves competitive performance compared to existing watermark detection methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a unified framework for LLM watermark detection based on e-processes, providing anytime-valid guarantees for online sequential testing. It reduces watermark detection to an independence testing problem under the assumption that watermark schemes induce dependence between tokens and a pseudo-random sequence, proposes methods for constructing empirically adaptive e-processes to improve detection power, establishes theoretical results on power properties, and demonstrates competitive experimental performance against existing methods. The framework is positioned as applicable to any sequential testing setting with independent pivotal statistics.
Significance. If the central reduction to independence testing holds and the adaptive e-process constructions deliver the claimed power gains while preserving anytime-validity, the work would provide a statistically rigorous online monitoring tool for AI-generated text, extending e-process theory to a timely application area. The emphasis on broad applicability and theoretical power characterizations strengthens its potential impact beyond LLM watermarking.
Major comments (2)
- [§2 (Problem Setup and Reduction)] The central claim relies on the availability of independent pivotal statistics for e-process construction after the reduction to independence testing. The manuscript should explicitly verify this condition for standard watermark schemes (e.g., those inducing token-sequence dependence) with a concrete example or lemma showing that the resulting statistics remain independent and pivotal under the null.
- [§3 (Theoretical Results)] Theoretical power results are stated to characterize the procedures, but the growth rate of the adaptive e-processes under alternatives (relative to non-adaptive baselines) needs an explicit bound or comparison theorem to substantiate the claimed enhancement; without it, the adaptivity benefit remains qualitative.
Minor comments (2)
- [§4 (Experiments)] In the experiments, report quantitative metrics such as empirical power at fixed type-I error levels and the number of tokens required for detection across multiple watermark schemes and text lengths to allow direct comparison with baselines.
- [§3 (Methods)] Clarify the precise definition of 'empirically adaptive' e-processes (e.g., how the adaptation is performed without using future data) in the methods section to avoid ambiguity for readers unfamiliar with e-process literature.
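The last comment asks how adaptation can avoid using future data. One standard answer in the e-process literature is a predictable plug-in: the betting parameter used at step t is computed from the first t-1 statistics only. The sketch below illustrates that idea under the assumption of Uniform(0, 1) pivotal statistics under the null. The smoothed estimator, the clipping range, and the function name are hypothetical and may differ from the paper's constructions.

```python
import math

def adaptive_log_e_path(stats, theta_max=10.0):
    """Log e-process with a predictable plug-in parameter.

    At each step the e-value is theta * y**(theta - 1), where theta is a
    smoothed estimate built from PAST statistics only. Because theta is fixed
    before y is seen, each factor has conditional mean 1 under the null.
    """
    log_e = 0.0
    neg_log_sum = 0.0  # running -sum(log y_s) over already-observed statistics
    path = []
    for n, y in enumerate(stats):
        y = min(max(y, 1e-12), 1.0)
        # Estimate from the n past observations, clipped to [1, theta_max];
        # with no data (n = 0) this gives theta = 1, i.e. no bet.
        theta = min(max((n + 1.0) / (1.0 + neg_log_sum), 1.0), theta_max)
        log_e += math.log(theta) + (theta - 1.0) * math.log(y)
        path.append(log_e)
        neg_log_sum -= math.log(y)  # update only AFTER the bet is placed
    return path

path = adaptive_log_e_path([0.9, 0.95, 0.8, 0.99, 0.85])
print([round(v, 3) for v in path])
```

The ordering inside the loop is the whole point: moving the `neg_log_sum` update above the `theta` line would let the current observation influence its own bet and would void the validity argument.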
Simulated Author's Rebuttal
We thank the referee for their positive evaluation and insightful comments. We address each major comment below and will incorporate the suggested changes in the revised manuscript.
Point-by-point responses
- Referee: [§2 (Problem Setup and Reduction)] The central claim relies on the availability of independent pivotal statistics for e-process construction after the reduction to independence testing. The manuscript should explicitly verify this condition for standard watermark schemes (e.g., those inducing token-sequence dependence) with a concrete example or lemma showing that the resulting statistics remain independent and pivotal under the null.
  Authors: We appreciate this suggestion. Upon review, we note that the reduction to independence testing is standard for watermark schemes that introduce dependence between tokens and the pseudo-random sequence. In the revised version, we will add a lemma in §2 with a concrete example for the Kirchenbauer watermark scheme, proving that the resulting test statistics are independent and pivotal under the null hypothesis of no watermark.
  Revision: yes
- Referee: [§3 (Theoretical Results)] Theoretical power results are stated to characterize the procedures, but the growth rate of the adaptive e-processes under alternatives (relative to non-adaptive baselines) needs an explicit bound or comparison theorem to substantiate the claimed enhancement; without it, the adaptivity benefit remains qualitative.
  Authors: We agree that making the power enhancement explicit would be beneficial. We will include a new theorem in §3 that provides an explicit lower bound on the growth rate of the log of the adaptive e-process under alternatives, demonstrating a strict improvement over non-adaptive counterparts in terms of asymptotic power.
  Revision: yes
Circularity Check
No significant circularity; framework extends established e-process theory
Full rationale
The derivation chain reduces watermark detection to a standard independence testing problem under the explicit applicability condition that watermark schemes induce token-sequence dependence and independent pivotal statistics exist. This reduction is stated as a precondition rather than derived internally. The unified e-process framework and adaptive constructions are proposed as extensions of prior e-process literature, with power properties characterized theoretically in a separate step. No equations, fitted parameters, or self-citations are shown to force the central guarantees by construction; the adaptive methods enhance power within the given setting without redefining inputs as outputs. This is the most common honest non-finding for papers that apply established sequential testing tools to a new domain.
Axiom & Free-Parameter Ledger
axioms (2)
- [standard math] E-processes provide anytime-valid p-values for sequential testing under standard martingale conditions.
- [domain assumption] Watermark schemes induce dependence between tokens and a pseudo-random sequence, yielding pivotal statistics.
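For reference, the first axiom can be written out as follows. This is the standard statement from the e-process literature rather than a formula quoted from the paper; the symbols $E_t$ (e-process), $\tau$ (stopping time), $\alpha$ (level), and $p_t$ (induced p-value) are our notation.

```latex
% Defining property of an e-process under the null H_0, for every stopping time \tau:
\mathbb{E}_{H_0}\bigl[E_\tau\bigr] \le 1 .

% Ville's inequality, for any level \alpha \in (0, 1):
\mathbb{P}_{H_0}\bigl(\exists\, t \ge 1 : E_t \ge 1/\alpha\bigr) \le \alpha .

% Induced anytime-valid p-value:
p_t = \min\Bigl\{ 1,\ \bigl(\max_{s \le t} E_s\bigr)^{-1} \Bigr\},
\qquad
\mathbb{P}_{H_0}\bigl(\exists\, t \ge 1 : p_t \le \alpha\bigr) \le \alpha .
```

The second display is what licenses stopping at a data-dependent time: rejecting the first time $E_t \ge 1/\alpha$ keeps the type-I error at most $\alpha$ no matter when monitoring ends.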
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: reducing watermark detection to a hypothesis testing problem on independence... pivotal statistic Y=U_W... super-uniform under the alternative
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- When Should an AI Workflow Release? Always-Valid Inference for Black-Box Generate-Verify Systems
  A wrapper for black-box generate-verify AI pipelines that uses a conservative hard-negative reference pool and e-processes to control the probability of releasing on infeasible tasks while permitting release on feasible ones.