arxiv: 2605.14283 · v1 · submitted 2026-05-14 · 💻 cs.GT · cs.AI · cs.CR


Watermarking Game-Playing Agents in Perfect-Information Extensive-Form Games

Juho Kim, Fei Fang, Tuomas Sandholm


Pith reviewed 2026-05-15 02:32 UTC · model grok-4.3


keywords: watermarking · extensive-form games · strategy profiles · statistical detection · perfect information · chess engines · AI agents · expected utility


The pith

Game strategies in perfect-information extensive-form games can be watermarked for statistical detection while bounding expected utility loss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper adapts the KGW watermarking method from language models to embed hidden patterns directly into strategy profiles of game-playing agents. This allows a statistical test to verify the origin of the agent from observed play. The central result establishes that any quality degradation, measured by expected utility, remains bounded even after the embedding step. Experiments on multiple chess engines show the quality impact stays negligible while detection succeeds from only a handful of games. The work addresses misuse risks such as unauthorized AI use on gaming platforms by providing a verifiable signature without destroying equilibrium play.

Core claim

The KGW watermark can be adapted to perfect-information extensive-form games by modifying strategy profiles to embed a detectable pattern; the resulting watermarked profile admits a statistical detection test, and the loss in expected utility relative to the original profile is bounded, although detectability and quality trade off against each other.

What carries the argument

The adapted KGW watermark pattern embedded into the action probabilities of a strategy profile at information sets, which preserves the extensive-form structure while enabling statistical verification of the source.
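As a rough illustration of the embedding step described above, the following Python sketch biases an action distribution toward a key-dependent "green" subset of the legal moves. It is a minimal sketch under stated assumptions, not the paper's algorithm: the names `green_set` and `watermarked_policy`, the SHA-256 seeding, the logit-plus-softmax representation of the strategy, and the example key and moves are all hypothetical. The symbols γ (green fraction) and δ (bias added to green actions) follow the KGW convention.

```python
import hashlib
import math
import random


def green_set(key: bytes, state_id: str, actions: list[str], gamma: float) -> set[str]:
    """Pick roughly a gamma fraction of the legal actions, seeded by key and state."""
    digest = hashlib.sha256(key + state_id.encode()).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    shuffled = sorted(actions)  # canonical order so the split is reproducible
    rng.shuffle(shuffled)
    n_green = max(1, round(gamma * len(shuffled)))
    return set(shuffled[:n_green])


def watermarked_policy(logits: dict[str, float], green: set[str], delta: float) -> dict[str, float]:
    """Add delta to the logit of every green action, then renormalise with a softmax."""
    shifted = {a: v + (delta if a in green else 0.0) for a, v in logits.items()}
    top = max(shifted.values())  # subtract the max for numerical stability
    exps = {a: math.exp(v - top) for a, v in shifted.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


# Hypothetical example: four candidate first moves with made-up logits.
key = b"hypothetical-secret-key"
logits = {"e4": 1.0, "d4": 0.9, "Nf3": 0.5, "c4": 0.4}
g = green_set(key, "startpos", list(logits), gamma=0.5)
policy = watermarked_policy(logits, g, delta=2.0)
print(sorted(g), policy)
```

Because the green set depends only on the key and the state, a verifier holding the key can recompute it for every observed position without access to the agent itself.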

If this is right

The watermark remains detectable by a statistical test applied to sequences of observed moves.
Expected utility of the watermarked profile stays within a provable distance of the original profile's value.
Increasing the strength of the watermark improves detection probability at the direct expense of strategy quality.
The same procedure applied to chess engines produces negligible strength loss while allowing reliable identification after a few matches.
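The detection claim in the list above can be sketched as a one-proportion z-test, the form used by the original KGW detector for text. This is a hedged illustration rather than the paper's exact statistic: it assumes each observed move lands in the green set with probability γ when no watermark is present, and the function names and the threshold of 4.0 are illustrative choices.

```python
import math


def z_score(green_hits: int, total_moves: int, gamma: float) -> float:
    """One-proportion z-statistic: how far the green count sits above gamma * T."""
    if total_moves <= 0 or not 0.0 < gamma < 1.0:
        raise ValueError("need total_moves > 0 and 0 < gamma < 1")
    expected = gamma * total_moves
    std = math.sqrt(total_moves * gamma * (1.0 - gamma))
    return (green_hits - expected) / std


def is_watermarked(green_hits: int, total_moves: int, gamma: float,
                   threshold: float = 4.0) -> bool:
    """Flag the play as watermarked when the z-score exceeds the (assumed) threshold."""
    return z_score(green_hits, total_moves, gamma) > threshold


# Hypothetical example: 140 of 200 observed moves fall in the green set.
print(z_score(140, 200, 0.5), is_watermarked(140, 200, 0.5))
```

In a game the number of legal actions changes from state to state, so the green fraction per state may only approximate γ; a careful implementation would account for that in the null distribution.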

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Platforms could use the method to flag unauthorized deployment of a particular AI agent across multiple accounts.
The bounded-loss guarantee suggests the technique might serve as a building block for provenance tracking in other sequential decision domains.
Optimizing the watermark strength parameter could be studied as a separate design problem to shift the observed tradeoff curve.

Load-bearing premise

Embedding the watermark pattern into a strategy profile leaves the underlying information sets and equilibrium properties intact.

What would settle it

An empirical run in which the measured expected-utility drop exceeds the paper's derived bound, or in which the statistical test fails to flag the watermark after the claimed number of games, would disprove the main claims.

Figures

Figures reproduced from arXiv: 2605.14283 by Fei Fang, Juho Kim, Tuomas Sandholm.

**Figure 2.** z-scores for watermarked Stockfish 17.1. The left and right plots ablate on γ and δ, respectively. [Plot not reproduced; the extracted panels are titled "ROC curves for watermark detection", with false positive rate on the x-axis and true positive rate on the y-axis.]

Original abstract

Watermarking techniques for large language models (LLMs), which encode hidden information in the output so its source can be verified, have gained significant attention in recent days, thanks to their potential capability to detect accidental or deliberate misuse. Similar challenges involving model misuse also exist in the context of game-playing, such as when detecting the unauthorized use of AI tools in gaming platforms (e.g., cheating in online chess). In this paper, we initiate the study of how game-playing strategies can be watermarked. We show how the KGW watermark for LLMs can be adapted to watermark game-playing agents in perfect-information extensive-form games. The watermark can then be detected using a statistical test. We show that the degradation in the quality of the watermarked strategy profile, quantified by the expected utility, can be bounded, but there is a tradeoff between detectability and quality. In our experiments, we bootstrap the watermarking framework to various chess engines and demonstrate that a) the impact of the watermark on the quality of the strategy is negligible and b) the watermark can be detected with just a handful of games.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

They adapt KGW watermarking to behavioral strategies in perfect-info games, bound the expected-utility loss, and show it works on chess engines with tiny quality drop and easy detection.


The core contribution is taking the KGW scheme from language models and porting it to strategy profiles in extensive-form games. At each information set they bias the action distribution toward a secret-key-dependent subset of moves, then recover the watermark from observed play via a statistical test. They derive a bound on the resulting drop in expected utility and run experiments that bootstrap the method into several chess engines. The experiments are the strongest part: quality loss stays negligible and detection succeeds after only a handful of games, which directly addresses the cheating-detection use case mentioned in the abstract. That practical demonstration is new and worth having on record.

The utility bound is the softer spot. Because path probabilities are products of per-node choices, a fixed per-node bias can compound over the 40–60 move depths typical in chess. The abstract states that a bound exists, but without seeing an explicit contraction or martingale step that controls the multiplicative growth, it is not obvious the claimed tradeoff stays tight for realistic game lengths. The experiments suggest the authors chose small enough bias parameters to keep the effect small in practice, yet a reader would still want the formal argument tightened.

This paper is mainly for people working on AI accountability, game-theoretic detection, or watermarking outside text. Anyone already thinking about verifiable AI play in competitive settings will find the adaptation and the chess results useful. I would send it to peer review; the idea is concrete, the experiments are reproducible, and the remaining theoretical gap is fixable rather than load-bearing.

Referee Report

1 major / 0 minor

Summary. The paper adapts the KGW watermarking technique from LLMs to behavioral strategies in perfect-information extensive-form games. It claims that the resulting watermarked strategy profile remains detectable via a statistical test, that the degradation in expected utility can be bounded (with an explicit tradeoff to detectability), and that experiments bootstrapping the method to chess engines show negligible quality loss while permitting reliable detection from only a handful of games.

Significance. If the utility bound is valid, the work is significant as the first systematic treatment of watermarking game-playing agents, with direct relevance to anti-cheating mechanisms in online gaming platforms. The chess-engine experiments supply concrete, reproducible evidence that the approach is practically viable at non-trivial game depths.

major comments (1)

[theoretical analysis / utility bound] Abstract and theoretical analysis: the claimed bound on expected-utility degradation is stated to exist, yet the derivation appears to rely on per-information-set (singleton) bias without an explicit contraction, martingale, or total-variation argument that controls the multiplicative accumulation of bias along paths of length d. For chess-scale depths this leaves the worst-case gap formally uncontrolled, directly undermining the asserted tradeoff between detectability and quality.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful review and constructive feedback on our manuscript. We address the major comment regarding the theoretical utility bound below.


Referee: [theoretical analysis / utility bound] Abstract and theoretical analysis: the claimed bound on expected-utility degradation is stated to exist, yet the derivation appears to rely on per-information-set (singleton) bias without an explicit contraction, martingale, or total-variation argument that controls the multiplicative accumulation of bias along paths of length d. For chess-scale depths this leaves the worst-case gap formally uncontrolled, directly undermining the asserted tradeoff between detectability and quality.

Authors: We appreciate the referee's observation on the need for a more explicit control on bias accumulation. The manuscript derives the bound from the per-information-set bias (at most ε in total variation) and the linearity of expectation over the game tree, yielding a degradation of O(dε) for depth d. However, we agree that an explicit martingale or contraction argument would strengthen the formal presentation. We will revise the theoretical analysis section to include a telescoping argument on conditional expectations along paths, showing that the total variation between the original and watermarked path distributions is bounded by dε (rather than exponentially), which directly controls the worst-case utility gap. This makes the detectability-quality tradeoff fully rigorous and explicit. The chess experiments already demonstrate that small ε suffices for negligible practical degradation.

revision: yes
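The telescoping argument mentioned in the simulated rebuttal can be sketched as follows. This is a standard hybrid argument written under assumed notation, not the paper's proof: P and Q denote the distributions over terminal histories under the original and watermarked profiles, P_t(· | h) and Q_t(· | h) are the conditional action distributions after history h, every conditional pair is assumed to differ by at most ε in total variation, d is the maximum depth, and u is a terminal utility with range [u_min, u_max].

```latex
\begin{align*}
\mathrm{TV}(P, Q)
  &\le \sum_{t=1}^{d} \mathbb{E}_{h_{t-1} \sim P}\!\left[
        \mathrm{TV}\big(P_t(\cdot \mid h_{t-1}),\, Q_t(\cdot \mid h_{t-1})\big)\right]
   \le d\,\varepsilon, \\
\big|\mathbb{E}_{P}[u] - \mathbb{E}_{Q}[u]\big|
  &\le (u_{\max} - u_{\min})\,\mathrm{TV}(P, Q)
   \le (u_{\max} - u_{\min})\, d\,\varepsilon .
\end{align*}
```

The first line swaps one conditional at a time from P to Q, so the per-step errors add rather than multiply; games that end before depth d can be padded with a dummy action. Under these assumptions the gap grows linearly in d, which is the O(dε) rate quoted in the response.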

Circularity Check

0 steps flagged

No circularity: adaptation plus independent bound analysis


The paper adapts the external KGW LLM watermark to perfect-information extensive-form games and derives a new expected-utility bound on degradation. No equations reduce by construction to author-fitted parameters, no self-definitional steps appear, and the central tradeoff claim rests on standard game-theoretic and statistical arguments rather than self-citation chains or renamed known results. The derivation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work rests on the assumption that game strategies admit probabilistic modifications analogous to token sampling in language models, plus standard game-theoretic definitions of perfect-information extensive-form games.

free parameters (1)

watermark strength parameter
Controls the tradeoff between detection power and utility degradation; its value is chosen to achieve the reported negligible impact.
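To make the role of the strength parameter concrete, the sketch below computes, for one state, how much probability mass lands on green actions and how far the distribution moves in total variation as the bias δ grows. It assumes a KGW-style additive logit bias with a softmax, which may not match the paper's construction; the function names and the uniform example logits are hypothetical.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tradeoff(logits: list[float], green: list[bool], delta: float) -> tuple[float, float]:
    """Return (green probability mass, total-variation shift) after biasing green logits."""
    base = softmax(logits)
    marked = softmax([v + delta if g else v for v, g in zip(logits, green)])
    green_mass = sum(p for p, g in zip(marked, green) if g)
    tv = 0.5 * sum(abs(p - q) for p, q in zip(base, marked))
    return green_mass, tv


# Hypothetical state: four equally rated actions, two of them green.
for delta in (0.0, 0.5, 1.0, 2.0):
    mass, tv = tradeoff([0.0, 0.0, 0.0, 0.0], [True, True, False, False], delta)
    print(f"delta={delta}: green mass={mass:.3f}, TV shift={tv:.3f}")
```

Under this bias the total-variation shift equals the increase in green mass, so the quantity that drives detection and the per-state distortion rise together as δ increases.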

axioms (1)

domain assumption: The underlying game is a perfect-information extensive-form game
Explicitly stated in the title and abstract as the setting in which the watermark is applied.


