Watermarking Game-Playing Agents in Perfect-Information Extensive-Form Games
Pith reviewed 2026-05-15 02:32 UTC · model grok-4.3
The pith
Game strategies in perfect-information extensive-form games can be watermarked for statistical detection while bounding expected utility loss.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The KGW watermark can be adapted to perfect-information extensive-form games by modifying strategy profiles to embed a detectable pattern; the resulting watermarked profile admits a statistical detection test, and the loss in expected utility relative to the original profile is bounded, although detectability and quality trade off against each other.
What carries the argument
The adapted KGW watermark pattern embedded into the action probabilities of a strategy profile at information sets, which preserves the extensive-form structure while enabling statistical verification of the source.
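A minimal sketch of how a KGW-style bias could be applied to the action probabilities at a single decision node. It assumes the standard KGW recipe (a keyed hash of the context selects a "green" subset, whose logits are raised by a constant); the function names, the string encoding of the history, and the parameters `key`, `gamma`, and `delta` are illustrative assumptions, not the paper's actual construction.

```python
import hashlib
import math
import random


def green_actions(history, actions, key, gamma=0.5):
    """Pick a keyed pseudo-random 'green' subset of the legal actions.

    Assumption: the history is a string and the secret key is a string;
    the subset holds roughly a gamma fraction of the actions (at least one).
    """
    digest = hashlib.sha256((key + "|" + history).encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    shuffled = sorted(actions)  # sort first so the input order does not matter
    rng.shuffle(shuffled)
    k = max(1, int(gamma * len(shuffled)))
    return set(shuffled[:k])


def watermark_policy(policy, history, key, delta=1.0, gamma=0.5):
    """Soft reweighting: scale green-action probabilities by exp(delta), renormalize.

    Adding delta to a logit is the same as multiplying the probability by
    exp(delta) before renormalizing, so this mirrors a logit bias.
    """
    green = green_actions(history, list(policy), key, gamma)
    weights = {
        a: p * (math.exp(delta) if a in green else 1.0)
        for a, p in policy.items()
    }
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}
```

With `delta = 0` the policy is returned unchanged; larger values shift more probability mass onto the green actions, which is the knob behind the detectability-quality tradeoff described above.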
If this is right
- The watermark remains detectable by a statistical test applied to sequences of observed moves.
- Expected utility of the watermarked profile stays within a provable distance of the original profile's value.
- Increasing the strength of the watermark improves detection probability at the direct expense of strategy quality.
- The same procedure applied to chess engines produces negligible strength loss while allowing reliable identification after a few matches.
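The detection claim in the list above can be illustrated with a one-sided z-test on how often the observed moves fall in the keyed green subset. This is a hedged sketch: the observation format, the `green_actions` helper, and the parameters `key` and `gamma` are assumptions made for illustration, and the statistic is a generalization of the usual KGW count test to positions with varying numbers of legal moves, not necessarily the test used in the paper.

```python
import hashlib
import math
import random


def green_actions(history, actions, key, gamma=0.5):
    """Keyed pseudo-random 'green' subset of the legal actions (illustrative)."""
    digest = hashlib.sha256((key + "|" + history).encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    shuffled = sorted(actions)
    rng.shuffle(shuffled)
    k = max(1, int(gamma * len(shuffled)))
    return set(shuffled[:k])


def detection_z_score(observed, key, gamma=0.5):
    """Z-score for the number of observed moves that land in the green subset.

    observed: iterable of (history, legal_actions, chosen_action) triples.
    Under the null hypothesis (moves chosen independently of the key), each
    move is green with probability |green| / |legal|, so the green count is a
    sum of independent Bernoulli variables with known mean and variance.
    """
    hits = 0
    mean = 0.0
    var = 0.0
    for history, legal, chosen in observed:
        green = green_actions(history, legal, key, gamma)
        p = len(green) / len(legal)
        hits += 1 if chosen in green else 0
        mean += p
        var += p * (1.0 - p)
    if var == 0.0:
        return 0.0  # no informative positions (e.g. only forced moves)
    return (hits - mean) / math.sqrt(var)
```

A detector would flag the agent when the score exceeds a chosen threshold; the threshold fixes the false-positive rate, and the number of observed moves needed to cross it depends on the watermark strength.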
Where Pith is reading between the lines
- Platforms could use the method to flag unauthorized deployment of a particular AI agent across multiple accounts.
- The bounded-loss guarantee suggests the technique might serve as a building block for provenance tracking in other sequential decision domains.
- Optimizing the watermark strength parameter could be studied as a separate design problem to shift the observed tradeoff curve.
Load-bearing premise
Embedding the watermark pattern into a strategy profile leaves the underlying information sets and equilibrium properties intact.
What would settle it
An empirical run in which the measured expected-utility drop exceeds the paper's derived bound, or in which the statistical test fails to flag the watermark after the claimed number of games, would disprove the main claims.
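The first of these checks can be sketched on a toy game tree. The example below computes the exact expected utility before and after perturbing every decision node by at most `eps` in total variation, and compares the gap with a linear-in-depth bound. Everything here is an assumption for illustration: the random tree, the perturbation (a stand-in for the watermark), and the constant `2 * depth * eps`, which holds for utilities in [-1, 1] but is not taken from the paper.

```python
import random


def build_tree(depth, branching, rng):
    """Random tree: leaves hold a utility in [-1, 1], internal nodes a policy."""
    if depth == 0:
        return {"utility": rng.uniform(-1.0, 1.0)}
    raw = [rng.random() + 0.1 for _ in range(branching)]
    total = sum(raw)
    return {
        "policy": [w / total for w in raw],
        "children": [build_tree(depth - 1, branching, rng) for _ in range(branching)],
    }


def perturb(policy, eps, rng):
    """Move at most eps of probability mass from one action to another.

    The result differs from the input by at most eps in total variation.
    Requires at least two actions.
    """
    src, dst = rng.sample(range(len(policy)), 2)
    shift = min(eps, policy[src])
    out = list(policy)
    out[src] -= shift
    out[dst] += shift
    return out


def expected_utility(node, eps=0.0, rng=None):
    """Exact expected leaf utility; with eps > 0 every node's policy is perturbed."""
    if "utility" in node:
        return node["utility"]
    policy = perturb(node["policy"], eps, rng) if eps > 0 else node["policy"]
    return sum(
        p * expected_utility(child, eps, rng)
        for p, child in zip(policy, node["children"])
    )


def utility_gap_within_bound(depth=4, branching=3, eps=0.05, seed=0):
    """Return (gap, bound) for one random tree and one random perturbation."""
    tree = build_tree(depth, branching, random.Random(seed))
    base = expected_utility(tree)
    marked = expected_utility(tree, eps, random.Random(seed + 1))
    return abs(base - marked), 2.0 * depth * eps
```

On such a toy tree the gap can never exceed the bound; a real test of the paper's claim would replace the random perturbation with the actual watermark and the toy tree with engine play.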
Original abstract
Watermarking techniques for large language models (LLMs), which encode hidden information in the output so its source can be verified, have gained significant attention in recent days, thanks to their potential capability to detect accidental or deliberate misuse. Similar challenges involving model misuse also exist in the context of game-playing, such as when detecting the unauthorized use of AI tools in gaming platforms (e.g., cheating in online chess). In this paper, we initiate the study of how game-playing strategies can be watermarked. We show how the KGW watermark for LLMs can be adapted to watermark game-playing agents in perfect-information extensive-form games. The watermark can then be detected using a statistical test. We show that the degradation in the quality of the watermarked strategy profile, quantified by the expected utility, can be bounded, but there is a tradeoff between detectability and quality. In our experiments, we bootstrap the watermarking framework to various chess engines and demonstrate that a) the impact of the watermark on the quality of the strategy is negligible and b) the watermark can be detected with just a handful of games.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper adapts the KGW watermarking technique from LLMs to behavioral strategies in perfect-information extensive-form games. It claims that the resulting watermarked strategy profile remains detectable via a statistical test, that the degradation in expected utility can be bounded (with an explicit tradeoff to detectability), and that experiments bootstrapping the method to chess engines show negligible quality loss while permitting reliable detection from only a handful of games.
Significance. If the utility bound is valid, the work is significant as the first systematic treatment of watermarking game-playing agents, with direct relevance to anti-cheating mechanisms in online gaming platforms. The chess-engine experiments supply concrete, reproducible evidence that the approach is practically viable at non-trivial game depths.
major comments (1)
- [theoretical analysis / utility bound] Abstract and theoretical analysis: the claimed bound on expected-utility degradation is stated to exist, yet the derivation appears to rely on per-information-set (singleton) bias without an explicit contraction, martingale, or total-variation argument that controls the multiplicative accumulation of bias along paths of length d. For chess-scale depths this leaves the worst-case gap formally uncontrolled, directly undermining the asserted tradeoff between detectability and quality.
Simulated Author's Rebuttal
We thank the referee for their careful review and constructive feedback on our manuscript. We address the major comment regarding the theoretical utility bound below.
Point-by-point responses
- Referee: [theoretical analysis / utility bound] Abstract and theoretical analysis: the claimed bound on expected-utility degradation is stated to exist, yet the derivation appears to rely on per-information-set (singleton) bias without an explicit contraction, martingale, or total-variation argument that controls the multiplicative accumulation of bias along paths of length d. For chess-scale depths this leaves the worst-case gap formally uncontrolled, directly undermining the asserted tradeoff between detectability and quality.
Authors: We appreciate the referee's observation on the need for more explicit control of bias accumulation. The manuscript derives the bound from the per-information-set bias (at most ε in total variation) and the linearity of expectation over the game tree, yielding a degradation of O(dε) for depth d. However, we agree that an explicit martingale or contraction argument would strengthen the formal presentation. We will revise the theoretical analysis section to include a telescoping argument on conditional expectations along paths, showing that the total variation between the original and watermarked path distributions is bounded by dε, so it grows linearly rather than exponentially in d, which directly controls the worst-case utility gap. This makes the detectability-quality tradeoff fully rigorous and explicit. The chess experiments already demonstrate that a small ε suffices for negligible practical degradation.
Revision: yes
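The telescoping argument the authors describe can be written out as follows. This is a sketch under stated assumptions, not the manuscript's proof: σ and σ̃ denote the original and watermarked profiles, P_t and P̃_t the induced distributions over histories of length t (shorter terminal histories are padded), the per-history total-variation bias is at most ε, and terminal utilities lie in [u_min, u_max].

```latex
% Assumption: TV(\sigma(\cdot \mid h), \tilde\sigma(\cdot \mid h)) \le \varepsilon
% at every history h, and P_0 = \tilde P_0 (both profiles start at the same root).
\begin{align*}
\mathrm{TV}(P_{t+1}, \tilde P_{t+1})
  &\le \mathrm{TV}(P_t, \tilde P_t)
     + \mathbb{E}_{h \sim P_t}\!\left[
         \mathrm{TV}\big(\sigma(\cdot \mid h), \tilde\sigma(\cdot \mid h)\big)
       \right]
   \le \mathrm{TV}(P_t, \tilde P_t) + \varepsilon, \\
\mathrm{TV}(P_d, \tilde P_d)
  &\le \sum_{t=0}^{d-1} \varepsilon = d\varepsilon, \\
\big| u(\tilde\sigma) - u(\sigma) \big|
  &\le (u_{\max} - u_{\min}) \, \mathrm{TV}(P_d, \tilde P_d)
   \le (u_{\max} - u_{\min}) \, d\varepsilon .
\end{align*}
```

The first line is the standard chain-rule inequality for total variation of joint distributions; summing it over the d steps gives additive rather than multiplicative accumulation, which is the point the referee asked to see made explicit.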
Circularity Check
No circularity: adaptation plus independent bound analysis
The paper adapts the external KGW LLM watermark to perfect-information extensive-form games and derives a new expected-utility bound on degradation. No equations reduce by construction to author-fitted parameters, no self-definitional steps appear, and the central tradeoff claim rests on standard game-theoretic and statistical arguments rather than self-citation chains or renamed known results. The derivation is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- watermark strength parameter
axioms (1)
- domain assumption: the underlying game is a perfect-information extensive-form game