pith. machine review for the scientific record.

arxiv: 2605.14283 · v1 · submitted 2026-05-14 · cs.GT · cs.AI · cs.CR


Watermarking Game-Playing Agents in Perfect-Information Extensive-Form Games


Pith reviewed 2026-05-15 02:32 UTC · model grok-4.3

classification cs.GT · cs.AI · cs.CR
keywords watermarking · extensive-form games · strategy profiles · statistical detection · perfect information · chess engines · AI agents · expected utility

The pith

Game strategies in perfect-information extensive-form games can be watermarked for statistical detection while bounding expected utility loss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper adapts the KGW watermarking method from language models to embed hidden patterns directly into strategy profiles of game-playing agents. This allows a statistical test to verify the origin of the agent from observed play. The central result establishes that any quality degradation, measured by expected utility, remains bounded even after the embedding step. Experiments on multiple chess engines show the quality impact stays negligible while detection succeeds from only a handful of games. The work addresses misuse risks such as unauthorized AI use on gaming platforms by providing a verifiable signature without destroying equilibrium play.

Core claim

The KGW watermark can be adapted to perfect-information extensive-form games by modifying strategy profiles to embed a detectable pattern; the resulting watermarked profile admits a statistical detection test, and the loss in expected utility relative to the original profile is bounded, although detectability and quality trade off against each other.

What carries the argument

The adapted KGW watermark pattern embedded into the action probabilities of a strategy profile at information sets, which preserves the extensive-form structure while enabling statistical verification of the source.
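A minimal sketch of how a KGW-style bias could be applied to the action probabilities at a single decision node. It is an illustration, not the paper's algorithm: the function names, the SHA-256 seeding from a state string, the secret key, and the toy probabilities are all assumptions. In KGW terms, `gamma` is the green-list fraction and `delta` is the bias added to green log-probabilities.

```python
import hashlib
import math
import random


def green_actions(state_key: str, actions: list[str], secret: str, gamma: float) -> set[str]:
    """Pick roughly a gamma fraction of the legal actions, seeded by (secret, state)."""
    # Assumption: the partition is seeded by hashing a secret key with a state string.
    digest = hashlib.sha256(f"{secret}|{state_key}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ordered = sorted(actions)  # canonical order so a detector can rebuild the same split
    rng.shuffle(ordered)
    k = min(len(ordered), max(1, round(gamma * len(ordered))))
    return set(ordered[:k])


def watermark_policy(policy: dict[str, float], green: set[str], delta: float) -> dict[str, float]:
    """Add delta to the log-probability of each green action, then renormalize."""
    weights = {a: p * math.exp(delta) if a in green else p for a, p in policy.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


# Toy example: three candidate opening moves with made-up probabilities.
policy = {"e2e4": 0.5, "d2d4": 0.3, "c2c4": 0.2}
green = green_actions("startpos", list(policy), secret="demo-key", gamma=0.5)
marked = watermark_policy(policy, green, delta=1.0)
```

Because the partition depends only on the key and the state, anyone holding the key can recompute which actions were green at each observed position, while the set of legal actions and the game tree itself are left unchanged.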

If this is right

  • The watermark remains detectable by a statistical test applied to sequences of observed moves.
  • Expected utility of the watermarked profile stays within a provable distance of the original profile's value.
  • Increasing the strength of the watermark improves detection probability at the direct expense of strategy quality.
  • The same procedure applied to chess engines produces negligible strength loss while allowing reliable identification after a few matches.
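The detection step in the first bullet can be sketched with the one-proportion z-statistic used by the original KGW watermark for text. The sketch assumes each observed move has already been labeled green or not, and that an unwatermarked player hits the green list with probability `gamma` per move. The function names and the threshold of 4.0 are illustrative choices, not values taken from the paper.

```python
import math


def kgw_z_score(green_hits: int, total_moves: int, gamma: float) -> float:
    """z = (g - gamma*T) / sqrt(T*gamma*(1-gamma)) for g green moves out of T."""
    if total_moves <= 0 or not 0.0 < gamma < 1.0:
        raise ValueError("need total_moves > 0 and 0 < gamma < 1")
    expected = gamma * total_moves
    std = math.sqrt(total_moves * gamma * (1.0 - gamma))
    return (green_hits - expected) / std


def one_sided_p_value(z: float) -> float:
    """Upper-tail probability of a standard normal at z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def is_watermarked(green_hits: int, total_moves: int, gamma: float,
                   z_threshold: float = 4.0) -> bool:
    """Flag the sample when the z-score exceeds an (assumed) threshold."""
    return kgw_z_score(green_hits, total_moves, gamma) > z_threshold


# Toy numbers: 90 green moves out of 100 observed moves with gamma = 0.5.
z = kgw_z_score(90, 100, 0.5)
flagged = is_watermarked(90, 100, 0.5)
```

The z-score grows roughly with the square root of the number of observed moves, which is consistent with the claim that a modest number of games can suffice when the bias is strong enough.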

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Platforms could use the method to flag unauthorized deployment of a particular AI agent across multiple accounts.
  • The bounded-loss guarantee suggests the technique might serve as a building block for provenance tracking in other sequential decision domains.
  • Optimizing the watermark strength parameter could be studied as a separate design problem to shift the observed tradeoff curve.

Load-bearing premise

Embedding the watermark pattern into a strategy profile leaves the underlying information sets and equilibrium properties intact.

What would settle it

An empirical run in which the measured expected-utility drop exceeds the paper's derived bound, or in which the statistical test fails to flag the watermark after the claimed number of games, would disprove the main claims.

Figures

Figures reproduced from arXiv: 2605.14283 by Fei Fang, Juho Kim, Tuomas Sandholm.

Figure 1: On the left and in the middle, respectively, the z-scores over the number of rounds and …
Figure 2: z-scores for watermarked Stockfish 17.1. The left and right plots ablate on γ and δ, respectively.
[Plot: ROC curves for watermark detection; x-axis: false positive rate, y-axis: true positive rate.]
Original abstract

Watermarking techniques for large language models (LLMs), which encode hidden information in the output so its source can be verified, have gained significant attention in recent days, thanks to their potential capability to detect accidental or deliberate misuse. Similar challenges involving model misuse also exist in the context of game-playing, such as when detecting the unauthorized use of AI tools in gaming platforms (e.g., cheating in online chess). In this paper, we initiate the study of how game-playing strategies can be watermarked. We show how the KGW watermark for LLMs can be adapted to watermark game-playing agents in perfect-information extensive-form games. The watermark can then be detected using a statistical test. We show that the degradation in the quality of the watermarked strategy profile, quantified by the expected utility, can be bounded, but there is a tradeoff between detectability and quality. In our experiments, we bootstrap the watermarking framework to various chess engines and demonstrate that a) the impact of the watermark on the quality of the strategy is negligible and b) the watermark can be detected with just a handful of games.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper adapts the KGW watermarking technique from LLMs to behavioral strategies in perfect-information extensive-form games. It claims that the resulting watermarked strategy profile remains detectable via a statistical test, that the degradation in expected utility can be bounded (with an explicit tradeoff to detectability), and that experiments bootstrapping the method to chess engines show negligible quality loss while permitting reliable detection from only a handful of games.

Significance. If the utility bound is valid, the work is significant as the first systematic treatment of watermarking game-playing agents, with direct relevance to anti-cheating mechanisms in online gaming platforms. The chess-engine experiments supply concrete, reproducible evidence that the approach is practically viable at non-trivial game depths.

major comments (1)
  1. [theoretical analysis / utility bound] Abstract and theoretical analysis: the claimed bound on expected-utility degradation is stated to exist, yet the derivation appears to rely on per-information-set (singleton) bias without an explicit contraction, martingale, or total-variation argument that controls the multiplicative accumulation of bias along paths of length d. For chess-scale depths this leaves the worst-case gap formally uncontrolled, directly undermining the asserted tradeoff between detectability and quality.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful review and constructive feedback on our manuscript. We address the major comment regarding the theoretical utility bound below.

  1. Referee: [theoretical analysis / utility bound] Abstract and theoretical analysis: the claimed bound on expected-utility degradation is stated to exist, yet the derivation appears to rely on per-information-set (singleton) bias without an explicit contraction, martingale, or total-variation argument that controls the multiplicative accumulation of bias along paths of length d. For chess-scale depths this leaves the worst-case gap formally uncontrolled, directly undermining the asserted tradeoff between detectability and quality.

    Authors: We appreciate the referee's observation on the need for a more explicit control on bias accumulation. The manuscript derives the bound from the per-information-set bias (at most ε in total variation) and the linearity of expectation over the game tree, yielding a degradation of O(dε) for depth d. However, we agree that an explicit martingale or contraction argument would strengthen the formal presentation. We will revise the theoretical analysis section to include a telescoping argument on conditional expectations along paths, showing that the total variation between the original and watermarked path distributions is bounded by dε, which grows linearly rather than exponentially in d and directly controls the worst-case utility gap. This makes the detectability-quality tradeoff fully rigorous and explicit. The chess experiments already demonstrate that small ε suffices for negligible practical degradation.

    revision: yes
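One standard way such a linear bound can be obtained is a hybrid (telescoping) argument, sketched below. This is not claimed to be the paper's proof. The notation is assumed for illustration: σ and σ' are the original and watermarked profiles, P_σ and P_σ' the induced distributions over root-to-leaf paths of length at most d, H_k the intermediate hybrid distributions, and u a utility bounded in [u_min, u_max].

```latex
\begin{align*}
&\text{Assumption: } \mathrm{TV}\bigl(\sigma(\cdot \mid h),\, \sigma'(\cdot \mid h)\bigr) \le \varepsilon
  \quad \text{for every decision node } h. \\
&\text{Hybrids: } H_k = \text{path distribution when the first } k \text{ decisions follow } \sigma'
  \text{ and the rest follow } \sigma, \\
&\qquad H_0 = P_{\sigma}, \qquad H_d = P_{\sigma'}. \\
&\mathrm{TV}(H_{k-1}, H_k) \le \varepsilon \quad (k = 1, \dots, d),
  \text{ since the two differ only in the } k\text{-th decision rule.} \\
&\mathrm{TV}(P_{\sigma}, P_{\sigma'}) \le \sum_{k=1}^{d} \mathrm{TV}(H_{k-1}, H_k) \le d\,\varepsilon
  \quad \text{(triangle inequality).} \\
&\bigl|\, \mathbb{E}_{P_{\sigma}}[u] - \mathbb{E}_{P_{\sigma'}}[u] \,\bigr|
  \le (u_{\max} - u_{\min})\, \mathrm{TV}(P_{\sigma}, P_{\sigma'})
  \le (u_{\max} - u_{\min})\, d\, \varepsilon.
\end{align*}
```

The per-step errors add rather than multiply because each hybrid step changes a single decision rule while keeping the prefix distribution and the continuation fixed.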

Circularity Check

0 steps flagged

No circularity: adaptation plus independent bound analysis


The paper adapts the external KGW LLM watermark to perfect-information extensive-form games and derives a new expected-utility bound on degradation. No equations reduce by construction to author-fitted parameters, no self-definitional steps appear, and the central tradeoff claim rests on standard game-theoretic and statistical arguments rather than self-citation chains or renamed known results. The derivation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work rests on the assumption that game strategies admit probabilistic modifications analogous to token sampling in language models, plus standard game-theoretic definitions of perfect-information extensive-form games.

free parameters (1)
  • watermark strength parameter
    Controls the tradeoff between detection power and utility degradation; its value is chosen to achieve the reported negligible impact.
axioms (1)
  • domain assumption: The underlying game is a perfect-information extensive-form game
    Explicitly stated in the title and abstract as the setting in which the watermark is applied.
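To make the role of the strength parameter concrete, the sketch below assumes it behaves like KGW's logit bias δ: green actions have their log-probability raised by δ before renormalizing. Under that assumption, and for δ ≥ 0, the green probability mass after the tilt has a closed form, and the per-node total-variation distance from the original distribution equals the increase in green mass. The function names and sample values are illustrative.

```python
import math


def tilted_green_mass(green_mass: float, delta: float) -> float:
    """Green probability mass after multiplying green weights by exp(delta)."""
    boosted = green_mass * math.exp(delta)
    return boosted / (boosted + (1.0 - green_mass))


def per_node_tv(green_mass: float, delta: float) -> float:
    """Total-variation distance between original and tilted distributions (delta >= 0)."""
    # All green actions scale up by one common factor and all others scale down
    # by one common factor, so the distance is just the mass moved onto green.
    return tilted_green_mass(green_mass, delta) - green_mass


# Larger delta moves more mass onto green actions (easier detection) and
# increases the per-node distance from the original policy (more distortion).
for delta in (0.0, 0.5, 1.0, 2.0):
    mass = tilted_green_mass(0.5, delta)
    tv = per_node_tv(0.5, delta)
    print(f"delta={delta:.1f}  green mass={mass:.3f}  per-node TV={tv:.3f}")
```

Reading the two columns together shows the tradeoff in miniature: the same quantity that drives the detector's signal is the one that a utility bound would need to control.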

pith-pipeline@v0.9.0 · 5497 in / 1188 out tokens · 29009 ms · 2026-05-15T02:32:12.877885+00:00 · methodology

