JSON-Bag: A generic game trajectory representation
Pith reviewed 2026-05-21 23:10 UTC · model grok-4.3
The pith
Tokenizing JSON game state descriptions into bags and comparing them with Jensen-Shannon distance classifies agents and game settings more accurately than hand-crafted features across six tabletop games.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
JSON-Bag converts JSON descriptions of game states into unordered token collections and applies Jensen-Shannon distance to compare entire trajectories. When used inside a prototype-based nearest-neighbor classifier, the representation outperforms a hand-crafted feature baseline on the majority of agent, parameter, and seed classification tasks across the six games. The same prototypes also prove sample-efficient in N-shot settings. Treating the individual tokens as features for a Random Forest classifier further raises accuracy on tasks where the bag model alone underperformed. Finally, the Jensen-Shannon distances between agent-class prototypes correlate strongly with measured distances between the agents' policies.
What carries the argument
The JSON-Bag representation, which tokenizes JSON game-state strings into frequency bags and uses Jensen-Shannon distance to compare those bags without regard to token order or sequence structure.
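As a minimal sketch of the mechanism described above, the Python below flattens each JSON state into `path=value` tokens, pools them into one bag per trajectory, and compares two bags with the Jensen-Shannon distance. The tokenization rule and the names `tokenize`, `trajectory_bag`, and `js_distance` are assumptions made for illustration; the paper's actual tokenizer is not described on this page.

```python
import json
import math
from collections import Counter

def tokenize(state, prefix=""):
    """Yield one string token per leaf of a parsed JSON value (assumed scheme)."""
    if isinstance(state, dict):
        for key, value in state.items():
            yield from tokenize(value, f"{prefix}{key}.")
    elif isinstance(state, list):
        for item in state:
            yield from tokenize(item, prefix)  # list positions are dropped
    else:
        yield f"{prefix.rstrip('.')}={json.dumps(state)}"

def trajectory_bag(json_states):
    """Count tokens over every JSON state string in one trajectory."""
    bag = Counter()
    for text in json_states:
        bag.update(tokenize(json.loads(text)))
    return bag

def js_distance(bag_p, bag_q):
    """Jensen-Shannon distance with base-2 logs, so the result lies in [0, 1].

    Both bags must be non-empty; counts are normalized to frequencies first.
    """
    total_p, total_q = sum(bag_p.values()), sum(bag_q.values())
    divergence = 0.0
    for token in set(bag_p) | set(bag_q):
        p = bag_p.get(token, 0) / total_p
        q = bag_q.get(token, 0) / total_q
        m = (p + q) / 2
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return math.sqrt(max(divergence, 0.0))

# Hypothetical two-state and one-state trajectories, for illustration only.
game_a = trajectory_bag(['{"board": [0, 1], "turn": 1}', '{"board": [1, 1], "turn": 2}'])
game_b = trajectory_bag(['{"board": [0, 0], "turn": 1}'])
print(js_distance(game_a, game_b))
```

Because the distance works on normalized frequencies, trajectories of different lengths can be compared directly; the square root of the divergence is what makes it a metric.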
If this is right
- The bag representation classifies which agent generated a trajectory more accurately than hand-crafted features in most of the tested tasks.
- Prototype vectors built from JSON-Bag allow sample-efficient N-shot classification of trajectory classes.
- Individual tokens inside the bags can be fed directly to a Random Forest model to raise accuracy on tasks where the pure bag approach lagged.
- Jensen-Shannon distance between agent prototypes tracks the distance between the agents' underlying policies across all six games.
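The prototype-based classification in the bullets above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: a class prototype is taken to be the pooled token counts of that class's training trajectories, and `build_prototypes` and `classify` are hypothetical names. An N-shot setting corresponds to passing only N labelled bags per class.

```python
import math
from collections import Counter

def js_distance(bag_p, bag_q):
    """Jensen-Shannon distance between two non-empty token-count bags."""
    total_p, total_q = sum(bag_p.values()), sum(bag_q.values())
    divergence = 0.0
    for token in set(bag_p) | set(bag_q):
        p = bag_p.get(token, 0) / total_p
        q = bag_q.get(token, 0) / total_q
        m = (p + q) / 2
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return math.sqrt(max(divergence, 0.0))

def build_prototypes(labelled_bags):
    """Pool token counts per class label (pooling is an assumed prototype rule)."""
    prototypes = {}
    for label, bag in labelled_bags:
        prototypes.setdefault(label, Counter()).update(bag)
    return prototypes

def classify(bag, prototypes):
    """Return the label whose prototype is nearest in Jensen-Shannon distance."""
    return min(prototypes, key=lambda label: js_distance(bag, prototypes[label]))

# Toy labelled bags with made-up agent names and tokens.
training = [
    ("random", Counter({"move=a": 3, "move=b": 3})),
    ("greedy", Counter({"move=a": 6})),
    ("greedy", Counter({"move=a": 5, "move=b": 1})),
]
prototypes = build_prototypes(training)
print(classify(Counter({"move=a": 4}), prototypes))
```

Classification then costs one distance computation per class rather than one per training trajectory, which is the practical appeal of prototypes over a plain nearest-neighbor search.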
Where Pith is reading between the lines
- The same token-bag approach could be tried on any domain that already produces structured JSON logs, such as robot control traces or simulation outputs, without new feature design.
- Because the method compares trajectories rather than policies directly, it offers a route to rank or cluster black-box agents solely from observed play data.
- The success of the unordered bag model suggests that many trajectory discrimination tasks may not require recurrent or graph-based sequence modeling at the first analysis stage.
Load-bearing premise
That the bag of tokens extracted from JSON state descriptions still carries enough information about game dynamics and player behavior for classification accuracy and policy-distance correlation to remain meaningful.
What would settle it
If JSON-Bag with Jensen-Shannon distance produced lower classification accuracy than the hand-crafted baseline on a majority of new games or tasks, or if the correlation between prototype distances and actual policy distances fell near zero, the central claims would be falsified.
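Both falsification checks reduce to simple arithmetic, sketched below. The page does not say which correlation coefficient the paper uses, so Pearson's r is an assumption here, and `pearson` and `fraction_of_tasks_won` are hypothetical helper names. All numbers in the usage lines are made-up placeholders, not results from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def fraction_of_tasks_won(method_acc, baseline_acc):
    """Share of tasks on which the method's accuracy strictly beats the baseline."""
    wins = sum(1 for m, b in zip(method_acc, baseline_acc) if m > b)
    return wins / len(method_acc)

# Placeholder values: one entry per pair of agent classes.
prototype_jsd = [0.10, 0.35, 0.60, 0.80]
policy_distance = [0.05, 0.30, 0.55, 0.90]
print(pearson(prototype_jsd, policy_distance))

# Placeholder accuracies: one entry per classification task.
print(fraction_of_tasks_won([0.9, 0.7, 0.8], [0.8, 0.75, 0.6]))
```

Under this reading, the claims would be in trouble if the first number sat near zero or the second fell below one half on new games.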
Original abstract
We introduce JSON Bag-of-Tokens model (JSON-Bag) as a method to generically represent game trajectories by tokenizing their JSON descriptions and apply Jensen-Shannon distance (JSD) as distance metric for them. Using a prototype-based nearest-neighbor search (P-NNS), we evaluate the validity of JSON-Bag with JSD on six tabletop games: 7 Wonders, Dominion, Sea Salt and Paper, Can't Stop, Connect4, Dots and boxes; each over three game trajectory classification tasks: classifying the playing agents, game parameters, or game seeds that were used to generate the trajectories. Our approach outperforms a baseline using hand-crafted features in the majority of tasks. Evaluating on N-shot classification suggests using JSON-Bag prototype to represent game trajectory classes is also sample efficient. Additionally, we demonstrate JSON-Bag ability for automatic feature extraction by treating tokens as individual features to be used in Random Forest to solve the tasks above, which significantly improves accuracy on underperforming tasks. Finally, we show that, across all six games, the JSD between JSON-Bag prototypes of agent classes highly correlates with the distances between agents' policies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces JSON-Bag, a generic representation for game trajectories obtained by tokenizing JSON state descriptions into bags of tokens and employing Jensen-Shannon distance (JSD) as a metric. Through prototype-based nearest-neighbor search (P-NNS), it evaluates this on six tabletop games for three classification tasks (agent identity, game parameters, game seeds), reporting outperformance versus a hand-crafted feature baseline in the majority of cases, sample efficiency in N-shot settings, further gains when using tokens as features in Random Forest classifiers, and a high correlation between JSD of agent-class prototypes and distances between the agents' policies.
Significance. If the empirical results hold under more rigorous statistical scrutiny, JSON-Bag provides a simple, domain-agnostic trajectory representation that sidesteps manual feature engineering, which could be useful for agent comparison, behavior clustering, and policy analysis in game AI and imitation learning. The multi-game, multi-task evaluation and the reported JSD–policy-distance correlation are positive elements; the manuscript also benefits from using concrete, reproducible games rather than abstract claims.
major comments (2)
- [§4] §4 (Experimental results): The accuracy tables and figures report point estimates for JSON-Bag versus the hand-crafted baseline without error bars, standard deviations across runs, or any statistical significance tests; this directly affects the central claim of outperformance 'in the majority of tasks' and the N-shot results, as variability due to seeds or sampling cannot be assessed.
- [§3] §3 (JSON-Bag construction): The method extracts tokens from successive JSON states but collapses them into an unordered multiset before applying JSD; no ablation compares this bag representation against order-preserving or structure-aware alternatives (e.g., sequential models or tree kernels on the JSON), leaving untested whether marginal token frequencies alone suffice to capture policy differences in path-dependent games such as Connect4 and Dominion.
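The order-blindness raised in the second major comment can be made concrete with a toy example. In the sketch below, two token sequences that visit the same states in opposite order produce identical unigram bags, while a bag of adjacent token pairs tells them apart. The bigram bag is only a hypothetical illustration of an order-aware alternative; it is not something the paper or the referee proposes, and the tokens are invented.

```python
from collections import Counter

def unigram_bag(tokens):
    """Unordered multiset of tokens, as in a bag-of-tokens model."""
    return Counter(tokens)

def bigram_bag(tokens):
    """Multiset of adjacent token pairs, which retains local ordering."""
    return Counter(zip(tokens, tokens[1:]))

# Invented move tokens for a path-dependent game.
forward = ["col=1", "col=2", "col=3"]
backward = list(reversed(forward))

print(unigram_bag(forward) == unigram_bag(backward))  # the bags coincide
print(bigram_bag(forward) == bigram_bag(backward))    # the pair bags differ
```

Any distance computed on the unigram bags, Jensen-Shannon included, is therefore zero for this pair, which is why an ablation against order-preserving representations would be informative.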
minor comments (2)
- [Abstract] Abstract and §2: The phrasing 'JSON Bag-of-Tokens model (JSON-Bag)' is slightly inconsistent; a single defined term would improve readability.
- [§4.3] §4.3 (Random Forest experiments): The description of how individual tokens are turned into features for the classifier lacks detail on vocabulary size, handling of rare tokens, or cross-validation protocol.
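The details requested in the last minor comment correspond to a few concrete choices, sketched below under assumptions. The rare-token threshold `min_count` and the helper names `build_vocabulary` and `to_feature_rows` are hypothetical; the page does not report how the paper handles vocabulary size or rare tokens.

```python
from collections import Counter

def build_vocabulary(bags, min_count=2):
    """Keep tokens whose total count across all bags reaches min_count."""
    totals = Counter()
    for bag in bags:
        totals.update(bag)
    return sorted(token for token, count in totals.items() if count >= min_count)

def to_feature_rows(bags, vocabulary):
    """Turn each bag into a fixed-length count vector, one column per token."""
    return [[bag.get(token, 0) for token in vocabulary] for bag in bags]

# Toy bags with invented tokens.
bags = [Counter({"a": 2, "b": 1}), Counter({"a": 1, "c": 1})]
vocabulary = build_vocabulary(bags, min_count=1)
rows = to_feature_rows(bags, vocabulary)
print(len(vocabulary), rows)
```

The resulting rows, paired with class labels, are the kind of input a tree ensemble such as scikit-learn's `RandomForestClassifier` accepts through `fit(X, y)`. To avoid leakage, the vocabulary would need to be built from the training folds only under whatever cross-validation protocol is used.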
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on experimental reporting and the design assumptions of JSON-Bag. We address each major comment below and describe the changes we will make to the manuscript.
- Referee: [§4] §4 (Experimental results): The accuracy tables and figures report point estimates for JSON-Bag versus the hand-crafted baseline without error bars, standard deviations across runs, or any statistical significance tests; this directly affects the central claim of outperformance 'in the majority of tasks' and the N-shot results, as variability due to seeds or sampling cannot be assessed.
  Authors: We agree that reporting only point estimates limits the strength of the outperformance claims. In the revised version we will rerun all experiments with multiple random seeds for prototype selection, N-shot sampling, and classifier training. We will report mean accuracies together with standard deviations in the tables, add error bars to the figures, and include statistical significance tests (e.g., paired Wilcoxon tests) for the main comparisons.
  Revision: yes
- Referee: [§3] §3 (JSON-Bag construction): The method extracts tokens from successive JSON states but collapses them into an unordered multiset before applying JSD; no ablation compares this bag representation against order-preserving or structure-aware alternatives (e.g., sequential models or tree kernels on the JSON), leaving untested whether marginal token frequencies alone suffice to capture policy differences in path-dependent games such as Connect4 and Dominion.
  Authors: The unordered bag-of-tokens is an intentional design choice that prioritizes simplicity and domain-agnosticism. The strong correlation we report between JSD of agent prototypes and actual policy distances (across all six games, including Connect4 and Dominion) indicates that state-visit frequencies already capture policy-relevant differences. Nevertheless, we acknowledge that an explicit ablation against sequential or tree-based alternatives is absent. We will add a dedicated paragraph in the discussion section that motivates the bag representation, cites the policy-distance correlation as supporting evidence, and explicitly lists the lack of order-preserving ablations as a limitation for future work.
  Revision: partial
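The reporting promised in the first response (means, standard deviations, and a paired significance test) can be sketched without external dependencies. The code below uses an exact paired sign-flip permutation test as a stand-in for the Wilcoxon test the authors mention; it is a different test, chosen here only because it fits in a few lines. The function names are hypothetical and the scores are made-up placeholders.

```python
import itertools
import statistics

def summarize(accuracies):
    """Return (mean, sample standard deviation) over repeated runs."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)

def paired_sign_flip_p_value(scores_a, scores_b):
    """Exact two-sided p-value for the summed paired difference.

    Enumerates all 2**n sign assignments, so it is only practical for small n.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    hits = total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
    return hits / total

# Placeholder per-seed accuracies for two methods on one task.
json_bag_runs = [0.91, 0.88, 0.93, 0.90]
baseline_runs = [0.85, 0.86, 0.84, 0.87]
print(summarize(json_bag_runs))
print(paired_sign_flip_p_value(json_bag_runs, baseline_runs))
```

With SciPy available, `scipy.stats.wilcoxon(x, y)` provides the signed-rank test itself. Either way, the pairing should match runs that share the same seed and data split.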
Circularity Check
No circularity: empirical method with external validation
The paper introduces JSON-Bag as a tokenization-based representation for game trajectories and validates it through direct empirical tasks: prototype-based nearest-neighbor classification of agents/parameters/seeds, N-shot sample efficiency, Random Forest feature extraction, and JSD correlation with policy distances across six games. All reported results are obtained by applying the representation to held-out trajectory data and comparing against an independent hand-crafted baseline; no equations, fitted parameters, or self-citations are used to derive the accuracies or correlations from the inputs themselves. The derivation chain consists solely of data processing followed by standard distance and classifier evaluation, remaining fully self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: JSON descriptions of game states contain the information necessary to distinguish agents, parameters, and seeds via token frequencies alone
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Paper passage: We interpret JSON-Bag as a probabilistic model of game trajectories and use the Jensen-Shannon distance (JSD) to measure similarity between them.
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Paper passage: JSON-Bag: A method to generically represent game trajectories using only the JSON descriptions of individual game states for tokenization.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.