
arxiv: 2606.01976 · v1 · pith:X5ZPZUQ2new · submitted 2026-06-01 · 💻 cs.HC

AutoBG: A Board Game Design Assistant with Interactive Ideation, Iterative Rulebook Generation, and Individualized Feedback

Pith reviewed 2026-06-28 13:00 UTC · model grok-4.3

classification 💻 cs.HC
keywords: board game design · AI design assistant · rulebook generation · iterative refinement · critic module · human-AI collaboration · user study · persona simulation

The pith

AutoBG uses critic-driven loops to generate board game rulebooks approaching published quality from initial ideas.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AutoBG as an integrated assistant that supports the full board game design workflow, from vague ideation through rulebook drafting and revision to simulated audience feedback. Board game creation requires repeated cycles of prototyping and playtesting, which is mentally demanding and currently underserved by AI tools that handle only isolated steps. AutoBG combines four modules: one for multi-turn dialogue to shape design drafts, one for producing and iteratively revising full rulebooks, a critic that checks for flaws and accepts only verified changes, and a persona simulator drawing on 150 real player profiles. Tests on 207 held-out games show the outputs beat strong language model baselines and come close to the standard of published games, while a study of 30 users across experience levels finds the system lowers blank-page anxiety and reveals overlooked problems.

Core claim

AutoBG enables designers to move from an initial idea to a polished, audience-tested rulebook in one workflow through BG-Ideator for structured drafts via dialogue, BG-Realizer for complete rulebook generation and closed-loop revision with BG-Critic, and BG-Persona for feedback drawn from 150 real player profiles. The system is trained on 2.2K structured rulebooks and 180K quality-filtered player reviews. On 207 held-out games it outperforms baselines such as GPT-5.4, and a 30-participant user study confirms it reduces blank-page anxiety, surfaces hidden design flaws, and delivers practical help across experience levels.

What carries the argument

The BG-Critic module, which diagnoses design flaws from rulebook text alone and gates each revision so only verified improvements are accepted.
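The gating idea can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1: `critique`, `revise`, the numeric flaw score, and `max_rounds` are hypothetical stand-ins for BG-Critic and BG-Realizer, and the toy critic below only counts the word "unclear" rather than diagnosing real design flaws.

```python
from typing import Callable


def gated_refine(
    rulebook: str,
    critique: Callable[[str], float],  # hypothetical critic: lower score = fewer flaws
    revise: Callable[[str], str],      # hypothetical reviser standing in for BG-Realizer
    max_rounds: int = 5,               # assumed iteration budget
) -> tuple[str, list[bool]]:
    """Keep a revision only when the critic scores it strictly better."""
    best, best_score = rulebook, critique(rulebook)
    decisions: list[bool] = []
    for _ in range(max_rounds):
        if best_score == 0:
            break  # nothing left for the critic to flag
        candidate = revise(best)
        score = critique(candidate)
        accepted = score < best_score
        decisions.append(accepted)
        if accepted:
            best, best_score = candidate, score
    return best, decisions


# Toy stand-ins, for illustration only.
def toy_critique(text: str) -> float:
    return float(text.count("unclear"))


def toy_revise(text: str) -> str:
    return text.replace("unclear", "explicit", 1)


final, decisions = gated_refine(
    "unclear setup; unclear scoring", toy_critique, toy_revise
)
print(final, decisions)
```

The loop makes the load-bearing role of the critic visible: every accept or reject decision depends on its score, so a critic that mis-scores revisions would either admit spurious changes or block genuine ones.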

If this is right

  • Rulebooks produced by AutoBG approach the quality level of published games on held-out test cases.
  • The system reduces blank-page anxiety for designers of varying experience levels.
  • The critic module surfaces hidden design flaws that would otherwise go unnoticed.
  • Individualized feedback from simulated player personas supports audience-specific testing within the workflow.
  • The full set of modules allows completion of the entire design process from early ideation to final rulebook in a single session.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same modular structure of ideation, text-based criticism, and persona simulation could be adapted to other iterative creative tasks that rely on written specifications and audience response.
  • If the critic proves reliable at gating improvements from text, early-stage human playtesting might be partially deferred until later prototypes.
  • Training on larger collections of player reviews could narrow the remaining gap between generated and published rulebook quality.

Load-bearing premise

The BG-Critic module can reliably diagnose design flaws from rulebook text alone and correctly decide which revisions count as genuine improvements.

What would settle it

Independent expert ratings of rulebook quality on the same 207 held-out games, comparing AutoBG outputs directly against the published originals; if AutoBG versions score markedly lower, the performance claim fails.
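One simple way to analyse such paired ratings is an exact sign test over games. The sketch below assumes one expert rating per game for each version; the ratings are made-up numbers for illustration, and the choice of a sign test is an assumption of this example, not something the paper specifies.

```python
from math import comb


def sign_test(generated: list[float], published: list[float]) -> tuple[int, int, float]:
    """Exact two-sided sign test on paired ratings; tied pairs are dropped."""
    if len(generated) != len(published):
        raise ValueError("ratings must be paired per game")
    wins = sum(g > p for g, p in zip(generated, published))
    losses = sum(g < p for g, p in zip(generated, published))
    n = wins + losses
    if n == 0:
        return wins, losses, 1.0
    k = min(wins, losses)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins, losses, min(1.0, 2 * tail)


# Hypothetical expert ratings for six games (not data from the paper).
generated = [3.5, 4.0, 3.0, 4.5, 3.5, 2.5]
published = [4.0, 4.5, 3.5, 4.5, 4.0, 3.0]
wins, losses, p_value = sign_test(generated, published)
print(wins, losses, p_value)
```

A sign test ignores the size of each gap, so it is best read alongside the mean rating difference rather than instead of it.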

Figures

Figures reproduced from arXiv: 2606.01976 by Chuanhao Li, Fanrui Zhang, Jianwen Sun, Kaipeng Zhang, Mingzhu Sun, Yibin Wang, Yifei Huang, Yukang Feng, Zizhen Li.

Figure 1: Overview of AUTOBG. Top: real-world board game design workflow and three key challenges. Bottom: AUTOBG addresses these challenges through four modules. BG-Ideator provides interactive guided dialogue for idea structuring; BG-Realizer and BG-Critic form a Verifier-Gated Iteration loop for iterative rulebook generation; BG-Persona simulates individualized audience feedback from 150 real player profiles. …
Figure 2: Running example: the AUTOBG pipeline applied to a campus-themed board game. (a) BG-Ideator handles diverse user behaviors (pushback, hesitation, tangents, opinionated demands) across multiple dialogue turns. (b) The converged design draft with structured mechanics, design intent, and parameters. (c) BG-Realizer generates a 7-section rulebook; BG-Critic diagnoses flaws (M_major, D_major) and drives targeted…
Figure 3: Rank distribution of the full set (N=2,233) and …
Figure 6: Word cloud of board game categories across …
Figure 5: Word cloud of board game mechanics across …
Figure 7: Design Draft Schema. The JSON schema captures designer intent through five field groups. Each mechanic entry includes a rationale explaining its role in the design.
Figure 8: Example of a Filled Design Draft. Generated by GPT-5.4 from the corresponding structured rulebook.
Figure 9: Draft Generation Prompt. The model receives the rulebook, BGG metadata, and mechanic/category definitions, and produces a structured design draft.
Figure 10: Theme Migration Prompt. Core mechanics are fixed; the model selects a new theme from 20 candidates and rewrites all theme-dependent fields.
Figure 11: Core Hybridization Prompt. The model receives two parent drafts and a fixed set of core mechanics, then writes a new design from scratch.
Figure 12: Quality Verification Rubric. Gemini-3.1-Pro scores each draft across 11 dimensions (1–5 scale) with anchored examples. Drafts averaging ≥ 3.5 are accepted.
Figure 13: Complete Prompt Suite for Single-Turn Data Generation. The shared system prompt enforces tag-based output, prose style, and designer tone. Per-aspect task instructions (appended to the system prompt) specify the target answer, reasoning requirements, and user prompt template. Placeholders in curly braces are populated dynamically from each draft's fields.
Figure 14: Complete Prompt for Multi-Turn Conversation Generation. The system prompt controls assistant tone, response texture variation, and quality constraints. The user prompt provides the target draft, user profile, available vocabularies, output format, and pacing constraints. Placeholders are populated per conversation based on the selected user type, quirk, and target draft.
Figure 15: MDA Rewriting Prompt. GPT-5.4 converts reviews into a unified three-section MDA evaluation format. The output removes personal voice and standardizes the structure for BG-Critic training.
Figure 16: Perturbation Prompt (M_critical). Each of the eight flaw types has a dedicated prompt following the same structure: definition and boundary with adjacent types, concrete strategies, editing discipline rules, and the edit-operation output schema. Only the flaw-specific sections differ across types.
Figure 17: Rulebook Generation Prompt. The system prompt enforces the seven-section structure and prose style. For theme migrations, one parent rulebook serves as reference; for hybridizations, both parents are provided with lineage metadata. The model must independently derive numeric values from the draft's parameters.
Figure 18: Profile Generation Prompt. GPT-5.4 receives a user's reviews grouped by rating tier, each augmented with game metadata and lore. Game names are hidden to prevent the profile from being game-specific.
Figure 19: BG-Persona SFT System Prompt. The user's profile is injected into the system prompt. The template enforces four-section output and includes a rating calibration note to prevent the model from defaulting to the user's average score.
Figure 20: Diagnostic Quality Judge Prompt. Gemini-3.1-Pro scores each model-reported flaw against ground truth across six weighted dimensions, and identifies hallucinated flaws. For no-flaw rulebooks (Source C), all model-reported flaws count as over-diagnosis.
Figure 21: Player Comment Quality Judge Prompt. Gemini-3.1-Pro compares the generated comment (A) against the ground-truth comment (B) on preference alignment, reasoning consistency, and style match. Each dimension is scored 1–10 with tier-based calibration to prevent score clustering.
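The acceptance rule stated in the Figure 12 caption (11 dimensions on a 1–5 scale, accept when the average is at least 3.5) can be written down directly. The sketch below is an illustration only: the function name is hypothetical, the dimension names are not given in the caption and are therefore left unnamed, and an unweighted mean is assumed.

```python
from statistics import mean

NUM_DIMENSIONS = 11     # from the Figure 12 caption
ACCEPT_THRESHOLD = 3.5  # from the Figure 12 caption


def accept_draft(scores: list[int]) -> bool:
    """Return True when the unweighted mean rubric score reaches the threshold."""
    if len(scores) != NUM_DIMENSIONS:
        raise ValueError(f"expected {NUM_DIMENSIONS} scores, got {len(scores)}")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("each score must lie on the 1-5 scale")
    return mean(scores) >= ACCEPT_THRESHOLD


# Six 4s and five 3s average 39/11, just above the threshold.
print(accept_draft([4] * 6 + [3] * 5))
# Five 4s and six 3s average 38/11, just below it.
print(accept_draft([4] * 5 + [3] * 6))
```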
Original abstract

Designing a board game demands both thinking as a designer and experiencing as a player, while iterating through repeated prototyping and playtesting cycles, making it a cognitively intensive creative task well suited for human-AI collaboration. However, current systems lack end-to-end support to guide designers through the complete workflow from vague early ideation to iterative rulebook revision and audience testing. To this end, we present AutoBG, a board game design assistant built around critic-driven iterative refinement, comprising four specialized modules: BG-Ideator guides designers via multi-turn dialogue to produce structured design drafts; BG-Realizer generates complete rulebooks from drafts and revises them in a closed loop with BG-Critic, which diagnoses design flaws and gates each revision so that only verified improvements are accepted; and BG-Persona simulates individualized feedback from 150 real player profiles. Together, these modules enable designers to go from an initial idea to a polished, audience-tested rulebook within a single integrated workflow. The system is built on 2.2K structured rulebooks and 180K quality-filtered real player reviews, with task-specific training data derived for each module. Experiments on 207 held-out games show that AutoBG substantially outperforms state-of-the-art baselines (e.g., GPT-5.4), generating rulebooks that approach the quality of published games. Furthermore, a user study with 30 participants across diverse experience levels confirms that AutoBG effectively reduces blank-page anxiety, surfaces hidden design flaws, and provides highly rated, practical assistance throughout the creative process.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces AutoBG, an integrated AI assistant for board game design comprising BG-Ideator (multi-turn ideation), BG-Realizer (rulebook generation with closed-loop revision), BG-Critic (flaw diagnosis that gates revisions), and BG-Persona (individualized player feedback from 150 profiles). The system is trained on 2.2K structured rulebooks and 180K player reviews. It claims that experiments on 207 held-out games show substantial outperformance over baselines such as GPT-5.4, with generated rulebooks approaching published quality, and that a 30-participant user study confirms reductions in blank-page anxiety and practical assistance across experience levels.

Significance. If the empirical claims hold with proper validation, the work would represent a meaningful advance in HCI for AI-supported creative workflows, demonstrating an end-to-end system that handles ideation through iterative refinement and audience simulation in a domain requiring both design thinking and playtesting experience.

major comments (2)
  1. [Experiments on 207 held-out games] The central outperformance claim on the 207 held-out games depends on the BG-Critic module reliably detecting design flaws from rulebook text alone and correctly accepting only verified improvements in the closed-loop refinement with BG-Realizer. No separate held-out evaluation (e.g., precision, recall, or inter-rater agreement with human playtesters) of this critic is described, despite its training on the 2.2K rulebooks and 180K reviews; without it, the iterative process could accept spurious changes, undermining the quality claims.
  2. [User study] The abstract and results assert benefits from the 30-participant user study (reduced anxiety, surfaced flaws, highly rated assistance) but supply no evaluation criteria, statistical details, methodology for measuring outcomes, or breakdown by participant experience level, preventing assessment of whether the data supports the stated conclusions.
minor comments (2)
  1. Clarify the exact task-specific training data derivation process for each of the four modules and how the 207 held-out games were selected and preprocessed to ensure they are truly unseen.
  2. Provide concrete metrics (e.g., rulebook quality scores, comparison tables) rather than qualitative statements like 'substantially outperforms' and 'approach the quality of published games' to allow direct comparison with baselines.
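The module-level numbers requested in major comment 1 could be computed along the following lines. This is a sketch under assumptions: it treats each rulebook as a binary case (flaw reported or not) with one expert label per rulebook, the labels are invented for illustration, and Cohen's kappa is one possible agreement metric rather than the one the authors will necessarily use.

```python
def precision_recall(predicted: list[bool], truth: list[bool]) -> tuple[float, float]:
    """Precision and recall of predicted flaw labels against expert labels."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("label lists must be non-empty and the same length")
    tp = sum(p and t for p, t in zip(predicted, truth))
    fp = sum(p and not t for p, t in zip(predicted, truth))
    fn = sum(not p and t for p, t in zip(predicted, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cohen_kappa(a: list[bool], b: list[bool]) -> float:
    """Chance-corrected agreement between two binary raters."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        return 1.0  # both raters always give the same single label
    return (observed - expected) / (1 - expected)


# Hypothetical labels for eight rulebooks (True = "has a major flaw").
critic = [True, True, False, True, False, False, True, False]
expert = [True, False, False, True, False, True, True, False]
precision, recall = precision_recall(critic, expert)
kappa = cohen_kappa(critic, expert)
print(precision, recall, kappa)
```

Reporting precision and recall separately matters for a gating critic: low precision means valid rulebooks get needlessly revised, while low recall means flawed revisions pass the gate.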

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the strength of our empirical claims. We address each major point below with clarifications drawn from the manuscript and commit to revisions where the presentation can be strengthened.

Point-by-point responses
  1. Referee: [Experiments on 207 held-out games] The central outperformance claim on the 207 held-out games depends on the BG-Critic module reliably detecting design flaws from rulebook text alone and correctly accepting only verified improvements in the closed-loop refinement with BG-Realizer. No separate held-out evaluation (e.g., precision, recall, or inter-rater agreement with human playtesters) of this critic is described, despite its training on the 2.2K rulebooks and 180K reviews; without it, the iterative process could accept spurious changes, undermining the quality claims.

    Authors: We agree that a dedicated, module-level evaluation of BG-Critic (precision, recall, and inter-rater agreement against human playtesters) is not reported in the current manuscript. The 207 held-out game results demonstrate end-to-end improvement of the full AutoBG pipeline over baselines, but this does not isolate the critic's contribution or rule out acceptance of spurious revisions. We will add a new subsection under Experiments that reports critic performance on a held-out subset of rulebooks with expert annotations, including agreement metrics. This revision will directly address the concern. revision: yes

  2. Referee: [User study] The abstract and results assert benefits from the 30-participant user study (reduced anxiety, surfaced flaws, highly rated assistance) but supply no evaluation criteria, statistical details, methodology for measuring outcomes, or breakdown by participant experience level, preventing assessment of whether the data supports the stated conclusions.

    Authors: The full manuscript contains a User Study section that describes the protocol (pre/post questionnaires, task instructions, and qualitative feedback collection) and reports aggregate ratings. However, we acknowledge that explicit evaluation criteria, complete statistical reporting (means, SDs, p-values), and experience-level breakdowns are not presented with sufficient detail. We will expand this section in revision to include these elements, allowing readers to assess support for the claims about anxiety reduction and assistance across experience levels. revision: yes
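The statistical detail promised in the second response (means, SDs, and test statistics for pre/post questionnaires) might look like the sketch below. The scores are invented, the function name is hypothetical, and a paired t statistic is an assumed choice of test; the p-value is left to a statistics package such as SciPy's `scipy.stats.ttest_rel`, since the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev


def paired_summary(pre: list[float], post: list[float]) -> dict[str, float]:
    """Mean and SD of post-minus-pre differences, plus the paired t statistic."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [after - before for before, after in zip(pre, post)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    m = mean(diffs)
    return {
        "mean_diff": m,
        "sd_diff": sd,
        "t": m / (sd / sqrt(len(diffs))),
        "df": float(len(diffs) - 1),
    }


# Hypothetical anxiety ratings for five participants (lower is better).
pre = [5, 6, 4, 5, 6]
post = [3, 4, 3, 2, 5]
summary = paired_summary(pre, post)
print(summary)
```

Running the same summary separately for each experience group would give the breakdown the referee asks for, although with 30 participants the per-group samples would be small.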

Circularity Check

0 steps flagged

No circularity: system trained on external data with held-out evaluation

Full rationale

The paper presents an ML-based design assistant trained on 2.2K external rulebooks and 180K player reviews, with performance measured on 207 held-out games and a separate user study. No mathematical derivations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the described workflow. The BG-Critic training and gating mechanism is presented as learned from the external corpus rather than defined in terms of the target outputs. Evaluation claims rest on standard train/held-out splits and human ratings, which are independent of the system internals. This is the expected non-finding for an empirical systems paper without closed-form derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper introduces a new applied system relying on standard LLM capabilities and curated datasets rather than new mathematical constructs or invented physical entities.

axioms (1)
  • domain assumption: LLMs can be effectively fine-tuned or prompted for structured creative generation, criticism, and persona simulation tasks.
    The four modules depend on this background capability to function as described.

pith-pipeline@v0.9.1-grok · 5847 in / 1314 out tokens · 34250 ms · 2026-06-28T13:00:06.484024+00:00 · methodology


Reference graph

Works this paper leans on

84 extracted references · 10 canonical work pages · 8 internal anchors

  1. [1]

    In International conference on machine learning, pages 337–371

    Using large language models to simulate mul- tiple humans and replicate human subject studies. In International conference on machine learning, pages 337–371. PMLR. Eliana Alweis and Richard Alweis. 2025. A narrative re- view of the benefits of board games in health.Cureus, 17(9). Jan Batzner, V olker Stocker, Bingjun Tang, Anusha Natarajan, Qinhao Chen, ...

  2. [2]

    InAnais do XXIV Sim- pósio Brasileiro de Jogos e Entretenimento Digital (SBGames 2025), SBGames 2025, page 655–667

    Boardwalk: Towards a framework for creat- ing board games with llms. InAnais do XXIV Sim- pósio Brasileiro de Jogos e Entretenimento Digital (SBGames 2025), SBGames 2025, page 655–667. Sociedade Brasileira de Computação. Xinyun Chen, Maxwell Lin, Nathanael Schärli, and Denny Zhou. 2024. Teaching large language models to self-debug. InThe Twelfth Internati...

  3. [3]

    Training Verifiers to Solve Math Word Problems

    Training verifiers to solve math word prob- lems.Preprint, arXiv:2110.14168. Elaine Conway and Ruth Smith. 2026. Analogue play in the age of ai: A scoping review of non-digital games as active learning strategies in higher educa- tion.Behavioral Sciences, 16(1):133. Yijiang River Dong, Tiancheng Hu, and Nigel Collier

  4. [4]

    Scaling Synthetic Data Creation with 1,000,000,000 Personas

    Can LLM be a personalized judge? InFind- ings of the Association for Computational Linguistics: EMNLP 2024, pages 10126–10141, Miami, Florida, USA. Association for Computational Linguistics. George Skaff Elias, Richard Garfield, and K Robert Gutschera. 2020.Characteristics of games. MIT Press. Geoffrey Engelstein and Isaac Shalev. 2022.Building Blocks of ...

  5. [5]

    CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing

    Critic: Large language models can self- correct with tool-interactive critiquing.Preprint, arXiv:2305.11738. Go Frendi Gunawan and Mukhlis Amien. 2026. Com- prehensive evaluation of large language models on software engineering tasks: A multi-task benchmark. Preprint, arXiv:2602.07079. Chengpeng Hu, Yunlong Zhao, and Jialin Liu. 2024. Game generation via ...

  6. [6]

    Let's Verify Step by Step

    Let’s verify step by step.Preprint, arXiv:2305.20050. Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, and 1 others. 2023. Self-refine: Iterative refinement with self-feedback.Advances in neural information processing systems, 36:46534–46594. Mahdi Farrokhi Maleki...

  7. [7]

    arXiv preprint arXiv:2308.03188 , year=

    The effectiveness of intervention with board games: a systematic review.BioPsychoSocial medicine, 13(1):22. Mohd Kamal Othman, Rahimah Mat, and Kah Ching Sim. 2025. A systematic review of paper-based and digital board games for collaborative science learning. Review of Education, 13(3):e70107. Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi ...

  8. [8]

    Weiyan Shi and Kenny Tsu Wei Choo

    Large language models for scientific idea generation: A creativity-centered survey.Preprint, arXiv:2511.07448. Weiyan Shi and Kenny Tsu Wei Choo. 2026. A tax- onomy of human–mllm interaction in early-stage sketch-based design ideation. InProceedings of the Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems, CHI EA ’26, pag...

  9. [9]

    Reflexion: Language Agents with Verbal Reinforcement Learning

    Reflexion: Language agents with verbal rein- forcement learning.Preprint, arXiv:2303.11366. Carla Sousa, Sara Rye, Micael Sousa, Pedro Juan Tor- res, Claudilene Perim, Shivani Atul Mansuklal, and Firdaous Ennami. 2023. Playing at the school ta- ble: Systematic literature review of board, tabletop, and other analog game-based learning approaches. Frontiers...

  10. [10]

    AI Realtor: Towards Grounded Persuasive Language Generation for Automated Copywriting

    Ai realtor: Towards grounded persuasive lan- guage generation for automated copywriting.arXiv preprint arXiv:2502.16810. Haotian Xia, Hao Peng, Yunjia Qi, Xiaozhi Wang, Bin Xu, Lei Hou, and Juanzi Li. 2026. Storyalign: Evalu- ating and training reward models for story generation. Preprint, arXiv:2605.04831. An Yang, Anfeng Li, Baosong Yang, Beichen Zhang,...

  11. [11]

    Qwen3 Technical Report

    Qwen3 technical report.arXiv preprint arXiv:2505.09388. Dingyi Yang and Qin Jin. 2025. What matters in eval- uating book-length stories? a systematic study of long story evaluation. InProceedings of the 63rd An- nual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16375– 16398. Yaowei Zheng, Richong Zhang, Junhao Zh...

  12. [12]

    LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

    Llamafactory: Unified efficient fine-tuning of 100+ language models.Preprint, arXiv:2403.13372. Lingfeng Zhou, Jialing Zhang, Jin Gao, Mohan Jiang, and Dequan Wang. 2025. Personaeval: Are llm eval- uators human enough to judge role-play?arXiv preprint arXiv:2508.10014. Appendix Overview A Data Foundation 12 A.1 Dataset Statistics . . . . . . . . . 12 A.2 ...

  13. [13]

    Briefing (5 min).Facilitator explained the AU- TOBG pipeline at high level, demonstrated one example interaction (not used in the study), and confirmed participant’s choice between bring- ing their own design idea or picking one of five prepared seed themes (cooperative survival, asymmetric race, abstract tile placement, social deduction with bidding, rea...

  14. [14]

    Facilita- tor intervened only on technical issues, not on design content

    Ideation with BG-Ideator (15 min).Partici- pants conversed with BG-Ideator in 5–10 turns until reaching a complete design draft. Facilita- tor intervened only on technical issues, not on design content

  15. [15]

    Reading v0 rulebook (10 min).Participants read the generated initial rulebook and were asked to summarize the core gameplay loop in their own words to confirm comprehension before proceeding

  16. [16]

    Participants observed the iteration (Algorithm 1) and, after each round, compared the new version to the previous one

    Closed-loop revision (10 min).BG-Critic’s diagnostic output was displayed alongside the rulebook. Participants observed the iteration (Algorithm 1) and, after each round, compared the new version to the previous one. They had the option to stop iteration early

  17. [17]

    Participants reflected on whether the feedback diversity matched their expectations

    BG-Persona preview (optional, 5 min).Three sample player profiles drawn from the 150-user pool generated feedback on the final rulebook. Participants reflected on whether the feedback diversity matched their expectations

  18. [18]

    Survey and interview (10 min).A 16-item Likert questionnaire (Appendix G.4) was fol- lowed by 6 open-ended interview questions about workflow friction, surprising moments, and comparison to prior tools. G.4 Survey Instrument Likert items (1–7 scale, 1 = strongly disagree, 7 = strongly agree).The full 16-item instrument is organized into six dimensions (3 ...

  19. [19]

    vague encouragement

    but score lower on critic acceptance (10). G.6 Retrospective Survey:AUTOBGvs. General LLMs 22 of the 30 participants reported prior experience using general-purpose LLMs (GPT: 18, Claude: 11, Gemini: 9; multi-select) for board game design assistance. At the end of the session, these par- ticipants were asked to rate theirpriorexperience on the same six di...

  20. [20]

    The game’s structured rulebook (rules, components, gameplay flow, etc.)

  21. [21]

    The game’s BGG metadata (mechanics, categories, complexity, player count, etc.)

  22. [22]

    Would a reviewer mention this in a one-sentence description?

    Official BGG definitions for each mechanic and category tagged to this game Your output must be a single valid JSON object following the schema provided. No other text outside the JSON. ## Key Principles ### Mechanic Classification Classify each mechanic into exactly one tier: - **CORE** (1-3): Defines the game’s identity. The main decision loop players e...

  23. [23]

    SUPPORTING (2-6) = adds depth, serves core

    **Mechanic Classification**: CORE (1-3) = main decision loop, identity-defining. SUPPORTING (2-6) = adds depth, serves core. STRUCTURAL (remainder) = framework, setup, grid type

  24. [24]

    adds depth

    **Rationale**: Must cite specific game mechanisms or design logic, not generic statements like "adds depth" or "enhances tension". Explain HOW and WHY

  25. [25]

    Show how mechanics CONNECT and create interesting DECISIONS

    **Description**: Write as a designer explaining design choices. Show how mechanics CONNECT and create interesting DECISIONS. Not a rule summary or review

  26. [26]

    **Elevator Pitch**: One clever design hook, not a feature list

  27. [27]

    balance X with Y

    **Core Tension**: A specific dilemma players face, not "balance X with Y"

  28. [28]

    **Parameters Rationale**: Explain WHY complexity, player count, and play time form a coherent package

  29. [29]

    "" # Get current categories to exclude similar themes (already handled by random sampling) draft_json = json.dumps(draft, indent=2, ensure_ascii=False) themes_block =

    **This is a NEW game design, not a copy of any existing game.** Do not reference real game titles in the output. """ You are a senior board game design analyst. Your task is to adapt an existing game design to a completely different theme while preserving its core mechanical identity. You will receive the original design draft, a list of candidate themes,...

  30. [30]

    **Pick ONE theme** from the candidates above that creates the most interesting contrast with the original

  31. [31]

    Rewrite every rationale to explain how each core mechanic works within the new theme

    **Core mechanics**: Keep the EXACT SAME core mechanic names. Rewrite every rationale to explain how each core mechanic works within the new theme

  32. [32]

    Aim for 60-80% retention, only change what the new theme demands

    **Supporting & Structural mechanics**: You MAY replace some to better fit the new theme, but every mechanic must come from the available list. Aim for 60-80% retention, only change what the new theme demands

  33. [33]

    **Categories**: Re-select from the available categories list to match the new theme

  34. [34]

    **Rewrite completely**: elevator_pitch, description, design_intent (all three sub-fields), parameters_rationale

  35. [35]

    **Parameters**: Adjust complexity, player_count, play_time if the theme change warrants it, but keep them plausible

  36. [36]

    "" pa_json = json.dumps(parent_a, indent=2, ensure_ascii=False) pb_json = json.dumps(parent_b, indent=2, ensure_ascii=False) cores_str =

    **Classification type**: May change if the new theme shifts the game’s nature (e.g., a wargame theme applied to a family framework could become Thematic Games). The result must feel like a coherent, original game design —not a reskin. ## Output Schema ‘‘‘json {OUTPUT_SCHEMA} ‘‘‘ {QUALITY_BLOCK} Respond with ONLY the JSON object. Figure 10:Theme Migration ...

  37. [37]


    **Core mechanics**: Use EXACTLY [{cores_str}]. Write fresh rationales explaining how they form the main decision loop TOGETHER in this new design

  38. [38]


    **Theme & Categories**: Inherit Parent A’s thematic direction. Re-select categories from the available list; you may adjust to better fit the new mechanic combination

  39. [39]


    **Supporting mechanics** (2-6): Choose from BOTH parents’ supporting/structural pools OR introduce new ones from the available list. Each must serve a specific core mechanic

  40. [40]

    **Structural mechanics**: Select appropriate framework mechanics from the available list

  41. [41]


    **Write everything from scratch**: description, elevator_pitch, design_intent, parameters. Do NOT copy sentences from either parent

  42. [42]

    **The new design must be meaningfully different from both parents.** It should feel like a game that happens to share some DNA, not a remix

  43. [43]

    Strategy Games

    **Classification type**: Inherit from Parent A unless the new core combination clearly shifts it.

    ## Output Schema
    ```json
    {OUTPUT_SCHEMA}
    ```
    {QUALITY_BLOCK}
    Respond with ONLY the JSON object.

    Figure 11: Core Hybridization Prompt. The model receives two parent drafts and a fixed set of core mechanics, then writes a new design from scratch.

    Quality Verifica...

  44. [44]

    Write in FLOWING PROSE -- paragraphs, not bullet points or numbered lists

  45. [45]


    Keep <response> between 120-250 words. Be dense and precise

  46. [46]

    When mentioning a mechanic, explain it through what it DOES in this design -- never paste a dictionary definition

  47. [47]

    Sound like a senior designer in conversation: warm, direct, opinionated

  48. [48]


    Do NOT start with "Great question!" or any filler. Jump straight into substance

  49. [49]

    {target_pitch}

    <reasoning> should also be prose, not a list of points. ====================================================================== CATEGORY A: MECHANIC REASONING --- A1: Constraint Recommend --- ## Task: Recommend a second core mechanic The best answer for this specific design is **{target_mechanic}**, but you must arrive at it through genuine reasoning about...

  50. [50]


    Alternate user/assistant. Start with user, end with assistant

  51. [51]


    Final draft_update = {"_final": true}. System substitutes complete draft

  52. [52]

    Core mechanics must originate from or be confirmed by the user

  53. [53]

    No single message introduces more than 3 new mechanics

  54. [54]


    Spread information evenly across all turns.

    ## Information Blocks to Cover
    (1) Core mechanics (2) Theme/categories (3) Supporting/structural (4) Parameters (5) Design intent (6) Concept (final turn)

    ## Output Schema
    { "user_type": "{user_type}", "num_turns": <number of user messages>, "conversation": [ {"role": "user", "content": "..."}, {"role": "assista...
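    The structural rules for generated dialogues (alternate roles, start with the user, end with the assistant, `num_turns` equal to the number of user messages) are simple to validate. Below is a hedged sketch of such a check, assuming only the fields visible in the schema excerpt; `validate_conversation` is a hypothetical helper name, and the paper's real filtering code may differ.

    ```python
    def validate_conversation(record):
        """Check one generated dialogue against the structural rules in the prompt.

        Returns a list of problems; an empty list means the record passes.
        """
        roles = [turn.get("role") for turn in record.get("conversation", [])]
        if not roles:
            return ["conversation is empty"]

        problems = []
        for i, role in enumerate(roles):
            if role not in ("user", "assistant"):
                problems.append(f"message {i} has unexpected role {role!r}")
        if roles[0] != "user":
            problems.append("must start with a user message")
        if roles[-1] != "assistant":
            problems.append("must end with an assistant message")
        for i in range(1, len(roles)):
            if roles[i] == roles[i - 1]:
                problems.append(f"messages {i - 1} and {i} do not alternate")

        # num_turns is defined in the schema as the number of user messages.
        user_count = roles.count("user")
        if record.get("num_turns") != user_count:
            problems.append(
                f"num_turns is {record.get('num_turns')!r}, expected {user_count}"
            )
        return problems


    # Illustrative record; the message contents are placeholders.
    example = {
        "user_type": "novice",
        "num_turns": 2,
        "conversation": [
            {"role": "user", "content": "..."},
            {"role": "assistant", "content": "..."},
            {"role": "user", "content": "..."},
            {"role": "assistant", "content": "..."},
        ],
    }
    print(validate_conversation(example))
    ```

    Content-level rules, such as the cap of three new mechanics per message, would need a mechanic vocabulary and are left out of this sketch.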

  55. [55]

    An MDA reasoning analysis (extracted from a chain-of-thought)

  56. [56]

    The original player review text

  57. [57]

    Considering all of this, I would rate it X out of 10

    The player’s rating

    Your task: Rewrite into a flowing, paragraph-based evaluation with three sections. Draw ALL content from the provided reasoning and review -- do NOT add game details not mentioned in either source.

    ## Output Format (use these exact headers)
    ### Mechanics & Design
    Write 2-4 sentences describing the specific game mechanics, components, r...

  58. [58]

    Delete a loss condition or game-end trigger entirely

  59. [59]

    Delete a mandatory gating rule

  60. [60]

    Delete the resolution mechanics of a core action

  61. [61]


    Delete a phase transition / turn structure step.

    === RULES FOR HIGH-QUALITY DATA ===
    - Find every place the target rule appears (main text, FAQ, summary) and remove ALL instances.
    - DO NOT leave markers like "[deleted]" or "(removed)".
    - DO NOT introduce contradictions -- delete, do not rewrite-and-contradict.
    - The rulebook should read smoothly; only a c...

  62. [62]


    CLEAN REMOVAL: Remove entire sentences/bullets cleanly. No dangling fragments

  63. [63]


    MINIMAL SCOPE: Change ONLY what is required. Do not touch unrelated rules

  64. [64]

    NO EDIT MARKERS: Output must read as a natural, clean rulebook

  65. [65]

    STRUCTURAL COMPLETENESS: Keep all original section headings and formatting

  66. [66]


    PROPAGATE CONSISTENTLY: Apply the edit to ALL restatements.

    === OUTPUT FORMAT: EDIT OPERATIONS ===
    Instead of rewriting the whole rulebook, output a list of edit operations:
    - "delete" -- remove an exact span of text
    - "replace" -- replace an exact span with new text
    - "replace_all" -- globally replace a term throughout
    - "rewrite_section" -- replace an entire se...
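    Edit operations like these can be applied to a rulebook string with a small interpreter. The sketch below is an illustration under stated assumptions, not the system's actual applier: the dictionary keys `op`, `target`, and `new` are hypothetical, the rule that an exact span must occur exactly once is a design choice made here, and `rewrite_section` is omitted because its specification is cut off in the excerpt.

    ```python
    def _replace_once(text, target, new):
        """Replace an exact span, refusing ambiguous or missing targets."""
        count = text.count(target)
        if count != 1:
            raise ValueError(f"expected exactly one match for {target!r}, found {count}")
        return text.replace(target, new, 1)


    def apply_edits(rulebook, edits):
        """Apply delete / replace / replace_all operations in order."""
        text = rulebook
        for edit in edits:
            op, target = edit["op"], edit["target"]
            if op == "delete":
                text = _replace_once(text, target, "")
            elif op == "replace":
                text = _replace_once(text, target, edit["new"])
            elif op == "replace_all":
                if target not in text:
                    raise ValueError(f"term {target!r} not found")
                text = text.replace(target, edit["new"])
            else:
                raise ValueError(f"unsupported operation: {op!r}")
        return text


    # Illustrative rulebook text and edits.
    rulebook = "Draw two cards. If the deck is empty, the game ends. Spend gold to build."
    edits = [
        {"op": "delete", "target": " If the deck is empty, the game ends."},
        {"op": "replace_all", "target": "gold", "new": "credits"},
    ]
    print(apply_edits(rulebook, edits))
    ```

    Failing loudly on a missing or ambiguous span is one way to catch edits that would otherwise leave the rulebook silently unchanged.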

  67. [67]


    FAQ or Edge Cases

    Critical requirements:
    - Component counts, board layout, and numeric values must be DERIVED from the draft’s parameters (complexity, player count, play time), NOT copied from any reference example.
    - Player count scaling must be explicitly specified for board configuration, component counts, and any variable rules.
    - The Gameplay Flow an...

  68. [68]

    Design component counts and numeric values appropriate for THIS draft’s complexity ({complexity}) and play time ({play_time} min)

  69. [69]

    Adapt all terminology, lore, and flavor to the new theme

  70. [70]

    Ensure the board/spatial design, setup procedure, and gameplay flow reflect the new theme coherently

  71. [71]


    Create FAQ entries that address edge cases specific to YOUR new rulebook. Remember: write in prose paragraphs, not bullet-point lists.

    === DESIGN DRAFT ===
    {draft_json}

    Now write the complete rulebook.

    [USER - 2-Core / 3-Core Hybrid variant]
    (Same system prompt. User prompt additionally provides BOTH parent rulebooks with lineage metadata specifying which ...

  72. [72]

    Overall Tendency: Rating pattern (harsh/generous/balanced), which MDA layer they focus on most

  73. [73]

    Mechanical Preferences: Which mechanics they love/hate, which combinations excite them

  74. [74]

    Dynamic Preferences: What gameplay dynamics they value (tension, interaction, pacing, balance)

  75. [75]

    Aesthetic Preferences: What emotional experiences they seek (intellectual challenge, social fun, thematic immersion, thrill)

  76. [76]

    Complexity Preference: Light/medium/heavy, and how they react across the spectrum

  77. [77]

    Cross-Category Patterns: How their standards differ across game types

  78. [78]


    Distinctive Traits: What makes this player unique compared to a generic reviewer

    ## Rules
    - Do NOT mention any game names (they are hidden for a reason)
    - Write in third person ("This player...", "They tend to...")
    - Ground every claim in specific evidence from the reviews
    - Focus on PATTERNS across reviews, not individual review summaries

    [USER]
    === PLAY...

  79. [79]

    match_layer: Same MDA layer identified? 5=correct layer, 3=adjacent or unclear, 1=wrong or flaw not mentioned

  80. [80]

    match_severity: Same severity level? 5=exact match, 3=off by one level, 1=two levels off or not mentioned
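    The two rubric items above map a predicted flaw onto a 5/3/1 score by its distance from the reference label. The following sketch shows one way to compute them. It is an assumption-laden illustration: the rubric in the source is written for a judge, not as code, the severity level names are hypothetical (the excerpt does not list them), and representing an unclear layer as the string "unclear" and a missing flaw as `None` are choices made here.

    ```python
    MDA_LAYERS = ["mechanics", "dynamics", "aesthetics"]
    SEVERITY_LEVELS = ["minor", "major", "critical"]  # hypothetical level names


    def _distance_score(predicted, reference, scale):
        """5 for an exact match, 3 for a neighbor on the scale, 1 otherwise."""
        if predicted not in scale:
            return 1  # covers None ("not mentioned") and unknown labels
        gap = abs(scale.index(predicted) - scale.index(reference))
        return {0: 5, 1: 3}.get(gap, 1)


    def match_layer(predicted, reference):
        """Score agreement on the MDA layer; an unclear layer scores 3."""
        if predicted == "unclear":
            return 3
        return _distance_score(predicted, reference, MDA_LAYERS)


    def match_severity(predicted, reference):
        """Score agreement on severity; None means the flaw was not mentioned."""
        return _distance_score(predicted, reference, SEVERITY_LEVELS)


    print(match_layer("dynamics", "mechanics"), match_severity("major", "major"))
    ```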

Showing first 80 references.