arxiv: 2605.09767 · v1 · submitted 2026-05-10 · 💻 cs.HC


LLMs are the Ideal Candidate for Mixed-Initiative Game Design Pillar Workflows

Daniel Dyrda, Georg Groh, Julian Geheeb, Marvin Julian Schwarz


Pith reviewed 2026-05-12 03:11 UTC · model grok-4.3

classification 💻 cs.HC

keywords game design pillars · large language models · mixed-initiative workflows · game development tools · SPINE prototype · qualitative evaluation · expert interviews · natural language design support


The pith

Large language models can meaningfully contribute to mixed-initiative workflows built around game design pillars.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that game design pillars, which are natural language statements of a game's core vision, match the strengths of LLMs in generating and interpreting text, making them suitable for collaborative design processes. It backs this by providing a formal definition of pillars, building a prototype system called SPINE, testing it during a local game jam, and gathering feedback from four expert interviews. If the claim holds, developers could use LLMs to keep early-stage decisions aligned with the intended player experience without losing coherence. Readers would care because many games suffer from vision drift during development, and language-based tools might reduce that risk.

Core claim

Game design pillars serve as linguistic anchors that communicate a project's vision and guide coherent development decisions. Because LLMs handle natural language generation and interpretation effectively, they fit mixed-initiative workflows that create, refine, and apply these pillars. The work argues for this fit with a prototype, a game jam deployment that received positive reception for early-stage utility, and expert sessions that yielded encouraging overall perceptions. The authors read these results as early support for the claim that LLMs can assist pillar-centered creation and decision-making.

What carries the argument

The SPINE prototype, which applies LLMs to pillar creation, interpretation, and decision support in game design processes.

If this is right

Early game development teams could use LLM support to maintain vision coherence during rapid iteration.
Game jam participants might produce more consistent prototypes when assisted by pillar-focused tools.
Expert designers could explore alternative pillar applications through LLM-generated suggestions.
Formal pillar workflows create a new research area for automated assistance in experience design.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar LLM assistance could extend to other creative domains that rely on shared natural-language specifications, such as film or product design.
Future tools might connect pillar management directly to playtesting data to suggest adjustments automatically.
Widespread adoption could change how small teams document and enforce creative direction without adding heavy process overhead.

Load-bearing premise

Positive qualitative feedback from a single small game jam and four expert interviews is sufficient to indicate meaningful and generalizable utility for LLMs in pillar-driven design.

What would settle it

A follow-up study that deploys the same prototype on multiple projects, tracks measurable outcomes such as time to reach vision alignment or consistency of implemented features with stated pillars, and finds no improvement over unaided teams.

Figures

Figures reproduced from arXiv: 2605.09767 by Daniel Dyrda, Georg Groh, Julian Geheeb, Marvin Julian Schwarz.

**Figure 1.** A screenshot of SPINE’s user interface. We use continuous text as a practical lower bound for an expository statement without imposing unnecessary stylistic constraints. For this initial research, we omitted the criteria Conciseness and Actionability, as they usually require additional domain-grounded interpretation. After prompting the system, the LLM provides structural feedback on the four issues, ind…

Original abstract

Game Design Pillars are natural language artifacts commonly used in game development to communicate a project's core vision and ensure a coherent player experience. Their linguistic nature aligns well with the strengths of Large Language Models (LLMs), which excel at generating and interpreting natural language, making them strong candidates for supporting mixed-initiative workflows centered on design pillars. In this study, we introduce a formal definition of game design pillars, present an initial prototype, SPINE, and investigate the utility of LLMs in the creation and decision-making processes associated with pillar-driven workflows. We begin with a pre-study to identify an appropriate model, comparing gemini-2.0-flash and GPT-4o-mini. Results show that Gemini is better suited to our tasks due to its greater output variety and consistency. We then conduct a case study by deploying the tool at a local game jam. Findings indicate positive reception and clear value in integrating SPINE into early-stage development. Finally, we interview four experts, demonstrating the tool and allowing them to experiment with it in a controlled environment. While individual perspectives vary, the overall perception is encouraging and supports our intuition: LLMs can meaningfully contribute to game design pillar workflows. These early findings highlight the potential of formalizing pillar-driven design as a research space and point toward several promising avenues for future work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that LLMs are strong candidates for mixed-initiative game design pillar workflows because their natural language strengths align with the linguistic nature of design pillars. It introduces a formal definition of game design pillars, presents the SPINE prototype, selects gemini-2.0-flash over GPT-4o-mini in a pre-study based on output variety and consistency, deploys SPINE at a local game jam where participants report positive reception and clear value for early-stage development, and interviews four experts who generally find the tool encouraging after demonstration and experimentation. The authors conclude that these early findings support LLMs meaningfully contributing to pillar-driven design processes and identify promising directions for future work.

Significance. If the qualitative findings can be strengthened with more rigorous evaluation, the work could help formalize pillar-driven game design as a distinct research area in HCI and game development, providing an initial prototype and user perspectives that may guide tool-building and mixed-initiative studies. The pre-study model comparison and game-jam deployment offer concrete starting points, though the small scale and lack of quantitative grounding currently limit demonstrated impact.

major comments (3)

[Case Study] The case study reports 'positive reception and clear value' from the game jam deployment without any quantitative metrics (e.g., number of pillar iterations, design coherence ratings, or player-experience outcomes), baseline comparisons to non-LLM pillar workflows, or details on how feedback was collected and coded. This leaves the central claim of meaningful contribution dependent on unquantified subjective impressions.
[Expert Interviews] The expert interviews with only four participants are described as yielding an 'overall perception [that] is encouraging,' yet no information is provided on expert selection criteria, interview protocol, specific tasks performed, or analysis method (e.g., thematic analysis). This makes it difficult to assess how strongly the sessions support generalizable utility for LLM pillar workflows.
[Pre-study] The pre-study model selection concludes that Gemini is preferable due to 'greater output variety and consistency,' but no concrete metrics, example outputs, or scoring rubric are supplied to justify the choice over GPT-4o-mini or to allow replication of the selection process.

minor comments (2)

[Introduction / Definition] The formal definition of game design pillars is referenced in the abstract and introduction but would benefit from being stated explicitly (perhaps as a boxed definition or in §2) so readers can directly evaluate how it aligns with the SPINE implementation.
[Title] The title asserts that 'LLMs are the Ideal Candidate,' which is stronger than the cautious language in the abstract and conclusion; consider revising the title to better reflect the preliminary, exploratory nature of the reported evidence.

Simulated Authors' Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback. We address each major comment below with clarifications from our study and note revisions to improve methodological transparency. Our work is an early exploratory investigation, and we value the suggestions for strengthening its presentation.

Point-by-point responses

Referee: [Case Study] The case study reports 'positive reception and clear value' from the game jam deployment without any quantitative metrics (e.g., number of pillar iterations, design coherence ratings, or player-experience outcomes), baseline comparisons to non-LLM pillar workflows, or details on how feedback was collected and coded. This leaves the central claim of meaningful contribution dependent on unquantified subjective impressions.

Authors: We acknowledge that the case study is qualitative and does not include quantitative metrics, baseline comparisons, or formal coding of feedback. The deployment prioritized naturalistic feedback from developers in a game jam setting over controlled measurements of design outcomes. Feedback was obtained through a post-jam questionnaire and informal discussions; we will revise the manuscript to describe the questionnaire and summarization process in detail. We did not collect data on pillar iterations, coherence ratings, or player-experience outcomes, as these require a different experimental design. We will add an explicit limitations discussion and future work directions addressing the value of such metrics and comparisons in subsequent studies. The practitioner impressions still offer relevant early support for the tool's utility in initial development phases.

Revision: partial

Referee: [Expert Interviews] The expert interviews with only four participants are described as yielding an 'overall perception [that] is encouraging,' yet no information is provided on expert selection criteria, interview protocol, specific tasks performed, or analysis method (e.g., thematic analysis). This makes it difficult to assess how strongly the sessions support generalizable utility for LLM pillar workflows.

Authors: We will revise the expert interviews section to supply the missing details. The four experts were selected for their professional game design experience and familiarity with pillars. Each session included a demonstration of SPINE, a hands-on task creating and iterating a sample pillar, and a semi-structured discussion. Analysis consisted of reviewing session notes to identify recurring themes in the feedback. We will include the selection criteria, protocol description, task details, and analysis approach in the revised manuscript. The limited sample size restricts generalizability, which we will state as a limitation, but the direct interaction provides targeted insights into practical utility.

Revision: yes

Referee: [Pre-study] The pre-study model selection concludes that Gemini is preferable due to 'greater output variety and consistency,' but no concrete metrics, example outputs, or scoring rubric are supplied to justify the choice over GPT-4o-mini or to allow replication of the selection process.

Authors: We will expand the pre-study section with the requested specifics to support replication. Both models were tested using the same pillar generation and refinement prompts. Variety was judged by the diversity of distinct concepts in the outputs, and consistency by adherence to prompt constraints across repeated runs. Gemini yielded more varied yet coherent results. The revised version will present the evaluation criteria, sample outputs from each model, and the prompts used. This addition will clarify the basis for selecting Gemini-2.0-flash.

Revision: yes
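
The pre-study criteria described in this response could be made measurable in several ways. The following is a minimal sketch under stated assumptions, not the authors' procedure: variety is approximated as the mean pairwise Jaccard distance between the word sets of repeated outputs, and consistency as the share of outputs that pass a constraint check. The function names, the word-set proxy for "distinct concepts", and the `is_continuous_text` constraint are all hypothetical choices made for illustration.

```python
import re
from itertools import combinations
from typing import Callable, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word set; a crude stand-in for 'distinct concepts' (assumption)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def variety(outputs: List[str]) -> float:
    """Mean pairwise Jaccard distance: 0.0 = identical word sets, 1.0 = disjoint."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        ta, tb = tokens(a), tokens(b)
        union = ta | tb
        similarity = len(ta & tb) / len(union) if union else 1.0
        total += 1.0 - similarity
    return total / len(pairs)


def consistency(outputs: List[str], constraint: Callable[[str], bool]) -> float:
    """Fraction of repeated runs whose output satisfies the given constraint."""
    if not outputs:
        return 0.0
    return sum(1 for o in outputs if constraint(o)) / len(outputs)


def is_continuous_text(description: str) -> bool:
    """Hypothetical constraint: no line starts with a bullet or a list number."""
    bullet = re.compile(r"\s*([-*•]|\d+[.)])\s+")
    return not any(bullet.match(line) for line in description.splitlines())


if __name__ == "__main__":
    # Placeholder strings, not outputs from either model in the paper.
    runs = ["fear the dark", "fear the deep", "- fear\n- dark"]
    print("variety:", variety(runs))
    print("consistency:", consistency(runs, is_continuous_text))
```

Word overlap is only a rough proxy: two outputs can share few words and still express the same concept, so a real replication would likely need human coding or a semantic similarity measure in place of `tokens`.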

standing simulated objections not resolved

Quantitative metrics (e.g., pillar iterations, design coherence ratings, or player-experience outcomes) and baseline comparisons for the case study cannot be provided, because these data were not collected during the original game jam deployment.

Circularity Check

0 steps flagged

No circularity: empirical qualitative evaluation with external feedback


The paper contains no mathematical derivations, equations, fitted parameters, or self-referential constructions. Its central claim rests on a pre-study model comparison, a game-jam deployment, and four expert interviews, all of which draw on external participant responses rather than any internal reduction of outputs to inputs. No load-bearing steps reduce by definition, by construction, or by self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This is an empirical qualitative study in HCI with no mathematical modeling. It relies on standard domain assumptions such as the validity of small-scale user feedback for indicating tool utility and the appropriateness of the chosen models and tasks.

axioms (1)

Domain assumption: Positive reception from game jam participants and four experts indicates meaningful contribution of LLMs to pillar workflows.
Invoked in the case study and interview sections to support the overall conclusion.

pith-pipeline@v0.9.0 · 5549 in / 1090 out tokens · 56187 ms · 2026-05-12T03:11:12.358102+00:00 · methodology

