AI Native Games: A Survey and Roadmap

Clark Verbrugge; Fandi Meng; Jian Zhao; Kaijie Xu; Simon Lucas; Zhiyue Xu

arxiv: 2607.00527 · v1 · pith:4ELUVY7Bnew · submitted 2026-07-01 · 💻 cs.AI

Zhiyue Xu, Fandi Meng, Kaijie Xu, Clark Verbrugge, Simon Lucas, Jian Zhao

Pith reviewed 2026-07-02 12:57 UTC · model grok-4.3

keywords AI-native games · generative AI · game design · core gameplay loop · taxonomy · narrative games · procedural generation · mechanical invariants

The pith

AI-native games are those where runtime generative AI is essential to the core play loop, as removing it would collapse or alter the central form of play.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines AI-native games by a counterfactual test: runtime generative AI must be constitutive of the core loop, such that its removal or trivial replacement would make the central form of play collapse or change fundamentally. This criterion is used to screen candidates and analyze 53 publicly available games and prototypes. A dual-axis taxonomy is introduced to classify them by game type on one axis and dominant AI mechanic on the other. The analysis shows concentration in language-forward designs, identifies the problem of turning semantic openness into stable gameplay, and provides a roadmap for future work.

Core claim

AI-native games are defined by whether runtime generative AI is constitutive of the core loop, separated from AI-augmented games and other forms by the test that removing or trivially replacing the AI component would collapse or fundamentally change the central form of play. Screening yields 53 examples that cluster around language-forward designs such as narrative adventure, epistemic interaction, and generative narrative, while other categories remain less represented. The central design problem is organizing semantic openness into stable gameplay through mechanical invariants: goals, rules, state, feedback, pacing, and player agency.

What carries the argument

The counterfactual criterion that determines whether runtime generative AI is constitutive of the core loop by checking if its removal or trivial replacement would collapse or fundamentally change the central form of play.
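
The criterion can be read as a two-part check on one analyst's judgments. The Python sketch below is illustrative only: the `Judgment` fields and `is_ai_native` are hypothetical names, not the authors' procedure, and it assumes the "removed or trivially replaced" wording means play must fail under both interventions. The inputs remain human judgment calls; the code only makes the logic explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """One analyst's answers for one artifact (all fields are hypothetical)."""
    name: str
    uses_runtime_generation: bool          # generative AI runs during play
    core_loop_survives_removal: bool       # central play still works with the AI removed
    core_loop_survives_trivial_swap: bool  # central play still works with a static or rule-based stand-in


def is_ai_native(j: Judgment) -> bool:
    """Counterfactual test, assuming both interventions must break the core loop."""
    if not j.uses_runtime_generation:
        return False
    return not (j.core_loop_survives_removal or j.core_loop_survives_trivial_swap)


# Placeholder artifacts, not entries from the paper's corpus.
artifact_a = Judgment("Artifact A", True, False, False)  # AI carries the loop
artifact_b = Judgment("Artifact B", True, False, True)   # a scripted stand-in would do
print(is_ai_native(artifact_a), is_ai_native(artifact_b))
```

Under this reading, an artifact whose play survives a scripted stand-in counts as AI-augmented rather than AI-native, even if simply deleting the AI component would break it.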

If this is right

The current corpus is concentrated in language-forward designs such as narrative adventure and generative narrative.
Categories such as semantic adjudication, multi-agent simulation, generative construction, and relationship play are underrepresented.
Mechanical invariants (goals, rules, state, feedback, pacing, and player agency) are required to make open-ended AI outputs interpretable and consequential.
Development priorities include controllable generation, AI-as-mechanic design, multimodal and multi-agent systems, inference economics, evaluation, safety, and regulation.
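
One way to picture the mechanical-invariants point is a fixed rule layer that sits between open-ended model output and game state. The sketch below is an assumption-laden illustration, not a design taken from the paper: `GameState`, `ALLOWED_EFFECTS`, and `apply_proposal` are hypothetical, and the `proposal` dict stands in for whatever structured output a generative model would return.

```python
from dataclasses import dataclass

# Hypothetical rule layer: the only effects the game will ever apply.
ALLOWED_EFFECTS = ("damage", "heal")
MAX_MAGNITUDE = 10
MAX_HP = 20


@dataclass
class GameState:
    player_hp: int = MAX_HP
    enemy_hp: int = MAX_HP
    turn: int = 0


def apply_proposal(state: GameState, proposal: dict) -> str:
    """Apply a model-proposed effect only if it fits the fixed rules.

    Returns the feedback line shown to the player.
    """
    state.turn += 1  # pacing: every attempt costs one turn, valid or not
    effect = proposal.get("effect")
    magnitude = proposal.get("magnitude")
    if effect not in ALLOWED_EFFECTS or type(magnitude) is not int:
        return "Nothing happens."  # out-of-rules output leaves HP untouched
    magnitude = max(0, min(magnitude, MAX_MAGNITUDE))  # rules: bounded effect size
    if effect == "damage":
        dealt = min(magnitude, state.enemy_hp)
        state.enemy_hp -= dealt
        return f"Enemy takes {dealt} damage."
    healed = min(magnitude, MAX_HP - state.player_hp)
    state.player_hp += healed
    return f"You recover {healed} HP."
```

However unusual the generated proposal, the state can only move inside bounds the designer fixed in advance, and every attempt yields a feedback line the player can interpret.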

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The definition could guide developers on when generative AI should be deeply integrated rather than added as a feature.
The focus on semantic openness may apply to interactive systems outside games that rely on runtime generation.
The roadmap's emphasis on inference economics points to practical barriers for widespread adoption of these games.
Safety and regulation issues could affect public deployment of games that depend on open-ended AI outputs.

Load-bearing premise

The counterfactual test of removing or trivially replacing the AI can be applied consistently across games without subjective judgment or selection bias.

What would settle it

A set of games where independent analysts reach conflicting conclusions about whether removing the generative AI changes the core play loop would show the test cannot be applied reliably.
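
Disagreement of this kind is usually quantified with a chance-corrected agreement statistic. The sketch below computes Cohen's kappa for two analysts' AI-native verdicts; it is a standard measure brought in here for illustration, not something the paper reports, and the verdict lists in the usage lines are invented placeholders.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Chance-corrected agreement between two raters over the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Share of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Placeholder verdicts (True = "AI-native") for four hypothetical artifacts.
analyst_1 = [True, True, False, False]
analyst_2 = [True, False, False, False]
print(cohens_kappa(analyst_1, analyst_2))  # 0.5
```

Values near 1 would suggest the test is being applied consistently; values near 0 would indicate agreement no better than chance.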

Figures

Figures reproduced from arXiv: 2607.00527 by Clark Verbrugge, Fandi Meng, Jian Zhao, Kaijie Xu, Simon Lucas, Zhiyue Xu.

**Figure 1.** Representative roadmap of AI-native games and adjacent artifacts from early interactive drama to runtime generative AI systems.

**Figure 2.** Distribution of the dated corpus (n=53) by game type (G).

**Figure 3.** Distribution of the dated corpus (n=53) by dominant AI mechanic

**Figure 4.** Cross matrix of game type (G, columns) and dominant AI mechanic (N, rows) over the 53 dated artifacts. Cell values are game counts;
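
The cross matrix in Figure 4 is a plain two-way tally of (game type, AI mechanic) pairs. A minimal sketch of that tally follows; the `cross_matrix` helper and the labels `G1`, `G2`, `N1`, `N2` are placeholders, and the four sample assignments are made up rather than drawn from the 53-artifact corpus.

```python
from collections import Counter


def cross_matrix(assignments, g_labels, n_labels):
    """Count artifacts per (G, N) pair; rows follow n_labels, columns follow g_labels."""
    counts = Counter(assignments)
    return [[counts[(g, n)] for g in g_labels] for n in n_labels]


# Placeholder (game type, AI mechanic) assignments for four hypothetical artifacts.
sample = [("G1", "N1"), ("G1", "N1"), ("G2", "N1"), ("G2", "N2")]
matrix = cross_matrix(sample, g_labels=["G1", "G2"], n_labels=["N1", "N2"])
for n_label, row in zip(["N1", "N2"], matrix):
    print(n_label, row)
```

Because each artifact receives exactly one G label and one dominant N label, the cells sum to the corpus size, which gives a quick consistency check on any published matrix.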

Original abstract

Generative AI now enables games to produce dialogue, quests, characters, images, and worlds at runtime. Yet generation alone does not make a game AI-native, nor does it guarantee playability. This paper defines AI-native games by whether runtime generative AI is constitutive of the core loop: if the AI component were removed or trivially replaced, the central form of play would collapse or become fundamentally different. This counterfactual criterion separates AI-native games from AI-augmented games, boundary artifacts, chatbots, tavern-style role-play, procedural content generation, and AI-assisted production. Using this definition, we screen candidate artifacts and analyze 53 publicly available AI-native games and prototypes. We introduce a dual-axis G/N taxonomy: the G-axis captures player-facing game type, while the N-axis captures the dominant AI mechanic that makes generative AI indispensable to play. The corpus is concentrated around language-forward designs, especially narrative adventure, epistemic interaction, and generative narrative, while categories such as semantic adjudication, multi-agent simulation, generative construction, and relationship/companion play remain less represented. We argue that the central design problem is organizing semantic openness into stable gameplay. AI-native design depends on mechanical invariants: goals, rules, state, feedback, pacing, and player agency that make open-ended AI outputs interpretable and consequential. We conclude with a roadmap for controllable generation, AI-as-mechanic design, multimodal and multi-agent systems, inference economics, evaluation, safety, and regulation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper defines AI-native games via a counterfactual on the core loop and maps 53 examples with a G/N taxonomy, but the test's application looks subjective and the screening details are thin.

The main point is that this survey carves out AI-native games as those where runtime generative AI is essential to the central play experience. Remove or trivially swap the AI and the game changes in a fundamental way. They apply this filter to 53 publicly available examples and lay out a dual-axis taxonomy: one axis for the player-facing game type, the other for the dominant AI mechanic that makes the generation indispensable.

It organizes the space reasonably by noting the heavy concentration in language-based narrative and epistemic designs while flagging under-represented areas like multi-agent simulation and generative construction. The discussion of mechanical invariants—goals, rules, feedback, agency—needed to keep open-ended outputs playable is a useful reminder for designers. The roadmap sections on controllable generation, inference costs, and safety are direct and practical.

The soft spot is the counterfactual itself. Figuring out what counts as the core loop and whether a replacement is trivial seems to rest on judgment calls that the abstract does not operationalize with explicit, reproducible rules. In a corpus of 53 games this risks inconsistent assignments or selection effects, and the screening process is not described enough to check for bias. As a survey it introduces no new measurements or derivations, which is expected but limits how much weight the taxonomy can carry without further validation.

This is for game researchers and designers working on generative systems who want a current map and a set of open problems. A reader looking for terminology and example clusters would get value from it. It deserves peer review because the framing could help structure work in the area, though the classification criteria would need tightening.

Referee Report

2 major / 2 minor

Summary. The paper defines AI-native games via a counterfactual test: runtime generative AI is constitutive of the core loop if its removal or trivial replacement would collapse or fundamentally alter the central form of play. This separates AI-native games from AI-augmented ones and related categories. The authors apply the definition to screen and analyze a corpus of 53 publicly available games/prototypes, introduce a dual-axis G/N taxonomy (G-axis for player-facing game type, N-axis for dominant AI mechanic), observe concentration in language-forward designs (narrative adventure, epistemic interaction, generative narrative), and outline a roadmap addressing controllable generation, AI-as-mechanic design, multimodal/multi-agent systems, inference economics, evaluation, safety, and regulation.

Significance. If the definition and taxonomy can be applied reproducibly, the work supplies a needed conceptual boundary for an emerging subfield, grounds it in an empirical corpus of 53 artifacts, and identifies the core design challenge of turning semantic openness into stable, interpretable gameplay. The roadmap surfaces concrete open problems (e.g., mechanical invariants for agency and feedback) that could guide subsequent technical and design research.

major comments (2)

[Definition and screening process] (abstract; the section introducing the counterfactual criterion): the test for whether generative AI is 'constitutive of the core loop' is not accompanied by explicit, reproducible operational criteria for identifying the core loop, assessing 'trivial replacement,' or determining when play 'collapses or becomes fundamentally different.' This judgment is load-bearing for the classification of all 53 games and for the subsequent claim that the corpus is concentrated in particular G/N categories.
[Corpus construction and taxonomy application] (the section describing the 53-game analysis and G/N taxonomy): no details are provided on how boundary cases were resolved, whether multiple annotators were used, or what inter-rater agreement was obtained when assigning games to G-axis and N-axis categories. Without such information the reported concentration around language-forward designs cannot be evaluated for selection or assignment bias.

minor comments (2)

[Abstract / screening description] The abstract states that the definition 'separates AI-native games from AI-augmented games, boundary artifacts, chatbots, tavern-style role-play, procedural content generation, and AI-assisted production,' but the manuscript does not include a dedicated table or appendix listing the screened-but-excluded candidates and the reasons for exclusion.
[Taxonomy introduction] Notation for the G/N taxonomy is introduced without an accompanying figure that visually maps the two axes and populates them with example games from the corpus.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important opportunities to strengthen the reproducibility of our definition and analysis. We address each major comment below and commit to revisions that improve transparency without altering the core contributions.

Referee: [Definition and screening process] (abstract; the section introducing the counterfactual criterion): the test for whether generative AI is 'constitutive of the core loop' is not accompanied by explicit, reproducible operational criteria for identifying the core loop, assessing 'trivial replacement,' or determining when play 'collapses or becomes fundamentally different.' This judgment is load-bearing for the classification of all 53 games and for the subsequent claim that the corpus is concentrated in particular G/N categories.

Authors: We agree that the counterfactual criterion would benefit from explicit operational criteria to support reproducible application. The manuscript presents the definition conceptually, which is typical for introducing a new category in a survey, but the referee correctly identifies that this leaves room for subjective judgment in screening. In the revised manuscript we will expand the relevant section with operational guidelines: (1) core loop identification via primary player actions, goals, and win/lose conditions; (2) trivial replacement assessment by testing whether static or rule-based substitutes preserve equivalent play dynamics; (3) collapse determination via loss of essential features such as runtime semantic adaptation or branching. We will include worked examples from the corpus to illustrate application.

Revision: yes

Referee: [Corpus construction and taxonomy application] (the section describing the 53-game analysis and G/N taxonomy): no details are provided on how boundary cases were resolved, whether multiple annotators were used, or what inter-rater agreement was obtained when assigning games to G-axis and N-axis categories. Without such information the reported concentration around language-forward designs cannot be evaluated for selection or assignment bias.

Authors: We acknowledge the lack of methodological detail on the annotation process. Screening and G/N assignment were conducted by the author team via iterative discussion and consensus, with boundary cases resolved by returning to the counterfactual test. No formal multi-annotator protocol or inter-rater statistics were used, as this is an author-driven survey rather than a crowdsourced annotation effort. In revision we will add a dedicated 'Screening and Taxonomy Methodology' subsection describing the process, boundary-case resolution examples, the complete classification table for all 53 artifacts, and an explicit discussion of potential biases. The full list of games and assignments will be released publicly to enable external verification. While we cannot retroactively compute inter-rater agreement, these additions will substantially improve evaluability of the concentration claims.

Revision: partial

Circularity Check

0 steps flagged

No circularity: purely definitional survey with no derivations, fits, or self-referential predictions

The paper advances a counterfactual definition of AI-native games and applies it to classify a corpus of 53 examples. No equations, fitted parameters, or predictive claims appear; the work is classificatory and taxonomic. The definition is stated explicitly rather than derived from prior results, and the screening process is presented as an application of that definition without reduction to self-citation chains or constructed inputs. This matches the default expectation of no significant circularity for non-mathematical survey papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper that introduces a definition and taxonomy based on analysis of existing games. No free parameters, mathematical axioms, or invented entities are present.

Reference graph

Works this paper leans on

104 extracted references · 65 canonical work pages · 27 internal anchors

[1]

Shaker, J

N. Shaker, J. Togelius, and M. J. Nelson,Procedural Content Generation in Games. Springer, 2016. [Online]. Available: https://link.springer.com/book/10.1007/978-3-319-42716-4

work page doi:10.1007/978-3-319-42716-4 2016
[2]

Large language models and games: A survey and roadmap,

R. Gallotta, G. Todd, M. Zammit, S. Earle, A. Liapis, J. Togelius, and G. N. Yannakakis, “Large language models and games: A survey and roadmap,”IEEE Transactions on Games,
[3]

Available: https://ieeexplore.ieee.org/abstract/ document/10680313/

[Online]. Available: https://ieeexplore.ieee.org/abstract/ document/10680313/

work page arXiv
[4]

Gpt for games: A scoping review (2020–2023),

D. Yang, E. Kleinman, and C. Harteveld, “Gpt for games: A scoping review (2020–2023),” in2024 IEEE Conference on Games. IEEE, 2024, pp. 1–8. [Online]. Available: https: //ieeexplore.ieee.org/abstract/document/10645548/

work page arXiv 2020
[5]

Procedural content generation in games: A survey with insights on emerging llm integration,

M. F. Maleki and R. Zhao, “Procedural content generation in games: A survey with insights on emerging llm integration,” inProceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, vol. 20, no. 1, 2024, pp. 167–178

2024
[6]

Language as reality: A co-creative storytelling game experience in 1001 nights using generative ai,

Y. Sun, Z. Li, K. Fang, C. H. Lee, and A. Asadipour, “Language as reality: A co-creative storytelling game experience in 1001 nights using generative ai,” 2023

2023
[7]

Mda: A formal approach to game design and game research,

R. Hunicke, M. LeBlanc, and R. Zubek, “Mda: A formal approach to game design and game research,” inProceedings of the AAAI Workshop on Challenges in Game AI, 2004. [Online]. Available: https://users.cs.northwestern.edu/~hunicke/MDA.pdf

2004
[8]

Gameflow: A model for evaluating player enjoyment in games,

P . Sweetser and P . Wyeth, “Gameflow: A model for evaluating player enjoyment in games,”Computers in Entertainment, vol. 3, no. 3, pp. 3–3, 2005

2005
[9]

Large language models and video games: A prelim- inary scoping review,

P . Sweetser, “Large language models and video games: A prelim- inary scoping review,” inACM Conversational User Interfaces 2024. ACM, 2024, pp. 1–8

2024
[10]

G. N. Yannakakis and J. Togelius,Artificial Intelligence and Games. Springer, 2018. [Online]. Available: https://link.springer.com/ book/10.1007/978-3-319-63519-4

work page doi:10.1007/978-3-319-63519-4 2018
[11]

On mixed-initiative content creation for video games,

G. Lai, F. F. Leymarie, and W. Latham, “On mixed-initiative content creation for video games,”IEEE Transactions on Games, vol. 14, no. 4, pp. 543–557, 2022

2022
[12]

Gpt for games: An updated scoping review (2020–2024),

D. Yang, E. Kleinman, and C. Harteveld, “Gpt for games: An updated scoping review (2020–2024),”IEEE Transactions on Games,

2020
[13]

Available: https://ieeexplore.ieee.org/abstract/ document/10974629/

[Online]. Available: https://ieeexplore.ieee.org/abstract/ document/10974629/

work page arXiv
[14]

Millington and J

I. Millington and J. Funge,Artificial Intelligence for Games, 2nd ed. Morgan Kaufmann, 2009. [Online]. Available: https://www.routledge.com/Artificial-Intelligence-for-Games/ Millington-Funge/p/book/9780123747310

work page arXiv 2009
[15]

The case for dynamic difficulty adjustment in games,

R. Hunicke, “The case for dynamic difficulty adjustment in games,” inProceedings of the International Conference on Advances in Computer Entertainment Technology, 2005, pp. 429–433

2005
[16]

Interactive narrative: An intelligent systems approach,

M. O. Riedl and V . Bulitko, “Interactive narrative: An intelligent systems approach,”AI Magazine, vol. 34, no. 1, pp. 67–77, 2013. [Online]. Available: https://ojs.aaai.org/aimagazine/index.php/ aimagazine/article/view/2449

2013
[17]

Structuring content in the façade interactive drama architecture,

M. Mateas and A. Stern, “Structuring content in the façade interactive drama architecture,” inProceedings of the First AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. AAAI Press, 2005, pp. 93–98. [Online]. Available: https://cdn.aaai.org/AIIDE/2005/AIIDE05-016.pdf

2005
[18]

A behavior language for story-based believable agents,

——, “A behavior language for story-based believable agents,” inIEEE Intelligent Systems, 2002. [Online]. Available: https: //doi.org/10.1109/MIS.2002.1024751 13

work page doi:10.1109/mis.2002.1024751 2002
[19]

Search-based procedural content generation: A taxonomy and survey,

J. Togelius, G. N. Yannakakis, K. O. Stanley, and C. Browne, “Search-based procedural content generation: A taxonomy and survey,” inIEEE Transactions on Computational Intelligence and AI in Games, vol. 3, no. 3, 2011, pp. 172–186

2011
[20]

Procedural content generation for games: A survey,

M. Hendrikx, S. Meijer, J. V . D. Velden, and A. Iosup, “Procedural content generation for games: A survey,”ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 9, no. 1, pp. 1–22, 2013

2013
[21]

Procedural Content Generation via Machine Learning (PCGML)

A. Summerville, S. Snodgrass, M. Guzdial, C. Holmgård, A. K. Hoover, A. Isaksen, A. Nealen, and J. Togelius, “Procedural content generation via machine learning (pcgml),” 2018. [Online]. Available: https://arxiv.org/abs/1702.00539

work page internal anchor Pith review Pith/arXiv arXiv 2018
[22]

Experience-driven procedu- ral content generation,

G. N. Yannakakis and J. Togelius, “Experience-driven procedu- ral content generation,”IEEE Transactions on Affective Computing, vol. 2, no. 3, pp. 147–161, 2011

2011
[23]

The ai systems of left 4 dead,

Valve, “The ai systems of left 4 dead,” 2009, valve developer materials on the AI Director and pacing systems. [Online]. Available: https://steamcdn-a.akamaihd.net/apps/valve/2009/ ai_systems_of_l4d_mike_booth.pdf

2009
[24]

Expressive ai: A hybrid art and science practice,

M. Mateas, “Expressive ai: A hybrid art and science practice,” Leonardo, vol. 34, no. 2, pp. 137–139, 2001. [Online]. Available: https://doi.org/10.1162/002409401750184690

work page doi:10.1162/002409401750184690 2001
[25]

Façade: An experiment in building a fully-realized interactive drama,

M. Mateas and A. Stern, “Façade: An experiment in building a fully-realized interactive drama,” inGame Developers Conference, 2003

2003
[26]

Narrative planning: Balancing plot and character,

M. O. Riedl and R. M. Young, “Narrative planning: Balancing plot and character,”Journal of Artificial Intelligence Research, vol. 39, pp. 217–268, 2010. [Online]. Available: https://doi.org/10.1613/jair. 2989

work page doi:10.1613/jair 2010
[27]

Pcg-based game design patterns,

M. Cook, M. Eladhari, A. Nealen, M. Treanor, E. Boxerman, A. Jaffe, P . Sottosanti, and S. Swink, “Pcg-based game design patterns,” 2016. [Online]. Available: https://arxiv.org/abs/1610. 03138

2016
[28]

Level generation through large language models,

G. Todd, S. Earle, M. U. Nasir, M. C. Green, and J. Togelius, “Level generation through large language models,” inProceedings of the International Conference on the Foundations of Digital Games,
[29]

Available: https://arxiv.org/abs/2302.05817

[Online]. Available: https://arxiv.org/abs/2302.05817

work page arXiv
[30]

Mariogpt: Open-ended text2level generation through large language models,

S. Sudhakaran, M. González-Duque, C. Glanois, M. Freiberger, E. Najarro, and S. Risi, “Mariogpt: Open-ended text2level generation through large language models,” 2023. [Online]. Available: https://arxiv.org/abs/2302.05981

work page arXiv 2023
[31]

Ai dungeon 2,

Latitude, “Ai dungeon 2,” 2019. [Online]. Available: https: //github.com/latitudegames/AIDungeon

2019
[32]

Infinite craft,

N. Agarwal, “Infinite craft,” 2024. [Online]. Available: https: //neal.fun/infinite-craft/

2024
[33]

Generative agents: Interactive simulacra of human behavior,

J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P . Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” 2023. [Online]. Available: https://arxiv.org/abs/2304. 03442

2023
[34]

Player-driven emergence in llm-driven game narrative,

X. Peng, J. Quaye, S. Rao, W. Xu, P . Botchway, C. Brockett, N. Jojic, G. DesGarennes, K. Lobb, M. Xu, J. Leandro, C. Jin, and B. Dolan, “Player-driven emergence in llm-driven game narrative,” 2024. [Online]. Available: https://arxiv.org/abs/2404.17027

work page arXiv 2024
[35]

Hacc-man: An arcade game for jailbreaking llms,

M. Valentim, J. Falk, and N. Inie, “Hacc-man: An arcade game for jailbreaking llms,” inDesigning Interactive Systems Conference, ser. DIS ’24. ACM, 2024, p. 338–341. [Online]. Available: http://dx.doi.org/10.1145/3656156.3665432

work page doi:10.1145/3656156.3665432 2024
[36]

Exploring presence in interactions with llm-driven npcs: A comparative study of speech recognition and dialogue options,

F. R. Christiansen, L. N. Hollensberg, N. B. Jensen, K. Julsgaard, K. N. Jespersen, and I. Nikolov, “Exploring presence in interactions with llm-driven npcs: A comparative study of speech recognition and dialogue options,” inProceedings of the 30th ACM Symposium on Virtual Reality Software and Technology, ser. VRST ’24. New York, NY, USA: Association for ...

work page doi:10.1145/3641825.3687716 2024
[37]

Uncover the smoking gun,

ReLU Games, “Uncover the smoking gun,” 2024. [On- line]. Available: https://store.steampowered.com/app/2492290/ Uncover_the_Smoking_Gun/

work page arXiv 2024
[38]

friendsfables,

Side Quest Labs, “friendsfables,” 2023. [Online]. Available: https://fables.gg/

2023
[39]

gandalf,

Lakera, “gandalf,” 2023. [Online]. Available: https://gandalf. lakera.ai/baseline

2023
[40]

Historical simulator:chongzhen,

Qinggan Workshop, “Historical simulator:chongzhen,” 2026. [On- line]. Available: https://store.steampowered.com/app/4304230/

work page arXiv 2026
[41]

aivilization,

HKUST, “aivilization,” 2025. [Online]. Available: https:// aivilization.ai

2025
[42]

artimpostor,

Pocketpair, “artimpostor,” 2022. [Online]. Available: https: //store.steampowered.com/app/2154230/

work page arXiv 2022
[43]

More than words,

Soul Shell, “More than words,” 2023. [Online]. Available: https: //store.steampowered.com/app/2285280/More_than_words/

work page arXiv 2023
[44]

devnullstower,

aieuo, “devnullstower,” 2026. [Online]. Available: https://store. steampowered.com/app/4350940/Dev_Nulls_Tower/

work page arXiv 2026
[45]

Why video game genres fail: A classificatory analysis,

R. Clarke, J. Lee, and N. Clark, “Why video game genres fail: A classificatory analysis,”Games and Culture, vol. 12, 07 2015

2015
[46]

Using thematic analysis in psychology,

V . Braun and V . Clarke, “Using thematic analysis in psychology,” Qualitative Research in Psychology, vol. 3, no. 2, pp. 77–101, 2006. [Online]. Available: https://doi.org/10.1191/1478088706qp063oa

work page doi:10.1191/1478088706qp063oa 2006
[47]

Vaudeville,

Bumblebee Studios, “Vaudeville,” 2023. [Online]. Available: https://store.steampowered.com/app/2240920/Vaudeville/

work page arXiv 2023
[48]

[Online]

Proxima, “suckup,” 2023. [Online]. Available: https://store. steampowered.com/app/2726370/Suck_Up/

work page arXiv 2023
[49]

Hidden door,

Hidden Door, “Hidden door,” 2025. [Online]. Available: https://www.hiddendoor.co/

2025
[50]

High-resolution image synthesis with latent diffusion models,

R. Rombach, A. Blattmann, D. Lorenz, P . Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,”
[51]

High-Resolution Image Synthesis with Latent Diffusion Models

[Online]. Available: https://arxiv.org/abs/2112.10752

work page internal anchor Pith review Pith/arXiv arXiv
[52]

Robust Speech Recognition via Large-Scale Weak Supervision

A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, “Robust speech recognition via large- scale weak supervision,” 2022. [Online]. Available: https: //arxiv.org/abs/2212.04356

work page internal anchor Pith review Pith/arXiv arXiv 2022
[53]

whispersstar,

Anuttacon, “whispersstar,” 2025. [Online]. Avail- able: https://store.steampowered.com/app/3730100/Whispers_ from_the_Star/

work page arXiv 2025
[54]

Self-refine: Iterative refinement with self-feedback,

A. Madaan, N. Tandon, P . Gupta, S. Hallinan, L. Gao, S. Wiegreffe, U. Alon, N. Dziri, S. Prabhumoye, Y. Yang, S. Gupta, B. P . Majumder, K. Hermann, S. Welleck, A. Yazdanbakhsh, and P . Clark, “Self-refine: Iterative refinement with self-feedback,”
[55]

Self-Refine: Iterative Refinement with Self-Feedback

[Online]. Available: https://arxiv.org/abs/2303.17651

work page internal anchor Pith review Pith/arXiv arXiv
[56]

Efficient Guided Generation for Large Language Models

B. T. Willard and R. Louf, “Efficient guided generation for large language models,” 2023. [Online]. Available: https: //arxiv.org/abs/2307.09702

work page internal anchor Pith review Pith/arXiv arXiv 2023
[57]

Grammar- constrained decoding for structured nlp tasks without finetuning,

S. Geng, M. Josifoski, M. Peyrard, and R. West, “Grammar- constrained decoding for structured nlp tasks without finetuning,”
[58]

Available: https://arxiv.org/abs/2305.13971

[Online]. Available: https://arxiv.org/abs/2305.13971

work page arXiv
[59]

Toolformer: Language Models Can Teach Themselves to Use Tools

T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, L. Zettlemoyer, N. Cancedda, and T. Scialom, “Toolformer: Language models can teach themselves to use tools,” 2023. [Online]. Available: https://arxiv.org/abs/2302.04761

work page internal anchor Pith review Pith/arXiv arXiv 2023
[60]

React: Synergizing reasoning and acting in language models,

S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, “React: Synergizing reasoning and acting in language models,” 2023. [Online]. Available: https://arxiv.org/abs/2210. 03629

2023
[61]

Answer set programming for procedural content generation: A design space approach,

A. M. Smith and M. Mateas, “Answer set programming for procedural content generation: A design space approach,”IEEE Transactions on Computational Intelligence and AI in Games, vol. 3, no. 3, pp. 187–200, 2011

2011
[62]

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

P . Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. tau Yih, T. Rocktäschel, S. Riedel, and D. Kiela, “Retrieval-augmented generation for knowledge-intensive nlp tasks,” 2021. [Online]. Available: https://arxiv.org/abs/2005.11401

work page internal anchor Pith review Pith/arXiv arXiv 2021
[63]

Training language models to follow instructions with human feedback

L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P . Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P . Welinder, P . Christiano, J. Leike, and R. Lowe, “Training language models to follow instructions with human feedback,” 2022. [Online]. Available: https://arxiv.org/abs/2203.02155

work page internal anchor Pith review Pith/arXiv arXiv 2022
[64]

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

R. Rafailov, A. Sharma, E. Mitchell, S. Ermon, C. D. Manning, and C. Finn, “Direct preference optimization: Your language model is secretly a reward model,” 2024. [Online]. Available: https://arxiv.org/abs/2305.18290

work page internal anchor Pith review Pith/arXiv arXiv 2024
[65]

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. P . Xing, H. Zhang, J. E. Gonzalez, and I. Stoica, “Judging llm-as-a-judge with mt-bench and chatbot arena,” 2023. [Online]. Available: https: //arxiv.org/abs/2306.05685

[66] J. Gray, A. Lerer, A. Bakhtin, and N. Brown, “Human-level performance in no-press Diplomacy via equilibrium search,” 2021. [Online]. Available: https://arxiv.org/abs/2010.02923

[67] YenR, “OneSpellFitsAll,” 2024. [Online]. Available: https://github.com/YenR/OneSpellFitsAll

[68] G. Wang, Y. Xie, Y. Jiang, A. Mandlekar, C. Xiao, Y. Zhu, L. Fan, and A. Anandkumar, “Voyager: An open-ended embodied agent with large language models,” 2023. [Online]. Available: https://arxiv.org/abs/2305.16291

[69] C. Packer, S. Wooders, K. Lin, V. Fang, S. G. Patil, I. Stoica, and J. E. Gonzalez, “MemGPT: Towards LLMs as operating systems,” [Online]. Available: https://arxiv.org/abs/2310.08560

[71] b8ve, “AI Society,” 2026. [Online]. Available: https://store.steampowered.com/app/4468180/AI_Society/

[72] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” 2015. [Online]. Available: https://arxiv.org/abs/1503.02531

[73] E. Frantar, S. Ashkboos, T. Hoefler, and D. Alistarh, “GPTQ: Accurate post-training quantization for generative pre-trained transformers,” 2023. [Online]. Available: https://arxiv.org/abs/2210.17323

[74] T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, “QLoRA: Efficient finetuning of quantized LLMs,” 2023. [Online]. Available: https://arxiv.org/abs/2305.14314

[75] Y. Leviathan, M. Kalman, and Y. Matias, “Fast inference from transformers via speculative decoding,” 2023. [Online]. Available: https://arxiv.org/abs/2211.17192

[76] W. Kwon, Z. Li, S. Zhuang, Y. Sheng, L. Zheng, C. H. Yu, J. E. Gonzalez, H. Zhang, and I. Stoica, “Efficient memory management for large language model serving with PagedAttention,” 2023. [Online]. Available: https://arxiv.org/abs/2309.06180

[77] Max Loh, “AI Roguelite,” 2024. [Online]. Available: https://store.steampowered.com/app/1889620/AI_Roguelite/

[78] Fenris Labs, “Skaldsong,” 2025. [Online]. Available: https://store.steampowered.com/app/3808550/Skaldsong/

[79] L. Chen, M. Zaharia, and J. Zou, “How is ChatGPT’s behavior changing over time?” 2023. [Online]. Available: https://arxiv.org/abs/2307.09009

[80] P. Liang, R. Bommasani, T. Lee, D. Tsipras, D. Soylu, M. Yasunaga, Y. Zhang, D. Narayanan, Y. Wu, A. Kumar, B. Newman, B. Yuan, B. Yan, C. Zhang, C. Cosgrove, C. D. Manning, C. Ré, D. Acosta-Navas, D. A. Hudson, E. Zelikman, E. Durmus, F. Ladhak, F. Rong, H. Ren, H. Yao, J. Wang, K. Santhanam, L. Orr, L. Zheng, M. Yuksekgonul, M. Suzgun, N. Kim, N. Guha,... “Holistic evaluation of language models,” 2023
