AI Native Games: A Survey and Roadmap
Pith reviewed 2026-07-02 12:57 UTC · model grok-4.3
The pith
AI-native games are those where runtime generative AI is essential to the core play loop, as removing it would collapse or alter the central form of play.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AI-native games are defined by whether runtime generative AI is constitutive of the core loop, separated from AI-augmented games and other forms by the test that removing or trivially replacing the AI component would collapse or fundamentally change the central form of play. Screening yields 53 examples that cluster around language-forward designs such as narrative adventure, epistemic interaction, and generative narrative, while other categories remain less represented. The central design problem is organizing semantic openness into stable gameplay through mechanical invariants of goals, rules, state, feedback, pacing, and player agency.
What carries the argument
The counterfactual criterion that determines whether runtime generative AI is constitutive of the core loop by checking if its removal or trivial replacement would collapse or fundamentally change the central form of play.
If this is right
- The current corpus is concentrated in language-forward designs such as narrative adventure and generative narrative.
- Categories such as semantic adjudication, multi-agent simulation, generative construction, and relationship play are underrepresented.
- Mechanical invariants of goals, rules, state, feedback, pacing, and player agency are required to make open-ended AI outputs interpretable and consequential.
- Development priorities include controllable generation, AI-as-mechanic design, multimodal and multi-agent systems, inference economics, evaluation, safety, and regulation.
Where Pith is reading between the lines
- The definition could guide developers on when generative AI should be deeply integrated rather than added as a feature.
- The focus on semantic openness may apply to interactive systems outside games that rely on runtime generation.
- The roadmap's emphasis on inference economics points to practical barriers for widespread adoption of these games.
- Safety and regulation issues could affect public deployment of games that depend on open-ended AI outputs.
Load-bearing premise
The counterfactual test of removing or trivially replacing the AI can be applied consistently across games without subjective judgment or selection bias.
What would settle it
A set of games where independent analysts reach conflicting conclusions about whether removing the generative AI changes the core play loop would show the test cannot be applied reliably.
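One way to make "conflicting conclusions" measurable is a chance-corrected agreement statistic over the analysts' verdicts. The sketch below computes Cohen's kappa for two analysts. It is an illustration only: the paper reports no such statistic, and the verdict lists, the label strings, and the function name `cohens_kappa` are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items on which both raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each rater labelled independently at their own base rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical verdicts from two analysts applying the counterfactual test
# to the same six games (not data from the paper).
analyst_1 = ["native", "native", "augmented", "native", "augmented", "augmented"]
analyst_2 = ["native", "augmented", "augmented", "native", "native", "augmented"]
kappa = cohens_kappa(analyst_1, analyst_2)
print(f"kappa = {kappa:.3f}")
```

Values near 1 would indicate that the test is being applied consistently; values near 0 would indicate agreement no better than chance, which is the failure case described above.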
Original abstract
Generative AI now enables games to produce dialogue, quests, characters, images, and worlds at runtime. Yet generation alone does not make a game AI-native, nor does it guarantee playability. This paper defines AI-native games by whether runtime generative AI is constitutive of the core loop: if the AI component were removed or trivially replaced, the central form of play would collapse or become fundamentally different. This counterfactual criterion separates AI-native games from AI-augmented games, boundary artifacts, chatbots, tavern-style role-play, procedural content generation, and AI-assisted production. Using this definition, we screen candidate artifacts and analyze 53 publicly available AI-native games and prototypes. We introduce a dual-axis G/N taxonomy: the G-axis captures player-facing game type, while the N-axis captures the dominant AI mechanic that makes generative AI indispensable to play. The corpus is concentrated around language-forward designs, especially narrative adventure, epistemic interaction, and generative narrative, while categories such as semantic adjudication, multi-agent simulation, generative construction, and relationship/companion play remain less represented. We argue that the central design problem is organizing semantic openness into stable gameplay. AI-native design depends on mechanical invariants: goals, rules, state, feedback, pacing, and player agency that make open-ended AI outputs interpretable and consequential. We conclude with a roadmap for controllable generation, AI-as-mechanic design, multimodal and multi-agent systems, inference economics, evaluation, safety, and regulation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper defines AI-native games via a counterfactual test: runtime generative AI is constitutive of the core loop if its removal or trivial replacement would collapse or fundamentally alter the central form of play. This separates AI-native games from AI-augmented ones and related categories. The authors apply the definition to screen and analyze a corpus of 53 publicly available games/prototypes, introduce a dual-axis G/N taxonomy (G-axis for player-facing game type, N-axis for dominant AI mechanic), observe concentration in language-forward designs (narrative adventure, epistemic interaction, generative narrative), and outline a roadmap addressing controllable generation, AI-as-mechanic design, multimodal/multi-agent systems, inference economics, evaluation, safety, and regulation.
Significance. If the definition and taxonomy can be applied reproducibly, the work supplies a needed conceptual boundary for an emerging subfield, grounds it in an empirical corpus of 53 artifacts, and identifies the core design challenge of turning semantic openness into stable, interpretable gameplay. The roadmap surfaces concrete open problems (e.g., mechanical invariants for agency and feedback) that could guide subsequent technical and design research.
major comments (2)
- [Definition and screening process] (Abstract; the section introducing the counterfactual criterion.) The test for whether generative AI is 'constitutive of the core loop' is not accompanied by explicit, reproducible operational criteria for identifying the core loop, assessing 'trivial replacement,' or determining when play 'collapses or becomes fundamentally different.' This judgment is load-bearing for the classification of all 53 games and for the subsequent claim that the corpus is concentrated in particular G/N categories.
- [Corpus construction and taxonomy application] (The section describing the 53-game analysis and G/N taxonomy.) No details are provided on how boundary cases were resolved, whether multiple annotators were used, or what inter-rater agreement was obtained when assigning games to G-axis and N-axis categories. Without such information the reported concentration around language-forward designs cannot be evaluated for selection or assignment bias.
minor comments (2)
- [Abstract / screening description] The abstract states that the definition 'separates AI-native games from AI-augmented games, boundary artifacts, chatbots, tavern-style role-play, procedural content generation, and AI-assisted production,' but the manuscript does not include a dedicated table or appendix listing the screened-but-excluded candidates and the reasons for exclusion.
- [Taxonomy introduction] Notation for the G/N taxonomy is introduced without an accompanying figure that visually maps the two axes and populates them with example games from the corpus.
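A dual-axis mapping of the kind requested in the last comment can be prototyped as a simple cross-tabulation before any figure is drawn. The sketch below counts games per (G, N) cell and renders a text grid. All game names and the labels `G1`, `G2`, `N1`, `N2` are hypothetical placeholders; the paper's actual category assignments are not reproduced on this page.

```python
from collections import Counter

# Hypothetical (game, G-axis label, N-axis label) assignments, for illustration only.
assignments = [
    ("Game A", "G1", "N1"),
    ("Game B", "G1", "N1"),
    ("Game C", "G1", "N2"),
    ("Game D", "G2", "N2"),
]


def cross_tab(rows):
    """Count how many games fall into each (G, N) cell."""
    cells = Counter((g, n) for _, g, n in rows)
    g_labels = sorted({g for _, g, _ in rows})
    n_labels = sorted({n for _, _, n in rows})
    return g_labels, n_labels, cells


def render(g_labels, n_labels, cells):
    """Render the counts as a plain-text grid, G-axis as rows and N-axis as columns."""
    lines = [" " * 6 + " ".join(f"{n:>4}" for n in n_labels)]
    for g in g_labels:
        counts = " ".join(f"{cells[(g, n)]:>4}" for n in n_labels)
        lines.append(f"{g:<6}" + counts)
    return "\n".join(lines)


g_labels, n_labels, cells = cross_tab(assignments)
print(render(g_labels, n_labels, cells))
```

Empty cells show up as zeros, which makes underrepresented combinations visible at a glance.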
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important opportunities to strengthen the reproducibility of our definition and analysis. We address each major comment below and commit to revisions that improve transparency without altering the core contributions.
- Referee: [Definition and screening process] (Abstract; the section introducing the counterfactual criterion.) The test for whether generative AI is 'constitutive of the core loop' is not accompanied by explicit, reproducible operational criteria for identifying the core loop, assessing 'trivial replacement,' or determining when play 'collapses or becomes fundamentally different.' This judgment is load-bearing for the classification of all 53 games and for the subsequent claim that the corpus is concentrated in particular G/N categories.
  Authors: We agree that the counterfactual criterion would benefit from explicit operational criteria to support reproducible application. The manuscript presents the definition conceptually, which is typical for introducing a new category in a survey, but the referee correctly identifies that this leaves room for subjective judgment in screening. In the revised manuscript we will expand the relevant section with operational guidelines: (1) core loop identification via primary player actions, goals, and win/lose conditions; (2) trivial replacement assessment by testing whether static or rule-based substitutes preserve equivalent play dynamics; (3) collapse determination via loss of essential features such as runtime semantic adaptation or branching. We will include worked examples from the corpus to illustrate application.
  Revision: yes
- Referee: [Corpus construction and taxonomy application] (The section describing the 53-game analysis and G/N taxonomy.) No details are provided on how boundary cases were resolved, whether multiple annotators were used, or what inter-rater agreement was obtained when assigning games to G-axis and N-axis categories. Without such information the reported concentration around language-forward designs cannot be evaluated for selection or assignment bias.
  Authors: We acknowledge the lack of methodological detail on the annotation process. Screening and G/N assignment were conducted by the author team via iterative discussion and consensus, with boundary cases resolved by returning to the counterfactual test. No formal multi-annotator protocol or inter-rater statistics were used, as this is an author-driven survey rather than a crowdsourced annotation effort. In revision we will add a dedicated 'Screening and Taxonomy Methodology' subsection describing the process, boundary-case resolution examples, the complete classification table for all 53 artifacts, and an explicit discussion of potential biases. The full list of games and assignments will be released publicly to enable external verification. While we cannot retroactively compute inter-rater agreement, these additions will substantially improve evaluability of the concentration claims.
  Revision: partial
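The three operational guidelines proposed in the first response could be recorded per game as a small screening record, so that each verdict is traceable to explicit answers. The sketch below is one possible encoding, not the authors' protocol: the class name `ScreeningRecord`, its fields, the decision rule in `classify`, and the example entries are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScreeningRecord:
    """One analyst's answers for one game (hypothetical schema)."""
    title: str
    core_loop: str                   # guideline 1: primary actions, goals, win/lose conditions
    substitute_preserves_play: bool  # guideline 2: does a static or rule-based stand-in play the same?
    essential_features_lost: List[str] = field(default_factory=list)  # guideline 3


def classify(record: ScreeningRecord) -> str:
    """Assumed decision rule: AI-native only if no trivial substitute works
    and removing the AI loses at least one named essential feature."""
    if record.substitute_preserves_play:
        return "not AI-native"
    if record.essential_features_lost:
        return "AI-native"
    return "boundary case"  # substitute fails, but no concrete loss was identified


# Hypothetical examples, not entries from the paper's corpus.
example_native = ScreeningRecord(
    title="Example interrogation game",
    core_loop="question a suspect in free text to extract a confession",
    substitute_preserves_play=False,
    essential_features_lost=["runtime semantic adaptation"],
)
example_augmented = ScreeningRecord(
    title="Example shooter with generated barks",
    core_loop="aim, shoot, and survive waves of enemies",
    substitute_preserves_play=True,
)
print(classify(example_native), "/", classify(example_augmented))
```

Recording the answers rather than only the verdict would also give independent analysts something concrete to compare when they disagree.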
Circularity Check
No circularity: purely definitional survey with no derivations, fits, or self-referential predictions
The paper advances a counterfactual definition of AI-native games and applies it to classify a corpus of 53 examples. No equations, fitted parameters, or predictive claims appear; the work is classificatory and taxonomic. The definition is stated explicitly rather than derived from prior results, and the screening process is presented as an application of that definition without reduction to self-citation chains or constructed inputs. This matches the default expectation of no significant circularity for non-mathematical survey papers.