International Conference on Learning Representations
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
Evaluation artifacts substantially inflate the measured unsolvability ceiling in multi-LLM routing, leading to distorted router training and overstated headroom.
The Generalized Turing Test defines relative intelligence as the inability of one agent to distinguish an imitator from the original through interaction.
citing papers explorer
- NARRA-Gym for Evaluating Interactive Narrative Agents
  NARRA-Gym is an executable benchmark that generates complete interactive narrative episodes from emotional seeds and logs full model trajectories to expose gaps in coherence, adaptation, and personalization that static story tests miss.
- Unsolvability Ceiling in Multi-LLM Routing: An Empirical Study of Evaluation Artifacts
  Evaluation artifacts substantially inflate the measured unsolvability ceiling in multi-LLM routing, leading to distorted router training and overstated headroom.
- The Generalized Turing Test: A Foundation for Comparing Intelligence
  The Generalized Turing Test defines relative intelligence as the inability of one agent to distinguish an imitator from the original through interaction.