A new 263k TTCW-annotated story dataset shows non-reasoning fine-tuning of Qwen3 models outperforms reasoning-supervised fine-tuning for fixed-format long-form literary review generation.
S tory W ars: A Dataset and Instruction Tuning Baselines for Collaborative Story Understanding and Generation
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3representative citing papers
NARRA-Gym is an executable benchmark that generates complete interactive narrative episodes from emotional seeds and logs full model trajectories to expose gaps in coherence, adaptation, and personalization that static story tests miss.
CRGC models instructions as constraint graphs, identifies bridge constraints, and cuts violations by 39% on three datasets while preserving reasoning performance.
citing papers explorer
-
When Reasoning Supervision Hurts: TTCW-Based Long-Form Literary Review Generation
A new 263k TTCW-annotated story dataset shows non-reasoning fine-tuning of Qwen3 models outperforms reasoning-supervised fine-tuning for fixed-format long-form literary review generation.
-
NARRA-Gym for Evaluating Interactive Narrative Agents
NARRA-Gym is an executable benchmark that generates complete interactive narrative episodes from emotional seeds and logs full model trajectories to expose gaps in coherence, adaptation, and personalization that static story tests miss.
-
Bridging Auxiliary Constraints to Resolve Instruction Following in Large Reasoning Models
CRGC models instructions as constraint graphs, identifies bridge constraints, and cuts violations by 39% on three datasets while preserving reasoning performance.