pith. machine review for the scientific record. sign in

arxiv: 2605.03244 · v1 · submitted 2026-05-05 · 💻 cs.CL · cs.AI

Recognition: unknown

S²tory: Story Spine Distillation for Movie Script Summarization

Authors on Pith no claims yet

Pith reviewed 2026-05-07 16:55 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords movie script summarizationnarratologyplot nucleicharacter developmentstory distillationnon-linear narrativessemantic fidelitynarrative expert agent
0
0 comments X

The pith

Movie script summarization improves by distilling narratological theory to identify plot nuclei from character development trajectories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Movie scripts challenge standard summarization because their non-linear, cross-cut structure defeats surface-level methods that pick salient sentences. The paper shows that tracking how characters develop across the script reliably isolates the core events that advance the story, called plot nuclei, while discarding atmosphere-building details. An expert reasoning agent applies this theory to produce distilled guidance that conditions a smaller model to generate the final summary. The result is summaries that keep semantic meaning intact at high compression ratios and transfer to other narrative domains without retraining. This matters because it offers a concrete way to embed literary structure into AI systems for handling complex stories.

Core claim

The central claim is that non-linear movie scripts can be summarized effectively by first using character development trajectories to locate plot nuclei—the essential events that drive narrative progression—via theory-constrained reasoning performed by a Narrative Expert Agent, whose output then conditions a compact model to produce the summary.

What carries the argument

Story Spine Distillation, which extracts plot nuclei by analyzing character development trajectories and uses a Narrative Expert Agent to perform theory-constrained reasoning that conditions the summarization model.

If this is right

  • Summaries achieve state-of-the-art semantic fidelity at roughly 3.5 times compression on movie script data.
  • The same trained components generalize in zero-shot fashion to book-length narratives.
  • Human judges rate the outputs as preserving complex story structure more faithfully when narratological constraints are applied.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same trajectory-based filtering could be tested on other non-linear formats such as episodic television or interactive stories.
  • Distilling expert narrative reasoning this way may let smaller models handle creative generation tasks with less domain-specific data.
  • If the approach holds, literary theory could become a standard conditioning layer for AI systems that process long-form fiction or drama.

Load-bearing premise

Character development trajectories can be tracked reliably enough to separate the essential plot-driving events from peripheral ones in non-linear scripts.

What would settle it

Human raters on a new set of movie scripts score summaries generated without character-trajectory analysis as equal or better in plot coherence and completeness than those produced by the full method.

Figures

Figures reproduced from arXiv: 2605.03244 by Jiarui Zhang, Jiayue Wu, Mingzhe Lu, Qihao Wang, Yanbing Liu, Yangyan Xu, Yue Hu, Yunpeng Li.

Figure 1
Figure 1. Figure 1: The S2 tory framework leverages character-guided reasoning to identify plot nuclei and filter satellites, thereby distilling expert knowledge to effectively condition abstractive summarization. 4 Methodology In S2 tory, NEAgent models characters, infers plot nuclei under narrative con￾straints, and distills this reasoning into a nuclei-conditioned backbone for sum￾marization. The overall process is illustr… view at source ↗
Figure 2
Figure 2. Figure 2: Scene-length distribution: Original(Blue) vs. Nuclei(Orange) view at source ↗
read the original abstract

Movie scripts pose a fundamental challenge for automatic summarization due to their non-linear, cross-cut narrative structure, which makes surface-level saliency methods ineffective at preserving core story progression. To address this, we introduce S^2tory (Story Spine Distillation), a narratology-grounded framework that leverages character development trajectories to identify plot nuclei, the essential events that drive the narrative forward, while filtering out peripheral satellite events that merely enrich atmosphere or emotion. Our Narrative Expert Agent (NEAgent) performs theory-constrained reasoning, whose distilled knowledge conditions a small model to identify plot nuclei. Another model then uses these plot nuclei to generate the summary. Experiments on the MovieSum dataset demonstrate state-of-the-art semantic fidelity at approximately 3.5x compression, and zero-shot evaluation on BookSum confirms strong out-of-domain generalization. Human evaluation further validates that narratological theory provides an indispensable foundation for modeling complex, non-linear narratives.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces S^2tory, a narratology-grounded framework for movie script summarization. It identifies plot nuclei (core driving events) from character development trajectories using a Narrative Expert Agent (NEAgent) that performs theory-constrained reasoning to distill knowledge; this conditions a small model for nuclei detection, after which a second model generates the summary. The authors claim state-of-the-art semantic fidelity on MovieSum at ~3.5x compression, strong zero-shot generalization on BookSum, and human evaluations confirming that narratological theory is an indispensable foundation for non-linear narratives.

Significance. If the performance claims and the necessity of the theoretical component hold, the work offers a principled way to move beyond surface saliency for complex narratives, with potential impact on story understanding and summarization tasks. The zero-shot transfer result is a notable strength if replicated.

major comments (2)
  1. [§5 Experiments and §5.2 Results] §5 Experiments and §5.2 Results: The SOTA semantic fidelity claim at 3.5x compression is not supported by explicit metric definitions (e.g., which semantic similarity measure, exact baselines, data splits, or error bars), preventing verification of the central performance assertions.
  2. [§4.3 and §5.3] §4.3 and §5.3: No ablation studies isolate the contribution of narratological theory and the NEAgent's theory-constrained reasoning from the two-stage architecture, generic multi-step prompting, or LLM capabilities alone. This directly undermines the human-evaluation claim that the theory is an 'indispensable foundation,' as alternative saliency or trajectory-free methods could produce equivalent results.
minor comments (2)
  1. [Abstract and §3] Abstract and §3: The compression ratio is stated as 'approximately 3.5x' without detailing the exact length-ratio calculation or per-script variance.
  2. [§3.2] §3.2: The conditioning mechanism from NEAgent output to the small model would benefit from pseudocode or an explicit diagram to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment point by point below, agreeing where clarifications and additional experiments are warranted, and outlining specific revisions to strengthen the empirical support for our claims.

read point-by-point responses
  1. Referee: [§5 Experiments and §5.2 Results] The SOTA semantic fidelity claim at 3.5x compression is not supported by explicit metric definitions (e.g., which semantic similarity measure, exact baselines, data splits, or error bars), preventing verification of the central performance assertions.

    Authors: We acknowledge that the experimental reporting in §5.2 requires greater explicitness to enable verification. The manuscript references semantic fidelity metrics and 3.5x compression on MovieSum but does not enumerate the precise similarity measure (e.g., BERTScore F1 or embedding cosine), full baseline configurations, exact data splits, or error bars. In the revised version we will expand §5.2 with a dedicated subsection detailing all metrics, listing every baseline with hyperparameters, specifying the MovieSum train/validation/test splits, and adding error bars or confidence intervals to the reported results. This will directly support the SOTA claim without altering the underlying experiments. revision: yes

  2. Referee: [§4.3 and §5.3] No ablation studies isolate the contribution of narratological theory and the NEAgent's theory-constrained reasoning from the two-stage architecture, generic multi-step prompting, or LLM capabilities alone. This directly undermines the human-evaluation claim that the theory is an 'indispensable foundation,' as alternative saliency or trajectory-free methods could produce equivalent results.

    Authors: We agree that the absence of targeted ablations limits the strength of the claim that narratological theory is indispensable. While the human evaluations in §5.3 contrast S^2tory against non-narratological baselines and the NEAgent is described as theory-constrained, we did not include direct ablations that remove the theory component or replace it with generic multi-step prompting. In the revision we will add these ablation experiments to §5.3, comparing the full model against variants that omit character-trajectory modeling or use standard chain-of-thought prompting without narratological constraints. The human evaluation discussion will be updated to reflect the new results, allowing us to either reinforce or appropriately qualify the statement on the theory's foundational role. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on experiments rather than definitional reduction

full rationale

The paper proposes S^2tory as a narratology-grounded framework that uses character development trajectories and NEAgent theory-constrained reasoning to distill plot nuclei for summarization. No equations, derivations, or self-citations appear in the provided text that would make any prediction or result equivalent to its inputs by construction. Performance claims (SOTA semantic fidelity at 3.5x compression on MovieSum, zero-shot on BookSum) and the human-evaluation validation of narratological theory as indispensable are presented as outcomes of the two-stage architecture and experiments, not as tautologies or fitted parameters renamed as predictions. This is a standard empirical method paper whose central claims remain independent of the listed circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

Only the abstract is available, so the ledger reflects high-level elements described there. The central claim rests on narratological assumptions and the effectiveness of the introduced agent and distillation process.

axioms (1)
  • domain assumption Narratological theory can identify plot nuclei via character development trajectories while distinguishing them from peripheral satellite events
    The framework is explicitly narratology-grounded and uses this distinction to filter events.
invented entities (2)
  • Narrative Expert Agent (NEAgent) no independent evidence
    purpose: Performs theory-constrained reasoning to distill knowledge that conditions a small model for plot nuclei identification
    New component introduced in the abstract to bridge theory and model training.
  • Plot nuclei no independent evidence
    purpose: Essential events that drive the narrative forward, identified from character trajectories
    Core concept the summarization pipeline depends on.

pith-pipeline@v0.9.0 · 5482 in / 1487 out tokens · 52544 ms · 2026-05-07T16:55:18.297028+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

24 extracted references · 5 canonical work pages · 2 internal anchors

  1. [1]

    New literary history6(2), 237–272 (1975)

    Barthes, R., Duisit, L.: An introduction to the structural analysis of narrative. New literary history6(2), 237–272 (1975)

  2. [2]

    Longformer: The Long-Document Transformer

    Beltagy, I., Peters, M.E., Cohan, A.: Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150 (2020)

  3. [3]

    Brahman, F.: Modeling Key Narrative Elements for Story Understanding and Gen- eration. Ph.D. thesis, University of California, Santa Cruz (2022)

  4. [4]

    Cornell university press (1978)

    Chatman, S.B., Chatman, S.: Story and discourse: Narrative structure in fiction and film. Cornell university press (1978)

  5. [5]

    In: Proceedings of the 60th Annual Meeting of the AssociationforComputationalLinguistics(Volume1:LongPapers).pp.8602–8615 (2022) 12 Authors Suppressed Due to Excessive Length

    Chen, M., Chu, Z., Wiseman, S., Gimpel, K.: Summscreen: A dataset for abstrac- tive screenplay summarization. In: Proceedings of the 60th Annual Meeting of the AssociationforComputationalLinguistics(Volume1:LongPapers).pp.8602–8615 (2022) 12 Authors Suppressed Due to Excessive Length

  6. [6]

    In: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

    Chitale, M.P., Bindal, U., Rajkumar, R.P., Mishra, R.: Discograms: Enhancing movie screen-play summarization using movie character-aware discourse graph. In: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers). pp. 954–965 (2025)

  7. [7]

    In: Findings of the Association for Computational Linguistics: NAACL 2022

    Guo, M., Ainslie, J., Uthus, D.C., Ontanon, S., Ni, J., Sung, Y.H., Yang, Y.: Longt5: Efficient text-to-text transformer for long sequences. In: Findings of the Association for Computational Linguistics: NAACL 2022. pp. 724–736 (2022)

  8. [8]

    arXiv preprint arXiv:2405.10860 (2024)

    Huang, Z., Zhao, J., Jin, Q.: Ecr-chain: Advancing generative language mod- els to better emotion-cause reasoners through reasoning chains. arXiv preprint arXiv:2405.10860 (2024)

  9. [9]

    In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

    Jiayang,C.,Qiu,L.,Chan,C.,Liu,X.,Song,Y.,Zhang,Z.:Eventground:Narrative reasoning by grounding to eventuality-centric knowledge graphs. In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). pp. 6622–6642 (2024)

  10. [10]

    In: Findings of the as- sociation for computational linguistics: EMNLP 2022

    Kryściński, W., Rajani, N., Agarwal, D., Xiong, C., Radev, D.: Booksum: A col- lection of datasets for long-form narrative summarization. In: Findings of the as- sociation for computational linguistics: EMNLP 2022. pp. 6536–6558 (2022)

  11. [11]

    Cognitive science5(4), 293–331 (1981)

    Lehnert, W.G.: Plot units and narrative summarization. Cognitive science5(4), 293–331 (1981)

  12. [12]

    arXiv preprint arXiv:2410.19809 (2024)

    Mahon, L., Lapata, M.: Screenwriter: Automatic screenplay generation and movie summarisation. arXiv preprint arXiv:2410.19809 (2024)

  13. [13]

    A Survey of Context Engineering for Large Language Models

    Mei, L., Yao, J., Ge, Y., Wang, Y., Bi, B., Cai, Y., Liu, J., Li, M., Li, Z.Z., Zhang, D., et al.: A survey of context engineering for large language models. arXiv preprint arXiv:2507.13334 (2025)

  14. [14]

    In: Proceedings of the 2004 conference on empirical methods in natural language processing

    Mihalcea, R., Tarau, P.: Textrank: Bringing order into text. In: Proceedings of the 2004 conference on empirical methods in natural language processing. pp. 404–411 (2004)

  15. [15]

    In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

    Phang, J., Zhao, Y., Liu, P.J.: Investigating efficiently extending transformers for long input summarization. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. pp. 3946–3961 (2023)

  16. [16]

    In: Bobrow, D.G., Collins, A

    Rumelhart, D.E.: Notes on a schema for stories. In: Bobrow, D.G., Collins, A. (eds.) Representation and Understanding: Studies in Cognitive Science (1975)

  17. [17]

    In: Findings of the Association for Computational Linguistics: ACL

    Saxena, R., Keller, F.: Moviesum: An abstractive summarization dataset for movie screenplays. In: Findings of the Association for Computational Linguistics: ACL

  18. [18]

    4043–4050 (2024)

    pp. 4043–4050 (2024)

  19. [19]

    In: Findings of the Association for Computational Linguistics: NAACL

    Saxena, R., Keller, F.: Select and summarize: Scene saliency for movie script sum- marization. In: Findings of the Association for Computational Linguistics: NAACL

  20. [20]

    3439–3455 (2024)

    pp. 3439–3455 (2024)

  21. [21]

    Psychology press (2013)

    Schank, R.C., Abelson, R.P.: Scripts, plans, goals, and understanding: An inquiry into human knowledge structures. Psychology press (2013)

  22. [22]

    13 EmbeddingGemma: Powerful and Lightweight Text Representations G

    Tay, Y., Dehghani, M., Tran, V.Q., Garcia, X., Wei, J., Wang, X., Chung, H.W., Shakeri, S., Bahri, D., Schuster, T., et al.: Ul2: Unifying language learn- ing paradigms. arXiv preprint arXiv:2205.05131 (2022)

  23. [23]

    Advances in neural information processing systems33, 17283–17297 (2020)

    Zaheer, M., Guruganesh, G., Dubey, K.A., Ainslie, J., Alberti, C., Ontanon, S., Pham, P., Ravula, A., Wang, Q., Yang, L., et al.: Big bird: Transformers for longer sequences. Advances in neural information processing systems33, 17283–17297 (2020)

  24. [24]

    Advances in neural information processing systems36, 46595–46623 (2023)

    Zheng, L., Chiang, W.L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., et al.: Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in neural information processing systems36, 46595–46623 (2023)