pith. machine review for the scientific record.

arxiv: 2604.02545 · v1 · submitted 2026-04-02 · 💻 cs.AI

Recognition: no theorem link

Competency Questions as Executable Plans: a Controlled RAG Architecture for Cultural Heritage Storytelling

Pith reviewed 2026-05-13 20:52 UTC · model grok-4.3

classification 💻 cs.AI
keywords: competency questions · RAG · knowledge graphs · cultural heritage · storytelling · neuro-symbolic · Live Aid KG · auditable generation

The pith

Competency questions can be turned into runtime executable plans that control retrieval and generation in a knowledge-graph RAG system for cultural heritage stories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models tend to invent facts when asked to tell stories about cultural heritage. The paper shows how to avoid this by grounding generation in a knowledge graph and using competency questions not just for design checks but as active plans that specify exactly what to retrieve and narrate. This creates a transparent workflow where every part of the story stays linked to source evidence. Experiments on a new dataset built around the 1985 Live Aid concert compare three retrieval strategies and map the resulting differences in factual accuracy, contextual detail, and story flow.

Core claim

Repurposing competency questions as executable narrative plans bridges high-level user personas to atomic graph retrieval inside a plan-retrieve-generate workflow, so that every generated story remains evidence-closed and fully auditable.

What carries the argument

The plan-retrieve-generate workflow that executes competency questions as dynamic narrative plans to direct knowledge-graph retrieval.
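The workflow can be pictured with a minimal sketch. It is not the authors' implementation: the toy triples, the `CompetencyQuestion` class, the wildcard pattern format, and the template-based `generate` function (standing in for an LLM call) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

Triple = tuple[str, str, str]
Pattern = tuple[Optional[str], Optional[str], Optional[str]]

# Toy knowledge graph; predicate names are hypothetical, not from the Live Aid KG.
KG: set[Triple] = {
    ("Queen", "performedAt", "Live Aid"),
    ("Queen", "performed", "Hammer to Fall"),
    ("Live Aid", "heldIn", "1985"),
}


@dataclass(frozen=True)
class CompetencyQuestion:
    text: str         # natural-language question
    pattern: Pattern  # triple pattern, None acts as a wildcard


def retrieve(cq: CompetencyQuestion, kg: set[Triple]) -> list[Triple]:
    """Retrieve step: return every triple matching the CQ's pattern."""
    return sorted(
        t for t in kg
        if all(want is None or want == got for want, got in zip(cq.pattern, t))
    )


def generate(evidence: list[Triple]) -> str:
    """Generate step: a template stand-in for an LLM, verbalising only the evidence."""
    return " ".join(f"{s} {p} {o}." for s, p, o in evidence)


def run_plan(plan: list[CompetencyQuestion], kg: set[Triple]) -> list[dict]:
    """Plan step: execute the CQs in order, keeping the evidence next to each beat."""
    beats = []
    for cq in plan:
        evidence = retrieve(cq, kg)
        beats.append({"cq": cq.text, "evidence": evidence, "text": generate(evidence)})
    return beats


plan = [CompetencyQuestion("What did Queen perform?", ("Queen", "performed", None))]
beats = run_plan(plan, KG)
```

Because each beat stores the triples it was generated from, a reader can trace any sentence back to its evidence, which is the auditability property the paper claims.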

Load-bearing premise

Competency questions written for static validation can be executed directly as dynamic narrative plans at runtime without adaptation, error, or loss of evidence closure.

What would settle it

A generated story that includes a fact absent from the Live Aid knowledge graph after following its competency-question plan.
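Such a test can be phrased as a small audit, sketched below. It assumes the story's claims have already been extracted as triples (that extraction step is not shown and is itself error-prone); the function name `unsupported_claims` and the toy data are hypothetical.

```python
Triple = tuple[str, str, str]

# Toy knowledge graph; predicate names are hypothetical.
KG: set[Triple] = {
    ("Queen", "performedAt", "Live Aid"),
    ("Live Aid", "heldIn", "1985"),
}


def unsupported_claims(claims: list[Triple], kg: set[Triple]) -> list[Triple]:
    """Return the claims that have no matching triple in the knowledge graph."""
    return [c for c in claims if c not in kg]


story_claims = [
    ("Queen", "performedAt", "Live Aid"),  # backed by the graph
    ("Queen", "performedAt", "Festival X"),  # deliberately absent, for illustration
]
violations = unsupported_claims(story_claims, KG)
evidence_closed = not violations
```

A single non-empty `violations` list for a story produced under a competency-question plan would count against the evidence-closure claim.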

Figures

Figures reproduced from arXiv: 2604.02545 by Jacopo de Berardinis, Naga Sowjanya Barla.

Figure 1: Story construction as a list of beats based on a shared CQ pool. Each beat is ...
Figure 2: End-to-end execution of a narrative beat for the ...

Original abstract

The preservation of intangible cultural heritage is a critical challenge as collective memory fades over time. While Large Language Models (LLMs) offer a promising avenue for generating engaging narratives, their propensity for factual inaccuracies or "hallucinations" makes them unreliable for heritage applications where veracity is a central requirement. To address this, we propose a novel neuro-symbolic architecture grounded in Knowledge Graphs (KGs) that establishes a transparent "plan-retrieve-generate" workflow for story generation. A key novelty of our approach is the repurposing of competency questions (CQs) - traditionally design-time validation artifacts - into run-time executable narrative plans. This approach bridges the gap between high-level user personas and atomic knowledge retrieval, ensuring that generation is evidence-closed and fully auditable. We validate this architecture using a new resource: the Live Aid KG, a multimodal dataset aligning 1985 concert data with the Music Meta Ontology and linking to external multimedia assets. We present a systematic comparative evaluation of three distinct Retrieval-Augmented Generation (RAG) strategies over this graph: a purely symbolic KG-RAG, a text-enriched Hybrid-RAG, and a structure-aware Graph-RAG. Our experiments reveal a quantifiable trade-off between the factual precision of symbolic retrieval, the contextual richness of hybrid methods, and the narrative coherence of graph-based traversal. Our findings offer actionable insights for designing personalised and controllable storytelling systems.
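The three retrieval strategies named in the abstract can be contrasted on a toy graph. The sketch below is only one plausible reading of the labels "symbolic", "text-enriched", and "structure-aware"; the function names, the `NOTES` dictionary, and the hop-limited traversal are assumptions, not the paper's method.

```python
Triple = tuple[str, str, str]

# Toy data; predicate names and the note text are hypothetical.
KG: set[Triple] = {
    ("Queen", "performedAt", "Live Aid"),
    ("Queen", "performed", "Hammer to Fall"),
    ("Live Aid", "heldIn", "1985"),
}
NOTES: dict[str, list[str]] = {"Queen": ["(free-text passage about Queen)"]}


def kg_rag(entity: str, kg: set[Triple]) -> list[Triple]:
    """Symbolic retrieval: only triples whose subject is the entity."""
    return sorted(t for t in kg if t[0] == entity)


def hybrid_rag(entity: str, kg: set[Triple], notes: dict[str, list[str]]) -> dict:
    """Text-enriched retrieval: the same triples plus any attached text passages."""
    return {"triples": kg_rag(entity, kg), "text": notes.get(entity, [])}


def graph_rag(entity: str, kg: set[Triple], hops: int = 2) -> list[Triple]:
    """Structure-aware retrieval: follow outgoing edges for a fixed number of hops."""
    frontier, seen, found = {entity}, {entity}, set()
    for _ in range(hops):
        step = {t for t in kg if t[0] in frontier}
        found |= step
        frontier = {t[2] for t in step} - seen  # expand only to unvisited nodes
        seen |= frontier
    return sorted(found)


symbolic = kg_rag("Queen", KG)
hybrid = hybrid_rag("Queen", KG, NOTES)
traversal = graph_rag("Queen", KG)
```

Even on this toy graph the contexts differ: the symbolic call returns only direct facts, the hybrid call adds unstructured text, and the traversal also reaches the concert's year through a second hop.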

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a neuro-symbolic RAG architecture for cultural heritage storytelling that repurposes competency questions (traditionally static validation artifacts) as runtime executable narrative plans within a plan-retrieve-generate workflow. It introduces the Live Aid KG (a multimodal dataset aligned with the Music Meta Ontology) and reports a comparative evaluation of three RAG strategies—purely symbolic KG-RAG, text-enriched Hybrid-RAG, and structure-aware Graph-RAG—claiming quantifiable trade-offs in factual precision, contextual richness, and narrative coherence while ensuring evidence-closed and auditable generation.

Significance. If the mapping from static CQs to dynamic plans is formally specified with explicit conversion rules and the evaluation supplies verifiable quantitative metrics, the work could provide a practical framework for controllable, persona-driven storytelling systems in heritage domains that require high veracity, bridging symbolic KG validation with LLM generation.

major comments (2)
  1. [Abstract] Abstract: the comparative evaluation of the three RAG strategies is described as revealing 'quantifiable trade-offs' but supplies no metrics, statistical tests, baseline details, or results tables, which is load-bearing for the central validation claim and prevents assessment of whether the trade-offs hold.
  2. [Architecture] Architecture description (plan-retrieve-generate workflow): the key novelty of directly executing competency questions as narrative plans lacks any explicit conversion rules, sequencing logic, or persona-parameterization mechanism that would maintain bijective linkage to KG triples; without this, the 'evidence-closed and fully auditable' guarantee cannot be verified and may be violated by implicit steps.
minor comments (2)
  1. [Abstract] Abstract: the Live Aid KG is presented as a new resource but no details are given on its scale, construction process, or specific alignment procedure with the Music Meta Ontology.
  2. [Abstract] The abstract refers to 'high-level user personas' but does not indicate how these are formally represented or linked to the CQ parameterization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve clarity and verifiability of the claims.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the comparative evaluation of the three RAG strategies is described as revealing 'quantifiable trade-offs' but supplies no metrics, statistical tests, baseline details, or results tables, which is load-bearing for the central validation claim and prevents assessment of whether the trade-offs hold.

    Authors: We agree that the abstract should explicitly reference the quantitative results to support the trade-off claim. The full manuscript includes evaluation tables with metrics (factual precision, contextual richness scores, narrative coherence ratings), baseline comparisons, and statistical tests. We will revise the abstract to include key figures, such as specific precision gains for KG-RAG and coherence advantages for Graph-RAG with associated p-values, making the validation immediately assessable. revision: yes

  2. Referee: [Architecture] Architecture description (plan-retrieve-generate workflow): the key novelty of directly executing competency questions as narrative plans lacks any explicit conversion rules, sequencing logic, or persona-parameterization mechanism that would maintain bijective linkage to KG triples; without this, the 'evidence-closed and fully auditable' guarantee cannot be verified and may be violated by implicit steps.

    Authors: We acknowledge that the architecture description would benefit from greater formal detail. While the manuscript describes the high-level plan-retrieve-generate workflow and the repurposing of CQs, it does not supply the explicit conversion rules or sequencing logic. We will add a new subsection with formal conversion rules, dependency-based sequencing, and persona-parameterization mappings that preserve bijective linkage to KG triples, supported by pseudocode and examples to verify the evidence-closed property. revision: yes
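The dependency-based sequencing and persona parameterization promised in the second response could look roughly like the sketch below. It uses Python's standard `graphlib.TopologicalSorter`; the CQ identifiers, question templates, dependency table, and persona names are all hypothetical and are not drawn from the manuscript.

```python
from graphlib import TopologicalSorter

# Hypothetical shared CQ pool with template parameters.
CQ_POOL = {
    "cq_event": "When and where did {event} take place?",
    "cq_lineup": "Which artists performed at {event}?",
    "cq_setlist": "Which songs did {artist} perform at {event}?",
}

# Hypothetical dependencies: a CQ runs only after the CQs it depends on.
DEPENDS_ON = {
    "cq_event": set(),
    "cq_lineup": {"cq_event"},
    "cq_setlist": {"cq_lineup"},
}

# Hypothetical personas, each selecting a subset of the pool.
PERSONAS = {
    "casual": ["cq_event", "cq_lineup"],
    "music_fan": ["cq_event", "cq_lineup", "cq_setlist"],
}


def build_plan(persona: str, params: dict[str, str]) -> list[tuple[str, str]]:
    """Select the persona's CQs, order them by dependency, and fill in parameters."""
    selected = set(PERSONAS[persona])
    # Keep only dependencies that are themselves part of this persona's plan.
    graph = {cq: DEPENDS_ON[cq] & selected for cq in selected}
    order = TopologicalSorter(graph).static_order()
    return [(cq, CQ_POOL[cq].format(**params)) for cq in order]


plan = build_plan("music_fan", {"event": "Live Aid", "artist": "Queen"})
```

Keeping the CQ identifier next to each instantiated question is one way to preserve a traceable link from persona to plan step; whether that link is bijective with KG triples depends on the conversion rules the authors have yet to specify.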

Circularity Check

0 steps flagged

No significant circularity; architecture is a novel workflow proposal without self-referential reductions.

Full rationale

The paper introduces a plan-retrieve-generate RAG architecture by explicitly repurposing competency questions as runtime narrative plans, validated via comparative experiments on the new Live Aid KG resource. No equations, fitted parameters, or load-bearing self-citations appear in the derivation; the central claim is an architectural design choice presented as such, with independent content in the symbolic/hybrid/graph RAG evaluation. The workflow does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper relies on standard assumptions in KG and RAG research without introducing new free parameters or entities.

axioms (2)
  • Domain assumption: knowledge graphs can provide verifiable facts for story generation.
    Assumed in the neuro-symbolic approach.
  • Domain assumption: LLMs can generate coherent narratives from retrieved evidence.
    Core to the generate step.

pith-pipeline@v0.9.0 · 8262 in / 932 out tokens · 60557 ms · 2026-05-13T20:52:10.279957+00:00 · methodology

