arxiv: 2604.02545 · v1 · submitted 2026-04-02 · 💻 cs.AI


Competency Questions as Executable Plans: a Controlled RAG Architecture for Cultural Heritage Storytelling

Naga Sowjanya Barla, Jacopo de Berardinis


Pith reviewed 2026-05-13 20:52 UTC · model grok-4.3

classification 💻 cs.AI

keywords competency questions · RAG · knowledge graphs · cultural heritage · storytelling · neuro-symbolic · Live Aid KG · auditable generation


The pith

Competency questions can be turned into runtime executable plans that control retrieval and generation in a knowledge-graph RAG system for cultural heritage stories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models tend to invent facts when asked to tell stories about cultural heritage. The paper shows how to avoid this by grounding generation in a knowledge graph and using competency questions not just for design checks but as active plans that specify exactly what to retrieve and narrate. This creates a transparent workflow where every part of the story stays linked to source evidence. Experiments on a new dataset built around the 1985 Live Aid concert compare three retrieval strategies and map the resulting differences in factual accuracy, contextual detail, and story flow.

Core claim

Repurposing competency questions as executable narrative plans bridges high-level user personas to atomic graph retrieval inside a plan-retrieve-generate workflow, so that every generated story remains evidence-closed and fully auditable.

What carries the argument

The plan-retrieve-generate workflow that executes competency questions as dynamic narrative plans to direct knowledge-graph retrieval.
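To make the workflow concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that a competency question can be written as a triple pattern with `None` as a wildcard, and it replaces the LLM with a template that verbalises only the retrieved triples. The toy `KG`, `PlanStep`, `retrieve`, `generate`, and `run_plan` are hypothetical names introduced for illustration, and the predicates are not taken from the Music Meta Ontology.

```python
from dataclasses import dataclass
from typing import Optional

Triple = tuple[str, str, str]
Pattern = tuple[Optional[str], Optional[str], Optional[str]]

# Toy knowledge graph for illustration only (not the Live Aid KG).
KG: frozenset[Triple] = frozenset({
    ("Queen", "performedAt", "Live Aid"),
    ("Queen", "performedSong", "Hammer to Fall"),
    ("Live Aid", "tookPlaceIn", "1985"),
})


@dataclass(frozen=True)
class PlanStep:
    """One competency question expressed as a triple pattern (None = wildcard)."""
    question: str
    pattern: Pattern


def retrieve(step: PlanStep, kg: frozenset[Triple] = KG) -> list[Triple]:
    """Return every triple that matches the step's pattern, sorted for determinism."""
    return sorted(
        t for t in kg
        if all(p is None or p == v for p, v in zip(step.pattern, t))
    )


def generate(evidence: list[Triple]) -> str:
    """Stand-in for the LLM call: verbalise the evidence and nothing else."""
    return " ".join(f"{s} {p} {o}." for s, p, o in evidence)


def run_plan(plan: list[PlanStep]) -> list[dict]:
    """Plan, retrieve, generate: keep each beat's evidence next to its text."""
    beats = []
    for step in plan:
        evidence = retrieve(step)
        beats.append({
            "question": step.question,
            "evidence": evidence,
            "text": generate(evidence),
        })
    return beats


if __name__ == "__main__":
    plan = [
        PlanStep("Which songs did Queen perform?", ("Queen", "performedSong", None)),
        PlanStep("When did Live Aid take place?", ("Live Aid", "tookPlaceIn", None)),
    ]
    for beat in run_plan(plan):
        print(beat["question"], "->", beat["text"])
```

Storing the evidence alongside each generated beat is what would make the output auditable in this sketch: a reviewer can trace every sentence back to the triples that produced it.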

Load-bearing premise

Competency questions written for static validation can be executed directly as dynamic narrative plans at runtime without adaptation, error, or loss of evidence closure.

What would settle it

A generated story that includes a fact absent from the Live Aid knowledge graph after following its competency-question plan.
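The test described above can be sketched as a set difference. This is a hedged illustration that assumes the story's factual claims have already been extracted as triples; how that extraction is done is outside the sketch. `unsupported_claims` and `is_evidence_closed` are hypothetical helper names.

```python
Triple = tuple[str, str, str]


def unsupported_claims(story_claims: list[Triple], kg: set[Triple]) -> list[Triple]:
    """Return the claims made by the story that have no matching triple in the KG."""
    return sorted(set(story_claims) - kg)


def is_evidence_closed(story_claims: list[Triple], kg: set[Triple]) -> bool:
    """A story is evidence-closed when every claim it makes is present in the KG."""
    return not unsupported_claims(story_claims, kg)
```

A single non-empty result from `unsupported_claims` would be the counterexample the section describes.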

Figures

Figures reproduced from arXiv: 2604.02545 by Jacopo de Berardinis, Naga Sowjanya Barla.

**Figure 2.** End-to-end execution of a narrative beat for the ...

Original abstract

The preservation of intangible cultural heritage is a critical challenge as collective memory fades over time. While Large Language Models (LLMs) offer a promising avenue for generating engaging narratives, their propensity for factual inaccuracies or "hallucinations" makes them unreliable for heritage applications where veracity is a central requirement. To address this, we propose a novel neuro-symbolic architecture grounded in Knowledge Graphs (KGs) that establishes a transparent "plan-retrieve-generate" workflow for story generation. A key novelty of our approach is the repurposing of competency questions (CQs) - traditionally design-time validation artifacts - into run-time executable narrative plans. This approach bridges the gap between high-level user personas and atomic knowledge retrieval, ensuring that generation is evidence-closed and fully auditable. We validate this architecture using a new resource: the Live Aid KG, a multimodal dataset aligning 1985 concert data with the Music Meta Ontology and linking to external multimedia assets. We present a systematic comparative evaluation of three distinct Retrieval-Augmented Generation (RAG) strategies over this graph: a purely symbolic KG-RAG, a text-enriched Hybrid-RAG, and a structure-aware Graph-RAG. Our experiments reveal a quantifiable trade-off between the factual precision of symbolic retrieval, the contextual richness of hybrid methods, and the narrative coherence of graph-based traversal. Our findings offer actionable insights for designing personalised and controllable storytelling systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper's real move is turning competency questions into runtime executable plans to control RAG-based storytelling, but the conversion mechanics stay underspecified.


The central idea here is repurposing competency questions, normally static validation tools, into dynamic plans that drive retrieval and keep story generation tied to a knowledge graph. They pair this with a new Live Aid KG that links 1985 concert facts to the Music Meta Ontology and external media. The plan-retrieve-generate flow is laid out clearly enough to see how personas connect to atomic retrieval steps, and the comparison of three RAG variants (pure symbolic KG-RAG, text-enriched hybrid, and structure-aware graph traversal) surfaces a practical trade-off between factual precision, contextual richness, and narrative flow. That part is useful for anyone building controllable systems in heritage domains where hallucinations are costly. The evaluation is described as systematic, which is a step above pure claims, and the new dataset itself is a concrete addition worth having.

The soft spot is the missing detail on how static CQs actually become executable runtime plans. The abstract gives no conversion rules, sequencing logic, or guarantee that every plan step stays bijectively linked to KG triples. Without that explicit mapping, the evidence-closed and fully auditable properties rest on an unstated step that could introduce implicit LLM or orchestration choices. If the full paper supplies a fully symbolic transformation, the claim strengthens; otherwise the auditability guarantee is harder to verify.

This work is aimed at researchers doing KG-grounded generation or AI applications in cultural heritage. A reader focused on neuro-symbolic control or RAG trade-offs would get value from the workflow and the dataset. I would send it to peer review because the architecture is concrete, the new resource is real, and the evaluation setup is worth checking in detail even if the mapping needs tightening.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a neuro-symbolic RAG architecture for cultural heritage storytelling that repurposes competency questions (traditionally static validation artifacts) as runtime executable narrative plans within a plan-retrieve-generate workflow. It introduces the Live Aid KG (a multimodal dataset aligned with the Music Meta Ontology) and reports a comparative evaluation of three RAG strategies—purely symbolic KG-RAG, text-enriched Hybrid-RAG, and structure-aware Graph-RAG—claiming quantifiable trade-offs in factual precision, contextual richness, and narrative coherence while ensuring evidence-closed and auditable generation.

Significance. If the mapping from static CQs to dynamic plans is formally specified with explicit conversion rules and the evaluation supplies verifiable quantitative metrics, the work could provide a practical framework for controllable, persona-driven storytelling systems in heritage domains that require high veracity, bridging symbolic KG validation with LLM generation.

major comments (2)

[Abstract] Abstract: the comparative evaluation of the three RAG strategies is described as revealing 'quantifiable trade-offs' but supplies no metrics, statistical tests, baseline details, or results tables, which is load-bearing for the central validation claim and prevents assessment of whether the trade-offs hold.
[Architecture] Architecture description (plan-retrieve-generate workflow): the key novelty of directly executing competency questions as narrative plans lacks any explicit conversion rules, sequencing logic, or persona-parameterization mechanism that would maintain bijective linkage to KG triples; without this, the 'evidence-closed and fully auditable' guarantee cannot be verified and may be violated by implicit steps.
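One way to read the "bijective linkage" requirement is that each generated sentence cites exactly one retrieved triple and each retrieved triple is cited exactly once. The Python sketch below checks that reading. It is an interpretation, not a definition taken from the paper, and `linkage_report` and `is_bijective` are hypothetical names.

```python
Triple = tuple[str, str, str]


def linkage_report(links: dict[str, Triple], evidence: set[Triple]) -> dict[str, list[Triple]]:
    """Describe how a sentence-to-triple mapping deviates from a bijection.

    links maps a sentence id to the single triple that sentence cites.
    """
    cited = list(links.values())
    return {
        # The same triple cited by more than one sentence breaks injectivity.
        "reused": sorted({t for t in cited if cited.count(t) > 1}),
        # Retrieved triples that no sentence cites break surjectivity.
        "uncited": sorted(evidence - set(cited)),
        # Cited triples that were never retrieved break evidence closure.
        "outside_evidence": sorted(set(cited) - evidence),
    }


def is_bijective(links: dict[str, Triple], evidence: set[Triple]) -> bool:
    """True when the report is empty in all three categories."""
    return not any(linkage_report(links, evidence).values())
```

A check of this kind only audits the final mapping. It says nothing about whether the conversion from competency question to plan step was itself free of implicit choices, which is the gap the comment points at.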

minor comments (2)

[Abstract] Abstract: the Live Aid KG is presented as a new resource but no details are given on its scale, construction process, or specific alignment procedure with the Music Meta Ontology.
[Abstract] The abstract refers to 'high-level user personas' but does not indicate how these are formally represented or linked to the CQ parameterization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve clarity and verifiability of the claims.


Referee: [Abstract] Abstract: the comparative evaluation of the three RAG strategies is described as revealing 'quantifiable trade-offs' but supplies no metrics, statistical tests, baseline details, or results tables, which is load-bearing for the central validation claim and prevents assessment of whether the trade-offs hold.

Authors: We agree that the abstract should explicitly reference the quantitative results to support the trade-off claim. The full manuscript includes evaluation tables with metrics (factual precision, contextual richness scores, narrative coherence ratings), baseline comparisons, and statistical tests. We will revise the abstract to include key figures, such as specific precision gains for KG-RAG and coherence advantages for Graph-RAG with associated p-values, making the validation immediately assessable.

revision: yes

Referee: [Architecture] Architecture description (plan-retrieve-generate workflow): the key novelty of directly executing competency questions as narrative plans lacks any explicit conversion rules, sequencing logic, or persona-parameterization mechanism that would maintain bijective linkage to KG triples; without this, the 'evidence-closed and fully auditable' guarantee cannot be verified and may be violated by implicit steps.

Authors: We acknowledge that the architecture description would benefit from greater formal detail. While the manuscript describes the high-level plan-retrieve-generate workflow and the repurposing of CQs, it does not supply the explicit conversion rules or sequencing logic. We will add a new subsection with formal conversion rules, dependency-based sequencing, and persona-parameterization mappings that preserve bijective linkage to KG triples, supported by pseudocode and examples to verify the evidence-closed property.

revision: yes
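The rebuttal does not say what "dependency-based sequencing" would look like. One plausible reading is a topological ordering of plan steps, sketched here with Python's standard `graphlib` module. The step names and the `sequence_steps` helper are hypothetical, and the promised subsection may specify something different.

```python
from graphlib import CycleError, TopologicalSorter


def sequence_steps(depends_on: dict[str, set[str]]) -> list[str]:
    """Order plan steps so that each step runs after the steps it depends on.

    depends_on maps a step id to the ids of the steps whose results it needs.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # graphlib reports the detected cycle as the second element of args.
        raise ValueError(f"plan has circular dependencies: {exc.args[1]}") from exc


if __name__ == "__main__":
    # Hypothetical plan: the songs step needs the performers step,
    # which in turn needs the event step.
    plan = {
        "cq_songs": {"cq_performers"},
        "cq_performers": {"cq_event"},
        "cq_event": set(),
    }
    print(sequence_steps(plan))
```

Rejecting cyclic plans up front keeps the ordering purely symbolic, so no step order is left to the generator's discretion.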

Circularity Check

0 steps flagged

No significant circularity; architecture is a novel workflow proposal without self-referential reductions.


The paper introduces a plan-retrieve-generate RAG architecture by explicitly repurposing competency questions as runtime narrative plans, validated via comparative experiments on the new Live Aid KG resource. No equations, fitted parameters, or load-bearing self-citations appear in the derivation; the central claim is an architectural design choice presented as such, with independent content in the symbolic/hybrid/graph RAG evaluation. The workflow does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper relies on standard assumptions in KG and RAG research without introducing new free parameters or entities.

axioms (2)

domain assumption: Knowledge graphs can provide verifiable facts for story generation. Assumed in the neuro-symbolic approach.
domain assumption: LLMs can generate coherent narratives from retrieved evidence. Core to the generate step.

pith-pipeline@v0.9.0 · 8262 in / 932 out tokens · 60557 ms · 2026-05-13T20:52:10.279957+00:00 · methodology

