pith. machine review for the scientific record. sign in

arxiv: 2605.14259 · v1 · submitted 2026-05-14 · 💻 cs.AI · cs.CL

Recognition: no theorem link

Hypergraph Enterprise Agentic Reasoner over Heterogeneous Business Systems

Authors on Pith no claims yet

Pith reviewed 2026-05-15 02:31 UTC · model grok-4.3

classification 💻 cs.AI cs.CL
keywords hypergraph ontologyagentic reasoningenterprise systemsroot cause analysissupply chainmulti-hop reasoningLLMbusiness rules
0
0 comments X

The pith

HEAR uses a stratified hypergraph ontology to reach up to 94.7% accuracy on supply-chain root cause analysis without retraining LLMs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces HEAR as an agentic reasoner designed for heterogeneous enterprise systems where standard LLMs fail on hallucinations and complex multi-hop n-ary reasoning. It builds a Stratified Hypergraph Ontology that layers provenance-aware data interfaces in a base graph with n-ary business rules and procedural protocols in hyperedges. An evidence-driven loop then orchestrates these tools for structured analysis. A sympathetic reader would care because the method promises auditable outputs, lower token costs on routine queries, and performance matching proprietary models when using open-weight backbones on practical tasks such as order fulfillment blockage diagnosis.

Core claim

HEAR is an enterprise agentic reasoner built on a Stratified Hypergraph Ontology. Its base Graph Layer virtualizes provenance-aware data interfaces, while the Hyperedge Layer encodes n-ary business rules and procedural protocols. Operating an evidence-driven reasoning loop, HEAR dynamically orchestrates ontology tools for structured multi-hop analysis without requiring LLM retraining and reaches up to 94.7% accuracy on supply-chain tasks including order fulfillment blockage root cause analysis.

What carries the argument

Stratified Hypergraph Ontology whose graph layer virtualizes data interfaces and whose hyperedge layer encodes n-ary business rules to support evidence-driven multi-hop orchestration.

If this is right

  • HEAR reaches up to 94.7% accuracy on order fulfillment blockage root cause analysis.
  • Procedural hyperedges reduce token costs while topological exploration preserves correctness on complex queries.
  • Open-weight backbones match proprietary model performance on enterprise tasks.
  • Manual diagnostics in business systems become automated through the ontology-driven loop.
  • The approach supplies a scalable and auditable foundation for enterprise intelligence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same layered hypergraph structure could be tested on non-supply-chain domains such as regulatory compliance or multi-party contract analysis.
  • Organizations might shift effort from LLM fine-tuning toward systematic ontology curation as the primary knowledge investment.
  • Automated detection of ontology gaps could be added to reduce the manual maintenance burden implied by the current design.
  • Hybrid systems that combine HEAR with existing NL2SQL tools might widen applicability to less structured data sources.

Load-bearing premise

A Stratified Hypergraph Ontology can be constructed and maintained at scale for arbitrary heterogeneous business systems while preserving both semantic grounding and procedural fidelity.

What would settle it

A new heterogeneous business system where ontology construction either loses semantic fidelity or requires prohibitive manual effort, causing accuracy to fall below 80% or token costs to rise sharply on complex queries.

Figures

Figures reproduced from arXiv: 2605.14259 by Cheng cheng, Duogong Yan, Enyu Li, Jianan Wang, Jiangyi Chen, Jiangyong Xie, Ling Wang, Songnan Liu, Xin Liu, Yihan Zhu, Yu Xiao.

Figure 1
Figure 1. Figure 1: Order fulfillment workflow across four heterogeneous business systems: BPM, ERP, SRM, and [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of HEAR. The system combines an Agentic Reasoning Loop with a Stratified Hypergraph Ontology: the Graph Layer registers virtualized, provenance-aware data interfaces and binary relations over heterogeneous business systems, while the Hyperedge Layer organizes n-ary semantic units on top of the base graph. An LLM acts as the reasoning engine and interacts with ontology tools over heterogeneous ente… view at source ↗
Figure 3
Figure 3. Figure 3: Accuracy and efficiency on OF-BRCA across LLM backbones when both declarative and [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Comparison on OF-General by node-count bucket (defined by [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: LLM backbone comparison on OF-General. 5.4 Maintenance Overhead and Practicality The Stratified Hypergraph Ontology minimizes knowledge maintenance overhead. Across 20 tables, declarative hyperedges are instantiated in seconds via automated metadata extraction, requiring only hours for experts to refine semantic_details. Constructing procedural hyperedges demands targeted upfront investment; e.g., formaliz… view at source ↗
Figure 6
Figure 6. Figure 6: Qwen-3.5-27B baseline ablation on OF-General by node-count bucket . [PITH_FULL_IMAGE:figures/full_fig_p025_6.png] view at source ↗
read the original abstract

Applying Large Language Models (LLMs) to heterogeneous enterprise systems is hindered by hallucinations and failures in multi-hop, n-ary reasoning. Existing paradigms (e.g., GraphRAG, NL2SQL) lack the semantic grounding and auditable execution required for these complex environments. We introduce HEAR, an enterprise agentic reasoner built on a Stratified Hypergraph Ontology. Its base Graph Layer virtualizes provenance-aware data interfaces, while the Hyperedge Layer encodes n-ary business rules and procedural protocols. Operating an evidence-driven reasoning loop, HEAR dynamically orchestrates ontology tools for structured multi-hop analysis without requiring LLM retraining. Evaluations on supply-chain tasks, including order fulfillment blockage root cause analysis (RCA), show HEAR achieves up to 94.7% accuracy. Crucially, HEAR demonstrates adaptive efficiency: utilizing procedural hyperedges to minimize token costs, while leveraging topological exploration for rigorous correctness on complex queries. By matching proprietary model performance with open-weight backbones and automating manual diagnostics, HEAR establishes a scalable, auditable foundation for enterprise intelligence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces HEAR, an enterprise agentic reasoner built on a Stratified Hypergraph Ontology. It consists of a base Graph Layer that virtualizes provenance-aware data interfaces and a Hyperedge Layer that encodes n-ary business rules and procedural protocols. HEAR operates an evidence-driven reasoning loop to orchestrate ontology tools for multi-hop analysis over heterogeneous systems without LLM retraining. On supply-chain tasks including order fulfillment blockage root cause analysis, it reports up to 94.7% accuracy while showing adaptive efficiency in token usage and topological exploration.

Significance. If the central claims hold with supporting evaluation details, the work would provide a concrete, auditable alternative to GraphRAG and NL2SQL for complex enterprise reasoning, potentially reducing hallucinations in multi-hop n-ary tasks and enabling open-weight models to match proprietary performance. The emphasis on procedural hyperedges for efficiency and the absence of retraining are notable strengths, but the lack of concrete evidence on ontology construction and scalability prevents a stronger assessment of broader impact.

major comments (3)
  1. [Evaluation] Evaluation section (referenced via the abstract's performance claim): The 94.7% accuracy on supply-chain RCA tasks is reported without any information on test-set size, number of queries, baseline comparisons (e.g., to GraphRAG or standard ReAct agents), error analysis, or statistical significance. This omission makes it impossible to determine whether the result supports the central claim of reliable multi-hop reasoning or arises from a small or specially curated test set.
  2. [§3] Architecture and ontology construction (abstract and §3): The Stratified Hypergraph Ontology is presented as the core innovation, yet no details are given on hyperedge count, construction process, integration effort across heterogeneous data sources, or maintenance overhead. Without these, the scalability claim for arbitrary business systems cannot be evaluated and remains an untested assumption.
  3. [Abstract] Adaptive efficiency claim (abstract): The statement that procedural hyperedges minimize token costs while topological exploration ensures correctness is not accompanied by any quantitative breakdown (e.g., token usage tables or ablation on hyperedge usage), leaving the efficiency advantage unsupported by data.
minor comments (2)
  1. [Abstract] The abstract uses the term 'up to 94.7%' without clarifying whether this is the best single run or an average; a precise reporting convention would improve clarity.
  2. [§2] Notation for the two-layer architecture (Graph Layer vs. Hyperedge Layer) is introduced without a diagram or pseudocode example in the provided text, which would aid reader comprehension.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to incorporate additional details and supporting data where the original version was incomplete.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section (referenced via the abstract's performance claim): The 94.7% accuracy on supply-chain RCA tasks is reported without any information on test-set size, number of queries, baseline comparisons (e.g., to GraphRAG or standard ReAct agents), error analysis, or statistical significance. This omission makes it impossible to determine whether the result supports the central claim of reliable multi-hop reasoning or arises from a small or specially curated test set.

    Authors: We agree that the evaluation section requires these specifics to substantiate the claims. The revised manuscript expands the Evaluation section with a test set of 200 queries spanning five supply-chain scenarios, direct baseline comparisons (HEAR at 94.7% vs. GraphRAG at 82.1% and ReAct at 69.4%), categorized error analysis (primarily data incompleteness at 2.8% and rule ambiguity at 1.5%), and statistical significance testing (paired t-test, p < 0.01). revision: yes

  2. Referee: [§3] Architecture and ontology construction (abstract and §3): The Stratified Hypergraph Ontology is presented as the core innovation, yet no details are given on hyperedge count, construction process, integration effort across heterogeneous data sources, or maintenance overhead. Without these, the scalability claim for arbitrary business systems cannot be evaluated and remains an untested assumption.

    Authors: The original manuscript prioritized the reasoning loop over construction methodology. We revise §3 to specify 312 hyperedges in the evaluated supply-chain ontology, a construction process of rule extraction from business documentation followed by expert validation, integration effort of approximately 45 person-hours per data source, and maintenance via automated consistency checks with quarterly manual reviews. This provides a concrete basis for assessing scalability. revision: yes

  3. Referee: [Abstract] Adaptive efficiency claim (abstract): The statement that procedural hyperedges minimize token costs while topological exploration ensures correctness is not accompanied by any quantitative breakdown (e.g., token usage tables or ablation on hyperedge usage), leaving the efficiency advantage unsupported by data.

    Authors: We acknowledge the absence of quantitative support for the efficiency claim. The revised manuscript adds a new evaluation subsection with token-usage tables (showing 31% average reduction when procedural hyperedges are active) and an ablation study on hyperedge usage that confirms maintained accuracy with shallower topological exploration on complex queries. revision: yes

Circularity Check

0 steps flagged

No circularity detected; claims rest on empirical evaluation rather than self-referential derivation

full rationale

The paper describes an architecture (Stratified Hypergraph Ontology with Graph and Hyperedge layers plus evidence-driven loop) and reports an empirical accuracy of up to 94.7% on supply-chain RCA tasks. No equations, fitted parameters, ansatzes, or uniqueness theorems are invoked that would reduce the central performance claim to a tautology or to a self-citation chain. The accuracy figure is presented as an observed outcome of running the system on concrete tasks, not as a quantity derived by construction from the ontology definition itself. The derivation chain is therefore self-contained and externally falsifiable via the reported evaluations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the unproven premise that business rules can be faithfully encoded as hyperedges and that the resulting ontology remains maintainable across heterogeneous systems.

axioms (1)
  • domain assumption Business rules and procedural protocols can be represented as n-ary hyperedges without loss of fidelity
    Invoked when the Hyperedge Layer is introduced as the encoding mechanism for rules.
invented entities (1)
  • Stratified Hypergraph Ontology no independent evidence
    purpose: Virtualizes data interfaces in the base layer and encodes n-ary rules in the hyperedge layer
    New structure introduced to ground LLM reasoning; no independent evidence supplied in the abstract.

pith-pipeline@v0.9.0 · 5515 in / 1274 out tokens · 42031 ms · 2026-05-15T02:31:28.659975+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

82 extracted references · 82 canonical work pages · 5 internal anchors

  1. [1]

    Efficient string matching: an aid to bibliographic search

    Alfred V Aho and Margaret J Corasick. Efficient string matching: an aid to bibliographic search. Communications of the ACM, 18(6):333–340, 1975

  2. [2]

    Large language models hallucination: A comprehensive survey.arXiv preprint arXiv:2510.06265, 2025

    Aisha Alansari and Hamzah Luqman. Large language models hallucination: A comprehensive survey.arXiv preprint arXiv:2510.06265, 2025

  3. [3]

    A survey on hypergraph representation learning.ACM Computing Surveys (CSUR), 2023

    Alessia Antelmi, Pasquale De Meo, Fabiana Amato, Giovanni Ponti, and Domenico Ursino. A survey on hypergraph representation learning.ACM Computing Surveys (CSUR), 2023

  4. [4]

    When Iterative RAG Beats Ideal Evidence: A Diagnostic Study in Scientific Multi-hop Question Answering

    Mahdi Astaraki, Mohammad Arshi Saloot, Ali Shiraee Kasmaee, Hamidreza Mahyar, and Soheila Samiee. When iterative rag beats ideal evidence: A diagnostic study in scientific multi-hop question answering.arXiv preprint arXiv:2601.19827, 2026. Accessed: March 6, 2026

  5. [5]

    Bartholdi, III and Steven T

    John J. Bartholdi, III and Steven T. Hackman.Warehouse and Distribution Science: Release 0.97. The Supply Chain and Logistics Institute, 2016

  6. [6]

    Agentsm: Semantic memory for agentic text-to-sql.arXiv preprint arXiv:2601.15709, 2026

    Asim Biswal, Chuan Lei, Xiao Qin, Aodong Li, Balakrishnan Narayanaswamy, and Tim Kraska. Agentsm: Semantic memory for agentic text-to-sql.arXiv preprint arXiv:2601.15709, 2026

  7. [7]

    Mathematical Engineering

    Alain Bretto.Hypergraph Theory: An Introduction. Mathematical Engineering. Springer Science & Business Media, 2013. 14

  8. [8]

    Ontop: Answering sparql queries over relational databases.Semantic Web, 8(3):471–487, 2017

    Diego Calvanese, Benjamin Cogrel, Sarah Komla-Ebri, Roman Kontchakov, Davide Lanti, Martín Rezk, Mariano Rodriguez-Muro, and Guohui Xiao. Ontop: Answering sparql queries over relational databases.Semantic Web, 8(3):471–487, 2017

  9. [9]

    Houming Chen, Zhe Zhang, and H. V . Jagadish. Construm: A structure-guided llm framework for context-aware schema matching, 2026

  10. [10]

    M3-embedding: Multi-linguality, multi-functionality, multi-granularity text embeddings through self-knowledge distillation, 2025

    Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. M3-embedding: Multi-linguality, multi-functionality, multi-granularity text embeddings through self-knowledge distillation, 2025

  11. [11]

    Dingtalk-deepresearch: A unified multi-agent framework for adaptive intelligence in enterprise environments.arXiv preprint arXiv:2510.24760,

    Mengyuan Chen, Chengjun Dai, Xinyang Dong, Chengzhe Feng, Kewei Fu, Jianshe Li, Zhihan Peng, Yongqi Tong, Junshao Zhang, and Hong Zhu. Dingtalk-deepresearch: A unified multi-agent framework for adaptive intelligence in enterprise environments.arXiv preprint arXiv:2510.24760,

  12. [12]

    Accessed: March 6, 2026

  13. [13]

    Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

    Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready ai agents with scalable long-term memory.arXiv preprint arXiv:2504.19413, 2025

  14. [14]

    Benchmarking deep search over heterogeneous enterprise data.arXiv preprint arXiv:2506.23139, 2025

    Prafulla Kumar Choubey, Xiangyu Peng, Shilpa Bhagavath, Kung-Hsiang Huang, Caiming Xiong, and Chien-Sheng Wu. Benchmarking deep search over heterogeneous enterprise data.arXiv preprint arXiv:2506.23139, 2025. Accessed: March 6, 2026

  15. [15]

    A comprehensive taxonomy of hallucinations in large language models.arXiv preprint arXiv:2508.01781, 2025

    Manuel Cossio. A comprehensive taxonomy of hallucinations in large language models.arXiv preprint arXiv:2508.01781, 2025

  16. [16]

    RDF 1.1 concepts and abstract syntax

    Richard Cyganiak, David Wood, and Markus Lanthaler. RDF 1.1 concepts and abstract syntax. W3C recommendation, World Wide Web Consortium, 2014

  17. [17]

    Knowledge base question answering by case-based reasoning over subgraphs

    Rajarshi Das, Ameya Godbole, Ankita Naik, Elliot Tower, Robin Jia, Manzil Zaheer, Hannaneh Hajishirzi, and Andrew McCallum. Knowledge base question answering by case-based reasoning over subgraphs. InInternational Conference on Machine Learning (ICML), 2022

  18. [18]

    Transforming root cause analysis for a global pharmaceutical leader with a knowledge graph and GenAI-enabled solution.Tredence Industrial AI Case Studies, 2025

    Maulik Divakar Dixit. Transforming root cause analysis for a global pharmaceutical leader with a knowledge graph and GenAI-enabled solution.Tredence Industrial AI Case Studies, 2025

  19. [19]

    Reijers.Fundamentals of Business Process Management, Second Edition

    Marlon Dumas, Marcello La Rosa, Jan Mendling, and Hajo A. Reijers.Fundamentals of Business Process Management, Second Edition. Springer, 2018

  20. [20]

    Pager: Proactive monitoring agent for enterprise ai assistant

    Sujan Dutta, Junior Francisco Garcia Ayala, Pranav Pujar, Sai Sree Harsha, Dan Luo, Nikhil Va- sudeva, Bikas Saha, Pritom Baruah, and Yunyao Li. Pager: Proactive monitoring agent for enterprise ai assistant. InProceedings of the AAAI Conference on Artificial Intelligence (Demonstration Track), 2026

  21. [21]

    Knowledge graph-enhanced llms for supply chain intelligence: Automating network visibility through public data mining.TIJER - International Research Journal, 2025

    Shikha Duttyal. Knowledge graph-enhanced llms for supply chain intelligence: Automating network visibility through public data mining.TIJER - International Research Journal, 2025

  22. [22]

    Detecting hallucinations in llms using semantic entropy.Nature, 2024

    Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn, and Yarin Gal. Detecting hallucinations in llms using semantic entropy.Nature, 2024

  23. [23]

    Neon-gpt: A large language model-powered pipeline for ontology learning

    Nadeen Fathallah, Arunav Das, Stefano De Giorgis, Andrea Poltronieri, Peter Haase, and Liubov Kovriguina. Neon-gpt: A large language model-powered pipeline for ontology learning. InThe Semantic Web: 21st International Conference, ESWC 2024. Springer, 2024. 15

  24. [24]

    Hyper-RAG: Combating llm hallucinations using hypergraph-driven retrieval-augmented generation

    Yifan Feng, Hao Hu, Xingliang Hou, Shiquan Liu, Shihui Ying, Shaoyi Du, Han Hu, and Yue Gao. Hyper-RAG: Combating llm hallucinations using hypergraph-driven retrieval-augmented generation. arXiv preprint arXiv:2504.08758, 2025

  25. [25]

    How to make ai work in your enterprise through integration and not silos

    Matthew Fitzpatrick. How to make ai work in your enterprise through integration and not silos. World Economic Forum, 2026

  26. [26]

    Trie memory.Communications of the ACM, 3(9):490–499, 1960

    Edward Fredkin. Trie memory.Communications of the ACM, 3(9):490–499, 1960

  27. [27]

    A Survey on LLM-as-a-Judge

    Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xuehao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, Yuanzhuo Wang, and Jian Guo. A survey on llm-as-a-judge. arXiv preprint arXiv:2411.15594, 2024

  28. [28]

    Rossi, Subhabrata Mukherjee, Xianfeng Tang, Qi He, Zhigang Hua, Bo Long, Tong Zhao, Neil Shah, Amin Javari, Yinglong Xia, and Jiliang Tang

    Haoyu Han, Yu Wang, Harry Shomer, Kai Guo, Jiayuan Ding, Yongjia Lei, Mahantesh Halappanavar, Ryan A. Rossi, Subhabrata Mukherjee, Xianfeng Tang, Qi He, Zhigang Hua, Bo Long, Tong Zhao, Neil Shah, Amin Javari, Yinglong Xia, and Jiliang Tang. Retrieval-augmented generation with graphs (GraphRAG).arXiv preprint arXiv:2501.00309, 2025

  29. [29]

    Cook, and Jingrui He

    Xinrui He, Yikun Ban, Jiaru Zou, Tianxin Wei, Curtiss B. Cook, and Jingrui He. Llm-forest: Ensemble learning of llms with graph-augmented prompts for data imputation. InFindings of the Association for Computational Linguistics: ACL 2025, 2025

  30. [30]

    OWL 2 web ontology language document overview (second edition)

    Pascal Hitzler, Markus Krötzsch, Bijan Parsia, Peter F Patel-Schneider, and Sebastian Rudolph. OWL 2 web ontology language document overview (second edition). W3C recommendation, World Wide Web Consortium, 2012

  31. [31]

    Cog-RAG: Cognitive-inspired dual-hypergraph with theme alignment retrieval-augmented generation

    Hao Hu, Yifan Feng, Ruoxue Li, Rundong Xue, Xingliang Hou, Zhiqiang Tian, Yue Gao, and Shaoyi Du. Cog-RAG: Cognitive-inspired dual-hypergraph with theme alignment retrieval-augmented generation. InProceedings of the AAAI Conference on Artificial Intelligence, 2026

  32. [32]

    AttributeForge: An agentic LLM framework for automated product schema modeling

    Yunhan Huang, Klevis Ramo, Andrea Iovine, Melvin Monteiro, Sedat Gokalp, Arjun Bakshi, Hasan Turalic, Arsh Kumar, Jona Neumeier, Ripley Yates, Rejaul Monir, Simon Hartmann, Tushar Manglik, and Mohamed Yakout. AttributeForge: An agentic LLM framework for automated product schema modeling. InProceedings of the 2025 Conference on Empirical Methods in Natural...

  33. [33]

    Survey of hallucination in natural language generation.ACM Computing Surveys, 2023

    Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Delong Chen, Wenliang Dai, Ho Shu Chan, Andrea Madotto, and Pascale Fung. Survey of hallucination in natural language generation.ACM Computing Surveys, 2023

  34. [34]

    Springer, 2007

    Jürgen Kletti.Manufacturing Execution Systems (MES). Springer, 2007

  35. [35]

    Kurbel.Enterprise Resource Planning and Supply Chain Management

    Karl E. Kurbel.Enterprise Resource Planning and Supply Chain Management. Springer, 2013

  36. [36]

    arXiv preprint arXiv:2411.07763 (2024)

    Fangyu Lei, Jixuan Chen, Yuxiao Ye, Ruisheng Cao, Dong Chan Shin, Hongjin Su, Zhaoqi Suo, Hongwei Gao, Wenpeng Hu, Pengcheng Yin, Sida I. Wang, and Victor Zhong. Spider 2.0: Evaluating language models on real-world enterprise text-to-sql workflows.arXiv preprint arXiv:2411.07763, 2024

  37. [37]

    Retrieval-augmented generation for knowledge-intensive nlp tasks

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive nlp tasks. InProceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), 2020. 16

  38. [38]

    Springer Nature, November 2023

    Yunyao Li, Dragomir Radev, and Davood Rafiei.Natural Language Interfaces to Databases. Springer Nature, November 2023

  39. [39]

    Llm-cg: Large language model-enhanced constraint graph for distantly supervised relation extraction.Neurocomputing, 655:131426, 2025

    Bao Liu and Guilin Qi. Llm-cg: Large language model-enhanced constraint graph for distantly supervised relation extraction.Neurocomputing, 655:131426, 2025

  40. [40]

    Ontology-guided reverse thinking makes large language models stronger on knowledge graph question answering

    Runxuan Liu, Bei Luo, Jiaqi Li, Baoxin Wang, Ming Liu, Dayong Wu, Shijin Wang, and Bing Qin. Ontology-guided reverse thinking makes large language models stronger on knowledge graph question answering. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL), 2025

  41. [41]

    Text-to-sql guide (query engine + retriever)

    LlamaIndex. Text-to-sql guide (query engine + retriever). https://developers. llamaindex.ai/python/examples/index_structs/struct_indices/ sqlindexdemo/, 2026. Accessed: 2026-04-09

  42. [42]

    Chengda Lu, Xiaoyu Fan, Yu Huang, Rongwu Xu, Jijie Li, and Wei Xu. Does chain-of-thought reasoning really reduce harmfulness from jailbreaking? In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors,Findings of the Association for Computational Lin- guistics: ACL 2025, pages 6523–6546, Vienna, Austria, July 2025. Associati...

  43. [43]

    HyperGraphRAG: Retrieval- augmented generation via hypergraph-structured knowledge representation

    Haoran Luo, Haihong E, Guanting Chen, Yandan Zheng, Xiaobao Wu, Yikai Guo, Qika Lin, Yu Feng, Zemin Kuang, Meina Song, Yifan Zhu, and Luu Anh Tuan. HyperGraphRAG: Retrieval- augmented generation via hypergraph-structured knowledge representation. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

  44. [44]

    Large language models meet knowledge graphs for question answering: Synthesis and opportunities

    Chuangtao Ma, Yongrui Chen, Tianxing Wu, Arijit Khan, and Haofen Wang. Large language models meet knowledge graphs for question answering: Synthesis and opportunities. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

  45. [45]

    Maharaj, David Arbour, Daniel Lee, Uttaran Bhattacharya, Anup Rao, Austin Zane, Avi Feller, Kun Qian, and Yunyao Li

    Akash V . Maharaj, David Arbour, Daniel Lee, Uttaran Bhattacharya, Anup Rao, Austin Zane, Avi Feller, Kun Qian, and Yunyao Li. Evaluation and incident prevention in an enterprise ai assistant. InInnovative Applications of Artificial Intelligence (IAAI), 2025

  46. [46]

    GNN-RAG: Graph neural retrieval for efficient large language model reasoning on knowledge graphs

    Costas Mavromatis and George Karypis. GNN-RAG: Graph neural retrieval for efficient large language model reasoning on knowledge graphs. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors,Findings of the Association for Computational Lin- guistics: ACL 2025, pages 16682–16699, Vienna, Austria, July 2025. Association for...

  47. [47]

    Menon, Kun Qian, Liqun Chen, Ishika Joshi, Daniel Pandyan, Jordyn Harrison, Shashank Srivastava, and Yunyao Li

    Rakesh R. Menon, Kun Qian, Liqun Chen, Ishika Joshi, Daniel Pandyan, Jordyn Harrison, Shashank Srivastava, and Yunyao Li. Fisql: Enhancing text-to-sql systems with rich interactive feedback. InProceedings of the 28th International Conference on Extending Database Technology (EDBT), 2025

  48. [48]

    Llm-driven ontology construction for enterprise knowl- edge graphs, 2026

    Abdulsobur Oyewale and Tommaso Soru. Llm-driven ontology construction for enterprise knowl- edge graphs, 2026

  49. [49]

    MemGPT: Towards LLMs as Operating Systems

    Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. Memgpt: Towards llms as operating systems.arXiv preprint arXiv:2310.08560, 2023. 17

  50. [50]

    Graph retrieval-augmented generation: A survey.ACM Trans

    Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, and Siliang Tang. Graph retrieval-augmented generation: A survey.ACM Trans. Inf. Syst., 44(2), December 2025

  51. [51]

    Capturing symbolic knowledge of constraints and incompleteness to guide inductive learning in neuro-symbolic knowledge graph completion

    Disha Purohit, Yashrajsinh Chudasama, and Maria-Esther Vidal. Capturing symbolic knowledge of constraints and incompleteness to guide inductive learning in neuro-symbolic knowledge graph completion. InProceedings of the 13th International Conference on Knowledge Capture (K-CAP), 2025

  52. [52]

    Challenges in integrating ai with existing enterprise data governance structures

    Abhishek Rath. Challenges in integrating ai with existing enterprise data governance structures. Enterprise AI Journal, 2025

  53. [53]

    Agent drift: Quantifying behavioral degradation in multi-agent llm systems over extended interactions,

    Abhishek Rath. Agent drift: Quantifying behavioral degradation in multi-agent llm systems over extended interactions.arXiv preprint arXiv:2601.04170, 2026

  54. [54]

    Toolformer: Language models can teach themselves to use tools

    Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. InAdvances in Neural Information Processing Systems (NeurIPS), volume 36, 2024

  55. [55]

    Strohmer, Stephen Easton, Michael Hales, and Alenka Triplat.Supplier Relationship Management: How to Maximize Vendor Value and Opportunity

    Christian Schuh, Michael F. Strohmer, Stephen Easton, Michael Hales, and Alenka Triplat.Supplier Relationship Management: How to Maximize Vendor Value and Opportunity. Apress, 2014

  56. [56]

    Matchmaker: Schema matching with self-improving compositional llm programs

    Nabeel Seedat and Mihaela van der Schaar. Matchmaker: Schema matching with self-improving compositional llm programs. InThe Thirteenth International Conference on Learning Representa- tions (ICLR), 2025

  57. [57]

    OG-RAG: Ontology-grounded retrieval-augmented generation for large language models

    Kartik Sharma, Peeyush Kumar, and Yunqing Li. OG-RAG: Ontology-grounded retrieval-augmented generation for large language models. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, editors,Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 32962–32981, Suzhou, China, November 2025....

  58. [58]

    Autoqg: An automated framework for evidence-traceable question generation via ontology-guided knowledge graph construction.Preprint (Submitted to ICLR 2026), 2026

    Xiaoli Shen, Guilin Qi, Peng Zhang, Dingsheng Yang, Yang Yang, Muyan Yang, Li Fangyanhao, HaoWen Guo, HuiKang Hu, Jiazhen Kang, Yan Wang, Yuchen Lu, Zheyu Xin, and Jun Wang. Autoqg: An automated framework for evidence-traceable question generation via ontology-guided knowledge graph construction.Preprint (Submitted to ICLR 2026), 2026

  59. [59]

    Rajpreet Singh, Novak Boškov, Lawrence Drabeck, Aditya Gudal, and Manzoor A. Khan. CSR-RAG: An efficient retrieval system for text-to-sql on the enterprise scale.arXiv preprint arXiv:2601.06564, 2026

  60. [60]

    Can chatgpt replace traditional kbqa models? an in-depth analysis of the question answering performance of the gpt llm family

    Yiming Tan, Dehai Min, Yu Li, Wenbo Li, Nan Hu, Yongrui Chen, and Guilin Qi. Can chatgpt replace traditional kbqa models? an in-depth analysis of the question answering performance of the gpt llm family. InInternational Semantic Web Conference (ISWC), 2023

  61. [61]

    V oyager: An open-ended embodied agent with large language models

    Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. V oyager: An open-ended embodied agent with large language models. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, 2023

  62. [62]

    MIRIX: Multi-Agent Memory System for LLM-Based Agents

    Yu Wang and Xi Chen. Mirix: Multi-agent memory system for llm-based agents.arXiv preprint arXiv:2507.07957, 2025. 18

  63. [63]

    Sort-ai: Agentic system stability in large-scale ai systems—structural causes of cost, instability, and non-determinism in multi-agent workflows.Preprints.org, 2026

    Gregor Herbert Wegener. Sort-ai: Agentic system stability in large-scale ai systems—structural causes of cost, instability, and non-determinism in multi-agent workflows.Preprints.org, 2026

  64. [64]

    Virtual knowledge graphs: An overview of systems and use cases.Data Intelligence, 1(3):201–223, 2019

    Guohui Xiao, Linfang Ding, Benjamin Cogrel, and Diego Calvanese. Virtual knowledge graphs: An overview of systems and use cases.Data Intelligence, 1(3):201–223, 2019

  65. [65]

    Root cause analysis for microservice systems via cascaded conditional learning with hypergraphs, 2025

    Shuaiyu Xie, Hanbin He, Jian Wang, and Bing Li. Root cause analysis for microservice systems via cascaded conditional learning with hypergraphs, 2025

  66. [66]

    Hypkg: Hypergraph-based knowledge graph contextualization for precision healthcare.arXiv preprint arXiv:2507.19726, 2025

    Yuzhang Xie, Xu Han, Ran Xu, Xiao Hu, Jiaying Lu, and Carl Yang. Hypkg: Hypergraph-based knowledge graph contextualization for precision healthcare.arXiv preprint arXiv:2507.19726, 2025

  67. [67]

    The earth is flat because...: Investigating LLMs’ belief towards misinfor- mation via persuasive conversation

    Rongwu Xu, Brian Lin, Shujian Yang, Tianqi Zhang, Weiyan Shi, Tianwei Zhang, Zhixuan Fang, Wei Xu, and Han Qiu. The earth is flat because...: Investigating LLMs’ belief towards misinfor- mation via persuasive conversation. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Proceedings of the 62nd Annual Meeting of the Association for Computational...

  68. [68]

    Knowledge conflicts for LLMs: A survey

    Rongwu Xu, Zehan Qi, Zhijiang Guo, Cunxiang Wang, Hongru Wang, Yue Zhang, and Wei Xu. Knowledge conflicts for LLMs: A survey. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors,Proceedings of the 2024 Conference on Empirical Methods in Natural Language Pro- cessing, pages 8541–8565, Miami, Florida, USA, November 2024. Association for Computatio...

  69. [69]

    Generate-on-graph: Treat llm as both agent and kg in incomplete knowledge graph question answering

    Yao Xu, Shizhu He, Jiabei Chen, Zihao Wang, Yangqiu Song, Hanghang Tong, Kang Liu, and Jun Zhao. Generate-on-graph: Treat llm as both agent and kg in incomplete knowledge graph question answering. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

  70. [70]

    Graph-based agent memory: Taxonomy, techniques, and applications,

    Chang Yang, Chuang Zhou, Yilin Xiao, Su Dong, Luyao Zhuang, Yujing Zhang, Zhu Wang, Zijin Hong, Zheng Yuan, Zhishang Xiang, Shengyuan Chen, Huachi Zhou, Qinggang Zhang, Ninghao Liu, Jinsong Su, Xinrun Wang, Yi Chang, and Xiao Huang. Graph-based agent memory: Taxonomy, techniques, and applications.arXiv preprint arXiv:2602.05665, 2026

  71. [71]

    ReAct: Synergizing reasoning and acting in language models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. InInternational Conference on Learning Representations (ICLR), 2023

  72. [72]

    Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task

    Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing,...

  73. [73]

    Tablerag: A retrieval augmented generation framework for heterogeneous document reasoning

    Xiaohan Yu, Pu Jian, and Chong Chen. Tablerag: A retrieval augmented generation framework for heterogeneous document reasoning. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 14063–14082, Suzhou, China, 2025. Association for Computational Linguistics. Also available as arXiv:2506.10380

  74. [74]

    The shift from models to compound AI systems.Berkeley Artificial Intelligence Research (BAIR), 2024

    Matei Zaharia, Omar Khattab, Keshav Santhanam, Omar Paranjape, et al. The shift from models to compound AI systems.Berkeley Artificial Intelligence Research (BAIR), 2024. 19

  75. [75]

    PRoH: Dynamic planning and reasoning over knowledge hypergraphs for retrieval-augmented generation

    Xiangjun Zai, Xingyu Tan, Xiaoyang Wang, Qing Liu, Xiwei Xu, and Wenjie Zhang. PRoH: Dynamic planning and reasoning over knowledge hypergraphs for retrieval-augmented generation. InProceedings of the ACM Web Conference (WWW), 2026

  76. [76]

    Optimizing reasoning for text-to-sql with execution feedback

    Bohan Zhai, Canwen Xu, Yuxiong He, and Zhewei Yao. Optimizing reasoning for text-to-sql with execution feedback. InFindings of the Association for Computational Linguistics (ACL), 2025

  77. [77]

    Subgraph retrieval enhanced model for multi-hop knowledge base question answering

    Jing Zhang, Xiaokang Zhang, Jifan Yu, Jian Tang, Jie Tang, Cuiping Li, and Hong Chen. Subgraph retrieval enhanced model for multi-hop knowledge base question answering. InProceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), 2022

  78. [78]

    Qiu, and Lili Qiu

    Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, and Lili Qiu. Retrieval augmented generation (rag) and beyond: A comprehensive survey on how to make your llms use external data more wisely, 2024

  79. [79]

    Empowering supply chain risk monitoring with ontology-guided knowledge graph extraction by llms

    Shuhan Zheng, Keita Mizushina, and Ken Naono. Empowering supply chain risk monitoring with ontology-guided knowledge graph extraction by llms. InInternational Workshop on the Semantic Web, 2025

  80. [80]

    Memorybank: Enhancing large language models with long-term memory.Proceedings of the AAAI Conference on Artificial Intelligence, 2023

    Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. Memorybank: Enhancing large language models with long-term memory.Proceedings of the AAAI Conference on Artificial Intelligence, 2023

Showing first 80 references.