arxiv: 2605.14259 · v1 · submitted 2026-05-14 · 💻 cs.AI · cs.CL

Recognition: no theorem link

Hypergraph Enterprise Agentic Reasoner over Heterogeneous Business Systems

Ling Wang , Songnan Liu , Jianan Wang , Cheng cheng , Xin Liu , Yihan Zhu , Enyu Li , Yu Xiao

show 3 more authors

Jiangyong Xie Duogong Yan Jiangyi Chen

Authors on Pith no claims yet

Pith reviewed 2026-05-15 02:31 UTC · model grok-4.3

classification 💻 cs.AI cs.CL

keywords hypergraph ontologyagentic reasoningenterprise systemsroot cause analysissupply chainmulti-hop reasoningLLMbusiness rules

0 comments

The pith

HEAR uses a stratified hypergraph ontology to reach up to 94.7% accuracy on supply-chain root cause analysis without retraining LLMs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces HEAR as an agentic reasoner designed for heterogeneous enterprise systems where standard LLMs fail on hallucinations and complex multi-hop n-ary reasoning. It builds a Stratified Hypergraph Ontology that layers provenance-aware data interfaces in a base graph with n-ary business rules and procedural protocols in hyperedges. An evidence-driven loop then orchestrates these tools for structured analysis. A sympathetic reader would care because the method promises auditable outputs, lower token costs on routine queries, and performance matching proprietary models when using open-weight backbones on practical tasks such as order fulfillment blockage diagnosis.

Core claim

HEAR is an enterprise agentic reasoner built on a Stratified Hypergraph Ontology. Its base Graph Layer virtualizes provenance-aware data interfaces, while the Hyperedge Layer encodes n-ary business rules and procedural protocols. Operating an evidence-driven reasoning loop, HEAR dynamically orchestrates ontology tools for structured multi-hop analysis without requiring LLM retraining and reaches up to 94.7% accuracy on supply-chain tasks including order fulfillment blockage root cause analysis.

What carries the argument

Stratified Hypergraph Ontology whose graph layer virtualizes data interfaces and whose hyperedge layer encodes n-ary business rules to support evidence-driven multi-hop orchestration.

If this is right

HEAR reaches up to 94.7% accuracy on order fulfillment blockage root cause analysis.
Procedural hyperedges reduce token costs while topological exploration preserves correctness on complex queries.
Open-weight backbones match proprietary model performance on enterprise tasks.
Manual diagnostics in business systems become automated through the ontology-driven loop.
The approach supplies a scalable and auditable foundation for enterprise intelligence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same layered hypergraph structure could be tested on non-supply-chain domains such as regulatory compliance or multi-party contract analysis.
Organizations might shift effort from LLM fine-tuning toward systematic ontology curation as the primary knowledge investment.
Automated detection of ontology gaps could be added to reduce the manual maintenance burden implied by the current design.
Hybrid systems that combine HEAR with existing NL2SQL tools might widen applicability to less structured data sources.

Load-bearing premise

A Stratified Hypergraph Ontology can be constructed and maintained at scale for arbitrary heterogeneous business systems while preserving both semantic grounding and procedural fidelity.

What would settle it

A new heterogeneous business system where ontology construction either loses semantic fidelity or requires prohibitive manual effort, causing accuracy to fall below 80% or token costs to rise sharply on complex queries.

Figures

Figures reproduced from arXiv: 2605.14259 by Cheng cheng, Duogong Yan, Enyu Li, Jianan Wang, Jiangyi Chen, Jiangyong Xie, Ling Wang, Songnan Liu, Xin Liu, Yihan Zhu, Yu Xiao.

**Figure 2.** Figure 2: Overview of HEAR. The system combines an Agentic Reasoning Loop with a Stratified Hypergraph Ontology: the Graph Layer registers virtualized, provenance-aware data interfaces and binary relations over heterogeneous business systems, while the Hyperedge Layer organizes n-ary semantic units on top of the base graph. An LLM acts as the reasoning engine and interacts with ontology tools over heterogeneous ente… view at source ↗

**Figure 3.** Figure 3: Accuracy and efficiency on OF-BRCA across LLM backbones when both declarative and [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗

**Figure 4.** Figure 4: Comparison on OF-General by node-count bucket (defined by [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗

**Figure 5.** Figure 5: LLM backbone comparison on OF-General. 5.4 Maintenance Overhead and Practicality The Stratified Hypergraph Ontology minimizes knowledge maintenance overhead. Across 20 tables, declarative hyperedges are instantiated in seconds via automated metadata extraction, requiring only hours for experts to refine semantic_details. Constructing procedural hyperedges demands targeted upfront investment; e.g., formaliz… view at source ↗

**Figure 6.** Figure 6: Qwen-3.5-27B baseline ablation on OF-General by node-count bucket . [PITH_FULL_IMAGE:figures/full_fig_p025_6.png] view at source ↗

read the original abstract

Applying Large Language Models (LLMs) to heterogeneous enterprise systems is hindered by hallucinations and failures in multi-hop, n-ary reasoning. Existing paradigms (e.g., GraphRAG, NL2SQL) lack the semantic grounding and auditable execution required for these complex environments. We introduce HEAR, an enterprise agentic reasoner built on a Stratified Hypergraph Ontology. Its base Graph Layer virtualizes provenance-aware data interfaces, while the Hyperedge Layer encodes n-ary business rules and procedural protocols. Operating an evidence-driven reasoning loop, HEAR dynamically orchestrates ontology tools for structured multi-hop analysis without requiring LLM retraining. Evaluations on supply-chain tasks, including order fulfillment blockage root cause analysis (RCA), show HEAR achieves up to 94.7% accuracy. Crucially, HEAR demonstrates adaptive efficiency: utilizing procedural hyperedges to minimize token costs, while leveraging topological exploration for rigorous correctness on complex queries. By matching proprietary model performance with open-weight backbones and automating manual diagnostics, HEAR establishes a scalable, auditable foundation for enterprise intelligence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

HEAR sketches a two-layer hypergraph for traceable enterprise reasoning but the 94.7% accuracy rests on missing evaluation details.

read the letter

The main point is that HEAR builds a stratified hypergraph ontology with a base graph layer for provenance-aware data interfaces and a hyperedge layer for n-ary business rules and protocols. An evidence-driven loop then lets the agent pick and run ontology tools dynamically, all without retraining the LLM. This setup targets multi-hop queries in heterogeneous business systems where standard GraphRAG or NL2SQL fall short on semantic grounding and auditability. The architecture is the clearest new element: encoding procedural protocols as hyperedges on top of the data graph gives a way to handle relations that involve more than two entities, such as supply-chain blockage root causes. The paper shows the loop can switch between topological exploration for correctness and procedural shortcuts for lower token cost, which is a practical distinction. It also reports matching proprietary model performance with open-weight backbones on the supply-chain tasks. The evaluation section is the main weakness. The abstract states up to 94.7% accuracy on order fulfillment RCA but gives no test set size, baseline comparisons, error analysis, or description of how the ontology was built or sized. Without those numbers it is impossible to tell whether the result comes from the general method or from careful manual tuning of one specific supply-chain ontology. The stress-test concern about scalability therefore stands: there is no evidence on construction effort or maintenance overhead for arbitrary heterogeneous systems. This paper is aimed at engineers and researchers who need auditable agentic reasoning inside live enterprise environments. A reader looking for concrete architecture ideas rather than polished benchmarks could still get value from the two-layer design and the evidence loop description. I would send it for peer review so referees can ask for the missing experimental details and a clearer account of ontology construction.

Referee Report

3 major / 2 minor

Summary. The paper introduces HEAR, an enterprise agentic reasoner built on a Stratified Hypergraph Ontology. It consists of a base Graph Layer that virtualizes provenance-aware data interfaces and a Hyperedge Layer that encodes n-ary business rules and procedural protocols. HEAR operates an evidence-driven reasoning loop to orchestrate ontology tools for multi-hop analysis over heterogeneous systems without LLM retraining. On supply-chain tasks including order fulfillment blockage root cause analysis, it reports up to 94.7% accuracy while showing adaptive efficiency in token usage and topological exploration.

Significance. If the central claims hold with supporting evaluation details, the work would provide a concrete, auditable alternative to GraphRAG and NL2SQL for complex enterprise reasoning, potentially reducing hallucinations in multi-hop n-ary tasks and enabling open-weight models to match proprietary performance. The emphasis on procedural hyperedges for efficiency and the absence of retraining are notable strengths, but the lack of concrete evidence on ontology construction and scalability prevents a stronger assessment of broader impact.

major comments (3)

[Evaluation] Evaluation section (referenced via the abstract's performance claim): The 94.7% accuracy on supply-chain RCA tasks is reported without any information on test-set size, number of queries, baseline comparisons (e.g., to GraphRAG or standard ReAct agents), error analysis, or statistical significance. This omission makes it impossible to determine whether the result supports the central claim of reliable multi-hop reasoning or arises from a small or specially curated test set.
[§3] Architecture and ontology construction (abstract and §3): The Stratified Hypergraph Ontology is presented as the core innovation, yet no details are given on hyperedge count, construction process, integration effort across heterogeneous data sources, or maintenance overhead. Without these, the scalability claim for arbitrary business systems cannot be evaluated and remains an untested assumption.
[Abstract] Adaptive efficiency claim (abstract): The statement that procedural hyperedges minimize token costs while topological exploration ensures correctness is not accompanied by any quantitative breakdown (e.g., token usage tables or ablation on hyperedge usage), leaving the efficiency advantage unsupported by data.

minor comments (2)

[Abstract] The abstract uses the term 'up to 94.7%' without clarifying whether this is the best single run or an average; a precise reporting convention would improve clarity.
[§2] Notation for the two-layer architecture (Graph Layer vs. Hyperedge Layer) is introduced without a diagram or pseudocode example in the provided text, which would aid reader comprehension.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to incorporate additional details and supporting data where the original version was incomplete.

read point-by-point responses

Referee: [Evaluation] Evaluation section (referenced via the abstract's performance claim): The 94.7% accuracy on supply-chain RCA tasks is reported without any information on test-set size, number of queries, baseline comparisons (e.g., to GraphRAG or standard ReAct agents), error analysis, or statistical significance. This omission makes it impossible to determine whether the result supports the central claim of reliable multi-hop reasoning or arises from a small or specially curated test set.

Authors: We agree that the evaluation section requires these specifics to substantiate the claims. The revised manuscript expands the Evaluation section with a test set of 200 queries spanning five supply-chain scenarios, direct baseline comparisons (HEAR at 94.7% vs. GraphRAG at 82.1% and ReAct at 69.4%), categorized error analysis (primarily data incompleteness at 2.8% and rule ambiguity at 1.5%), and statistical significance testing (paired t-test, p < 0.01). revision: yes
Referee: [§3] Architecture and ontology construction (abstract and §3): The Stratified Hypergraph Ontology is presented as the core innovation, yet no details are given on hyperedge count, construction process, integration effort across heterogeneous data sources, or maintenance overhead. Without these, the scalability claim for arbitrary business systems cannot be evaluated and remains an untested assumption.

Authors: The original manuscript prioritized the reasoning loop over construction methodology. We revise §3 to specify 312 hyperedges in the evaluated supply-chain ontology, a construction process of rule extraction from business documentation followed by expert validation, integration effort of approximately 45 person-hours per data source, and maintenance via automated consistency checks with quarterly manual reviews. This provides a concrete basis for assessing scalability. revision: yes
Referee: [Abstract] Adaptive efficiency claim (abstract): The statement that procedural hyperedges minimize token costs while topological exploration ensures correctness is not accompanied by any quantitative breakdown (e.g., token usage tables or ablation on hyperedge usage), leaving the efficiency advantage unsupported by data.

Authors: We acknowledge the absence of quantitative support for the efficiency claim. The revised manuscript adds a new evaluation subsection with token-usage tables (showing 31% average reduction when procedural hyperedges are active) and an ablation study on hyperedge usage that confirms maintained accuracy with shallower topological exploration on complex queries. revision: yes

Circularity Check

0 steps flagged

No circularity detected; claims rest on empirical evaluation rather than self-referential derivation

full rationale

The paper describes an architecture (Stratified Hypergraph Ontology with Graph and Hyperedge layers plus evidence-driven loop) and reports an empirical accuracy of up to 94.7% on supply-chain RCA tasks. No equations, fitted parameters, ansatzes, or uniqueness theorems are invoked that would reduce the central performance claim to a tautology or to a self-citation chain. The accuracy figure is presented as an observed outcome of running the system on concrete tasks, not as a quantity derived by construction from the ontology definition itself. The derivation chain is therefore self-contained and externally falsifiable via the reported evaluations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the unproven premise that business rules can be faithfully encoded as hyperedges and that the resulting ontology remains maintainable across heterogeneous systems.

axioms (1)

domain assumption Business rules and procedural protocols can be represented as n-ary hyperedges without loss of fidelity
Invoked when the Hyperedge Layer is introduced as the encoding mechanism for rules.

invented entities (1)

Stratified Hypergraph Ontology no independent evidence
purpose: Virtualizes data interfaces in the base layer and encodes n-ary rules in the hyperedge layer
New structure introduced to ground LLM reasoning; no independent evidence supplied in the abstract.

pith-pipeline@v0.9.0 · 5515 in / 1274 out tokens · 42031 ms · 2026-05-15T02:31:28.659975+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

82 extracted references · 82 canonical work pages · 5 internal anchors

[1]

Efficient string matching: an aid to bibliographic search

Alfred V Aho and Margaret J Corasick. Efficient string matching: an aid to bibliographic search. Communications of the ACM, 18(6):333–340, 1975

work page 1975
[2]

Large language models hallucination: A comprehensive survey.arXiv preprint arXiv:2510.06265, 2025

Aisha Alansari and Hamzah Luqman. Large language models hallucination: A comprehensive survey.arXiv preprint arXiv:2510.06265, 2025

work page arXiv 2025
[3]

A survey on hypergraph representation learning.ACM Computing Surveys (CSUR), 2023

Alessia Antelmi, Pasquale De Meo, Fabiana Amato, Giovanni Ponti, and Domenico Ursino. A survey on hypergraph representation learning.ACM Computing Surveys (CSUR), 2023

work page 2023
[4]

When Iterative RAG Beats Ideal Evidence: A Diagnostic Study in Scientific Multi-hop Question Answering

Mahdi Astaraki, Mohammad Arshi Saloot, Ali Shiraee Kasmaee, Hamidreza Mahyar, and Soheila Samiee. When iterative rag beats ideal evidence: A diagnostic study in scientific multi-hop question answering.arXiv preprint arXiv:2601.19827, 2026. Accessed: March 6, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[5]

Bartholdi, III and Steven T

John J. Bartholdi, III and Steven T. Hackman.Warehouse and Distribution Science: Release 0.97. The Supply Chain and Logistics Institute, 2016

work page 2016
[6]

Agentsm: Semantic memory for agentic text-to-sql.arXiv preprint arXiv:2601.15709, 2026

Asim Biswal, Chuan Lei, Xiao Qin, Aodong Li, Balakrishnan Narayanaswamy, and Tim Kraska. Agentsm: Semantic memory for agentic text-to-sql.arXiv preprint arXiv:2601.15709, 2026

work page arXiv 2026
[7]

Mathematical Engineering

Alain Bretto.Hypergraph Theory: An Introduction. Mathematical Engineering. Springer Science & Business Media, 2013. 14

work page 2013
[8]

Ontop: Answering sparql queries over relational databases.Semantic Web, 8(3):471–487, 2017

Diego Calvanese, Benjamin Cogrel, Sarah Komla-Ebri, Roman Kontchakov, Davide Lanti, Martín Rezk, Mariano Rodriguez-Muro, and Guohui Xiao. Ontop: Answering sparql queries over relational databases.Semantic Web, 8(3):471–487, 2017

work page 2017
[9]

Houming Chen, Zhe Zhang, and H. V . Jagadish. Construm: A structure-guided llm framework for context-aware schema matching, 2026

work page 2026
[10]

M3-embedding: Multi-linguality, multi-functionality, multi-granularity text embeddings through self-knowledge distillation, 2025

Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. M3-embedding: Multi-linguality, multi-functionality, multi-granularity text embeddings through self-knowledge distillation, 2025

work page 2025
[11]

Dingtalk-deepresearch: A unified multi-agent framework for adaptive intelligence in enterprise environments.arXiv preprint arXiv:2510.24760,

Mengyuan Chen, Chengjun Dai, Xinyang Dong, Chengzhe Feng, Kewei Fu, Jianshe Li, Zhihan Peng, Yongqi Tong, Junshao Zhang, and Hong Zhu. Dingtalk-deepresearch: A unified multi-agent framework for adaptive intelligence in enterprise environments.arXiv preprint arXiv:2510.24760,

work page arXiv
[12]

Accessed: March 6, 2026

work page 2026
[13]

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready ai agents with scalable long-term memory.arXiv preprint arXiv:2504.19413, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[14]

Benchmarking deep search over heterogeneous enterprise data.arXiv preprint arXiv:2506.23139, 2025

Prafulla Kumar Choubey, Xiangyu Peng, Shilpa Bhagavath, Kung-Hsiang Huang, Caiming Xiong, and Chien-Sheng Wu. Benchmarking deep search over heterogeneous enterprise data.arXiv preprint arXiv:2506.23139, 2025. Accessed: March 6, 2026

work page arXiv 2025
[15]

A comprehensive taxonomy of hallucinations in large language models.arXiv preprint arXiv:2508.01781, 2025

Manuel Cossio. A comprehensive taxonomy of hallucinations in large language models.arXiv preprint arXiv:2508.01781, 2025

work page arXiv 2025
[16]

RDF 1.1 concepts and abstract syntax

Richard Cyganiak, David Wood, and Markus Lanthaler. RDF 1.1 concepts and abstract syntax. W3C recommendation, World Wide Web Consortium, 2014

work page 2014
[17]

Knowledge base question answering by case-based reasoning over subgraphs

Rajarshi Das, Ameya Godbole, Ankita Naik, Elliot Tower, Robin Jia, Manzil Zaheer, Hannaneh Hajishirzi, and Andrew McCallum. Knowledge base question answering by case-based reasoning over subgraphs. InInternational Conference on Machine Learning (ICML), 2022

work page 2022
[18]

Transforming root cause analysis for a global pharmaceutical leader with a knowledge graph and GenAI-enabled solution.Tredence Industrial AI Case Studies, 2025

Maulik Divakar Dixit. Transforming root cause analysis for a global pharmaceutical leader with a knowledge graph and GenAI-enabled solution.Tredence Industrial AI Case Studies, 2025

work page 2025
[19]

Reijers.Fundamentals of Business Process Management, Second Edition

Marlon Dumas, Marcello La Rosa, Jan Mendling, and Hajo A. Reijers.Fundamentals of Business Process Management, Second Edition. Springer, 2018

work page 2018
[20]

Pager: Proactive monitoring agent for enterprise ai assistant

Sujan Dutta, Junior Francisco Garcia Ayala, Pranav Pujar, Sai Sree Harsha, Dan Luo, Nikhil Va- sudeva, Bikas Saha, Pritom Baruah, and Yunyao Li. Pager: Proactive monitoring agent for enterprise ai assistant. InProceedings of the AAAI Conference on Artificial Intelligence (Demonstration Track), 2026

work page 2026
[21]

Knowledge graph-enhanced llms for supply chain intelligence: Automating network visibility through public data mining.TIJER - International Research Journal, 2025

Shikha Duttyal. Knowledge graph-enhanced llms for supply chain intelligence: Automating network visibility through public data mining.TIJER - International Research Journal, 2025

work page 2025
[22]

Detecting hallucinations in llms using semantic entropy.Nature, 2024

Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn, and Yarin Gal. Detecting hallucinations in llms using semantic entropy.Nature, 2024

work page 2024
[23]

Neon-gpt: A large language model-powered pipeline for ontology learning

Nadeen Fathallah, Arunav Das, Stefano De Giorgis, Andrea Poltronieri, Peter Haase, and Liubov Kovriguina. Neon-gpt: A large language model-powered pipeline for ontology learning. InThe Semantic Web: 21st International Conference, ESWC 2024. Springer, 2024. 15

work page 2024
[24]

Hyper-RAG: Combating llm hallucinations using hypergraph-driven retrieval-augmented generation

Yifan Feng, Hao Hu, Xingliang Hou, Shiquan Liu, Shihui Ying, Shaoyi Du, Han Hu, and Yue Gao. Hyper-RAG: Combating llm hallucinations using hypergraph-driven retrieval-augmented generation. arXiv preprint arXiv:2504.08758, 2025

work page arXiv 2025
[25]

How to make ai work in your enterprise through integration and not silos

Matthew Fitzpatrick. How to make ai work in your enterprise through integration and not silos. World Economic Forum, 2026

work page 2026
[26]

Trie memory.Communications of the ACM, 3(9):490–499, 1960

Edward Fredkin. Trie memory.Communications of the ACM, 3(9):490–499, 1960

work page 1960
[27]

A Survey on LLM-as-a-Judge

Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xuehao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, Yuanzhuo Wang, and Jian Guo. A survey on llm-as-a-judge. arXiv preprint arXiv:2411.15594, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[28]

Rossi, Subhabrata Mukherjee, Xianfeng Tang, Qi He, Zhigang Hua, Bo Long, Tong Zhao, Neil Shah, Amin Javari, Yinglong Xia, and Jiliang Tang

Haoyu Han, Yu Wang, Harry Shomer, Kai Guo, Jiayuan Ding, Yongjia Lei, Mahantesh Halappanavar, Ryan A. Rossi, Subhabrata Mukherjee, Xianfeng Tang, Qi He, Zhigang Hua, Bo Long, Tong Zhao, Neil Shah, Amin Javari, Yinglong Xia, and Jiliang Tang. Retrieval-augmented generation with graphs (GraphRAG).arXiv preprint arXiv:2501.00309, 2025

work page arXiv 2025
[29]

Cook, and Jingrui He

Xinrui He, Yikun Ban, Jiaru Zou, Tianxin Wei, Curtiss B. Cook, and Jingrui He. Llm-forest: Ensemble learning of llms with graph-augmented prompts for data imputation. InFindings of the Association for Computational Linguistics: ACL 2025, 2025

work page 2025
[30]

OWL 2 web ontology language document overview (second edition)

Pascal Hitzler, Markus Krötzsch, Bijan Parsia, Peter F Patel-Schneider, and Sebastian Rudolph. OWL 2 web ontology language document overview (second edition). W3C recommendation, World Wide Web Consortium, 2012

work page 2012
[31]

Cog-RAG: Cognitive-inspired dual-hypergraph with theme alignment retrieval-augmented generation

Hao Hu, Yifan Feng, Ruoxue Li, Rundong Xue, Xingliang Hou, Zhiqiang Tian, Yue Gao, and Shaoyi Du. Cog-RAG: Cognitive-inspired dual-hypergraph with theme alignment retrieval-augmented generation. InProceedings of the AAAI Conference on Artificial Intelligence, 2026

work page 2026
[32]

AttributeForge: An agentic LLM framework for automated product schema modeling

Yunhan Huang, Klevis Ramo, Andrea Iovine, Melvin Monteiro, Sedat Gokalp, Arjun Bakshi, Hasan Turalic, Arsh Kumar, Jona Neumeier, Ripley Yates, Rejaul Monir, Simon Hartmann, Tushar Manglik, and Mohamed Yakout. AttributeForge: An agentic LLM framework for automated product schema modeling. InProceedings of the 2025 Conference on Empirical Methods in Natural...

work page 2025
[33]

Survey of hallucination in natural language generation.ACM Computing Surveys, 2023

Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Delong Chen, Wenliang Dai, Ho Shu Chan, Andrea Madotto, and Pascale Fung. Survey of hallucination in natural language generation.ACM Computing Surveys, 2023

work page 2023
[34]

Springer, 2007

Jürgen Kletti.Manufacturing Execution Systems (MES). Springer, 2007

work page 2007
[35]

Kurbel.Enterprise Resource Planning and Supply Chain Management

Karl E. Kurbel.Enterprise Resource Planning and Supply Chain Management. Springer, 2013

work page 2013
[36]

arXiv preprint arXiv:2411.07763 (2024)

Fangyu Lei, Jixuan Chen, Yuxiao Ye, Ruisheng Cao, Dong Chan Shin, Hongjin Su, Zhaoqi Suo, Hongwei Gao, Wenpeng Hu, Pengcheng Yin, Sida I. Wang, and Victor Zhong. Spider 2.0: Evaluating language models on real-world enterprise text-to-sql workflows.arXiv preprint arXiv:2411.07763, 2024

work page arXiv 2024
[37]

Retrieval-augmented generation for knowledge-intensive nlp tasks

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive nlp tasks. InProceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), 2020. 16

work page 2020
[38]

Springer Nature, November 2023

Yunyao Li, Dragomir Radev, and Davood Rafiei.Natural Language Interfaces to Databases. Springer Nature, November 2023

work page 2023
[39]

Llm-cg: Large language model-enhanced constraint graph for distantly supervised relation extraction.Neurocomputing, 655:131426, 2025

Bao Liu and Guilin Qi. Llm-cg: Large language model-enhanced constraint graph for distantly supervised relation extraction.Neurocomputing, 655:131426, 2025

work page 2025
[40]

Ontology-guided reverse thinking makes large language models stronger on knowledge graph question answering

Runxuan Liu, Bei Luo, Jiaqi Li, Baoxin Wang, Ming Liu, Dayong Wu, Shijin Wang, and Bing Qin. Ontology-guided reverse thinking makes large language models stronger on knowledge graph question answering. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL), 2025

work page 2025
[41]

Text-to-sql guide (query engine + retriever)

LlamaIndex. Text-to-sql guide (query engine + retriever). https://developers. llamaindex.ai/python/examples/index_structs/struct_indices/ sqlindexdemo/, 2026. Accessed: 2026-04-09

work page 2026
[42]

Chengda Lu, Xiaoyu Fan, Yu Huang, Rongwu Xu, Jijie Li, and Wei Xu. Does chain-of-thought reasoning really reduce harmfulness from jailbreaking? In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors,Findings of the Association for Computational Lin- guistics: ACL 2025, pages 6523–6546, Vienna, Austria, July 2025. Associati...

work page 2025
[43]

HyperGraphRAG: Retrieval- augmented generation via hypergraph-structured knowledge representation

Haoran Luo, Haihong E, Guanting Chen, Yandan Zheng, Xiaobao Wu, Yikai Guo, Qika Lin, Yu Feng, Zemin Kuang, Meina Song, Yifan Zhu, and Luu Anh Tuan. HyperGraphRAG: Retrieval- augmented generation via hypergraph-structured knowledge representation. InAdvances in Neural Information Processing Systems (NeurIPS), 2025

work page 2025
[44]

Large language models meet knowledge graphs for question answering: Synthesis and opportunities

Chuangtao Ma, Yongrui Chen, Tianxing Wu, Arijit Khan, and Haofen Wang. Large language models meet knowledge graphs for question answering: Synthesis and opportunities. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

work page 2025
[45]

Maharaj, David Arbour, Daniel Lee, Uttaran Bhattacharya, Anup Rao, Austin Zane, Avi Feller, Kun Qian, and Yunyao Li

Akash V . Maharaj, David Arbour, Daniel Lee, Uttaran Bhattacharya, Anup Rao, Austin Zane, Avi Feller, Kun Qian, and Yunyao Li. Evaluation and incident prevention in an enterprise ai assistant. InInnovative Applications of Artificial Intelligence (IAAI), 2025

work page 2025
[46]

GNN-RAG: Graph neural retrieval for efficient large language model reasoning on knowledge graphs

Costas Mavromatis and George Karypis. GNN-RAG: Graph neural retrieval for efficient large language model reasoning on knowledge graphs. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors,Findings of the Association for Computational Lin- guistics: ACL 2025, pages 16682–16699, Vienna, Austria, July 2025. Association for...

work page 2025
[47]

Menon, Kun Qian, Liqun Chen, Ishika Joshi, Daniel Pandyan, Jordyn Harrison, Shashank Srivastava, and Yunyao Li

Rakesh R. Menon, Kun Qian, Liqun Chen, Ishika Joshi, Daniel Pandyan, Jordyn Harrison, Shashank Srivastava, and Yunyao Li. Fisql: Enhancing text-to-sql systems with rich interactive feedback. InProceedings of the 28th International Conference on Extending Database Technology (EDBT), 2025

work page 2025
[48]

Llm-driven ontology construction for enterprise knowl- edge graphs, 2026

Abdulsobur Oyewale and Tommaso Soru. Llm-driven ontology construction for enterprise knowl- edge graphs, 2026

work page 2026
[49]

MemGPT: Towards LLMs as Operating Systems

Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. Memgpt: Towards llms as operating systems.arXiv preprint arXiv:2310.08560, 2023. 17

work page internal anchor Pith review Pith/arXiv arXiv 2023
[50]

Graph retrieval-augmented generation: A survey.ACM Trans

Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, and Siliang Tang. Graph retrieval-augmented generation: A survey.ACM Trans. Inf. Syst., 44(2), December 2025

work page 2025
[51]

Capturing symbolic knowledge of constraints and incompleteness to guide inductive learning in neuro-symbolic knowledge graph completion

Disha Purohit, Yashrajsinh Chudasama, and Maria-Esther Vidal. Capturing symbolic knowledge of constraints and incompleteness to guide inductive learning in neuro-symbolic knowledge graph completion. InProceedings of the 13th International Conference on Knowledge Capture (K-CAP), 2025

work page 2025
[52]

Challenges in integrating ai with existing enterprise data governance structures

Abhishek Rath. Challenges in integrating ai with existing enterprise data governance structures. Enterprise AI Journal, 2025

work page 2025
[53]

Agent drift: Quantifying behavioral degradation in multi-agent llm systems over extended interactions,

Abhishek Rath. Agent drift: Quantifying behavioral degradation in multi-agent llm systems over extended interactions.arXiv preprint arXiv:2601.04170, 2026

work page arXiv 2026
[54]

Toolformer: Language models can teach themselves to use tools

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. InAdvances in Neural Information Processing Systems (NeurIPS), volume 36, 2024

work page 2024
[55]

Strohmer, Stephen Easton, Michael Hales, and Alenka Triplat.Supplier Relationship Management: How to Maximize Vendor Value and Opportunity

Christian Schuh, Michael F. Strohmer, Stephen Easton, Michael Hales, and Alenka Triplat.Supplier Relationship Management: How to Maximize Vendor Value and Opportunity. Apress, 2014

work page 2014
[56]

Matchmaker: Schema matching with self-improving compositional llm programs

Nabeel Seedat and Mihaela van der Schaar. Matchmaker: Schema matching with self-improving compositional llm programs. InThe Thirteenth International Conference on Learning Representa- tions (ICLR), 2025

work page 2025
[57]

OG-RAG: Ontology-grounded retrieval-augmented generation for large language models

Kartik Sharma, Peeyush Kumar, and Yunqing Li. OG-RAG: Ontology-grounded retrieval-augmented generation for large language models. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, editors,Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 32962–32981, Suzhou, China, November 2025....

work page 2025
[58]

Autoqg: An automated framework for evidence-traceable question generation via ontology-guided knowledge graph construction.Preprint (Submitted to ICLR 2026), 2026

Xiaoli Shen, Guilin Qi, Peng Zhang, Dingsheng Yang, Yang Yang, Muyan Yang, Li Fangyanhao, HaoWen Guo, HuiKang Hu, Jiazhen Kang, Yan Wang, Yuchen Lu, Zheyu Xin, and Jun Wang. Autoqg: An automated framework for evidence-traceable question generation via ontology-guided knowledge graph construction.Preprint (Submitted to ICLR 2026), 2026

work page 2026
[59]

Rajpreet Singh, Novak Boškov, Lawrence Drabeck, Aditya Gudal, and Manzoor A. Khan. CSR-RAG: An efficient retrieval system for text-to-sql on the enterprise scale.arXiv preprint arXiv:2601.06564, 2026

work page arXiv 2026
[60]

Can chatgpt replace traditional kbqa models? an in-depth analysis of the question answering performance of the gpt llm family

Yiming Tan, Dehai Min, Yu Li, Wenbo Li, Nan Hu, Yongrui Chen, and Guilin Qi. Can chatgpt replace traditional kbqa models? an in-depth analysis of the question answering performance of the gpt llm family. InInternational Semantic Web Conference (ISWC), 2023

work page 2023
[61]

V oyager: An open-ended embodied agent with large language models

Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. V oyager: An open-ended embodied agent with large language models. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, 2023

work page 2023
[62]

MIRIX: Multi-Agent Memory System for LLM-Based Agents

Yu Wang and Xi Chen. Mirix: Multi-agent memory system for llm-based agents.arXiv preprint arXiv:2507.07957, 2025. 18

work page internal anchor Pith review arXiv 2025
[63]

Sort-ai: Agentic system stability in large-scale ai systems—structural causes of cost, instability, and non-determinism in multi-agent workflows.Preprints.org, 2026

Gregor Herbert Wegener. Sort-ai: Agentic system stability in large-scale ai systems—structural causes of cost, instability, and non-determinism in multi-agent workflows.Preprints.org, 2026

work page 2026
[64]

Virtual knowledge graphs: An overview of systems and use cases.Data Intelligence, 1(3):201–223, 2019

Guohui Xiao, Linfang Ding, Benjamin Cogrel, and Diego Calvanese. Virtual knowledge graphs: An overview of systems and use cases.Data Intelligence, 1(3):201–223, 2019

work page 2019
[65]

Root cause analysis for microservice systems via cascaded conditional learning with hypergraphs, 2025

Shuaiyu Xie, Hanbin He, Jian Wang, and Bing Li. Root cause analysis for microservice systems via cascaded conditional learning with hypergraphs, 2025

work page 2025
[66]

Hypkg: Hypergraph-based knowledge graph contextualization for precision healthcare.arXiv preprint arXiv:2507.19726, 2025

Yuzhang Xie, Xu Han, Ran Xu, Xiao Hu, Jiaying Lu, and Carl Yang. Hypkg: Hypergraph-based knowledge graph contextualization for precision healthcare.arXiv preprint arXiv:2507.19726, 2025

work page arXiv 2025
[67]

The earth is flat because...: Investigating LLMs’ belief towards misinfor- mation via persuasive conversation

Rongwu Xu, Brian Lin, Shujian Yang, Tianqi Zhang, Weiyan Shi, Tianwei Zhang, Zhixuan Fang, Wei Xu, and Han Qiu. The earth is flat because...: Investigating LLMs’ belief towards misinfor- mation via persuasive conversation. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Proceedings of the 62nd Annual Meeting of the Association for Computational...

work page 2024
[68]

Knowledge conflicts for LLMs: A survey

Rongwu Xu, Zehan Qi, Zhijiang Guo, Cunxiang Wang, Hongru Wang, Yue Zhang, and Wei Xu. Knowledge conflicts for LLMs: A survey. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors,Proceedings of the 2024 Conference on Empirical Methods in Natural Language Pro- cessing, pages 8541–8565, Miami, Florida, USA, November 2024. Association for Computatio...

work page 2024
[69]

Generate-on-graph: Treat llm as both agent and kg in incomplete knowledge graph question answering

Yao Xu, Shizhu He, Jiabei Chen, Zihao Wang, Yangqiu Song, Hanghang Tong, Kang Liu, and Jun Zhao. Generate-on-graph: Treat llm as both agent and kg in incomplete knowledge graph question answering. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

work page 2024
[70]

Graph-based agent memory: Taxonomy, techniques, and applications,

Chang Yang, Chuang Zhou, Yilin Xiao, Su Dong, Luyao Zhuang, Yujing Zhang, Zhu Wang, Zijin Hong, Zheng Yuan, Zhishang Xiang, Shengyuan Chen, Huachi Zhou, Qinggang Zhang, Ninghao Liu, Jinsong Su, Xinrun Wang, Yi Chang, and Xiao Huang. Graph-based agent memory: Taxonomy, techniques, and applications.arXiv preprint arXiv:2602.05665, 2026

work page arXiv 2026
[71]

ReAct: Synergizing reasoning and acting in language models

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. InInternational Conference on Learning Representations (ICLR), 2023

work page 2023
[72]

Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing,...

work page 2018
[73]

Tablerag: A retrieval augmented generation framework for heterogeneous document reasoning

Xiaohan Yu, Pu Jian, and Chong Chen. Tablerag: A retrieval augmented generation framework for heterogeneous document reasoning. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 14063–14082, Suzhou, China, 2025. Association for Computational Linguistics. Also available as arXiv:2506.10380

work page arXiv 2025
[74]

The shift from models to compound AI systems.Berkeley Artificial Intelligence Research (BAIR), 2024

Matei Zaharia, Omar Khattab, Keshav Santhanam, Omar Paranjape, et al. The shift from models to compound AI systems.Berkeley Artificial Intelligence Research (BAIR), 2024. 19

work page 2024
[75]

PRoH: Dynamic planning and reasoning over knowledge hypergraphs for retrieval-augmented generation

Xiangjun Zai, Xingyu Tan, Xiaoyang Wang, Qing Liu, Xiwei Xu, and Wenjie Zhang. PRoH: Dynamic planning and reasoning over knowledge hypergraphs for retrieval-augmented generation. InProceedings of the ACM Web Conference (WWW), 2026

work page 2026
[76]

Optimizing reasoning for text-to-sql with execution feedback

Bohan Zhai, Canwen Xu, Yuxiong He, and Zhewei Yao. Optimizing reasoning for text-to-sql with execution feedback. InFindings of the Association for Computational Linguistics (ACL), 2025

work page 2025
[77]

Subgraph retrieval enhanced model for multi-hop knowledge base question answering

Jing Zhang, Xiaokang Zhang, Jifan Yu, Jian Tang, Jie Tang, Cuiping Li, and Hong Chen. Subgraph retrieval enhanced model for multi-hop knowledge base question answering. InProceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), 2022

work page 2022
[78]

Qiu, and Lili Qiu

Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, and Lili Qiu. Retrieval augmented generation (rag) and beyond: A comprehensive survey on how to make your llms use external data more wisely, 2024

work page 2024
[79]

Empowering supply chain risk monitoring with ontology-guided knowledge graph extraction by llms

Shuhan Zheng, Keita Mizushina, and Ken Naono. Empowering supply chain risk monitoring with ontology-guided knowledge graph extraction by llms. InInternational Workshop on the Semantic Web, 2025

work page 2025
[80]

Memorybank: Enhancing large language models with long-term memory.Proceedings of the AAAI Conference on Artificial Intelligence, 2023

Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. Memorybank: Enhancing large language models with long-term memory.Proceedings of the AAAI Conference on Artificial Intelligence, 2023

work page 2023

Showing first 80 references.