CYGNET: Cypher Gate for Neural Execution Triage and Cost Containment

Nikodem Tomczak

arXiv: 2606.04645 · v1 · pith:GDKDPFGF · submitted 2026-06-03 · 💻 cs.CL · cs.DB

Pith reviewed 2026-06-28 06:47 UTC · model grok-4.3

classification 💻 cs.CL cs.DB

keywords Cypher queries · language model agents · knowledge graphs · query validation · error correction · structural validation · Neo4j · pre-execution gate

The pith

A pre-execution gate that ends in execution on a mirror graph catches every parse error, constraint violation, and labelled-endpoint schema-reference error in a 1135-query template corpus, at zero false positives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a gate inserted between language-model query generation and a live Neo4j database. The gate runs each candidate query through a four-backend chain whose final stage executes it on a mirror copy of the graph. Queries that fail are sent to a corrector that feeds the structured error back to the model for repair. On 2348 CypherBench questions across seven schemas, the approach leaves the generation accuracy of every tested model unchanged while stopping structurally broken queries before they reach the production database.

Core claim

CYGNET validates structure through a four-backend chain culminating in execution against a mirror graph at 5.6 ms median latency. Structurally broken queries are routed to a corrector that iterates structured error feedback through a language model. On seven CypherBench schemas the pipeline maintains generation accuracy on every model tested. The corrector achieves 81% to 95% success across five models. On a template-generated corpus the gate catches 100% of parse errors, 100% of constraint violations, and 100% of schema-reference errors in path queries with labelled endpoints, at zero false positives across 1135 queries.
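The corrector's control flow can be pictured as a bounded retry loop around the gate. This is a minimal sketch, not the paper's implementation: `ValidationResult`, `correct`, the `validate`/`repair` callables, and the three-attempt budget are hypothetical names and settings. In a real deployment `validate` would wrap the gate and `repair` would prompt a language model with the structured error payload.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class ValidationResult:
    """Outcome of running a query through the gate (hypothetical structure)."""
    ok: bool
    category: Optional[str] = None               # e.g. "parse", "schema", "constraint"
    payload: dict = field(default_factory=dict)  # structured error fed back to the model


def correct(query: str,
            validate: Callable[[str], ValidationResult],
            repair: Callable[[str, ValidationResult, int], str],
            max_attempts: int = 3) -> Tuple[Optional[str], int]:
    """Retry repair until the gate passes or the attempt budget runs out.

    Returns (query, attempts_used); query is None if no attempt passed the gate.
    """
    result = validate(query)
    attempts = 0
    while not result.ok and attempts < max_attempts:
        attempts += 1
        # A real corrector would call a language model with result.payload here.
        query = repair(query, result, attempts)
        result = validate(query)
    return (query if result.ok else None), attempts


# Toy usage: a validator that only knows about the misspelled label "Moive".
def toy_validate(q: str) -> ValidationResult:
    if "Moive" in q:
        return ValidationResult(False, "schema",
                                {"unknown_reference": "Moive", "did_you_mean": ["Movie"]})
    return ValidationResult(True)


def toy_repair(q: str, r: ValidationResult, attempt: int) -> str:
    return q.replace(r.payload["unknown_reference"], r.payload["did_you_mean"][0])


print(correct("MATCH (n:Moive) RETURN n.title", toy_validate, toy_repair))
# ('MATCH (n:Movie) RETURN n.title', 1)
```

Bounding the loop matters: a model that keeps producing broken queries should eventually be cut off, and the attempt count is the natural quantity to report alongside a success rate.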

What carries the argument

The four-backend validation chain that executes queries against a mirror graph to detect structural failures before production.
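The two chain modes shown in Figure 2 (stop at the first failure versus collect every failure) can be sketched as follows. The backend names follow the simulated rebuttal's description (parser, schema validator, constraint checker, mirror execution) and are an assumption; the checks are deliberately crude stand-ins, and the constraint and mirror stages are stubs because they need a real database.

```python
import re
from typing import Callable, List, Optional, Tuple

# A backend returns an error message, or None if the query passes that stage.
Backend = Callable[[str], Optional[str]]

KNOWN_LABELS = {"Movie", "Person"}  # hypothetical schema


def parse_check(q: str) -> Optional[str]:
    # Crude stand-in for a real Cypher parser: only checks parenthesis balance.
    return None if q.count("(") == q.count(")") else "unbalanced parentheses"


def schema_check(q: str) -> Optional[str]:
    # Toy heuristic: treats every ":Name" token as a label reference.
    unknown = [name for name in re.findall(r":(\w+)", q) if name not in KNOWN_LABELS]
    return f"unknown label {unknown[0]}" if unknown else None


def constraint_check(q: str) -> Optional[str]:
    return None  # stub: would compare writes against declared constraints


def mirror_check(q: str) -> Optional[str]:
    return None  # stub: would execute q against the mirror graph


CHAIN: List[Tuple[str, Backend]] = [
    ("parser", parse_check),
    ("schema", schema_check),
    ("constraint", constraint_check),
    ("mirror", mirror_check),
]


def run_chain(q: str, chain: List[Tuple[str, Backend]] = CHAIN,
              short_circuit: bool = True) -> List[Tuple[str, str]]:
    """Return (backend, error) pairs; an empty list means the query passed."""
    failures = []
    for name, check in chain:
        error = check(q)
        if error is not None:
            failures.append((name, error))
            if short_circuit:
                break  # short-circuit mode: stop at the first failure
    return failures


broken = "MATCH (n:Moive RETURN n"
print(run_chain(broken))                       # parser failure only
print(run_chain(broken, short_circuit=False))  # parser and schema failures
```

Short-circuit mode keeps latency low by skipping the expensive mirror execution once a cheap stage fails; full mode gives the corrector every error at once, at the cost of running all stages.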

If this is right

The pipeline preserves generation accuracy on every model tested across 2348 questions.
The corrector repairs 81% to 95% of broken queries (mean 89%).
Property sibling-swaps that remain valid on the target label mark the formal boundary between structural and semantic validation.
A planner-based cost gate flags catastrophic plan structures before execution.
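A planner-based cost gate of this kind can be sketched as a walk over an EXPLAIN plan tree. The sketch assumes the plan arrives as a nested dict with `operatorType`, `arguments`, and `children` keys, roughly the shape Neo4j reports for plans (worth checking against your driver version). The set of risky operators and the row threshold are illustrative choices, not the paper's criteria.

```python
from typing import Any, Dict, List

# Illustrative red flags; the paper's actual criteria may differ.
RISKY_OPERATORS = {"CartesianProduct", "AllNodesScan"}
MAX_ESTIMATED_ROWS = 1_000_000  # hypothetical threshold


def flag_plan(plan: Dict[str, Any]) -> List[str]:
    """Walk the plan tree and collect warnings about expensive structures."""
    flags: List[str] = []
    stack = [plan]
    while stack:
        node = stack.pop()
        # Operator names may carry an "@<database>" suffix; strip it.
        op = node.get("operatorType", "").split("@")[0]
        if op in RISKY_OPERATORS:
            flags.append(op)
        rows = node.get("arguments", {}).get("EstimatedRows", 0)
        if rows > MAX_ESTIMATED_ROWS:
            flags.append(f"{op}: ~{rows:.0f} estimated rows")
        stack.extend(node.get("children", []))
    return flags


plan = {
    "operatorType": "ProduceResults@neo4j",
    "arguments": {"EstimatedRows": 4e6},
    "children": [{
        "operatorType": "CartesianProduct@neo4j",
        "arguments": {"EstimatedRows": 4e6},
        "children": [
            {"operatorType": "AllNodesScan@neo4j", "arguments": {"EstimatedRows": 2000.0}, "children": []},
            {"operatorType": "NodeByLabelScan@neo4j", "arguments": {"EstimatedRows": 2000.0}, "children": []},
        ],
    }],
}
print(sorted(flag_plan(plan)))
```

Because EXPLAIN plans without executing, a check like this adds planning cost but never touches production data.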

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same staged gate could be applied to other query languages and graph stores beyond Neo4j.
The separation of structural from semantic checks suggests a modular defense that later layers can build upon.
Low median latency supports insertion into real-time agent loops without noticeable slowdown.

Load-bearing premise

The mirror graph is an exact structural replica of the production database and the four-backend chain covers all possible structural failure modes.
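One way to make this premise checkable is a schema fingerprint computed identically on both instances, in the spirit of the metadata checksums proposed in the simulated rebuttal. This is a minimal sketch under stated assumptions: the inputs would come from queries such as `CALL db.labels()`, `CALL db.relationshipTypes()`, and `SHOW CONSTRAINTS`, and the string encoding of constraints used here is hypothetical. A matching fingerprint only shows that declared metadata agrees; it says nothing about undeclared properties or data-dependent behaviour.

```python
import hashlib
import json
from typing import Iterable


def schema_fingerprint(labels: Iterable[str],
                       relationship_types: Iterable[str],
                       constraints: Iterable[str]) -> str:
    """SHA-256 over a canonical, order-independent encoding of schema metadata."""
    canonical = {
        "labels": sorted(labels),
        "relationship_types": sorted(relationship_types),
        "constraints": sorted(constraints),
    }
    blob = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


# Hypothetical metadata pulled from each instance.
production = schema_fingerprint(
    ["Movie", "Person"], ["ACTED_IN"], ["UNIQUE Movie.movieId"])
mirror = schema_fingerprint(
    ["Person", "Movie"], ["ACTED_IN"], ["UNIQUE Movie.movieId"])
print(production == mirror)  # listing order does not affect the fingerprint
```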

What would settle it

A query that passes the gate yet produces a parse error, constraint violation, or schema-reference error when executed on the production database would falsify the claim.
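Both the headline rates and this falsification test reduce to bookkeeping over labelled outcomes. A minimal sketch, assuming each template query carries its injected error class (or `None` for a valid query) and, for the falsification check, an observed production outcome. The function names are hypothetical and the records below are toy data, not the paper's results.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

STRUCTURAL = {"parse", "constraint", "schema"}


def gate_metrics(records: Iterable[Tuple[Optional[str], bool]]
                 ) -> Tuple[Dict[str, float], float]:
    """records: (injected_error_class or None, gate_flagged) pairs.

    Returns per-class catch rates and the false-positive rate on valid queries.
    """
    injected, caught = Counter(), Counter()
    valid = false_positives = 0
    for error_class, flagged in records:
        if error_class is None:
            valid += 1
            false_positives += int(flagged)
        else:
            injected[error_class] += 1
            caught[error_class] += int(flagged)
    catch_rates = {c: caught[c] / injected[c] for c in injected}
    fp_rate = false_positives / valid if valid else 0.0
    return catch_rates, fp_rate


def escapes(observations: Iterable[Tuple[str, bool, Optional[str]]]) -> List[str]:
    """Queries the gate passed that still failed structurally in production.

    observations: (query, gate_passed, production_error_category or None).
    Any non-empty result is a counterexample to the 100% claim.
    """
    return [q for q, passed, err in observations if passed and err in STRUCTURAL]


toy = [("parse", True), ("schema", True), ("schema", False), (None, False), (None, False)]
print(gate_metrics(toy))  # ({'parse': 1.0, 'schema': 0.5}, 0.0)
```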

Figures

Figures reproduced from arXiv: 2606.04645 by Nikodem Tomczak.

**Figure 2.** Validator chain modes. In short-circuit mode (top) the chain stops at the first failure...

**Figure 3.** Mirror graph construction from a schema description. A schema loaded from one of...
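The Figure 3 caption is truncated, so the schema format here is an assumption: a dict of labels with typed properties plus (source, type, target) relationship triples. The sketch emits Cypher statements that would seed a mirror with one placeholder node per label and one edge per relationship pattern. `mirror_statements` and the placeholder values are hypothetical, and constraint replication is left out.

```python
from typing import Dict, List, Tuple

# Placeholder literal per declared property type (illustrative values).
PLACEHOLDER = {"STRING": "'x'", "INTEGER": "0", "FLOAT": "0.0", "BOOLEAN": "false"}


def mirror_statements(labels: Dict[str, Dict[str, str]],
                      relationships: List[Tuple[str, str, str]]) -> List[str]:
    """Build CREATE statements for a minimal mirror of the given schema."""
    statements = []
    for label, props in sorted(labels.items()):
        body = ", ".join(f"{name}: {PLACEHOLDER[typ]}" for name, typ in props.items())
        statements.append(f"CREATE (:{label} {{{body}}})" if body else f"CREATE (:{label})")
    for src, rel, dst in relationships:
        # Run once on an empty database, each label has one node,
        # so this MATCH binds exactly one (a, b) pair.
        statements.append(f"MATCH (a:{src}), (b:{dst}) CREATE (a)-[:{rel}]->(b)")
    return statements


schema_labels = {"Person": {"name": "STRING"},
                 "Movie": {"movieId": "STRING", "released": "INTEGER"}}
for stmt in mirror_statements(schema_labels, [("Person", "ACTED_IN", "Movie")]):
    print(stmt)
```

A mirror this small keeps execution cheap, which is consistent with the millisecond-scale latency reported for the final stage, though the paper's actual construction may differ.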

Original abstract

Language models acting as agents over knowledge graphs generate Cypher queries that fail structurally (crashing at the database) or semantically (executing but returning wrong results). We place a pre-execution gate between query generation and a production Neo4j database. The gate validates structure through a four-backend chain culminating in execution against a mirror graph at 5.6 ms median latency. Structurally broken queries are routed to a corrector that iterates structured error feedback through a language model. On seven CypherBench schemas (2348 questions, ACL 2025) the pipeline maintains generation accuracy on every model tested, confirming it operates as a safe defensive layer. The corrector achieves 81% to 95% success across five models (mean 89%). On a template-generated corpus across nine schemas the gate catches 100% of parse errors, 100% of constraint violations, and 100% of schema-reference errors in path queries with labelled endpoints, at zero false positives across 1135 queries. Property sibling-swaps where the substituted name is valid on the target label score 0%, marking the formal boundary where structural validation ends and semantic validation must begin. A planner-based cost gate flags catastrophic plan structures before execution.
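Why sibling-swaps fall outside structural validation can be shown with a toy property-reference check. Suppose a user asks for a movie's title and the model returns `m.tagline`: the query is wrong, yet every name in it is declared. The schema and the regex-based `property_errors` helper are hypothetical; the point is only that a misspelled property is caught while a declared sibling passes.

```python
import re
from typing import Dict, List, Set

SCHEMA: Dict[str, Set[str]] = {"Movie": {"movieId", "title", "tagline", "released"}}


def property_errors(query: str) -> List[str]:
    """Flag var.prop references whose property is not declared on var's label."""
    var_labels = dict(re.findall(r"\((\w+):(\w+)", query))  # e.g. (m:Movie)
    errors = []
    for var, prop in re.findall(r"\b(\w+)\.(\w+)", query):
        label = var_labels.get(var)
        if label is not None and prop not in SCHEMA.get(label, set()):
            errors.append(f"{label}.{prop}")
    return errors


print(property_errors("MATCH (m:Movie) RETURN m.titel"))    # ['Movie.titel']: typo caught
print(property_errors("MATCH (m:Movie) RETURN m.tagline"))  # []: sibling-swap passes
```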

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a practical four-backend gate plus corrector for catching structural Cypher errors from LLMs at reported 100% on its test set, but the live-system transfer rests on an unverified mirror replica.

The one thing to know is that this paper describes a pre-execution gate for LLM-generated Cypher queries aimed at Neo4j. It chains four validation backends that end with execution on a mirror graph, then feeds errors into an iterative corrector loop.

What is new is the specific engineering of that pipeline for Cypher, including the 5.6 ms median latency figure and the exact error-type coverage on the 1135-query template corpus. The work does well on the reported results: 100% catch of parse errors, constraint violations, and schema-reference errors at zero false positives, plus 81-95% corrector success (mean 89%) across five models on seven CypherBench schemas. It also keeps generation accuracy intact and adds a planner-based cost gate. These are measurable, deployment-oriented numbers on named benchmarks.

The soft spot is the assumption that the mirror graph is an exact replica and that the four backends cover every structural failure mode. The 100% rates are measured against that mirror, yet the text supplies no independent check that the mirror schema, labels, relationships, and constraints match the production instance bitwise, nor a completeness argument for the chosen backends. Any divergence would break the transfer to live queries. The paper itself marks the boundary correctly by noting that property sibling-swaps score 0% and fall to semantic validation.

This paper is for people building LLM agents over graph databases who need a defensive layer before execution. Readers working on safe query generation or Neo4j tooling will find the architecture and numbers directly useful. It deserves a serious referee because the claims are concrete and the problem is real for deployment, even if the mirror assumption needs tighter verification in revision.

Referee Report

3 major / 0 minor

Summary. The paper proposes CYGNET, a pre-execution gate placed between LLM-generated Cypher queries and a production Neo4j database. It employs a four-backend validation chain ending in execution against a mirror graph (median 5.6 ms latency), routes structurally invalid queries to an iterative LM corrector, and adds a planner-based cost gate. On CypherBench (seven schemas, 2348 questions) the pipeline preserves generation accuracy; the corrector achieves 81-95% success (mean 89%). On a template corpus across nine schemas the gate reports 100% detection of parse errors, constraint violations, and schema-reference errors in labelled-endpoint path queries at 0% false positives over 1135 queries, while property sibling-swaps whose substituted name is valid on the target label score 0%.

Significance. If the mirror-graph fidelity and backend-coverage assumptions can be independently verified, the work supplies a concrete, low-latency defensive layer that separates structural from semantic validation for LLM agents over graph databases. The explicit 0% result on valid-name sibling swaps usefully demarcates the boundary at which structural triage must hand off to semantic checking.

major comments (3)

[Abstract] The headline result (100% catch of parse, constraint, and schema-reference errors at 0 FP on 1135 queries) is obtained by final-stage execution against the mirror graph, yet the manuscript supplies neither a verification procedure confirming that the mirror schema, labels, relationships, and constraints are bitwise identical to the production Neo4j instance nor a completeness argument that the four-backend chain exercises every possible structural Cypher failure mode.
[Abstract / Results] The reported performance figures (100% catch rates, 0 false positives, 89% mean corrector success) are presented as direct measurements on fixed benchmark sets with no dataset splits, error analysis, or description of how the template-generated corpus was constructed to ensure coverage of the claimed error classes, rendering the central empirical claims unverifiable from the given text.
[Abstract] The transfer of the reported rates to live production queries rests on the untested assumptions that the mirror graph is an exact replica and that the four backends exhaustively cover all structural failure modes; any divergence or unexercised Cypher construct would falsify the claimed 100%/0% figures.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments identifying gaps in verifiability. We address each point below and will revise the manuscript to incorporate additional detail on mirror construction, corpus methodology, and assumption caveats while preserving the core claims.

Referee: [Abstract] The headline result (100% catch of parse, constraint, and schema-reference errors at 0 FP on 1135 queries) is obtained by final-stage execution against the mirror graph, yet the manuscript supplies neither a verification procedure confirming that the mirror schema, labels, relationships, and constraints are bitwise identical to the production Neo4j instance nor a completeness argument that the four-backend chain exercises every possible structural Cypher failure mode.

Authors: We agree the current text lacks an explicit verification procedure. In revision we will add a methods subsection describing mirror construction via Neo4j schema export, constraint replication, and automated metadata checksums to confirm bitwise identity on labels, relationships, and constraints. For completeness, we will expand the discussion to argue that the four backends (parser, schema validator, constraint checker, mirror execution) systematically target all structural failure modes detectable without query intent, while noting that exhaustive enumeration of every Cypher construct remains an open formal question. Revision: yes.
Referee: [Abstract / Results] The reported performance figures (100% catch rates, 0 false positives, 89% mean corrector success) are presented as direct measurements on fixed benchmark sets with no dataset splits, error analysis, or description of how the template-generated corpus was constructed to ensure coverage of the claimed error classes, rendering the central empirical claims unverifiable from the given text.

Authors: The template corpus is generated from parameterized templates explicitly designed to inject the targeted error classes across the nine schemas; the full manuscript methods section enumerates the templates and resulting query counts. To improve verifiability we will add an expanded description of template construction, a per-error-class breakdown of the 1135 queries, and a short error analysis. Because the evaluation is exhaustive rather than a sampled training regime, no train/test splits were applied; we will insert a brief justification for reporting on the complete fixed sets. Revision: yes.
Referee: [Abstract] The transfer of the reported rates to live production queries rests on the untested assumptions that the mirror graph is an exact replica and that the four backends exhaustively cover all structural failure modes; any divergence or unexercised Cypher construct would falsify the claimed 100%/0% figures.

Authors: We accept that transfer to production depends on these assumptions and will revise the discussion to state them explicitly, describe the synchronization protocol used to keep the mirror current, and note the sibling-swap result as the boundary beyond which semantic validation is required. We will also add a limitations paragraph acknowledging that unexercised Cypher constructs could affect coverage and that the 100%/0% figures are benchmark-specific. Revision: yes.

Circularity Check

0 steps flagged

No circularity: empirical measurements on fixed benchmarks with no fitted quantities or self-referential derivations

Full rationale

The paper reports direct empirical measurements of error detection rates (100% catch at 0% false positives on 1135 template queries) obtained by executing the four-backend validation chain against a mirror graph. No equations, parameter fitting, or derivation steps are described that would reduce the reported outcomes to the inputs by construction. The core claims rest on benchmark execution results rather than any self-definitional, fitted-prediction, or self-citation load-bearing structure. Assumptions about mirror-graph fidelity are external to the measurement process itself and do not create circularity in the reported detection statistics.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, background axioms, or newly postulated entities.

pith-pipeline@v0.9.1-grok · 5743 in / 1107 out tokens · 32583 ms · 2026-06-28T06:47:02.049893+00:00 · methodology

Reference graph

Works this paper leans on

39 extracted references · 6 canonical work pages

[1] Xiangrong Zhu, Yuexiang Xie, Yi Liu, Yaliang Li, and Wei Hu. Knowledge graph-guided retrieval augmented generation. In Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), pages 8912–8924, 2025. doi: 10.18653/v1/2025.naacl-long.449

[2] Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, and Siliang Tang. Graph retrieval-augmented generation: A survey. ACM Transactions on Information Systems, 44(2):1–52, 2026.

[3] Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. From local to global: A graph RAG approach to query-focused summarization. arXiv preprint arXiv:2404.16130, 2024.

[4] Nadime Francis, Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Mats Rydberg, Petra Selmer, and Andrés Taylor. Cypher: An evolving query language for property graphs. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD), pages 1433–1445, 2018. doi: 10.1145/3183713.3190657

[5] Reham Omar, Abdelghny Orogat, Ibrahim Abdelaziz, Omij Mangukiya, Panos Kalnis, and Essam Mansour. Chatty-KG: A multi-agent AI system for on-demand conversational question answering over knowledge graphs. Proceedings of the ACM on Management of Data, 4(1):18:1–18:26, 2026. doi: 10.1145/3786632

[6] Xinjie Zhao, Moritz Blum, Rui Yang, Boming Yang, Luis Márquez Carpintero, Mónica Pina-Navarro, Tony Wang, Xin Li, Huitao Li, Yanran Fu, Rongrong Wang, Juntao Zhang, and Irene Li. AGENTiGraph: An interactive knowledge graph platform for LLM-based chatbots utilising private data. arXiv preprint arXiv:2410.11531, 2024.

[7] Ioanna Mandilara, Christina Maria Androna, Eleni Fotopoulou, Anastasios Zafeiropoulos, and Symeon Papavassiliou. Decoding the mystery: How can LLMs turn text into Cypher in complex knowledge graphs? IEEE Access, 13:80981–81001, 2025. doi: 10.1109/ACCESS.2025.3567759

[8] LangChain. CypherQueryCorrector in the LangChain Neo4j integration. Python package, langchain-neo4j, 2023. https://python.langchain.com/docs/integrations/graphs/neo4j_cypher/

[9] Neo4j Labs. mcp-neo4j-cypher: Model context protocol server for Neo4j. GitHub repository, 2024. https://github.com/neo4j-contrib/mcp-neo4j

[10] Makbule Gulcin Ozsoy. Extending confidence-based Text2Cypher with grammar and schema aware filtering. arXiv preprint arXiv:2605.10318, 2026.
[11] Aman Tiwari, Shiva Krishna Reddy Malay, Vikas Yadav, Masoud Hashemi, and Sathwik Tejaswi Madhusudhan. SynthCypher: A fully synthetic data generation framework for text-to-Cypher querying in knowledge graphs. arXiv preprint arXiv:2412.12612, 2024.

[12] Zijie Zhong, Linqing Zhong, Zhaoze Sun, Qingyun Jin, Zengchang Qin, and Xiaofan Zhang. SyntheT2C: Generating synthetic data for fine-tuning large language models on the text-to-Cypher task. In Proceedings of the 31st International Conference on Computational Linguistics (COLING), 2025.

[13] Terence Parr. The Definitive ANTLR 4 Reference. Pragmatic Bookshelf, 2013.

[14] Saurabh Deochake and Debajyoti Mukhopadhyay. Cost trade-offs of reasoning and non-reasoning large language models in text-to-SQL. arXiv preprint arXiv:2512.22364, 2025.

[15] Georgios Mavroudeas, Guillaume Baudart, Alan Cha, Martin Hirzel, Jim A. Laredo, Malik Magdon-Ismail, Louis Mandel, and Erik Wittern. Learning GraphQL query costs. arXiv preprint arXiv:2108.11139, 2021.

[16] Bing Wang, Changyu Ren, Jian Yang, Xinnian Liang, Jiaqi Bai, Linzheng Chai, Zhao Yan, Qian-Wen Zhang, Di Yin, Xing Sun, and Zhoujun Li. MAC-SQL: A multi-agent collaborative framework for text-to-SQL. In Proceedings of the 31st International Conference on Computational Linguistics (COLING), pages 540–557, 2025.

[17] Ansong Ni, Srinivasan Iyer, Dragomir Radev, Veselin Stoyanov, Wen-tau Yih, Sida I. Wang, and Xi Victoria Lin. LEVER: Learning to verify language-to-code generation with execution. In Proceedings of the 40th International Conference on Machine Learning (ICML), 2023.

[18] Nikodem Tomczak. RAMPART: Retrieval-augmented multi-block prompt assembly and registry toolkit, 2026. Paper submitted for publication.

[19] Tomasz Bratanic. Synthetic text-to-Cypher GPT-4 turbo dataset. Hugging Face dataset, 2024. https://huggingface.co/datasets/tomasonjo/synthetic-text2cypher-gpt4turbo

[20] Makbule Gulcin Ozsoy. Enhancing Text2Cypher with schema filtering. In Proceedings of the 4th International Workshop on LLM-Integrated Knowledge Graph Generation from Text (LLM-TEXT2KG), co-located with ESWC 2025, volume 4020 of CEUR Workshop Proceedings. CEUR-WS.org, 2025. arXiv:2505.05118

[21] Yanlin Feng, Simone Papicchio, and Sajjadur Rahman. CypherBench: Towards precise retrieval over full-scale modern knowledge graphs in the LLM era. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8934–8958, Vienna, Austria, 2025. Association for Computational Linguistics. doi: 10.18653/v1/2025.acl-long.438
[22] Chris Leishman. libcypher-parser: A parser library for the Cypher graph query language. GitHub repository, 2017. https://github.com/cleishm/libcypher-parser

[23] Makbule Gulcin Ozsoy, Leila Messallem, Jon Besga, and Gianandrea Minneci. Text2Cypher: Bridging natural language and graph databases. In Proceedings of the Workshop on Generative AI and Knowledge Graphs (GenAIK), co-located with COLING 2025, pages 100–108, 2025. arXiv:2412.10064

[24] Quoc-Bao-Huy Tran, Aagha Abdul Waheed, Syed Mudasir, and Sun-Tae Chung. Refining Text2Cypher on small language model with reinforcement learning leveraging semantic information. Applied Sciences, 15(15):8206, 2025. doi: 10.3390/app15158206

[25] Makbule Gulcin Ozsoy and William Tai. Text2Cypher across languages: Evaluating and finetuning LLMs. In Proceedings of the International Conference on Natural Language Processing and Information Retrieval (NLPIR), 2025. arXiv:2506.21445

[26] Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, Xuanhe Zhou, Chenhao Ma, Guoliang Li, Kevin C.-C. Chang, Fei Huang, Reynold Cheng, and Yongbin Li. Can LLM already serve as a database interface? A BIG bench for large-scale database grounded text-to-SQLs. In Advances in Neural Information Processing S...

[27] Mohammadreza Pourreza and Davood Rafiei. DIN-SQL: Decomposed in-context learning of text-to-SQL with self-correction. In Advances in Neural Information Processing Systems (NeurIPS), 2023.

[28] Alicja Rączkowska, Riccardo Belluzzo, Piotr Zieliński, Joanna Baran, and Paweł Olszewski. RetrySQL: Text-to-SQL training with retry data for self-correcting query generation. In Proceedings of the AAAI Conference on Artificial Intelligence, 2026.

[29] Yushan Zhu, Wen Zhang, Long Jin, Mengshu Sun, Ling Zhong, Zhiqiang Liu, Juan Li, Lei Liang, Chong Long, Chao Deng, and Junlan Feng. Self-correction distillation for structured data question answering. arXiv preprint arXiv:2511.07998, 2025.

[30] Chenglong Wang, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi Mao, Oleksandr Polozov, and Rishabh Singh. Robust text-to-SQL generation with execution-guided decoding. arXiv preprint arXiv:1807.03100, 2018.

[31] Xinyun Chen, Maxwell Lin, Nathanael Schärli, and Denny Zhou. Teaching large language models to self-debug. arXiv preprint arXiv:2304.05128, 2023.

[32] Zijin Hong, Zheng Yuan, Qinggang Zhang, Hao Chen, Junnan Dong, Feiran Huang, and Xiao Huang. Next-generation database interfaces: A survey of LLM-based text-to-SQL. IEEE Transactions on Knowledge and Data Engineering, 2025.

[33] Liang Shi, Zhengju Tang, Nan Zhang, Xiaotong Zhang, and Zhi Yang. A survey on employing large language models for text-to-SQL tasks. ACM Computing Surveys, 58(2):1–37, 2026. doi: 10.1145/3737873

[34] Aditi Singh, Akash Shetty, Abul Ehtesham, Saket Kumar, and Tala Talaei Khoei. A survey of large language model-based generative AI for text-to-SQL: Benchmarks, applications, use cases, and challenges. arXiv preprint arXiv:2412.05208, 2025.

[35] LlamaIndex. KnowledgeGraphQueryEngine: Knowledge graph integration for LlamaIndex. Python package, 2024. https://docs.llamaindex.ai/
[36] Maciej Besta, Łukasz Jarmocik, Orest Hrycyna, Shachar Klaiman, Konrad Mączka, Robert Gerstenberger, Jürgen Müller, Piotr Nyczyk, Hubert Niewiadomski, and Torsten Hoefler. GraphSeek: Next-generation graph analytics with LLMs. arXiv preprint arXiv:2602.11052, 2026.

Appendix A: Corrector prompt examples

The following listings show the user prompt each corrector sends t...

- did_you_mean has one strong match -- substitute and return
- did_you_mean is empty but available_in_scope is short -- pick the most semantically plausible option
- available_in_scope_truncated is True -- did_you_mean is the only hint; if empty, abort

---
# Refinement intent (attempt 1)
## Failing query
MATCH (n:Moive) RETURN n.title
## Failure category: schema
## Error payload (JSON)
{"category":"schema", "unknown_reference":"Moive", ...}
---
# Schema: label Movie
- movieId: STRING (required)
- title: STRING (requir...
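The three hint rules in the corrector prompt fragment above can be sketched as a small decision function, with `difflib` standing in for however the gate actually computes `did_you_mean`. The field names come from the fragment; the `short_scope` cutoff of 10, the similarity cutoff, and the fallback to aborting when no hint is usable are assumptions, since the prompt only says the scope list must be "short".

```python
import difflib
from typing import Any, Dict, List, Tuple


def did_you_mean(name: str, candidates: List[str], cutoff: float = 0.6) -> List[str]:
    """Close matches for an unknown reference (stand-in similarity measure)."""
    return difflib.get_close_matches(name, candidates, n=3, cutoff=cutoff)


def next_action(payload: Dict[str, Any], short_scope: int = 10) -> Tuple[str, List[str]]:
    hints = payload.get("did_you_mean") or []
    scope = payload.get("available_in_scope") or []
    if len(hints) == 1:
        return "substitute", hints      # one strong match: substitute and return
    if payload.get("available_in_scope_truncated"):
        # The scope list is incomplete, so did_you_mean is the only hint.
        return ("ask_model", hints) if hints else ("abort", [])
    if hints:
        return "ask_model", hints       # several candidates: let the model choose
    if scope and len(scope) <= short_scope:
        return "ask_model", scope       # pick the most semantically plausible option
    return "abort", []                  # assumption: no usable hint, give up


payload = {"category": "schema", "unknown_reference": "Moive",
           "did_you_mean": did_you_mean("Moive", ["Movie", "Person", "Genre"])}
print(next_action(payload))  # ('substitute', ['Movie'])
```

Handling the single-match case deterministically avoids spending a model call on an obvious typo; the model is only consulted when there is a genuine choice to make.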

[1] [1]

Knowledge graph-guided retrieval augmented generation

Xiangrong Zhu, Yuexiang Xie, Yi Liu, Yaliang Li, and Wei Hu. Knowledge graph-guided retrieval augmented generation. InProceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), pages 8912–8924, 2025. doi: 10.18653/v1/2025.naacl-long.449

work page doi:10.18653/v1/2025.naacl-long.449 2025

[2] [2]

Graph retrieval-augmented generation: A survey.ACM Transactions on Information Systems, 44(2):1–52, 2026

Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, and Siliang Tang. Graph retrieval-augmented generation: A survey.ACM Transactions on Information Systems, 44(2):1–52, 2026

2026

[3] [3]

From lo- cal to global: A graph RAG approach to query-focused summarization.arXiv preprint arXiv:2404.16130, 2024

Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. From lo- cal to global: A graph RAG approach to query-focused summarization.arXiv preprint arXiv:2404.16130, 2024

Pith/arXiv arXiv 2024

[4] [4]

Cypher: An evolving query language for property graphs

Nadime Francis, Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Mats Rydberg, Petra Selmer, and Andrés Taylor. Cypher: An evolving query language for property graphs. InProceedings of the 2018 International Conference on Management of Data (SIGMOD), pages 1433–1445, 2018. doi: 10.1145/ 3183713.3190657

arXiv 2018

[5] [5]

Chatty-KG: A multi-agent AI system for on-demand conversational question answering over knowledge graphs.Proceedings of the ACM on Management of Data, 4(1): 18:1–18:26, 2026

Reham Omar, Abdelghny Orogat, Ibrahim Abdelaziz, Omij Mangukiya, Panos Kalnis, and Essam Mansour. Chatty-KG: A multi-agent AI system for on-demand conversational question answering over knowledge graphs.Proceedings of the ACM on Management of Data, 4(1): 18:1–18:26, 2026. doi: 10.1145/3786632

work page doi:10.1145/3786632 2026

[6] [6]

AGENTiGraph: An interactive knowledge graph platform for LLM-based chatbots utilising private data.arXiv preprint arXiv:2410.11531, 2024

Xinjie Zhao, Moritz Blum, Rui Yang, Boming Yang, Luis Márquez Carpintero, Mónica Pina-Navarro, Tony Wang, Xin Li, Huitao Li, Yanran Fu, Rongrong Wang, Juntao Zhang, and Irene Li. AGENTiGraph: An interactive knowledge graph platform for LLM-based chatbots utilising private data.arXiv preprint arXiv:2410.11531, 2024

arXiv 2024

[7] [7]

Decoding the mystery: How can LLMs turn text into Cypher in complex knowledge graphs?IEEE Access, 13:80981–81001, 2025

Ioanna Mandilara, Christina Maria Androna, Eleni Fotopoulou, Anastasios Zafeiropoulos, and Symeon Papavassiliou. Decoding the mystery: How can LLMs turn text into Cypher in complex knowledge graphs?IEEE Access, 13:80981–81001, 2025. doi: 10.1109/ACCESS. 2025.3567759

work page doi:10.1109/access 2025

[8] [8]

CypherQueryCorrector in the LangChain Neo4j integration

LangChain. CypherQueryCorrector in the LangChain Neo4j integration. Python pack- age, langchain-neo4j, 2023. https://python.langchain.com/docs/integrations/graphs/ neo4j_cypher/

2023

[9] [9]

mcp-neo4j-cypher: Model context protocol server for Neo4j

Neo4j Labs. mcp-neo4j-cypher: Model context protocol server for Neo4j. GitHub repository, 2024.https://github.com/neo4j-contrib/mcp-neo4j

2024

[10] [10]

Extending confidence-based Text2Cypher with grammar and schema aware filtering.arXiv preprint arXiv:2605.10318, 2026

Makbule Gulcin Ozsoy. Extending confidence-based Text2Cypher with grammar and schema aware filtering.arXiv preprint arXiv:2605.10318, 2026. 20

Pith/arXiv arXiv 2026

[11] [11]

SynthCypher: A fully synthetic data generation framework for text-to-Cypher querying in knowledge graphs.arXiv preprint arXiv:2412.12612, 2024

Aman Tiwari, Shiva Krishna Reddy Malay, Vikas Yadav, Masoud Hashemi, and Sath- wik Tejaswi Madhusudhan. SynthCypher: A fully synthetic data generation framework for text-to-Cypher querying in knowledge graphs.arXiv preprint arXiv:2412.12612, 2024

arXiv 2024

[12] [12]

SyntheT2C: Generating synthetic data for fine-tuning large language models on the text- to-Cypher task

Zijie Zhong, Linqing Zhong, Zhaoze Sun, Qingyun Jin, Zengchang Qin, and Xiaofan Zhang. SyntheT2C: Generating synthetic data for fine-tuning large language models on the text- to-Cypher task. InProceedings of the 31st International Conference on Computational Linguistics (COLING), 2025

2025

[13] [13]

Pragmatic Bookshelf, 2013

Terence Parr.The Definitive ANTLR 4 Reference. Pragmatic Bookshelf, 2013

2013

[14] Saurabh Deochake and Debajyoti Mukhopadhyay. Cost trade-offs of reasoning and non-reasoning large language models in text-to-SQL. arXiv preprint arXiv:2512.22364, 2025.

[15] Georgios Mavroudeas, Guillaume Baudart, Alan Cha, Martin Hirzel, Jim A. Laredo, Malik Magdon-Ismail, Louis Mandel, and Erik Wittern. Learning GraphQL query costs. arXiv preprint arXiv:2108.11139, 2021.

[16] Bing Wang, Changyu Ren, Jian Yang, Xinnian Liang, Jiaqi Bai, Linzheng Chai, Zhao Yan, Qian-Wen Zhang, Di Yin, Xing Sun, and Zhoujun Li. MAC-SQL: A multi-agent collaborative framework for text-to-SQL. In Proceedings of the 31st International Conference on Computational Linguistics (COLING), pages 540–557, 2025.

[17] Ansong Ni, Srinivasan Iyer, Dragomir Radev, Veselin Stoyanov, Wen-tau Yih, Sida I. Wang, and Xi Victoria Lin. LEVER: Learning to verify language-to-code generation with execution. In Proceedings of the 40th International Conference on Machine Learning (ICML), 2023.

[18] Nikodem Tomczak. RAMPART: Retrieval-augmented multi-block prompt assembly and registry toolkit, 2026. Paper submitted for publication.

[19] Tomasz Bratanic. Synthetic text-to-Cypher GPT-4 turbo dataset. Hugging Face dataset, 2024. https://huggingface.co/datasets/tomasonjo/synthetic-text2cypher-gpt4turbo

[20] Makbule Gulcin Ozsoy. Enhancing Text2Cypher with schema filtering. In Proceedings of the 4th International Workshop on LLM-Integrated Knowledge Graph Generation from Text (LLM-TEXT2KG), co-located with ESWC 2025, volume 4020 of CEUR Workshop Proceedings. CEUR-WS.org, 2025. arXiv:2505.05118.

[21] Yanlin Feng, Simone Papicchio, and Sajjadur Rahman. CypherBench: Towards precise retrieval over full-scale modern knowledge graphs in the LLM era. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8934–8958, Vienna, Austria, 2025. Association for Computational Linguistics. doi: 10.18653/v1/2025.acl-long.438.

[22] Chris Leishman. libcypher-parser: A parser library for the Cypher graph query language. GitHub repository, 2017. https://github.com/cleishm/libcypher-parser

[23] Makbule Gulcin Ozsoy, Leila Messallem, Jon Besga, and Gianandrea Minneci. Text2Cypher: Bridging natural language and graph databases. In Proceedings of the Workshop on Generative AI and Knowledge Graphs (GenAIK), co-located with COLING 2025, pages 100–108, 2025. arXiv:2412.10064.

[24] Quoc-Bao-Huy Tran, Aagha Abdul Waheed, Syed Mudasir, and Sun-Tae Chung. Refining Text2Cypher on small language model with reinforcement learning leveraging semantic information. Applied Sciences, 15(15):8206, 2025. doi: 10.3390/app15158206.

[25] Makbule Gulcin Ozsoy and William Tai. Text2Cypher across languages: Evaluating and finetuning LLMs. In Proceedings of the International Conference on Natural Language Processing and Information Retrieval (NLPIR), 2025. arXiv:2506.21445.

[26] Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, Xuanhe Zhou, Chenhao Ma, Guoliang Li, Kevin C.-C. Chang, Fei Huang, Reynold Cheng, and Yongbin Li. Can LLM already serve as a database interface? A BIG bench for large-scale database grounded text-to-SQLs. In Advances in Neural Information Processing S...

[27] Mohammadreza Pourreza and Davood Rafiei. DIN-SQL: Decomposed in-context learning of text-to-SQL with self-correction. In Advances in Neural Information Processing Systems (NeurIPS), 2023.

[28] Alicja Rączkowska, Riccardo Belluzzo, Piotr Zieliński, Joanna Baran, and Paweł Olszewski. RetrySQL: Text-to-SQL training with retry data for self-correcting query generation. In Proceedings of the AAAI Conference on Artificial Intelligence, 2026.

[29] Yushan Zhu, Wen Zhang, Long Jin, Mengshu Sun, Ling Zhong, Zhiqiang Liu, Juan Li, Lei Liang, Chong Long, Chao Deng, and Junlan Feng. Self-correction distillation for structured data question answering. arXiv preprint arXiv:2511.07998, 2025.

[30] Chenglong Wang, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi Mao, Oleksandr Polozov, and Rishabh Singh. Robust text-to-SQL generation with execution-guided decoding. arXiv preprint arXiv:1807.03100, 2018.

[31] Xinyun Chen, Maxwell Lin, Nathanael Schärli, and Denny Zhou. Teaching large language models to self-debug. arXiv preprint arXiv:2304.05128, 2023.

[32] Zijin Hong, Zheng Yuan, Qinggang Zhang, Hao Chen, Junnan Dong, Feiran Huang, and Xiao Huang. Next-generation database interfaces: A survey of LLM-based text-to-SQL. IEEE Transactions on Knowledge and Data Engineering, 2025.

[33] Liang Shi, Zhengju Tang, Nan Zhang, Xiaotong Zhang, and Zhi Yang. A survey on employing large language models for text-to-SQL tasks. ACM Computing Surveys, 58(2):1–37, 2026. doi: 10.1145/3737873.

[34] Aditi Singh, Akash Shetty, Abul Ehtesham, Saket Kumar, and Tala Talaei Khoei. A survey of large language model-based generative AI for text-to-SQL: Benchmarks, applications, use cases, and challenges. arXiv preprint arXiv:2412.05208, 2025.

[35] LlamaIndex. KnowledgeGraphQueryEngine: Knowledge graph integration for LlamaIndex. Python package, 2024. https://docs.llamaindex.ai/

[36] Maciej Besta, Łukasz Jarmocik, Orest Hrycyna, Shachar Klaiman, Konrad Mączka, Robert Gerstenberger, Jürgen Müller, Piotr Nyczyk, Hubert Niewiadomski, and Torsten Hoefler. GraphSeek: Next-generation graph analytics with LLMs. arXiv preprint arXiv:2602.11052, 2026.

A Corrector prompt examples

The following listings show the user prompt each corrector sends t...

    did_you_mean has one strong match -- substitute and return
    did_you_mean is empty but available_in_scope is short -- pick the most semantically plausible option
    available_in_scope_truncated is True -- did_you_mean is the only hint; if empty, abort
    ---
    # Refinement intent (attempt 1)
    ## Failing query
    MATCH (n:Moive) RETURN n.title
    ## Failure category: schema
    ## Error payload (JSON)
    {"category":"schema", "unknown_reference":"Moive", ...}
    ---
    # Schema: label Movie
    - movieId: STRING (required)
    - title: STRING (requir...
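The corrector prompt in this appendix exposes the fields of a schema-error payload (`category`, `unknown_reference`, `did_you_mean`, `available_in_scope`, `available_in_scope_truncated`) together with three rules for using them. As an illustration only, the sketch below shows one way such a payload could be assembled, how the three rules might map onto a decision, and how the refinement section could be laid out; it is not the paper's implementation. Assumptions: `did_you_mean` is computed with Python's standard `difflib.get_close_matches` (cutoff 0.6), the in-scope list is capped at a hypothetical `MAX_IN_SCOPE = 20`, a single close match counts as a "strong" match, "short" means "not truncated", and the helpers `schema_error_payload`, `hint_action`, and `render_refinement_prompt` are hypothetical names.

```python
import json
from difflib import get_close_matches

# Hypothetical cap on how many in-scope names a payload lists before truncating.
MAX_IN_SCOPE = 20


def schema_error_payload(unknown: str, known: list[str], max_in_scope: int = MAX_IN_SCOPE) -> dict:
    """Assemble a schema-error payload for an unknown label, type, or property name."""
    in_scope = sorted(known)
    return {
        "category": "schema",
        "unknown_reference": unknown,
        # Up to three names whose difflib similarity ratio is at least 0.6, best first.
        "did_you_mean": get_close_matches(unknown, in_scope, n=3, cutoff=0.6),
        "available_in_scope": in_scope[:max_in_scope],
        "available_in_scope_truncated": len(in_scope) > max_in_scope,
    }


def hint_action(payload: dict) -> tuple[str, object]:
    """Map a payload onto the three hint rules from the corrector prompt."""
    suggestions = payload["did_you_mean"]
    if len(suggestions) == 1:
        # One strong match (assumption: a single close match counts as strong).
        return ("substitute", suggestions[0])
    if payload["available_in_scope_truncated"]:
        # did_you_mean is the only hint; abort when it is empty.
        return ("choose", suggestions) if suggestions else ("abort", None)
    if not suggestions:
        # The in-scope list is short, so the model picks the most plausible name.
        return ("choose", payload["available_in_scope"])
    # Several close matches on an untruncated list: not covered by the visible rules (assumption).
    return ("choose", suggestions)


def render_refinement_prompt(query: str, payload: dict, attempt: int = 1) -> str:
    """Lay out the refinement section in the same order as the listing."""
    return "\n".join([
        f"# Refinement intent (attempt {attempt})",
        "## Failing query",
        query,
        f"## Failure category: {payload['category']}",
        "## Error payload (JSON)",
        json.dumps(payload),
    ])


if __name__ == "__main__":
    payload = schema_error_payload("Moive", ["Movie", "Person", "Genre"])
    print(hint_action(payload))  # prints ('substitute', 'Movie')
    print(render_refinement_prompt("MATCH (n:Moive) RETURN n.title", payload))
```

With the misspelled label from the listing, `Moive` has exactly one close match (`Movie`, similarity 0.8), so the sketch substitutes it directly; an unrelated name such as `Foo` yields an empty `did_you_mean` and falls back to the full, untruncated label list.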