Code-on-Graph: Iterative Programmatic Reasoning via Large Language Models on Knowledge Graphs

Fei Wang; Jiafeng Guo; Jin Zhang; Kun Su; Long Bai; Weiwei Ding; Xiaolong Jin; Xueqi Cheng; Zhuo Chen; Zixuan Li

arxiv: 2606.03705 · v1 · pith:2WFO6RG2new · submitted 2026-06-02 · 💻 cs.AI

Pith reviewed 2026-06-28 10:04 UTC · model grok-4.3

keywords: knowledge graphs · large language models · programmatic reasoning · question answering · code generation · schema representation · iterative reasoning

The pith

Representing knowledge graph schemas as Python classes lets LLMs generate executable reasoning code over retrieved facts without direct prompt injection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that standard LLM-KG methods are limited by fixed operators that cannot express complex question semantics and by the need to stuff many facts straight into prompts. CoG instead turns retrieved facts' schemas into Python classes at each step, treats the facts as class objects, and has the model write code that operates on those objects to reach an answer. This code-based interface supplies the needed flexibility and keeps prompts from growing with the size of the knowledge. If the approach holds, models can tackle harder multi-hop questions on bigger graphs while staying grounded in the data.

Core claim

Given the facts retrieved at each step, the framework identifies the matching KG schemas, encodes them as Python classes that act as abstract interfaces, instantiates the facts as objects of those classes, and generates executable code that reasons by operating on the objects. The paper reports that this outperforms prior methods by up to 10.5% on WebQSP, CWQ, and GrailQA.
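To make the claim concrete, here is a minimal sketch built around the Ohio-governor question mentioned in the Figure 3 caption. It is an illustration under stated assumptions, not the paper's implementation: the class names (`Politician`, `GovernmentPositionHeld`), the field names, the function `governors_before`, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical schema classes; in CoG these would be derived from the KG schema.
@dataclass(frozen=True)
class Politician:
    name: str


@dataclass(frozen=True)
class GovernmentPositionHeld:
    office_holder: Politician
    jurisdiction: str
    title: str
    term_start: date


# Retrieved facts become objects at execution time (made-up placeholder rows).
facts = [
    GovernmentPositionHeld(Politician("Governor A"), "Ohio", "Governor", date(2007, 1, 8)),
    GovernmentPositionHeld(Politician("Governor B"), "Ohio", "Governor", date(2011, 1, 10)),
    GovernmentPositionHeld(Politician("Senator C"), "Ohio", "Senator", date(2007, 1, 3)),
]


# The kind of code a model might generate: it needs only the class
# definitions above, never the fact values themselves.
def governors_before(positions, cutoff):
    return sorted(
        p.office_holder.name
        for p in positions
        if p.title == "Governor" and p.term_start < cutoff
    )


answer = governors_before(facts, date(2011, 1, 10))
print(answer)  # ['Governor A']
```

The strict `<` comparison excludes the placeholder governor whose term starts exactly on the cutoff date, which is the sort of boundary condition a fixed filter operator may not expose.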

What carries the argument

The Code-on-Graph process of mapping retrieved facts to Python class schemas and generating grounded executable code for iterative reasoning.
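One way to picture the mapping from retrieved facts to class schemas is to build a class from the relation names that appear in the retrieved triples. The sketch below uses Python's `dataclasses.make_dataclass` for this. The triple format, the relation names, and the helpers `build_schema_class` and `instantiate` are assumptions made for illustration; the abstract does not say how CoG performs this step.

```python
from dataclasses import field, fields, make_dataclass

# Hypothetical retrieved triples: (subject id, relation, object value).
triples = [
    ("m.01", "position.office_holder", "Governor A"),
    ("m.01", "position.term_start", "2007-01-08"),
    ("m.02", "position.office_holder", "Governor B"),
    ("m.02", "position.term_start", "2011-01-10"),
]


def local_name(relation):
    """Use the last dotted segment of a relation as the attribute name."""
    return relation.split(".")[-1]


def build_schema_class(class_name, triples):
    """Create a dataclass whose fields are the relations seen in the triples."""
    names = sorted({local_name(rel) for _, rel, _ in triples})
    return make_dataclass(class_name, [(n, str, field(default=None)) for n in names])


def instantiate(cls, triples):
    """Group triples by subject and turn each group into one object."""
    rows = {}
    for subj, rel, obj in triples:
        rows.setdefault(subj, {})[local_name(rel)] = obj
    return [cls(**rows[subj]) for subj in sorted(rows)]


Position = build_schema_class("Position", triples)
objects = instantiate(Position, triples)

# Only this schema summary would need to be shown to the model.
schema_text = ", ".join(f.name for f in fields(Position))
print(schema_text)
print(objects)
```

The schema summary lists attribute names only, so it stays the same size however many subjects the retrieval step returns.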

If this is right

Reasoning over knowledge graphs can become more compositional because code can combine operations freely rather than being restricted to a fixed set of operators.
Prompt size stays bounded because facts remain outside the prompt and are accessed only through the class objects at runtime.
Iterative, multi-step question answering on knowledge graphs improves because each step can produce new code that builds on prior results.
The same retrieved facts can support different reasoning paths by changing only the generated code rather than the prompt content.
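The second and fourth points can be illustrated with a small sketch. The prompt is built from the question and the class definition only, so its length does not depend on the number of facts, and two different reasoning patterns (Argmax-style and Argmin-style, as named in the Figure 11 caption) run over the same objects. The `River` class, the prompt template, and the synthetic data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class River:  # hypothetical schema class
    name: str
    length_km: float


# Text form of the schema; this is what would be placed in the prompt.
SCHEMA_SOURCE = """@dataclass
class River:
    name: str
    length_km: float
"""


def build_prompt(question: str) -> str:
    # The prompt contains the question and the schema, but no fact values.
    return f"Question: {question}\nSchema:\n{SCHEMA_SOURCE}"


def make_facts(n: int) -> list[River]:
    # Synthetic facts standing in for retrieved KG facts.
    return [River(f"river_{i}", float(i + 1)) for i in range(n)]


facts = make_facts(10_000)
prompt = build_prompt("Which river is the longest?")

# Same objects, different generated code: only the program changes.
longest = max(facts, key=lambda r: r.length_km).name
shortest = min(facts, key=lambda r: r.length_km).name

print(len(prompt), longest, shortest)
```

Growing `facts` from ten to ten thousand objects leaves `prompt` unchanged, which is the sense in which prompt size stays bounded.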

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same class-based code interface could be applied to other structured sources such as databases or APIs if their schemas can be represented similarly.
Reliable code generation might allow verification of intermediate reasoning steps by executing the code rather than trusting the model's final answer alone.
Performance on very large graphs would depend on how well the model scales its schema identification and code writing as the number of available classes grows.
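The verification idea in the second point above can be sketched as an execute-and-retry loop. The figure captions mention correction prompts and a maximum number of corrections, but the loop below is an assumption about how such a mechanism could look, not the paper's procedure. The helpers `run_generated` and `solve_with_correction` are hypothetical, and the list `candidates` stands in for repeated model calls. Note that `exec` on untrusted code is unsafe; a real system would need sandboxing.

```python
def run_generated(code: str, env: dict):
    """Execute code in a copy of env; return (ok, answer_or_error_message)."""
    scope = dict(env)
    try:
        exec(code, scope)  # unsafe for untrusted code; illustration only
        return True, scope["answer"]
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"


def solve_with_correction(candidates, env, max_corrections=2):
    """Try candidate programs in order, stopping at the first that runs.

    Each later candidate stands in for a model call that saw the previous error.
    """
    log = []
    for attempt, code in enumerate(candidates[: max_corrections + 1]):
        ok, out = run_generated(code, env)
        log.append((attempt, ok, out))
        if ok:
            return out, log
    return None, log


# Made-up placeholder data and candidate programs.
env = {"terms": [("Governor A", 2007), ("Governor B", 2011)]}
candidates = [
    "answer = [n for n, y in term if y < 2011]",   # wrong variable name
    "answer = [n for n, y in terms if y < 2011]",  # corrected version
]

result, log = solve_with_correction(candidates, env)
print(result)
for entry in log:
    print(entry)
```

A successful run shows only that the code executed and produced an answer, not that the answer is correct, so runtime success is a weaker check than semantic verification.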

Load-bearing premise

Large language models can reliably pick the correct KG schema for each retrieved fact and write code that correctly encodes the needed reasoning steps.

What would settle it

An experiment in which code generated by the model fails to execute correctly or produces wrong answers on a substantial fraction of queries from WebQSP, CWQ, or GrailQA, erasing the reported accuracy gains.

Figures

Figures reproduced from arXiv: 2606.03705 by Fei Wang, Jiafeng Guo, Jin Zhang, Kun Su, Long Bai, Weiwei Ding, Xiaolong Jin, Xueqi Cheng, Zhuo Chen, Zixuan Li.

**Figure 2.** The framework of CoG. Compared with conventional semantic parsing, CoG does not aim to generate a single database query in one step. Instead, it treats KG reasoning as an iterative program synthesis problem over retrieved schema abstractions. Compared with agentic tool-use frameworks, CoG does not depend on a small fixed operator inventory to represent all reasoning patterns. Instead, it allows the model…

**Figure 3.** An example for comparing PoG and CoG in answering complex questions. … the governors of Ohio" and "Filter for those whose term started before 2011-01-10". The KG contains extensive information about Ohio and its officeholders. When faced with such a large number of facts, PoG reduces the reasoning context by extensively pruning, which ultimately leads to the loss of critical information and results in erro…

**Figure 4.** Task decomposition and evaluation prompts.

**Figure 5.** Code generation and correction prompts

**Figure 6.** Error distribution across the three datasets.

**Figure 7.** Effect of the Maximum Number of Correction

**Figure 8.** The proportion of cases that involve code

**Figure 9.** The hit rate of the corrected cases for Qwen3-Coder-30B-A3B and DeepSeek-V3.2

**Figure 10.** A comparison between CoG and predefined

**Figure 11.** Three representative operators in the CWQ dataset (i.e., Argmax, Argmin, and Union). For each question,

Original abstract

Knowledge Graphs (KGs) are widely used to mitigate the limitations of Large Language Models (LLMs), such as outdated knowledge and hallucinations. Existing LLM-KG integration frameworks typically rely on predefined operators to retrieve factual knowledge from KGs and inject it into prompts for answer generation. This paradigm faces two critical bottlenecks: 1) Inflexibility: The predefined operators are limited in scope and thus lack sufficient compositional expressiveness to fully capture the complex semantics required by KG questions. 2) Unscalability: Direct injection of factual knowledge into prompts limits scalability in handling large-scale factual knowledge. To address these two bottlenecks, we propose Code-on-Graph (CoG), a programmatic reasoning framework for LLM-KG integration. Specifically, given the factual knowledge retrieved at each reasoning step, CoG first identifies the corresponding KG schemas and represents these schemas as Python classes, which serve as abstract interfaces to the retrieved facts. It then generates executable code grounded in these classes, with the retrieved facts instantiated as objects of the corresponding classes during execution. This design enables flexible code-based reasoning while avoiding the direct injection of large-scale factual knowledge into prompts. Experiments on WebQSP, CWQ, and GrailQA demonstrate that CoG outperforms prior state-of-the-art models by up to 10.5%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CoG's shift to dynamic Python classes and code generation for KG reasoning is a clear step past fixed operators, but the abstract gives almost no evidence that the LLM steps actually work reliably enough to deliver the claimed gains.

The main takeaway is that this paper replaces predefined operators and fact injection with a loop in which the LLM maps retrieved facts to KG schemas, represents those schemas as Python classes, then writes and runs code on the resulting objects. That directly targets the two bottlenecks the authors name: limited compositionality and prompt bloat.

The approach is new in how it makes the schema the interface and keeps facts out of the prompt by instantiating them only at execution time. If the full paper shows that this actually produces more expressive reasoning paths than the baselines it cites, it would be a useful addition to the hybrid LLM-KG line of work.

The obvious gap is the absence of numbers for the two LLM steps that everything depends on: schema identification accuracy and whether the generated code matches the intended semantics. The abstract mentions iterative execution but supplies no per-step error rates, no code verification method, and no ablation that separates generation failures from end-task accuracy. In an iterative method, even 10-15% per-step error rates would compound quickly, so the 10.5% headline number is hard to interpret without those details.
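A back-of-the-envelope calculation shows how quickly per-step errors compound. It assumes that steps fail independently and that any single failure ruins the final answer; neither assumption comes from the paper, and a correction mechanism would soften the effect. The function name `chain_success` is introduced here for illustration.

```python
def chain_success(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step_accuracy ** steps


# Per-step accuracies of 90% and 85% correspond to 10% and 15% error rates.
for p in (0.90, 0.85):
    for k in (2, 3, 4):
        print(f"per-step accuracy {p:.2f}, {k} steps -> {chain_success(p, k):.3f}")
```

Under these assumptions, a 10% per-step error rate leaves roughly 73% of three-step chains intact, and a 15% rate leaves roughly 61%.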

The experiments are only summarized, with no baseline descriptions or error analysis visible here. That makes it difficult to judge whether the gains are robust or tied to particular dataset quirks.

This is for people already working on programmatic or tool-use extensions to LLM reasoning over structured data. It is worth sending to peer review because the framing is distinct and the problems it attacks are real; a referee can ask for the missing diagnostics and see whether the implementation holds up.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes Code-on-Graph (CoG), an iterative programmatic reasoning framework for LLM-KG integration. It represents retrieved KG facts' schemas as Python classes, generates executable code grounded in these classes, and instantiates facts as objects during execution to enable flexible reasoning without direct fact injection into prompts. The approach is claimed to outperform prior SOTA by up to 10.5% on WebQSP, CWQ, and GrailQA.

Significance. If validated, this framework could significantly advance LLM-KG systems by providing greater compositional expressiveness through code and better scalability by avoiding prompt overload with facts. The programmatic approach may offer advantages in handling complex query semantics that predefined operators cannot capture.

major comments (1)

[Abstract] The central performance claim depends on the LLM reliably identifying the correct KG schema classes from retrieved facts at each step and generating executable code that accurately captures the required semantics. However, the abstract supplies no experimental details, error analysis, code verification methods, or ablations to evaluate the accuracy of these LLM steps or to demonstrate that per-step failures do not compound across iterations to negate the reported gains.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive comment. We address the concern regarding the abstract point by point below.

Referee: [Abstract] The central performance claim depends on the LLM reliably identifying the correct KG schema classes from retrieved facts at each step and generating executable code that accurately captures the required semantics. However, the abstract supplies no experimental details, error analysis, code verification methods, or ablations to evaluate the accuracy of these LLM steps or to demonstrate that per-step failures do not compound across iterations to negate the reported gains.

Authors: The abstract is a high-level summary constrained by length and is not intended to contain full experimental details. The full manuscript includes dedicated sections on the method (Section 3), experiments (Section 4), and analysis (Section 5) that provide ablations on schema class identification accuracy, code generation fidelity, execution verification via runtime success rates, and iterative error analysis. These results show that per-step LLM decisions maintain high reliability and that any failures do not compound sufficiently to erase the reported gains of up to 10.5%. We are willing to add one sentence to the abstract referencing the robustness demonstrated in the experiments if the editor deems it appropriate.

Revision: partial

Circularity Check

0 steps flagged

No circularity detected; framework proposal is self-contained

The paper presents CoG as a new framework that represents KG schemas as Python classes and generates executable code from retrieved facts to enable flexible reasoning without direct fact injection. The abstract and description contain no equations, fitted parameters, or derivations that reduce the claimed improvements (up to 10.5% on WebQSP/CWQ/GrailQA) to inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked in the provided text. The central claim rests on the empirical performance of the described iterative code-generation process, which is presented as independent of prior results and externally falsifiable via benchmark evaluation. This is the expected outcome for a methods paper introducing a novel integration approach.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review provides insufficient detail to enumerate free parameters or invented entities; the central mechanism implicitly rests on one domain assumption about LLM code generation capability.

axioms (1)

Domain assumption: LLMs can reliably identify KG schemas from retrieved facts and generate correct executable code grounded in Python classes representing those schemas.
This is the core mechanism described for CoG in the abstract and is required for the framework to function as claimed.

[1] [1]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Knowcoder: Coding structured knowledge into llms for universal information extraction , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[2] [2]

2026 , isbn =

Chen, Zhuo and Wang, Fei and Li, Zixuan and Zhang, Zhao and Ding, Weiwei and Yang, Chuanguang and Xu, Yongjun and Jin, Xiaolong , title =. 2026 , isbn =. doi:10.1145/3774904.3792662 , booktitle =

work page doi:10.1145/3774904.3792662 2026

[3] [3]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

D-RAG: Differentiable Retrieval-Augmented Generation for Knowledge Graph Question Answering , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

2025

[4] [4]

arXiv preprint arXiv:2506.06881 , year=

Knowcoder-v2: Deep knowledge analysis , author=. arXiv preprint arXiv:2506.06881 , year=

arXiv

[5] [5]

Findings of the Association for Computational Linguistics: ACL 2025 , pages=

Knowcoder-x: Boosting multilingual information extraction via code , author=. Findings of the Association for Computational Linguistics: ACL 2025 , pages=

2025

[6] [6]

CCF International Conference on Natural Language Processing and Chinese Computing , pages=

Retrieval-augmented code generation for universal information extraction , author=. CCF International Conference on Natural Language Processing and Chinese Computing , pages=. 2024 , organization=

2024

[7] [7]

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) , pages=

Self-improvement programming for temporal knowledge graph question answering , author=. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) , pages=

2024

[8] [8]

arXiv preprint arXiv:2604.07720 , year=

Towards Knowledgeable Deep Research: Framework and Benchmark , author=. arXiv preprint arXiv:2604.07720 , year=

Pith/arXiv arXiv

[9] Yih, Wen-tau and Richardson, Matthew and Meek, Chris and Chang, Ming-Wei and Suh, Jina. The Value of Semantic Parse Labeling for Knowledge Base Question Answering. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2016. doi:10.18653/v1/P16-2033

[10] Knowledge Base Question Answering with Topic Units. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. 2019. doi:10.24963/ijcai.2019/701

[11] Zhang, Jing and Zhang, Xiaokang and Yu, Jifan and Tang, Jian and Tang, Jie and Li, Cuiping and Chen, Hong. Subgraph Retrieval Enhanced Model for Multi-hop Knowledge Base Question Answering. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022. doi:10.18653/v1/2022.acl-long.396

[12] Donghan Yu and Sheng Zhang and Patrick Ng and Henghui Zhu and Alexander Hanbo Li and Jun Wang and Yiqun Hu and William Yang Wang and Zhiguo Wang and Bing Xiang. The Eleventh International Conference on Learning Representations. 2023

[13] Luo, Haoran and E, Haihong and Tang, Zichen and Peng, Shiyao and Guo, Yikai and Zhang, Wentai and Ma, Chenghao and Dong, Guanting and Song, Meina and Lin, Wei and Zhu, Yifan and Luu, Anh Tuan. ChatKBQA: A Generate-then-Retrieve Framework for Knowledge Base Question Answering with Fine-tuned Large Language Models. Findings of the Association for Computa... 2024. doi:10.18653/v1/2024.findings-acl.122

[14] Wulamu, Aziguli and Zhengyu, Lyu and Gong, Kaiyuan and Han, Yu and Wang, Zewen and Zhu, Zhihong and Xing, Bowen. HTML: Hierarchical Topology Multi-task Learning for Semantic Parsing in Knowledge Base Question Answering. Findings of the Association for Computational Linguistics: ACL 2025. 2025. doi:10.18653/v1/2025.findings-acl.485

[15] Zhang, Zhiqiang and Wen, Liqiang and Zhao, Wen. Rule-KBQA: Rule-Guided Reasoning for Complex Knowledge Base Question Answering with Large Language Models. Proceedings of the 31st International Conference on Computational Linguistics. 2025

[16] Fang, Haishuo and Zhu, Xiaodan and Gurevych, Iryna. DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs. Findings of the Association for Computational Linguistics: ACL 2024. 2024. doi:10.18653/v1/2024.findings-acl.203

[17] A Survey of Large Language Models. 2025

[18] RFKG-CoT: Relation-Driven Adaptive Hop-count Selection and Few-Shot Path Guidance for Knowledge-Aware QA. 2025

[19] Hallucination is Inevitable: An Innate Limitation of Large Language Models. 2025

[20] Ma, Chuangtao and Chen, Yongrui and Wu, Tianxing and Khan, Arijit and Wang, Haofen. Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.1249

[21] Li, Moxin and Zhao, Yong and Zhang, Wenxuan and Li, Shuaiyi and Xie, Wenya and Ng, See-Kiong and Chua, Tat-Seng and Deng, Yang. Knowledge Boundary of Large Language Models: A Survey. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.acl-long.256

[22] StructGPT: A General Framework for Large Language Model to Reason over Structured Data. 2023

[23] Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph. 2024

[24] Plan-on-Graph: Self-Correcting Adaptive Planning of Large Language Model on Knowledge Graphs. 2024

[25] KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph. 2024

[26] Few-shot In-context Learning for Knowledge Base Question Answering. 2023

[27] MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models. 2024

[28] Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning. 2024

[29] G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering. 2024

[30] PAL: Program-aided Language Models. 2023

[31] Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks. 2023

[32] Toolformer: Language Models Can Teach Themselves to Use Tools. 2023

[33] Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models. 2023

[34] Binding Language Models in Symbolic Languages. 2023

[35] ViperGPT: Visual Inference via Python Execution for Reasoning. 2023

[36] The Web as a Knowledge-base for Answering Complex Questions. 2018

[37] Gu, Yu and Kase, Sue and Vanni, Michelle and Sadler, Brian and Liang, Percy and Yan, Xifeng and Su, Yu. Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases. doi:10.1145/3442381.3449992

[38] Kurt D. Bollacker and Colin Evans and Praveen K. Paritosh and Tim Sturge and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. 2008. doi:10.1145/1376616.1376746

[39] UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph. 2023

[40] TIARA: Multi-grained Retrieval for Robust Question Answering over Large Knowledge Bases. 2022

[41] Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments. 2024

[42] Reasoning of Large Language Models over Knowledge Graphs with Super-Relations. 2025

[43] Self-Reflective Planning with Knowledge Graphs: Enhancing LLM Reasoning Reliability for Question Answering. 2025

[44] DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. 2020

[45] Li, Yading and Song, Dandan and Zhou, Changzhi and Tian, Yuhang and Wang, Hao and Yang, Ziyi and Zhang, Shuhao. A Framework of Knowledge Graph-Enhanced Large Language Model Based on Question Decomposition and Atomic Retrieval. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.670

[46] T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Mixed Large Language Model Signals for Science Question Answering. 2023

[47] UniGen: A Unified Generative Framework for Retrieval and Question Answering with Large Language Models. 2023

[48] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. 2023

[49] Shi, Jie and Xu, Bo and Liang, Jiaqing and Xiao, Yanghua and Chen, Jia and Xie, Chenhao and Wang, Peng and Wang, Wei. Gen-SQL: Efficient Text-to-SQL By Bridging Natural Language Question And Database Schema With Pseudo-Schema. Proceedings of the 31st International Conference on Computational Linguistics. 2025

[50] Cahyawijaya, Samuel and Lovenia, Holy and Fung, Pascale. LLMs Are Few-Shot In-Context Low-Resource Language Learners. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.naacl-long.24

[51] A Survey on Complex Knowledge Base Question Answering: Methods, Challenges and Solutions. 2021

[52] DBpedia. The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference. 2007. doi:10.1007/978-3-540-76298-0_52

[53] Shuyao Wang and Yongduo Sui and Chao Wang and Hui Xiong. Unleashing the Power of Knowledge Graph for Recommendation via Invariant Learning. 2024. doi:10.1145/3589334.3645576

[54] Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering. 2024

[55] FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering. 2024

[56] Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments. 2023

[57] DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases. 2023

[58] Qwen3 Technical Report. 2025

[59] DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models. 2025

[60] Introducing GPT-4.1 in the API

[61] EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs. 2024

[62] Liu, Runxuan and Luo, Bei and Li, Jiaqi and Wang, Baoxin and Liu, Ming and Wu, Dayong and Wang, Shijin and Qin, Bing. Ontology-Guided Reverse Thinking Makes Large Language Models Stronger on Knowledge Graph Question Answering. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.acl-long.741

[63] Huang, Xiang and Cheng, Sitao and Bao, Yuheng and Huang, Shanshan and Qu, Yuzhong. MarkQA: A large scale KBQA dataset with numerical reasoning. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023

[64] Ye, Xi and Yavuz, Semih and Hashimoto, Kazuma and Zhou, Yingbo and Xiong, Caiming. RNG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022. doi:10.18653/v1/2022.acl-long.417