Code-on-Graph: Iterative Programmatic Reasoning via Large Language Models on Knowledge Graphs
Pith reviewed 2026-06-28 10:04 UTC · model grok-4.3
The pith
Representing knowledge graph schemas as Python classes lets LLMs generate executable reasoning code over retrieved facts without direct prompt injection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given the facts retrieved at each reasoning step, the framework identifies the matching KG schemas and encodes them as Python classes that act as abstract interfaces. It instantiates the facts as objects of those classes and generates executable code that performs the required reasoning by operating on the objects. The paper reports gains of up to 10.5% over prior state-of-the-art methods on WebQSP, CWQ, and GrailQA.
What carries the argument
The Code-on-Graph process of mapping retrieved facts to Python class schemas and generating grounded executable code for iterative reasoning.
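The mechanism can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Person` and `Film` classes, the `instantiate` helper, the made-up facts, and the hard-coded `GENERATED_CODE` string that stands in for what an LLM might write after seeing only the class definitions. It is not the paper's implementation.

```python
from dataclasses import dataclass, field


# Hypothetical schema classes; the names are illustrative, not the paper's.
@dataclass
class Person:
    name: str


@dataclass
class Film:
    title: str
    release_year: int
    directors: list[Person] = field(default_factory=list)


def instantiate(facts: list[dict]) -> list[Film]:
    """Turn retrieved facts (plain dicts here) into schema objects."""
    return [
        Film(
            title=f["title"],
            release_year=f["release_year"],
            directors=[Person(n) for n in f["directors"]],
        )
        for f in facts
    ]


# Stand-in for code an LLM might generate from the class definitions alone.
GENERATED_CODE = """
def answer(films):
    earliest = min(films, key=lambda film: film.release_year)
    return [d.name for d in earliest.directors]
"""

# Made-up facts standing in for the output of one retrieval step.
retrieved_facts = [
    {"title": "Film A", "release_year": 2001, "directors": ["Director X"]},
    {"title": "Film B", "release_year": 1995, "directors": ["Director Y", "Director Z"]},
]

# Execute the generated code, then call it on the instantiated objects.
namespace: dict = {}
exec(GENERATED_CODE, namespace)
result = namespace["answer"](instantiate(retrieved_facts))
print(result)
```

The sketch calls `exec` without any sandboxing, which is acceptable for a toy example only; a real system running model-written code would need isolation.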
If this is right
- Reasoning over knowledge graphs can become more compositional because code can combine operations freely rather than being restricted to a fixed set of operators.
- Prompt size stays bounded because facts remain outside the prompt and are accessed only through the class objects at runtime.
- Iterative, multi-step question answering on knowledge graphs improves because each step can produce new code that builds on prior results.
- The same retrieved facts can support different reasoning paths by changing only the generated code rather than the prompt content.
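The bounded-prompt and swappable-code points above can be sketched as follows. The `City` class, the `render_schema` and `build_prompt` helpers, and the two hand-written functions standing in for generated programs are all hypothetical; the sketch only shows that a prompt built from schema text is independent of how many facts are loaded at runtime.

```python
from dataclasses import dataclass, fields


@dataclass
class City:  # hypothetical schema class
    name: str
    population: int


def render_schema(cls: type) -> str:
    """Render a dataclass as the class stub that would be shown to the model."""
    lines = [f"class {cls.__name__}:"]
    for f in fields(cls):
        type_name = getattr(f.type, "__name__", str(f.type))
        lines.append(f"    {f.name}: {type_name}")
    return "\n".join(lines)


def build_prompt(question: str, schema_classes: list[type]) -> str:
    """The prompt holds the schema text and the question, never the facts."""
    schema_text = "\n\n".join(render_schema(c) for c in schema_classes)
    return f"{schema_text}\n\n# Question: {question}"


# Two fact sets of very different size, both kept outside the prompt.
small = [City(f"city_{i}", i) for i in range(10)]
large = [City(f"city_{i}", i) for i in range(100_000)]

prompt = build_prompt("Which city is the most populous?", [City])


# Two stand-ins for generated programs that reason over the same objects.
def most_populous(cities: list[City]) -> str:
    return max(cities, key=lambda c: c.population).name


def total_population(cities: list[City]) -> int:
    return sum(c.population for c in cities)
```

Here `prompt` is the same string whether `small` or `large` is passed to the functions, and switching from `most_populous` to `total_population` changes the reasoning path without touching the prompt's fact content.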
Where Pith is reading between the lines
- The same class-based code interface could be applied to other structured sources such as databases or APIs if their schemas can be represented similarly.
- Reliable code generation might allow verification of intermediate reasoning steps by executing the code rather than trusting the model's final answer alone.
- Performance on very large graphs would depend on how well the model scales its schema identification and code writing as the number of available classes grows.
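As a rough illustration of the first point, the same class-as-interface idea can be tried on a relational table. This is a sketch under assumptions, not something the paper evaluates: the `employee` table, the `Employee` class, the sample rows, and the `average_salary` function are invented here, and only the standard-library `sqlite3` module is used.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Employee:  # hypothetical class mirroring a hypothetical table
    name: str
    department: str
    salary: float


# Build a throwaway in-memory table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ada", "R&D", 120.0), ("Bob", "R&D", 100.0), ("Cy", "Ops", 90.0)],
)

# Rows become objects of the schema class, as KG facts would.
rows = conn.execute("SELECT name, department, salary FROM employee").fetchall()
employees = [Employee(*row) for row in rows]
conn.close()


# Stand-in for generated reasoning code that only touches the class interface.
def average_salary(staff: list[Employee], department: str) -> float:
    selected = [e.salary for e in staff if e.department == department]
    return sum(selected) / len(selected)


rd_average = average_salary(employees, "R&D")
```

The reasoning code never sees SQL or raw rows, which is the property that would have to carry over for the interface to transfer to other structured sources.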
Load-bearing premise
Large language models can reliably pick the correct KG schema for each retrieved fact and write code that correctly encodes the needed reasoning steps.
What would settle it
An experiment in which code generated by the model fails to execute correctly or produces wrong answers on a substantial fraction of queries from WebQSP, CWQ, or GrailQA, erasing the reported accuracy gains.
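A check of this kind could be run with a small harness along the following lines. The `evaluate` function, the convention that generated code defines `answer(facts)`, and the four toy cases are assumptions for illustration; real cases would come from the benchmark queries and the model's actual output.

```python
def evaluate(cases: list[tuple[str, list, object]]) -> dict[str, float]:
    """Score (generated_code, facts, gold_answer) triples.

    Returns the fraction of programs that ran without raising and the
    fraction whose result matched the gold answer.
    """
    executed = 0
    correct = 0
    for code, facts, gold in cases:
        namespace: dict = {}
        try:
            exec(code, namespace)  # unsandboxed; acceptable for a toy check only
            predicted = namespace["answer"](facts)
        except Exception:
            continue  # syntax and runtime errors count as execution failures
        executed += 1
        correct += predicted == gold
    total = len(cases)
    return {"execution_rate": executed / total, "accuracy": correct / total}


# Toy cases: one correct, one wrong, one runtime error, one syntax error.
cases = [
    ("def answer(facts): return max(facts)", [3, 7, 5], 7),
    ("def answer(facts): return min(facts)", [3, 7, 5], 7),
    ("def answer(facts): return facts.missing", [3, 7, 5], 7),
    ("def answer(facts) return 1", [3, 7, 5], 7),
]
report = evaluate(cases)
print(report)
```

Separating the execution rate from the accuracy distinguishes code that fails to run from code that runs but encodes the wrong reasoning, which are the two failure modes named above.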
Original abstract
Knowledge Graphs (KGs) are widely used to mitigate the limitations of Large Language Models (LLMs), such as outdated knowledge and hallucinations. Existing LLM-KG integration frameworks typically rely on predefined operators to retrieve factual knowledge from KGs and inject it into prompts for answer generation. This paradigm faces two critical bottlenecks: 1) Inflexibility: The predefined operators are limited in scope and thus lack sufficient compositional expressiveness to fully capture the complex semantics required by KG questions. 2) Unscalability: Direct injection of factual knowledge into prompts limits scalability in handling large-scale factual knowledge. To address these two bottlenecks, we propose Code-on-Graph (CoG), a programmatic reasoning framework for LLM-KG integration. Specifically, given the factual knowledge retrieved at each reasoning step, CoG first identifies the corresponding KG schemas and represents these schemas as Python classes, which serve as abstract interfaces to the retrieved facts. It then generates executable code grounded in these classes, with the retrieved facts instantiated as objects of the corresponding classes during execution. This design enables flexible code-based reasoning while avoiding the direct injection of large-scale factual knowledge into prompts. Experiments on WebQSP, CWQ, and GrailQA demonstrate that CoG outperforms prior state-of-the-art models by up to 10.5%.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Code-on-Graph (CoG), an iterative programmatic reasoning framework for LLM-KG integration. It represents the schemas of retrieved KG facts as Python classes, generates executable code grounded in these classes, and instantiates the facts as objects during execution, enabling flexible reasoning without injecting facts directly into prompts. The approach is claimed to outperform prior SOTA by up to 10.5% on WebQSP, CWQ, and GrailQA.
Significance. If validated, this framework could significantly advance LLM-KG systems by providing greater compositional expressiveness through code and better scalability by avoiding prompt overload with facts. The programmatic approach may offer advantages in handling complex query semantics that predefined operators cannot capture.
major comments (1)
- [Abstract] The central performance claim depends on the LLM reliably identifying the correct KG schema classes from retrieved facts at each step and generating executable code that accurately captures the required semantics. However, the abstract supplies no experimental details, error analysis, code verification methods, or ablations to evaluate the accuracy of these LLM steps or to demonstrate that per-step failures do not compound across iterations to negate the reported gains.
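The compounding concern can be made concrete with a short calculation. It assumes, purely for illustration, that every reasoning step succeeds independently with the same probability; the paper reports no such figure, and the 0.95 used below is a made-up value.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps."""
    return per_step ** steps


# Hypothetical per-step reliability of 0.95 over chains of growing length.
for steps in (1, 2, 4, 8):
    print(steps, round(chain_success(0.95, steps), 4))
```

Under that assumption a four-step chain succeeds with probability 0.95^4 = 0.81450625, so even a high per-step reliability erodes quickly as iterations accumulate, which is why per-step accuracy figures matter for the headline claim.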
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive comment. We address the concern regarding the abstract point by point below.
- Referee: [Abstract] The central performance claim depends on the LLM reliably identifying the correct KG schema classes from retrieved facts at each step and generating executable code that accurately captures the required semantics. However, the abstract supplies no experimental details, error analysis, code verification methods, or ablations to evaluate the accuracy of these LLM steps or to demonstrate that per-step failures do not compound across iterations to negate the reported gains.
Authors: The abstract is a high-level summary constrained by length and is not intended to contain full experimental details. The full manuscript includes dedicated sections on the method (Section 3), experiments (Section 4), and analysis (Section 5) that provide ablations on schema class identification accuracy, code generation fidelity, execution verification via runtime success rates, and iterative error analysis. These results show that per-step LLM decisions maintain high reliability and that any failures do not compound sufficiently to erase the reported gains of up to 10.5%. We are willing to add one sentence to the abstract referencing the robustness demonstrated in the experiments if the editor deems it appropriate.
Revision: partial
Circularity Check
No circularity detected; framework proposal is self-contained
The paper presents CoG as a new framework that represents KG schemas as Python classes and generates executable code from retrieved facts to enable flexible reasoning without direct fact injection. The abstract and description contain no equations, fitted parameters, or derivations that reduce the claimed improvements (up to 10.5% on WebQSP/CWQ/GrailQA) to inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked in the provided text. The central claim rests on the empirical performance of the described iterative code-generation process, which is presented as independent of prior results and externally falsifiable via benchmark evaluation. This is the expected outcome for a methods paper introducing a novel integration approach.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: LLMs can reliably identify KG schemas from retrieved facts and generate correct executable code grounded in Python classes representing those schemas.