Bounded Path Context: A Controlled Study of Visible Path History in LLM-Based Knowledge Graph Question Answering

Xihang Shan; Ye Luo

arXiv: 2605.26645 · v1 · pith:XP7GXMCT · submitted 2026-05-26 · 💻 cs.CL

Pith reviewed 2026-06-29 18:27 UTC · model grok-4.3

keywords LLM-based KGQA · path context · bounded history · knowledge graph question answering · relation selection · prompt design · WebQSP · CWQ

The pith

Relation-selection prompts that show only the last hop, or no prior hops at all, perform as well as or better than prompts carrying the full path history for LLM-based knowledge graph question answering on WebQSP and CWQ.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the common practice in LLM-driven knowledge graph traversal of including the entire partial path in every relation-selection prompt. A controlled experiment holds graph neighborhoods, beam search, depth, decoding, and answer extraction fixed and varies only the number of visible prior hops (K). On the full WebQSP and CWQ test sets, bounded context matches or beats full history. With Qwen3.5-9B-AWQ, K=1 reaches 0.487 F1 on WebQSP versus 0.472 for full history, and K=0 reaches 0.287 on CWQ versus 0.274, while cutting input tokens by 9.7 and 12.1 percent respectively. The study also shows that 71-84 percent of examples are unaffected by history length; the remainder reveal when prior hops help disambiguate and when they distract. This positions path serialization length as a tunable interface choice rather than a fixed default.

Core claim

Bounded Path Context decouples the controller's full symbolic path memory from the relation-selection prompt, exposing only the question, current entity, candidate relations, and at most the last K hops. A sweep over K on complete WebQSP and CWQ benchmarks with fixed settings shows that K=1 or K=0 achieves answer-set F1 equal to or higher than full-history prompting while using fewer tokens; the same pattern holds at the 4B model scale. Per-example analysis indicates most queries are insensitive to history length.
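The metric named above can be illustrated with a minimal sketch. It assumes the common set-based definition of answer-set F1 (harmonic mean of precision and recall over predicted and gold answer sets); the paper's exact scoring script is not reproduced on this page, and the function name `answer_set_f1` and its empty-set conventions are illustrative choices, not the authors'.

```python
from typing import Iterable


def answer_set_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Set-based F1 between predicted and gold answers (assumed definition)."""
    pred_set, gold_set = set(predicted), set(gold)
    # Convention chosen for this sketch: two empty sets count as a perfect match.
    if not pred_set and not gold_set:
        return 1.0
    if not pred_set or not gold_set:
        return 0.0
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

A benchmark-level score under this assumption would be the mean of `answer_set_f1` over all test questions.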

What carries the argument

Bounded Path Context (BPC), which retains complete paths in symbolic state for extraction and audit but limits prompt-visible history to the last K hops.
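One way to picture this separation is the sketch below. It is not the authors' implementation: the names `PathState`, `visible_history`, and `build_prompt`, the hop triple layout, and the prompt wording are all assumptions made for illustration. The only behavior taken from the source is that the controller keeps the full path while the prompt shows the question, the current entity, the candidate relations, and at most the last K hops.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Hop = Tuple[str, str, str]  # (head entity, relation, tail entity); assumed layout


@dataclass
class PathState:
    """Controller-side symbolic memory: always holds the complete path."""
    hops: List[Hop] = field(default_factory=list)

    def extend(self, hop: Hop) -> None:
        self.hops.append(hop)


def visible_history(state: PathState, k: Optional[int]) -> List[Hop]:
    """Hops the prompt may show. k=None stands for full history."""
    if k is None:
        return list(state.hops)
    if k <= 0:
        return []  # explicit guard: hops[-0:] would return the whole list
    return state.hops[-k:]


def build_prompt(question: str, current_entity: str,
                 candidates: List[str], state: PathState,
                 k: Optional[int]) -> str:
    """Assemble a relation-selection prompt with bounded visible history."""
    lines = [f"Question: {question}"]
    history = visible_history(state, k)
    if history:
        lines.append("Path so far:")
        lines.extend(f"  {h} -[{r}]-> {t}" for h, r, t in history)
    lines.append(f"Current entity: {current_entity}")
    lines.append("Candidate relations:")
    lines.extend(f"  - {rel}" for rel in candidates)
    lines.append("Select the most relevant relation.")
    return "\n".join(lines)
```

Note that `state.hops` is never truncated: changing K changes only what `build_prompt` serializes, so answer extraction and audit can still read the full path.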

If this is right

Relation-selection decisions remain effective with minimal or no prior-hop context in the prompt.
Prompt input tokens can be reduced by roughly 10 to 12 percent (9.7 percent on WebQSP, 12.1 percent on CWQ) without loss of answer-set F1.
History length becomes a tunable parameter that can be set per model scale or dataset rather than defaulting to the full path.
The majority of examples (71-84 percent) show no sensitivity to visible history length.
When history length matters, prior hops sometimes disambiguate and sometimes distract the model.
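The share of insensitive examples could be measured along the following lines. This is a sketch under an assumption: an example counts as "unaffected" when its per-example F1 is identical under every K setting. The page does not state the paper's exact criterion, and the function name and the toy scores in the usage note are hypothetical.

```python
from typing import Dict, List


def unaffected_fraction(scores_by_k: Dict[str, List[float]],
                        tol: float = 1e-9) -> float:
    """Fraction of examples whose score is the same under every K setting."""
    settings = list(scores_by_k.values())
    if not settings:
        raise ValueError("need at least one K setting")
    n = len(settings[0])
    if any(len(s) != n for s in settings):
        raise ValueError("all K settings must score the same examples")
    if n == 0:
        return 0.0
    unaffected = sum(
        1 for i in range(n)
        if max(s[i] for s in settings) - min(s[i] for s in settings) <= tol
    )
    return unaffected / n
```

For instance, with made-up scores `{"0": [1.0, 0.5, 0.0, 1.0], "1": [1.0, 0.5, 1.0, 1.0], "full": [1.0, 0.5, 0.0, 1.0]}` only the third example changes with K, so three of four examples are unaffected.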

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The separation of symbolic state from prompt context could be tested in other LLM sequential decision settings such as tool-use chains or multi-step planning.
Dynamic adjustment of K per question, based on entity ambiguity or hop count, might yield further gains beyond fixed K.
The finding raises the question of whether similar bounded-context benefits appear in LLM-based search over other structured graphs or trees.

Load-bearing premise

That changing only the visible history length K while holding neighborhoods, beams, depth, decoding, and extraction format fixed isolates the effect on the model's relation choices.
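The premise can be made concrete as a configuration sweep in which K is the only field allowed to differ. The sketch below is illustrative: the `RunConfig` fields mirror the controls named above, but the field names and every value in the test are placeholders rather than the paper's settings. Graph neighborhoods are omitted because they are data, not a scalar setting.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical run settings; only history_k varies across the sweep."""
    beam_width: int
    max_depth: int
    temperature: float
    extraction_format: str
    history_k: Optional[int]  # None stands for full history


def k_sweep(base: RunConfig, k_values: List[Optional[int]]) -> List[RunConfig]:
    """Derive one config per K, copying every other field from base."""
    return [replace(base, history_k=k) for k in k_values]


def differs_only_in_k(a: RunConfig, b: RunConfig) -> bool:
    """True when two configs are identical once history_k is masked out."""
    return replace(a, history_k=None) == replace(b, history_k=None)
```

A check like `differs_only_in_k` guards the configuration side of the premise only; it cannot show that the model's inputs are otherwise identical, which depends on the prompt templates.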

What would settle it

A controlled replication on the same benchmarks and models where full-history prompting yields strictly higher F1 than every bounded K setting would falsify the matching-or-exceeding result.
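The falsification condition reduces to a simple comparison, sketched here with a hypothetical helper name. The usage note relies only on the F1 values reported on this page; scores for other K settings are not given here and are not assumed.

```python
from typing import Dict


def full_history_strictly_best(full_f1: float,
                               bounded_f1: Dict[int, float]) -> bool:
    """True only if full history beats every bounded K setting outright."""
    if not bounded_f1:
        raise ValueError("need at least one bounded K score")
    return all(full_f1 > f1 for f1 in bounded_f1.values())
```

With the reported numbers, `full_history_strictly_best(0.472, {1: 0.487})` for WebQSP and `full_history_strictly_best(0.274, {0: 0.287})` for CWQ both evaluate to `False`, so the condition is not met on the reported data.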

Original abstract

LLM-based knowledge-graph question answering (KGQA) delegates graph traversal to language models, turning each question into a sequence of local relation-selection decisions repeated across beams and hops. A common but untested default is to serialize the complete partial path into every routing prompt, even though the controller already maintains this path as exact symbolic state. Bounded Path Context (BPC) decouples these two roles: the controller retains full paths in symbolic memory for answer extraction and audit, while the relation-selection prompt exposes only the question, the current entity, outgoing relation candidates, and at most the last K hops. A controlled sweep over K -- fixing graph neighborhoods, beam budget, depth, decoding, and answer-extraction format -- shows that bounded histories match or exceed full-history prompting on complete WebQSP and CWQ test sets with Qwen3.5-9B-AWQ: K=1 achieves 0.487 answer-set F1 on WebQSP versus 0.472 for full history, and K=0 reaches 0.287 on CWQ versus 0.274, with 9.7% and 12.1% fewer input tokens respectively. At the 4B scale, K=1 remains the strongest setting on both benchmarks. Per-example analysis reveals that 71-84% of examples are unaffected by history length, while the affected cases expose when prior hops disambiguate versus distract. These results suggest that path serialization length is better treated as a tunable interface variable than as a default assumption in LLM-based graph controllers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Truncating visible path history to the last hop or none matches full history performance while cutting tokens in this LLM KGQA setup.

The main result is that you can limit the visible path history in the relation-selection prompt to the last one or zero hops and still match or beat full-history performance on WebQSP and CWQ, while saving roughly 10 percent of the input tokens.

The paper does a controlled sweep that holds graph neighborhoods, beam budget, depth, decoding, and answer extraction fixed. Only the history length K in the prompt changes. On the full test sets with Qwen3.5-9B-AWQ, K=1 reaches 0.487 answer-set F1 on WebQSP compared to 0.472 for full history. On CWQ, K=0 reaches 0.287 versus 0.274. The per-example split shows 71-84 percent of questions are unaffected by the change.

This is new because earlier LLM KGQA systems treated full path serialization as the default without checking whether the extra context helped or hurt the model's next-relation choices. The controller keeps the complete path in symbolic state for final extraction, so the prompt only needs to support the local decision.

The isolation of the variable is the strongest part. The numbers are reported on complete test sets rather than subsets, which is good.

The main limitation is the narrow model coverage. Results are shown for one 9B model and mentioned for 4B, but broader testing across model families would make the finding more general. The paper also relies on the authors' own controller implementation, so independent replication would require the code or a detailed description of the prompt templates and graph interface.

Readers working on LLM controllers for graph traversal or on prompt efficiency in sequential reasoning tasks will get the most from this. It is a practical tweak rather than a new architecture.

The experiment is honest and the central claim holds up on the reported data, so it deserves peer review. I would send it to referees.

Referee Report

0 major / 2 minor

Summary. The paper introduces Bounded Path Context (BPC) for LLM-based KGQA, decoupling symbolic path memory (retained by the controller for extraction) from prompt context by exposing only the last K hops in relation-selection prompts. A controlled ablation on full WebQSP and CWQ test sets with Qwen3.5-9B-AWQ, fixing graph neighborhoods, beam budget, depth, decoding, and extraction format, reports that K=1 yields 0.487 answer-set F1 on WebQSP (vs. 0.472 full history) and K=0 yields 0.287 on CWQ (vs. 0.274), with 9.7–12.1% fewer tokens; 71–84% of examples are unaffected by K.

Significance. If the isolation holds, the result demonstrates that full path serialization is not required and can be detrimental in LLM graph controllers, reframing history length as a tunable interface variable. The fixed-variable sweep and per-example breakdown provide direct empirical support for efficiency gains without performance loss on standard benchmarks.

minor comments (2)

[Experimental setup] The experimental setup section should include explicit pseudocode or prompt templates for each K value to allow full reproduction of the isolation claim.
[Results] Clarify the exact model name (Qwen3.5 vs. Qwen2.5) and any quantization effects on the reported F1 deltas.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed summary of our work and the recommendation of minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity: purely empirical controlled ablation

The paper presents a controlled empirical study comparing bounded vs. full path history in LLM-based KGQA. The central claim rests on direct F1 measurements on fixed test sets (WebQSP, CWQ) under fixed graph neighborhoods, beam budget, depth, decoding, and answer-extraction format. No mathematical derivation, fitted parameters renamed as predictions, self-referential equations, or load-bearing self-citations appear. The isolation of K is achieved by explicit experimental controls rather than by construction or prior author theorems. This matches the default expectation of a non-circular empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work relies on a standard domain assumption about experimental control; no free parameters or invented entities are introduced.

axioms (1)

domain assumption: The effect of path history length on LLM relation selection can be isolated by fixing all other experimental variables such as beam budget and decoding strategy.
Invoked when describing the controlled sweep over K while holding other factors fixed.

pith-pipeline@v0.9.1-grok · 5817 in / 1318 out tokens · 47963 ms · 2026-06-29T18:27:53.677245+00:00 · methodology
