
arxiv: 2607.00508 · v1 · pith:LK3LOF5Onew · submitted 2026-07-01 · 💻 cs.IR

When RAG Meets Query Planning: Logical Query Trees for Resolving Exploratory Reasoning Problems

Pith reviewed 2026-07-02 06:46 UTC · model grok-4.3

classification 💻 cs.IR
keywords PlanRAG · logical query trees · exploratory reasoning problems · retrieval-augmented generation · dynamic programming · query planning · WikiWeb-ERP

The pith

PlanRAG turns exploratory reasoning problems into logical query trees to reduce noise and errors in RAG.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PlanRAG as a framework that decomposes complex natural language queries with high uncertainty into atomic parts and assembles them into logical query trees using dynamic programming and a cost model. This structured planning replaces the ad-hoc iteration or graph traversal common in current RAG systems. If the trees guide retrieval and generation without introducing new gaps, the method supplies an explicit optimization layer for queries that standard approaches handle poorly. Readers would care because it reframes RAG as a query-planning task rather than repeated retrieval loops.

Core claim

PlanRAG models natural-language exploratory reasoning problems (ERPs) as logical query trees (LQTs). It first decomposes each ERP into atomic queries, then organizes them into a tree using dynamic programming guided by a cost model with multiple complementary dimensions. It then runs iterative aggregation, rewriting, retrieval, and generation over the LQT, processing nodes concurrently and propagating intermediate results upward, with further parallelization across multiple threads. Experiments show that PlanRAG outperforms state-of-the-art iteration-based and graph-based RAG systems on the WikiWeb-ERP dataset, which the authors present as a new formulation for optimizing natural language queries.

What carries the argument

Logical query trees (LQTs) that represent the decomposition of an exploratory reasoning problem into atomic queries and their dependencies, constructed and optimized via dynamic programming with a multi-dimensional cost model.
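The tree-construction step can be sketched as a subset dynamic program in the style of database join enumeration. This is only an illustration under stated assumptions: the paper's actual recurrence and cost dimensions are not given on this page, so `Node`, `build_tree`, `leaf_cost`, and `merge_cost` are hypothetical names, and the toy costs in the usage lines are invented for the example.

```python
from dataclasses import dataclass
from functools import lru_cache
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Node:
    """One subtree of a (hypothetical) logical query tree."""
    queries: FrozenSet[str]            # atomic queries covered by this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    cost: float = 0.0


def build_tree(
    atomic_queries: Iterable[str],
    leaf_cost: Callable[[str], float],
    merge_cost: Callable[[FrozenSet[str], FrozenSet[str]], float],
) -> Node:
    """Return the cheapest binary tree over all atomic queries.

    Assumed recurrence: best(S) = min over splits (L, R) of S of
    best(L).cost + best(R).cost + merge_cost(L, R).
    Exponential in the number of queries, so only suitable for small inputs.
    """

    @lru_cache(maxsize=None)
    def best(subset: FrozenSet[str]) -> Node:
        if len(subset) == 1:
            (q,) = subset
            return Node(subset, cost=leaf_cost(q))
        members = sorted(subset)
        anchor, rest = members[0], members[1:]
        best_node: Optional[Node] = None
        # Pinning `anchor` to the left side skips mirrored duplicate splits.
        for size in range(len(rest) + 1):
            for extra in combinations(rest, size):
                left_set = frozenset((anchor, *extra))
                right_set = subset - left_set
                if not right_set:
                    continue
                left, right = best(left_set), best(right_set)
                total = left.cost + right.cost + merge_cost(left_set, right_set)
                if best_node is None or total < best_node.cost:
                    best_node = Node(subset, left, right, total)
        return best_node

    return best(frozenset(atomic_queries))


# Toy usage: merging subtrees that share a related pair of queries is cheap.
related = {frozenset({"a", "b"}), frozenset({"b", "c"})}


def toy_merge_cost(left: FrozenSet[str], right: FrozenSet[str]) -> float:
    linked = any(frozenset({x, y}) in related for x in left for y in right)
    return 1.0 if linked else 5.0


tree = build_tree(["a", "b", "c"], lambda q: 1.0, toy_merge_cost)
```

Memoizing on the subset is what makes this a dynamic program: each subset of atomic queries is planned once and reused by every larger subset that contains it.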

If this is right

  • Structured tree planning limits error accumulation across retrieval steps for ambiguous queries.
  • Concurrent node processing and thread-level parallelization improve execution speed over sequential RAG loops.
  • The cost-model-guided dynamic programming supplies an explicit optimization objective for natural-language query trajectories.
  • The same decomposition-plus-aggregation pattern applies to any RAG pipeline that must handle high-uncertainty inputs.
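The concurrent, bottom-up execution described in the bullets above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `TreeNode`, `execute_tree`, and `answer_fn` are hypothetical names, and `answer_fn` stands in for whatever rewriting, retrieval, and generation a real system would run at each node.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TreeNode:
    query: str
    children: List["TreeNode"] = field(default_factory=list)


def execute_tree(
    root: TreeNode,
    answer_fn: Callable[[str, List[str]], str],
    max_workers: int = 4,
) -> str:
    """Answer every node bottom-up, running nodes of equal height in parallel."""
    levels: Dict[int, List[TreeNode]] = {}

    def height(node: TreeNode) -> int:
        # Leaves get height 0; a parent is one above its tallest child.
        h = 1 + max((height(c) for c in node.children), default=-1)
        levels.setdefault(h, []).append(node)
        return h

    height(root)
    results: Dict[int, str] = {}  # keyed by id(node)

    def run(node: TreeNode) -> str:
        # All children sit on lower levels, so their answers already exist.
        child_answers = [results[id(c)] for c in node.children]
        return answer_fn(node.query, child_answers)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for h in sorted(levels):
            for node, answer in zip(levels[h], pool.map(run, levels[h])):
                results[id(node)] = answer
    return results[id(root)]


# Toy usage with a stub in place of retrieval and generation.
def stub_answer(query: str, child_answers: List[str]) -> str:
    return f"{query}({','.join(child_answers)})"


plan = TreeNode("r", [TreeNode("a", [TreeNode("c")]), TreeNode("b")])
final = execute_tree(plan, stub_answer)
```

Scheduling level by level, rather than submitting recursive tasks that wait on their children, avoids the case where a bounded thread pool fills up with parents blocked on children that can never start.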

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The tree representation could be combined with existing LLM planners to produce hybrid systems that switch between tree search and free-form iteration.
  • If the cost model dimensions prove transferable, the same machinery might adapt to other retrieval settings such as multi-hop question answering or tool-use chains.
  • Parallel execution over LQTs suggests the method scales with additional compute resources without changing the core planning logic.

Load-bearing premise

Exploratory reasoning problems in natural language can be decomposed into atomic queries and organized into logical query trees via dynamic programming without representation gaps or optimization errors that undermine retrieval quality.

What would settle it

On the WikiWeb-ERP dataset, a head-to-head run in which PlanRAG shows no accuracy gain over the iteration-based and graph-based baselines it compares against.

Figures

Figures reproduced from arXiv: 2607.00508 by Chen Yang, Deqing Yang, Ganlin Xu, Hongda Xi, Jiaqing Liang, Linghao Zhang, Sihang Jiang, Weijia Lu, Yanghua Xiao, Zhitao Yin.

Figure 1. (a) A comparison between an ERP, where entities
Figure 2. The pipeline of our proposed PlanRAG
Figure 3. Comparisons of token cost, runtime, and GPU memory across various RAG methods (better viewed in color).
Figure 4. (a) Performance of PlanRAG across different scaling
Figure 5. System cost of PlanRAG when varying query com
Figure 6. LQT execution comparison of PlanRAG and ChainRAG on a representative ERP introduced in Section 1.
Original abstract

Retrieval-Augmented Generation (RAG) effectively grounds large language models (LLMs) in external knowledge but struggles with exploratory reasoning problems (ERPs), which are complex queries involving high uncertainty and ambiguity. Resolving ERPs requires complex reasoning along unclear paths, which tends to produce retrieval noise and error accumulation. Furthermore, the absence of an end-to-end planning mechanism makes it difficult to generate effective trajectories for ERPs. Motivated by database query planning, we introduce PlanRAG, a RAG framework that models natural-language ERPs as logical query trees (LQTs). However, translating ERPs into LQTs is non-trivial because of the representation and optimization gaps between structured SQL and unstructured natural language, which make it highly challenging to construct high-quality LQTs. To address these problems, we first decompose ERPs into atomic queries and then organize them into LQTs using dynamic programming guided by a cost model involving multiple complementary dimensions. Finally, we execute iterative aggregation, rewriting, retrieval, and generation over LQTs, processing nodes concurrently and propagating intermediate results upward, with further parallelization across multiple threads for efficiency. Our experimental results show that PlanRAG outperforms state-of-the-art iteration-based and graph-based RAG systems on our newly constructed dataset, WikiWeb-ERP, thereby providing a new formulation for optimizing natural language queries. Our source code and dataset are available at https://anonymous.4open.science/r/PlanRAG-main-B2C8/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper introduces PlanRAG, a RAG framework that models exploratory reasoning problems (ERPs) as logical query trees (LQTs). ERPs are first decomposed into atomic queries, which are then organized into LQTs via dynamic programming guided by a multi-dimensional cost model. The framework executes iterative aggregation, rewriting, retrieval, and generation over the LQT nodes (with concurrency and multi-threading), and reports that this outperforms iteration-based and graph-based RAG baselines on the newly constructed WikiWeb-ERP dataset. Code and data are released.

Significance. If the empirical results hold, the work supplies a concrete engineering formulation that imports database-style query planning into RAG, offering a structured way to manage uncertainty and error accumulation in complex natural-language queries. The open release of both code and the WikiWeb-ERP dataset is a clear strength that supports reproducibility and follow-on work.

minor comments (2)
  1. [Abstract] The outperformance claim is stated without any numerical result, baseline name, or dataset statistic; adding one concrete figure (e.g., accuracy delta on WikiWeb-ERP) would make the central empirical contribution immediately verifiable.
  2. [Method] The description of the cost model used inside the dynamic program is high-level; a short paragraph or pseudocode block that lists the exact dimensions and how they are combined would improve clarity without altering the method.
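As a sketch of the kind of block comment 2 asks for, a multi-dimensional cost could be combined by a weighted sum. The dimension names (`ambiguity`, `retrieval_noise`, `depth`) and the weights below are assumptions made for illustration; the paper's actual dimensions and combination rule are not stated in this report.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CostDimensions:
    """Hypothetical per-node cost signals; not taken from the paper."""
    ambiguity: float        # how underspecified the sub-query is
    retrieval_noise: float  # expected share of irrelevant retrieved passages
    depth: float            # how deep the node sits in the tree


def combined_cost(
    dims: CostDimensions,
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),
) -> float:
    """Weighted sum of the dimensions; the weights are illustrative defaults."""
    w_ambiguity, w_noise, w_depth = weights
    return (
        w_ambiguity * dims.ambiguity
        + w_noise * dims.retrieval_noise
        + w_depth * dims.depth
    )


example = combined_cost(CostDimensions(ambiguity=2.0, retrieval_noise=0.0, depth=0.0))
```

A weighted sum is the simplest choice; a real cost model might instead normalize each dimension or learn the weights, and the paper may do something different.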

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review, recognition of the engineering contribution, and recommendation of minor revision. We are pleased that the open release of code and the WikiWeb-ERP dataset is noted as a strength. No major comments were provided in the report, so we will incorporate any minor suggestions in the revised manuscript.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper presents PlanRAG as an engineering framework that decomposes exploratory reasoning problems into atomic queries, organizes them into logical query trees via dynamic programming with a multi-dimensional cost model, and executes iterative retrieval/generation over the trees. The central claim is an empirical outperformance result on the newly constructed WikiWeb-ERP dataset, with code and data released. No equations, fitted parameters, self-citations, or derivations appear in the abstract or described method that reduce any prediction or uniqueness claim to the authors' own inputs by construction; the approach is self-contained as an independent implementation evaluated against external baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the modeling premise that natural language ERPs admit a faithful tree-structured decomposition analogous to SQL plans; no free parameters or invented physical entities are mentioned.

axioms (1)
  • domain assumption: Exploratory reasoning problems in natural language can be decomposed into atomic queries that are then organized into logical query trees without critical loss of semantic fidelity.
    This premise is invoked in the motivation and method paragraphs to justify the translation from unstructured text to structured planning.

pith-pipeline@v0.9.1-grok · 5845 in / 1291 out tokens · 25687 ms · 2026-07-02T06:46:12.605254+00:00 · methodology

