pith. sign in

arxiv: 2606.28365 · v1 · pith:7SA273QVnew · submitted 2026-06-14 · 💻 cs.IR · cs.AI

CAMI: Cost-Aware Agent-Guided Multi-Indexing for Semantic Retrieval

Pith reviewed 2026-06-30 11:05 UTC · model grok-4.3

classification 💻 cs.IR cs.AI
keywords multi-indexingsemantic retrievalbudgeted optimizationRAG pipelinesportfolio selectionenrichment indices
0
0 comments X

The pith

CAMI selects high-recall enrichment index portfolios for semantic retrieval by solving a budgeted portfolio problem with atomic evaluation and early pruning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CAMI as a way to manage the combinatorial cost of adding semantic enrichment indices such as synthetic queries or summaries to a retrieval corpus. It frames the choice of which enrichment types and generator models to materialize as a budgeted portfolio selection task rather than exhaustive search. The method proposes corpus-specific templates through an agent, evaluates individual enrichment-model pairs at atomic units, recombines promising ones via fidelity-local closure, and prunes weak candidates early through a confidence-aware schedule. This produces portfolios that improve recall over content-only baselines while keeping total evaluation spend independent of full corpus size.

Core claim

CAMI formalizes multi-index construction as a budgeted multi-objective portfolio selection problem and solves it through an agentic discovery phase that proposes representation templates, an atomic-unit search procedure that evaluates enrichment-model pairs and recombines them via fidelity-local closure, and a confidence-aware promotion schedule that prunes unpromising configurations early.

What carries the argument

The atomic-unit search procedure that evaluates individual enrichment-model pairs and recombines them via fidelity-local closure to identify synergistic portfolios.

Load-bearing premise

The atomic-unit evaluations plus local closure reliably surface the best combined portfolios without needing to test every possible combination on the entire corpus.

What would settle it

A controlled experiment on a corpus where the highest-recall combinations only emerge after full-corpus evaluation of all pairs would show whether the early-pruning and closure steps miss superior portfolios.

Figures

Figures reproduced from arXiv: 2606.28365 by Adnan Qidwai, Anand Eswaran, Jaydeep Sen, Sachindra Joshi, Sonam Mishra.

Figure 1
Figure 1. Figure 1: The CAMI Framework. The top panel outlines the multi-index ingestion and retrieval pipeline, where an agent [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The Recall–Cost Trade-off. This figure illustrates how different multi-index combinations impact both retrieval quality [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The Empirical Pareto Frontier. Scatter plots illustrat [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Recall vs. Budget Trajectory at Fidelity-2. The em [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
read the original abstract

RAG ingestion pipelines frequently augment search corpus index with semantic enrichment indices (e.g., synthetic queries or summaries generated from corpus chunks) that are subsequently queried alongside the base index to improve retrieval via better alignment between document representations and user intent. While these supplementary representations substantially improve retrieval quality, they introduce a computational bottleneck: the configuration space of enrichment types and generator models is combinatorial, and the cost of exhaustive index-time evaluation scales linearly with corpus size. We introduce CAMI (Cost-Aware Multi-Indexing), a framework that formalizes multi-index construction as a budgeted, multi-objective portfolio selection problem. CAMI targets the upstream decision of which enrichment views to generate and materialize before the retrieval backend is applied. CAMI incorporates three primary mechanisms: (i) an agentic discovery phase that proposes corpus-specific representation templates; (ii) an atomic-unit search procedure that evaluates individual enrichment-model pairs and recombines them via fidelity-local closure to identify synergistic portfolios; and (iii) a confidence-aware promotion schedule that prunes unpromising configurations early, decoupling optimization spend from total corpus size. We evaluate CAMI across diverse retrieval corpora. Our findings reveal that the framework systematically isolates high-recall portfolios under strict budget constraints, outperforming standard content-only baselines in challenging settings by up to 9.4% recall@10. Further, CAMI is able to systematically identify these high-recall portfolios using up to 5x less budget compared to random search baselines, making our approach practical in real production scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces CAMI, a framework that formalizes multi-index construction for RAG ingestion as a budgeted multi-objective portfolio selection problem. It uses an agentic discovery phase, an atomic-unit search procedure that evaluates enrichment-model pairs and recombines them via fidelity-local closure, and a confidence-aware promotion schedule to prune configurations early, claiming to identify high-recall portfolios under budget constraints with up to 9.4% recall@10 gains over content-only baselines and 5x less budget than random search.

Significance. If the central claims hold with rigorous validation, the work would be significant for production RAG systems by decoupling optimization cost from corpus size while improving retrieval quality through systematic portfolio selection.

major comments (2)
  1. Abstract: performance numbers (9.4% recall@10 lift, 5x budget reduction) are stated without any accompanying dataset descriptions, baseline details, error bars, ablation results, or statistical tests, so the central empirical claims cannot be assessed for soundness.
  2. Abstract (atomic-unit search procedure and fidelity-local closure): the headline efficiency and quality claims rest on the assumption that local fidelity evaluations and the closure operator capture synergistic interactions that only manifest at full-corpus scale; no evidence or argument is supplied that non-local synergies are not systematically missed by early pruning, which directly undermines the budgeted-search superiority claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and outline the revisions we will make to strengthen the manuscript.

read point-by-point responses
  1. Referee: Abstract: performance numbers (9.4% recall@10 lift, 5x budget reduction) are stated without any accompanying dataset descriptions, baseline details, error bars, ablation results, or statistical tests, so the central empirical claims cannot be assessed for soundness.

    Authors: We agree that the abstract presents the headline results at a high level without sufficient context for immediate assessment. While abstracts have strict length limits, we will revise the abstract to briefly reference the diverse retrieval corpora used and explicitly direct readers to Sections 4 and 5 for full details on datasets, baselines, ablations, error bars, and statistical tests. This change will make the claims more transparent without expanding the abstract beyond reasonable bounds. revision: yes

  2. Referee: Abstract (atomic-unit search procedure and fidelity-local closure): the headline efficiency and quality claims rest on the assumption that local fidelity evaluations and the closure operator capture synergistic interactions that only manifest at full-corpus scale; no evidence or argument is supplied that non-local synergies are not systematically missed by early pruning, which directly undermines the budgeted-search superiority claim.

    Authors: The atomic-unit search with fidelity-local closure is formulated to identify synergistic portfolios by recombining locally evaluated enrichment-model pairs, with the portfolio selection objective providing the basis for why local fidelity is a reasonable proxy. We acknowledge that the current manuscript does not explicitly argue against systematic omission of non-local synergies. We will add a new subsection in the methodology explaining this rationale grounded in the budgeted multi-objective formulation, and include supporting experiments on smaller corpora where full-scale evaluation is tractable to empirically validate that the approach does not systematically miss high-value configurations. revision: yes

Circularity Check

0 steps flagged

No derivation chain or equations present; circularity not applicable

full rationale

The manuscript abstract and description introduce CAMI as a framework with three mechanisms (agentic discovery, atomic-unit search with fidelity-local closure, confidence-aware schedule) and report empirical results, but contain no equations, formal derivations, or load-bearing mathematical steps. No self-citations, ansatzes, or fitted inputs are quoted that reduce a claimed prediction to its inputs by construction. Performance claims rest on experimental evaluation rather than a first-principles derivation. Per the rules, absence of any quotable reduction means score 0 with empty steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; framework is described at the level of high-level mechanisms without mathematical formulation or modeling assumptions.

pith-pipeline@v0.9.1-grok · 5816 in / 1104 out tokens · 31633 ms · 2026-06-30T11:05:12.872006+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

59 extracted references · 48 canonical work pages · 11 internal anchors

  1. [1]

    GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

    Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, and Omar Khattab. 2026. GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning. arXiv:2507.1945...

  2. [2]

    Elron Bandel, Yotam Perlitz, Elad Venezian, Roni Friedman, Ofir Arviv, Matan Orbach, Shachar Don-Yehiya, Dafna Sheinwald, Ariel Gera, Leshem Choshen, et al

  3. [3]

    InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations)

    Unitxt: Flexible, shareable and reusable data preparation and evaluation for generative ai. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations). 207–215

  4. [4]

    Matthew Barker, Andrew Bell, Evan Thomas, James Carr, Thomas Andrews, and Umang Bhatt. 2025. Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems. arXiv:2502.18635 [cs.LG] https://arxiv. org/abs/2502.18635

  5. [5]

    Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Matei Zaharia, James Zou, and Ion Stoica. 2025. Optimizing Model Selection for Compound AI Systems. arXiv:2502.14815 [cs.AI] https://arxiv.org/abs/2502.14815

  6. [6]

    Lingjiao Chen, Matei Zaharia, and James Zou. 2023. FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance. arXiv:2305.05176 [cs.LG] https://arxiv.org/abs/2305.05176

  7. [7]

    Peter Baile Chen, Tomer Wolfson, Michael Cafarella, and Dan Roth

  8. [8]

    arXiv:2504.03598 [cs.CL] https://arxiv.org/abs/2504.03598

    EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline. arXiv:2504.03598 [cs.CL] https://arxiv.org/abs/2504.03598

  9. [9]

    Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

    Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez, and Ion Stoica. 2024. Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference. arXiv:2403.04132 [cs.AI] https://arxiv.org/abs/2403. 04132

  10. [10]

    Alexander Conway, Debadeepta Dey, Stefan Hackmann, Matthew Hausknecht, Michael Schmidt, Mark Steadman, and Nick Volynets. 2025. syftr: Pareto-Optimal Generative AI. arXiv:2505.20266 [cs.AI] https://arxiv.org/abs/2505.20266

  11. [11]

    Cormack, Charles L A Clarke, and Stefan Buettcher

    Gordon V. Cormack, Charles L A Clarke, and Stefan Buettcher. 2009. Reciprocal rank fusion outperforms condorcet and individual rank learning methods. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval(Boston, MA, USA)(SIGIR ’09). Association for Computing Machinery, New York, NY, USA, 758–759...

  12. [12]

    Jonathan Larson Darren Edge, Ha Trinh. 2024. LazyGraphRAG: Setting a new standard for quality and cost. https://www.microsoft.com/en-us/research/blog/ lazygraphrag-setting-a-new-standard-for-quality-and-cost/

  13. [13]

    Laxman Dhulipala, Majid Hadian, Rajesh Jayaram, Jason Lee, and Vahab Mir- rokni. 2024. MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings. arXiv:2405.19504 [cs.DS] https://arxiv.org/abs/2405.19504

  14. [14]

    Kuicai Dong, Derrick Goh Xin Deik, Yi Quan Lee, Hao Zhang, Xiangyang Li, Cong Zhang, and Yong Liu. 2024. Multi-view Content-aware Indexing for Long Document Retrieval. arXiv:2404.15103 [cs.CL] https://arxiv.org/abs/2404.15103

  15. [15]

    Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. 2025. From Local to Global: A Graph RAG Approach to Query-Focused Summarization. arXiv:2404.16130 [cs.CL] https://arxiv.org/abs/2404.16130

  16. [16]

    Abdellah Ghassel, Ian Robinson, Gabriel Tanase, Hal Cooper, Bryan Thompson, Zhen Han, Vassilis Ioannidis, Soji Adeshina, and Huzefa Rangwala. 2025. Hierar- chical Lexical Graph for Enhanced Multi-Hop Retrieval. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD ’25). ACM, 4457–4466. doi:10.1145/3711896.3737233

  17. [17]

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. 2024. The llama 3 herd of models.arXiv preprint arXiv:2407.21783 (2024)

  18. [18]

    Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xuehao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, Saizhuo Wang, Kun Zhang, Yuanzhuo Wang, Wen Gao, Lionel Ni, and Jian Guo. 2025. A Survey on LLM-as- a-Judge. arXiv:2411.15594 [cs.CL] https://arxiv.org/abs/2411.15594

  19. [19]

    Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, and Yu Su

  20. [20]

    Hipporag: Neurobiologically inspired long-term memory for large language models, 2025

    HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models. arXiv:2405.14831 [cs.CL] https://arxiv.org/abs/2405.14831

  21. [21]

    Bernal Jiménez Gutiérrez, Yiheng Shu, Weijian Qi, Sizhe Zhou, and Yu Su. 2025. From RAG to Memory: Non-Parametric Continual Learning for Large Language Models. arXiv:2502.14802 [cs.CL] https://arxiv.org/abs/2502.14802

  22. [22]

    Jonathan Herzig, Thomas Müller, Syrine Krichene, and Julian Martin Eisensch- los. 2021. Open Domain Question Answering over Tables via Dense Retrieval. arXiv:2103.12011 [cs.CL] https://arxiv.org/abs/2103.12011

  23. [23]

    Qihao Huang. 2026. LLM-Guided Beam Search for Decision Graph Optimization with Dynamic Prompting(WSDM ’26). Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3773966.3778003

  24. [24]

    Yiqian Huang, Shiqi Zhang, and Xiaokui Xiao. 2025. KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG. arXiv:2502.09304 [cs.IR] https://arxiv.org/abs/2502.09304

  25. [25]

    Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. arXiv:2004.12832 [cs.IR] https://arxiv.org/abs/2004.12832

  26. [26]

    Seungone Kim, Jamin Shin, Yejin Cho, Joel Jang, Shayne Longpre, Hwaran Lee, Sangdoo Yun, Seongjin Shin, Sungdong Kim, James Thorne, and Minjoon Seo

  27. [27]

    arXiv:2310.08491 [cs.CL] https://arxiv.org/abs/2310.08491

    Prometheus: Inducing Fine-grained Evaluation Capability in Language Models. arXiv:2310.08491 [cs.CL] https://arxiv.org/abs/2310.08491

  28. [28]

    Tennison Liu, Nicolas Huynh, and Mihaela van der Schaar. 2025. De- cision Tree Induction Through LLMs via Semantically-Aware Evolution. arXiv:2503.14217 [cs.LG] https://arxiv.org/abs/2503.14217

  29. [29]

    Nikolaos Livathinos, Christoph Auer, Maksym Lysak, Ahmed Nassar, Michele Dolfi, Panos Vagenas, Cesar Berrospi Ramis, Matteo Omenetti, Kasper Dinkla, Yusik Kim, et al. 2025. Docling: An efficient open-source toolkit for ai-driven document conversion.arXiv preprint arXiv:2501.17887(2025)

  30. [30]

    Xing Han Lu, Siva Reddy, and Harm de Vries. 2023. The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2799–2829. doi:10.18653/v1/2023.eacl-main.206

  31. [31]

    Sadegh Mahdavi, Branislav Kisacanin, Shubham Toshniwal, Wei Du, Ivan Moshkov, George Armstrong, Renjie Liao, Christos Thrampoulidis, and Igor Gitman. 2025. Scaling Generative Verifiers For Natural Language Mathematical Proof Verification And Selection. arXiv:2511.13027 [cs.AI] https://arxiv.org/abs/ 2511.13027

  32. [32]

    Factor, Shila Ofek-Koifman, Paula Ta-Shma, and Assaf Toledo

    Matan Orbach, Ohad Eytan, Benjamin Sznajder, Ariel Gera, Odellia Boni, Yoav Kantor, Gal Bloch, Omri Levy, Hadas Abraham, Nitzan Barzilay, Eyal Shnarch, Michael E. Factor, Shila Ofek-Koifman, Paula Ta-Shma, and Assaf Toledo. 2025. An Analysis of Hyper-Parameter Optimization Methods for Retrieval Augmented Generation. arXiv:2505.03452 [cs.CL] https://arxiv....

  33. [33]

    Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, and Siliang Tang. 2024. Graph Retrieval-Augmented Generation: A Survey. arXiv:2408.08921 [cs.AI] https://arxiv.org/abs/2408.08921

  34. [34]

    Guillem Ramírez, Alexandra Birch, and Ivan Titov. 2024. Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection. arXiv:2405.02134 [cs.CL] https://arxiv.org/abs/2405.02134

  35. [35]

    Nils Reimers and Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using siamese bert-networks. InProceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). 3982–3992

  36. [36]

    2009.The probabilistic relevance frame- work: BM25 and beyond

    Stephen Robertson and Hugo Zaragoza. 2009.The probabilistic relevance frame- work: BM25 and beyond. Vol. 4. Now Publishers Inc

  37. [37]

    Cafarella

    Matthew Russo, Sivaprasad Sudhir, Gerardo Vitagliano, Chunwei Liu, Tim Kraska, Samuel Madden, and Michael J. Cafarella. 2025. Abacus: A Cost-Based Optimizer for Semantic Operator Systems.CoRRabs/2505.14661 (2025). arXiv:2505.14661 doi:10.48550/ARXIV.2505.14661

  38. [38]

    Keshav Santhanam, Omar Khattab, Christopher Potts, and Matei Zaharia. 2022. PLAID: An Efficient Engine for Late Interaction Retrieval. arXiv:2205.09707 [cs.IR] https://arxiv.org/abs/2205.09707

  39. [39]

    Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, and Matei Zaharia. 2022. ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction. arXiv:2112.01488 [cs.IR] https://arxiv.org/abs/2112.01488

  40. [40]

    Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher D. Manning. 2024. RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval. arXiv:2401.18059 [cs.CL] https://arxiv.org/abs/2401. 18059

  41. [41]

    Jan Luca Scheerer, Matei Zaharia, Christopher Potts, Gustavo Alonso, and Omar Khattab. 2025. WARP: An Efficient Engine for Multi-Vector Retrieval. InProceed- ings of the 48th International ACM SIGIR Conference on Research and Develop- ment in Information Retrieval (SIGIR ’25). ACM, 2504–2512. doi:10.1145/3726302. 3729904

  42. [42]

    Robin Schmucker, Michele Donini, Muhammad Bilal Zafar, David Salinas, and Cédric Archambeau. 2021. Multi-objective Asynchronous Successive Halving. arXiv:2106.12639 [stat.ML] https://arxiv.org/abs/2106.12639

  43. [43]

    Parameswaran, and Eugene Wu

    Shreya Shankar, Tristan Chambers, Tarak Shah, Aditya G. Parameswaran, and Eugene Wu. 2025. DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing. arXiv:2410.12189 [cs.DB] https://arxiv.org/abs/2410.12189

  44. [44]

    Shivanshu Shekhar, Tanishq Dubey, Koyel Mukherjee, Apoorv Saxena, Atharv Tyagi, and Nishanth Kotla. 2024. Towards Optimizing the Costs of LLM Usage. arXiv:2402.01742 [cs.CL] https://arxiv.org/abs/2402.01742

  45. [45]

    Aditi Singh, Abul Ehtesham, Saket Kumar, and Tala Talaei Khoei. 2025. Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG. arXiv:2501.09136 [cs.AI] https://arxiv.org/abs/2501.09136

  46. [46]

    Dimitris Stripelis, Zijian Hu, Jipeng Zhang, Zhaozhuo Xu, Alay Dilipbhai Shah, Han Jin, Yuhang Yao, Salman Avestimehr, and Chaoyang He. 2024. CAMI:Cost-Aware Multi-Indexing ACM CAIS ’26, May 26–29, 2026, San Jose, CA, USA TensorOpera Router: A Multi-Model Router for Efficient LLM Inference. arXiv:2408.12320 [cs.AI] https://arxiv.org/abs/2408.12320

  47. [47]

    Siegel, Michael Tang, Ruoxi Sun, Jinsung Yoon, Sercan O

    Hongjin Su, Howard Yen, Mengzhou Xia, Weijia Shi, Niklas Muennighoff, Han yu Wang, Haisu Liu, Quan Shi, Zachary S. Siegel, Michael Tang, Ruoxi Sun, Jinsung Yoon, Sercan O. Arik, Danqi Chen, and Tao Yu. 2025. BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval. arXiv:2407.12883 [cs.CL] https://arxiv.org/abs/2407.12883

  48. [48]

    Parameswaran

    Lindsey Linxi Wei, Shreya Shankar, Sepanta Zeighami, Yeounoh Chung, Fatma Ozcan, and Aditya G. Parameswaran. 2026. Multi-Objective Agentic Rewrites for Unstructured Data Processing. arXiv:2512.02289 [cs.DB] https://arxiv.org/abs/ 2512.02289

  49. [49]

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025. Qwen3 technical report.arXiv preprint arXiv:2505.09388(2025)

  50. [50]

    Peiru Yang, Xintian Li, Zhiyang Hu, Jiapeng Wang, Jinhua Yin, Huili Wang, Lizhi He, Shuai Yang, Shangguang Wang, Yongfeng Huang, and Tao Qi. 2025. HeteRAG: A Heterogeneous Retrieval-augmented Generation Framework with Decoupled Knowledge Representations. arXiv:2504.10529 [cs.IR] https://arxiv.org/abs/2504. 10529

  51. [51]

    Zhicheng Yang, Zhijiang Guo, Yinya Huang, Yongxin Wang, Yiwei Wang, Xiaodan Liang, and Jing Tang. 2025. Critique to Verify: Accurate and Honest Test-Time Scaling with RL-Trained Verifiers. arXiv:2509.23152 [cs.LG] https://arxiv.org/ abs/2509.23152

  52. [52]

    Haoran Ye, Jiarui Wang, Zhiguang Cao, Federico Berto, Chuanbo Hua, Haeyeon Kim, Jinkyoo Park, and Guojie Song. 2024. ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution. arXiv:2402.01145 [cs.NE] https: //arxiv.org/abs/2402.01145

  53. [53]

    Raquib Bin Yousuf, Shengzhe Xu, Mandar Sharma, Andrew Neeser, Chris La- timer, and Naren Ramakrishnan. 2026. Utilizing Metadata for Better Retrieval- Augmented Generation. arXiv:2601.11863 [cs.IR] https://arxiv.org/abs/2601.11863

  54. [54]

    Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, and Danqi Chen

  55. [55]

    arXiv:2310.07641 [cs.CL] https://arxiv.org/abs/2310.07641

    Evaluating Large Language Models at Evaluating Instruction Following. arXiv:2310.07641 [cs.CL] https://arxiv.org/abs/2310.07641

  56. [56]

    Nan Zhang, Prafulla Kumar Choubey, Alexander Fabbri, Gabriel Bernadett- Shapiro, Rui Zhang, Prasenjit Mitra, Caiming Xiong, and Chien-Sheng Wu. 2025. SiReRAG: Indexing Similar and Related Information for Multihop Reasoning. arXiv:2412.06206 [cs.CL] https://arxiv.org/abs/2412.06206

  57. [57]

    Zihao Zhang, Hui Wei, Kenan Jiang, Shijia Pan, Shu Kai, and Fei Liu

  58. [58]

    arXiv:2505.14656 [cs.AI] https://arxiv.org/abs/2505.14656

    Cost-Awareness in Tree-Search LLM Planning: A Systematic Study. arXiv:2505.14656 [cs.AI] https://arxiv.org/abs/2505.14656

  59. [59]

    Eckart Zitzler and Lothar Thiele. 2002. Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach.IEEE transactions on Evolutionary Computation3, 4 (2002), 257–271. A Appendix: Cost Accounting and Token Pricing Model To calculate the real-world dollar cost for any LLM invoca- tion—whether for generating an EDR text ...