pith. machine review for the scientific record.

arxiv: 2605.08333 · v1 · submitted 2026-05-08 · 💻 cs.LG · cs.AI · cs.CL · cs.PF · cs.SE

Recognition: 2 Lean theorem links

CDS4RAG: Cyclic Dual-Sequential Hyperparameter Optimization for RAG

Pengzhou Chen, Tao Chen

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 00:55 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CL · cs.PF · cs.SE
keywords retrieval-augmented generation · hyperparameter optimization · cyclic optimization · retriever · generator · large language models · benchmark evaluation

The pith

Cyclic alternation between retriever and generator hyperparameters lifts RAG output quality by up to 1.54× over prior optimizers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CDS4RAG as a way to tune the many interacting hyperparameters that control both the retrieval step and the generation step in retrieval-augmented systems. Rather than treating the whole pipeline as one black box or optimizing only part of it, the method cycles the optimization focus between the two components in turn. Within each cycle it allocates evaluation budget at a fine grain and passes promising settings forward to seed the next cycle, especially for the generator. Experiments on four standard benchmarks with two backbone models show that wrapping existing algorithms in this cycle produces better final answers in 21 of 24 settings and beats current state-of-the-art tuners in every case while also converging faster. If the pattern holds, RAG systems could be made reliable across new queries and domains without exhaustive manual search or prohibitive compute.

Core claim

CDS4RAG optimizes the full set of RAG hyperparameters for a given query workload via a new cyclic dual-sequential formulation. It distinguishes the hyperparameters of the retriever from those of the generator and cyclically optimizes them in turn. This paradigm allows fine-grained within-cycle budget provision and expedites the search via cross-cycle seeding when optimizing the generator. CDS4RAG is algorithm-agnostic and can be paired with diverse general-purpose algorithms. On four common benchmarks and two backbone LLMs it boosts vanilla algorithms in 21/24 cases while significantly outperforming state-of-the-art algorithms in all cases, with up to 1.54× improvement in generation quality and faster convergence.

What carries the argument

The cyclic dual-sequential formulation that alternates optimization focus between retriever hyperparameters and generator hyperparameters while using cross-cycle seeding to accelerate later cycles.
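The mechanics are easy to sketch. Below is a minimal, hypothetical rendering of the loop: the `evaluate_rag` objective, the even budget split, and the single-config seeding rule are placeholder assumptions, not the paper's actual rules, and any black-box optimizer can stand in for `random_search`.

```python
import random

def evaluate_rag(retriever_cfg, generator_cfg):
    # Stand-in for an expensive end-to-end RAG evaluation (e.g., F1 over held-out queries).
    # Toy objective with an interaction between the two components.
    return (-0.01 * (retriever_cfg["top_k"] - 5) ** 2
            - (generator_cfg["temperature"] - 0.4) ** 2
            + random.gauss(0, 0.01))

def random_search(space, objective, budget, seeds=()):
    # Stand-in for any base optimizer (BO, TPE, HEBO, ...); returns the best (config, score).
    candidates = list(seeds) + [
        {k: random.choice(v) for k, v in space.items()} for _ in range(budget)
    ]
    return max(((cfg, objective(cfg)) for cfg in candidates), key=lambda pair: pair[1])

def cyclic_dual_sequential(ret_space, gen_space, cycles=6, budget_per_cycle=10):
    ret_best = {k: random.choice(v) for k, v in ret_space.items()}
    gen_best = {k: random.choice(v) for k, v in gen_space.items()}
    gen_seeds = []  # cross-cycle seeding: promising generator configs carried forward
    for _ in range(cycles):
        half = budget_per_cycle // 2  # placeholder for the paper's fine-grained budget provision
        # Optimize retriever hyperparameters with the generator frozen.
        ret_best, _ = random_search(ret_space, lambda c: evaluate_rag(c, gen_best), half)
        # Optimize generator hyperparameters with the retriever frozen, seeded from prior cycles.
        gen_best, _ = random_search(gen_space, lambda c: evaluate_rag(ret_best, c), half,
                                    seeds=gen_seeds)
        gen_seeds = [gen_best]
    return ret_best, gen_best

ret_space = {"top_k": [1, 3, 5, 10, 20], "chunk_size": [128, 256, 512]}
gen_space = {"temperature": [0.0, 0.2, 0.4, 0.7, 1.0], "max_tokens": [128, 256, 512]}
print(cyclic_dual_sequential(ret_space, gen_space))
```

The point of the decomposition is that each call to the base optimizer searches only one component's low-dimensional space, while the frozen partner configuration carries the cross-component interaction forward.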

If this is right

  • Wrapping common hyperparameter search algorithms with the cyclic schedule improves final answer quality in the great majority of tested settings.
  • State-of-the-art RAG tuners are outperformed on every benchmark examined both in quality and in wall-clock speed.
  • Treating the retriever and generator as separate but linked objects allows budget to be spent more precisely than black-box approaches permit.
  • Cross-cycle seeding shortens the search especially for generator parameters once retriever behavior has been explored.
  • The same framework can be dropped onto any base optimizer without changing its internal logic.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same alternating schedule could be tested on other composite pipelines such as tool-augmented agents where one stage feeds another.
  • If cycle length or seeding rules were made adaptive to observed interaction strength, the method might converge even faster on new domains.
  • Real-world deployments that tune once on a fixed query set may still need periodic re-tuning when the underlying corpus changes.
  • The reported speedups suggest that full RAG tuning could become routine rather than reserved for high-stakes applications.

Load-bearing premise

The interactions between retriever and generator hyperparameters can be effectively captured and accelerated by cyclic alternation and cross-cycle seeding without missing globally superior joint configurations.

What would settle it

Running an exhaustive joint search over a modest grid of retriever-plus-generator settings on one of the four benchmarks and checking whether any higher-scoring configuration exists outside the ones discovered by CDS4RAG.
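A hedged sketch of that check, with all names hypothetical (`evaluate_rag` stands in for the benchmark's end-to-end scorer, and the grid must stay modest for the exhaustive sweep to be affordable):

```python
from itertools import product

def joint_grid_check(ret_space, gen_space, evaluate_rag, cds4rag_best_score):
    """Score every retriever x generator combination on a modest grid and collect
    any joint configuration that beats the best score CDS4RAG discovered."""
    better = []
    for ret_vals in product(*ret_space.values()):
        ret_cfg = dict(zip(ret_space, ret_vals))
        for gen_vals in product(*gen_space.values()):
            gen_cfg = dict(zip(gen_space, gen_vals))
            score = evaluate_rag(ret_cfg, gen_cfg)
            if score > cds4rag_best_score:
                better.append((ret_cfg, gen_cfg, score))
    return better  # empty list => no higher-scoring joint config exists on this grid
```

A non-empty result would directly falsify the load-bearing premise above; an empty one would support it on that grid without proving global optimality.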

Figures

Figures reproduced from arXiv: 2605.08333 by Pengzhou Chen, Tao Chen.

Figure 1: General workflow and hyperparameters of RAG. [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2: Example correlations between the quality of retrieval and … [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 4: The function ∆ changes with cycle progress t (β = 2). [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 6: A case of CDS4RAG for RAG that completes T = 6 cycles. Notably, interactions exist between the two mechanisms: the budget provision determines the number of possible cycles, which then influences the cross-cycle seeding that impacts the convergence speed of RAG hyperparameter optimization. [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗
Figure 7: Full correlation analysis between retriever and generator … [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗
Figure 8: The unified prompt template for RAG. view at source ↗
Baseline optimizer settings (from the accompanying table):
  • HEBO — Model = GP; Acq = MACE; Opt = NSGA-II; Initial size = 5. Heteroscedastic evolutionary BO using the MACE acquisition function.
  • BO — Model = GP; Acq = EI; Initial = Max-min; Initial size = 5. Standard Bayesian optimization with a Gaussian process surrogate.
  • TPE — Initial size = 5; γ = 0.25. Uses kernel density estimation to distinguish goo…
Figure 9: Comparing CDS4RAG paired with HEBO, BO, and TPE against their vanilla versions for optimizing RAG over 10 runs. Panels plot F1-score against time (min) for Llama-3.1-8B and Qwen-3-8B on Agriculture, Biography, HotpotQA, and BioASQ. view at source ↗
Figure 10: Ablation results on CDS4RAG over 10 runs. Panels plot F1-score against time (min) for Llama-3.1-8B and Qwen-3-8B on Agriculture, Biography, HotpotQA, and BioASQ; compared variants include CDS4RAG (N=5). view at source ↗
Figure 11: Sensitivity of CDS4RAG to N over 10 runs. view at source ↗
read the original abstract

Retrieval-Augmented Generation (RAG) is sensitive to the vast hyperparameters of the retriever and generator, yet optimizing them using given queries is a challenging task due to the complex interactions and expensive evaluation costs. Existing algorithms are ineffective and slow in convergence, since they often treat RAG as a monolithic black box or only optimize partial hyperparameters. In this paper, we propose CDS4RAG, a framework that optimizes the full RAG hyperparameters using given queries via a new cyclic dual-sequential formulation. CDS4RAG is special in the sense that it distinguishes the hyperparameters of the retriever and generator, cyclically optimizing them in turn. Such a paradigm allows us to design fine-grained within-cycle budget provision and expedite the optimization via cross-cycle seeding when optimizing the generator. CDS4RAG is also an algorithm-agnostic framework that can be paired with diverse general algorithms. Through experiments on four common benchmarks and two backbone LLMs, we reveal that CDS4RAG considerably boosts the vanilla algorithms in 21/24 cases while significantly outperforming state-of-the-art algorithms in all cases with up to 1.54x improvements of generation quality and better speedup.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces CDS4RAG, a cyclic dual-sequential hyperparameter optimization framework for RAG systems. It decomposes the joint hyperparameter space into alternating retriever and generator sub-problems, with within-cycle budget allocation and cross-cycle seeding of the generator optimizer from earlier cycles. The method is algorithm-agnostic and is claimed to boost vanilla optimizers in 21/24 cases while outperforming SOTA methods with up to 1.54× gains in generation quality across four benchmarks and two LLMs.

Significance. If the empirical results hold under rigorous controls, CDS4RAG offers a practical, algorithm-agnostic way to handle expensive joint hyperparameter tuning in RAG by exploiting component separation and seeding. This could improve generation quality and convergence speed in retrieval-augmented systems, with potential for integration into existing black-box optimizers.

major comments (3)
  1. [§3] The cyclic alternation with cross-cycle seeding assumes that sequential sub-problem optimization plus seeding reaches high-quality joint configurations, but the manuscript provides no comparison against a joint-search baseline (even on a reduced grid) and no analysis showing that strong retriever-generator interactions do not cause the method to miss globally superior pairs.
  2. [§4, Experiments] The central claims of 'considerably boosts the vanilla algorithms in 21/24 cases' and 'significantly outperforming state-of-the-art algorithms in all cases' rest on results that report no statistical significance tests, no details on whether baselines received identical evaluation budgets, and no controls for post-hoc selection, undermining the reliability of the performance assertions.
  3. [Ablation studies] No ablation that disables cross-cycle seeding is reported, leaving open whether the reported gains stem from the cyclic structure itself or simply from additional optimization steps, and whether seeding systematically excludes better joint optima.
minor comments (2)
  1. [Abstract] The abstract and results sections could more explicitly state the evaluation metric (e.g., exact generation quality measure) behind the '1.54x' figure and the precise definition of the 24 cases.
  2. [§2] Notation distinguishing retriever versus generator hyperparameter spaces in §2 and §3 would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which highlight important aspects for strengthening the rigor of our work. We agree with the recommendation for major revision and will incorporate the suggested additions and clarifications in the revised manuscript. Below we respond point by point to the major comments.

read point-by-point responses
  1. Referee: [§3] The cyclic alternation with cross-cycle seeding assumes that sequential sub-problem optimization plus seeding reaches high-quality joint configurations, but the manuscript provides no comparison against a joint-search baseline (even on a reduced grid) and no analysis showing that strong retriever-generator interactions do not cause the method to miss globally superior pairs.

    Authors: We acknowledge that a direct comparison to joint optimization would provide stronger validation of the cyclic dual-sequential approach. Joint search over the full hyperparameter space is indeed computationally prohibitive for RAG systems, which motivates the decomposition in CDS4RAG. To address this, we will add a comparison against a joint-search baseline on a reduced grid in the revised Section 3 and experiments. We will also include an analysis of retriever-generator interactions (e.g., via correlation metrics or sensitivity plots) to demonstrate that the method does not systematically miss superior joint configurations. These additions will clarify the assumptions and empirical support. revision: yes

  2. Referee: [§4, Experiments] The central claims of 'considerably boosts the vanilla algorithms in 21/24 cases' and 'significantly outperforming state-of-the-art algorithms in all cases' rest on results that report no statistical significance tests, no details on whether baselines received identical evaluation budgets, and no controls for post-hoc selection, undermining the reliability of the performance assertions.

    Authors: We agree that statistical tests and explicit controls are essential for reliable claims. In the revised manuscript, we will add statistical significance tests (e.g., paired t-tests with p-values) for the reported improvements in Section 4. We will explicitly confirm that all methods, including baselines and SOTA, received identical evaluation budgets (the same number of function evaluations per run). We will also detail the experimental protocol to show no post-hoc selection was applied, including fixed random seeds and full reporting of all runs; a sketch of such a test follows these responses. These changes will be incorporated to support the claims of boosting vanilla algorithms in 21/24 cases and outperforming SOTA. revision: yes

  3. Referee: [Ablation studies] No ablation that disables cross-cycle seeding is reported, leaving open whether the reported gains stem from the cyclic structure itself or simply from additional optimization steps, and whether seeding systematically excludes better joint optima.

    Authors: We concur that an ablation disabling cross-cycle seeding is needed to isolate its contribution. We will add this ablation study in the revised version, comparing full CDS4RAG against a no-seeding variant (while keeping the cyclic structure and within-cycle budget allocation). This will quantify whether gains arise from the cyclic formulation and seeding versus extra steps, and will include checks (e.g., best-found configurations) to address whether seeding excludes superior joint optima. Results will be reported in the ablation studies subsection with the same benchmarks and LLMs. revision: yes
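On the statistics point in response 2, a minimal sketch of the promised test, assuming matched per-run final F1 scores are available for both methods (the numbers below are hypothetical placeholders, not results from the paper):

```python
from scipy import stats

# Hypothetical final F1 scores over 10 matched runs (same seeds, identical evaluation budgets).
cds4rag_f1 = [0.38, 0.37, 0.39, 0.36, 0.38, 0.40, 0.37, 0.38, 0.39, 0.37]
vanilla_f1 = [0.34, 0.35, 0.33, 0.36, 0.34, 0.33, 0.35, 0.34, 0.33, 0.35]

# Paired t-test across matched runs; p < 0.05 would support the "significantly outperforms" claim.
t_stat, p_value = stats.ttest_rel(cds4rag_f1, vanilla_f1)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```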

Circularity Check

0 steps flagged

No circularity: algorithmic framework with direct empirical validation

full rationale

The paper introduces CDS4RAG as a new cyclic dual-sequential optimization framework that decomposes RAG hyperparameter search into alternating retriever and generator subproblems, with within-cycle budget allocation and cross-cycle seeding. This is presented as an algorithm-agnostic design choice that can be paired with existing black-box optimizers. The central claims are empirical improvements (21/24 boosts over vanilla, up to 1.54x over SOTA) measured on four benchmarks and two LLMs. No equations, predictions, or first-principles derivations appear that reduce any result to a fitted quantity or self-referential definition by construction. No load-bearing self-citations or uniqueness theorems are invoked. The derivation chain is the algorithm description itself, which is independent of the reported performance numbers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that retriever and generator hyperparameters interact in a way that benefits from cyclic rather than joint optimization; no free parameters or invented entities are named in the abstract.

axioms (1)
  • domain assumption RAG performance can be improved by separately cycling optimization of retriever versus generator hyperparameters with cross-cycle seeding
    This is the core premise that enables the claimed speedup and quality gains.

pith-pipeline@v0.9.0 · 5512 in / 1212 out tokens · 41160 ms · 2026-05-12T00:55:39.826806+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.
