pith. machine review for the scientific record.

arxiv: 2604.26981 · v1 · submitted 2026-04-28 · 💻 cs.IR · cs.LG

Recognition: unknown

Budget-Constrained Online Retrieval-Augmented Generation: The Chunk-as-a-Service Model

Ala Al-Fuqaha, Ammar Gharaibeh, Junaid Qadir, Mohamed Abdallah, Mohamed Rahouti, Mohammad Ruhul Amin, Shawqi Al-Maliki

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 15:08 UTC · model grok-4.3

classification 💻 cs.IR cs.LG
keywords Retrieval-Augmented Generation · Chunk-as-a-Service · Online Selection · Budget Constraints · Utility-Cost Tradeoff · RAG-as-a-Service · Prompt Enrichment · Vector Database

The pith

Chunk-as-a-Service charges only for the relevant chunks actually retrieved to enrich specific prompts, selected online under budget limits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Chunk-as-a-Service as a replacement for RAG-as-a-Service, which currently bills per prompt without regard to whether a relevant chunk is found or used. Two variants, Open-Budget CaaS and Limited-Budget CaaS, are enabled by the Utility-Cost Online Selection Algorithm that makes real-time decisions on which prompts to enrich. The algorithm balances estimated utility against retrieval cost while respecting a spending limit and without seeing future prompts. Experiments show the online method improves the combined metric of enriched prompts times average relevance by 52 percent over random selection and reaches 75 percent of fully offline performance. Readers would care because existing pricing hides whether any value is delivered and forces payment for useless queries.

Core claim

By treating retrieval-augmented generation as a Chunk-as-a-Service model and applying the Utility-Cost Online Selection Algorithm for budget-constrained online decisions, a subset of prompts can be enriched to maximize the product of the number of enriched prompts and their average relevance while achieving higher performance-to-budget ratios than per-prompt billing.

What carries the argument

The Utility-Cost Online Selection Algorithm (UCOSA), which estimates utility and cost for each prompt in real time and selects a subset to enrich under budget constraints.
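To make that mechanism concrete, the following is a minimal sketch of the kind of online utility-cost threshold rule described above. It is an editorial illustration, not the paper's actual UCOSA: the scoring functions, threshold value, and data structures are assumptions, since the review does not reproduce the algorithm itself.

```python
# Hypothetical sketch of a budget-constrained online selection rule in the
# spirit of UCOSA; the paper's actual scoring and threshold schedule are not
# specified in this review.
from dataclasses import dataclass

@dataclass
class Prompt:
    text: str
    utility: float   # estimated value of enriching this prompt
    cost: float      # retrieval cost of the best matching chunks

def select_online(prompts, budget, min_ratio=1.0):
    """Decide prompt-by-prompt, without seeing future prompts,
    whether to spend part of the budget on enrichment."""
    spent, enriched = 0.0, []
    for p in prompts:                      # prompts arrive as a stream
        if p.cost <= 0:
            continue
        ratio = p.utility / p.cost         # utility-cost tradeoff
        if ratio >= min_ratio and spent + p.cost <= budget:
            enriched.append(p)             # enrich this prompt
            spent += p.cost
    return enriched, spent

# Toy usage: only prompts whose utility justifies their cost are enriched.
demo = [Prompt("q1", 0.9, 0.5), Prompt("q2", 0.3, 0.4), Prompt("q3", 0.7, 0.3)]
print(select_online(demo, budget=1.0))
```

The design choice to work with a fixed utility-to-cost threshold is the simplest way to make an irrevocable decision per prompt; the paper may well use a more sophisticated policy.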

If this is right

  • Limited-Budget CaaS and Open-Budget CaaS achieve performance-to-budget ratios of 140 percent and 86 percent compared with RaaS.
  • UCOSA improves the NEP × AR metric (sketched in the snippet after this list) by about 52 percent relative to random selection.
  • UCOSA reaches around 75 percent of the NEP × AR performance of offline selection methods.
  • CaaS ties charges directly to the relevance and usage of individual chunks rather than every submitted prompt.
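For readers skimming the numbers, here is a small sketch of the two headline metrics under assumed definitions: NEP × AR over the set of enriched prompts, and a performance-to-budget ratio. The exact normalization used in the paper is not given in this review (the referee flags the same gap), so these formulas are illustrative.

```python
# Illustrative metric definitions; the paper's exact normalization is an
# assumption here.

def nep_times_ar(relevances):
    """Number of Enriched Prompts (NEP) multiplied by their Average Relevance (AR)."""
    if not relevances:
        return 0.0
    nep = len(relevances)
    ar = sum(relevances) / nep
    return nep * ar  # algebraically, this is just the sum of relevances

def performance_to_budget(nep_ar, budget_spent):
    """Performance per unit of budget actually spent."""
    return nep_ar / budget_spent if budget_spent > 0 else 0.0

# Example: 3 enriched prompts with relevances 0.9, 0.7, 0.8, costing 1.2 units total.
print(performance_to_budget(nep_times_ar([0.9, 0.7, 0.8]), 1.2))
```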

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same online utility-cost framing could apply to other streaming services that allocate limited resources to user requests whose value is revealed only after partial processing.
  • Providers adopting CaaS would have a direct incentive to improve chunk quality because payment depends on measured relevance.
  • Real-world deployment could test whether the assumed real-time estimates remain stable when user prompts arrive from diverse domains.

Load-bearing premise

Prompt utility and chunk relevance can be estimated accurately enough in real time to guide good selection decisions without any knowledge of future prompts or complete context.
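One common way to obtain such real-time estimates, consistent with the embedding and cosine-similarity tooling the paper's figures and references point to, is to score a prompt by the similarity of its embedding to the best-matching indexed chunks. The sketch below is an assumption about how that estimator could look; the paper's actual estimator is not described in this review, and the random vectors stand in for real embeddings.

```python
# Hedged sketch of per-prompt, real-time relevance estimation via cosine
# similarity between a prompt embedding and pre-indexed chunk embeddings.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def estimate_relevance(prompt_vec, chunk_vecs, top_k=3):
    """Score each indexed chunk against the prompt; the mean of the top-k
    similarities serves as a cheap online utility estimate."""
    sims = sorted((cosine(prompt_vec, c) for c in chunk_vecs), reverse=True)
    top = sims[:top_k]
    return sum(top) / len(top) if top else 0.0

# Toy usage with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
prompt_vec = rng.normal(size=384)
chunk_vecs = [rng.normal(size=384) for _ in range(100)]
print(estimate_relevance(prompt_vec, chunk_vecs))
```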

What would settle it

A logged sequence of prompts with known ground-truth relevance scores and utilities where running the online UCOSA selection produces a lower total NEP times AR value than the offline optimum computed after seeing the entire sequence.
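That settling experiment can be expressed as a small harness: run an online selection rule over a logged sequence, compute the offline optimum with full knowledge of the sequence (a knapsack-style search over subsets), and compare NEP × AR. The sketch below is self-contained and uses invented utilities, costs, and a threshold; it illustrates the comparison, not the paper's actual experiment.

```python
# Self-contained sketch of the settling experiment: online utility-cost rule
# vs. the offline optimum on a logged prompt sequence. All numbers are
# illustrative assumptions.
from itertools import combinations

prompts = [  # (relevance gained if enriched, retrieval cost)
    (0.9, 0.5), (0.4, 0.4), (0.8, 0.6), (0.3, 0.2), (0.7, 0.3), (0.6, 0.5),
]
BUDGET = 1.2

def nep_ar(selection):
    # NEP x AR reduces to the sum of relevances of the selected prompts.
    return sum(r for r, _ in selection)

def online(prompts, budget, min_ratio=1.5):
    spent, chosen = 0.0, []
    for r, c in prompts:                      # stream order, no look-ahead
        if r / c >= min_ratio and spent + c <= budget:
            chosen.append((r, c)); spent += c
    return chosen

def offline_optimum(prompts, budget):
    best = []
    for k in range(len(prompts) + 1):         # brute force; fine for small logs
        for subset in combinations(prompts, k):
            if sum(c for _, c in subset) <= budget and nep_ar(subset) > nep_ar(best):
                best = list(subset)
    return best

on, off = online(prompts, BUDGET), offline_optimum(prompts, BUDGET)
print("online NEPxAR:", nep_ar(on), " offline NEPxAR:", nep_ar(off),
      " ratio:", round(nep_ar(on) / nep_ar(off), 2))
```

On this toy log the online rule recovers most but not all of the offline optimum, which is the shape of result the paper reports at larger scale.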

Figures

Figures reproduced from arXiv: 2604.26981 by Ala Al-Fuqaha, Ammar Gharaibeh, Junaid Qadir, Mohamed Abdallah, Mohamed Rahouti, Mohammad Ruhul Amin, Shawqi Al-Maliki.

Figure 1: Standard pipeline of RAG System. It demonstrates an abstract view of the RAG pipeline, highlighting how a plain … view at source ↗
Figure 2: A high-level illustration highlighting the position of … view at source ↗
Figure 3: A toy example illustrating how various RAG variants, including Standard RaaS and our proposed OB-CaaS and LB … view at source ↗
Figure 4: The pipeline of the OB-CaaS. Unlike standard RaaS where the indexed chunks are not incorporated with a price reflecting the value of their data sources, the indexed chunks in the proposed OB-CaaS have a negotiation-based price set during the price agreement stage between the data source owner and the service provider (detailed in Section II-B). This incorporation of the prices into the chunks enables quant… view at source ↗
Figure 5: The pipeline of the proposed LB-CaaS. A cost-aware variant that is more cost-effective than the OB-CaaS, allowing accessibility for end users with limited budgets. Utilizing UCOSA, a proposed online selection algorithm, a subset of plain prompts is selected optimally on the fly for the enrichment process, enabling cost-efficient generation of grounded output. compose an enriched prompt and pass it to the L… view at source ↗
Figure 6: Performance of UCOSA compared to the offline (best case) and relevance-greedy selection algorithms considering … view at source ↗
Figure 7: UCOSA's performance improves with increased budget. view at source ↗
Figure 8: The performance, used budget, and the performance-to-budget ratio of LB-CaaS compared to the OB-CaaS and RaaS. view at source ↗
read the original abstract

Large Language Models (LLMs) have revolutionized the field of natural language processing. However, they exhibit some limitations, including a lack of reliability and transparency: they may hallucinate and fail to provide sources that support the generated output. Retrieval-Augmented Generation (RAG) was introduced to address such limitations in LLMs. One popular implementation, RAG-as-a-Service (RaaS), has shortcomings that hinder its adoption and accessibility. For instance, RaaS pricing is based on the number of submitted prompts, without considering whether the prompts are enriched by relevant chunks, i.e., text segments retrieved from a vector database, or the quality of the utilized chunks (i.e., their degree of relevance). This results in an opaque and less cost-effective payment model. We propose Chunk-as-a-Service (CaaS) as a transparent and cost-effective alternative. CaaS includes two variants: Open-Budget CaaS (OB-CaaS) and Limited-Budget CaaS (LB-CaaS), which is enabled by our ``Utility-Cost Online Selection Algorithm (UCOSA)''. UCOSA further extends the cost-effectiveness and the accessibility of the OB-CaaS variant by enriching, in an online manner, a subset of the submitted prompts based on budget constraints and utility-cost tradeoff. Our experiments demonstrate the efficacy of the proposed UCOSA compared to both offline and relevance-greedy selection baselines. In terms of the performance metric-the number of enriched prompts (NEP) multiplied by the Average Relevance (AR)-UCOSA outperforms random selection by approximately 52% and achieves around 75% of the performance of offline selection methods. Additionally, in terms of budget utilization, LB-CaaS and OB-CaaS achieve higher performance-to-budget ratios of 140% and 86%, respectively, compared to RaaS, indicating their superior efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Chunk-as-a-Service (CaaS) as a transparent, utility-based alternative to RAG-as-a-Service (RaaS), introducing Open-Budget CaaS (OB-CaaS) and Limited-Budget CaaS (LB-CaaS) variants. The latter is enabled by the Utility-Cost Online Selection Algorithm (UCOSA), which performs online prompt enrichment under budget constraints by trading off estimated prompt utility against chunk retrieval cost. Experiments claim UCOSA improves the NEP × AR metric by approximately 52% over random selection and reaches ~75% of offline selection performance, while CaaS variants show superior performance-to-budget ratios (140% and 86%) relative to RaaS.

Significance. If the empirical claims are substantiated with reproducible details, the work could meaningfully advance practical RAG deployment by shifting from per-prompt pricing to per-enrichment utility pricing. The online UCOSA formulation directly addresses streaming query scenarios with hard budget limits, and the performance-to-budget ratio metric highlights efficiency gains over volume-based models.

major comments (2)
  1. [Experimental Evaluation] Experimental Evaluation section: The central performance claims (UCOSA outperforming random by ~52% and reaching ~75% of offline on NEP×AR; 140%/86% budget ratios) are reported without any description of the datasets, query volume, vector database characteristics, how prompt utility and chunk relevance scores are computed in real time, the precise implementation of the random/relevance-greedy/offline baselines, or statistical tests for the reported deltas. These omissions make the claims unverifiable and prevent assessment of whether the online policy's advantage holds under realistic estimation noise.
  2. [UCOSA algorithm] UCOSA algorithm description (likely §3): The online selection policy rests on the assumption that prompt utility and chunk relevance can be estimated accurately from only the current prompt and available chunks, without future context. No sensitivity analysis, noise model, or ablation on estimation error is provided; if these estimates are biased or low-accuracy (common in evolving RAG streams), the reported gains over relevance-greedy and random baselines could collapse.
minor comments (2)
  1. [Abstract] Abstract: The performance-to-budget ratios are stated as '140% and 86%' without clarifying the exact baseline value or normalization; this should be defined explicitly (e.g., relative to RaaS NEP×AR per unit budget).
  2. [Evaluation Metrics] Notation: The combined metric NEP × AR is introduced without discussion of whether it correlates with downstream generation quality or user utility; a brief justification or correlation study would strengthen the evaluation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We appreciate the opportunity to clarify and strengthen the presentation of our work on Chunk-as-a-Service and the UCOSA algorithm. Below, we provide point-by-point responses to the major comments and indicate the revisions we will make to the manuscript.

read point-by-point responses
  1. Referee: [Experimental Evaluation] Experimental Evaluation section: The central performance claims (UCOSA outperforming random by ~52% and reaching ~75% of offline on NEP×AR; 140%/86% budget ratios) are reported without any description of the datasets, query volume, vector database characteristics, how prompt utility and chunk relevance scores are computed in real time, the precise implementation of the random/relevance-greedy/offline baselines, or statistical tests for the reported deltas. These omissions make the claims unverifiable and prevent assessment of whether the online policy's advantage holds under realistic estimation noise.

    Authors: We agree with the referee that the Experimental Evaluation section requires more detailed information to ensure reproducibility and verifiability of the results. In the revised manuscript, we will expand this section to include: (1) descriptions of the datasets used, including their sources and characteristics; (2) details on query volumes and vector database setups; (3) explanations of how prompt utility and chunk relevance scores are computed in real time, including any models or heuristics employed; (4) precise descriptions of the baseline implementations (random, relevance-greedy, and offline); and (5) statistical tests (e.g., t-tests or confidence intervals) to support the reported performance deltas. These additions will allow readers to assess the robustness of UCOSA under realistic conditions. revision: yes

  2. Referee: [UCOSA algorithm] UCOSA algorithm description (likely §3): The online selection policy rests on the assumption that prompt utility and chunk relevance can be estimated accurately from only the current prompt and available chunks, without future context. No sensitivity analysis, noise model, or ablation on estimation error is provided; if these estimates are biased or low-accuracy (common in evolving RAG streams), the reported gains over relevance-greedy and random baselines could collapse.

    Authors: The referee correctly identifies that UCOSA relies on online estimates of utility and relevance based on the current prompt and chunks. We acknowledge that the original submission did not include a sensitivity analysis or ablation on estimation errors. To address this, we will add a new subsection in the revised manuscript that includes: (i) a noise model for estimation errors, (ii) sensitivity analysis varying the accuracy of utility and relevance estimates, and (iii) ablations showing how performance degrades under increasing noise levels. This will demonstrate the conditions under which the reported gains hold and provide insights into robustness in realistic RAG scenarios with potential estimation biases. revision: yes
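The sensitivity analysis promised in the rebuttal could take roughly the following form: perturb the relevance estimates fed to the online rule with zero-mean Gaussian noise of increasing magnitude and track how NEP × AR degrades. This is an editorial sketch only; the noise model, threshold, and synthetic data are assumptions and do not come from the paper.

```python
# Illustrative noise-sensitivity sketch (not from the paper): degrade the
# relevance estimates used by an online utility-cost rule and track NEP x AR.
import random

random.seed(0)
prompts = [(random.uniform(0.2, 1.0), random.uniform(0.1, 0.6)) for _ in range(200)]
BUDGET = 15.0

def run_online(noise_sd):
    spent, achieved = 0.0, 0.0
    for true_rel, cost in prompts:
        est = true_rel + random.gauss(0.0, noise_sd)   # noisy online estimate
        if est / cost >= 2.0 and spent + cost <= BUDGET:
            spent += cost
            achieved += true_rel                       # realized relevance, not the estimate
    return achieved                                     # equals NEP x AR for this selection

for sd in (0.0, 0.1, 0.2, 0.4):
    print(f"noise sd={sd:.1f}  NEPxAR={run_online(sd):.2f}")
```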

Circularity Check

0 steps flagged

No circularity: empirical validation against external baselines with no self-referential reductions.

full rationale

The paper defines the UCOSA algorithm and CaaS variants explicitly, then reports experimental comparisons of NEP×AR and performance-to-budget ratios against random, relevance-greedy, and offline baselines. No equations are presented that reduce the reported gains to fitted parameters or prior self-citations by construction. The central claims rest on direct measurement of the proposed online selection policy versus independent baselines, making the derivation self-contained rather than tautological. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; UCOSA decisions likely rely on implicit heuristics for utility estimation and relevance scoring that function as unstated parameters.

pith-pipeline@v0.9.0 · 5676 in / 1270 out tokens · 81739 ms · 2026-05-07T15:08:46.276846+00:00 · methodology

discussion (0)


    and a computer instructor at New Horizon Institute (2006-2008), both in Saudi Arabia. Dr. Ammar Gharaibehis an Associate Professor with the Computer Engineering department of Ger- man Jordanian University, and currently serving as the Acting Dean of the School of Computing at GJU. He received his Ph.D. degree in Computer Engineering from New Jersey Instit...