arxiv: 2605.13221 · v1 · submitted 2026-05-13 · 💻 cs.AI · cs.LG


An Agentic AI Framework with Large Language Models and Chain-of-Thought for UAV-Assisted Logistics Scheduling with Mobile Edge Computing

Dusit Niyato, Hanwen Zhang, Malcolm Yoke Hean Low, Wei Zhang, Xin Lou


Pith reviewed 2026-05-14 20:06 UTC · model grok-4.3

classification 💻 cs.AI cs.LG

keywords UAV scheduling · mobile edge computing · agentic AI · large language models · chain-of-thought reasoning · hierarchical reinforcement learning · hybrid logistics optimization


The pith

An agentic AI framework with large language models and chain-of-thought reasoning produces consistent mathematical formulations for hybrid UAV logistics and mobile edge computing scheduling, solved via hierarchical proximal policy optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that an agentic AI system can reliably convert natural language descriptions of a coupled physical-computational scheduling problem into correct optimization models. UAVs must simultaneously collect finished products from manufacturing stations and serve computational tasks from sensors, which run locally on the device, onboard the UAV, or in the cloud via UAV offloading. Routing decisions directly shape service windows, energy use, and whether task deadlines can be met. A hierarchical reinforcement learning approach then decomposes the problem so the upper layer selects routes while the lower layer allocates resources slot by slot. Simulations indicate the resulting formulations are more consistent than manual ones and that the hierarchical PPO policy collects every product in 99.6 percent of late training episodes while meeting all deadlines. This integration matters because it automates model construction for problems where physical movement and computation are tightly interdependent.

Core claim

The agentic AI component, built from large language models, retrieval-augmented generation, and chain-of-thought reasoning, translates user input into an interpretable mathematical formulation of the hybrid scheduling problem; the hierarchical PPO algorithm then learns UAV routing decisions in its upper layer and per-slot task execution plus resource allocation in its lower layer, delivering consistent formulations together with 99.6 percent full product collection and 100 percent deadline satisfaction over the final 500 episodes while exhibiting greater stability than advantage actor-critic.

What carries the argument

Agentic AI pipeline that uses large language models with retrieval-augmented generation and chain-of-thought to generate the mathematical formulation, paired with a two-layer hierarchical PPO in which the upper layer optimizes routing and the lower layer optimizes task execution and resource allocation under deadline and energy constraints.
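
The two-layer split described above can be sketched as a nested loop: an outer step that picks the next station and an inner step that decides what to spend in each slot of the visit. This is a minimal illustration under stated assumptions, not the authors' implementation. `SlotState`, `run_episode`, and both policy callables are hypothetical names, the policies are plain functions rather than PPO-trained networks, and energy is in arbitrary units.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SlotState:
    station: int        # station the UAV is currently serving
    slot: int           # index of the time slot within this visit
    energy_left: float  # remaining UAV energy budget (arbitrary units)


# Hypothetical policy signatures; in the paper both layers are learned with PPO.
UpperPolicy = Callable[[List[int]], int]     # unvisited stations -> next station
LowerPolicy = Callable[[SlotState], float]   # slot state -> energy to spend now


def run_episode(stations: List[int], slots_per_visit: int, energy: float,
                upper: UpperPolicy, lower: LowerPolicy
                ) -> Tuple[List[int], List[Tuple[int, int, float]], float]:
    """Roll out one episode: the upper layer routes, the lower layer allocates per slot."""
    unvisited = list(stations)
    route: List[int] = []
    log: List[Tuple[int, int, float]] = []
    while unvisited and energy > 0:
        nxt = upper(unvisited)            # upper layer: routing decision
        unvisited.remove(nxt)
        route.append(nxt)
        for t in range(slots_per_visit):  # lower layer: one decision per slot
            spend = min(lower(SlotState(nxt, t, energy)), energy)
            energy -= spend
            log.append((nxt, t, spend))
    return route, log, energy


# Toy policies: visit the lowest-numbered station next, spend one unit per slot.
route, log, left = run_episode([3, 1, 2], 2, 10.0,
                               upper=lambda u: min(u),
                               lower=lambda s: 1.0)
print(route, len(log), left)
```

The point of the structure is that the lower layer only ever sees the state produced by the upper layer's choice, which is how a routing decision constrains every per-slot allocation that follows.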

If this is right

Routing decisions simultaneously determine both physical product collection success and the availability of UAV-assisted computational offloading windows.
The hierarchical decomposition allows the upper layer to focus on long-term route planning while the lower layer enforces short-term deadline and energy constraints.
The framework maintains 100 percent deadline satisfaction for computational tasks even while achieving near-perfect product collection.
Performance remains more stable across training episodes than the advantage actor-critic baseline.
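
The first point in the list, that routing fixes when offloading is available, can be made concrete with a small sketch. It assumes a single UAV leaving the depot at time 0, a fixed dwell time per station, and a task that needs one uninterrupted transmission. `service_windows`, `can_offload`, and the travel-time table are hypothetical illustrations, not the paper's model.

```python
def service_windows(route, travel_time, dwell_time):
    """Return {station: (arrival, departure)} for a UAV that leaves the depot at t = 0.

    travel_time[(a, b)] is the flight time from a to b; "depot" is the start node.
    """
    windows = {}
    t = 0.0
    prev = "depot"
    for station in route:
        t += travel_time[(prev, station)]
        windows[station] = (t, t + dwell_time)
        t += dwell_time
        prev = station
    return windows


def can_offload(window, release, deadline, tx_time):
    """True if a transmission of length tx_time fits inside both the
    service window and the task's [release, deadline] interval."""
    start = max(window[0], release)
    end = min(window[1], deadline)
    return end - start >= tx_time


# Illustrative flight times (arbitrary units).
travel = {("depot", "A"): 2, ("A", "B"): 3, ("depot", "B"): 1, ("B", "A"): 3}

a_first = service_windows(["A", "B"], travel, dwell_time=4.0)
b_first = service_windows(["B", "A"], travel, dwell_time=4.0)

# A task at station A, released at t = 0 with deadline t = 7, needing 2 units to transmit.
print(can_offload(a_first["A"], 0, 7, 2))  # True: window (2, 6) overlaps the task interval
print(can_offload(b_first["A"], 0, 7, 2))  # False: window (8, 12) opens after the deadline
```

The same task is servable under one visiting order and not the other, which is the coupling the list describes.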

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same agentic formulation step could be applied to other domains where physical routing and computational offloading must be jointly scheduled, such as autonomous vehicle fleets with onboard AI inference.
If the generated formulations prove robust across varied natural-language inputs, the approach could reduce reliance on optimization experts when setting up new hybrid logistics problems.
Real-world flight tests with actual UAV energy models and sensor task traces would reveal whether the simulated 99.6 percent collection rate holds under wind, battery degradation, and communication latency.

Load-bearing premise

The agentic AI component reliably produces correct and complete mathematical formulations from user input without introducing errors or omissions that would invalidate the subsequent optimization.

What would settle it

Run the agentic AI on a new set of user inputs that describe the same problem in different wording and check whether the generated formulations contain incorrect constraints or missing variables; separately, retrain the hierarchical PPO on identical simulation parameters and observe whether product collection falls below 95 percent or any deadline is violated in the final training episodes.
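
The second half of that test reduces to two rates over the final training episodes. The sketch below assumes each episode is logged as a hypothetical tuple `(collected, total_products, deadline_violations)`. The 500-episode tail and the 95 percent floor come from the text above; the function names and the synthetic log are illustrative.

```python
def tail_metrics(episodes, tail=500):
    """Return (full_collection_rate, deadline_clean_rate) over the last `tail` episodes.

    Each episode is a tuple (collected, total_products, deadline_violations).
    """
    last = episodes[-tail:]
    if not last:
        raise ValueError("no episodes to evaluate")
    full = sum(1 for collected, total, _ in last if collected == total)
    clean = sum(1 for _, _, violations in last if violations == 0)
    return full / len(last), clean / len(last)


def claim_survives(full_rate, clean_rate, floor=0.95):
    """The claim fails if collection drops below the floor or any deadline is missed."""
    return full_rate >= floor and clean_rate == 1.0


# Synthetic log: 100 early episodes, then 498 full collections and 2 partial ones.
log = [(9, 10, 0)] * 100 + [(10, 10, 0)] * 498 + [(8, 10, 0)] * 2
full_rate, clean_rate = tail_metrics(log)
print(full_rate, clean_rate, claim_survives(full_rate, clean_rate))
```

Read this way, 99.6 percent of 500 episodes corresponds to 498 episodes with full collection and 2 without.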

Figures

Figures reproduced from arXiv: 2605.13221 by Dusit Niyato, Hanwen Zhang, Malcolm Yoke Hean Low, Wei Zhang, Xin Lou.

**Figure 2.** An overview of the proposed agentic AI framework.

**Figure 3.** Results from the proposed agentic AI framework.

**Figure 4.** Upper-layer DRL training results with PPO: Total rewards and …

**Figure 5.** Lower-layer DRL training results with PPO and A2C: Total rewards …

Original abstract

In cloud manufacturing, unmanned aerial vehicles (UAVs) can support both product collection and mobile edge computing (MEC). This joint operation forms a hybrid scheduling problem, where physical logistics decisions are coupled with computational task scheduling. In this paper, UAVs collect finished products from manufacturing stations and transport them back to a central depot. Meanwhile, computational tasks generated by industrial sensor devices at these stations are processed locally, at UAVs, or offloaded via UAVs to the cloud. This coupling makes the problem challenging. A UAV can provide MEC services only during its service window at a station, so routing decisions directly determine when UAV-assisted offloading is available. Routing decisions also affect the UAV energy budget and the availability of onboard computing and communication resources for computational task execution under task deadline constraints. To address this, we propose an agentic-AI-assisted optimization framework with two components. First, we develop an agentic AI that combines large language models, retrieval-augmented generation, and chain-of-thought reasoning to translate user input into an interpretable mathematical formulation for the hybrid scheduling problem. Second, we design a hierarchical deep reinforcement learning approach based on proximal policy optimization (PPO), where the upper layer learns UAV routing and the lower layer optimizes per-slot task execution and resource allocation. Simulation results show that the proposed framework yields more consistent formulations, while the hierarchical PPO achieves full product collection in 99.6% of the last 500 episodes and maintains a 100% deadline satisfaction rate, with more stable performance than the advantage actor-critic approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper uses an LLM agent to write the math program for coupled UAV routing and MEC scheduling, then solves it with hierarchical PPO, but provides no check that the generated formulations are correct.


The main contribution is the specific combination of an agentic LLM pipeline (with RAG and chain-of-thought) that turns user input into a mathematical formulation, followed by a two-layer PPO where the upper layer handles UAV routes and the lower layer manages per-slot task execution and resource allocation under the resulting constraints. The coupling they target is real: routing decisions set the service windows for offloading, which directly limits energy, onboard compute, and deadline feasibility.

The simulation claims show the hierarchical agent reaching 99.6% full collection and 100% deadline satisfaction in the final episodes while outperforming advantage actor-critic on stability. That part is straightforward to understand and addresses a practical industrial niche in cloud manufacturing logistics plus edge computing.

The soft spot is exactly the one the stress test flags. The abstract reports no quantitative validation of the LLM-generated formulations: no error rates on omitted constraints, incorrect variable couplings, or incomplete objective functions across varied inputs. If the agentic step sometimes produces an incomplete or inconsistent program, the PPO is optimizing the wrong problem and the performance numbers lose their meaning. The abstract also omits baseline details, run counts, and statistical tests, so the stability advantage is hard to weigh.

This paper is for people already working on UAV-assisted hybrid scheduling or on LLM-assisted modeling pipelines. It is incremental rather than foundational, but the concrete problem setting is clear enough that a referee could usefully press for validation experiments on the formulation step. I would send it to peer review.

Referee Report

2 major / 2 minor

Summary. The paper proposes an agentic-AI framework that combines LLMs, RAG, and chain-of-thought reasoning to automatically translate user inputs into mathematical formulations of a hybrid UAV logistics and MEC scheduling problem (product collection coupled with computational task offloading under routing, energy, and deadline constraints). A hierarchical PPO solver is then applied, with an upper layer for UAV routing and a lower layer for per-slot task execution and resource allocation. Simulation results are reported to show more consistent formulations than baselines, with the hierarchical PPO achieving 99.6% full product collection and 100% deadline satisfaction over the final 500 episodes while exhibiting greater stability than advantage actor-critic.

Significance. If the agentic-AI component can be shown to produce correct and complete optimization models at scale, the work would offer a practical route to automating the modeling of tightly coupled physical-computational scheduling problems. The hierarchical PPO results demonstrate stable training behavior in simulation, which is a concrete strength. However, the absence of any quantitative validation of formulation accuracy makes the performance claims conditional on an untested assumption, limiting the immediate impact.

major comments (2)

[Abstract] The headline performance figures (99.6% full collection, 100% deadline satisfaction) are obtained by training hierarchical PPO on the mathematical program emitted by the agentic AI. No quantitative check—such as error rates on constraint sets, objective functions, variable definitions, or coupling terms between routing windows and MEC availability—is reported across varied user inputs. Without this validation, it is impossible to determine whether the DRL results optimize the intended problem or an incorrect one.
[Abstract] The claim that the framework 'yields more consistent formulations' is stated without accompanying metrics (e.g., syntactic correctness rate, constraint completeness score, or comparison against expert-generated models). This metric is central to the first component of the contribution and must be supplied with explicit evaluation protocol and test cases.
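
The metrics requested in these two comments could take many forms. One minimal sketch, assuming each formulation has already been reduced to a set of constraint labels, scores completeness against an expert reference and consistency across repeated runs. The labels, function names, and the pairwise-agreement definition of consistency are all assumptions made for illustration; the paper's own protocol is not described in the abstract.

```python
from itertools import combinations


def completeness(generated, reference):
    """Return (score, missing, extra) for one generated formulation.

    score is the fraction of reference constraint labels that were generated.
    """
    gen, ref = set(generated), set(reference)
    missing = ref - gen
    extra = gen - ref
    score = 1.0 if not ref else len(ref & gen) / len(ref)
    return score, missing, extra


def consistency(runs):
    """Fraction of run pairs that produced identical constraint sets."""
    sets = [frozenset(r) for r in runs]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 1.0
    return sum(1 for a, b in pairs if a == b) / len(pairs)


# Hypothetical constraint labels for the hybrid scheduling problem.
reference = {"visit_once", "energy_budget", "service_window", "task_deadline"}
generated = {"visit_once", "energy_budget", "task_deadline", "payload_limit"}

print(completeness(generated, reference))
print(consistency([{"a", "b"}, {"a", "b"}, {"a"}]))
```

A label-level comparison like this catches omissions but not a constraint that is present and wrong, so it would complement rather than replace review against expert-generated models.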

minor comments (2)

[Abstract] The abstract supplies no information on simulation parameters (number of stations, UAV fleet size, task arrival rates, episode length, or statistical significance testing), which are required to interpret the reported percentages.
Clarify whether the hierarchical PPO is trained on a single fixed formulation or on a distribution of LLM-generated formulations; the current wording leaves this ambiguous.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We agree that quantitative validation of the agentic AI formulations is required to support the performance claims and the consistency assertion, and we will add the necessary evaluations in the revision.


Referee: [Abstract] The headline performance figures (99.6% full collection, 100% deadline satisfaction) are obtained by training hierarchical PPO on the mathematical program emitted by the agentic AI. No quantitative check—such as error rates on constraint sets, objective functions, variable definitions, or coupling terms between routing windows and MEC availability—is reported across varied user inputs. Without this validation, it is impossible to determine whether the DRL results optimize the intended problem or an incorrect one.

Authors: We agree that the absence of quantitative validation of formulation accuracy is a limitation. The reported DRL metrics assume the agentic AI produces correct models, but this was not explicitly measured. In the revised manuscript we will add a dedicated evaluation subsection that reports error rates on constraint sets, objective functions, variable definitions, and coupling terms across a benchmark of varied user inputs, using direct comparison to expert-generated reference models. The test-case generation protocol will also be described.
revision: yes

Referee: [Abstract] The claim that the framework 'yields more consistent formulations' is stated without accompanying metrics (e.g., syntactic correctness rate, constraint completeness score, or comparison against expert-generated models). This metric is central to the first component of the contribution and must be supplied with explicit evaluation protocol and test cases.

Authors: We acknowledge that the consistency claim currently lacks supporting metrics. The manuscript presents the improvement qualitatively. We will revise the abstract to qualify the claim and add a new subsection that supplies the requested metrics (syntactic correctness rate, constraint completeness score) together with an explicit evaluation protocol and the set of test cases used for comparison against expert models.
revision: yes

Circularity Check

0 steps flagged

No circularity: performance metrics arise from independent simulation, not tautological re-expression


The paper's core claims rest on two sequential but non-circular steps: (1) an agentic LLM+RAG+CoT pipeline that emits a mathematical program from user text, and (2) hierarchical PPO trained on that program, whose reported success rates (99.6 % full collection, 100 % deadline satisfaction) are measured by forward rollout in simulation. Neither step reduces to its own inputs by construction; the formulation generator is not fitted to the downstream metrics, and the PPO policy is optimized against an externally supplied objective rather than being redefined to match observed outcomes. No self-citation is invoked as a uniqueness theorem that forces the architecture, and no ansatz or renaming of known results is presented as a derivation. The absence of quantitative validation for formulation correctness is a separate empirical gap, not a circularity in the reported chain.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

Review limited to abstract; ledger entries are inferred at high level from described components. The hierarchical decomposition and LLM reliability are treated as given.

free parameters (1)

PPO hyperparameters and layer sizes
Standard DRL tuning parameters required to achieve the reported collection and deadline rates.
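
One hyperparameter that standard PPO always carries is the clip range ε in its clipped surrogate objective. The sketch below shows where ε enters for a single sample. The abstract does not list the tuned values, so ε = 0.2 is only a commonly used default, and `ppo_clip_term` is a hypothetical helper name rather than anything from the paper.

```python
def ppo_clip_term(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio is the new-policy probability divided by the old-policy probability
    for the action taken; advantage is the estimated advantage of that action.
    """
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)


# A large ratio with positive advantage is capped at (1 + eps) * A.
print(ppo_clip_term(1.5, 2.0))
# A small ratio with negative advantage is held at (1 - eps) * A.
print(ppo_clip_term(0.5, -1.0))
```

The clipping limits how far a single update can move the policy. That bears on the stability comparison with advantage actor-critic, although the abstract does not attribute the difference to this mechanism.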

axioms (2)

domain assumption: The hybrid scheduling problem admits a clean hierarchical decomposition into routing and per-slot resource allocation without loss of optimality.
Invoked by the two-layer PPO design.
ad hoc to paper: LLM-generated formulations are sufficiently accurate to serve as optimization models.
Central to the first component of the framework.


