pith. machine review for the scientific record. sign in

arxiv: 2605.02162 · v1 · submitted 2026-05-04 · 💻 cs.DC · cs.MA

Recognition: unknown

AAFLOW: Scalable Patterns for Agentic AI Workflows

Authors on Pith no claims yet

Pith reviewed 2026-05-08 17:21 UTC · model grok-4.3

classification 💻 cs.DC cs.MA
keywords agentic workflowsdistributed runtimezero-copy data planeoperator abstractionscalabilitydata flow optimizationLLM systems
0
0 comments X

The pith

AAFLOW models agentic workflows as operators in a distributed runtime to deliver up to 4.64 times pipeline speedup through zero-copy data flows.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to fix scalability and reproducibility problems in agentic AI systems that combine retrieval, reasoning, and memory by replacing fragmented data orchestration with a formal execution model drawn from high-performance computing. It demonstrates that representing workflows as operators, paired with a zero-copy data plane, removes serialization overhead and improves batching and communication without any change to LLM inference speed. This approach would let large-scale agent deployments run more reliably and efficiently on distributed hardware by focusing gains on data movement rather than model computation.

Core claim

AAFLOW is a unified distributed runtime that creates communication-efficient execution plans by modeling agentic workflows as an operator abstraction. Using Apache Arrow and Cylon, it builds a zero-copy data plane for direct interoperability between preprocessing, embedding, and vector retrieval. Resource-deterministic scheduling and asynchronous batching reduce coordination costs, yielding up to 4.64 times pipeline speedup and 2.8 times gains in embedding and upsert phases while LLM generation throughput stays comparable; the gains come from better data flow, batching, and communication efficiency rather than faster inference.

What carries the argument

Operator abstraction of agentic workflows, realized through a zero-copy data plane built on Apache Arrow and Cylon plus resource-deterministic scheduling and asynchronous batching.

If this is right

  • Agentic workflows acquire a scalable and reproducible execution model that follows high-performance computing principles.
  • Data-intensive phases such as embedding and vector upsert improve by factors of 2.8 without any change to LLM generation speed.
  • Preprocessing, embedding, and retrieval stages interoperate directly without serialization costs.
  • Coordination overhead drops through deterministic resource scheduling and asynchronous batching in distributed settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same operator-plus-zero-copy pattern could be applied to other data-heavy AI pipelines such as fine-tuning or evaluation loops.
  • Standardizing on a common data plane might reduce fragmentation across different agent frameworks over time.
  • Testing the abstraction on workflows with highly variable or conditional branching would reveal how far the model generalizes.

Load-bearing premise

That agentic workflows can be effectively modeled as an operator abstraction without losing flexibility or that the experimental benchmarks represent typical real-world usage patterns.

What would settle it

A benchmark set of complex agentic workflows that cannot be expressed as operators without major loss of behavior, or real-world runs that show no measurable speedup once data patterns differ from the tested cases.

Figures

Figures reproduced from arXiv: 2605.02162 by Arup Kumar Sarker, Aymen Alsaadi, Geoffrey Fox, Gregor von Laszewski, Mills Staylor, Shantenu Jha.

Figure 1
Figure 1. Figure 1: End-to-End code flow of AAFLOW with memory operation. The RAG view at source ↗
Figure 2
Figure 2. Figure 2: Cylon Layered Architecture. From the bottom-up view, the Hardware view at source ↗
Figure 3
Figure 3. Figure 3: AAFLOW design incorporating a multilayered architecture with agentic view at source ↗
Figure 4
Figure 4. Figure 4: Advanced Async with Parallel Pipeline of AAFLOW. With batching view at source ↗
Figure 5
Figure 5. Figure 5: Framework benchmarking in each vital stage in the RAG pipeline view at source ↗
Figure 6
Figure 6. Figure 6: Strong scaling behavior across configurations for Load, Transform, view at source ↗
Figure 7
Figure 7. Figure 7: Weak scaling behavior across configurations for Load, Transform, view at source ↗
Figure 8
Figure 8. Figure 8: Strong and Weak scaling behavior across configurations for Total view at source ↗
read the original abstract

Agentic workflows in large language model systems integrate retrieval, reasoning, and memory, but existing frameworks suffer from scalability and reproducibility limitations due to fragmented data orchestration, serialization overhead, and non-deterministic execution. Although these frameworks increase flexibility, they don't have a formal execution model that adheres to the principles of high-performance computing. We introduce AAFLOW, a unified distributed runtime that creates communication-efficient execution plans by modeling agentic workflows as an operator abstraction. Using Apache Arrow and Cylon, AAFLOW creates a zero-copy data plane that allows direct interoperability between preprocessing, embedding, and vector retrieval without the need for serialization overhead. To lower coordination costs, it uses resource-deterministic scheduling and asynchronous batching. While retaining comparable LLM generation throughput, experimental results demonstrate up to 4.64 times pipeline speedup and 2.8 times gains in embedding and upsert phases. Rather than LLM inference acceleration, these advantages result from enhanced data flow, batching, and communication efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces AAFLOW, a unified distributed runtime for agentic AI workflows in LLM systems. It models these workflows as an operator abstraction to generate communication-efficient execution plans, leveraging Apache Arrow and Cylon for a zero-copy data plane that enables direct interoperability between preprocessing, embedding, and vector retrieval. Resource-deterministic scheduling and asynchronous batching are used to reduce coordination costs. The abstract claims that this yields up to 4.64 times pipeline speedup and 2.8 times gains in embedding and upsert phases while preserving LLM generation throughput, with advantages attributed solely to data flow, batching, and communication efficiency rather than inference acceleration.

Significance. If the performance claims and the operator abstraction hold under scrutiny, AAFLOW could represent a meaningful step toward scalable, reproducible agentic workflows by importing HPC-style execution models into LLM orchestration frameworks. The emphasis on zero-copy data planes and deterministic scheduling addresses real pain points in fragmented data handling. However, the absence of any experimental methodology, baselines, or analysis in the manuscript prevents a positive assessment of significance at present.

major comments (2)
  1. [Abstract] Abstract: The abstract asserts specific quantitative speedups (up to 4.64x pipeline and 2.8x in embedding/upsert phases) but supplies no experimental methodology, baselines, datasets, hardware configuration, error bars, or statistical details, rendering it impossible to evaluate whether the data support the central performance claims.
  2. [Abstract] Abstract: The operator abstraction is asserted to model agentic workflows without loss of flexibility, yet no description is provided of how dynamic, non-deterministic control flow (conditional branching, loops, and tool selection driven by LLM outputs) is expressed as operators or kept inside the zero-copy plane; if fallback mechanisms or additional coordination layers are required, the reported speedups would not generalize beyond static pipelines.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript introducing AAFLOW. The feedback highlights important areas for improvement in presenting our experimental claims and clarifying the operator abstraction's handling of dynamic workflows. We will revise the manuscript to incorporate detailed methodology and expanded descriptions, as outlined in our point-by-point responses below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The abstract asserts specific quantitative speedups (up to 4.64x pipeline and 2.8x in embedding/upsert phases) but supplies no experimental methodology, baselines, datasets, hardware configuration, error bars, or statistical details, rendering it impossible to evaluate whether the data support the central performance claims.

    Authors: We agree that the manuscript as submitted does not include sufficient detail on the experimental methodology to support the quantitative claims in the abstract. The reported speedups derive from our evaluations of data flow, batching, and communication efficiency using Apache Arrow and Cylon, but to enable proper scrutiny we will add a dedicated Experiments section in the revision. This section will specify the baselines (e.g., comparisons against LangChain, LlamaIndex, and AutoGen), datasets and workloads used, hardware configurations, number of runs with error bars, and statistical analysis. The performance gains are isolated to the data plane and scheduling components rather than inference acceleration. revision: yes

  2. Referee: [Abstract] Abstract: The operator abstraction is asserted to model agentic workflows without loss of flexibility, yet no description is provided of how dynamic, non-deterministic control flow (conditional branching, loops, and tool selection driven by LLM outputs) is expressed as operators or kept inside the zero-copy plane; if fallback mechanisms or additional coordination layers are required, the reported speedups would not generalize beyond static pipelines.

    Authors: The operator model in AAFLOW expresses dynamic control flow through composable control operators (e.g., conditional and loop operators) that receive runtime decisions from LLM outputs and schedule subsequent data operators asynchronously. These control signals are passed within the zero-copy Arrow data plane alongside batched data to avoid serialization. We acknowledge that extremely frequent or complex branching may introduce limited coordination overhead outside the pure data plane. In the revised manuscript we will add a dedicated subsection with pseudocode examples illustrating how non-deterministic flows are modeled and will explicitly discuss the scope of the reported speedups (primarily for data-intensive segments) along with any limitations for highly dynamic cases. revision: partial

Circularity Check

0 steps flagged

No circularity: purely experimental claims with no derivations or self-referential reductions

full rationale

The manuscript introduces AAFLOW as an operator-based runtime for agentic workflows and reports empirical speedups (4.64x pipeline, 2.8x embedding/upsert) from zero-copy data planes, batching, and scheduling. No equations, fitted parameters, uniqueness theorems, or ansatzes appear; the central claims rest on benchmark measurements rather than any derivation that reduces to its own inputs by construction. Self-citations are absent from the load-bearing sections, and the operator abstraction is presented as a design choice whose validity is tested externally via experiments, not presupposed. The work is therefore self-contained against external benchmarks with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Based on abstract; the central claim relies on the assumption that workflows fit the operator model and that experimental gains are due to data flow improvements.

axioms (1)
  • domain assumption Agentic workflows integrate retrieval, reasoning, and memory and can be modeled as operator abstractions for execution planning.
    The paper states this as the foundation for creating communication-efficient execution plans.
invented entities (1)
  • AAFLOW no independent evidence
    purpose: A unified distributed runtime for agentic AI workflows with zero-copy data plane.
    Introduced as a new system in the paper; no external validation mentioned.

pith-pipeline@v0.9.0 · 5485 in / 1304 out tokens · 56177 ms · 2026-05-08T17:21:46.743789+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

66 extracted references · 31 canonical work pages

  1. [1]

    Rethinking testing for llm applications: Characteristics, challenges, and a lightweight interaction protocol,

    W. Ma, Y . Yang, Q. Hu, S. Ying, Z. Jin, B. Du, Z. Xing, T. Li, J. Shi, Y . Liuet al., “Rethinking testing for llm applications: Characteristics, challenges, and a lightweight interaction protocol,” 2025. [Online]. Available: https://arxiv.org/abs/2508.20737

  2. [2]

    What is llamaindex ?

    E. R. Vanna Winland, “What is llamaindex ?” IBM, Tech. Rep., April

  3. [3]

    Available: https://www .ibm.com/think/topics/llamaindex

    [Online]. Available: https://www .ibm.com/think/topics/llamaindex

  4. [4]

    Optimize vector databases, enhance rag-driven generative ai,

    M. B. Cathy Zhang, “Optimize vector databases, enhance rag-driven generative ai,” Intel, Tech. Rep., March 2024. [Online]. Available: https://medium .com/intel-tech/optimize-vector- databases-enhance-rag-driven-generative-ai-90c10416cb9c

  5. [5]

    Performance comparison of dask and apache spark on hpc systems for neuroimaging,

    M. Dugré, V . Hayot-Sasson, and T. Glatard, “Performance comparison of dask and apache spark on hpc systems for neuroimaging,” p. e7635,

  6. [6]

    Available: https://doi.org/10.1002/cpe.7635

    [Online]. Available: https://doi.org/10.1002/cpe.7635

  7. [7]

    Dataflow: An llm-driven framework for unified data preparation and workflow automation in the era of data-centric ai.arXiv preprint arXiv:2512.16676, 2025

    H. Liang, X. Ma, Z. Liu, Z. H. Wong, Z. Zhao, Z. Meng, R. He, C. Shen, Q. Cai, Z. Hanet al., “Dataflow: An llm-driven framework for unified data preparation and workflow automation in the era of data-centric ai,”arXiv preprint arXiv:2512.16676, 2025. [Online]. Available: https://arxiv.org/abs/2512.16676

  8. [8]

    High performance dataframes from parallel processing patterns,

    V . Abeykoon, P. Wickramasinghe, S. Kamburugamuve, H. Maithree, C. Widanage, N. Perera, T. A. Kanewala, A. Uyar, G. Gunduz, and G. Fox, “High performance dataframes from parallel processing patterns,” inParallel Processing and Applied Mathematics: 14th International Conference, PPAM 2022, Gdansk, Poland, September 11–14, 2022, Revised Selected Papers, Par...

  9. [9]

    On the reproducibility limitations of rag systems,

    B. Wang, D. Zhao, N. R. Tallent, and L. Guo, “On the reproducibility limitations of rag systems,” 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2509.18869

  10. [10]

    Simplify your rag application architecture with llamain- dex + postgresml,

    LlamaIndex, “Simplify your rag application architecture with llamain- dex + postgresml,” https://www .llamaindex.ai/blog/simplify-your-rag- application-architecture-with-llamaindex-postgresml, accessed: October 7, 2025

  11. [11]

    How to achieve 10x perfor- mance with vector database for llm using lancedb and pyarrow,

    R. K. Choudhary, “How to achieve 10x perfor- mance with vector database for llm using lancedb and pyarrow,” https://www .rishabhxchoudhary.com/blog/ How_to_Achieve_10x_Performance_with_Vector_Database_for_LLM_using_LanceDB_and_PyArrow, accessed: October 7, 2025

  12. [12]

    Radical-pilot and parsl: Executing heterogeneous workflows on hpc platforms,

    A. Alsaadi, L. Ward, A. Merzky, K. Chard, I. Foster, S. Jha, and M. Turilli, “Radical-pilot and parsl: Executing heterogeneous workflows on hpc platforms,” in2022 IEEE/ACM Workshop on Workflows in Support of Large-Scale Science (WORKS). IEEE, 2022, pp. 27–34. [Online]. Available: https://doi.org/10.1109/WORKS56498.2022.00009

  13. [13]

    Retrieval-augmented generation for knowledge-intensive nlp tasks,

    P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, S. Riedel, and D. Kiela, “Retrieval-augmented generation for knowledge-intensive nlp tasks,” inAdvances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33. Curran Associates...

  14. [14]

    React: Synergizing reasoning and acting in language models,

    S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. R. Narasimhan, and Y . Cao, “React: Synergizing reasoning and acting in language models,” inThe eleventh international conference on learning representations,

  15. [15]

    Available: https://openreview .net/pdf?id=WE_vluYUL-X

    [Online]. Available: https://openreview .net/pdf?id=WE_vluYUL-X

  16. [16]

    Reflexion: language agents with verbal reinforcement learning,

    N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, and S. Yao, “Reflexion: language agents with verbal reinforcement learning,” inProceedings of the 37th International Conference on Neural Information Processing Systems, ser. NIPS ’23. Red Hook, NY , USA: Curran Associates Inc., 2023. [Online]. Available: https://dl.acm.org/doi/10.5555/3666122.3666499

  17. [17]

    Retrieval-augmented generation for knowledge-intensive nlp tasks,

    P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, S. Riedel, and D. Kiela, “Retrieval-augmented generation for knowledge-intensive nlp tasks,” inProceedings of the 34th International Conference on Neural Information Processing Systems, ser. NIPS ’20. Red Hook, NY , USA: Curran Associates I...

  18. [18]

    Accelerating embarrassingly parallel algorithm on intel mic,

    Q. Wang, J. Liu, X. Tang, F. Wang, G. Fu, and Z. Xing, “Accelerating embarrassingly parallel algorithm on intel mic,” in 2014 IEEE International Conference on Progress in Informatics and Computing, 2014, pp. 213–218. [Online]. Available: https: //doi.org/10.1109/PIC.2014.6972327

  19. [19]

    Ucx: An open source framework for hpc network apis and beyond,

    P. Shamis, M. G. Venkata, M. G. Lopez, M. B. Baker, O. Hernandez, Y . Itigin, M. Dubman, G. Shainer, R. L. Graham, L. Liss, Y . Shahar, S. Potluri, D. Rossetti, D. Becker, D. Poole, C. Lamb, S. Kumar, C. Stunkel, G. Bosilca, and A. Bouteiller, “Ucx: An open source framework for hpc network apis and beyond,” in2015 IEEE 23rd Annual Symposium on High-Perfor...

  20. [20]

    Mpi for python,

    L. Dalcin, R. Paz, and M. Storti, “Mpi for python,”Journal of Parallel and Distributed Computing, vol. 65, no. 9, pp. 1108–1115, Sep. 2005. [Online]. Available: https://doi.org/10.1016/j.jpdc.2005.03.010

  21. [21]

    High performance data engineering everywhere,

    C. Widanage, N. Perera, V . Abeykoon, S. Kamburugamuve, T. A. Kanewala, H. Maithree, P. Wickramasinghe, A. Uyar, G. Gunduz, and G. Fox, “High performance data engineering everywhere,” in2020 IEEE International Conference on Smart Data Services (SMDS). IEEE, 2020, pp. 122–132. [Online]. Available: https: //doi.org/10.1109/SMDS49396.2020.00022

  22. [22]

    In-depth analysis on parallel processing patterns for high-performance dataframes,

    N. Perera, A. K. Sarker, M. Staylor, G. von Laszewski, K. Shan, S. Kamburugamuve, C. Widanage, V . Abeykoon, T. A. Kanewela, and G. Fox, “In-depth analysis on parallel processing patterns for high-performance dataframes,”Future Generation Computer Systems, vol. 149, pp. 250–264, 2023. [Online]. Available: https: //doi.org/10.1016/j.future.2023.07.007

  23. [23]

    Deep rc: A scalable data engineering and deep learning pipeline,

    A. K. Sarker, A. Alsaadi, A. J. Halpern, P. Tangella, M. Titov, N. Perera, M. Staylor, G. von Laszewski, S. Jha, and G. Fox, “Deep rc: A scalable data engineering and deep learning pipeline,” inJob Scheduling Strategies for Parallel Processing: 28th International Workshop, JSSPP 2025, Milan, Italy, June 3–4, 2025, Revised Selected Papers. Berlin, Heidelbe...

  24. [25]

    Available: https://arxiv.org/abs/2512.20795

    [Online]. Available: https://arxiv.org/abs/2512.20795

  25. [26]

    Architecture,

    S. K. Niranda Perera, “Architecture,” Cylon, Tech. Rep., May 2022. [Online]. Available: https://cylondata.org/docs/arch/

  26. [27]

    Radical-cylon: A heterogeneous data pipeline for scientific computing,

    A. K. Sarker, A. Alsaadi, N. Perera, M. Staylor, G. von Laszewski, M. Turilli, O. O. Kilic, M. Titov, A. Merzky, S. Jha et al., “Radical-cylon: A heterogeneous data pipeline for scientific computing,” inJob Scheduling Strategies for Parallel Processing. Springer Nature Switzerland, 2024, pp. 84–102. [Online]. Available: https://doi.org/10.1007/978-3-031-74430-3_5

  27. [28]

    Design and performance characterization of radical-pilot on leadership-class platforms,

    A. Merzky, M. Turilli, M. Titov, A. Al-Saadi, and S. Jha, “Design and performance characterization of radical-pilot on leadership-class platforms,”IEEE Transactions on Parallel and amp; Distributed Systems, vol. 33, no. 04, pp. 818–829, apr 2022. [Online]. Available: https://doi.org/10.1109/TPDS.2021.3105994

  28. [29]

    Gloo: Collective communications library with various primitives for multi-machine training,

    Facebookincubator, “Gloo: Collective communications library with various primitives for multi-machine training,” Facebook, Tech. Rep., March 2023. [Online]. Available: https://github .com/facebookincubator/ gloo"

  29. [30]

    Combining serverless and high-performance computing paradigms to support ml data-intensive applications,

    M. Staylor, A. K. Sarker, G. von Laszewski, G. Fox, Y . Cheng, and J. Fox, “Combining serverless and high-performance computing paradigms to support ml data-intensive applications,”Frontiers in High Performance Computing, 2026

  30. [31]

    Towards XAI in the SOC – A User- Centric Study of Explainable Alerts with SHAP and LIME, in: 2022 IEEE International Conference on Big Data (Big Data), IEEE, Osaka, Japan

    K. Shan, N. Perera, D. Lenadora, T. Zhong, A. Kumar Sarker, S. Kamburugamuve, T. Amila Kanewela, C. Widanage, and G. Fox, “Hybrid cloud and hpc approach to high-performance dataframes,” in2022 IEEE International Conference on Big Data (Big Data), 2022, pp. 2728–2736. [Online]. Available: https://doi .org/10.1109/ BigData55660.2022.10020958

  31. [32]

    Supercharging distributed computing environments for high-performance data engineering,

    N. Perera, A. K. Sarker, K. Shan, A. Fetea, S. Kamburugamuve, T. A. Kanewala, C. Widanage, M. Staylor, T. Zhong, V . Abeykoon, G. von Laszewski, and G. Fox, “Supercharging distributed computing environments for high-performance data engineering,”Frontiers in High Performance Computing, vol. V olume 2 - 2024, 2024. [Online]. Available: https://doi.org/10.3...

  32. [33]

    Context rot: How increasing input tokens impacts llm performance,

    K. Hong, A. Troynikov, and J. Huber, “Context rot: How increasing input tokens impacts llm performance,” Chroma, Tech. Rep., July 2025. [Online]. Available: https://trychroma.com/research/context-rot

  33. [34]

    Generative benchmarking,

    K. Hong, A. Troynikov, J. Huber, and M. McGuire, “Generative benchmarking,” Chroma, Tech. Rep., April 2025. [Online]. Available: https://trychroma.com/research/generative-benchmarking

  34. [35]

    Billion-scale similarity search with GPUs

    J. Johnson, M. Douze, and H. Jégou, “Billion-scale similarity search with gpus,”IEEE Transactions on Big Data, vol. 7, no. 3, pp. 535–547, 2021. [Online]. Available: https://doi.org/10.1109/TBDATA.2019.2921572

  35. [36]

    Llamaindex: Data framework for connecting large language models to data,

    J. Liu, “Llamaindex: Data framework for connecting large language models to data,” LlamaIndex, Tech. Rep., 2023. [Online]. Available: https://github.com/jerryjliu/llama_index

  36. [37]

    Evaluating chunking strategies for retrieval,

    B. Smith and A. Troynikov, “Evaluating chunking strategies for retrieval,” Chroma, Tech. Rep., July 2024. [Online]. Available: https://trychroma.com/research/evaluating-chunking

  37. [38]

    Pinecone: Scalable vector database for machine learning applications,

    P. Developers, “Pinecone: Scalable vector database for machine learning applications,” Pinecone Systems Inc., Tech. Rep., May 2023. [Online]. Available: https://www.pinecone.io/

  38. [39]

    From rag to agents: Building intelligent systems with memory and tools,

    A. Karpathy, “From rag to agents: Building intelligent systems with memory and tools,” Eureka Labs, Tech. Rep., January 2022. [Online]. Available: https://karpathy.ai/agents-rag-memory

  39. [40]

    Building custom ai workflows using langchain tools,

    E. Davis, “Building custom ai workflows using langchain tools,” ThinkTide Global Research Journal, vol. 5, no. 4, pp. 54–62, 2024. [Online]. Available: https://thinktidejournal .com/index.php/TGRJ/article/ view/53/63

  40. [41]

    Langgraph: Stateful multi-agent workflows,

    L. Developer, “Langgraph: Stateful multi-agent workflows,” LangChain Inc, Tech. Rep., Jan 2024. [Online]. Available: https://blog.langchain.com/ langgraph-multi-agent-workflows

  41. [42]

    Exploration of llm multi- agent application implementation based on langgraph+ crewai,

    Z. Duan and J. Wang, “Exploration of llm multi- agent application implementation based on langgraph+ crewai,” arXiv preprint arXiv:2411.18241, 2024. [Online]. Available: https://doi.org/10.32388/R27SW4

  42. [43]

    [Online]

    CrewAI Team,Kickoff Crew Asynchronously, 2026, accessed: 2026-02-18. [Online]. Available: https://docs.crewai.com/en/learn/kickoff-async

  43. [44]

    Autogen: Enabling next-gen llm applications via multi-agent conversation,

    Q. Wu, G. Bansal, J. Zhang, Y . Wu, B. Li, E. E. Zhu, L. Jiang, X. Zhang, S. Zhang, A. Awadallah, R. W. White, D. Burger, and C. Wang, “Autogen: Enabling next-gen llm applications via multi-agent conversation,” inCOLM 2024, August 2024. [Online]. Available: https: //www.microsoft.com/en-us/research/publication/autogen-enabling- next-gen-llm-applications-v...

  44. [45]

    Ray: A distributed framework for emerging {AI} applications,

    P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M. I. Jordanet al., “Ray: A distributed framework for emerging {AI} applications,” in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 2018, pp. 561–577. [Online]. Available: https://www.usenix.org/system/files/osdi18-moritz.pdf

  45. [46]

    [Online]

    LlamaIndex Developers,Parallel Execution Ingestion Pipeline, LlamaIndex, 2024, accessed: 2026-02-13. [Online]. Available: https://developers .llamaindex.ai/python/examples/ingestion/ parallel_execution_ingestion_pipeline/

  46. [47]

    Dask: Parallel computation with blocked algorithms and task scheduling,

    M. Rocklin, “Dask: Parallel computation with blocked algorithms and task scheduling,” inProceedings of the 14th python in science conference, vol. 130. Citeseer, 2015, p. 136. [Online]. Available: https://proceedings.scipy.org/articles/Majora-7b98e3ed-013.pdf

  47. [48]

    Higress-rag: A holistic optimization framework for enterprise retrieval-augmented generation via dual hybrid retrieval, adaptive routing, and crag,

    W. Lin, “Higress-rag: A holistic optimization framework for enterprise retrieval-augmented generation via dual hybrid retrieval, adaptive routing, and crag,”arXiv preprint arXiv:2602.23374, 2025. [Online]. Available: https://arxiv.org/abs/2602.23374

  48. [49]

    Higress,

    Alibaba Cloud and the Higress Authors, “Higress,” https://github .com/ alibaba/higress, 2026, gitHub repository. Accessed: 2026-01-25

  49. [50]

    Apache spark: a unified engine for big data processing,

    M. Zaharia, R. S. Xin, P. Wendell, T. Das, M. Armbrust, A. Dave, X. Meng, J. Rosen, S. Venkataraman, M. J. Franklin, A. Ghodsi, J. Gonzalez, S. Shenker, and I. Stoica, “Apache spark: a unified engine for big data processing,”Commun. ACM, vol. 59, no. 11, p. 56–65, Oct

  50. [51]

    Apache Spark: A unified engine for big data processing,

    [Online]. Available: https://doi.org/10.1145/2934664

  51. [52]

    Apache flink: Stream and batch processing in a single engine,

    P. Carbone, A. Katsifodimos, S. Ewen, V . Markl, S. Haridi, and K. Tzoumas, “Apache flink: Stream and batch processing in a single engine,”The Bulletin of the Technical Committee on Data Engineering, vol. 38, no. 4, 2015. [Online]. Available: https://asterios.katsifodimos.com/assets/publications/flink-deb.pdf

  52. [53]

    Towards scalable dataframe systems,

    M. Petersohn, S. Macke, D. Xin, W. Ma, J. K. Wittenauer, S. Hoyer, R. Marcus, M. Zaharia, and B. Recht, “Towards scalable dataframe systems,”Proceedings of the VLDB Endowment (PVLDB), vol. 13, no. 12, pp. 2033–2046, 2020. [Online]. Available: https://doi.org/10.14778/3407790.3407807

  53. [54]

    Incremental perception on real time 3d data,

    A. K. Sarker and F. X. Lin, “Incremental perception on real time 3d data,” inProceedings of the 23rd Annual International Workshop on Mobile Computing Systems and Applications, ser. HotMobile ’22. New York, NY , USA: Association for Computing Machinery, 2022, p. 68–73. [Online]. Available: https://doi.org/10.1145/3508396.3512875

  54. [55]

    Parsl: Pervasive parallel programming in python,

    Y . Babuji, A. Woodard, Z. Li, D. S. Katz, B. Clifford, R. Kumar, L. Lacinski, R. Chard, J. M. Wozniak, I. Foster, M. Wilde, and K. Chard, “Parsl: Pervasive parallel programming in python,” in Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, ser. HPDC ’19. New York, NY , USA: Association for Computing...

  55. [56]

    High-level messaging patterns,

    ZMQ, “High-level messaging patterns,” Zeromq, Tech. Rep., October

  56. [57]

    Available: https://zguide .zeromq.org/docs/chapter2/ #High-Level-Messaging-Patterns"

    [Online]. Available: https://zguide .zeromq.org/docs/chapter2/ #High-Level-Messaging-Patterns"

  57. [58]

    Pathways: Asynchronous distributed dataflow for ml,

    P. Barham, A. Chowdhery, J. Dean, S. Ghemawat, S. Hand, D. Hurt, M. Isard, H. Lim, R. Pang, S. Royet al., “Pathways: Asynchronous distributed dataflow for ml,”Proceedings of Machine Learning and Systems, vol. 4, pp. 430–449, 2022. [Online]. Available: https://proceedings .mlsys.org/paper_files/paper/ 2022/file/37385144cac01dff38247ab11c119e3c-Paper.pdf

  58. [59]

    45 Geng Zhang, Xuanlei Zhao, Kai Wang, and Yang You

    J. Yuan, X. Li, C. Cheng, J. Liu, R. Guo, S. Cai, C. Yao, F. Yang, X. Yi, C. Wuet al., “Oneflow: Redesign the distributed deep learning framework from scratch,”arXiv preprint arXiv:2110.15032, 2021. [Online]. Available: https://arxiv.org/abs/2110.15032

  59. [60]

    Dspy: Compiling declarative language model calls into self-improving pipelines,

    O. Khattab, A. Singhvi, P. Maheshwari, Z. Zhang, K. Santhanam, S. Vardhamanan, S. Haq, A. Sharma, T. T. Joshi, H. Moazam, H. Miller, M. Zaharia, and C. Potts, “Dspy: Compiling declarative language model calls into self-improving pipelines,”The Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/p...

  60. [61]

    and Zhang, Hao and Stoica, Ion , booktitle =

    W. Kwon, Z. Li, S. Zhuang, Y . Sheng, L. Zheng, C. H. Yu, J. Gonzalez, H. Zhang, and I. Stoica, “Efficient memory management for large language model serving with pagedattention,” inProceedings of the 29th Symposium on Operating Systems Principles, ser. SOSP ’23. New York, NY , USA: Association for Computing Machinery, 2023, p. 611–626. [Online]. Availabl...

  61. [62]

    Sglang: efficient execution of structured language model programs,

    L. Zheng, L. Yin, Z. Xie, C. Sun, J. Huang, C. H. Yu, S. Cao, C. Kozyrakis, I. Stoica, J. E. Gonzalez, C. Barrett, and Y . Sheng, “Sglang: efficient execution of structured language model programs,” inProceedings of the 38th International Conference on Neural Information Processing Systems, ser. NIPS ’24. Red Hook, NY , USA: Curran Associates Inc., 2024. ...

  62. [63]

    Memorag: Boosting long context processing with global memory-enhanced retrieval augmentation,

    H. Qian, Z. Liu, P. Zhang, K. Mao, D. Lian, Z. Dou, and T. Huang, “Memorag: Boosting long context processing with global memory-enhanced retrieval augmentation,” inProceedings of the ACM on Web Conference 2025, ser. WWW ’25. New York, NY , USA: Association for Computing Machinery, 2025, p. 2366–2377. [Online]. Available: https://doi.org/10.1145/3696410.3714805

  63. [64]

    Tuning llms by rag principles: Towards llm-native memory,

    J. Wei, S. Wu, R. Liu, X. Ying, J. Shang, and F. Tao, “Tuning llms by rag principles: Towards llm-native memory,” arXiv preprint arXiv:2503.16071, 2025. [Online]. Available: https: //arxiv.org/abs/2503.16071

  64. [65]

    From RAG to memory: Non-parametric continual learning for large language models,

    B. J. Gutiérrez, Y . Shu, W. Qi, S. Zhou, and Y . Su, “From RAG to memory: Non-parametric continual learning for large language models,” inForty-second International Conference on Machine Learning, 2025. [Online]. Available: https://openreview.net/forum?id=LWH8yn4HS2

  65. [66]

    Cue rag: Dynamic multi-output cue memory under h framework for retrieval-augmented generation,

    Y . Fu, D. Liu, B. Zhang, Z. Jiang, H. Mei, and J. Guan, “Cue rag: Dynamic multi-output cue memory under h framework for retrieval-augmented generation,”Neurocomputing, vol. 639, p. 130235,

  66. [67]

    Available: https://www .sciencedirect.com/science/article/ pii/S0925231225009075

    [Online]. Available: https://www .sciencedirect.com/science/article/ pii/S0925231225009075