pith. machine review for the scientific record.

arxiv: 2605.07062 · v1 · submitted 2026-05-08 · 💻 cs.SE · cs.AI

Recognition: 2 theorem links


From Assistance to Agency: Rethinking Autonomy and Control in CI/CD Pipelines

Marcus Emmanuel Barnes, Safwat Hassan, Taher A. Ghaleb


Pith reviewed 2026-05-11 02:02 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords CI/CD pipelines · AI agents · authority transfer · data plane · control plane · autonomy · software deployment · governance

The pith

The central challenge in agentic CI/CD is designing authority transfer from humans to agents rather than improving task performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper argues that AI agents entering CI/CD workflows require a new focus on how decision-making power moves from human-controlled pipelines to agent systems. It defines authority transfer as the delegation of operational choices under explicit constraints and with built-in recourse options. The authors separate data-plane authority for local actions such as generating patches or rerunning tests from control-plane authority for altering pipeline configurations, policies, and approval gates. Current systems remain largely at the data plane, achieving safety through external governance layers instead of intrinsic agent properties. This framing matters because it identifies why evaluation methods lag behind deployment speed and points to control-plane governance as the next research priority.

Core claim

The paper presents a vision of agentic CI/CD in which the central challenge is not improving task performance but designing authority transfer, defined as the delegation of operational decisions from human-controlled pipelines to agent systems under specified constraints and recourse mechanisms. Drawing on research prototypes and industrial platforms, it shows that current systems operate mainly at the data plane under bounded autonomy, with safety achieved through surrounding governance infrastructure rather than intrinsic agent guarantees.
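The paper's definition of authority transfer — delegation of operational decisions under specified constraints with recourse mechanisms — can be made concrete with a small sketch. All names here are illustrative assumptions, not an API from the paper.

```python
from dataclasses import dataclass

@dataclass
class AuthorityGrant:
    """One delegation of operational decisions to an agent, under
    explicit constraints and with a named human recourse path.
    (Illustrative sketch; field names are assumptions.)"""
    allowed_actions: set      # scope of the delegation
    max_actions_per_run: int  # an explicit constraint
    recourse: str             # e.g. "open-human-review-ticket"
    used: int = 0

    def authorize(self, action: str) -> bool:
        # Deny anything outside the grant or beyond the usage budget;
        # denied requests fall back to the human recourse path.
        if action not in self.allowed_actions or self.used >= self.max_actions_per_run:
            return False
        self.used += 1
        return True

grant = AuthorityGrant(
    allowed_actions={"generate_patch", "rerun_tests"},
    max_actions_per_run=3,
    recourse="open-human-review-ticket",
)
```

Under this framing, "bounded autonomy" is simply an `AuthorityGrant` whose scope never includes control-plane actions.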

What carries the argument

The distinction between data-plane authority (localized interventions such as patch generation and test reruns) and control-plane authority (modifications to pipeline configuration, deployment policies, and approval gates), which structures the analysis of how much decision power is delegated.
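The two-plane split can be read as a classification function over agent actions. The action names below are hypothetical; the plane assignments follow the paper's own examples.

```python
# Hypothetical action names; plane assignments follow the paper's examples.
CONTROL_PLANE_ACTIONS = {
    "edit_pipeline_config",
    "change_deployment_policy",
    "modify_approval_gate",
}
DATA_PLANE_ACTIONS = {
    "generate_patch",
    "rerun_tests",
    "triage_failure",
}

def plane_of(action: str) -> str:
    """Classify an agent action by the kind of authority it exercises."""
    if action in CONTROL_PLANE_ACTIONS:
        return "control"
    if action in DATA_PLANE_ACTIONS:
        return "data"
    return "unknown"  # unclassified actions would default to human review
```

A conservative system treats `"unknown"` like `"control"`: anything not explicitly delegated stays with humans.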

If this is right

  • Current systems achieve safety through external governance rather than built-in agent guarantees.
  • Three recurring patterns appear across platforms: constrained autonomy as the dominant design, external governance as the primary safety mechanism, and a widening gap between deployment momentum and evaluation methodology.
  • Control-plane safety and governance mechanisms represent the most urgent open problem.
  • Subsequent priorities include formalization of autonomy boundaries, new evaluation frameworks, and protocols for human-agent coordination.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the data-plane and control-plane split holds, it could be used to design staged systems where agents first handle data-plane actions before any control-plane proposals reach human review.
  • The same authority-transfer lens might apply to other software automation domains such as infrastructure provisioning or monitoring alert handling.
  • A broader survey across additional industrial tools could strengthen or refine the observation that most agents stay at the data plane.
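The staged-system idea in the first bullet above can be sketched as a dispatcher: data-plane actions execute autonomously, while anything touching the control plane becomes a proposal queued for human review. This is an editorial sketch, not a design from the paper, and the function names are assumptions.

```python
def dispatch(action, execute, propose_for_review):
    """Staged delegation: run data-plane actions directly; route
    control-plane actions to human review. (Illustrative sketch.)"""
    data_plane = {"generate_patch", "rerun_tests", "triage_failure"}
    if action in data_plane:
        return execute(action)          # agent acts within its grant
    return propose_for_review(action)   # humans keep control-plane authority

# Minimal usage: a control-plane action is proposed, not executed.
log = []
dispatch(
    "modify_approval_gate",
    execute=lambda a: log.append(("ran", a)),
    propose_for_review=lambda a: log.append(("proposed", a)),
)
```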

Load-bearing premise

That the distinction between data-plane and control-plane authority is a useful and natural way to analyze autonomy in CI/CD, and that observations from research prototypes and industrial platforms suffice to establish that current systems are limited to data-plane operations with external safety mechanisms.

What would settle it

Discovery of a production CI/CD system in which an agent can independently modify approval gates or deployment policies without any external governance layer or human recourse would directly test the claim that systems remain confined to data-plane authority.

read the original abstract

AI agents are assuming active roles in Continuous Integration and Continuous Deployment (CI/CD) workflows, yet the research community lacks a shared vocabulary for describing what it means for CI/CD to be agentic, how much decision authority is delegated, and where control should reside. This paper presents a vision of agentic CI/CD in which the central challenge is not improving task performance but designing authority transfer, defined as the delegation of operational decisions from human-controlled pipelines to agent systems under specified constraints and recourse mechanisms. To structure this argument, we introduce a distinction between data-plane authority (localized interventions such as patch generation and test reruns) and control-plane authority (modifications to pipeline configuration, deployment policies, and approval gates). Drawing on research prototypes and industrial platforms, we show that current systems operate mainly at the data plane under bounded autonomy, with safety achieved through surrounding governance infrastructure rather than intrinsic agent guarantees. We identify three recurring patterns: constrained autonomy as the dominant design, external governance as the primary safety mechanism, and a widening gap between deployment momentum and evaluation methodology. We propose a research agenda in which control-plane safety and governance mechanisms represent the most urgent open problem, followed by formalization of autonomy boundaries, evaluation frameworks, and human-agent coordination.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that the central challenge in agentic CI/CD is not task performance but designing authority transfer—the delegation of operational decisions from human-controlled pipelines to agent systems under specified constraints and recourse mechanisms. It introduces a distinction between data-plane authority (localized interventions such as patch generation and test reruns) and control-plane authority (modifications to pipeline configuration, deployment policies, and approval gates). Drawing on research prototypes and industrial platforms, it asserts that current systems operate mainly at the data plane under bounded autonomy with safety achieved through external governance rather than intrinsic guarantees, identifies three recurring patterns, and proposes a research agenda prioritizing control-plane safety, formalization of autonomy boundaries, evaluation frameworks, and human-agent coordination.

Significance. If the proposed framework holds, it could provide a useful conceptual lens for analyzing autonomy in AI-augmented DevOps workflows and help structure discussions around governance gaps. The emphasis on authority transfer rather than performance metrics may stimulate targeted research on safety mechanisms and human-AI coordination in software engineering. Its influence will depend on whether the data/control-plane distinction proves operationalizable and is validated through subsequent empirical work.

major comments (2)
  1. [Abstract] The assertion that 'current systems operate mainly at the data plane' (Abstract) is load-bearing for the central argument yet rests on unspecified observations from 'research prototypes and industrial platforms' without providing explicit criteria, a decision procedure, or an enumerated list of examined systems for classifying capabilities as data-plane versus control-plane. This leaves the generalization that existing tools are confined to localized interventions unverifiable and sensitive to how the planes are drawn.
  2. [Discussion of current systems and patterns] The identification of the three recurring patterns (constrained autonomy as dominant, external governance as primary safety mechanism, widening gap between deployment and evaluation) is presented without detailed mappings, case studies, or references to specific prototypes, which undermines the foundation for the subsequent research agenda that treats these patterns as established.
minor comments (1)
  1. [Terminology introduction] The terms 'data-plane authority' and 'control-plane authority' are introduced without referencing analogous distinctions from networking or distributed systems literature, which could help readers understand the intended analogy and scope.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which identify opportunities to improve the verifiability of our claims in this vision paper. We address each major comment point by point below, indicating where revisions will be made to strengthen the manuscript without changing its conceptual focus.

read point-by-point responses
  1. Referee: [Abstract] The assertion that 'current systems operate mainly at the data plane' (Abstract) is load-bearing for the central argument yet rests on unspecified observations from 'research prototypes and industrial platforms' without providing explicit criteria, a decision procedure, or an enumerated list of examined systems for classifying capabilities as data-plane versus control-plane. This leaves the generalization that existing tools are confined to localized interventions unverifiable and sensitive to how the planes are drawn.

    Authors: We agree that the abstract claim would benefit from greater transparency. The distinction between data-plane and control-plane authority is defined in Section 2 of the manuscript, and the generalization draws from the specific prototypes and platforms analyzed in Sections 3 and 4. To make the classification process explicit and verifiable, we will add a new subsection (or appendix) that enumerates the examined systems, states the decision criteria (whether a system can alter pipeline configuration, policies, or gates versus performing only localized actions), and provides a brief mapping for each. This addition will not expand the paper's scope but will allow readers to assess the generalization directly. revision: yes

  2. Referee: [Discussion of current systems and patterns] The identification of the three recurring patterns (constrained autonomy as dominant, external governance as primary safety mechanism, widening gap between deployment and evaluation) is presented without detailed mappings, case studies, or references to specific prototypes, which undermines the foundation for the subsequent research agenda that treats these patterns as established.

    Authors: The three patterns are synthesized from the concrete examples already referenced in the manuscript (e.g., the research prototypes and industrial platforms discussed in Sections 3–4). We acknowledge that the presentation would be stronger with explicit linkages. In revision we will insert a concise table or bulleted mapping that associates each pattern with one or more specific systems or citations, thereby grounding the patterns without converting the paper into an empirical survey. This will directly support the research agenda that follows. revision: yes

Circularity Check

0 steps flagged

No significant circularity in conceptual analysis of CI/CD autonomy

full rationale

The paper introduces definitions for authority transfer and the data-plane versus control-plane distinction explicitly to frame its vision, then presents the observation that current systems are limited to data-plane operations as drawn from external prototypes and platforms. No equations, fitted parameters, self-citations, or derivations are used that reduce any central claim to its own inputs by construction. The argument relies on new conceptual distinctions and high-level observations rather than self-referential logic, rendering the reasoning self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The paper introduces two new conceptual entities to structure the discussion and assumes that existing systems can be characterized by their limited autonomy without providing independent verification of that characterization.

axioms (1)
  • domain assumption Current AI-augmented CI/CD systems operate mainly at the data plane under bounded autonomy with safety provided by external governance.
    Presented as an observation drawn from prototypes and platforms in the abstract.
invented entities (2)
  • data-plane authority no independent evidence
    purpose: Localized interventions such as patch generation and test reruns
    New term introduced to categorize limited autonomy in CI/CD agents.
  • control-plane authority no independent evidence
    purpose: Modifications to pipeline configuration, deployment policies, and approval gates
    New term introduced to identify the higher-stakes decisions that remain human-controlled.

pith-pipeline@v0.9.0 · 5523 in / 1419 out tokens · 60067 ms · 2026-05-11T02:02:53.042858+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 1 internal anchor

  1. [1]

    Bram Adams and Shane McIntosh. 2016. Modern Release Engineering in a Nutshell: Why Researchers Should Care. In 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER). 78–90. doi:10.1109/SANER.2016.18

  2. [2]

    Buthayna AlMulla, Maram Assi, and Safwat Hassan. 2025. Understanding the Challenges and Opportunities of Generative AI Apps: An Empirical Study. arXiv preprint arXiv:2506.16453 (2025)

  3. [3]

    Amazon Web Services. 2026. Third-party integration with Amazon Q Developer (GitHub). AWS Documentation. https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/third-party-integration.html Accessed 2026-02-07

  4. [4]

    Praveen Anugula, Avdhesh Kumar Bhardwaj, Navin Chhibber, Rohit Tewari, Sunil Khemka, and Piyush Ranjan. 2025. AutoGuard: A Self-Healing Proactive Security Layer for DevSecOps Pipelines Using Reinforcement Learning. https://arxiv.org/abs/2512.04368

  5. [5]

    Benoit Baudry, Zimin Chen, Khashayar Etemadi, Han Fu, Davide Ginelli, Steve Kommrusch, Matias Martinez, Martin Monperrus, Javier Ron, He Ye, and Zhongxing Yu. 2021. A Software-Repair Robot Based on Continual Learning. IEEE Software 38, 4 (2021), 28–35. doi:10.1109/MS.2021.3070743

  6. [6]

    Islem Bouzenia, Premkumar Devanbu, and Michael Pradel. 2025. RepairAgent: An Autonomous, LLM-Based Agent for Program Repair. In 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE Computer Society, Los Alamitos, CA, USA, 2188–2200. doi:10.1109/ICSE55347.2025.00157

  7. [7]

    Islem Bouzenia and Michael Pradel. 2025. You Name It, I Run It: An LLM Agent to Execute Tests of Arbitrary Projects. Proc. ACM Softw. Eng. 2, ISSTA, Article ISSTA047 (June 2025), 23 pages. doi:10.1145/3728922

  8. [8]

    Yinfang Chen, Manish Shetty, Gagan Somashekar, Minghua Ma, Yogesh Simmhan, Jonathan Mace, Chetan Bansal, Rujia Wang, and Saravan Rajmohan. 2025. AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds. In Proceedings of Machine Learning and Systems, Vol. 7

  9. [9]

    Betty H. C. Cheng, Rogério de Lemos, Holger Giese, Paola Inverardi, Jeff Magee, Jesper Andersson, Basil Becker, Nelly Bencomo, Yuriy Brun, Bojan Cukic, Giovanna Di Marzo Serugendo, Schahram Dustdar, Anthony Finkelstein, Cristina Gacek, Kurt Geihs, Vincenzo Grassi, Gabor Karsai, Holger M. Kienle, Jeff Kramer, Marin Litoiu, Sam Malek, Raffaela Mirandola, ...

  10. [10]

    Software Engineering for Self-Adaptive Systems: A Research Roadmap. In Software Engineering for Self-Adaptive Systems, Betty H. C. Cheng, Rogério de Lemos, Holger Giese, Paola Inverardi, and Jeff Magee (Eds.). Lecture Notes in Computer Science, Vol. 5525. Springer, 1–26. doi:10.1007/978-3-642-02161-9_1

  11. [11]

    Dagger. 2025. Automate Your CI Fixes: Self-Healing Pipelines with AI Agents. https://dagger.io/blog/automate-your-ci-fixes-self-healing-pipelines-with-ai-agents

  12. [12]

    Datadog. 2026. Bits AI Dev Agent. https://docs.datadoghq.com/bits_ai/bits_ai_dev_agent/. Accessed 2026-05-04

  13. [13]

    Liming Dong, Qinghua Lu, and Liming Zhu. 2024. AgentOps: Enabling Observability of LLM Agents. arXiv preprint arXiv:2411.05285 (2024)

  14. [14]

    Brian Fitzgerald and Klaas-Jan Stol. 2017. Continuous Software Engineering: A Roadmap and Agenda. Journal of Systems and Software 123 (2017), 176–189. doi:10.1016/j.jss.2015.06.063

  15. [15]

    Taher A. Ghaleb. 2026. When AI Agents Touch CI/CD Configurations: Frequency and Success. arXiv preprint arXiv:2601.17413 (2026)

  16. [16]

    Gitar. 2026. Automated build failure fix solutions (autonomous CI/CD healing engine). Vendor documentation / blog. https://cms.gitar.ai/automated-build-failure-fix-solutions/ Accessed 2026-02-07

  17. [17]

    GitHub. 2026. About GitHub Copilot coding agent. https://docs.github.com/en/copilot/concepts/agents/coding-agent/about-coding-agent Concept documentation describing Copilot coding agent autonomy and PR-based delegation

  18. [18]

    GitHub. 2026. Continuous AI in Practice: What Developers Can Automate Today with Agentic CI. https://github.blog/ai-and-ml/generative-ai/continuous-ai-in-practice-what-developers-can-automate-today-with-agentic-ci/. GitHub Blog. Accessed 2026-02-11

  19. [19]

    GitLab. 2026. Fix CI/CD pipeline flow. GitLab Docs (Duo agent platform). https://docs.gitlab.com/user/duo_agent_platform/flows/foundational_flows/fix_pipeline/ Accessed 2026-02-07

  20. [20]

    Google GitHub Actions. 2026. run-gemini-cli: A GitHub Action invoking the Gemini CLI. GitHub repository. https://github.com/google-github-actions/run-gemini-cli Accessed 2026-02-07

  21. [21]

    Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. 2024. Large Language Models for Software Engineering: A Systematic Literature Review. ACM Transactions on Software Engineering and Methodology 33, 8 (2024), 220:1–220:79. doi:10.1145/3695988

  22. [22]

    Markus C. Huebscher and Julie A. McCann. 2008. A Survey of Autonomic Computing—Degrees, Models, and Applications. Comput. Surveys 40, 3 (2008), 7:1–7:28. doi:10.1145/1380584.1380585

  23. [23]

    Jez Humble and David Farley. 2010. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation. Addison-Wesley Professional

  24. [24]

    Saurabh Jha, Rohan Arora, Yuji Watanabe, Takumi Yanagawa, Yinfang Chen, Jackson Clark, Bhavya Bhavya, Mudit Verma, Harshit Kumar, Hirokuni Kitahara, Noah Zheutlin, Saki Takano, Divya Pathak, Felix George, Xinbo Wu, Bekir O. Turkkan, Gerard Vanloo, Michael Nidd, Ting Dai, Oishik Chatterjee, Pranjal Gupta, Suranjana Samanta, Pooja Aggarwal, Rong Lee, Pa...

  25. [25]

    Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R. Narasimhan. 2024. SWE-bench: Can Language Models Resolve Real-world Github Issues? In The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=VTF8yNQM66

  26. [26]

    Jeffrey O. Kephart and David M. Chess. 2003. The Vision of Autonomic Computing. Computer 36, 1 (2003), 41–50. doi:10.1109/MC.2003.1160055

  27. [27]

    Hao Li, Haoxiang Zhang, and Ahmed E. Hassan. 2026. AIDev: Studying AI Coding Agents on GitHub. arXiv:2602.09185 https://arxiv.org/abs/2602.09185

  28. [28]

    Junwei Liu, Kaixin Wang, Yixuan Chen, Xin Peng, Zhenpeng Chen, Lingming Zhang, and Yiling Lou. 2024. Large Language Model-Based Agents for Software Engineering: A Survey. arXiv:2409.02977 https://arxiv.org/abs/2409.02977

  29. [29]

    Alok Mishra and Ziadoon Otaiwi. 2020. DevOps and Software Quality: A Systematic Mapping. Computer Science Review 38 (2020), 100308. doi:10.1016/j.cosrev.2020.100308

  30. [30]

    Akshay Mittal and Vivek Venkatesan. 2025. Leveraging Generative AI for Proactive Security and Automated Remediation in Cloud-Native CI/CD Pipelines. In International Conference on Software Engineering and Data Engineering. Springer, 18–39

  31. [31]

    Martin Monperrus, Simon Urli, Thomas Durieux, Matias Martinez, Benoit Baudry, and Lionel Seinturier. 2019. Repairnator Patches Programs Automatically. Ubiquity 2019, July (2019), 1–12. doi:10.1145/3349589

  32. [32]

    Paolo Notaro, Jorge Cardoso, and Michael Gerndt. 2021. A Survey of AIOps Methods for Failure Management. ACM Transactions on Intelligent Systems and Technology 12, 6, Article 81 (Nov. 2021), 45 pages. doi:10.1145/3483424

  33. [33]

    Nx. 2025. AI-Powered Self-Healing CI. https://nx.dev/docs/features/ci-features/self-healing-ci Nx Documentation. Accessed 2026-02-12

  34. [34]

    Mojtaba Shahin, Muhammad Ali Babar, and Liming Zhu. 2017. Continuous Integration, Delivery and Deployment: A Systematic Review on Approaches, Tools, Challenges and Practices. IEEE Access 5 (2017), 3909–3943. doi:10.1109/ACCESS.2017.2685629

  35. [35]

    Simon Urli, Zhongxing Yu, Lionel Seinturier, and Martin Monperrus. 2018. How to Design a Program Repair Bot? Insights from the Repairnator Project. In Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice. ACM, 95–104. doi:10.1145/3183519.3183540

  36. [36]

    Yanlin Wang, Wanjun Zhong, Yanxian Huang, Ensheng Shi, Min Yang, Jiachi Chen, Hui Li, Yuchi Ma, Qianxiang Wang, and Zibin Zheng. 2025. Agents in software engineering: Survey, landscape, and vision. Automated Software Engineering 32, 2 (2025), 70

  37. [37]

    Steve R. White, James E. Hanson, Ian Whalley, David M. Chess, and Jeffrey O. Kephart. 2004. An Architectural Approach to Autonomic Computing. In Proceedings of the 1st International Conference on Autonomic Computing (ICAC 2004). 2–9. doi:10.1109/ICAC.2004.8

  38. [38]

    Chunqiu Steven Xia and Lingming Zhang. 2024. Automated Program Repair via Conversation: Fixing 162 out of 337 Bugs for $0.42 Each using ChatGPT. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (Vienna, Austria) (ISSTA 2024). Association for Computing Machinery, New York, NY, USA, 819–831. doi:10.1145/3650212.3680323

  39. [39]

    Weiyuan Xu, Juntao Luo, Tao Huang, Kaixin Sui, Jie Geng, Qijun Ma, Isami Akasaka, Xiaoxue Shi, Jing Tang, and Peng Cai. 2025. LogSage: An LLM-Based Framework for CI/CD Failure Detection and Remediation with Industrial Validation. In 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE). 3742–3753. doi:10.1109/ASE63991.2025.00310

  40. [40]

    Chen Zhang, Bihuan Chen, Xin Peng, and Wenyun Zhao. 2022. BuildSheriff: Change-aware Test Failure Triage for Continuous Integration Builds. In Proceedings of the 44th International Conference on Software Engineering (Pittsburgh, Pennsylvania) (ICSE '22). Association for Computing Machinery, New York, NY, USA, 312–324. doi:10.1145/3510003.3510132