The Semi-Executable Stack: Agentic Software Engineering and the Expanding Scope of SE
Pith reviewed 2026-05-10 10:29 UTC · model grok-4.3
The pith
Software engineering is expanding from executable code to semi-executable artifacts that combine language, tools, workflows, and routines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The important shift is not that software engineering loses relevance. It is that the thing being engineered expands beyond executable code to semi-executable artifacts: combinations of natural language, tools, workflows, control mechanisms, and organizational routines whose enactment depends on human or probabilistic interpretation rather than deterministic execution. The Semi-Executable Stack is introduced as a six-ring diagnostic reference model spanning executable artifacts, instructional artifacts, orchestrated execution, controls, operating logic, and societal and institutional fit. The model helps locate where a contribution, bottleneck, or organizational transition primarily sits, and which adjacent rings it depends on.
What carries the argument
The Semi-Executable Stack, a six-ring diagnostic reference model whose rings are executable artifacts, instructional artifacts, orchestrated execution, controls, operating logic, and societal and institutional fit.
If this is right
- Any contribution or bottleneck can be assigned to one primary ring while noting its dependencies on adjacent rings.
- Familiar objections to agentic AI in software engineering become concrete engineering targets rather than reasons to reject the transition.
- Legacy processes, controls, and coordination routines can be evaluated with the preserve-versus-purify heuristic to decide what to keep versus simplify.
- Organizational transitions can be diagnosed by tracking which rings are changing and which remain stable.
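The assignment discipline implied by these points can be sketched as a small data model. The six ring names come from the paper; the `Contribution` schema and the example contribution below are hypothetical illustrations, not structures the paper defines:

```python
from dataclasses import dataclass, field

# The six rings of the Semi-Executable Stack, as named in the paper.
# Their ordering here is only for illustration.
RINGS = [
    "executable artifacts",
    "instructional artifacts",
    "orchestrated execution",
    "controls",
    "operating logic",
    "societal and institutional fit",
]

@dataclass
class Contribution:
    """A contribution or bottleneck located on the stack (hypothetical schema)."""
    name: str
    primary_ring: str
    depends_on: list = field(default_factory=list)

    def __post_init__(self):
        # Enforce the model's discipline: exactly one primary ring,
        # with dependencies noted only on other rings.
        assert self.primary_ring in RINGS
        assert all(r in RINGS and r != self.primary_ring for r in self.depends_on)

# Hypothetical example: refactoring a team's prompt library sits primarily
# in the instructional-artifacts ring but leans on two adjacent rings.
c = Contribution(
    name="prompt library refactor",
    primary_ring="instructional artifacts",
    depends_on=["orchestrated execution", "controls"],
)
print(c.primary_ring)  # instructional artifacts
```

Under this reading, an organizational transition would be a sequence of such records over time, with change concentrated in some rings while others stay stable.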
Where Pith is reading between the lines
- The model could be tested by mapping existing agent frameworks onto the six rings to reveal gaps not yet addressed by current tools.
- It offers a way to compare how different organizations adapt their routines when introducing agentic systems.
- The preserve-versus-purify heuristic could generate measurable criteria for retaining or retiring specific SE practices.
Load-bearing premise
The six-ring Semi-Executable Stack supplies a sufficiently complete and actionable diagnostic lens for locating contributions, bottlenecks, and transitions across the expanded scope of software engineering.
What would settle it
A detailed case study of an AI-augmented software project in which the six rings cannot be used to identify the primary location of a major bottleneck or contribution would falsify the model's claimed utility.
Original abstract
AI-based systems, currently driven largely by LLMs and tool-using agentic harnesses, are increasingly discussed as a possible threat to software engineering. Foundation models get stronger, agents can plan and act across multiple steps, and tasks such as scaffolding, routine test generation, straightforward bug fixing, and small integration work look more exposed than they did only a few years ago. The result is visible unease not only among students and junior developers, but also among experienced practitioners who worry that hard-won expertise may lose value. This paper argues for a different reading. The important shift is not that software engineering loses relevance. It is that the thing being engineered expands beyond executable code to semi-executable artifacts: combinations of natural language, tools, workflows, control mechanisms, and organizational routines whose enactment depends on human or probabilistic interpretation rather than deterministic execution. The Semi-Executable Stack is introduced as a six-ring diagnostic reference model for reasoning about that expansion, spanning executable artifacts, instructional artifacts, orchestrated execution, controls, operating logic, and societal and institutional fit. The model helps locate where a contribution, bottleneck, or organizational transition primarily sits, and which adjacent rings it depends on. The paper develops the argument through three worked cases, reframes familiar objections as engineering targets rather than reasons to dismiss the transition, and closes with a preserve-versus-purify heuristic for deciding which legacy software engineering processes, controls, and coordination routines should be kept and which should be simplified or redesigned. This paper is a conceptual keynote companion: diagnostic and agenda-setting rather than empirical.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that AI-driven agentic systems do not threaten software engineering but expand its scope beyond executable code to semi-executable artifacts (natural language, tools, workflows, controls, and organizational routines reliant on human or probabilistic interpretation). It introduces the Semi-Executable Stack as a six-ring diagnostic reference model (executable artifacts, instructional artifacts, orchestrated execution, controls, operating logic, and societal/institutional fit) to locate contributions and bottlenecks, develops the reframing via three worked cases, treats familiar objections as engineering targets, and proposes a preserve-versus-purify heuristic for legacy processes.
Significance. If the reframing holds, the paper supplies a useful conceptual tool for the SE community to map the shifting boundaries of the field amid agentic AI. Its diagnostic model and heuristic could help researchers and practitioners identify where new work fits in the expanded stack and decide what legacy elements to retain or redesign. The explicitly agenda-setting, non-empirical stance is a strength, as it avoids overclaiming while offering a structured lens for future contributions.
minor comments (2)
- The abstract lists the six rings but does not provide even one-sentence characterizations of each; adding brief definitions would improve immediate accessibility without altering the conceptual focus.
- The three worked cases are referenced as illustrations of the model, but the manuscript would benefit from an explicit mapping table or paragraph showing which rings each case primarily engages; this is a clarity issue rather than a challenge to the central claim.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of the manuscript and for the recommendation of minor revision. The summary accurately reflects the paper's positioning as a conceptual, agenda-setting contribution that introduces the Semi-Executable Stack as a diagnostic model for the expanding scope of software engineering amid agentic AI systems. We appreciate the recognition of its potential value to researchers and practitioners in locating contributions and bottlenecks within the six-ring structure, as well as the preserve-versus-purify heuristic for legacy processes. We will make any minor editorial adjustments in the revised version.
Circularity Check
No significant circularity
full rationale
The paper is explicitly conceptual and agenda-setting. It introduces the six-ring Semi-Executable Stack as an independent diagnostic reference model, develops the reframing through three worked cases, and offers a preserve-versus-purify heuristic. No equations, fitted parameters, predictions, or derivations are present that could reduce to their own inputs by construction. The central claim concerns an expansion of scope rather than a mechanism derived from its own assumptions or self-citations, so the argument does not rest on circular support.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: AI-based systems driven by LLMs and agentic harnesses are increasingly capable of tasks such as scaffolding, routine test generation, and bug fixing that were previously the domain of software engineers.
invented entities (1)
- Semi-Executable Stack (no independent evidence)
Reference graph
Works this paper leans on
- [1] Emil Alégroth, Robert Feldt, and Helena Holmström Olsson. Transitioning manual system test suites to automated testing: An industrial case study. In 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, pages 56–65, 2013. doi: 10.1109/ICST.2013.14
- [2] Anthropic. Anthropic economic index, January 2026 report, 2026. URL https://www.anthropic.com/research/anthropic-economic-index-january-2026-report. Accessed April 22, 2026
- [3] Gordon D. Baxter and Ian Sommerville. Socio-technical systems: From design methods to systems engineering. Interacting with Computers, 23(1):4–17, 2011
- [4] Joel Becker, Nate Rush, Elizabeth Barnes, and David Rein. Measuring the impact of early-2025 AI on experienced open-source developer productivity, 2025. URL https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/. Model Evaluation and Threat Research (METR). Published July 10, 2025; Accessed April 22, 2026
- [5] Frederick P. Brooks, Jr. No silver bullet: Essence and accidents of software engineering. Computer, 20(4):10–19, 1987. doi: 10.1109/MC.1987.1663532
- [6] Erik Brynjolfsson, Bharat Chandar, and Ruyu Chen. Canaries in the coal mine? Six facts about the recent employment effects of artificial intelligence, 2025. URL https://digitaleconomy.stanford.edu/publication/canaries-in-the-coal-mine-six-facts-about-the-recent-employment-effects-of-artificial-intelligence/. Published November 13, 2025; Accessed April 22, 2026
- [7] Satish Chandra and Maxim Tabachnyk. AI in software engineering at Google: Progress and the path ahead, 2024. URL https://research.google/blog/ai-in-software-engineering-at-google-progress-and-the-path-ahead/. Published June 6, 2024; Accessed April 22, 2026
- [8] Mary Beth Chrissis, Mike Konrad, and Sandy Shrum. CMMI for Development: Guidelines for Process Integration and Product Improvement. Addison-Wesley Professional, 3rd edition, 2011
- [9] Derek DeBellis, Kevin Storer, Nathen Harvey, Matt Beane, Rob Edwards, Edward Fraser, Ben Good, Eirini Kalliamvakou, Gene Kim, Eric Maxwell, Sarah D'Angelo, Sarah Inman, Ambar Murillo, and Daniella Villalba. 2025 state of AI-assisted software development report, 2025. URL https://research.google/pubs/dora-2025-state-of-ai-assisted-software-development-rep...
- [10] Fabrizio Dell'Acqua, Charles Ayoubi, Hila Lifshitz, Raffaella Sadun, Ethan Mollick, Lilach Mollick, Yi Han, Jeff Goldman, Hari Nair, Stewart Taub, and Karim Lakhani. The cybernetic teammate: A field experiment on generative AI reshaping teamwork and expertise, 2025. URL https://www.nber.org/papers/w33641. NBER Working Paper 33641
- [11] Felix Dobslaw, Robert Feldt, Juyeon Yoon, and Shin Yoo. Challenges in testing large language model based software: A faceted taxonomy. ACM Transactions on Software Engineering and Methodology, 35(4):1–38, 2026. doi: 10.1145/3806396
- [12] European Union. Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (AI Act), 2024. URL https://eur-lex.europa.eu/legal-content/en/LSU/?qid=1744648637579&uri=CELEX%3A32024R1689. Official summary and legal reference; Accessed April 22, 2026
- [13] Fabian Fagerholm, Michael Felderer, Davide Fucci, Michael Unterkalmsteiner, Bogdan Marculescu, Markus Martini, Lucas Gren, Lars Göran Wallgren Tengberg, Robert Feldt, Antti Lehtelä, Bettina Nagyváradi, and Jehan Khattak. Cognition in software engineering: A taxonomy and survey of a half-century of research. ACM Computing Surveys, 54(11s):1–36, 2022. doi: ...
- [14] Angela Fan, Beliz Gokkaya, Mark Harman, Mitya Lyubarskiy, Shubho Sengupta, Shin Yoo, and Jie M. Zhang. Large language models for software engineering: Survey and open problems. In Proceedings of the 45th IEEE/ACM International Conference on Software Engineering: Future of Software Engineering, pages 31–53, 2023. doi: 10.1109/ICSE-FoSE59343.2023.00008
- [15] Robert Feldt, Francisco Gomes de Oliveira Neto, and Richard Torkar. Ways of applying artificial intelligence in software engineering. In Proceedings of the 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE@ICSE), pages 35–41, 2019
- [16] Robert Feldt, Sungmin Kang, Juyeon Yoon, and Shin Yoo. Towards autonomous testing agents via conversational large language models. In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 1688–1693, 2023. doi: 10.1109/ASE56229.2023.00148
- [18] Görkem Giray. A software engineering perspective on engineering machine learning systems: State of the art and challenges. Journal of Systems and Software, 180:111031, 2021. doi: 10.1016/j.jss.2021.111031
- [19] Daniel Graziotin, Per Lenberg, Robert Feldt, and Stefan Wagner. Psychometrics in behavioral software engineering: A methodological introduction with guidelines. ACM Transactions on Software Engineering and Methodology, 31(1):1–36, 2022. doi: 10.1145/3469888
- [20] Lucas Gren and Robert Feldt. Cross-functional AI task forces (X-FAITs) for AI transformation of software organizations. In Proceedings of the 29th International Conference on Evaluation and Assessment in Software Engineering (EASE), pages 793–796, 2025. doi: 10.1145/3756681.3757015
- [21] Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Daniel Luo, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. Large language models for software engineering: A systematic literature review. ACM Transactions on Software Engineering and Methodology, 33(8):1–79, 2024. doi: 10.1145/3695988
- [22] Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R. Narasimhan. SWE-bench: Can language models resolve real-world GitHub issues? In The Twelfth International Conference on Learning Representations (ICLR), 2024. URL https://openreview.net/forum?id=VTF8yNQM66. Accessed April 22, 2026
- [23] Arsham Gholamzadeh Khoee, Shuai Wang, Robert Feldt, Dhasarathy Parthasarathy, and Yinan Yu. Gatelens: A reasoning-enhanced LLM agent for automotive software release analytics, 2025. URL https://arxiv.org/abs/2503.21735. arXiv:2503.21735; Accessed April 22, 2026
- [24] Arsham Gholamzadeh Khoee, Yinan Yu, Robert Feldt, Andris Freimanis, Patrick Andersson Rhodin, and Dhasarathy Parthasarathy. Gonogo: An efficient LLM-based multi-agent system for streamlining automotive software release decision-making. In Proceedings of the 36th IFIP WG 6.1 International Conference on Testing Software and Systems (ICTSS), volume 15383 of Lecture...
- [25] Amy J. Ko, Robin Abraham, Laura Beckwith, Alan Blackwell, Margaret Burnett, Martin Erwig, Chris Scaffidi, Joseph Lawrance, Henry Lieberman, Brad Myers, Mary Beth Rosson, Gregg Rothermel, Mary Shaw, and Susan Wiedenbeck. The state of the art in end-user software engineering. ACM Computing Surveys, 43(3):1–44, 2011. doi: 10.1145/1922649.1922658
- [26] Dominik Kreuzberger, Niklas Kühl, and Sebastian Hirschl. Machine learning operations (MLOps): Overview, definition, and architecture, 2023. URL https://arxiv.org/abs/2205.02302. arXiv:2205.02302; Accessed April 22, 2026
- [27] Thomas Kwa, Ben West, Joel Becker, Amy Deng, Katharyn Garcia, Max Hasin, Sami Jawhar, Megan Kinniment, Nate Rush, Sydney von Arx, Ryan Bloom, Thomas Broadley, Haoxing Du, Brian Goodrich, Nikola Jurkovic, Luke Harold Miles, Seraphina Nix, Tao Lin, Neev Parikh, David Rein, Lucas Jun Koba Sato, Hjalmar Wijk, Daniel M. Ziegler, Elizabeth Barnes, and Lawrence ... arXiv preprint arXiv:2503.14499
- [28] Per Lenberg, Robert Feldt, and Lars Göran Wallgren. Behavioral software engineering: A definition and systematic literature review. Journal of Systems and Software, 107:15–37, 2015. doi: 10.1016/j.jss.2015.04.084
- [30] Per Lenberg, Robert Feldt, and Lars Göran Wallgren. Human factors related challenges in software engineering: An industrial perspective. In 2015 IEEE/ACM 8th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE), pages 43–49, 2015. doi: 10.1109/CHASE.2015.13
- [31] Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, et al. Holistic evaluation of language models. Transactions on Machine Learning Research, 2023
- [32] Henry Lieberman, Fabio Paternò, and Volker Wulf, editors. End User Development, volume 9 of Human-Computer Interaction Series. Springer, 2006
- [33] Junwei Liu, Kaixin Wang, Yixuan Chen, Xin Peng, Zhenpeng Chen, Lingming Zhang, and Yiling Lou. Large language model-based agents for software engineering: A survey, 2024. URL https://arxiv.org/abs/2409.02977. arXiv:2409.02977; Accessed April 22, 2026
- [34] Silverio Martínez-Fernández, Justus Bogner, Xavier Franch, Marc Oriol, Julien Siebert, Adam Trendowicz, Anna Maria Vollmer, and Stefan Wagner. Software engineering for AI-based systems: A survey. ACM Transactions on Software Engineering and Methodology, 31(2), 2022. doi: 10.1145/3487043
- [35] Microsoft WorkLab. 2025: The year the frontier firm is born, 2025. URL https://www.microsoft.com/en-us/worklab/work-trend-index/2025-the-year-the-frontier-firm-is-born. Published April 29, 2025; Accessed April 22, 2026
- [36] Michel Nass, Emil Alégroth, Robert Feldt, Maurizio Leotta, and Filippo Ricca. Similarity-based web element localization for robust test automation. ACM Transactions on Software Engineering and Methodology, 32(3):1–30, 2023. doi: 10.1145/3571855
- [37] National Institute of Standards and Technology. AI risk management framework, 2023. URL https://www.nist.gov/itl/ai-risk-management-framework. Accessed April 22, 2026
- [38] National Institute of Standards and Technology. Artificial intelligence risk management framework: Generative artificial intelligence profile, 2024. URL https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence. Published July 26, 2024; Accessed April 22, 2026
- [39] Lekshmi Murali Rani, Faezeh Mohammadi, Robert Feldt, and Richard Berntsson Svensson. An empirical study on decision-making aspects in responsible software engineering for AI. In 2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pages 575–585. IEEE, 2025. doi: 10.1109/ICSE-SEIP66354.2025.00056
- [40] Lekshmi Murali Rani, Richard Berntsson Svensson, and Robert Feldt. Bridging the socio-emotional gap: The functional dimension of human-AI collaboration for software engineering. URL https://arxiv.org/abs/2601.19387. arXiv:2601.19387; Accessed April 22, 2026
- [42] D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, and Dan Dennison. Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems (NeurIPS), pages 2503–2511, 2015
- [43] Richard Berntsson Svensson, Robert Feldt, and Richard Torkar. The unfulfilled potential of data-driven decision making in agile software development. In Agile Processes in Software Engineering and Extreme Programming, volume 355 of Lecture Notes in Business Information Processing, pages 69–85. Springer, 2019. doi: 10.1007/978-3-030-19034-7_5
- [44] Mahan Tafreshipour, Aaron Imani, Eric Huang, Eduardo Santana de Almeida, Thomas Zimmermann, and Iftekhar Ahmed. Prompting in the wild: An empirical study of prompt evolution in software repositories. In Proceedings of the 22nd IEEE/ACM International Conference on Mining Software Repositories (MSR), 2025. doi: 10.1109/MSR66628.2025.00. URL https://2025.msrconf.org/details/msr-2025-technical-papers/10/Prompting-in-the-Wild-An-Empirical-Study-of-Prompt-Evolution-in-Software-Repositorie
- [46] Theocharis Tavantzis and Robert Feldt. Unpacking organizational change in AI transformations of software engineering. In Proceedings of the 18th International Conference on Cooperative and Human Aspects of Software Engineering (CHASE@ICSE), pages 149–160, 2025. doi: 10.1109/CHASE66643.2025.00026
- [48] Shuai Wang, Yinan Yu, Robert Feldt, and Dhasarathy Parthasarathy. Automating a complete software test process using LLMs: An automotive case study. In Proceedings of the 47th IEEE/ACM International Conference on Software Engineering (ICSE), pages 373–384, 2025. doi: 10.1109/ICSE55347.2025.00211
- [49] Juyeon Yoon, Robert Feldt, and Shin Yoo. Intent-driven mobile GUI testing with autonomous large language model agents. In 2024 IEEE Conference on Software Testing, Verification and Validation (ICST), pages 129–139, 2024. doi: 10.1109/ICST60714.2024.00020