arxiv: 2605.02454 · v1 · submitted 2026-05-04 · 💻 cs.SE · cs.AI

Causal Software Engineering: A Vision and Roadmap

Roberto Pietrantuono, Luca Giamattei, Stefano Russo, Julien Siebert, Neil Walkinshaw

Pith reviewed 2026-05-08 17:47 UTC · model grok-4.3

classification 💻 cs.SE cs.AI

keywords causal software engineering · causal inference · software lifecycle · counterfactual diagnosis · interventional reasoning · software engineering · AIOps · uncertainty-aware estimates

The pith

Causal models should systematically inform decisions across the full software engineering lifecycle instead of relying on correlations alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Software engineers face high-stakes choices under uncertainty using signals from code, field data, and processes, yet current AI tools mainly detect co-occurring patterns that cannot answer questions about the effects of deliberate changes. The paper proposes Causal Software Engineering as a new paradigm that embeds causal models and reasoning into every phase of development and operations to supply explicit assumptions, uncertainty-aware effect estimates, and counterfactual diagnosis. This augments existing practices rather than replacing them. The authors sketch a causal-first workflow, a staged roadmap for tools and adoption, and an agenda for benchmarks to track progress.

Core claim

We propose Causal Software Engineering (CSE) as a future paradigm in which causal models and causal reasoning systematically inform activities across the software lifecycle, augmenting existing practices with explicit assumptions, uncertainty-aware effect estimates, and counterfactual diagnosis. We outline a causal-first workflow view spanning development and operations, a staged roadmap for tools and organizational adoption, and an evaluation and benchmark agenda for measuring progress.

What carries the argument

Causal models that support interventional and counterfactual queries applied to software engineering data throughout the development and operations lifecycle.
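
For readers less familiar with the distinction, the query types named here can be written in standard do-notation. This follows common textbook conventions and is not notation taken from the paper. The symbols are illustrative assumptions: X is an engineering action, Y an outcome, and Z a set of observed covariates assumed to satisfy the backdoor criterion.

```latex
% Observational (correlational) quantity: what tends to co-occur
P(Y = y \mid X = x)

% Interventional quantity, identified by backdoor adjustment
% ONLY IF Z blocks all backdoor paths from X to Y (an assumption)
P(Y = y \mid \mathrm{do}(X = x)) = \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z)

% Counterfactual quantity: outcome under x' for a unit observed with X = x, Y = y
P(Y_{x'} = y' \mid X = x, Y = y)
```

The first two expressions coincide only when there is no confounding between X and Y, which is rarely safe to assume for operational data.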

Load-bearing premise

Causal models can be practically constructed and validated from the noisy, incomplete, and socio-technical data typical in software engineering contexts.

What would settle it

A real-world software project where causal models are built from available data and used to guide decisions yet produce no measurable improvement in outcomes, reliability, or understanding of change effects compared with standard correlational methods.

Figures

Figures reproduced from arXiv: 2605.02454 by Julien Siebert, Luca Giamattei, Neil Walkinshaw, Roberto Pietrantuono, Stefano Russo.

**Figure 1.** A roadmap for CSE along four co-evolving routes.

Original abstract

Software engineering increasingly involves making high-stakes decisions under uncertainty, using signals from code, field data, and socio-technical processes. Recent AI-driven support (e.g., anomaly detection, predictive analytics, AIOps, as well as LLM-based agents) has amplified engineers' ability to detect patterns and synthesize content and recommendations, but many critical questions are interventional or counterfactual: What is the expected impact of changing a load-balancing strategy? Would an outage have been avoided under a different release plan? Correlational models answer "what tends to co-occur"; they struggle to answer "what would happen if we act." We propose Causal Software Engineering (CSE) as a future paradigm in which causal models and causal reasoning systematically inform activities across the software lifecycle, augmenting existing practices with explicit assumptions, uncertainty-aware effect estimates, and counterfactual diagnosis. We outline (i) a causal-first workflow view spanning development and operations, (ii) a staged roadmap for tools and organizational adoption, and (iii) an evaluation and benchmark agenda for measuring progress.
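
The gap between "what tends to co-occur" and "what would happen if we act" can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not an example from the paper: the scenario (operators preferring a new load-balancing strategy under high traffic), the variable names, and every number are assumptions chosen for illustration.

```python
import random

def simulate(n=20000, seed=0):
    """Generate synthetic request records (all parameters are made up)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        high_load = rng.random() < 0.5  # confounder: traffic level
        # Assumption: operators tend to pick the new strategy under high load.
        new_strategy = rng.random() < (0.8 if high_load else 0.2)
        # Simulated true causal effect of the new strategy on latency: -10 ms.
        latency = 100 + 40 * high_load - 10 * new_strategy + rng.gauss(0, 5)
        rows.append((high_load, new_strategy, latency))
    return rows

def mean(xs):
    return sum(xs) / len(xs)

def naive_difference(rows):
    """Correlational comparison: mean latency with vs. without the strategy."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)

def backdoor_adjusted(rows):
    """Average within-stratum differences, weighted by each stratum's share."""
    total = 0.0
    for z in (False, True):
        stratum = [(t, y) for zz, t, y in rows if zz == z]
        treated = [y for t, y in stratum if t]
        control = [y for t, y in stratum if not t]
        total += (mean(treated) - mean(control)) * len(stratum) / len(rows)
    return total

if __name__ == "__main__":
    rows = simulate()
    print("naive difference (ms):", round(naive_difference(rows), 2))
    print("adjusted effect (ms): ", round(backdoor_adjusted(rows), 2))
```

With these made-up parameters the naive comparison comes out positive, so the new strategy looks harmful, while the stratified estimate lands near the simulated -10 ms. The adjustment works here only because the confounder is observed and known, which is exactly the kind of explicit assumption the abstract asks for.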

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a clean vision paper that names a real gap between correlational AI tools and the interventional questions engineers actually face, but it offers no worked examples or data checks.

The main point is that current AI support in software engineering mostly finds patterns and makes predictions, yet many decisions involve changing something and asking what would happen next. The paper proposes Causal Software Engineering as a way to bring explicit causal models into the lifecycle so teams can reason about interventions and counterfactuals instead of just correlations. It sketches a workflow that starts with causal assumptions, moves to effect estimation, and includes diagnosis, plus a staged adoption path and a list of benchmarks to track progress. That framing is useful because it connects an established idea from causal ML to the specific messiness of code, releases, and operations data. The roadmap gives readers a concrete set of milestones to argue about rather than leaving the idea floating.

The limitation is that none of this is tested. There are no small examples showing how to extract a causal graph from commit logs or telemetry, no discussion of identifiability conditions that would actually hold in typical SE datasets, and no acknowledgment of how much domain expertise would be needed to avoid misspecification. The feasibility question is left entirely to the evaluation agenda.

Readers who already work on causal methods or on AI for SE will find the proposal coherent and worth discussing. It is not a result paper, so it will not change anyone's current methods, but it could shape what a follow-up empirical paper should measure. I would send it to referees who know both causal inference and software engineering practice; they can judge whether the proposed benchmarks are the right ones and whether the roadmap underestimates the data and tooling gaps.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes Causal Software Engineering (CSE) as a future paradigm in which causal models and causal reasoning systematically inform activities across the software lifecycle. It augments existing correlational AI practices (anomaly detection, predictive analytics, AIOps, LLM agents) by emphasizing explicit assumptions, uncertainty-aware effect estimates, and counterfactual diagnosis for interventional questions such as the impact of changing load-balancing strategies or release plans. The paper outlines (i) a causal-first workflow spanning development and operations, (ii) a staged roadmap for tools and organizational adoption, and (iii) an evaluation and benchmark agenda.

Significance. If the vision is realized, CSE could shift software engineering from pattern detection to reliable interventional and counterfactual reasoning, improving high-stakes decisions under uncertainty in socio-technical systems. As a vision and roadmap paper without empirical results or formal derivations, its primary contribution is framing an open research direction and evaluation agenda rather than delivering validated methods.

major comments (1)

The central feasibility claim—that causal models can be practically constructed and validated from the noisy, incomplete, and socio-technical data typical in software engineering—is presented as an open challenge in the roadmap but is load-bearing for the entire proposal. No concrete identifiability conditions, data requirements, or integration mechanisms with existing SE artifacts (e.g., logs, issue trackers, code repositories) are specified, leaving the transition from correlational to causal inference underspecified.

minor comments (2)

The abstract and workflow description reference LLM-based agents but do not clarify how causal reasoning would be integrated with or augment them; a brief illustrative example in the workflow section would improve clarity.
The evaluation agenda section would benefit from explicit metrics or benchmark tasks that distinguish causal from correlational performance (e.g., counterfactual accuracy on synthetic SE scenarios).
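
One way to read the second minor comment is as a benchmark in which counterfactual ground truth is available because the data-generating process is synthetic. The sketch below is an assumption-laden illustration, not a benchmark proposed in the paper: the additive-noise structural equation, the coefficient values, and the names `TRUE_COEFFS`, `counterfactual`, and `counterfactual_mae` are all hypothetical.

```python
import random

# Hypothetical structural equation (additive noise):
#   errors = c_load * load + c_canary * canary + noise
TRUE_COEFFS = {"load": 2.0, "canary": -5.0}

def errors(load, canary, noise, coeffs):
    return coeffs["load"] * load + coeffs["canary"] * canary + noise

def counterfactual(observed, new_canary, coeffs):
    """Abduction-action-prediction for an additive-noise model."""
    # Abduction: recover the noise term consistent with the observation.
    noise = observed["errors"] - errors(
        observed["load"], observed["canary"], 0.0, coeffs)
    # Action and prediction: replay the same unit under the other release plan.
    return errors(observed["load"], new_canary, noise, coeffs)

def counterfactual_mae(model_coeffs, n=1000, seed=0):
    """Mean absolute error of a candidate model's counterfactuals."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        load, noise = rng.uniform(0, 10), rng.gauss(0, 1)
        canary = rng.randint(0, 1)
        obs = {"load": load, "canary": canary,
               "errors": errors(load, canary, noise, TRUE_COEFFS)}
        # Ground truth is known only because the scenario is synthetic.
        truth = errors(load, 1 - canary, noise, TRUE_COEFFS)
        total += abs(counterfactual(obs, 1 - canary, model_coeffs) - truth)
    return total / n

if __name__ == "__main__":
    print(counterfactual_mae(TRUE_COEFFS))
    print(counterfactual_mae({"load": 2.0, "canary": -2.0}))
```

A model with the correct structural coefficients scores an error of essentially zero, while a model that misjudges the effect of the canary release is penalized by the size of that misjudgment. A purely predictive score on observed `errors` would not separate the two as directly, which is the distinction the comment asks the evaluation agenda to make measurable.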

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and for recognizing the potential of Causal Software Engineering as a framing for future research. We address the single major comment below and propose targeted revisions to strengthen the manuscript.

Referee: The central feasibility claim—that causal models can be practically constructed and validated from the noisy, incomplete, and socio-technical data typical in software engineering—is presented as an open challenge in the roadmap but is load-bearing for the entire proposal. No concrete identifiability conditions, data requirements, or integration mechanisms with existing SE artifacts (e.g., logs, issue trackers, code repositories) are specified, leaving the transition from correlational to causal inference underspecified.

Authors: We agree that the practical construction and validation of causal models from typical SE data constitutes a load-bearing challenge for the proposal. As a vision and roadmap paper, the manuscript deliberately positions this as an open research direction rather than a resolved capability, consistent with its stated contribution of outlining a paradigm and evaluation agenda. To reduce underspecification while remaining within the vision-paper scope, we will add a concise subsection in the roadmap that (i) sketches identifiability conditions adapted from causal inference literature (e.g., partial identification under bounded unobserved confounding in controlled deployment settings and use of instrumental variables from release policies), (ii) outlines minimal data requirements (e.g., timestamped logs with intervention markers and version-control provenance), and (iii) provides illustrative integration mechanisms with existing artifacts such as mapping issue-tracker metadata to causal graph nodes or using A/B-test infrastructure for effect estimation. These additions will clarify the transition without asserting solved feasibility.

revision: partial
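
The rebuttal's mention of instrumental variables from release policies can be made concrete with a Wald estimator on synthetic data. This is a hedged sketch under strong assumptions, none of which come from the paper: a rollout ring assigned at random acts as the instrument, it influences incidents only through feature adoption, the effect is constant across services, and an unobserved `skill` variable confounds adoption and incidents. All names and numbers are hypothetical.

```python
import random

def simulate(n=50000, seed=1):
    """Synthetic services; the simulated true effect of adoption is -2.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ring = rng.random() < 0.5   # instrument: randomly assigned rollout ring
        skill = rng.gauss(0, 1)     # unobserved confounder (never recorded)
        # Adoption depends on the ring and on the confounder.
        adopted = (0.5 * skill + (1.0 if ring else -1.0) + rng.gauss(0, 1)) > 0
        incidents = 5.0 - 2.0 * adopted - 1.5 * skill + rng.gauss(0, 1)
        rows.append((ring, adopted, incidents))
    return rows

def mean(xs):
    return sum(xs) / len(xs)

def naive_effect(rows):
    """Difference in mean incidents between adopters and non-adopters."""
    return (mean([y for _, t, y in rows if t])
            - mean([y for _, t, y in rows if not t]))

def wald_estimate(rows):
    """Ratio of the instrument's effect on outcome to its effect on adoption."""
    y1 = mean([y for z, _, y in rows if z])
    y0 = mean([y for z, _, y in rows if not z])
    t1 = mean([float(t) for z, t, _ in rows if z])
    t0 = mean([float(t) for z, t, _ in rows if not z])
    return (y1 - y0) / (t1 - t0)

if __name__ == "__main__":
    rows = simulate()
    print("naive:", round(naive_effect(rows), 2))
    print("wald: ", round(wald_estimate(rows), 2))
```

In this simulation the naive comparison overstates the benefit of adoption because skilled teams both adopt more and have fewer incidents, while the Wald ratio stays close to the simulated effect. Whether a real release policy satisfies the exclusion and independence assumptions is the open question the referee raises, and the sketch does not settle it.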

Circularity Check

0 steps flagged

No significant circularity identified

The manuscript is a forward-looking vision and roadmap paper that proposes Causal Software Engineering as a future paradigm without presenting any derivations, equations, fitted parameters, predictions, or formal results. Its content consists of a high-level workflow outline, adoption stages, and an evaluation agenda; no load-bearing claim reduces to an input by construction, self-definition, or self-citation chain. The central proposal is explicitly aspirational and defers feasibility questions, so no circularity patterns apply.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central proposal rests on the assumption that causal models are feasible in software engineering. It introduces no free parameters, one domain assumption, and one invented entity, namely the conceptual paradigm itself.

axioms (1)

domain assumption: Correlational models are insufficient for answering interventional and counterfactual questions in software engineering.
Stated in the abstract as the motivation for moving beyond current AI-driven correlational approaches.

invented entities (1)

Causal Software Engineering (CSE) · no independent evidence
purpose: A new paradigm for integrating causal reasoning into the software lifecycle.
Introduced as the core proposal without prior existence or independent evidence.
