pith. machine review for the scientific record.

arxiv: 2605.02454 · v1 · submitted 2026-05-04 · 💻 cs.SE · cs.AI


Causal Software Engineering: A Vision and Roadmap


Pith reviewed 2026-05-08 17:47 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords causal software engineering · causal inference · software lifecycle · counterfactual diagnosis · interventional reasoning · software engineering · AIOps · uncertainty-aware estimates

The pith

Causal models should systematically inform decisions across the full software engineering lifecycle instead of relying on correlations alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Software engineers face high-stakes choices under uncertainty using signals from code, field data, and processes, yet current AI tools mainly detect co-occurring patterns that cannot answer questions about the effects of deliberate changes. The paper proposes Causal Software Engineering as a new paradigm that embeds causal models and reasoning into every phase of development and operations to supply explicit assumptions, uncertainty-aware effect estimates, and counterfactual diagnosis. This augments existing practices rather than replacing them. The authors sketch a causal-first workflow, a staged roadmap for tools and adoption, and an agenda for benchmarks to track progress.

Core claim

We propose Causal Software Engineering (CSE) as a future paradigm in which causal models and causal reasoning systematically inform activities across the software lifecycle, augmenting existing practices with explicit assumptions, uncertainty-aware effect estimates, and counterfactual diagnosis. We outline a causal-first workflow view spanning development and operations, a staged roadmap for tools and organizational adoption, and an evaluation and benchmark agenda for measuring progress.

What carries the argument

Causal models that support interventional and counterfactual queries applied to software engineering data throughout the development and operations lifecycle.
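To make the notion of a counterfactual query concrete, here is a minimal sketch of the standard abduction, action, prediction steps on a toy structural causal model. It is not taken from the paper: the release plans, the exposure fractions, the defect impact, and the outage threshold are all hypothetical values chosen for illustration, and the model is assumed to be known and additive in its noise term.

```python
# Toy structural causal model (all constants are hypothetical):
#   error_rate = u + DEFECT_IMPACT * EXPOSURE[plan]
#   outage     = error_rate > OUTAGE_THRESHOLD
EXPOSURE = {"big_bang": 1.0, "staged": 0.1}  # fraction of users on the new build
DEFECT_IMPACT = 0.08                         # extra error rate at full exposure
OUTAGE_THRESHOLD = 0.05                      # error rate that counts as an outage


def error_rate(plan, u):
    """Structural equation: background noise u plus the defect's contribution."""
    return u + DEFECT_IMPACT * EXPOSURE[plan]


def counterfactual_error_rate(observed_plan, observed_error_rate, alternative_plan):
    """Answer: what would the error rate have been under alternative_plan?"""
    # Abduction: recover the background noise consistent with what was observed.
    u = observed_error_rate - DEFECT_IMPACT * EXPOSURE[observed_plan]
    # Action + prediction: swap the plan, keep the same noise, re-evaluate.
    return error_rate(alternative_plan, u)


# Observed incident: a big-bang release with a 9% error rate (an outage).
factual_outage = 0.09 > OUTAGE_THRESHOLD
cf_error = counterfactual_error_rate("big_bang", 0.09, "staged")
cf_outage = cf_error > OUTAGE_THRESHOLD
print(f"counterfactual error rate: {cf_error:.3f}, outage: {cf_outage}")
```

Under these assumed equations the staged release keeps the error rate below the threshold, so the query "would the outage have been avoided?" is answered yes. With real data the structural equations are not known in advance, which is exactly the load-bearing premise discussed next.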

Load-bearing premise

Causal models can be practically constructed and validated from the noisy, incomplete, and socio-technical data typical in software engineering contexts.

What would settle it

A real-world software project where causal models are built from available data and used to guide decisions yet produce no measurable improvement in outcomes, reliability, or understanding of change effects compared with standard correlational methods.

Figures

Figures reproduced from arXiv: 2605.02454 by Julien Siebert, Luca Giamattei, Neil Walkinshaw, Roberto Pietrantuono, Stefano Russo.

Figure 1: A roadmap for CSE along four co-evolving routes.
Original abstract

Software engineering increasingly involves making high-stakes decisions under uncertainty, using signals from code, field data, and socio-technical processes. Recent AI-driven support (e.g., anomaly detection, predictive analytics, AIOps, as well as LLM-based agents) has amplified engineers' ability to detect patterns and synthesize content and recommendations, but many critical questions are interventional or counterfactual: What is the expected impact of changing a load-balancing strategy? Would an outage have been avoided under a different release plan? Correlational models answer "what tends to co-occur"; they struggle to answer "what would happen if we act." We propose Causal Software Engineering (CSE) as a future paradigm in which causal models and causal reasoning systematically inform activities across the software lifecycle, augmenting existing practices with explicit assumptions, uncertainty-aware effect estimates, and counterfactual diagnosis. We outline (i) a causal-first workflow view spanning development and operations, (ii) a staged roadmap for tools and organizational adoption, and (iii) an evaluation and benchmark agenda for measuring progress.
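The gap between "what tends to co-occur" and "what would happen if we act" can be illustrated with a small simulation. The sketch below is not from the paper. It assumes a hypothetical setting in which traffic load influences both the choice of load-balancing strategy and the latency, and in which the true effect of the new strategy is set to -10 ms by construction. It compares a naive difference in means with a backdoor-adjusted estimate that stratifies on the observed confounder.

```python
import random


def simulate(n=20000, seed=0):
    """Hypothetical deployment records: (high_load, new_strategy, latency_ms)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        high_load = rng.random() < 0.5
        # Confounding: the new strategy is mostly switched on under high load.
        new_strategy = rng.random() < (0.8 if high_load else 0.2)
        # By construction: high load adds 50 ms, the new strategy removes 10 ms.
        latency = (100 + (50 if high_load else 0)
                   - (10 if new_strategy else 0) + rng.gauss(0, 5))
        rows.append((high_load, new_strategy, latency))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_difference(rows):
    """Correlational answer: mean latency with the strategy minus without."""
    treated = mean(y for _, t, y in rows if t)
    control = mean(y for _, t, y in rows if not t)
    return treated - control


def backdoor_adjusted(rows):
    """Interventional answer: per-load differences, weighted by load frequency."""
    effect = 0.0
    for load in (False, True):
        stratum = [r for r in rows if r[0] == load]
        treated = mean(y for _, t, y in stratum if t)
        control = mean(y for _, t, y in stratum if not t)
        effect += len(stratum) / len(rows) * (treated - control)
    return effect


rows = simulate()
naive = naive_difference(rows)
adjusted = backdoor_adjusted(rows)
print(f"naive: {naive:+.1f} ms, adjusted: {adjusted:+.1f} ms")
```

In this simulated setting the naive comparison suggests the new strategy makes latency worse (roughly +20 ms), while the adjusted estimate recovers an improvement close to the -10 ms built into the simulation. The adjustment only works here because the confounder is observed and the causal structure is assumed to be known.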

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes Causal Software Engineering (CSE) as a future paradigm in which causal models and causal reasoning systematically inform activities across the software lifecycle. It augments existing correlational AI practices (anomaly detection, predictive analytics, AIOps, LLM agents) by emphasizing explicit assumptions, uncertainty-aware effect estimates, and counterfactual diagnosis for interventional questions such as the impact of changing load-balancing strategies or release plans. The paper outlines (i) a causal-first workflow spanning development and operations, (ii) a staged roadmap for tools and organizational adoption, and (iii) an evaluation and benchmark agenda.

Significance. If the vision is realized, CSE could shift software engineering from pattern detection to reliable interventional and counterfactual reasoning, improving high-stakes decisions under uncertainty in socio-technical systems. As a vision and roadmap paper without empirical results or formal derivations, its primary contribution is framing an open research direction and evaluation agenda rather than delivering validated methods.

major comments (1)
  1. The central feasibility claim—that causal models can be practically constructed and validated from the noisy, incomplete, and socio-technical data typical in software engineering—is presented as an open challenge in the roadmap but is load-bearing for the entire proposal. No concrete identifiability conditions, data requirements, or integration mechanisms with existing SE artifacts (e.g., logs, issue trackers, code repositories) are specified, leaving the transition from correlational to causal inference underspecified.
minor comments (2)
  1. The abstract and workflow description reference LLM-based agents but do not clarify how causal reasoning would be integrated with or augment them; a brief illustrative example in the workflow section would improve clarity.
  2. The evaluation agenda section would benefit from explicit metrics or benchmark tasks that distinguish causal from correlational performance (e.g., counterfactual accuracy on synthetic SE scenarios).
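One possible reading of the "counterfactual accuracy on synthetic SE scenarios" metric mentioned in the second minor comment is sketched below. It is an illustration, not the paper's benchmark: the generating model, its constants, and both predictors are hypothetical. Because the scenarios are synthetic, the ground-truth counterfactual outcome is known for every unit, so a predictor can be scored on how often it gets that outcome right.

```python
import random

# Hypothetical generating model, known only to the benchmark and the SCM predictor.
EXPOSURE = {"big_bang": 1.0, "staged": 0.1}
DEFECT_IMPACT = 0.08
OUTAGE_THRESHOLD = 0.05


def outage(plan, u):
    return u + DEFECT_IMPACT * EXPOSURE[plan] > OUTAGE_THRESHOLD


def make_scenarios(n=2000, seed=1):
    """Return (observation, true counterfactual outage) pairs."""
    rng = random.Random(seed)
    scenarios = []
    for _ in range(n):
        u = rng.uniform(0.0, 0.06)  # latent background error rate
        plan = rng.choice(["big_bang", "staged"])
        other = "staged" if plan == "big_bang" else "big_bang"
        observation = {
            "plan": plan,
            "other_plan": other,
            "error_rate": u + DEFECT_IMPACT * EXPOSURE[plan],
        }
        scenarios.append((observation, outage(other, u)))
    return scenarios


def copy_factual(obs):
    """Correlational-style baseline: assume the outcome would not change."""
    return obs["error_rate"] > OUTAGE_THRESHOLD


def scm_predictor(obs):
    """Abduction, action, prediction using the (assumed correct) model."""
    u = obs["error_rate"] - DEFECT_IMPACT * EXPOSURE[obs["plan"]]
    return outage(obs["other_plan"], u)


def counterfactual_accuracy(predictor, scenarios):
    hits = sum(predictor(obs) == truth for obs, truth in scenarios)
    return hits / len(scenarios)


scenarios = make_scenarios()
baseline_acc = counterfactual_accuracy(copy_factual, scenarios)
scm_acc = counterfactual_accuracy(scm_predictor, scenarios)
print(f"baseline: {baseline_acc:.2f}, SCM-based: {scm_acc:.2f}")
```

The SCM-based predictor scores near-perfectly here only because it is handed the true generating model. A realistic benchmark would instead give it a learned or partially misspecified model, and the gap to the baseline would then measure how much causal structure was actually recovered.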

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and for recognizing the potential of Causal Software Engineering as a framing for future research. We address the single major comment below and propose targeted revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: The central feasibility claim—that causal models can be practically constructed and validated from the noisy, incomplete, and socio-technical data typical in software engineering—is presented as an open challenge in the roadmap but is load-bearing for the entire proposal. No concrete identifiability conditions, data requirements, or integration mechanisms with existing SE artifacts (e.g., logs, issue trackers, code repositories) are specified, leaving the transition from correlational to causal inference underspecified.

    Authors: We agree that the practical construction and validation of causal models from typical SE data constitutes a load-bearing challenge for the proposal. As a vision and roadmap paper, the manuscript deliberately positions this as an open research direction rather than a resolved capability, consistent with its stated contribution of outlining a paradigm and evaluation agenda. To reduce underspecification while remaining within the vision-paper scope, we will add a concise subsection in the roadmap that (i) sketches identifiability conditions adapted from causal inference literature (e.g., partial identification under bounded unobserved confounding in controlled deployment settings and use of instrumental variables from release policies), (ii) outlines minimal data requirements (e.g., timestamped logs with intervention markers and version-control provenance), and (iii) provides illustrative integration mechanisms with existing artifacts such as mapping issue-tracker metadata to causal graph nodes or using A/B-test infrastructure for effect estimation. These additions will clarify the transition without asserting solved feasibility. revision: partial
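As an illustration of the last integration mechanism named in the response, effect estimation on top of A/B-test infrastructure, the following sketch computes a difference in means with a normal-approximation 95% confidence interval. It is a hedged example rather than the authors' method: the latency samples are simulated, the function name `effect_with_ci` is hypothetical, and the estimate is causal only under the assumption that assignment to the two arms was randomized.

```python
import math
import random
import statistics


def effect_with_ci(treated, control, z=1.96):
    """Difference in means with a normal-approximation confidence interval.

    Interpretable as an average treatment effect only if units were
    randomly assigned to the treated and control arms.
    """
    diff = statistics.fmean(treated) - statistics.fmean(control)
    std_err = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return diff, (diff - z * std_err, diff + z * std_err)


# Simulated A/B logs (hypothetical): latency in ms per request.
rng = random.Random(42)
control = [rng.gauss(120, 15) for _ in range(500)]  # current release
treated = [rng.gauss(110, 15) for _ in range(500)]  # candidate release

diff, (low, high) = effect_with_ci(treated, control)
print(f"estimated effect: {diff:+.1f} ms, 95% CI [{low:+.1f}, {high:+.1f}]")
```

Reporting the interval alongside the point estimate is one simple form of the uncertainty-aware effect estimate the paper calls for. Observational logs without randomization would need the identifiability conditions sketched in point (i) of the response instead.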

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The manuscript is a forward-looking vision and roadmap paper that proposes Causal Software Engineering as a future paradigm without presenting any derivations, equations, fitted parameters, predictions, or formal results. Its content consists of a high-level workflow outline, adoption stages, and an evaluation agenda; no load-bearing claim reduces to an input by construction, self-definition, or self-citation chain. The central proposal is explicitly aspirational and defers feasibility questions, so no circularity patterns apply.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central proposal rests on the assumption that causal models are feasible in software engineering. It introduces no free parameters, and its only axiom and invented entity are the ones listed below: the domain assumption about correlational models and the CSE paradigm itself.

axioms (1)
  • domain assumption: Correlational models are insufficient for answering interventional and counterfactual questions in software engineering.
    Stated in the abstract as the motivation for moving beyond current AI-driven correlational approaches.
invented entities (1)
  • Causal Software Engineering (CSE) · no independent evidence
    purpose: A new paradigm for integrating causal reasoning into the software lifecycle.
    Introduced as the core proposal without prior existence or independent evidence.

pith-pipeline@v0.9.0 · 5485 in / 1238 out tokens · 33100 ms · 2026-05-08T17:47:07.356927+00:00 · methodology

