From Failure to Alignment: A Requirements Engineering Framework for Machine Learning Systems
Pith reviewed 2026-07-01 04:33 UTC · model grok-4.3
The pith
The REAL framework weaves requirements for data, models, and systems together, uses failure to explore alternatives, and applies iterative traceable refinement to align machine learning systems with stakeholder needs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a requirements engineering framework for machines that learn and fail, built on weaving data-model-system requirements, failure-driven exploration of alternatives, and iterative traceable refinement, enables machine learning systems to better satisfy stakeholder needs, as shown in the autonomous driving demonstration.
What carries the argument
The REAL framework, a model-based requirements engineering process defined by three principles for integrating and refining MLS requirements.
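The first principle, weaving data, model, and system requirements together, can be pictured as one refinement graph. The Python below is a minimal sketch under stated assumptions. The `Requirement` class, the three level names, the `untraced` helper, and the example requirements are all hypothetical. None of them come from the REAL paper or its replication package. The sketch flags data and model requirements that have no trace up to a system-level requirement, which is one simple way a woven model could expose gaps.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical requirement levels; the REAL metamodel itself may differ.
LEVELS = ("system", "model", "data")


@dataclass
class Requirement:
    rid: str
    level: str                    # one of LEVELS
    text: str
    parent: Optional[str] = None  # id of the requirement this one refines

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")


def untraced(reqs: List[Requirement]) -> List[str]:
    """Return ids of data/model requirements with no refinement path to a system requirement."""
    by_id = {r.rid: r for r in reqs}
    orphans = []
    for r in reqs:
        node, seen = r, set()
        # Walk up the refinement chain until a system-level requirement is reached.
        while node.level != "system":
            if node.parent not in by_id or node.rid in seen:  # dangling link or cycle
                orphans.append(r.rid)
                break
            seen.add(node.rid)
            node = by_id[node.parent]
    return orphans


# Illustrative requirements (hypothetical wording, not taken from the paper).
EXAMPLE = [
    Requirement("S1", "system", "The vehicle shall stop for pedestrians on a crossing."),
    Requirement("M1", "model", "The pedestrian detector shall meet the agreed recall target.", parent="S1"),
    Requirement("D1", "data", "Training data shall include night-time pedestrian scenes.", parent="M1"),
    Requirement("D2", "data", "Each image shall be labelled by two annotators."),
]

print(untraced(EXAMPLE))  # ['D2']
```

The annotation requirement `D2` is reported because it has no refinement link to any system requirement. Its contribution to stakeholder needs therefore cannot be traced.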
If this is right
- Weaving data, model, and system requirements together produces more complete specifications for MLS.
- Failure-driven exploration surfaces alternative requirements that reduce misalignment risks.
- Iterative traceable refinement supports ongoing verification that stakeholder needs are met.
- The autonomous driving case shows the framework can be applied to safety-critical MLS.
- A replication package allows others to test the same principles on additional systems.
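The second and third principles, failure-driven exploration and iterative traceable refinement, can be sketched in the same hedged spirit. In the Python below, `Failure`, `Alternative`, `explore`, and `RefinementLog` are hypothetical names. The fixed one-alternative-per-level rule in `explore` is a placeholder for whatever failure analysis a team actually uses. The choice among alternatives is passed in as a function that stands for a stakeholder decision. Each adopted requirement keeps a trace link back to the failure that motivated it and to the system requirement that failure violated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Failure:
    fid: str          # hypothetical failure id, e.g. from a simulation run
    description: str  # what went wrong
    violated: str     # id of the system requirement that was violated


@dataclass
class Alternative:
    level: str  # "data", "model", or "system"
    text: str


def explore(failure: Failure) -> List[Alternative]:
    """Propose one candidate requirement per level (a fixed, hypothetical rule)."""
    return [
        Alternative("data", f"Training data shall cover: {failure.description}"),
        Alternative("model", f"The model shall be re-tested on: {failure.description}"),
        Alternative("system", f"The system shall fall back safely when {failure.violated} is at risk"),
    ]


@dataclass
class RefinementLog:
    # Trace links: adopted requirement id -> (failure id, violated requirement id).
    links: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    adopted: Dict[str, Alternative] = field(default_factory=dict)

    def refine(self, failure: Failure,
               choose: Callable[[List[Alternative]], Alternative]) -> str:
        """One iteration: explore alternatives, apply the stakeholder choice, record the trace."""
        pick = choose(explore(failure))
        rid = f"R{len(self.adopted) + 1}"
        self.adopted[rid] = pick
        self.links[rid] = (failure.fid, failure.violated)
        return rid


# Usage: two failures observed in successive iterations. The first is addressed
# with the data-level alternative, the second with the system-level one.
log = RefinementLog()
log.refine(Failure("F1", "pedestrians at night", "S1"), lambda alts: alts[0])
log.refine(Failure("F2", "occluded pedestrians", "S1"), lambda alts: alts[-1])
print(log.links)  # {'R1': ('F1', 'S1'), 'R2': ('F2', 'S1')}
```

Each call to `refine` is one iteration. The trace links let a reviewer ask, for any adopted requirement, which observed failure justified it.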
Where Pith is reading between the lines
- The framework could be adapted to domains such as medical diagnostics where data-model-system alignment is equally critical.
- Organizations might embed REAL steps into existing agile or DevOps pipelines for MLS projects.
- Quantitative metrics for alignment, such as stakeholder satisfaction scores before and after applying REAL, would allow direct testing of the framework's impact.
- The approach leaves open how to scale the manual weaving and failure analysis steps to very large training datasets or complex multi-model systems.
Load-bearing premise
The three principles of weaving requirements, using failure to explore alternatives, and iterative refinement are sufficient to produce machine learning systems that better align with stakeholder needs.
What would settle it
Applying the REAL framework to an MLS project and finding no measurable improvement in stakeholder alignment or requirement satisfaction compared with a conventional development process would undermine the claim.
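One way to make "no measurable improvement" testable is a paired comparison. The same stakeholders score alignment for artefacts produced under a conventional process and under REAL. An exact sign-flip permutation test then checks whether the mean improvement could plausibly be zero. This is a minimal Python sketch under assumptions the paper does not make. The numeric scoring scheme, the one-sided test, and the function name `paired_sign_flip_test` are all hypothetical choices made for illustration.

```python
from itertools import product
from statistics import mean
from typing import Sequence, Tuple


def paired_sign_flip_test(conventional: Sequence[float],
                          real: Sequence[float]) -> Tuple[float, float]:
    """Return (mean paired improvement, one-sided exact p-value).

    Each position holds the same stakeholder's alignment score under a
    conventional process and under REAL. Under the null hypothesis of no
    effect, every paired difference is equally likely to be positive or
    negative, so all 2**n sign assignments are enumerated exactly.
    """
    if len(conventional) != len(real) or not real:
        raise ValueError("need two non-empty score sequences of equal length")
    diffs = [r - c for c, r in zip(conventional, real)]
    observed = mean(diffs)
    at_least_as_large = 0
    assignments = 0
    for signs in product((1, -1), repeat=len(diffs)):
        assignments += 1
        # statistics.mean rounds the exact sum, so exact ties with the observed value are counted.
        if mean(s * d for s, d in zip(signs, diffs)) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / assignments
```

With n stakeholders the loop visits 2**n assignments, so this exact version only suits small panels. For larger panels, a random sample of sign assignments is the usual substitute.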
Original abstract
Organisations designing, developing, and deploying machine learning systems (MLS) need to be able to check that these systems are trustworthy, and communicate this clearly to their stakeholders, be they different categories of users, engineers, or wider society. By focusing on stakeholders, Requirements Engineering is well positioned to drive the design and engineering of MLS that align with the needs of their stakeholders. Yet, we still need a systematic process for modelling and reasoning about requirements for MLS that is driven both by stakeholders' needs and constraints for MLS development. This paper proposes a framework entitled REAL (Requirements Engineering for mAchines that Learn - and Fail) to help develop MLS that align with stakeholders' needs by adopting a requirements engineering approach. This model-based framework is based on three principles. First, weaving together requirements for data, models, and the system as a whole. Second, using failure to drive the exploration of alternative requirements. Third, iterative and traceable refinement of MLS requirements. We demonstrate the proposed framework using an example from autonomous driving and show that REAL supports the development of MLS that better align with stakeholders' requirements. A replication package is available online.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes the REAL (Requirements Engineering for mAchines that Learn - and Fail) framework, a model-based requirements engineering approach for machine learning systems (MLS). It rests on three principles: weaving together data, model, and system requirements; using failure to drive exploration of alternative requirements; and iterative traceable refinement. The framework is demonstrated via a single autonomous driving example, with the claim that it supports development of MLS better aligned with stakeholders' needs. A replication package is provided.
Significance. If substantiated, the framework could offer a structured way to address alignment and trustworthiness challenges in MLS development within software engineering. The failure-driven and traceability aspects align with practical needs in iterative ML processes, and the replication package supports reproducibility.
major comments (2)
- [Demonstration section (autonomous driving example)] The claim that REAL supports MLS that 'better align with stakeholders' requirements' is based solely on one illustrative scenario. No quantitative alignment metrics, pre/post comparisons, stakeholder measures, or evaluation against baselines are supplied, rendering the sufficiency of the three principles untestable and the central claim unsupported.
- [Framework principles (abstract and § on REAL)] The assertion that the three principles are sufficient to produce better-aligned MLS lacks defined validation criteria, falsification tests, or counterexample domains. This is load-bearing, as the demonstration remains compatible with the principles being descriptive rather than causally effective.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below and indicate revisions where the feedback identifies areas for improvement in how claims are supported.
Point-by-point responses
- Referee: [Demonstration section (autonomous driving example)] The claim that REAL supports MLS that 'better align with stakeholders' requirements' is based solely on one illustrative scenario. No quantitative alignment metrics, pre/post comparisons, stakeholder measures, or evaluation against baselines are supplied, rendering the sufficiency of the three principles untestable and the central claim unsupported.
Authors: We agree that the demonstration consists of a single illustrative example without quantitative metrics, pre/post comparisons, or baseline evaluations. The manuscript is a framework proposal paper whose contribution lies in defining the three principles and showing their integrated application; it does not constitute an empirical validation study. We will revise the abstract, the demonstration section, and the conclusion to moderate the wording from 'show that REAL supports ... better align' to 'illustrate how REAL can be applied to support alignment', and we will add an explicit limitations and future-work subsection outlining the need for subsequent empirical studies that include metrics and multiple cases. Revision: yes.
- Referee: [Framework principles (abstract and § on REAL)] The assertion that the three principles are sufficient to produce better-aligned MLS lacks defined validation criteria, falsification tests, or counterexample domains. This is load-bearing, as the demonstration remains compatible with the principles being descriptive rather than causally effective.
Authors: The three principles are presented as the foundational elements of the REAL framework, motivated by existing RE literature and documented MLS challenges. The paper does not supply formal validation criteria, falsification tests, or counterexample domains because its scope is the definition and illustration of the integrated approach rather than a causal-effectiveness study. We will revise the framework section to include a short rationale subsection for each principle and an explicit statement that empirical validation (including tests in additional domains) remains future work. Revision: yes.
Circularity Check
No circularity: the framework proposal rests on stated principles and a single illustrative example
Full rationale
The paper introduces the REAL framework by enumerating three explicit principles (weaving data/model/system requirements, failure-driven exploration, iterative traceable refinement) and then applies them to one autonomous-driving scenario. No equations, fitted parameters, predictions, or derivation steps appear in the provided text. The demonstration is presented as an application of the principles rather than a reduction of any output to the inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claim therefore remains an independent proposal whose sufficiency is open to external evaluation rather than being tautological.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: requirements engineering is well positioned to drive the design of MLS that align with stakeholder needs.
invented entities (1)
- REAL framework (no independent evidence)