From Failure to Alignment: A Requirements Engineering Framework for Machine Learning Systems

Amel Bennaceur; Bashar Nuseibeh; Faeq Alrimawi; Gopi Krishnan Rajbahadur; Prince Mercy

arXiv: 2606.31589 · v1 · pith:SN6ZG2G3 · submitted 2026-06-30 · 💻 cs.SE · cs.LG

Pith reviewed 2026-07-01 04:33 UTC · model grok-4.3

classification 💻 cs.SE cs.LG

keywords: requirements engineering · machine learning systems · stakeholder alignment · failure analysis · autonomous driving · model-based framework · iterative refinement · trustworthy AI

The pith

The REAL framework weaves requirements for data, models, and systems together, uses failure to explore alternatives, and applies iterative traceable refinement to align machine learning systems with stakeholder needs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a model-based framework called REAL to develop machine learning systems that align with stakeholders' needs through a requirements engineering approach. It rests on three principles: combining requirements across data, models, and the full system; letting failures guide the search for alternative requirements; and carrying out iterative, traceable refinement. The framework is illustrated with an autonomous driving example that shows how these steps can produce better-aligned systems. A replication package accompanies the work to support further application.

Core claim

The central claim is that a requirements engineering framework for machines that learn and fail, built on weaving data-model-system requirements, failure-driven exploration of alternatives, and iterative traceable refinement, enables machine learning systems to better satisfy stakeholder needs, as shown in the autonomous driving demonstration.

What carries the argument

The REAL framework, a model-based requirements engineering process defined by three principles for integrating and refining MLS requirements.
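To make the first and third principles more concrete, the sketch below records requirements at the system, model, and data layers and links them with trace edges, so that a failed system-level requirement can be traced down to the model and data requirements that support it, and a revised data requirement can be traced up to everything that must be re-checked. It is an illustrative sketch under stated assumptions, not REAL's own notation or tooling: the `Requirement` class, the layer names, the requirement IDs, and the example texts (loosely based on the automated braking example of Figure 1, with invented 0.9 confidence and 30 m figures) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Layers woven together in this sketch (names are assumptions, not REAL's notation).
LAYERS = ("system", "model", "data")


@dataclass
class Requirement:
    rid: str
    layer: str
    text: str
    # Trace links: IDs of lower-layer requirements that refine this one.
    refined_by: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.layer not in LAYERS:
            raise ValueError(f"unknown layer: {self.layer}")


def trace_down(reqs: dict[str, Requirement], rid: str) -> list[str]:
    """Requirements reachable from `rid` via trace links (depth-first, each listed once)."""
    seen: list[str] = []
    stack = [rid]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Reverse so that children are visited in the order they were declared.
        stack.extend(reversed(reqs[current].refined_by))
    return seen


def trace_up(reqs: dict[str, Requirement], rid: str) -> list[str]:
    """Requirements that directly or transitively depend on `rid`.
    Assumes the trace graph is acyclic."""
    result: list[str] = []
    for parent in (r.rid for r in reqs.values() if rid in r.refined_by):
        for affected in [parent] + trace_up(reqs, parent):
            if affected not in result:
                result.append(affected)
    return result


# Hypothetical requirements for the braking example; thresholds are placeholders.
reqs = {
    "SYS-1": Requirement("SYS-1", "system", "Brake when a pedestrian is within the safety threshold", ["MOD-1"]),
    "MOD-1": Requirement("MOD-1", "model", "Detect pedestrians with confidence >= 0.9 up to 30 m", ["DAT-1", "DAT-2"]),
    "DAT-1": Requirement("DAT-1", "data", "Training data includes night-time scenes"),
    "DAT-2": Requirement("DAT-2", "data", "Training data includes partially occluded pedestrians"),
}

# A failure of SYS-1 (e.g. a missed pedestrian in simulation) points to the artefacts to revisit...
print(trace_down(reqs, "SYS-1"))  # → ['SYS-1', 'MOD-1', 'DAT-1', 'DAT-2']
# ...and revising DAT-1 flags which higher-level requirements must be re-checked.
print(trace_up(reqs, "DAT-1"))  # → ['MOD-1', 'SYS-1']
```

Keeping trace links explicit in both directions is one plausible way to make refinement iterative and traceable: each change can be followed to the requirements it may invalidate.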

If this is right

Weaving data, model, and system requirements together produces more complete specifications for MLS.
Failure-driven exploration surfaces alternative requirements that reduce misalignment risks.
Iterative traceable refinement supports ongoing verification that stakeholder needs are met.
The autonomous driving case shows the framework can be applied to safety-critical MLS.
A replication package allows others to test the same principles on additional systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be adapted to domains such as medical diagnostics where data-model-system alignment is equally critical.
Organizations might embed REAL steps into existing agile or DevOps pipelines for MLS projects.
Quantitative metrics for alignment, such as stakeholder satisfaction scores before and after applying REAL, would allow direct testing of the framework's impact.
The approach leaves open how to scale the manual weaving and failure analysis steps to very large training datasets or complex multi-model systems.

Load-bearing premise

The three principles of weaving requirements, using failure to explore alternatives, and iterative refinement are sufficient to produce machine learning systems that better align with stakeholder needs.

What would settle it

Applying the REAL framework to an MLS project and finding no measurable improvement in stakeholder alignment or requirement satisfaction compared with a conventional development process would undermine the claim.
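A minimal way to operationalise this test, assuming each project can report how many of its requirements stakeholders (or scenario-based tests) judge to be satisfied, is a two-proportion z-test on the satisfaction rates of a project developed with REAL and a conventional baseline. The z-test is a choice made for this sketch, not something the paper proposes; the function name and the counts below are hypothetical placeholders, not results.

```python
from math import sqrt


def two_proportion_z(sat_a: int, n_a: int, sat_b: int, n_b: int) -> float:
    """z statistic for H0: projects A and B satisfy requirements at the same rate.

    sat_x is the number of requirements judged satisfied, n_x the number assessed.
    Uses the pooled-variance normal approximation.
    """
    p_a, p_b = sat_a / n_a, sat_b / n_b
    pooled = (sat_a + sat_b) / (n_a + n_b)
    variance = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if variance == 0:
        raise ValueError("all or none of the requirements are satisfied; z is undefined")
    return (p_a - p_b) / sqrt(variance)


# Placeholder counts for illustration only (not data from the paper):
# A = project developed with REAL, B = conventional baseline.
z = two_proportion_z(sat_a=45, n_a=50, sat_b=35, n_b=50)
print(round(z, 2))  # → 2.5
```

A |z| above about 1.96 would indicate a difference at the 5% level (two-sided); a well-powered study that finds no such difference is what the criterion above would count against the claim. Requirements within a single project are not independent samples, so a real evaluation would more likely compare many projects.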

Figures

Figures reproduced from arXiv: 2606.31589 by Amel Bennaceur, Bashar Nuseibeh, Faeq Alrimawi, Gopi Krishnan Rajbahadur, Prince Mercy.

**Figure 1.** An automated braking system example. The perception component receives sensor inputs (e.g., image data), processes them using a trained object detection model, and outputs pedestrian classifications with associated confidence levels. The braking component activates the braking controller when a pedestrian is detected within a predefined safety threshold. While the braking logic is deterministic, the percep…

**Figure 2.** A sample KAOS model for safe pedestrian crossing.

**Figure 3.** Overview of the REAL Framework. …tal assumptions explicit, producing revised domain properties $D''$ clarifying the boundaries of the Operational Design Domain (ODD) under which satisfaction can be defensibly reasoned about: $D'', S \models R$. These mitigation actions may occur across data, model, system, and requirement layers, but are always evaluated against an explicitly scoped domain. After adaptation, the sat…

**Figure 4.** Identifying obstacles for the automated braking example.

**Figure 5.** Applying REAL to the automated braking system.
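The text accompanying Figure 3 appeals to the classic satisfaction argument of requirements engineering (Zave and Jackson): domain properties $D$ and specification $S$ must together entail the requirement $R$, written $D, S \models R$, and adaptation yields revised domain properties $D''$ that scope the ODD so that $D'', S \models R$ can be defended. The sketch below is a deliberately simplified, hypothetical propositional encoding of the Figure 1 braking example (the variables `ped`, `det`, `brake`, `in_odd` and every formula are assumptions of this sketch, not the paper's KAOS model). It checks entailment by enumerating all truth assignments and shows how making the ODD boundary explicit turns a failing argument into a valid one.

```python
from itertools import product

# Propositional variables (hypothetical encoding of the Figure 1 braking example):
#   ped    - a pedestrian is within the safety threshold in the vehicle's path
#   det    - the perception component reports a pedestrian
#   brake  - the braking controller is activated
#   in_odd - current conditions lie inside the Operational Design Domain
VARS = ("ped", "det", "brake", "in_odd")


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


def R(v: dict) -> bool:
    """Requirement: brake whenever a pedestrian is within the safety threshold."""
    return implies(v["ped"], v["brake"])


# Specification S of the deterministic braking component.
S = [lambda v: implies(v["det"], v["brake"])]

# D': perception is only assumed to detect pedestrians inside the ODD.
D_prime = [lambda v: implies(v["ped"] and v["in_odd"], v["det"])]

# D'': D' plus an explicit ODD boundary (the system is only operated inside the ODD).
D_double_prime = D_prime + [lambda v: v["in_odd"]]


def entails(premises, conclusion):
    """Brute-force check of premises |= conclusion over all truth assignments.
    Returns (True, None) or (False, first counterexample found)."""
    for values in product((False, True), repeat=len(VARS)):
        v = dict(zip(VARS, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False, v
    return True, None


print(entails(D_prime + S, R))  # → (False, {'ped': True, 'det': False, 'brake': False, 'in_odd': False})
print(entails(D_double_prime + S, R))  # → (True, None)
```

The counterexample (a pedestrian outside the ODD goes undetected and the vehicle does not brake) is the kind of obstacle the failure-driven step is meant to surface. Here it is resolved by scoping the domain rather than by assuming the model works everywhere; other resolutions, such as new training data or a system-level fallback, would change $D'$ or $S$ instead.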

Original abstract

Organisations designing, developing, and deploying machine learning systems (MLS) need to be able to check that these systems are trustworthy, and communicate this clearly to their stakeholders, be they different categories of users, engineers, or wider society. By focusing on stakeholders, Requirements Engineering is well positioned to drive the design and engineering of MLS that align with the needs of their stakeholders. Yet, we still need a systematic process for modelling and reasoning about requirements for MLS that is driven both by stakeholders' needs and constraints for MLS development. This paper proposes a framework entitled REAL (Requirements Engineering for mAchines that Learn - and Fail) to help develop MLS that align with stakeholders' needs by adopting a requirements engineering approach. This model-based framework is based on three principles. First, weaving together requirements for data, models, and the system as a whole. Second, using failure to drive the exploration of alternative requirements. Third, iterative and traceable refinement of MLS requirements. We demonstrate the proposed framework using an example from autonomous driving and show that REAL supports the development of MLS that better align with stakeholders' requirements. A replication package is available online.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

REAL framework combines three RE principles for ML systems and shows them in one autonomous driving example, but offers no metrics or comparisons to back the alignment claim.

The paper introduces REAL, a model-based framework for requirements engineering on machine learning systems. It rests on three principles: weaving data, model, and system requirements together; using failure to explore alternative requirements; and iterative traceable refinement. The authors apply it to an autonomous driving scenario and claim it helps produce systems that better match stakeholder needs.

The integration of requirements across data, models, and the full system is a useful framing that is not always explicit in ML development. Treating failure as a prompt for exploring options fits the reality of ML, where models often underperform in new contexts. The replication package is a practical step that lets others examine the details.

The main limitation is the evidence. The demonstration is a single illustrative case with no quantitative measure of alignment, no before-and-after comparison, and no check in another domain. Without those anchors it is difficult to judge whether the three principles are sufficient or simply restate sensible practice. The abstract also does not detail how the framework was derived or how it differs from prior RE work on ML.

This is for software engineers and requirements specialists working on deployed ML systems who want a structured process. A reader interested in ideas for stakeholder alignment could extract some value, but would have to supply their own validation.

It is worth sending to peer review so referees can assess the framework's scope and relation to existing literature.

Referee Report

2 major / 0 minor

Summary. The paper proposes the REAL (Requirements Engineering for mAchines that Learn - and Fail) framework, a model-based requirements engineering approach for machine learning systems (MLS). It rests on three principles: weaving together data, model, and system requirements; using failure to drive exploration of alternative requirements; and iterative traceable refinement. The framework is demonstrated via a single autonomous driving example, with the claim that it supports development of MLS better aligned with stakeholders' needs. A replication package is provided.

Significance. If substantiated, the framework could offer a structured way to address alignment and trustworthiness challenges in MLS development within software engineering. The failure-driven and traceability aspects align with practical needs in iterative ML processes, and the replication package supports reproducibility.

major comments (2)

[Demonstration section (autonomous driving example)] The claim that REAL supports MLS that 'better align with stakeholders' requirements' is based solely on one illustrative scenario. No quantitative alignment metrics, pre/post comparisons, stakeholder measures, or evaluation against baselines are supplied, rendering the sufficiency of the three principles untestable and the central claim unsupported.
[Framework principles (abstract and § on REAL)] The assertion that the three principles are sufficient to produce better-aligned MLS lacks defined validation criteria, falsification tests, or counterexample domains. This is load-bearing, as the demonstration remains compatible with the principles being descriptive rather than causally effective.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and indicate revisions where the feedback identifies areas for improvement in how claims are supported.

Referee: [Demonstration section (autonomous driving example)] The claim that REAL supports MLS that 'better align with stakeholders' requirements' is based solely on one illustrative scenario. No quantitative alignment metrics, pre/post comparisons, stakeholder measures, or evaluation against baselines are supplied, rendering the sufficiency of the three principles untestable and the central claim unsupported.

Authors: We agree that the demonstration consists of a single illustrative example without quantitative metrics, pre/post comparisons, or baseline evaluations. The manuscript is a framework proposal paper whose contribution lies in defining the three principles and showing their integrated application; it does not constitute an empirical validation study. We will revise the abstract, the demonstration section, and the conclusion to moderate the wording from 'show that REAL supports ... better align' to 'illustrate how REAL can be applied to support alignment', and we will add an explicit limitations and future-work subsection outlining the need for subsequent empirical studies that include metrics and multiple cases. Revision: yes.

Referee: [Framework principles (abstract and § on REAL)] The assertion that the three principles are sufficient to produce better-aligned MLS lacks defined validation criteria, falsification tests, or counterexample domains. This is load-bearing, as the demonstration remains compatible with the principles being descriptive rather than causally effective.

Authors: The three principles are presented as the foundational elements of the REAL framework, motivated by existing RE literature and documented MLS challenges. The paper does not supply formal validation criteria, falsification tests, or counterexample domains because its scope is the definition and illustration of the integrated approach rather than a causal-effectiveness study. We will revise the framework section to include a short rationale subsection for each principle and an explicit statement that empirical validation (including tests in additional domains) remains future work. Revision: yes.

Circularity Check

0 steps flagged

No circularity: framework proposal rests on stated principles and single illustrative example

The paper introduces the REAL framework by enumerating three explicit principles (weaving data/model/system requirements, failure-driven exploration, iterative traceable refinement) and then applies them to one autonomous-driving scenario. No equations, fitted parameters, predictions, or derivation steps appear in the provided text. The demonstration is presented as an application of the principles rather than a reduction of any output to the inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claim therefore remains an independent proposal whose sufficiency is open to external evaluation rather than being tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that requirements engineering methods can be directly adapted to ML systems and that failure analysis provides a reliable driver for requirement improvement; no free parameters or invented entities beyond the framework itself are described in the abstract.

axioms (1)

Domain assumption: Requirements engineering is well positioned to drive the design of MLS that align with stakeholder needs.
Stated in the abstract as the positioning for the framework.

invented entities (1)

REAL framework (no independent evidence)
Purpose: to provide a systematic process for modelling and reasoning about requirements for MLS.
The framework is the main proposed contribution introduced in the abstract.

Reference graph

Works this paper leans on

53 extracted references · 15 canonical work pages · 1 internal anchor

[1]

Teaching Software Engineering for Al- Enabled Systems,

C. K ¨astner and E. Kang, “Teaching Software Engineering for Al- Enabled Systems,” in2020 IEEE/ACM 42nd International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET), Oct. 2020, pp. 45–48

2020
[2]

Software Engineering for Ma- chine Learning: A Case Study,

S. Amershi, A. Begel, C. Bird, R. DeLine, H. Gall, E. Kamar, N. Na- gappan, B. Nushi, and T. Zimmermann, “Software Engineering for Ma- chine Learning: A Case Study,” in2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), May 2019, pp. 291–300

2019
[3]

Underspecification presents challenges for credibility in modern machine learning,

A. D’Amour, K. Heller, D. Moldovan, B. Adlam, B. Alipanahi, A. Beutel, C. Chen, J. Deaton, J. Eisenstein, M. D. Hoffman, F. Hormozdiari, N. Houlsby, S. Hou, G. Jerfel, A. Karthikesalingam, M. Lucic, Y . Ma, C. McLean, D. Mincu, A. Mitani, A. Montanari, Z. Nado, V . Natarajan, C. Nielson, T. F. Osborne, R. Raman, K. Ramasamy, R. Sayres, J. Schrouff, M. Sen...

work page arXiv 2020
[4]

Towards best practices in agi safety and governance: A survey of expert opinion,

J. Schuett, N. Dreksler, M. Anderljung, D. McCaffary, L. Heim, E. Bluemke, and B. Garfinkel, “Towards best practices in agi safety and governance: A survey of expert opinion,”arXiv preprint arXiv:2305.07153, 2023

work page arXiv 2023
[5]

Require- ments practices and gaps when engineering human-centered artificial intelligence systems,

K. Ahmad, M. Abdelrazek, C. Arora, M. Bano, and J. Grundy, “Require- ments practices and gaps when engineering human-centered artificial intelligence systems,”Appl. Soft Comput., vol. 143, p. 110421, 2023

2023
[6]

Requirements engineering,

A. Bennaceur, T. T. Tun, Y . Yu, and B. Nuseibeh, “Requirements engineering,” inHandbook of Software Engineering, 2019, pp. 51–92. [Online]. Available: https://doi.org/10.1007/978-3-030-00262-6 2

work page doi:10.1007/978-3-030-00262-6 2019
[7]

Four dark corners of requirements engineer- ing,

P. Zave and M. Jackson, “Four dark corners of requirements engineer- ing,”ACM Trans. Softw. Eng. Methodol., vol. 6, no. 1, pp. 1–30, 1997

1997
[8]

Failure modes in machine learning systems,

R. S. S. Kumar, D. R. O’Brien, K. Albert, S. Vilj ¨oen, and J. Snover, “Failure modes in machine learning systems,”CoRR, vol. abs/1911.11034, 2019

work page arXiv 1911
[9]

Machine learning testing: Survey, landscapes and horizons,

J. M. Zhang, M. Harman, L. Ma, and Y . Liu, “Machine learning testing: Survey, landscapes and horizons,”IEEE Trans. Software Eng., vol. 48, no. 2, pp. 1–36, 2022

2022
[10]

Test generation strategies for building failure models and explaining spurious failures,

B. A. Jodat, A. Chandar, S. Nejati, and M. Sabetzadeh, “Test generation strategies for building failure models and explaining spurious failures,” ACM Trans. Softw. Eng. Methodol., vol. 33, no. 4, pp. 93:1–93:32, 2024

2024
[11]

Simulator- based explanation and debugging of hazard-triggering events in dnn-based safety-critical systems,

H. M. Fahmy, F. Pastore, L. C. Briand, and T. Stifter, “Simulator- based explanation and debugging of hazard-triggering events in dnn-based safety-critical systems,”ACM Trans. Softw. Eng. Methodol., vol. 32, no. 4, pp. 104:1–104:47, 2023. [Online]. Available: https://doi.org/10.1145/3569935

work page doi:10.1145/3569935 2023
[12]

Ai engineering to deploy reliable ai in industry,

J. Mattioli, X. Le Roux, B. Braunschweig, L. Cantat, F. Tschirhart, B. Robert, R. Gelin, and Y . Nicolas, “Ai engineering to deploy reliable ai in industry,” in2023 Fifth International Conference on Transdisciplinary AI (TransAI). IEEE, 2023, pp. 228–231

2023
[13]

Landscape of requirements engineering for machine learning-based AI systems,

N. Yoshioka, J. H. Husen, H. T. Tun, Z. Chen, H. Washizaki, and Y . Fukazawa, “Landscape of requirements engineering for machine learning-based AI systems,” inAPSEC Workshops. IEEE, 2021, pp. 5–8

2021
[14]

Requirements engineering for machine learning: A review and reflection,

Z. Pei, L. Liu, C. Wang, and J. Wang, “Requirements engineering for machine learning: A review and reflection,” inRE Workshops. IEEE, 2022, pp. 166–175

2022
[15]

Requirements engi- neering for machine learning: A systematic mapping study,

H. Villamizar, T. Escovedo, and M. Kalinowski, “Requirements engi- neering for machine learning: A systematic mapping study,” inSEAA. IEEE, 2021, pp. 29–36

2021
[16]

Modeling machine learning requirements from three perspectives: a case report from the healthcare domain,

S. Nalchigar, E. Yu, and K. Keshavjee, “Modeling machine learning requirements from three perspectives: a case report from the healthcare domain,”Requirements Engineering, pp. 1–18, Jan. 2021, company: Springer Distributor: Springer Institution: Springer Label: Springer Publisher: Springer London. [Online]. Available: http://link.springer.com/article/10.1...

work page doi:10.1007/s00766-020-00343-z 2021
[17]

Towards Artefact-based Requirements Engineering for Data-Centric Systems,

T. Chuprina, D. Mendez, and K. Wnuk, “Towards Artefact-based Requirements Engineering for Data-Centric Systems,” Mar. 2021. [Online]. Available: http://arxiv.org/abs/2103.05233v1

work page arXiv 2021
[18]

Requirements engineering framework for human-centered artificial intelligence software systems,

K. Ahmad, M. Abdelrazek, C. Arora, A. A. Baniya, M. Bano, and J. Grundy, “Requirements engineering framework for human-centered artificial intelligence software systems,”Appl. Soft Comput., vol. 143, p. 110455, 2023

2023
[19]

Anunnaki: A modular framework for developing trusted artificial intelligence,

M. A. Langford, S. Zilberman, and B. H. C. Cheng, “Anunnaki: A modular framework for developing trusted artificial intelligence,”ACM Trans. Auton. Adapt. Syst., vol. 19, no. 3, pp. 17:1–17:34, 2024. [Online]. Available: https://doi.org/10.1145/3649453

work page doi:10.1145/3649453 2024
[20]

[Online]

Online - accessed June 2026. [Online]. Available: https://github.com/ ApolloAuto/apollo

2026
[21]

[Online]

Online - accessed June 2026. [Online]. Available: https://autoware.org/

2026
[22]

Requirements engineering: from craft to discipline,

A. van Lamsweerde, “Requirements engineering: from craft to discipline,” inProceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2008, Atlanta, Georgia, USA, November 9-14, 2008, 2008, pp. 238–249. [Online]. Available: http://doi.acm.org/10.1145/1453101.1453133

work page doi:10.1145/1453101.1453133 2008
[23]

Grammatical evolution,

M. O’Neill and C. Ryan, “Grammatical evolution,”IEEE Trans. Evol. Comput., vol. 5, no. 4, pp. 349–358, 2001. [Online]. Available: https://doi.org/10.1109/4235.942529

work page doi:10.1109/4235.942529 2001
[24]

[Online]

Online - accessed June 2026. [Online]. Available: https://github.com/ microsoft/responsible-ai-toolbox

2026
[25]

[Online]

Online - accessed June 2026. [Online]. Available: https://www-cdn. anthropic.com/e670587677525f28df69b59e5fb4c22cc5461a17.pdf

2026
[26]

[Online]

Online - accessed June 2026. [Online]. Available: https://pair. withgoogle.com/guidebook/chapters

2026
[27]

Handling obstacles in goal- oriented requirements engineering,

A. van Lamsweerde and E. Letier, “Handling obstacles in goal- oriented requirements engineering,”IEEE Trans. Software Eng., vol. 26, no. 10, pp. 978–1005, 2000. [Online]. Available: https: //doi.org/10.1109/32.879820

work page doi:10.1109/32.879820 2000
[28]

Obstacle analysis in requirements engineering: Retrospective and emerging challenges,

E. Letier and A. van Lamsweerde, “Obstacle analysis in requirements engineering: Retrospective and emerging challenges,”IEEE Trans. Software Eng., vol. 51, no. 3, pp. 795–801, 2025. [Online]. Available: https://doi.org/10.1109/TSE.2025.3534318

work page doi:10.1109/tse.2025.3534318 2025
[29]

Generating obstacle conditions for requirements completeness,

D. Alrajeh, J. Kramer, A. van Lamsweerde, A. Russo, and S. Uchitel, “Generating obstacle conditions for requirements completeness,” in ICSE. IEEE Computer Society, 2012, pp. 705–715

2012
[30]

Explaining and Harnessing Adversarial Examples

I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” inProc. of ICLR, Y . Bengio and Y . LeCun, Eds., 2015. [Online]. Available: http://arxiv.org/abs/1412.6572

work page internal anchor Pith review Pith/arXiv arXiv 2015
[31]

Assuring the machine learning lifecycle: Desiderata, methods, and challenges,

R. Ashmore, R. Calinescu, and C. Paterson, “Assuring the machine learning lifecycle: Desiderata, methods, and challenges,”ACM Comput. Surv., vol. 54, no. 5, pp. 111:1–111:39, 2022

2022
[32]

Dynasto: Validity-aware dynamic–static parameter optimization for autonomous driving testing,

D. Humeniuk, M. Hamdaqa, H. B. Braiek, A. Bennaceur, and F. Khomh, “Dynasto: Validity-aware dynamic–static parameter optimization for autonomous driving testing,” in19th IEEE International Conference on Software Testing, Verification and Validation, ICST 2026, 2026

2026
[33]

A conceptual framework for resilience: fundamental definitions, strategies and metrics,

J. Andersson, V . Grassi, R. Mirandola, and D. Perez-Palacin, “A conceptual framework for resilience: fundamental definitions, strategies and metrics,”Computing, Dec. 2020. [Online]. Available: https://doi.org/10.1007/s00607-020-00874-x

work page doi:10.1007/s00607-020-00874-x 2020
[34]

CARLA: an open urban driving simulator,

A. Dosovitskiy, G. Ros, F. Codevilla, A. M. L ´opez, and V . Koltun, “CARLA: an open urban driving simulator,” in1st Annual Conference on Robot Learning, CoRL 2017, Mountain View, California, USA, November 13-15, 2017, Proceedings, ser. Proceedings of Machine Learning Research, vol. 78. PMLR, 2017, pp. 1–16. [Online]. Available: http://proceedings.mlr.pre...

2017
[35]

Scenic: a language for scenario specification and data generation,

D. J. Fremont, E. Kim, T. Dreossi, S. Ghosh, X. Yue, A. L. Sangiovanni-Vincentelli, and S. A. Seshia, “Scenic: a language for scenario specification and data generation,”Mach. Learn., vol. 112, no. 10, pp. 3805–3849, 2023. [Online]. Available: https://doi.org/10.1007/s10994-021-06120-5

work page doi:10.1007/s10994-021-06120-5 2023
[36]

Grape: grammatical algorithms in python for evolution,

A. de Lima, S. Carvalho, D. M. Dias, E. Naredo, J. P. Sullivan, and C. Ryan, “Grape: grammatical algorithms in python for evolution,” Signals, vol. 3, no. 3, pp. 642–663, 2022

2022
[37]

CARLA-BSP: a simulated dataset with pedestrians,

M. Wielgosz, A. M. L ´opez, and M. N. Riaz, “CARLA-BSP: a simulated dataset with pedestrians,” May 2023

2023
[38]

Requirements engineering and large language models: In- sights from a panel,

M. Borg, “Requirements engineering and large language models: In- sights from a panel,”IEEE Softw., vol. 41, no. 2, pp. 6–10, 2024

2024
[39]

Rethinking software engineering in the era of foundation models: A curated catalogue of challenges in the development of trustworthy fmware,

A. E. Hassan, D. Lin, G. K. Rajbahadur, K. Gallaba, F. R. C ˆogo, B. Chen, H. Zhang, K. Thangarajah, G. A. Oliva, J. J. Lin, W. M. Abdullah, and Z. M. J. Jiang, “Rethinking software engineering in the era of foundation models: A curated catalogue of challenges in the development of trustworthy fmware,” inSIGSOFT FSE Companion. ACM, 2024, pp. 294–305

2024
[40]

Requirements engineering for trustworthy human-ai synergy in software engineering 2.0,

D. Lo, “Requirements engineering for trustworthy human-ai synergy in software engineering 2.0,” inRE. IEEE, 2024, pp. 3–4

2024
[41]

If a human can see it, so should your system: Reliability requirements for machine vision components,

B. C. Hu, L. Marsso, K. Czarnecki, R. Salay, H. Shen, and M. Chechik, “If a human can see it, so should your system: Reliability requirements for machine vision components,” inICSE. ACM, 2022, pp. 1145–1156

2022
[42]

Trustworthy and synergistic artificial intelligence for software engineering: Vision and roadmaps,

D. Lo, “Trustworthy and synergistic artificial intelligence for software engineering: Vision and roadmaps,” inICSE-FoSE. IEEE, 2023, pp. 69–85

2023
[43]

The IDEA of us: An identity-aware architecture for autonomous systems,

C. Gavidia-Calderon, A. Kordoni, A. Bennaceur, M. Levine, and B. Nu- seibeh, “The IDEA of us: An identity-aware architecture for autonomous systems,”ACM Trans. Softw. Eng. Methodol., vol. 33, no. 6, p. 164, 2024

2024
[44]

Human value requirements in AI systems: Empirical analysis of amazon alexa,

R. A. Shams, M. Bano, D. Zowghi, Q. Lu, and J. Whittle, “Human value requirements in AI systems: Empirical analysis of amazon alexa,” inREW. IEEE, 2023, pp. 138–145

2023
[45]

Feel it, code it: Emotional goal modelling for gender-inclusive design,

D. Hassett, A. Bennaceur, and B. Nuseibeh, “Feel it, code it: Emotional goal modelling for gender-inclusive design,” inREFSQ, ser. Lecture Notes in Computer Science, vol. 13975. Springer, 2023, pp. 324–336

2023
[46]

Val- ues@runtime: An adaptive framework for operationalising values,

A. Bennaceur, D. Hassett, B. Nuseibeh, and A. Zisman, “Val- ues@runtime: An adaptive framework for operationalising values,” in ICSE (SEIS). IEEE, 2023, pp. 175–179

2023
[47]

A. C. Edmondson,Right kind of wrong: The science of failing well. London: Cornerstone Press., 2023

2023
[48]

Tailoring requirements engineering for responsible AI,

W. Maalej, Y . D. Pham, and L. Chazette, “Tailoring requirements engineering for responsible AI,”Computer, vol. 56, no. 4, pp. 18–27, 2023

2023
[49]

Design patterns for machine learning- based systems with humans in the loop,

J. S. Andersen and W. Maalej, “Design patterns for machine learning- based systems with humans in the loop,”IEEE Softw., vol. 41, no. 4, pp. 151–159, 2024

2024
[50]

Quality issues in machine learning software systems,

P. C ˆot´e, A. Nikanjam, R. Bouchoucha, I. Basta, M. Abidi, and F. Khomh, “Quality issues in machine learning software systems,”Empir. Softw. Eng., vol. 29, no. 6, p. 149, 2024

2024
[51]

The safety of autonomy: A systematic approach,

J. A. McDermid, R. Calinescu, I. Habli, R. Hawkins, Y . Jia, J. Molloy, M. Osborne, C. Paterson, Z. Porter, and P. R. Conmy, “The safety of autonomy: A systematic approach,”Computer, vol. 57, no. 4, pp. 16–25, 2024

2024
[52]

Controller synthesis for autonomous systems with deep-learning perception components,

R. Calinescu, C. Imrie, R. Mangal, G. N. Rodrigues, C. S. Pasareanu, M. A. Santana, and G. V ´azquez, “Controller synthesis for autonomous systems with deep-learning perception components,”IEEE Trans. Soft- ware Eng., vol. 50, no. 6, pp. 1374–1395, 2024

2024
[53]

Addressing the IEEE A V test challenge with scenic and verifai,

K. Viswanadha, F. Indaheng, J. Wong, E. Kim, E. Kalvan, Y . Pant, D. J. Fremont, and S. A. Seshia, “Addressing the IEEE A V test challenge with scenic and verifai,” inAITest. IEEE, 2021, pp. 136–142

2021

[1] [1]

Teaching Software Engineering for Al- Enabled Systems,

C. K ¨astner and E. Kang, “Teaching Software Engineering for Al- Enabled Systems,” in2020 IEEE/ACM 42nd International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET), Oct. 2020, pp. 45–48

2020

[2] [2]

Software Engineering for Ma- chine Learning: A Case Study,

S. Amershi, A. Begel, C. Bird, R. DeLine, H. Gall, E. Kamar, N. Na- gappan, B. Nushi, and T. Zimmermann, “Software Engineering for Ma- chine Learning: A Case Study,” in2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), May 2019, pp. 291–300

2019

[3] [3]

Underspecification presents challenges for credibility in modern machine learning,

A. D’Amour, K. Heller, D. Moldovan, B. Adlam, B. Alipanahi, A. Beutel, C. Chen, J. Deaton, J. Eisenstein, M. D. Hoffman, F. Hormozdiari, N. Houlsby, S. Hou, G. Jerfel, A. Karthikesalingam, M. Lucic, Y . Ma, C. McLean, D. Mincu, A. Mitani, A. Montanari, Z. Nado, V . Natarajan, C. Nielson, T. F. Osborne, R. Raman, K. Ramasamy, R. Sayres, J. Schrouff, M. Sen...

work page arXiv 2020

[4] [4]

Towards best practices in agi safety and governance: A survey of expert opinion,

J. Schuett, N. Dreksler, M. Anderljung, D. McCaffary, L. Heim, E. Bluemke, and B. Garfinkel, “Towards best practices in agi safety and governance: A survey of expert opinion,”arXiv preprint arXiv:2305.07153, 2023

work page arXiv 2023

[5] [5]

Require- ments practices and gaps when engineering human-centered artificial intelligence systems,

K. Ahmad, M. Abdelrazek, C. Arora, M. Bano, and J. Grundy, “Require- ments practices and gaps when engineering human-centered artificial intelligence systems,”Appl. Soft Comput., vol. 143, p. 110421, 2023

2023

[6] A. Bennaceur, T. T. Tun, Y. Yu, and B. Nuseibeh, "Requirements engineering," in Handbook of Software Engineering, 2019, pp. 51–92. [Online]. Available: https://doi.org/10.1007/978-3-030-00262-6_2

[7] P. Zave and M. Jackson, "Four dark corners of requirements engineering," ACM Trans. Softw. Eng. Methodol., vol. 6, no. 1, pp. 1–30, 1997.

[8] R. S. S. Kumar, D. R. O'Brien, K. Albert, S. Viljöen, and J. Snover, "Failure modes in machine learning systems," CoRR, vol. abs/1911.11034, 2019.

[9] J. M. Zhang, M. Harman, L. Ma, and Y. Liu, "Machine learning testing: Survey, landscapes and horizons," IEEE Trans. Software Eng., vol. 48, no. 2, pp. 1–36, 2022.

[10] B. A. Jodat, A. Chandar, S. Nejati, and M. Sabetzadeh, "Test generation strategies for building failure models and explaining spurious failures," ACM Trans. Softw. Eng. Methodol., vol. 33, no. 4, pp. 93:1–93:32, 2024.

[11] H. M. Fahmy, F. Pastore, L. C. Briand, and T. Stifter, "Simulator-based explanation and debugging of hazard-triggering events in DNN-based safety-critical systems," ACM Trans. Softw. Eng. Methodol., vol. 32, no. 4, pp. 104:1–104:47, 2023. [Online]. Available: https://doi.org/10.1145/3569935

[12] J. Mattioli, X. Le Roux, B. Braunschweig, L. Cantat, F. Tschirhart, B. Robert, R. Gelin, and Y. Nicolas, "AI engineering to deploy reliable AI in industry," in 2023 Fifth International Conference on Transdisciplinary AI (TransAI). IEEE, 2023, pp. 228–231.

[13] N. Yoshioka, J. H. Husen, H. T. Tun, Z. Chen, H. Washizaki, and Y. Fukazawa, "Landscape of requirements engineering for machine learning-based AI systems," in APSEC Workshops. IEEE, 2021, pp. 5–8.

[14] Z. Pei, L. Liu, C. Wang, and J. Wang, "Requirements engineering for machine learning: A review and reflection," in RE Workshops. IEEE, 2022, pp. 166–175.

[15] H. Villamizar, T. Escovedo, and M. Kalinowski, "Requirements engineering for machine learning: A systematic mapping study," in SEAA. IEEE, 2021, pp. 29–36.

[16] S. Nalchigar, E. Yu, and K. Keshavjee, "Modeling machine learning requirements from three perspectives: a case report from the healthcare domain," Requirements Engineering, pp. 1–18, Jan. 2021. [Online]. Available: https://doi.org/10.1007/s00766-020-00343-z

[17] T. Chuprina, D. Mendez, and K. Wnuk, "Towards Artefact-based Requirements Engineering for Data-Centric Systems," Mar. 2021. [Online]. Available: http://arxiv.org/abs/2103.05233v1

[18] K. Ahmad, M. Abdelrazek, C. Arora, A. A. Baniya, M. Bano, and J. Grundy, "Requirements engineering framework for human-centered artificial intelligence software systems," Appl. Soft Comput., vol. 143, p. 110455, 2023.

[19] M. A. Langford, S. Zilberman, and B. H. C. Cheng, "Anunnaki: A modular framework for developing trusted artificial intelligence," ACM Trans. Auton. Adapt. Syst., vol. 19, no. 3, pp. 17:1–17:34, 2024. [Online]. Available: https://doi.org/10.1145/3649453

[20] [Online]. Available: https://github.com/ApolloAuto/apollo (accessed June 2026).

[21] [Online]. Available: https://autoware.org/ (accessed June 2026).

[22] A. van Lamsweerde, "Requirements engineering: from craft to discipline," in Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Atlanta, Georgia, USA, November 9–14, 2008, pp. 238–249. [Online]. Available: http://doi.acm.org/10.1145/1453101.1453133

[23] M. O'Neill and C. Ryan, "Grammatical evolution," IEEE Trans. Evol. Comput., vol. 5, no. 4, pp. 349–358, 2001. [Online]. Available: https://doi.org/10.1109/4235.942529

[24] [Online]. Available: https://github.com/microsoft/responsible-ai-toolbox (accessed June 2026).

[25] [Online]. Available: https://www-cdn.anthropic.com/e670587677525f28df69b59e5fb4c22cc5461a17.pdf (accessed June 2026).

[26] [Online]. Available: https://pair.withgoogle.com/guidebook/chapters (accessed June 2026).

[27] A. van Lamsweerde and E. Letier, "Handling obstacles in goal-oriented requirements engineering," IEEE Trans. Software Eng., vol. 26, no. 10, pp. 978–1005, 2000. [Online]. Available: https://doi.org/10.1109/32.879820

[28] E. Letier and A. van Lamsweerde, "Obstacle analysis in requirements engineering: Retrospective and emerging challenges," IEEE Trans. Software Eng., vol. 51, no. 3, pp. 795–801, 2025. [Online]. Available: https://doi.org/10.1109/TSE.2025.3534318

[29] D. Alrajeh, J. Kramer, A. van Lamsweerde, A. Russo, and S. Uchitel, "Generating obstacle conditions for requirements completeness," in ICSE. IEEE Computer Society, 2012, pp. 705–715.

[30] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," in Proc. of ICLR, Y. Bengio and Y. LeCun, Eds., 2015. [Online]. Available: http://arxiv.org/abs/1412.6572

[31] R. Ashmore, R. Calinescu, and C. Paterson, "Assuring the machine learning lifecycle: Desiderata, methods, and challenges," ACM Comput. Surv., vol. 54, no. 5, pp. 111:1–111:39, 2022.

[32] D. Humeniuk, M. Hamdaqa, H. B. Braiek, A. Bennaceur, and F. Khomh, "Dynasto: Validity-aware dynamic–static parameter optimization for autonomous driving testing," in 19th IEEE International Conference on Software Testing, Verification and Validation, ICST 2026, 2026.

[33] J. Andersson, V. Grassi, R. Mirandola, and D. Perez-Palacin, "A conceptual framework for resilience: fundamental definitions, strategies and metrics," Computing, Dec. 2020. [Online]. Available: https://doi.org/10.1007/s00607-020-00874-x

[34] A. Dosovitskiy, G. Ros, F. Codevilla, A. M. López, and V. Koltun, "CARLA: an open urban driving simulator," in 1st Annual Conference on Robot Learning, CoRL 2017, Mountain View, California, USA, November 13–15, 2017, Proceedings, ser. Proceedings of Machine Learning Research, vol. 78. PMLR, 2017, pp. 1–16. [Online]. Available: http://proceedings.mlr.pre...

[35] D. J. Fremont, E. Kim, T. Dreossi, S. Ghosh, X. Yue, A. L. Sangiovanni-Vincentelli, and S. A. Seshia, "Scenic: a language for scenario specification and data generation," Mach. Learn., vol. 112, no. 10, pp. 3805–3849, 2023. [Online]. Available: https://doi.org/10.1007/s10994-021-06120-5

[36] A. de Lima, S. Carvalho, D. M. Dias, E. Naredo, J. P. Sullivan, and C. Ryan, "Grape: grammatical algorithms in Python for evolution," Signals, vol. 3, no. 3, pp. 642–663, 2022.

[37] M. Wielgosz, A. M. López, and M. N. Riaz, "CARLA-BSP: a simulated dataset with pedestrians," May 2023.

[38] M. Borg, "Requirements engineering and large language models: Insights from a panel," IEEE Softw., vol. 41, no. 2, pp. 6–10, 2024.

[39] A. E. Hassan, D. Lin, G. K. Rajbahadur, K. Gallaba, F. R. Côgo, B. Chen, H. Zhang, K. Thangarajah, G. A. Oliva, J. J. Lin, W. M. Abdullah, and Z. M. J. Jiang, "Rethinking software engineering in the era of foundation models: A curated catalogue of challenges in the development of trustworthy FMware," in SIGSOFT FSE Companion. ACM, 2024, pp. 294–305.

[40] D. Lo, "Requirements engineering for trustworthy human-AI synergy in software engineering 2.0," in RE. IEEE, 2024, pp. 3–4.

[41] B. C. Hu, L. Marsso, K. Czarnecki, R. Salay, H. Shen, and M. Chechik, "If a human can see it, so should your system: Reliability requirements for machine vision components," in ICSE. ACM, 2022, pp. 1145–1156.

[42] D. Lo, "Trustworthy and synergistic artificial intelligence for software engineering: Vision and roadmaps," in ICSE-FoSE. IEEE, 2023, pp. 69–85.

[43] C. Gavidia-Calderon, A. Kordoni, A. Bennaceur, M. Levine, and B. Nuseibeh, "The IDEA of us: An identity-aware architecture for autonomous systems," ACM Trans. Softw. Eng. Methodol., vol. 33, no. 6, p. 164, 2024.

[44] R. A. Shams, M. Bano, D. Zowghi, Q. Lu, and J. Whittle, "Human value requirements in AI systems: Empirical analysis of Amazon Alexa," in REW. IEEE, 2023, pp. 138–145.

[45] D. Hassett, A. Bennaceur, and B. Nuseibeh, "Feel it, code it: Emotional goal modelling for gender-inclusive design," in REFSQ, ser. Lecture Notes in Computer Science, vol. 13975. Springer, 2023, pp. 324–336.

[46] A. Bennaceur, D. Hassett, B. Nuseibeh, and A. Zisman, "Values@runtime: An adaptive framework for operationalising values," in ICSE (SEIS). IEEE, 2023, pp. 175–179.

[47] A. C. Edmondson, Right kind of wrong: The science of failing well. London: Cornerstone Press, 2023.

[48] W. Maalej, Y. D. Pham, and L. Chazette, "Tailoring requirements engineering for responsible AI," Computer, vol. 56, no. 4, pp. 18–27, 2023.

[49] J. S. Andersen and W. Maalej, "Design patterns for machine learning-based systems with humans in the loop," IEEE Softw., vol. 41, no. 4, pp. 151–159, 2024.

[50] P. Côté, A. Nikanjam, R. Bouchoucha, I. Basta, M. Abidi, and F. Khomh, "Quality issues in machine learning software systems," Empir. Softw. Eng., vol. 29, no. 6, p. 149, 2024.

[51] J. A. McDermid, R. Calinescu, I. Habli, R. Hawkins, Y. Jia, J. Molloy, M. Osborne, C. Paterson, Z. Porter, and P. R. Conmy, "The safety of autonomy: A systematic approach," Computer, vol. 57, no. 4, pp. 16–25, 2024.

[52] R. Calinescu, C. Imrie, R. Mangal, G. N. Rodrigues, C. S. Pasareanu, M. A. Santana, and G. Vázquez, "Controller synthesis for autonomous systems with deep-learning perception components," IEEE Trans. Software Eng., vol. 50, no. 6, pp. 1374–1395, 2024.

[53] K. Viswanadha, F. Indaheng, J. Wong, E. Kim, E. Kalvan, Y. Pant, D. J. Fremont, and S. A. Seshia, "Addressing the IEEE AV test challenge with Scenic and VerifAI," in AITest. IEEE, 2021, pp. 136–142.