pith. machine review for the scientific record.

arxiv: 2603.02259 · v2 · submitted 2026-02-28 · 💻 cs.MA · cs.LG · cs.RO

Recognition: 2 theorem links

· Lean Theorem

The Alignment Flywheel: A Governance-Centric Hybrid MAS for Architecture-Agnostic Safety

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 19:02 UTC · model grok-4.3

classification 💻 cs.MA · cs.LG · cs.RO
keywords Alignment Flywheel · hybrid MAS · safety governance · patch locality · Safety Oracle · multi-agent systems · AI safety · runtime enforcement

The pith

The Alignment Flywheel decouples decision generation from safety governance so that many failures can be fixed by patching the Safety Oracle rather than retraining the core component.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Alignment Flywheel as a hybrid multi-agent system architecture that separates the generation of candidate trajectories from their safety evaluation. A Proposer produces actions while a Safety Oracle supplies raw safety signals through a stable interface. An enforcement layer applies risk policies at runtime, and a governance MAS audits the Oracle, verifies uncertainties, and issues versioned refinements. The key principle is patch locality, allowing safety updates to target only the governed oracle artifact and its release pipeline. Sympathetic readers would care because this approach addresses the opacity and retraining costs of safety in learned autonomous systems by making oversight modular and auditable.
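The runtime half of this loop can be sketched as a small gate. This is an illustrative reading, not the paper's API: the three-way outcome (execute / defer to governance / veto) and all names here are our assumptions about how an enforcement layer might consume raw safety signals.

```python
# Hedged sketch of a runtime enforcement gate over raw safety signals.
# Field and function names are hypothetical; only the roles (safety
# score, uncertainty, explicit risk policy) come from the paper's summary.
from dataclasses import dataclass

@dataclass
class OracleSignals:
    s: float          # raw safety score for the candidate trajectory
    u: float          # oracle's prediction uncertainty
    u_thresh: float   # uncertainty threshold from the deployed policy

def gate(signals: OracleSignals, risk_cap: float) -> str:
    if signals.u > signals.u_thresh:
        return "defer"    # too uncertain: route to governance for verification
    if signals.s > risk_cap:
        return "veto"     # explicit risk policy rejects the trajectory
    return "execute"

assert gate(OracleSignals(s=0.1, u=0.05, u_thresh=0.2), risk_cap=0.5) == "execute"
assert gate(OracleSignals(s=0.8, u=0.05, u_thresh=0.2), risk_cap=0.5) == "veto"
assert gate(OracleSignals(s=0.1, u=0.30, u_thresh=0.2), risk_cap=0.5) == "defer"
```

The point of the sketch is that the gate consumes only the signal values, so the enforcement logic never needs to inspect the Proposer.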

Core claim

The Alignment Flywheel formalizes a governance-centric hybrid MAS that decouples decision generation from safety governance. A Proposer representing any autonomous decision component generates candidate trajectories, a Safety Oracle returns raw safety signals through a stable interface, an enforcement layer applies explicit risk policy at runtime, and a governance MAS supervises the Oracle through auditing, uncertainty-driven verification, and versioned refinement. The central engineering principle is patch locality: many newly observed safety failures can be mitigated by updating the governed oracle artifact and its release pipeline rather than retracting or retraining the underlying decision component.
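Patch locality, read as code, says that replacing the oracle artifact must leave the Proposer and the enforcement gate untouched. A minimal sketch, with all names (proposer, enforce, the toy oracles) invented for illustration:

```python
# Illustrative patch-locality sketch: only the oracle artifact changes
# between versions; the Proposer and the runtime gate are never edited.
from typing import Callable, List, Tuple

Trajectory = List[Tuple[int, int]]             # toy state-action pairs
SafetyOracle = Callable[[Trajectory], float]   # returns a raw risk score

def proposer(context: int) -> Trajectory:
    # any autonomous decision component; opaque to governance
    return [(context, 0), (context + 1, 1)]

def enforce(tau: Trajectory, oracle: SafetyOracle, risk_cap: float) -> bool:
    # runtime gate: admit the trajectory only under the explicit risk policy
    return oracle(tau) <= risk_cap

oracle_v1: SafetyOracle = lambda tau: 0.9      # flawed oracle artifact
oracle_v2: SafetyOracle = lambda tau: 0.2      # governed, versioned patch

tau = proposer(context=3)
assert not enforce(tau, oracle_v1, risk_cap=0.5)   # blocked before the patch
assert enforce(tau, oracle_v2, risk_cap=0.5)       # admitted after the patch
```

Swapping `oracle_v1` for `oracle_v2` is the whole remediation; whether real deployments preserve this isolation is exactly what the referee questions below.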

What carries the argument

Patch locality through the governed Safety Oracle, which supplies raw safety signals via a stable interface that the governance MAS can audit and refine independently of the Proposer.

Load-bearing premise

A stable implementation-agnostic interface exists for the Safety Oracle to deliver reliable raw safety signals, and the governance MAS can audit and refine it without introducing new failure modes or dependencies on the Proposer.

What would settle it

A concrete safety failure observed in deployment that cannot be mitigated by any update to the Safety Oracle and its release pipeline, or that requires direct changes to the Proposer to resolve.

Figures

Figures reproduced from arXiv:2603.02259 by Elias Malomgré and Pieter Simoens.

Figure 1
Figure 1. Runtime enforcement during deployment. From context Σ, proposer P generates a candidate trajectory τ_cand. Enforcement E queries the deployed oracle stack, which returns the requested signals, e.g. (s, u, u_thresh, u_a, u_a,thresh, v_O), where vendor-side outputs and Flywheel-side governance outputs are combined under a single query interface. Enforcement then derives the action a and the uncertainty state u, lo… view at source ↗
Figure 2
Figure 2. 3D spatial Flywheel progression. The learned Oracle initially assigns non-trivial reward to many cells outside the expert-path basin. Governance batches add suppression kernels around verified flaw regions. Across iterations, the active reward surface contracts toward the expert trajectory while the sampled basin is preserved. The adaptive Patch Planner eliminates all discovered flaw cells on the evaluat… view at source ↗
Figure 3
Figure 3. Abstract OODA interaction pattern for governance agents. Each role reads shared state from the append-only knowledge base K during Observe, interprets it relative to its local objective during Orient, selects a strategy during Decide, and writes derived artifacts back to K during Act. Role-specific behavior is captured by strategies and artifact types, while the interaction contract with K remains uniform … view at source ↗
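The Observe/Orient/Decide/Act contract against an append-only knowledge base, as described in the Figure 3 caption, can be sketched compactly. The artifact kinds and the "patch vs. monitor" strategy here are invented placeholders; only the four-phase read-then-append contract is taken from the caption.

```python
# Illustrative OODA cycle for one governance agent over an append-only
# knowledge base K. Artifact schemas and strategies are hypothetical.
from typing import Dict, List

K: List[Dict] = [
    {"kind": "audit", "flaw": True},
    {"kind": "audit", "flaw": False},
]

def ooda_step(kb: List[Dict]) -> None:
    observed = list(kb)                                    # Observe: read shared state
    flaws = [a for a in observed
             if a["kind"] == "audit" and a["flaw"]]        # Orient: local objective
    strategy = "patch" if flaws else "monitor"             # Decide: pick a strategy
    kb.append({"kind": "decision", "strategy": strategy})  # Act: write an artifact

ooda_step(K)
assert K[-1] == {"kind": "decision", "strategy": "patch"}
```

Note that the agent never mutates existing entries; each role only appends derived artifacts, which is what keeps the interaction contract with K uniform across roles.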
read the original abstract

Multi-agent systems provide mature methodologies for role decomposition, coordination, and normative governance, capabilities that remain essential as increasingly powerful autonomous decision components are embedded within agent-based systems. While learned and generative models substantially expand system capability, their safety behavior is often entangled with training, making it opaque, difficult to audit, and costly to update after deployment. This paper formalizes the Alignment Flywheel as a governance-centric hybrid MAS architecture that decouples decision generation from safety governance. A Proposer, representing any autonomous decision component, generates candidate trajectories, while a Safety Oracle returns raw safety signals through a stable interface. An enforcement layer applies explicit risk policy at runtime, and a governance MAS supervises the Oracle through auditing, uncertainty-driven verification, and versioned refinement. The central engineering principle is patch locality: many newly observed safety failures can be mitigated by updating the governed oracle artifact and its release pipeline rather than retracting or retraining the underlying decision component. The architecture is implementation-agnostic with respect to both the Proposer and the Safety Oracle, and specifies the roles, artifacts, protocols, and release semantics needed for runtime gating, audit intake, signed patching, and staged rollout across distributed deployments. The result is a hybrid MAS engineering framework for integrating highly capable but fallible autonomous systems under explicit, version-controlled, and auditable oversight.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes the Alignment Flywheel, a governance-centric hybrid multi-agent system architecture that decouples a Proposer (any autonomous decision component) from safety oversight via a Safety Oracle that supplies raw safety signals through a stable interface, an enforcement layer for runtime risk policy, and a governance MAS for auditing, verification, and versioned refinement of the Oracle. The central claim is patch locality: safety failures can be mitigated by updating only the governed Oracle artifact and release pipeline rather than retracting or retraining the underlying decision component. The architecture is presented as implementation-agnostic with respect to both Proposer and Oracle, specifying roles, artifacts, protocols, and release semantics for runtime gating, audit intake, signed patching, and staged rollout.

Significance. If the architecture can be realized with the claimed isolation and stability properties, it would provide a practical engineering framework for integrating capable but fallible autonomous systems under explicit, version-controlled oversight in multi-agent settings. The emphasis on modular patching and governance MAS supervision addresses a real deployment challenge in safety-critical AI systems. However, the contribution remains conceptual; without empirical validation, formal protocol analysis, or concrete interface specifications, its significance is limited to outlining a design pattern rather than demonstrating a working solution.

major comments (2)
  1. [Abstract and governance MAS description] Abstract and governance MAS section: The central engineering principle of patch locality is asserted to hold because governance actions on the Oracle remain isolated from the Proposer, yet no protocol analysis, interface schema for raw safety signals, or examination of potential new runtime dependencies is supplied. For opaque or learned Proposers this leaves the claim that updates can be performed without re-entanglement unsubstantiated.
  2. [Safety Oracle interface definition] Safety Oracle and enforcement layer: The paper states that a stable, implementation-agnostic interface exists allowing any Proposer to emit trajectories and receive usable raw safety signals, but supplies neither a concrete signal schema nor an argument that the interface definition itself does not force coupling when the Proposer is a black-box learned model.
minor comments (2)
  1. [Related work] The manuscript would benefit from an explicit comparison table contrasting the proposed architecture with existing MAS governance frameworks (e.g., normative MAS or runtime verification systems) to clarify novelty.
  2. [Terminology and definitions] Terminology such as 'raw safety signals' and 'versioned refinement' is used without operational definitions or examples of what the signals contain or how refinement is performed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive critique. The comments correctly identify that the manuscript presents the Alignment Flywheel at an architectural level without supplying concrete protocol details or an example interface schema. We address each point below and will incorporate the requested clarifications in the revised version.

read point-by-point responses
  1. Referee: [Abstract and governance MAS description] Abstract and governance MAS section: The central engineering principle of patch locality is asserted to hold because governance actions on the Oracle remain isolated from the Proposer, yet no protocol analysis, interface schema for raw safety signals, or examination of potential new runtime dependencies is supplied. For opaque or learned Proposers this leaves the claim that updates can be performed without re-entanglement unsubstantiated.

    Authors: We agree that the current text asserts patch locality without a supporting protocol sketch. In the revision we will add a new subsection (Section 4.2) that defines the minimal interaction protocol: the Proposer emits only observable trajectories (state-action sequences) and receives a vector of raw safety signals; the Oracle update changes only the evaluation function behind those signals. Because the Proposer never receives Oracle internals or training data, and because enforcement is performed by a separate runtime gate that consumes the signals, an Oracle patch cannot create new runtime dependencies on the Proposer. We will also include a short argument that this contract remains stable even when the Proposer is a black-box learned model, since the interface touches only externally observable behavior. revision: yes

  2. Referee: [Safety Oracle interface definition] Safety Oracle and enforcement layer: The paper states that a stable, implementation-agnostic interface exists allowing any Proposer to emit trajectories and receive usable raw safety signals, but supplies neither a concrete signal schema nor an argument that the interface definition itself does not force coupling when the Proposer is a black-box learned model.

    Authors: The manuscript deliberately avoids prescribing a single schema in order to remain agnostic across safety-evaluation techniques. To meet the referee’s request we will append an illustrative minimal schema (Appendix A) that lists the required fields (trajectory identifier, per-step risk vector, binary violation flags, and optional uncertainty estimate). The accompanying text will argue that this schema induces no coupling: the Proposer is required only to emit trajectories it already produces and to accept the returned signals for gating; it does not expose parameters or gradients to the Oracle, nor does the Oracle modify the Proposer’s weights. Hence an update to the Oracle artifact changes only the signal-generation logic, preserving the claimed locality. revision: yes
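The rebuttal's proposed minimal schema (trajectory identifier, per-step risk vector, binary violation flags, optional uncertainty estimate) could be rendered as follows. Field names are our guesses at what Appendix A might contain, not the authors' definitions.

```python
# Hypothetical rendering of the rebuttal's illustrative signal schema.
# Only the four listed fields come from the text; names are invented.
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class RawSafetySignals:
    trajectory_id: str            # identifier of the evaluated trajectory
    step_risk: List[float]        # per-step risk vector
    violations: List[bool]        # binary violation flag per step
    uncertainty: Optional[float]  # optional oracle uncertainty estimate

sig = RawSafetySignals("traj-007", [0.1, 0.7], [False, True], 0.15)
assert any(sig.violations)  # the gate consumes only these fields; no
                            # Proposer parameters or gradients cross over
```

The coupling argument then reduces to an inspection of this dataclass: nothing in it references the Proposer's internals, so replacing the oracle artifact changes only how the field values are computed.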

Circularity Check

0 steps flagged

No circularity: descriptive architecture with no derivations or self-referential reductions

full rationale

The paper is a descriptive architectural specification that defines roles (Proposer, Safety Oracle, governance MAS) and states the patch locality principle as an engineering claim. No equations, fitted parameters, predictions, or formal derivations appear in the text. The central decoupling is asserted via interface stability rather than derived from prior quantities within the paper. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The architecture is presented as implementation-agnostic by construction of the role definitions, but this is definitional description rather than a circular reduction of a claimed result to its inputs. The skeptic's concerns address unproven assumptions about interface independence, which fall under correctness or completeness rather than circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The proposal rests on domain assumptions about the feasibility of stable safety interfaces and effective governance supervision; it has no free parameters, and its two invented entities lack independent evidential support.

axioms (2)
  • domain assumption A stable interface for raw safety signals can be maintained independently of the decision component.
    Invoked in the decoupling of Proposer from Safety Oracle and the enforcement layer.
  • domain assumption The governance MAS can effectively audit, verify uncertainty, and refine the Oracle through versioned updates.
    Central to the supervision and patch locality claims.
invented entities (2)
  • Safety Oracle no independent evidence
    purpose: To return raw safety signals through a stable interface decoupled from the Proposer.
    New component introduced to enable the architecture; no independent evidence provided.
  • Alignment Flywheel no independent evidence
    purpose: The overall governance-centric hybrid MAS framework.
    Newly formalized architecture name and structure.

pith-pipeline@v0.9.0 · 5545 in / 1528 out tokens · 47106 ms · 2026-05-15T19:02:08.763207+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    A Proposer generates candidate trajectories; a governed Safety Oracle stack returns safety scores, prediction uncertainty, audit coverage uncertainty, and evidence hooks through a stable interface; and an Enforcement layer applies explicit risk policy at runtime.

  • IndisputableMonolith/Foundation/BranchSelection.lean branch_selection unclear

    Relation between the paper passage and the cited Recognition theorem.

    The central engineering principle is patch locality: many newly observed safety failures can be mitigated by updating the governed oracle artifact and its release pipeline rather than retracting or retraining the underlying decision component.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 3 internal anchors

  1. [1]

    In: Proceedings of the Twenty-First International Conference on Machine Learning

    Abbeel, P., Ng, A.Y.: Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the Twenty-First International Conference on Machine Learning. p. 1. ICML '04, Association for Computing Machinery, New York, NY, USA (2004). https://doi.org/10.1145/1015330.1015430

  2. [2]

    Artificial Intelligence Review 55(6), 4307–4346 (2022)

    Adams, S., Cody, T., Beling, P.A.: A survey of inverse reinforcement learning. Artificial Intelligence Review 55(6), 4307–4346 (2022)

  3. [3]

    In: Proceedings of the AAAI conference on artificial intelligence

    Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: Proceedings of the AAAI conference on artificial intelligence. vol. 32 (2018)

  4. [4]

    Artificial Intelligence 297, 103500 (2021)

    Arora, S., Doshi, P.: A survey of inverse reinforcement learning: Challenges, methods and progress. Artificial Intelligence 297, 103500 (2021)

  5. [5]

    arXiv preprint arXiv:2412.10096 (2024)

    Baert, M., Leroux, S., Simoens, P.: Reward machine inference for robotic manipulation. arXiv preprint arXiv:2412.10096 (2024)

  6. [6]

    In: 2014 IEEE Real-Time Systems Symposium

    Bak, S., Johnson, T.T., Caccamo, M., Sha, L.: Real-time reachability for verified simplex design. In: 2014 IEEE Real-Time Systems Symposium. pp. 138–148. IEEE (2014)

  7. [7]

    AI and Ethics 5(3), 3265–3279 (2025)

    Batool, A., Zowghi, D., Bano, M.: AI governance: a systematic literature review. AI and Ethics 5(3), 3265–3279 (2025)

  8. [8]

    Internet Research 33(7), 133–167 (2023)

    Birkstedt, T., Minkkinen, M., Tandon, A., Mäntymäki, M.: AI governance: themes, knowledge gaps and future agendas. Internet Research 33(7), 133–167 (2023)

  9. [9]

    Computational & Mathematical Organization Theory 12(2), 71–79 (2006)

    Boella, G., Van Der Torre, L., Verhagen, H.: Introduction to normative multiagent systems. Computational & Mathematical Organization Theory 12(2), 71–79 (2006)

  10. [10]

    In: 2017 IEEE international conference on big data (big data)

    Breck, E., Cai, S., Nielsen, E., Salib, M., Sculley, D.: The ml test score: A rubric for ml production readiness and technical debt reduction. In: 2017 IEEE international conference on big data (big data). pp. 1123–1132. IEEE (2017)

  11. [11]

    In: Proceedings of the 10th international command and control research technology symposium

    Brehmer, B.: The dynamic OODA loop: Amalgamating Boyd's OODA loop and the cybernetic approach to command and control. In: Proceedings of the 10th international command and control research technology symposium. pp. 365–368 (2005)

  12. [12]

    arXiv preprint arXiv:2510.14176 (2025)

    Castanyer, R.C., Mohamed, F., Castro, P.S., Neary, C., Berseth, G.: ARM-FM: Automated reward machines via foundation models for compositional reinforcement learning. arXiv preprint arXiv:2510.14176 (2025)

  13. [13]

    ACM Computing Surveys 58(2), 1–37 (2025)

    Chaudhari, S., Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., Deshpande, A., Castro da Silva, B.: RLHF deciphered: A critical analysis of reinforcement learning from human feedback for LLMs. ACM Computing Surveys 58(2), 1–37 (2025)

  14. [14]

    IEEE Control Systems Magazine 43(2), 28–65 (2023)

    Hobbs, K.L., Mote, M.L., Abate, M.C., Coogan, S.D., Feron, E.M.: Runtime assurance for safety-critical systems: An introduction to safety filtering approaches for complex control systems. IEEE Control Systems Magazine 43(2), 28–65 (2023)

  15. [15]

    In: Proceedings of the SIGCHI conference on Human Factors in Computing Systems

    Horvitz, E.: Principles of mixed-initiative user interfaces. In: Proceedings of the SIGCHI conference on Human Factors in Computing Systems. pp. 159–166 (1999)

  16. [16]

    Hovakimyan, G., Bravo, J.M.: Evolving strategies in machine learning: a systematic review of concept drift detection. Information 15(12), 786 (2024)

  17. [17]

    Hsu, K.C., Hu, H., Fisac, J.F.: The safety filter: A unified view of safety-critical control in autonomous systems. Annual Review of Control, Robotics, and Autonomous Systems 7 (2023)

  18. [18]

    arXiv preprint arXiv:2310.19852 (2025)

    Ji, J., Qiu, T., Chen, B., Zhang, B., Lou, H., Wang, K., Duan, Y., He, Z., Zhou, J., Zhang, Z., et al.: Ai alignment: A comprehensive survey. arXiv preprint arXiv:2310.19852 (2025)

  19. [19]

    Communications of the ACM 68(11), 80–90 (2025)

    Könighofer, B., Bloem, R., Jansen, N., Junges, S., Pranger, S.: Shields for safe reinforcement learning. Communications of the ACM 68(11), 80–90 (2025)

  20. [20]

    In: 2017 USENIX Annual Technical Conference (USENIX ATC 17)

    Kuppusamy, T.K., Diaz, V., Cappos, J.: Mercury: Bandwidth-effective prevention of rollback attacks against community repositories. In: 2017 USENIX Annual Technical Conference (USENIX ATC 17). pp. 673–688 (2017)

  21. [21]

    Cybersecurity 1(1), 6 (2018)

    Li, J., Zhao, B., Zhang, C.: Fuzzing: a survey. Cybersecurity 1(1), 6 (2018)

  22. [22]

    In: Findings of the Association for Computational Linguistics: EMNLP 2024

    Li, R., Wang, P., Ma, J., Zhang, D., Sha, L., Sui, Z.: Be a multitude to itself: A prompt evolution framework for red teaming. In: Findings of the Association for Computational Linguistics: EMNLP 2024. pp. 3287–3301 (2024)

  23. [23]

    arXiv preprint arXiv:2409.07569 (2024)

    Liu, G., Xu, S., Liu, S., Gaurav, A., Subramanian, S.G., Poupart, P.: A comprehensive survey on inverse constrained reinforcement learning: Definitions, progress and challenges. arXiv preprint arXiv:2409.07569 (2024)

  24. [24]

    arXiv preprint arXiv:2507.15287 (2025)

    Malomgré, E., Simoens, P.: Mixture of autoencoder experts guidance using unlabeled and incomplete data for exploration in reinforcement learning. arXiv preprint arXiv:2507.15287 (2025)

  25. [25]

    Malomgré, E., Simoens, P.: Interactionless inverse reinforcement learning: A data-centric framework for durable alignment (2026), https://arxiv.org/abs/2602.14844

  26. [26]

    AI and Ethics 2(4), 603–609 (2022)

    Mäntymäki, M., Minkkinen, M., Birkstedt, T., Viljanen, M.: Defining organizational AI governance. AI and Ethics 2(4), 603–609 (2022)

  27. [27]

    In: Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security

    Newman, Z., Meyers, J.S., Torres-Arias, S.: Sigstore: Software signing for everybody. In: Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. pp. 2353–2367 (2022)

  28. [28]

    In: ICML

    Ng, A.Y., Russell, S., et al.: Algorithms for inverse reinforcement learning. In: ICML. vol. 1, p. 2 (2000)

  29. [29]

    Advances in neural information processing systems 35, 27730–27744 (2022)

    Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al.: Training language models to follow instructions with human feedback. Advances in neural information processing systems 35, 27730–27744 (2022)

  30. [30]

    Advances in neural information processing systems 32 (2019)

    Ovadia, Y., Fertig, E., Ren, J., Nado, Z., Sculley, D., Nowozin, S., Dillon, J., Lakshminarayanan, B., Snoek, J.: Can you trust your model's uncertainty? evaluating predictive uncertainty under dataset shift. Advances in neural information processing systems 32 (2019)

  31. [31]

    In: 2017 17th International Conference on Application of Concurrency to System Design (ACSD)

    Phan, D., Yang, J., Clark, M., Grosu, R., Schierman, J., Smolka, S., Stoller, S.: A component-based simplex architecture for high-assurance cyber-physical systems. In: 2017 17th International Conference on Application of Concurrency to System Design (ACSD). pp. 49–58. IEEE (2017)

  32. [32]

    Governed Capability Evolution: Lifecycle-Time Compatibility Checking and Rollback for AI-Component-Based Systems, with Embodied Agents as Case Study

    Qin, X., Luan, S., See, J., Yang, C., Li, Z.: Governed capability evolution for embodied agents: Safe upgrade, compatibility checking, and runtime rollback for embodied capability modules. arXiv preprint arXiv:2604.08059 (2026)

  33. [33]

    MIT Press (2008)

    Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., Lawrence, N.D.: Dataset shift in machine learning. MIT Press (2008)

  34. [34]

    Advances in neural information processing systems 36, 53728–53741 (2023)

    Rafailov, R., Sharma, A., Mitchell, E., Manning, C.D., Ermon, S., Finn, C.: Direct preference optimization: Your language model is secretly a reward model. Advances in neural information processing systems 36, 53728–53741 (2023)

  35. [35]

    In: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society

    Raji, I.D., Xu, P., Honigsberg, C., Ho, D.: Outsider oversight: Designing a third party audit ecosystem for ai governance. In: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society. pp. 557–571 (2022)

  36. [36]

    In: Proceedings of the 2023 conference on empirical methods in natural language processing: system demonstrations

    Rebedea, T., Dinu, R., Sreedhar, M.N., Parisien, C., Cohen, J.: Nemo guardrails: A toolkit for controllable and safe llm applications with programmable rails. In: Proceedings of the 2023 conference on empirical methods in natural language processing: system demonstrations. pp. 431–445 (2023)

  37. [37]

    In: Proceedings of the fifth international conference on Autonomous agents

    Scerri, P., Pynadath, D., Tambe, M.: Adjustable autonomy in real-world multi-agent environments. In: Proceedings of the fifth international conference on Autonomous agents. pp. 300–307 (2001)

  38. [38]

    Advances in neural information processing systems 28 (2015)

    Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., Young, M., Crespo, J.F., Dennison, D.: Hidden technical debt in machine learning systems. Advances in neural information processing systems 28 (2015)

  39. [39]

    IEEE Access 5, 3909–3943 (2017)

    Shahin, M., Babar, M.A., Zhu, L.: Continuous integration, delivery and deployment: a systematic review on approaches, tools, challenges and practices. IEEE Access 5, 3909–3943 (2017)

  40. [40]

    arXiv preprint arXiv:2108.13557 (2021)

    Shankar, S., Parameswaran, A.: Towards observability for production machine learning pipelines. arXiv preprint arXiv:2108.13557 (2021)

  41. [41]

    Active Reward Machine Inference From Raw State Trajectories

    Shehab, M.L., Aspeel, A., Ozay, N.: Active reward machine inference from raw state trajectories. arXiv preprint arXiv:2604.07480 (2026)

  42. [42]

    ACM Transactions on Intelligent Systems and Technology (TIST) 5(1), 1–23 (2014)

    Singh, M.P.: Norms as a basis for governing sociotechnical systems. ACM Transactions on Intelligent Systems and Technology (TIST) 5(1), 1–23 (2014)

  43. [43]

    arXiv preprint arXiv:2507.13158 (2025)

    Sun, H., van der Schaar, M.: Inverse reinforcement learning meets large language model post-training: Basics, advances, and opportunities. arXiv preprint arXiv:2507.13158 (2025)

  44. [44]

    arXiv preprint arXiv:2505.09843 (2025)

    Turcotte, M., Labrèche, F., Paquette, S.O.: Automated alert classification and triage (aact): an intelligent system for the prioritisation of cybersecurity alerts. arXiv preprint arXiv:2505.09843 (2025)

  45. [45]

    Information and Software Technology 183, 107733 (2025)

    Zarour, M., Alzabut, H., Al-Sarayreh, K.T.: MLOps best practices, challenges and maturity models: A systematic literature review. Information and Software Technology 183, 107733 (2025). https://doi.org/10.1016/j.infsof.2025.107733

  46. [46]

    arXiv preprint arXiv:2511.08607 (2025)

    Zhao, Y., Zhang, S., Wu, Y., Sun, Y., Sun, Y., Pei, D., Bansal, C., Ma, M.: Triage in software engineering: A systematic review of research and practice. arXiv preprint arXiv:2511.08607 (2025)

  47. [47]

    In: AAAI

    Ziebart, B.D., Maas, A.L., Bagnell, J.A., Dey, A.K., et al.: Maximum entropy inverse reinforcement learning. In: AAAI. vol. 8, pp. 1433–1438. Chicago, IL, USA (2008)

  48. [48]

    Fine-Tuning Language Models from Human Preferences

    Ziegler, D.M., Stiennon, N., Wu, J., Brown, T.B., Radford, A., Amodei, D., Christiano, P., Irving, G.: Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593 (2019)