Neuro-Symbolic Agents for Hallucination-Free Requirements Reuse
Pith reviewed 2026-05-09 14:00 UTC · model grok-4.3
The pith
A neuro-symbolic agent system pairs an LLM heuristic with a deterministic validator on a formal requirements lattice to block all invalid combinations by construction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The system re-conceptualizes requirements reuse as model-driven elicitation: an LLM acts as a non-deterministic heuristic for navigating the deterministic OOMRAM requirement lattice, and a symbolic validator enforces all structural constraints within the agent loop, eliminating hallucinated combinations by construction. On autonomous benchmarks across two families the approach attains 100% requirement coverage and a 0.2% constraint-violation rate, with every generated specification satisfying all mandatory domain requirements even though F1 scores against a single gold standard remain moderate at 0.47-0.51.
What carries the argument
The neuro-symbolic multi-agent loop in which an LLM proposes steps across the OOMRAM requirement lattice while a deterministic symbolic validator rejects any structurally invalid combination before it leaves the loop.
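The loop described above can be sketched as follows. This is a minimal illustration, not the paper's actual pseudocode: the interface names (`llm_propose`, `is_valid`) and the scripted stand-in for the LLM are hypothetical.

```python
# Sketch of the neuro-symbolic agent loop: the LLM proposes lattice steps,
# the deterministic validator gates every acceptance, so invalid proposals
# can never reach the final specification.
def elicit(llm_propose, is_valid, lattice, max_steps=100):
    spec = []  # accepted requirement nodes
    for _ in range(max_steps):
        candidate = llm_propose(lattice, spec)  # non-deterministic heuristic
        if candidate is None:  # heuristic signals completion
            break
        if is_valid(spec + [candidate], lattice):  # deterministic gate
            spec.append(candidate)
        # invalid proposals are discarded inside the loop; they count toward
        # the violation rate but never appear in the output
    assert is_valid(spec, lattice)  # outputs are valid by construction
    return spec

# Toy instantiation: the "lattice" is just a set of legal nodes, and the
# heuristic replays a fixed script in place of an LLM.
def make_scripted_heuristic(script):
    it = iter(script)
    return lambda lattice, spec: next(it, None)

valid_only = lambda spec, lattice: all(n in lattice for n in spec)
heuristic = make_scripted_heuristic(["a", "x", "b"])  # "x" is a hallucination
print(elicit(heuristic, valid_only, {"a", "b"}))  # ['a', 'b']
```

The hallucinated proposal "x" is silently rejected, which is the mechanism behind the zero-violation claim for final outputs.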
Load-bearing premise
The OOMRAM requirement lattice is assumed to be a complete and accurate representation of all domain constraints, and the LLM is assumed to serve as an effective heuristic that finds valid paths without missing required combinations.
What would settle it
Apply the system to an application family whose true constraints are not fully captured by the provided OOMRAM lattice and check whether any generated specification violates a real domain rule or omits a mandatory requirement that the lattice does not contain.
Figures
read the original abstract
The Object-Oriented Method for Requirements Authoring and Management (OOMRAM) is a requirements reuse framework that relies on exact identifier matching and rigid templates, limiting its ability to adapt specifications across diverse contexts. While Large Language Models (LLMs) offer the flexibility to overcome this bottleneck, they introduce the risk of generating structurally invalid or inconsistent requirement combinations. To address this tension, we present a neuro-symbolic multi-agent system that re-conceptualizes requirements reuse as a Model-Driven Elicitation process. In this paradigm, an LLM serves as a non-deterministic heuristic for traversing a deterministic domain model represented by a formal OOMRAM requirement lattice. A deterministic, symbolic validator enforces all structural constraints within the agent loop, effectively eliminating hallucinated requirement combinations by construction. Evaluated on an autonomous benchmark across two application families, our system achieves 100% requirement coverage and a constraint-violation rate of only 0.2%. Although the F1-score against a single gold standard is moderate (0.47-0.51), every generated specification is structurally valid and satisfies all mandatory domain requirements. The model-agnostic implementation scales to larger lattices via subgraph navigation and provides transparent audit trails for regulatory compliance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a neuro-symbolic multi-agent system for OOMRAM-based requirements reuse. An LLM acts as a non-deterministic heuristic to traverse a formal requirement lattice while a deterministic symbolic validator enforces all structural constraints inside the agent loop, claimed to eliminate hallucinated requirement combinations by construction. On an autonomous benchmark spanning two application families the system reports 100% requirement coverage, a 0.2% constraint-violation rate, and moderate F1 scores (0.47-0.51) against a single gold standard, with every output asserted to be structurally valid and to satisfy all mandatory domain requirements. The implementation is model-agnostic and supports subgraph navigation for larger lattices together with audit trails.
Significance. If the by-construction guarantee and the reported metrics are substantiated, the work demonstrates a practical hybrid approach that combines LLM flexibility with symbolic correctness guarantees in a safety-critical software-engineering task. The emphasis on transparent audit trails and scalability via subgraph navigation could be valuable for regulated domains. The paper also supplies an autonomous benchmark and a deterministic validator, which are positive contributions for reproducibility.
major comments (2)
- [Abstract] The central claim that the deterministic symbolic validator 'enforces all structural constraints within the agent loop, effectively eliminating hallucinated requirement combinations by construction' is in direct tension with the reported 0.2% constraint-violation rate. If the validator is exhaustive and applied to every proposal inside the loop, final outputs must contain zero violations; a non-zero rate implies either pre-validation measurement, incomplete coverage of the lattice, or that the validator is not applied to all generated artifacts. This point is load-bearing for the 'hallucination-free' and 'by construction' assertions and requires explicit clarification with reference to the agent-loop pseudocode or architecture diagram.
- [Evaluation] Evaluation section (inferred from abstract metrics): the manuscript provides no benchmark details, error analysis, data-exclusion rules, or description of how the 0.2% violation rate was computed (pre- vs. post-validation). Without these, it is impossible to verify whether the numbers support the claim that every generated specification is structurally valid. This absence weakens the empirical support for the central neuro-symbolic guarantee.
minor comments (2)
- [Abstract] The F1-score range (0.47-0.51) is labeled 'moderate' without stating the size or construction of the gold standard or the precise definition of true/false positives; adding a short table or paragraph would improve interpretability.
- [Method] Notation for the OOMRAM requirement lattice and the validator interface should be introduced with a small example in the main text rather than deferred to an appendix, to make the 'by construction' argument easier to follow.
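The kind of small example the minor comment asks for might look like the following. The node names and the requires/excludes encoding are hypothetical, not the paper's actual OOMRAM notation; the point is only to show the shape of a lattice fragment and a validator interface.

```python
# Toy OOMRAM-style lattice fragment (hypothetical names): each node carries
# the structural constraints attached to it.
LATTICE = {
    "auth":        {"requires": set(),    "excludes": set()},
    "sso":         {"requires": {"auth"}, "excludes": {"local_login"}},
    "local_login": {"requires": {"auth"}, "excludes": {"sso"}},
}

def is_valid(selection):
    """Deterministic validator: True iff every 'requires' edge of every
    selected node is satisfied and no 'excludes' pair is co-selected."""
    for req in selection:
        node = LATTICE[req]
        if not node["requires"] <= selection:
            return False  # a mandatory prerequisite is missing
        if node["excludes"] & selection:
            return False  # mutually exclusive requirements co-selected
    return True

print(is_valid({"auth", "sso"}))                 # True
print(is_valid({"sso"}))                         # False: 'auth' missing
print(is_valid({"auth", "sso", "local_login"}))  # False: exclusion violated
```

With an interface of this shape, the 'by construction' argument reduces to the observation that the loop only ever appends selections for which `is_valid` returns True.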
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for identifying the need for greater clarity around our central claims and empirical reporting. We address each major comment below with explanations and commit to revisions that resolve the identified issues without altering the core contributions.
read point-by-point responses
-
Referee: [Abstract] The central claim that the deterministic symbolic validator 'enforces all structural constraints within the agent loop, effectively eliminating hallucinated requirement combinations by construction' is in direct tension with the reported 0.2% constraint-violation rate. If the validator is exhaustive and applied to every proposal inside the loop, final outputs must contain zero violations; a non-zero rate implies either pre-validation measurement, incomplete coverage of the lattice, or that the validator is not applied to all generated artifacts. This point is load-bearing for the 'hallucination-free' and 'by construction' assertions and requires explicit clarification with reference to the agent-loop pseudocode or architecture diagram.
Authors: The 0.2% constraint-violation rate measures the incidence of invalid proposals generated by the LLM heuristic before the symbolic validator is invoked inside the agent loop. The validator is exhaustive, deterministic, and applied to every candidate artifact; consequently, every final output satisfies all structural constraints by construction. We will revise the abstract to state this distinction explicitly and will add explicit cross-references to the agent-loop pseudocode and architecture diagram so that readers can trace the validator's position and confirm that post-validation outputs contain zero violations. This clarification directly addresses the load-bearing concern for the hallucination-free claim. revision: yes
-
Referee: [Evaluation] Evaluation section (inferred from abstract metrics): the manuscript provides no benchmark details, error analysis, data-exclusion rules, or description of how the 0.2% violation rate was computed (pre- vs. post-validation). Without these, it is impossible to verify whether the numbers support the claim that every generated specification is structurally valid. This absence weakens the empirical support for the central neuro-symbolic guarantee.
Authors: We agree that the current Evaluation section is too concise and omits the requested details. The manuscript summarizes aggregate metrics but does not describe the autonomous benchmark construction, data-exclusion criteria, error analysis, or the precise pre- versus post-validation computation of the 0.2% rate. We will expand the Evaluation section to supply these elements, explicitly documenting that the reported rate is pre-validation and that post-validation validity is 100% for all outputs. These additions will allow independent verification of the neuro-symbolic guarantee. revision: yes
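The pre- versus post-validation distinction the authors commit to documenting can be made precise. A minimal sketch, assuming a hypothetical per-proposal event log of (proposal_valid, accepted) pairs recorded inside the agent loop:

```python
def violation_rates(events):
    """events: (proposal_valid, accepted) booleans logged inside the loop.
    Pre-validation rate: invalid LLM proposals among all proposals.
    Post-validation rate: invalid items among accepted ones, which must be
    0.0 whenever the validator gates every acceptance."""
    total = len(events)
    accepted = [valid for valid, acc in events if acc]
    pre = sum(1 for valid, _ in events if not valid) / total if total else 0.0
    post = sum(1 for v in accepted if not v) / len(accepted) if accepted else 0.0
    return pre, post

# 1 invalid proposal out of 500, rejected by the validator:
log = [(True, True)] * 499 + [(False, False)]
pre, post = violation_rates(log)
print(f"pre={pre:.3%} post={post:.1%}")  # pre=0.200% post=0.0%
```

Under this reading, the reported 0.2% is a pre-validation rate (roughly one invalid proposal per 500), while the post-validation rate is identically zero.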
Circularity Check
No significant circularity; derivation rests on independent symbolic validator and external benchmark
full rationale
The paper's core derivation presents the LLM as a non-deterministic heuristic traversing a fixed OOMRAM lattice, with a separate deterministic symbolic validator enforcing constraints inside the agent loop. This 'by construction' elimination of invalid combinations follows from the validator's described role as exhaustive and independent of LLM outputs, not from any self-referential definition or fitted parameter. Evaluation metrics (100% coverage, 0.2% violation rate, F1 0.47-0.51) are reported against an external autonomous benchmark across two application families, with no equations, parameter fitting, or self-citations shown as load-bearing. The minor tension between zero-violation 'by construction' language and the reported 0.2% rate does not reduce any prediction to its inputs; it is consistent with possible pre-validation measurement or lattice scope details that remain externally falsifiable. The system is self-contained against the stated assumptions without circular reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The OOMRAM requirement lattice fully captures all structural constraints for the application families under test.
invented entities (1)
- Neuro-symbolic multi-agent system with LLM heuristic and symbolic validator (no independent evidence)
Reference graph
Works this paper leans on
- [1] M. Mannion and B. Keepence, "Representing requirements families using the product line approach," University of the West of Scotland Technical Report, 2000.
- [2] J. Bosch, Software Product Lines: Practices and Patterns. Addison-Wesley, 2013.
- [3] A. F. Ibrahim, "A methodology for reusing requirements in application families," Master's thesis, Cairo University, 2005.
- [4] T. Rosa et al., "Large language models in requirements engineering: A systematic mapping study," Empirical Software Engineering, vol. 30, no. 2, pp. 1–52, 2025.
- [5] M. Lang et al., "Generative AI for Requirements Engineering: A roadmap," IEEE Software, vol. 43, no. 1, pp. 30–38, 2026.
- [6] T. Nguyen et al., "Requirements engineering in the age of large language models: A systematic mapping study," Information and Software Technology, vol. 165, p. 107401, 2025.
- [7] M. Khan et al., "On the use of large language models in requirements engineering: A systematic literature review," IEEE Access, vol. 12, pp. 123456–123480, 2024.
- [8] J. Chen et al., "LangGraph: Building stateful, multi-actor applications with LLMs," in Proceedings of the 47th International Conference on Software Engineering (ICSE), 2025.
- [9] LangChain Inc., "LangGraph: Build language agents as graphs," https://github.com/langchain-ai/langgraph, 2024.
- [10] A. Kaplunovich, "LangGraph multi-agent workflows: A practical guide," in Proceedings of the AAAI Conference on Artificial Intelligence, Workshop on Agentic AI, 2025.
- [11] P. Mandulapalli et al., "Agentic workflows for software engineering: A survey and roadmap," in Proceedings of the IEEE/ACM International Conference on Automated Software Engineering (ASE), 2025.
- [12] J. Norheim, E. Rebentisch, D. Xiao, L. Draeger, A. Kerbrat, and O. L. de Weck, "Challenges in applying large language models to requirements engineering tasks," Design Science, vol. 10, 2024.
- [13] A. Masoudifard, M. M. Sorond, M. Madadi, M. Sabokrou, and E. Habibi, "Leveraging graph-RAG and prompt engineering to enhance LLM-based automated requirement traceability and compliance checks," arXiv preprint arXiv:2412.08593, 2024.
- [14] T. Puchleitner, S. Lubos, A. Felfernig, and D. Garber, "Evaluating large language models for the automated generation of software requirements," in Advances and Trends in Artificial Intelligence. Theory and Applications: 38th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2025. Springer-...
- [15] D. Benavides, S. Segura, and A. Ruiz-Cortés, "Automated analysis of feature models: Quo vadis?" in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE), vol. 2, 2010, pp. 519–522.
- [16] Z. Zhang, C. Wang, Y. Wang, E. Shi, Y. Ma, W. Zhong, J. Chen, M. Mao, and Z. Zheng, "LLM hallucinations in practical code generation: Phenomena, mechanism, and mitigation," Proceedings of the ACM on Software Engineering, vol. 2, pp. 481–503, 2024.
- [17] X. Huang, W. Ruan, W. Huang, G. Jin, Y. Dong, C. Wu, S. Bensalem, R. Mu, Y. Qi, X. Zhao et al., "A survey of safety and trustworthiness of large language models through the lens of verification and validation," Artificial Intelligence Review, vol. 57, 2023.
- [18] F. Wang, X. Xi, Z. Cui, H. Dai, and X. Wang, "Embedding traceability in large language model code generation: Towards trustworthy AI-augmented software engineering," Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering, 2025.
- [19] N. Chen, X. Lin, H. Jiang, and Y. An, "Automated building information modeling compliance check through a large language model combined with deep learning and ontology," Buildings, 2024.
- [20] F. Stoica and L. F. Stoica, "Formal verification of business constraints in workflow-based applications," Informatica, vol. 15, p. 778, 2024.
- [21] N. Somogyi and G. Mezei, "Verifying static constraints on models using general formal verification methods," in Proceedings of the International Conference on Model-Driven Engineering and Software Development (MODELSWARD), 2023, pp. 85–93.
- [22] J. Hassine, "An LLM-based approach to recover traceability links between security requirements and goal models," in Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering (EASE). ACM, 2024, pp. 643–651.
- [23] N. Alturayeif, I. Ahmad, and J. Hassine, "TraceLLM: Leveraging large language models with prompt engineering for enhanced requirements traceability," 2026. [Online]. Available: https://arxiv.org/abs/2602.01253
- [24] Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. Bang, A. Madotto, and P. Fung, "Survey of hallucination in natural language generation," ACM Computing Surveys, vol. 55, no. 12, pp. 1–38, 2023.
- [25] L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin, and T. Liu, "A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions," ACM Transactions on Information Systems, vol. 43, pp. 1–55, 2023.
- [26] A. Eghbali and M. Pradel, "De-Hallucinator: Mitigating LLM hallucinations in code generation tasks via iterative grounding," 2024, preprint.
- [27] S. Hiriyanna and W. Zhao, "Multi-layered framework for LLM hallucination mitigation in high-stakes applications: A tutorial," Comput., vol. 14, p. 332, 2025.
- [28] M. Ataei, H. Cheong, D. Grandi, Y. Wang, N. Morris, and A. Tessier, "Elicitron: A framework for simulating design requirements elicitation using large language model agents," in Proceedings of the ASME International Design Engineering Technical Conferences and Computers and Information in Engineering Conference (IDETC-CIE), 2024.
- [29] M. Oriol, Q. Motger, J. Marco, and X. Franch, "Multi-agent debate strategies to enhance requirements engineering with large language models," 2025.
- [30] A. Bissoli, M. Mollaeefar, D. Van Landuyt, and S. Ranise, "Benchmarking the effectiveness of multi-agent LLMs in collaborative privacy threat modeling with LINDDUN GO," Journal of Information Security and Applications, vol. 100, p. 104489, 2026.
- [31] C. Xia, Y. Deng, S. Dunn, and L. Zhang, "Agentless: Demystifying LLM-based software engineering agents," arXiv preprint arXiv:2407.01489, 2024.
- [32] J. He, C. Treude, and D. Lo, "LLM-based multi-agent systems for software engineering: Literature review, vision, and the road ahead," ACM Transactions on Software Engineering and Methodology, vol. 34, pp. 1–30, 2024.
- [33]
- [34] S. Chaudhuri, K. Ellis, O. Polozov, R. Singh, A. Solar-Lezama et al., "Neurosymbolic programming," Foundations and Trends in Programming Languages, vol. 7, no. 3, pp. 158–363, 2021.
- [35] A. Ibrahim, "Neuro-symbolic agents for hallucination-free requirements reuse," 2026, full paper, forthcoming.
discussion (0)