Ontology-Constrained Neural Reasoning in Enterprise Agentic Systems: A Neurosymbolic Architecture for Domain-Grounded AI Agents

Abhijit Sanyal; Thanh Luong Tuan

arxiv: 2604.00555 · v5 · pith:3DN2IPLNnew · submitted 2026-04-01 · cs.AI · cs.CL · cs.SE


Pith reviewed 2026-05-21 09:39 UTC · model grok-4.3


keywords: neurosymbolic architecture · ontology-constrained reasoning · enterprise agents · LLM grounding · regulatory compliance · domain drift · agentic systems


The pith

Ontology-coupled agents outperform ungrounded ones on accuracy and role consistency, with largest gains where LLMs have weakest domain knowledge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Enterprise agents using large language models struggle with hallucinations, domain drift, and regulatory compliance. The paper demonstrates that grounding these agents in a three-layer ontology framework improves both factual accuracy and consistent role adherence across tested models. Gains prove largest in domains where the underlying LLMs have sparse parametric knowledge, such as Vietnam-localized business settings. The architecture couples symbolic ontologies to both input assembly and output validation, extending current enterprise practices that only constrain inputs. This neurosymbolic approach is positioned as a way to make agentic systems viable for regulated industries.

Core claim

A controlled experiment across 1,800 runs and three LLMs finds that agents constrained by Role, Domain, and Interaction ontologies achieve statistically significant gains in Metric Accuracy and Role Consistency compared to ungrounded baselines, with large effect sizes; the magnitude of improvement is inversely proportional to the LLM's existing training-data coverage of the domain, reaching twice the lift in Vietnam-localized settings versus English domains.

What carries the argument

A three-layer ontological framework (Role, Domain, and Interaction ontologies) that constrains LLM inputs through context assembly and tool discovery, and constrains outputs through response checking and compliance enforcement. The paper describes today's input-only arrangement as asymmetric neurosymbolic coupling and proposes extending that coupling to the output side.
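To make the input-side and output-side coupling concrete, here is a minimal sketch, assuming each ontology layer can be reduced to a small record. The class names, fields, and the two functions `assemble_context` and `validate_response` are hypothetical illustrations, not the FAOS data model, which this page does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleOntology:
    role: str
    allowed_topics: frozenset  # hypothetical: topics this role may address


@dataclass(frozen=True)
class DomainOntology:
    metrics: dict  # hypothetical: metric name -> canonical definition


@dataclass(frozen=True)
class InteractionOntology:
    forbidden_phrases: tuple  # hypothetical: phrases a compliant reply avoids


def assemble_context(role, domain, question):
    """Input-side coupling: prepend ontology facts to the prompt text."""
    lines = [f"You are acting as: {role.role}."]
    for name, text in sorted(domain.metrics.items()):
        lines.append(f"Definition of {name}: {text}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


def validate_response(role, interaction, topic, reply):
    """Output-side coupling: return a list of violations (empty means pass)."""
    violations = []
    if topic not in role.allowed_topics:
        violations.append(f"topic '{topic}' is outside role '{role.role}'")
    lowered = reply.lower()
    for phrase in interaction.forbidden_phrases:
        if phrase.lower() in lowered:
            violations.append(f"forbidden phrase: '{phrase}'")
    return violations
```

In this sketch the same ontology objects feed both sides: `assemble_context` shapes what the model sees, while `validate_response` checks what it says. An input-only system would stop after the first function.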

If this is right

Ontology-constrained tool discovery via SQL-pushdown scoring produces more relevant tool selection than parametric retrieval alone.
Output-side ontological validation can enforce compliance at the reasoning level rather than only post hoc.
The inverse parametric knowledge effect implies that ontology investment should be prioritized for domains with low LLM training coverage.
Benefits replicate across Claude Sonnet 4, Qwen 2.5 72B, and Gemma 4 26B, indicating model independence.
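The first implication above mentions SQL-pushdown scoring without showing its mechanics. The sketch below is one plausible reading, assuming "pushdown" means that filtering by role and scoring by ontology-concept overlap happen inside the database query rather than in application code. The table `tool_tags`, its columns, and the scoring rule (one point per matching concept) are hypothetical; the paper's actual scoring function is not given on this page.

```python
import sqlite3


def build_registry(rows):
    """Create an in-memory tool registry; rows are (name, role, concept) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tool_tags (name TEXT, role TEXT, concept TEXT)")
    conn.executemany("INSERT INTO tool_tags VALUES (?, ?, ?)", rows)
    return conn


def discover_tools(conn, role, concepts, limit=3):
    """Rank tools in SQL: one point per distinct ontology concept matched."""
    if not concepts:
        return []
    placeholders = ",".join("?" for _ in concepts)
    query = f"""
        SELECT name, COUNT(DISTINCT concept) AS score
        FROM tool_tags
        WHERE role = ? AND concept IN ({placeholders})
        GROUP BY name
        ORDER BY score DESC, name ASC
        LIMIT ?
    """
    return conn.execute(query, [role, *concepts, limit]).fetchall()


if __name__ == "__main__":
    registry = build_registry([
        ("loan_calc", "credit_analyst", "npl_ratio"),
        ("loan_calc", "credit_analyst", "interest_rate"),
        ("kyc_check", "credit_analyst", "npl_ratio"),
        ("payroll", "hr_manager", "salary"),
    ])
    # prints [('loan_calc', 2), ('kyc_check', 1)]
    print(discover_tools(registry, "credit_analyst", ["npl_ratio", "interest_rate"]))
```

Doing the scoring in the query means only the top-ranked rows leave the database, which is the usual motivation for pushdown; whether FAOS uses the same rationale is an assumption here.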

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same three-layer structure could be adapted to other regulated verticals that share similar compliance and role constraints.
Automated ontology maintenance pipelines would be needed to prevent the architecture from lagging behind evolving enterprise rules.
Integration with existing SQL-based enterprise databases could lower the cost of maintaining the Domain and Interaction layers.

Load-bearing premise

The three-layer ontologies can be constructed and maintained so they accurately and comprehensively encode relevant enterprise knowledge, regulatory constraints, and interaction rules without introducing new errors or coverage gaps.

What would settle it

A replication experiment in which the supplied ontologies are deliberately incomplete or contain outdated rules, after which the measured accuracy and consistency advantages over ungrounded agents disappear.

Figures

Figures reproduced from arXiv: 2604.00555 by Abhijit Sanyal, Thanh Luong Tuan.

**Figure 2.** Mean scores by condition for each metric (5 industries, 600 runs). MA and RS show…

**Figure 3.** C3 (Ontology) scores by industry and metric. Vietnamese industries (banking_vn, …

**Figure 4.** Ontology improvement (∆C1→C3) by industry and metric. Complements…

**Figure 5.** Ontology lift (∆C1→C3) by industry across three generator models. Vietnamese industries (shaded region) consistently show larger improvement than English industries across all models. Open-source models (Qwen, Gemma) benefit more than Claude, confirming the Inverse PKE at both domain and model levels.

**Figure 6.** Semantic entropy change (∆H, C1→C3) by metric and model. Negative values indicate entropy reduction (constructive grounding); positive values indicate entropy increase (destructive interference). 11 of 12 metric×model combinations show entropy reduction. The single exception, MA on Claude, is an empirical signature of the Inverse PKE: Claude's strong parametric metric knowledge is disrupted by ontological in…
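The ∆H quantity in Figure 6 can be illustrated with a short sketch, assuming semantic entropy is the Shannon entropy over clusters of semantically equivalent responses and that cluster labels have already been assigned. The paper's actual clustering procedure is not described on this page, so the function names and label inputs are hypothetical.

```python
import math
from collections import Counter


def semantic_entropy(cluster_labels):
    """Shannon entropy (in nats) over meaning-cluster labels of sampled answers."""
    counts = Counter(cluster_labels)
    total = len(cluster_labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def entropy_change(baseline_labels, grounded_labels):
    """Delta H = H(grounded) - H(baseline); negative means less semantic spread."""
    return semantic_entropy(grounded_labels) - semantic_entropy(baseline_labels)
```

Under this reading, a baseline whose four sampled answers fall into four different clusters and a grounded condition whose four answers all share one cluster would give a negative ∆H, matching the "entropy reduction" direction in the caption.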

Original abstract

Enterprise adoption of Large Language Models (LLMs) is constrained by hallucination, domain drift, and the inability to enforce regulatory compliance at the reasoning level. We present a neurosymbolic architecture implemented within the Foundation AgenticOS (FAOS) platform that addresses these limitations through ontology-constrained neural reasoning. We introduce a three-layer ontological framework--Role, Domain, and Interaction ontologies--grounding LLM-based enterprise agents. We formalize asymmetric neurosymbolic coupling: current enterprise systems constrain agent inputs (context assembly, tool discovery, governance thresholds) but not outputs, and we propose mechanisms extending this coupling to output-side validation (response checking, reasoning verification, compliance enforcement). A controlled experiment (1,800 runs across five industries and three LLMs: Claude Sonnet 4, Qwen 2.5 72B, Gemma 4 26B) finds ontology-coupled agents significantly outperform ungrounded agents on Metric Accuracy (p < .001) and Role Consistency (p < .001) across all three models with large effect sizes (Kendall's W = .46-.64). Improvements are greatest where LLM parametric knowledge is weakest--particularly in Vietnam-localized domains, where ontology lift is 2x that of English domains. Contributions: (1) a formal three-layer enterprise ontology model; (2) a taxonomy of neurosymbolic coupling patterns; (3) ontology-constrained tool discovery via SQL-pushdown scoring; (4) a proposed framework for output-side ontological validation; (5) empirical evidence for the inverse parametric knowledge effect--ontological grounding value is inversely proportional to LLM training-data coverage of the domain; (6) cross-model replication establishing model-independence; (7) a production system serving 22 industry verticals with 650+ agents.
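For readers unfamiliar with the effect size quoted in the abstract, the sketch below computes Kendall's W, the coefficient of concordance, from per-scenario scores. It assumes W was computed over within-scenario rankings of the experimental conditions, as is typical when it accompanies a Friedman-style test; this page does not confirm that setup. The function names are hypothetical and no tie correction is applied.

```python
def rank_row(scores):
    """Rank one block's scores (1 = lowest); tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def kendalls_w(blocks):
    """W = 12 S / (m^2 (n^3 - n)) for m blocks ranking n conditions."""
    m, n = len(blocks), len(blocks[0])
    rank_sums = [0.0] * n
    for scores in blocks:
        for i, r in enumerate(rank_row(scores)):
            rank_sums[i] += r
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```

W ranges from 0 (conditions are ranked inconsistently across scenarios) to 1 (every scenario ranks the conditions identically), so the reported .46 to .64 indicates a fairly consistent ordering of conditions.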

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Ontology constraints help most where LLMs lack domain knowledge, but missing details on how the ontologies were built leave the gains hard to attribute cleanly to the neurosymbolic part.


The main point is that ontology constraints help agent performance most where LLMs have the least parametric knowledge, but we cannot yet tell whether that comes from the neurosymbolic design or simply from better knowledge injection. The authors lay out a three-layer ontology for roles, domains, and interactions in enterprise agents and extend the coupling to output validation. The 1,800-run experiment across three models shows consistent gains in accuracy and role consistency, with larger effects in under-represented domains such as the Vietnam-localized ones. That is a practical observation worth noting, and the cross-model results add credibility.

New here are the specific taxonomy and the output-side mechanisms, plus the SQL-based tool scoring. The deployed system in 22 verticals is mentioned as evidence of applicability.

The soft spot is exactly the one in the stress test: no details on ontology construction or validation. If the ontologies were tailored with the test cases in view, the p < .001 results might not isolate the architecture's contribution. The paper would be stronger with a clear protocol for building and checking those ontologies.

This is aimed at people implementing agents in business settings who care about grounding and compliance. It could be useful for a reading group on applied neurosymbolic AI. It deserves a serious referee because the experiment is large enough and the ideas are specific enough to evaluate. I would send it for review but ask the authors to expand on the ontology part.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a neurosymbolic architecture implemented in the Foundation AgenticOS (FAOS) platform that grounds LLM-based enterprise agents via a three-layer ontological framework (Role, Domain, and Interaction ontologies). It formalizes asymmetric neurosymbolic coupling and extends it to output-side validation mechanisms such as response checking and compliance enforcement. A controlled experiment with 1,800 runs across five industries and three LLMs (Claude Sonnet 4, Qwen 2.5 72B, Gemma 4 26B) reports that ontology-coupled agents significantly outperform ungrounded baselines on Metric Accuracy (p < .001) and Role Consistency (p < .001) with large effect sizes (Kendall's W = .46-.64), with the largest gains in Vietnam-localized domains where parametric knowledge is weakest. Contributions include a formal ontology model, a taxonomy of coupling patterns, ontology-constrained tool discovery via SQL-pushdown scoring, a framework for output-side validation, empirical support for an inverse parametric knowledge effect, cross-model replication, and a production deployment serving 22 verticals with 650+ agents.

Significance. If the reported performance gains can be isolated from knowledge-injection effects through independent ontology validation, the work would offer a practically significant approach to reducing hallucination and enforcing regulatory compliance in enterprise agentic systems. The cross-model replication and emphasis on domains with limited training-data coverage provide a useful empirical signal for deployment decisions. The production-scale deployment adds credibility to claims of real-world applicability in neurosymbolic AI.

major comments (2)

[Experimental Evaluation] The experimental description (abstract and contributions) reports 1,800 runs with p-values, effect sizes, and a 2x lift in Vietnam domains but supplies no protocol for constructing or validating the three-layer ontologies, including inter-rater reliability, blinding, or independent generation of domain-specific test cases. This is load-bearing for the central claim because the observed improvements on Metric Accuracy and Role Consistency may reflect injected domain knowledge rather than the neurosymbolic coupling itself.
[Ontology Framework] The weakest assumption—that the Role, Domain, and Interaction ontologies accurately encode enterprise rules, regulatory constraints, and interaction patterns without coverage gaps or introduced errors—is not supported by any validation evidence in the manuscript. This directly undermines the reported Role Consistency gains and the inverse parametric knowledge effect, particularly for the Vietnam-localized domains.

minor comments (2)

[Results] Define Metric Accuracy and Role Consistency explicitly, including how they are computed and any inter-annotator agreement for the human evaluation component.
[Tool Discovery] Clarify the precise mechanism of SQL-pushdown scoring for ontology-constrained tool discovery and provide pseudocode or an example.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and positive assessment of the practical implications of our work. We address each major comment in detail below. We agree that additional methodological details on ontology construction and validation are warranted to strengthen the isolation of the neurosymbolic coupling effect from knowledge injection, and we will revise the manuscript accordingly.


Referee: [Experimental Evaluation] The experimental description (abstract and contributions) reports 1,800 runs with p-values, effect sizes, and a 2x lift in Vietnam domains but supplies no protocol for constructing or validating the three-layer ontologies, including inter-rater reliability, blinding, or independent generation of domain-specific test cases. This is load-bearing for the central claim because the observed improvements on Metric Accuracy and Role Consistency may reflect injected domain knowledge rather than the neurosymbolic coupling itself.

Authors: We acknowledge the importance of detailing the ontology construction protocol to support the central claims. The three-layer ontologies were constructed through a multi-stage process involving domain experts from each industry, who drew upon regulatory guidelines, company policies, and observed interaction patterns. Test cases for the 1,800 runs were generated by a separate evaluation team using scenarios not used in ontology development to minimize bias. However, formal inter-rater reliability statistics and explicit blinding protocols were not implemented, as the emphasis was on the coupling architecture rather than ontology validation. In the revised manuscript, we will add a dedicated section (Section 4.2) describing the construction methodology, including examples of Role, Domain, and Interaction ontology excerpts, the process for independent test case generation, and a discussion of how the controlled experimental design (identical base models and prompts, with ontologies applied only to the treatment condition) helps isolate the effect of neurosymbolic coupling from mere knowledge injection. We believe this addresses the concern while preserving the reported statistical findings.
revision: yes

Referee: [Ontology Framework] The weakest assumption—that the Role, Domain, and Interaction ontologies accurately encode enterprise rules, regulatory constraints, and interaction patterns without coverage gaps or introduced errors—is not supported by any validation evidence in the manuscript. This directly undermines the reported Role Consistency gains and the inverse parametric knowledge effect, particularly for the Vietnam-localized domains.

Authors: We concur that direct validation evidence for the ontologies' fidelity would bolster the interpretation of the results, especially the inverse parametric knowledge effect. While the production deployment serving 22 verticals and 650+ agents offers real-world corroboration (as persistent coverage gaps would manifest in operational failures), we did not include quantitative validation metrics such as expert-rated coverage scores or gap analysis in the original submission. For the revision, we will incorporate a new subsection on ontology validation, including a post-experiment review by independent domain experts for the Vietnam-localized domains. This review confirmed high alignment with local regulatory constraints, with minor gaps identified and noted as limitations. We will also discuss how the larger gains in low-parametric-knowledge domains support the coupling mechanism over simple knowledge injection, as the ungrounded baselines had access to the same model parameters. This addition will clarify that the Role Consistency improvements stem from the enforced ontological constraints rather than unvalidated assumptions.
revision: partial

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain


The paper presents a formal three-layer ontology model (Role, Domain, Interaction) and reports controlled experimental results (1,800 runs) showing performance gains for ontology-coupled agents. These gains and the inverse parametric knowledge effect are framed as empirical outcomes measured against ungrounded baselines across multiple LLMs and domains. No equations, self-citations, or derivations are shown that reduce the central claims to tautological inputs, fitted parameters renamed as predictions, or load-bearing prior work by the same authors. The architecture and taxonomy are introduced as contributions, with evaluation appearing independent of any self-referential construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The architecture rests on the premise that domain experts can produce sufficiently complete and accurate ontologies that capture regulatory and operational constraints; no independent verification of ontology quality is reported.

axioms (1)

domain assumption: Enterprise domains can be exhaustively captured by a three-layer ontology without significant coverage gaps or conflicts.
Invoked when claiming that ontology-constrained reasoning enforces compliance and reduces hallucination.

invented entities (1)

Asymmetric neurosymbolic coupling extended to output-side validation (no independent evidence)
purpose: To constrain agent outputs in addition to inputs
New framing introduced to differentiate from existing input-only constraint systems.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification
cs.AI · 2026-06 · unverdicted · novelty 6.0

The authors introduce a three-part ontology-based verification system for AI agents that generates regulatory and adversarial test scenarios and issues machine-verifiable trust certificates, with pilot results indicat...