Recognition: no theorem link
Open Ontologies: Tool-Augmented Ontology Engineering with Stable Matching Alignment
Pith reviewed 2026-05-12 01:50 UTC · model grok-4.3
The pith
Stable 1-to-1 matching is the dominant factor in ontology alignment quality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that stable 1-to-1 matching is the dominant factor in ontology alignment quality. On the Anatomy track benchmark it achieves F1 = 0.832 with precision 0.963 and recall 0.733, competitive with state-of-the-art systems and highest in precision. Ablation across five weight configurations shows signal weights are irrelevant once stable matching is applied, with F1 varying by less than 0.004, while removing stable matching drops F1 to 0.728. The same method yields F1 = 0.438 on the Conference track benchmark. Structured tool access via the Model Context Protocol enables the language model to reach F1 = 0.717, compared with 0.323 when reading raw OWL files and 0.431 with no file at all.
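The reported headline numbers are internally consistent: F1 is the harmonic mean of precision and recall, so the Anatomy-track figure can be checked directly from the stated P and R.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported Anatomy-track values: P = 0.963, R = 0.733
print(round(f1(0.963, 0.733), 3))  # → 0.832, matching the reported F1
```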
What carries the argument
Stable 1-to-1 matching procedure that selects an optimal set of conflict-free correspondences from similarity signals between ontology concepts.
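A minimal sketch of the idea (not the paper's implementation): given a similarity matrix between source and target concepts, greedily accept the highest-scoring remaining pair and discard any candidate that conflicts with an accepted one. When preferences on both sides derive from a shared score matrix, this greedy pass yields a stable, conflict-free 1-to-1 correspondence set. The concept names and threshold below are illustrative.

```python
def stable_one_to_one(sim: dict[tuple[str, str], float],
                      threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Greedy stable 1-to-1 matching over a similarity matrix.

    Pairs are visited in descending score order; a pair is accepted
    only if neither concept is already matched, so each concept
    appears in at most one correspondence.
    """
    matched_src: set[str] = set()
    matched_tgt: set[str] = set()
    result = []
    for (s, t), score in sorted(sim.items(), key=lambda kv: -kv[1]):
        if score < threshold:
            break  # remaining pairs are all below threshold
        if s not in matched_src and t not in matched_tgt:
            result.append((s, t, score))
            matched_src.add(s)
            matched_tgt.add(t)
    return result

# Illustrative similarity signals between two small anatomy ontologies
sim = {("mouse:Heart", "human:Heart"): 0.95,
       ("mouse:Heart", "human:Aorta"): 0.60,
       ("mouse:Aorta", "human:Aorta"): 0.90}
print(stable_one_to_one(sim))
# Heart↔Heart and Aorta↔Aorta are kept; the conflicting 0.60 pair is discarded
```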
Load-bearing premise
The performance advantages of stable matching observed on the tested benchmark tracks will hold for other ontology domains, larger structures, and different language model configurations.
What would settle it
Applying the same alignment method to a new pair of ontologies outside the original benchmark tracks and finding that stable matching no longer produces competitive F1 scores or that signal weights regain influence would disprove the central claim.
Original abstract
We present Open Ontologies, an open-source ontology engineering system implemented in Rust that integrates LLM-driven construction with formal OWL reasoning and ontology alignment via the Model Context Protocol. Our primary finding is that stable 1-to-1 matching is the dominant factor in ontology alignment quality: on the OAEI Anatomy track, it achieves F1 = 0.832 (P = 0.963, R = 0.733), competitive with state-of-the-art systems and exceeding all in precision. Ablation across five weight configurations shows that signal weights are irrelevant when stable matching is applied (F1 varies by less than 0.004), while removing stable matching drops F1 to 0.728. On the Conference track, the same method achieves F1 = 0.438. On tool-augmented ontology interaction, we find a surprising result: an LLM reading a raw OWL file (F1 = 0.323) performs worse than the same LLM with no file at all (F1 = 0.431), while structured MCP tool access achieves F1 = 0.717. This demonstrates that tool structure provides a qualitatively different mode of access that the LLM cannot replicate by reading raw syntax. The system ships as a single binary under the MIT licence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Open Ontologies, an open-source Rust system integrating LLM-driven ontology construction, formal OWL reasoning, and ontology alignment via the Model Context Protocol. Its central empirical claim is that stable 1-to-1 matching dominates alignment quality: on the OAEI Anatomy track it yields F1 = 0.832 (P = 0.963, R = 0.733), competitive with SOTA and highest in precision; an ablation across five weight configurations shows weights are irrelevant under stable matching (F1 varies < 0.004) while removing it drops F1 to 0.728. The same method obtains F1 = 0.438 on the Conference track. A secondary finding is that structured MCP tool access (F1 = 0.717) outperforms both raw OWL file reading (F1 = 0.323) and no-file baselines (F1 = 0.431) for LLMs.
Significance. If the reported dominance of stable matching holds under broader testing, the work would provide a clear, actionable principle for ontology alignment pipelines and demonstrate that structured tool interfaces offer a qualitatively different capability than raw syntax access for LLMs. The single-binary MIT-licensed implementation together with direct evaluation on public OAEI benchmarks constitutes a concrete reproducibility asset for the community.
major comments (2)
- [Results / ablation experiments] The claim that stable 1-to-1 matching is the dominant factor in alignment quality rests on the Anatomy ablation (F1 drop from 0.832 to 0.728 when removed; weights irrelevant). No corresponding weight-configuration or removal ablation is reported for the Conference track (only the single F1 = 0.438 figure is given), so the generality of the dominance conclusion across ontology alignment tasks remains untested within the manuscript.
- [Tool-augmented ontology interaction experiments] The tool-augmentation results (raw OWL F1 = 0.323 worse than no-file F1 = 0.431, MCP F1 = 0.717) are presented as a key qualitative finding, yet the manuscript provides no details on prompt templates, exact MCP tool signatures, number of LLM calls, or statistical significance of the differences; these omissions make it difficult to assess whether the observed ordering is robust to implementation choices.
minor comments (1)
- [Abstract and §4] The abstract and results sections would benefit from explicit statement of the exact OAEI reference alignments used, data-split protocol, and whether the reported F1 figures are averaged over multiple LLM runs or single-shot.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to strengthen the empirical claims and reproducibility.
Point-by-point responses
-
Referee: [Results / ablation experiments] The claim that stable 1-to-1 matching is the dominant factor in alignment quality rests on the Anatomy ablation (F1 drop from 0.832 to 0.728 when removed; weights irrelevant). No corresponding weight-configuration or removal ablation is reported for the Conference track (only the single F1 = 0.438 figure is given), so the generality of the dominance conclusion across ontology alignment tasks remains untested within the manuscript.
Authors: We agree that the ablation study was performed only on the Anatomy track and that the Conference track result is reported without the corresponding weight and removal ablations. This does limit the strength of the generality claim in the current manuscript. In the revised version we will add the full set of weight-configuration and removal ablations for the Conference track, using the same five weight settings and removal condition as in Anatomy, so that readers can directly assess whether the dominance of stable matching holds across both OAEI tasks. revision: yes
-
Referee: [Tool-augmented ontology interaction experiments] The tool-augmentation results (raw OWL F1 = 0.323 worse than no-file F1 = 0.431, MCP F1 = 0.717) are presented as a key qualitative finding, yet the manuscript provides no details on prompt templates, exact MCP tool signatures, number of LLM calls, or statistical significance of the differences; these omissions make it difficult to assess whether the observed ordering is robust to implementation choices.
Authors: We acknowledge that the current manuscript lacks the implementation details needed to evaluate robustness. In the revision we will add: (1) the exact prompt templates used for each of the three conditions, (2) the full MCP tool signatures and their parameter descriptions, (3) the number of LLM calls per condition, and (4) statistical support for the F1 differences (standard deviation across at least five independent runs with the same model and temperature). These additions will allow readers to reproduce and assess the ordering. revision: yes
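To make the distinction at stake concrete: the paper does not publish its MCP tool signatures, but the gap between "raw OWL file" and "structured tool access" can be illustrated with a hypothetical lookup tool. Instead of handing the model serialized OWL/XML to parse, a tool returns exactly the structured facts relevant to a query. The tool name, IDs, and schema below are invented for illustration.

```python
# Hypothetical in-memory ontology index (illustrative IDs and labels)
ontology = {
    "NCI_C12727": {"label": "Heart", "parents": ["NCI_C12919"]},
    "NCI_C12919": {"label": "Organ", "parents": []},
}

def search_labels(query: str) -> list[dict]:
    """Structured lookup: return concepts whose label contains `query`.

    An MCP-style tool exposes calls like this, so the model receives
    targeted structured records rather than raw OWL syntax.
    """
    return [{"id": cid, **concept}
            for cid, concept in ontology.items()
            if query.lower() in concept["label"].lower()]

print(search_labels("heart"))
# → [{'id': 'NCI_C12727', 'label': 'Heart', 'parents': ['NCI_C12919']}]
```

The ordering reported in the paper (MCP 0.717 > no file 0.431 > raw OWL 0.323) suggests that this kind of targeted retrieval, not merely having the data in context, is what the model exploits.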
Circularity Check
No circularity: empirical results on public benchmarks
full rationale
The paper's central claims consist of measured F1 scores and ablation outcomes on the public OAEI Anatomy and Conference tracks. Stable-matching dominance is shown by direct experimental comparison (F1 0.832 with matching vs. 0.728 without; weight variations <0.004), not by any derivation, fitted parameter renamed as prediction, or self-citation chain. No equations, ansatzes, or uniqueness theorems are invoked that reduce to the paper's own inputs; all reported quantities are externally verifiable against fixed reference alignments.
Axiom & Free-Parameter Ledger
axioms (2)
- [standard math] OWL 2 semantics and reasoning are sound and complete for the tested ontologies
- [domain assumption] OAEI Anatomy and Conference tracks are representative alignment tasks
Reference graph
Works this paper leans on
- [1] Chen, B., Lin, Z., Li, Y., Zhang, N., Wen, H., Chen, H.: OntoAxiom: Evaluating LLM Capabilities for Ontology Axiom Identification. arXiv:2512.05594 (2025)
- [2] Glimm, B., Horrocks, I., Motik, B., Stoilos, G., Wang, Z.: HermiT: An OWL 2 Reasoner. J. Automated Reasoning 53(3), 245–269 (2014)
- [3] Algergawy, A., Faria, D., Ferrara, A., Jiménez-Ruiz, E., et al.: Results of the Ontology Alignment Evaluation Initiative 2024. In: OM Workshop at ISWC 2024. CEUR-WS Vol-3897 (2024)
- [4] Jiménez-Ruiz, E., Grau, B.C.: LogMap: Logic-Based and Scalable Ontology Matching. In: ISWC 2011, pp. 273–288. Springer (2011)
- [5] Faria, D., Pesquita, C., Santos, E., Palmonari, M., Cruz, I.F., Couto, F.M.: The AgreementMakerLight Ontology Matching System. In: OTM 2013, pp. 527–541. Springer (2013)
- [6] He, Y., Chen, J., Antonyrajah, D., Horrocks, I.: BERTMap: A BERT-Based Ontology Alignment System. In: AAAI 2022, pp. 5570–5578 (2022)
- [7] Hertling, S., Paulheim, H.: OLaLa: Ontology Matching with Large Language Models. In: ISWC 2023, pp. 442–459. Springer (2023)
- [8] Anthropic: Model Context Protocol Specification. https://modelcontextprotocol.io (2024)
- [9] Tanière, T.: Oxigraph: A SPARQL Graph Database. https://oxigraph.org (2023)
- [10] Sirin, E., Parsia, B., Grau, B.C., Kalyanpur, A., Katz, Y.: Pellet: A Practical OWL-DL Reasoner. J. Web Semantics 5(2), 51–53 (2007)
- [11] Tsarkov, D., Horrocks, I.: FaCT++ Description Logic Reasoner: System Description. In: IJCAR 2006, pp. 292–297. Springer (2006)
- [12] Kazakov, Y., Krötzsch, M., Simančík, F.: The Incredible ELK. J. Automated Reasoning 53(1), 1–61 (2014)
- [13] Caufield, J.H., et al.: Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES). Bioinformatics 40(3), btae104 (2024)
- [14] Graux, D., et al.: OntoChat: A Framework for Conversational Ontology Engineering. In: ESWC 2024 (2024)
- [15] Babaei Giglou, H., D'Souza, J., Auer, S.: LLMs4OL: Large Language Models for Ontology Learning. In: ISWC 2023, pp. 408–427. Springer (2023)
- [16] Horridge, M., Drummond, N., Goodwin, J., Rector, A., Stevens, R., Wang, H.: The Manchester OWL Syntax. In: OWL Experiences and Directions Workshop (2006)
- [17] Guo, Y., Pan, Z., Heflin, J.: LUBM: A Benchmark for OWL Knowledge Base Systems. J. Web Semantics 3(2–3), 158–182 (2005)
- [18] HashiCorp: Terraform: Infrastructure as Code. https://www.terraform.io (2014)
- [19] Horridge, M., Bechhofer, S.: The OWL API: A Java API for OWL Ontologies. Semantic Web 2(1), 11–21 (2011)
- [20] Jackson, R.C., et al.: ROBOT: A Tool for Automating Ontology Workflows. BMC Bioinformatics 20, 407 (2019)
- [21] Lamy, J.B.: Owlready: Ontology-Oriented Programming in Python. Artificial Intelligence in Medicine 80, 11–28 (2017)
- [22] Matentzoglu, N., et al.: A Simple Standard for Sharing Ontological Mappings (SSSOM). Database 2022, baac035 (2022)
- [23] Motik, B., Grau, B.C., Horrocks, I., et al.: OWL 2 Web Ontology Language Profiles (Second Edition). W3C Recommendation (2012)
discussion (0)