Unlocking Crowdsourcing for Ontology Matching Validation

Zhangcheng Qiang

arxiv: 2605.12226 · v4 · pith:JMKYAYSQnew · submitted 2026-05-12 · 💻 cs.IR

Pith reviewed 2026-06-30 22:07 UTC · model grok-4.3

classification 💻 cs.IR

keywords crowdsourcing, ontology matching, validation, human-in-the-loop, large language models, quality mechanisms, annotation

The pith

A crowdsourcing system using three domain-specific mechanisms enables reliable validation of ontology matches by non-experts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models now produce far more ontology matching candidates than domain experts can review. This paper presents a crowdsourcing platform built to handle that volume through three mechanisms that adapt quality controls to ontology validation tasks. The system integrates directly with existing ontology matching tools to support human-in-the-loop checks. Evaluations across varied user groups and annotation formats show the approach maintains effectiveness. Two use cases demonstrate practical deployment while highlighting remaining constraints.

Core claim

The paper claims that differential trustworthiness, coherence pre-filling, and time-dependent opinion together allow non-expert crowds to validate ontology alignments at quality levels sufficient for integration with automated matching systems, thereby addressing the validation bottleneck created by high-volume LLM outputs.

What carries the argument

Three mechanisms adapt crowdsourcing to ontology matching validation: differential trustworthiness, which weights contributors by reliability; coherence pre-filling, which supports annotation through consistency checks; and time-dependent opinion, which captures how judgments evolve.
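
To make the first mechanism concrete, here is a minimal Python sketch of trust-weighted voting. It is an illustration under assumptions, not the paper's implementation: the abstract does not say how differential trustworthiness is computed, so the function name `weighted_vote`, the per-annotator weights, and the 0.5 acceptance threshold are all hypothetical.

```python
def weighted_vote(votes, trust, threshold=0.5):
    """Decide whether a candidate mapping is accepted.

    votes: list of (annotator_id, accepts) pairs, where accepts is a bool.
    trust: dict mapping annotator_id to a weight in [0, 1] (assumed given).
    Returns True/False, or None when no voter carries any weight.
    """
    total = sum(trust[annotator] for annotator, _ in votes)
    if total == 0:
        return None  # no usable opinion on this mapping
    accepted = sum(trust[annotator] for annotator, accepts in votes if accepts)
    # Accept when the trust-weighted share of "accept" votes reaches the threshold.
    return accepted / total >= threshold


# Hypothetical example: one high-trust annotator against two low-trust ones.
votes = [("expert", True), ("novice_1", False), ("novice_2", False)]
trust = {"expert": 0.9, "novice_1": 0.3, "novice_2": 0.3}
decision = weighted_vote(votes, trust)
```

In this toy case the single high-trust vote carries a weighted share of 0.6, so the mapping is accepted even though a plain majority vote would reject it.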

If this is right

Ontology matching systems can incorporate scalable human validation instead of relying solely on experts.
High volumes of candidate matches from LLM-based matchers become feasible to review.
Annotation interfaces can be tuned for mixed user groups while preserving output quality.
Human-in-the-loop workflows become viable for real-world ontology engineering projects.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The mechanisms could be tested on validation tasks for other structured outputs such as knowledge graph triples.
Similar quality controls might apply to crowdsourced evaluation of other AI-generated structured data.
The time-dependent opinion component suggests a way to model changing consensus in long-running annotation projects.

Load-bearing premise

The three mechanisms are sufficient to ensure crowdsourcing quality for ontology matching validation without domain experts.

What would settle it

A side-by-side comparison of crowdsourced validation decisions against expert gold standards on a fixed ontology matching benchmark, checking whether accuracy remains above a usable threshold when only non-experts participate.
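
One way to run such a comparison is to score the set of mappings the crowd accepts against the expert reference alignment. The Python sketch below is a hypothetical illustration: the function name `validation_scores`, the representation of a mapping as a pair of entity names, and the example entities are assumptions, not data from the paper.

```python
def validation_scores(crowd_accepted, gold):
    """Return (precision, recall, f1) of crowd-accepted mappings vs. a gold standard.

    Both arguments are iterables of hashable mappings, e.g. (source, target) pairs.
    """
    crowd_accepted, gold = set(crowd_accepted), set(gold)
    true_positives = len(crowd_accepted & gold)
    precision = true_positives / len(crowd_accepted) if crowd_accepted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical expert reference and crowd output.
gold = {("a:Room", "b:Space"), ("a:Sensor", "b:Sensor"), ("a:Floor", "b:Level")}
crowd = {("a:Room", "b:Space"), ("a:Sensor", "b:Sensor"), ("a:Door", "b:Window")}
precision, recall, f1 = validation_scores(crowd, gold)
```

Here two of the three crowd-accepted mappings appear in the reference, so precision, recall, and F1 all come out at two thirds. A usable threshold would still have to be chosen by whoever runs the benchmark.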

Figures

Figures reproduced from arXiv: 2605.12226 by Zhangcheng Qiang.

**Figure 2.** Discovery rates of TPs and FPs vs expert ratio.

**Figure 3.** Discovery rates of TPs and FPs vs expert knowledge.

**Figure 4.** Pre-fills by type vs coverage.

**Figure 5.** Pre-fills by domain vs coverage.

Time-Dependent Opinion. We simulate 100 annotators (50% experts and 50% non-experts) with varying trustworthiness. Similarly, we assume that TPs are mappings that exist only in the reference and FPs are mappings that exist only in the matcher. Here, we define 0.5/0.5 as both non-experts and experts having trustworthiness 0.5, while 0.1/0.9 means non-experts with trustwo…

**Figure 6.** Discovery rates of TPs and FPs vs fraction of wrong

**Figure 7.** Discovery rates of TPs and FPs vs fraction of wrong

Original abstract

Recent advances in large language models (LLMs) pose new challenges for ontology matching (OM). While OM systems built on LLMs have shown remarkable capabilities in discovering more matching candidates, traditional OM validation that relies on domain experts has become overwhelming. In this study, we explore the use of crowdsourcing for OM validation and introduce a novel crowdsourcing system. We propose three domain-specific mechanisms, namely differential trustworthiness, coherence pre-filling, and time-dependent opinion, to ensure the quality of crowdsourcing for OM validation. We demonstrate that our crowdsourcing system can be integrated with existing OM systems to enable human-in-the-loop validation. The evaluation of the system shows its effectiveness in handling diverse user groups and different annotation settings. We discuss two real-world use cases of the system and current limitations for improvement.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper proposes three mechanisms for crowdsourcing ontology matching validation but the abstract supplies no metrics, baselines, or test details to support the effectiveness claim.

Hi,

The main takeaway is that this paper describes a crowdsourcing system for validating LLM-generated ontology matches, built around three mechanisms labeled differential trustworthiness, coherence pre-filling, and time-dependent opinion. It claims the system integrates with existing OM tools and works across user groups and settings.

What is actually new is the application of these three controls to the OM validation setting. The problem framing is reasonable: LLM matchers produce too many candidates for experts alone. The mention of two real-world use cases adds a practical angle that could interest people trying to deploy this kind of system.

The soft spots are in the evaluation. The abstract states that the evaluation shows effectiveness, yet it contains no numbers, no gold-standard comparisons, no inter-annotator agreement figures, no ablation of the three mechanisms, and no baseline against plain crowdsourcing or expert-only validation. The stress-test note is accurate on this point; without those details the claim that the mechanisms suffice without domain experts remains untested. If the full manuscript has a proper experimental section with those elements, the picture would change, but nothing of that sort is visible here.

This is aimed at researchers in ontology matching and semantic integration who need scalable validation methods. A reader hunting for a ready method or reproducible results will find little to use. It does not yet merit a serious referee because the central claims rest on assertion rather than evidence. I would recommend against peer review until the authors add concrete results and comparisons.

Referee Report

2 major / 2 minor

Summary. The paper introduces a crowdsourcing system for ontology matching (OM) validation to address the scalability challenges posed by LLM-based OM systems. It proposes three domain-specific mechanisms—differential trustworthiness, coherence pre-filling, and time-dependent opinion—to maintain quality with non-expert crowds. The system integrates with existing OM tools for human-in-the-loop validation. The manuscript claims that an evaluation demonstrates the system's effectiveness across diverse user groups and annotation settings, discusses two real-world use cases, and notes current limitations.

Significance. If the mechanisms are shown to deliver reliable validation quality, the work could enable scalable, expert-free OM validation and reduce reliance on scarce domain experts. The domain-specific design of the mechanisms represents a targeted adaptation that, if validated, would distinguish this approach from generic crowdsourcing platforms.

major comments (2)

[Abstract (and Evaluation section)] The central claim that the three mechanisms suffice to ensure crowdsourcing quality for OM validation without domain experts rests on an unevidenced assertion. The abstract states that 'the evaluation of the system shows its effectiveness' but supplies no quantitative results (precision, recall, F1, inter-annotator agreement), no gold-standard expert comparisons, no baseline OM validation methods, and no ablation of the individual mechanisms. This directly undermines the sufficiency argument.
[Evaluation section] No experimental design details are provided: participant recruitment, number of users per group, ontology datasets or matching tasks used, annotation settings tested, or how 'diverse user groups' were operationalized. Without these, the claim of effectiveness across settings cannot be evaluated.

minor comments (2)

[Abstract] The abstract would be strengthened by including at least one key quantitative finding (e.g., agreement rate or accuracy relative to experts) rather than a qualitative assertion of effectiveness.
[Evaluation section] Consider adding a table that reports per-mechanism contribution or inter-annotator statistics to make the evaluation section more transparent.
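
For readers unfamiliar with the inter-annotator statistics requested above, Cohen's kappa is one common choice for two annotators: it corrects raw agreement for the agreement expected by chance. The Python sketch below is a generic illustration; the function name and the accept/reject labels are made up, and nothing here comes from the paper's evaluation.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a.keys() | counts_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical judgements on four candidate mappings (1 = accept, 0 = reject).
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
```

The two annotators agree on three of four items (0.75), chance agreement is 0.5, and kappa is therefore 0.5. With more than two annotators per mapping, a multi-rater statistic such as Fleiss' kappa would be the usual substitute.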

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting gaps in the presentation of our evaluation. We agree that the current manuscript version does not provide sufficient quantitative evidence or experimental details to fully support the claims of effectiveness. We will revise the manuscript to incorporate these elements.

Referee: [Abstract (and Evaluation section)] The central claim that the three mechanisms suffice to ensure crowdsourcing quality for OM validation without domain experts rests on an unevidenced assertion. The abstract states that 'the evaluation of the system shows its effectiveness' but supplies no quantitative results (precision, recall, F1, inter-annotator agreement), no gold-standard expert comparisons, no baseline OM validation methods, and no ablation of the individual mechanisms. This directly undermines the sufficiency argument.

Authors: We acknowledge that the abstract and evaluation section in the submitted manuscript lack the requested quantitative metrics, gold-standard comparisons, baselines, and ablations. This omission weakens the support for our claims. In the revised version, we will expand the evaluation section to report precision, recall, F1, inter-annotator agreement, expert comparisons, baseline methods, and mechanism ablations, with corresponding updates to the abstract.
Revision: yes

Referee: [Evaluation section] No experimental design details are provided: participant recruitment, number of users per group, ontology datasets or matching tasks used, annotation settings tested, or how 'diverse user groups' were operationalized. Without these, the claim of effectiveness across settings cannot be evaluated.

Authors: We agree that the experimental design details are missing from the evaluation section. The revised manuscript will include full descriptions of participant recruitment, group sizes, ontology datasets and tasks, annotation settings, and how diverse user groups were defined and operationalized.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity in system proposal or evaluation

The paper proposes a crowdsourcing system for OM validation along with three mechanisms (differential trustworthiness, coherence pre-filling, time-dependent opinion) and reports an evaluation of its effectiveness in integration and handling diverse users/settings. No equations, derivations, fitted parameters, predictions of quantities, or self-citations appear in the provided text. The central claims concern system design and empirical performance rather than any quantity defined in terms of itself or reduced by construction to inputs. The derivation chain is therefore self-contained with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The central claim rests on the untested premise that the three invented mechanisms will produce reliable validation at scale. No free parameters are mentioned. The mechanisms themselves function as invented constructs without independent evidence supplied in the abstract.

axioms (1)

Domain assumption: Crowdsourcing participants can perform domain-specific ontology matching validation when supported by quality-control mechanisms.
This background assumption underpins the entire proposal that non-experts can replace or supplement domain experts.

invented entities (3)

differential trustworthiness (no independent evidence)
purpose: Weighting of crowd workers to improve validation quality
New mechanism introduced in the abstract with no external validation or prior citation referenced.

coherence pre-filling (no independent evidence)
purpose: Using consistency to pre-populate answers for crowd workers
New mechanism introduced in the abstract with no external validation or prior citation referenced.

time-dependent opinion (no independent evidence)
purpose: Accounting for temporal changes in crowd judgments
New mechanism introduced in the abstract with no external validation or prior citation referenced.

pith-pipeline@v0.9.1-grok · 5649 in / 1339 out tokens · 26400 ms · 2026-06-30T22:07:36.177169+00:00 · methodology

