Unlocking Crowdsourcing for Ontology Matching Validation
Pith reviewed 2026-06-30 22:07 UTC · model grok-4.3
The pith
A crowdsourcing system using three domain-specific mechanisms enables reliable validation of ontology matches by non-experts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that differential trustworthiness, coherence pre-filling, and time-dependent opinion together allow non-expert crowds to validate ontology alignments at quality levels sufficient for integration with automated matching systems, thereby addressing the validation bottleneck created by high-volume LLM outputs.
What carries the argument
Three mechanisms adapt crowdsourcing to ontology matching validation: differential trustworthiness, which weights contributors by reliability; coherence pre-filling, which supports annotation through consistency checks; and time-dependent opinion, which captures how judgments evolve.
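As a rough illustration of the first mechanism only, the sketch below aggregates accept/reject votes on one candidate match using per-contributor weights. The `Vote` type, the `weighted_decision` function, the trust scale, and the 0.5 threshold are all hypothetical; the summary does not say how the paper computes or applies trustworthiness.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    contributor: str
    accepts: bool  # True if the contributor judges the candidate match correct


def weighted_decision(votes, trust, threshold=0.5):
    """Return True if the trust-weighted share of accepting votes exceeds threshold.

    `trust` maps a contributor id to a non-negative weight (assumed scale).
    Contributors missing from `trust` get weight 0 and are effectively ignored.
    """
    total = sum(trust.get(v.contributor, 0.0) for v in votes)
    if total == 0:
        raise ValueError("no votes from contributors with positive trust")
    accepted = sum(trust.get(v.contributor, 0.0) for v in votes if v.accepts)
    return accepted / total > threshold


# One trusted contributor accepts; two less trusted contributors reject.
votes = [Vote("expert", True), Vote("novice_a", False), Vote("novice_b", False)]
trust = {"expert": 0.9, "novice_a": 0.3, "novice_b": 0.3}
decision = weighted_decision(votes, trust)
```

With these illustrative weights the single high-trust vote outweighs the two low-trust votes, whereas equal weights would reduce the rule to a plain majority vote.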
If this is right
- Ontology matching systems can incorporate scalable human validation instead of relying solely on experts.
- High volumes of candidate matches from LLM-based matchers become feasible to review.
- Annotation interfaces can be tuned for mixed user groups while preserving output quality.
- Human-in-the-loop workflows become viable for real-world ontology engineering projects.
Where Pith is reading between the lines
- The mechanisms could be tested on validation tasks for other structured outputs such as knowledge graph triples.
- Similar quality controls might apply to crowdsourced evaluation of other AI-generated structured data.
- The time-dependent opinion component suggests a way to model changing consensus in long-running annotation projects.
Load-bearing premise
The three mechanisms are sufficient to ensure crowdsourcing quality for ontology matching validation without domain experts.
What would settle it
A side-by-side comparison of crowdsourced validation decisions against expert gold standards on a fixed ontology matching benchmark, checking whether accuracy remains above a usable threshold when only non-experts participate.
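A comparison of this kind could be scored along the following lines. This is a minimal sketch, assuming each candidate match has a binary accept/reject decision from both the crowd and the experts; the function name `validation_scores` and the toy data are illustrative and do not come from the paper.

```python
def validation_scores(crowd, gold):
    """Compare crowd accept/reject decisions with expert (gold) decisions.

    Both arguments map a candidate-match id to True (accept) or False (reject).
    Only ids present in both mappings are scored.
    """
    shared = crowd.keys() & gold.keys()
    if not shared:
        raise ValueError("no candidate matches in common")
    tp = sum(1 for m in shared if crowd[m] and gold[m])
    fp = sum(1 for m in shared if crowd[m] and not gold[m])
    fn = sum(1 for m in shared if not crowd[m] and gold[m])
    tn = len(shared) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(shared)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy data: the crowd agrees with the experts on m1 and m4 only.
gold = {"m1": True, "m2": True, "m3": False, "m4": False}
crowd = {"m1": True, "m2": False, "m3": True, "m4": False}
scores = validation_scores(crowd, gold)
```

What counts as a "usable threshold" for any of these scores is left open here, as it is in the summary.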
Original abstract
Recent advances in large language models (LLMs) pose new challenges for ontology matching (OM). While OM systems built on LLMs have shown remarkable capabilities in discovering more matching candidates, traditional OM validation that relies on domain experts has become overwhelming. In this study, we explore the use of crowdsourcing for OM validation and introduce a novel crowdsourcing system. We propose three domain-specific mechanisms, namely differential trustworthiness, coherence pre-filling, and time-dependent opinion, to ensure the quality of crowdsourcing for OM validation. We demonstrate that our crowdsourcing system can be integrated with existing OM systems to enable human-in-the-loop validation. The evaluation of the system shows its effectiveness in handling diverse user groups and different annotation settings. We discuss two real-world use cases of the system and current limitations for improvement.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a crowdsourcing system for ontology matching (OM) validation to address the scalability challenges posed by LLM-based OM systems. It proposes three domain-specific mechanisms—differential trustworthiness, coherence pre-filling, and time-dependent opinion—to maintain quality with non-expert crowds. The system integrates with existing OM tools for human-in-the-loop validation. The manuscript claims that an evaluation demonstrates the system's effectiveness across diverse user groups and annotation settings, discusses two real-world use cases, and notes current limitations.
Significance. If the mechanisms are shown to deliver reliable validation quality, the work could enable scalable, expert-free OM validation and reduce reliance on scarce domain experts. The domain-specific design of the mechanisms represents a targeted adaptation that, if validated, would distinguish this approach from generic crowdsourcing platforms.
major comments (2)
- [Abstract (and Evaluation section)] The central claim that the three mechanisms suffice to ensure crowdsourcing quality for OM validation without domain experts rests on an unevidenced assertion. The abstract states that 'the evaluation of the system shows its effectiveness' but supplies no quantitative results (precision, recall, F1, inter-annotator agreement), no gold-standard expert comparisons, no baseline OM validation methods, and no ablation of the individual mechanisms. This directly undermines the sufficiency argument.
- [Evaluation section] No experimental design details are provided: participant recruitment, number of users per group, ontology datasets or matching tasks used, annotation settings tested, or how 'diverse user groups' were operationalized. Without these, the claim of effectiveness across settings cannot be evaluated.
minor comments (2)
- [Abstract] The abstract would be strengthened by including at least one key quantitative finding (e.g., agreement rate or accuracy relative to experts) rather than a qualitative assertion of effectiveness.
- [Evaluation section] Consider adding a table that reports per-mechanism contribution or inter-annotator statistics to make the evaluation section more transparent.
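To make the inter-annotator statistic requested in the comments concrete, the sketch below computes Cohen's kappa for two annotators who labelled the same candidate matches. Cohen's kappa is one common agreement measure, chosen here as an assumption; the report does not name a specific statistic, and the function name and toy labels are illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = count_a.keys() | count_b.keys()
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data: the annotators disagree on the second candidate match only.
annotator_a = [True, True, False, False]
annotator_b = [True, False, False, False]
kappa = cohens_kappa(annotator_a, annotator_b)
```

For more than two annotators per item, a multi-rater statistic such as Fleiss' kappa would be the usual substitute.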
Simulated Author's Rebuttal
We thank the referee for the constructive comments highlighting gaps in the presentation of our evaluation. We agree that the current manuscript version does not provide sufficient quantitative evidence or experimental details to fully support the claims of effectiveness. We will revise the manuscript to incorporate these elements.
- Referee: [Abstract (and Evaluation section)] The central claim that the three mechanisms suffice to ensure crowdsourcing quality for OM validation without domain experts rests on an unevidenced assertion. The abstract states that 'the evaluation of the system shows its effectiveness' but supplies no quantitative results (precision, recall, F1, inter-annotator agreement), no gold-standard expert comparisons, no baseline OM validation methods, and no ablation of the individual mechanisms. This directly undermines the sufficiency argument.
  Authors: We acknowledge that the abstract and evaluation section in the submitted manuscript lack the requested quantitative metrics, gold-standard comparisons, baselines, and ablations. This omission weakens the support for our claims. In the revised version, we will expand the evaluation section to report precision, recall, F1, inter-annotator agreement, expert comparisons, baseline methods, and mechanism ablations, with corresponding updates to the abstract.
  revision: yes
- Referee: [Evaluation section] No experimental design details are provided: participant recruitment, number of users per group, ontology datasets or matching tasks used, annotation settings tested, or how 'diverse user groups' were operationalized. Without these, the claim of effectiveness across settings cannot be evaluated.
  Authors: We agree that the experimental design details are missing from the evaluation section. The revised manuscript will include full descriptions of participant recruitment, group sizes, ontology datasets and tasks, annotation settings, and how diverse user groups were defined and operationalized.
  revision: yes
Circularity Check
No significant circularity in system proposal or evaluation
The paper proposes a crowdsourcing system for OM validation along with three mechanisms (differential trustworthiness, coherence pre-filling, time-dependent opinion) and reports an evaluation of its effectiveness in integration and handling diverse users/settings. No equations, derivations, fitted parameters, predictions of quantities, or self-citations appear in the provided text. The central claims concern system design and empirical performance rather than any quantity defined in terms of itself or reduced by construction to inputs. The derivation chain is therefore self-contained with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Crowdsourcing participants can perform domain-specific ontology matching validation when supported by quality-control mechanisms.
invented entities (3)
- differential trustworthiness: no independent evidence
- coherence pre-filling: no independent evidence
- time-dependent opinion: no independent evidence