When CQs Go Wrong: Challenges in CQ Verification with OE-Assist
Pith reviewed 2026-06-25 23:35 UTC · model grok-4.3
The pith
Competency questions with ambiguities and excessive complexity hinder reliable ontology verification and require a dedicated refinement tool before publication.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CQ-verification is time-consuming and error-prone because it requires careful interpretation of linguistic nuances and precise alignment with formal ontology constructs. Ambiguities and complexity in CQs lead to inconsistent modelling decisions. According to the paper, the experiments demonstrate the need for a tool that refines CQs before they are published, so that these problems do not carry into the ontology engineering process.
What carries the argument
OE-Assist, the LLM assistant deployed to support participants in CQ-verification tasks and used to surface specific interpretation challenges across the 20 tasks.
If this is right
- Refined CQs produce more consistent alignment between natural language questions and formal ontology constructs.
- Unrevised CQs increase the likelihood of error-prone and time-consuming verification outcomes.
- A pre-publication refinement step reduces ambiguity that otherwise propagates into later ontology engineering phases.
- LLM assistance alone does not eliminate the need for prior CQ refinement.
Where Pith is reading between the lines
- A similar refinement step could apply to other natural-language specifications used in knowledge engineering.
- Detection rules for common CQ ambiguities might be added to existing ontology tools to automate part of the process.
- Refinement could be tested as a standard checkpoint before any CQ-based evaluation begins.
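The detection rules suggested in the second point above could take the form of a small rule-based linter. The sketch below is only an illustration: the rules, the word list, the length threshold, and the names `lint_cq`, `VAGUE_TERMS`, and `MAX_WORDS` are assumptions made for this example and are not taken from the paper or from OE-Assist.

```python
import re

# Hypothetical rule set: none of these rules or thresholds come from the paper.
VAGUE_TERMS = {"some", "various", "relevant", "appropriate", "etc", "related"}
MAX_WORDS = 20  # assumed cut-off for "excessive complexity"


def lint_cq(cq: str) -> list[str]:
    """Return warning labels for one competency question."""
    warnings = []
    words = re.findall(r"[A-Za-z']+", cq.lower())

    # Rule 1: vague quantifiers or filler terms leave the scope open.
    if VAGUE_TERMS.intersection(words):
        warnings.append("vague-term")

    # Rule 2: two or more conjunctions suggest several questions in one.
    if sum(1 for w in words if w in ("and", "or")) >= 2:
        warnings.append("compound-question")

    # Rule 3: very long questions are flagged as potentially too complex.
    if len(words) > MAX_WORDS:
        warnings.append("too-long")

    # Rule 4: a CQ is expected to be phrased as a question.
    if not cq.strip().endswith("?"):
        warnings.append("not-a-question")

    return warnings


print(lint_cq("Which instruments does a musician play?"))
print(lint_cq("What are some relevant events and places or people related to it"))
```

Surface heuristics like these would catch only the most obvious cases. Ambiguity that depends on domain knowledge or on the target ontology would still need human or LLM-assisted review.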
Load-bearing premise
The challenges observed with 19 participants across 20 tasks using OE-Assist are representative of typical CQ verification difficulties and would persist without a dedicated refinement tool.
What would settle it
A controlled comparison in which the same 20 tasks are rerun after systematic refinement of the CQs, measuring whether the rate of inconsistent modelling decisions drops.
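A comparison of this kind needs an operational measure of inconsistency. The sketch below assumes one possible measure: for each task, the share of participants whose verdict deviates from the majority verdict, averaged over tasks and compared between the original and refined CQs. The measure, the function names, and the verdicts are all made up for illustration; the paper reports no such metric or data.

```python
from collections import Counter


def disagreement(verdicts: list[str]) -> float:
    """Share of participants who deviate from the majority verdict on one task."""
    majority_count = Counter(verdicts).most_common(1)[0][1]
    return 1 - majority_count / len(verdicts)


def mean_disagreement(tasks: dict[str, list[str]]) -> float:
    """Average per-task disagreement within one condition."""
    return sum(disagreement(v) for v in tasks.values()) / len(tasks)


# Made-up verdicts for illustration only (not data from the study).
original = {
    "task1": ["yes", "yes", "no", "no"],
    "task2": ["yes", "no", "no", "no"],
}
refined = {
    "task1": ["yes", "yes", "yes", "no"],
    "task2": ["no", "no", "no", "no"],
}

# A positive value means disagreement fell after refinement.
drop = mean_disagreement(original) - mean_disagreement(refined)
print(f"mean disagreement dropped by {drop:.3f}")
```

With 19 participants and 20 tasks, a real analysis would pair each task's score across the two conditions and apply a paired significance test rather than compare raw means.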
Original abstract
Competency Questions (CQs) are the central component of CQ-verification, an established process in which an ontology is evaluated against a set of natural language questions to determine whether the intended purpose of the ontology has been properly modelled. However, CQ-verification is often time-consuming and error-prone, as it requires careful interpretation of linguistic nuances and precise alignment with formal ontology constructs. Ambiguities and complexity in CQs can further complicate this process, leading to inconsistent modelling decisions and verification outcomes. In this paper, we investigate what makes a CQ challenging and possible solutions to enhance the users' performance in the CQ-verification process. We experimented with the data of 19 participants who performed CQ-verification on 20 tasks using an LLM assistant to support ontology evaluation. The results show the necessity of a tool to refine CQs before publishing them to avoid ambiguity or excessive complexity in later phases of the ontology engineering process.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports an empirical user study in which 19 participants performed CQ-verification on 20 tasks using the OE-Assist LLM assistant. It identifies challenges arising from ambiguities and excessive complexity in natural-language CQs and concludes that these results demonstrate the necessity of a dedicated pre-publication CQ refinement tool to improve subsequent ontology engineering phases.
Significance. If the central claim were supported by appropriate controls and metrics, the work would provide a useful empirical illustration of practical difficulties in CQ-based ontology verification and could motivate tool development in the ontology engineering community. The study design incorporates real participants interacting with an LLM assistant, which supplies a concrete, practice-oriented data point.
major comments (2)
- [Abstract] The single-condition design (19 participants, 20 tasks, OE-Assist only) supplies no baseline arm (e.g., pre-refined CQs or no assistant) and reports no quantitative correctness metrics, statistical tests, or raw data. Consequently, the observed errors cannot be attributed specifically to unrefined CQs, and that attribution is load-bearing for the necessity claim.
- [Experimental setup / Results] Without a control condition or within-subject comparison, the data cannot test whether the reported ambiguities and complexity would be reliably reduced by a refinement tool. The causal inference that such a tool is necessary is therefore unsupported.
minor comments (1)
- [Abstract] Participant demographics, task selection criteria, and exact performance measures are omitted, reducing the reader's ability to assess representativeness.
Simulated Author's Rebuttal
Thank you for the referee's insightful comments. We recognize the limitations of our single-condition study design and will revise the manuscript to clarify the exploratory nature of the work, qualify our conclusions regarding the necessity of a CQ refinement tool, and add a limitations section to address the lack of baseline comparisons and quantitative metrics.
Point-by-point responses
- Referee: [Abstract] The single-condition design (19 participants, 20 tasks, OE-Assist only) supplies no baseline arm (e.g., pre-refined CQs or no assistant) and reports no quantitative correctness metrics, statistical tests, or raw data. Consequently, the observed errors cannot be attributed specifically to unrefined CQs, and that attribution is load-bearing for the necessity claim.
  Authors: The referee correctly identifies that our study employs a single-condition design without a baseline. This was intentional: the goal was to investigate challenges in CQ verification as it is currently practiced with LLM assistance, rather than to evaluate the impact of refinement. The analysis was qualitative, focusing on participant feedback and observed issues, which explains the absence of quantitative metrics and statistical tests. We will revise the abstract to describe the study as exploratory and to moderate the claim about demonstrating necessity, instead presenting the observed challenges as motivation for tool development. Raw data can be made available upon request in a revision if it aids transparency.
  Revision: partial
- Referee: [Experimental setup / Results] Without a control condition or within-subject comparison, the data cannot test whether the reported ambiguities and complexity would be reliably reduced by a refinement tool. The causal inference that such a tool is necessary is therefore unsupported.
  Authors: We concur that the current data do not support a causal claim about the effectiveness of a refinement tool, as no comparison was made. The recommendation for such a tool stems from the identification of specific ambiguities and complexities that hinder verification, which suggests that preemptive refinement could mitigate them. In the revised manuscript, we will reframe the conclusions to present this as a motivated recommendation for future research and tool building, rather than an empirically proven necessity. We will explicitly state in a new limitations paragraph that controlled experiments are needed to confirm the benefits of CQ refinement tools.
  Revision: partial
Circularity Check
No circularity: empirical user study with observational inference
The paper reports results from a single-arm user study (19 participants, 20 tasks) using OE-Assist and draws the necessity claim directly from observed ambiguities and errors in that data. No equations, parameters, derivations, or self-citations are present that reduce any result to its own inputs by construction. The inference chain is data collection to qualitative observation, which is self-contained and externally falsifiable via replication with different participants or baselines.