CLaRO: a Data-driven CNL for Specifying Competency Questions
Pith reviewed 2026-05-24 20:37 UTC · model grok-4.3
The pith
With 93 main templates, a data-driven controlled natural language covers about 90 percent of unseen competency questions for ontologies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CLaRO is a template-based controlled natural language (CNL) for specifying competency questions. It was designed using a dataset of 234 competency questions that were automatically processed into 106 patterns. The resulting CNL consists of 93 main templates and 41 linguistic variants, which together cover about 90 percent of unseen questions. It comes with an additional CNL model and an XML serialisation, and the approach can assist in identifying invalid competency questions.
What carries the argument
The CLaRO template set of 93 main templates and 41 variants derived from 106 patterns extracted from competency questions.
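To make the idea of a template set concrete, the following is a minimal sketch of how a question might be checked against templates with placeholder slots. The three templates and the slot names `[EC1]`, `[PC1]`, `[EC2]` are illustrative assumptions and are not taken from CLaRO's actual template list; the regex-based matching is one plausible approach, not necessarily the authors' procedure.

```python
import re

SLOT = r"\[[A-Z]+\d*\]"  # hypothetical slot syntax, e.g. [EC1] or [PC1]


def template_to_regex(template: str) -> re.Pattern:
    """Turn a template with [SLOT] placeholders into an anchored regex."""
    parts = re.split(f"({SLOT})", template)
    pattern = "".join(
        r"(.+?)" if re.fullmatch(SLOT, part) else re.escape(part)
        for part in parts
    )
    return re.compile(rf"^{pattern}$", re.IGNORECASE)


# Hypothetical templates, for illustration only.
TEMPLATES = [
    "What is [EC1] of [EC2]?",
    "Which [EC1] [PC1] [EC2]?",
    "How many [EC1] does [EC2] have?",
]
COMPILED = [(t, template_to_regex(t)) for t in TEMPLATES]


def match_template(question: str):
    """Return the first template the question fits, or None if none fits."""
    for template, regex in COMPILED:
        if regex.match(question.strip()):
            return template
    return None


def coverage(questions) -> float:
    """Fraction of questions that fit at least one template."""
    matched = sum(1 for q in questions if match_template(q) is not None)
    return matched / len(questions)
```

A question that matches no template, such as "Is a lion dangerous?" under these three toy templates, would be either a gap in the template set or a candidate for review as a possibly invalid question.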
If this is right
- It streamlines formalising ontology content requirements through consistent templates.
- It assists in writing good questions by flagging invalid ones during authoring.
- It supports automation and tooling for ontology development and evaluation.
- The CNL model and XML serialisation enable further machine processing of requirements.
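As a rough illustration of what machine processing of a serialised template set could look like, the sketch below writes templates and their linguistic variants to XML and reads them back. The element names (`templates`, `template`, `text`, `variant`) and the `id` attribute are invented for this example; the page does not describe CLaRO's actual XML schema, so this should not be read as its format.

```python
import xml.etree.ElementTree as ET


def templates_to_xml(templates) -> str:
    """Serialise (text, variants) pairs using a hypothetical schema."""
    root = ET.Element("templates")
    for index, (text, variants) in enumerate(templates, start=1):
        node = ET.SubElement(root, "template", id=str(index))
        ET.SubElement(node, "text").text = text
        for variant in variants:
            ET.SubElement(node, "variant").text = variant
    return ET.tostring(root, encoding="unicode")


def xml_to_templates(xml_text: str):
    """Parse the hypothetical schema back into (text, variants) pairs."""
    root = ET.fromstring(xml_text)
    return [
        (node.findtext("text"), [v.text for v in node.findall("variant")])
        for node in root.findall("template")
    ]
```

Keeping variants nested under their main template mirrors the distinction between main templates and linguistic variants, so a tool could report matches at either level.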
Where Pith is reading between the lines
- The pattern extraction method could be reused to build similar controlled languages for requirements in other modelling tasks.
- Widespread adoption might encourage more uniform competency question styles across separate ontology projects.
- Integration into existing ontology editors could be tested to measure effects on question quality and development time.
Load-bearing premise
The 234 competency questions collected and processed into 106 patterns are representative of the full range used across ontology projects.
What would settle it
Testing the 93 templates on several hundred new competency questions collected independently from many different ontology projects and obtaining coverage well below 90 percent would challenge the central claim.
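Whether an observed coverage counts as "well below 90 percent" depends on how many questions were tested. One way to express that uncertainty is a Wilson score interval for the coverage proportion, sketched below. This is a suggestion for how such a replication could be reported, not something the paper does; the sample sizes in the usage note are hypothetical.

```python
import math


def wilson_interval(covered: int, total: int, z: float = 1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0 or not 0 <= covered <= total:
        raise ValueError("need 0 <= covered <= total and total > 0")
    p = covered / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / total + z**2 / (4 * total**2)
    ) / denom
    return centre - half_width, centre + half_width
```

For example, calling `wilson_interval(240, 300)` on a hypothetical new set of 300 questions with 240 covered gives an interval whose upper bound lies below 0.90, which would count as evidence against the claim, whereas the same 80 percent observed on only 20 questions would not exclude 0.90.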
Original abstract
Competency Questions (CQs) for an ontology and similar artefacts aim to provide insights into the contents of an ontology and to demarcate its scope. The absence of a controlled natural language, tooling and automation to support the authoring of CQs has hampered their effective use in ontology development and evaluation. The few question templates that exists are based on informal analyses of a small number of CQs and have limited coverage of question types and sentence constructions. We aim to fill this gap by proposing a template-based CNL to author CQs, called CLaRO. For its design, we exploited a new dataset of 234 CQs that had been processed automatically into 106 patterns, which we analysed and used to design a template-based CNL, with an additional CNL model and XML serialisation. The CNL was evaluated with a subset of questions from the original dataset and with two sets of newly sourced CQs. The coverage of CLaRO, with its 93 main templates and 41 linguistic variants, is about 90% for unseen questions. CLaRO has the potential to facilitate streamlining formalising ontology content requirements and, given that about one third of the competency questions in the test sets turned out to be invalid questions, assist in writing good questions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces CLaRO, a template-based controlled natural language (CNL) for authoring competency questions (CQs) to support ontology development and evaluation. It is constructed from a dataset of 234 CQs that were automatically processed into 106 patterns, yielding 93 main templates plus 41 linguistic variants, along with an additional CNL model and XML serialisation. The CNL is evaluated on a subset of the original dataset and two newly sourced sets of CQs, with the central claim that it achieves approximately 90% coverage on unseen questions; the work also observes that roughly one third of the questions in the test sets were invalid.
Significance. If the templates generalise, CLaRO would address the documented absence of standardised tooling for CQ authoring, potentially improving the demarcation of ontology scope and the formalisation of content requirements. The data-driven derivation from an external CQ collection and the incidental finding on invalid questions are constructive elements that could support broader adoption in ontology engineering workflows.
major comments (2)
- [Abstract and evaluation description] The 90% coverage claim for unseen questions (Abstract) depends on the 234 CQs being representative of CQs across ontology projects, yet no evidence is supplied that the collection spans multiple domains, project types, or question complexities; the two newly sourced test sets are described only as 'new' without sourcing details, independence checks, or domain coverage.
- [Abstract] The abstract reports 90% coverage on test sets and notes invalid questions but supplies no details on the pattern extraction method, exclusion criteria for the 106 patterns, evaluation size, controls, or statistical validation; this leaves the soundness of the coverage result difficult to assess.
minor comments (1)
- [Abstract] The abstract contains a grammatical error: 'The few question templates that exists' should read 'exist'.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting the need for greater transparency on dataset representativeness and evaluation details. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation.
- Referee: [Abstract and evaluation description] The 90% coverage claim for unseen questions (Abstract) depends on the 234 CQs being representative of CQs across ontology projects, yet no evidence is supplied that the collection spans multiple domains, project types, or question complexities; the two newly sourced test sets are described only as 'new' without sourcing details, independence checks, or domain coverage.
Authors: We agree that explicit justification of representativeness would strengthen the generalizability of the 90% coverage result. The 234 CQs originate from a prior collection in the ontology engineering literature, and the two test sets were independently sourced from additional ontology projects. To address the concern, we will revise the abstract and insert a dedicated paragraph in the evaluation section describing the domains, project types, and sourcing process for both the original dataset and test sets, along with confirmation of independence. This revision will be made. revision: yes
- Referee: [Abstract] The abstract reports 90% coverage on test sets and notes invalid questions but supplies no details on the pattern extraction method, exclusion criteria for the 106 patterns, evaluation size, controls, or statistical validation; this leaves the soundness of the coverage result difficult to assess.
Authors: The full manuscript details the automatic processing of the 234 CQs into 106 patterns (Section 3), the subsequent analysis yielding 93 templates plus 41 variants, and the evaluation sizes and direct matching procedure (Section 4). Neither formal controls nor statistical validation beyond percentage coverage was applied. We will revise the abstract to note concisely the automatic pattern extraction, evaluation set sizes, and matching-based coverage computation, while staying within the word limit. This revision will be made. revision: yes
Circularity Check
No significant circularity; empirical derivation from external dataset
The paper's core derivation collects an external dataset of 234 CQs, auto-processes them into 106 patterns, manually analyzes the patterns to produce 93 templates plus variants, and then reports empirical coverage (~90%) on a held-out subset plus two newly sourced test sets. This chain contains no self-definitional equations, no parameters fitted to one subset and then relabeled as predictions on a related quantity, and no load-bearing self-citations that reduce the coverage claim to prior author work by construction. The result is therefore an independent measurement against external data rather than a tautology.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The collected 234 competency questions and the 106 patterns derived from them are representative of competency questions used in ontology development.