pith.

arxiv: 1907.07378 · v1 · pith:SIFNXXIT · submitted 2019-07-17 · 💻 cs.AI · cs.DB

CLaRO: a Data-driven CNL for Specifying Competency Questions

Pith reviewed 2026-05-24 20:37 UTC · model grok-4.3

classification 💻 cs.AI cs.DB
keywords competency questions, controlled natural language, CNL, ontology engineering, templates, question patterns, ontology evaluation, question authoring

The pith

With 93 main templates, a data-driven controlled natural language covers about 90 percent of unseen competency questions for ontologies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops CLaRO as a template-based controlled natural language to support authoring competency questions for ontologies and similar artefacts. The design draws on 234 collected questions that were automatically turned into 106 patterns, yielding 93 main templates plus 41 linguistic variants. This matters because competency questions help define an ontology's scope and content, yet current authoring lacks systematic support and automation. Evaluated on unseen questions, the templates achieve about 90 percent coverage; the evaluation also revealed that roughly one third of the test-set questions were invalid, which suggests the CNL can help flag badly formed questions.

Core claim

CLaRO is a template-based controlled natural language for specifying competency questions. It was designed using a dataset of 234 competency questions that were automatically processed into 106 patterns. The resulting CNL consists of 93 main templates and 41 linguistic variants, which cover about 90 percent of unseen questions. The CNL comes with an additional CNL model and an XML serialisation, and the approach can help identify invalid competency questions.
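To make the questions-to-patterns step concrete, here is a minimal Python sketch of one way a CQ could be abstracted into a pattern. It assumes, hypothetically, that the entity and predicate chunks of each question are already known (passed in as plain lists) and that each chunk is replaced by a numbered placeholder such as `[EC1]` or `[PC1]`. The function name `abstract_cq` and the placeholder scheme are illustrative; the automatic processing that produced the 106 patterns may work differently.

```python
import re


def abstract_cq(question: str, entity_chunks: list[str], predicate_chunks: list[str]) -> str:
    """Replace known chunks with numbered placeholders, numbered in order of appearance."""
    tagged = [(c, "EC") for c in entity_chunks] + [(c, "PC") for c in predicate_chunks]
    spans = []  # (start, end, kind) for each chunk occurrence kept
    # Try longer chunks first so that a long chunk wins over a shorter one inside it.
    for chunk, kind in sorted(tagged, key=lambda ck: len(ck[0]), reverse=True):
        for m in re.finditer(r"\b" + re.escape(chunk) + r"\b", question, flags=re.IGNORECASE):
            if all(m.end() <= s or m.start() >= e for s, e, _ in spans):  # no overlap
                spans.append((m.start(), m.end(), kind))
    spans.sort()
    counters = {"EC": 0, "PC": 0}
    out, pos = [], 0
    for start, end, kind in spans:
        counters[kind] += 1
        out.append(question[pos:start])
        out.append(f"[{kind}{counters[kind]}]")
        pos = end
    out.append(question[pos:])
    return "".join(out)


cqs = {
    "Which pizza has a topping?": (["pizza", "topping"], ["has"]),
    "Which country has a capital?": (["country", "capital"], ["has"]),
}
patterns = {abstract_cq(q, ec, pc) for q, (ec, pc) in cqs.items()}
print(patterns)  # {'Which [EC1] [PC1] a [EC2]?'}
```

Two questions from unrelated domains collapse into a single pattern, which is how a few hundred questions can reduce to a much smaller set of patterns.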

What carries the argument

The CLaRO template set of 93 main templates and 41 variants derived from 106 patterns extracted from competency questions.
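A main template together with its linguistic variants can be pictured as a small data structure with a matching test. The sketch below assumes a hypothetical bracketed slot notation (`[CE1]`, `[OP]`, and so on) in which each slot matches a non-empty stretch of text and everything else must match literally. The class `Template`, the helper `template_regex`, and the example wording are illustrations, not CLaRO's published templates.

```python
import re
from dataclasses import dataclass, field

# Hypothetical slot notation: an upper-case name in square brackets, e.g. [CE1], [OP].
SLOT = re.compile(r"\[([A-Z]+\d*)\]")


def template_regex(template: str) -> re.Pattern:
    """Compile a template: each slot becomes a lazy group, the rest must match literally."""
    parts, pos = [], 0
    for m in SLOT.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))  # literal text before the slot
        parts.append(r"(.+?)")                             # the slot: one or more characters
        pos = m.end()
    parts.append(re.escape(template[pos:]))               # trailing literal text, e.g. "?"
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)


@dataclass
class Template:
    ident: str                                         # hypothetical identifier, e.g. "T1"
    main: str                                          # the main template
    variants: list[str] = field(default_factory=list)  # its linguistic variants

    def matches(self, question: str) -> bool:
        """True if the question instantiates the main template or any variant."""
        return any(template_regex(t).match(question.strip())
                   for t in [self.main, *self.variants])


t1 = Template("T1", "Which [CE1] [OP] [CE2]?", ["What [CE1] [OP] [CE2]?"])
print(t1.matches("What pizza has a topping?"))             # True (via the variant)
print(t1.matches("How many toppings does a pizza have?"))  # False
```

Lazy groups keep the sketch short but leave slot boundaries ambiguous: matching "Which pizza has a topping?" gives `[CE2]` the text "a topping". A practical matcher would likely need part-of-speech or ontology-vocabulary information to delimit slots.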

If this is right

  • It streamlines formalising ontology content requirements through consistent templates.
  • It assists in writing good questions by flagging invalid ones during authoring.
  • It supports automation and tooling for ontology development and evaluation.
  • The CNL model and XML serialisation enable further machine processing of requirements.
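The XML serialisation mentioned above is what makes stored CQs machine-processable. As a minimal sketch, assuming hypothetical element and attribute names (`competencyQuestions`, `cq`, `template`) rather than CLaRO's actual schema, a set of authored CQs can be written and read back with Python's standard `xml.etree.ElementTree` module:

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names; CLaRO's actual XML schema may differ.


def serialise_cqs(cqs: list[tuple[str, str]]) -> str:
    """Serialise (template_id, question_text) pairs to an XML string."""
    root = ET.Element("competencyQuestions")
    for i, (template_id, text) in enumerate(cqs, start=1):
        cq = ET.SubElement(root, "cq", id=f"CQ{i}", template=template_id)
        cq.text = text
    return ET.tostring(root, encoding="unicode")


def load_cqs(xml_text: str) -> list[tuple[str, str]]:
    """Parse the XML produced by serialise_cqs back into (template_id, question_text) pairs."""
    root = ET.fromstring(xml_text)
    return [(cq.get("template"), cq.text) for cq in root.findall("cq")]


xml_text = serialise_cqs([("T1", "Which pizza has a topping?")])
print(xml_text)
print(load_cqs(xml_text))  # [('T1', 'Which pizza has a topping?')]
```

Storing the template identifier next to the question text means a later tool can re-check or re-instantiate each CQ without parsing free text again.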

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The pattern extraction method could be reused to build similar controlled languages for requirements in other modelling tasks.
  • Widespread adoption might encourage more uniform competency question styles across separate ontology projects.
  • Integration into existing ontology editors could be tested to measure effects on question quality and development time.

Load-bearing premise

The 234 competency questions collected and processed into 106 patterns are representative of the full range used across ontology projects.

What would settle it

Testing the 93 templates on several hundred new competency questions collected independently from many different ontology projects and obtaining coverage well below 90 percent would challenge the central claim.
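The coverage figure in such a test is a simple ratio. A minimal sketch, assuming (as one reading of the abstract, not a confirmed detail of the paper) that invalid questions are set aside before coverage is computed over the remaining valid ones; `is_valid` and `is_covered` are hypothetical caller-supplied predicates, for instance a validity check and a template matcher:

```python
from typing import Callable, Iterable


def coverage(questions: Iterable[str],
             is_valid: Callable[[str], bool],
             is_covered: Callable[[str], bool]) -> tuple[float, int]:
    """Return (fraction of valid questions covered, number of invalid questions set aside)."""
    qs = list(questions)
    valid = [q for q in qs if is_valid(q)]
    if not valid:
        raise ValueError("no valid questions to evaluate")
    covered = sum(1 for q in valid if is_covered(q))
    return covered / len(valid), len(qs) - len(valid)


# Toy usage: questions not ending in '?' count as invalid, and only
# 'Which ...' questions count as covered. Both rules are placeholders.
sample = [
    "Which pizza has a topping?",
    "Which country has a capital?",
    "Which river flows through Paris?",
    "How many toppings does a pizza have?",
    "Pizza toppings",
    "List all cheeses",
]
print(coverage(sample, lambda q: q.endswith("?"), lambda q: q.startswith("Which")))  # (0.75, 2)
```

Whether invalid questions are excluded from the denominator changes the reported figure, so an independent replication would need to fix that choice up front.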

Figures

Figures reproduced from arXiv: 1907.07378 by C. Maria Keet, Mary-Jane Antia, Zola Mahlaza.

Figure 1. Data model for CQ templates.
Excerpt (Section 4, Evaluation): "We conduct a preliminary evaluation of CLaRO to answer the following two questions: RQ1: Does CLaRO cover the CQs from the training set? RQ2: Is CLaRO sufficiently comprehensive for unseen CQs?"
Figure 2. Main components of the CQ authoring tool.
In the paper's description of components and implementation, the main components of the tool are the user interface, the template function module, and the storage module.
Figure 3. Screenshots of the CQ authoring tool showing possible user input, autocomplete suggestions, file-related actions, a button for deleting a created question, and the position of the filename where the user-defined questions are stored.
The accompanying text covers loading and saving the user-defined questions to disk: when saving, the storage module serialises the set of user-defined CQs according to an XML schema…
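Figure 3 shows autocomplete suggestions as the user types. One simple way such suggestions could work, sketched here under the assumption that templates are stored as strings with bracketed slots (the function `suggest` and the example templates are hypothetical), is to offer every template whose literal opening words are compatible with what has been typed so far:

```python
import re

# Hypothetical slot notation, as in "Which [CE1] [OP] [CE2]?".
SLOT = re.compile(r"\[[A-Z]+\d*\]")


def suggest(typed: str, templates: list[str], limit: int = 5) -> list[str]:
    """Return up to `limit` templates whose literal opening words fit the typed text."""
    typed = typed.lower().lstrip()
    hits = []
    for template in templates:
        # Literal text before the first slot, e.g. "which " or "how many ".
        prefix = SLOT.split(template, maxsplit=1)[0].lower()
        # Compatible if one is a prefix of the other: the user is still typing the
        # fixed words, or has moved past them into the first slot. Templates that
        # start with a slot have an empty prefix and are therefore always offered.
        if prefix.startswith(typed) or typed.startswith(prefix):
            hits.append(template)
    return hits[:limit]


templates = ["Which [CE1] [OP] [CE2]?", "How many [CE1] [OP] [CE2]?", "What is the [CE1] of [CE2]?"]
print(suggest("How m", templates))  # ['How many [CE1] [OP] [CE2]?']
```

Prefix compatibility is deliberately crude; it narrows the candidate list quickly but does not check that the text typed into a slot is a sensible class or property name.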
Original abstract

Competency Questions (CQs) for an ontology and similar artefacts aim to provide insights into the contents of an ontology and to demarcate its scope. The absence of a controlled natural language, tooling and automation to support the authoring of CQs has hampered their effective use in ontology development and evaluation. The few question templates that exists are based on informal analyses of a small number of CQs and have limited coverage of question types and sentence constructions. We aim to fill this gap by proposing a template-based CNL to author CQs, called CLaRO. For its design, we exploited a new dataset of 234 CQs that had been processed automatically into 106 patterns, which we analysed and used to design a template-based CNL, with an additional CNL model and XML serialisation. The CNL was evaluated with a subset of questions from the original dataset and with two sets of newly sourced CQs. The coverage of CLaRO, with its 93 main templates and 41 linguistic variants, is about 90% for unseen questions. CLaRO has the potential to facilitate streamlining formalising ontology content requirements and, given that about one third of the competency questions in the test sets turned out to be invalid questions, assist in writing good questions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces CLaRO, a template-based controlled natural language (CNL) for authoring competency questions (CQs) to support ontology development and evaluation. It is constructed from a dataset of 234 CQs that were automatically processed into 106 patterns, yielding 93 main templates plus 41 linguistic variants, along with an additional CNL model and XML serialisation. The CNL is evaluated on a subset of the original dataset and two newly sourced sets of CQs, with the central claim that it achieves approximately 90% coverage on unseen questions; the work also observes that roughly one third of the questions in the test sets were invalid.

Significance. If the templates generalise, CLaRO would address the documented absence of standardised tooling for CQ authoring, potentially improving the demarcation of ontology scope and the formalisation of content requirements. The data-driven derivation from an external CQ collection and the incidental finding on invalid questions are constructive elements that could support broader adoption in ontology engineering workflows.

major comments (2)
  1. [Abstract and evaluation description] The 90% coverage claim for unseen questions (Abstract) depends on the 234 CQs being representative of CQs across ontology projects, yet no evidence is supplied that the collection spans multiple domains, project types, or question complexities; the two newly sourced test sets are described only as 'new' without sourcing details, independence checks, or domain coverage.
  2. [Abstract] The abstract reports 90% coverage on test sets and notes invalid questions but supplies no details on the pattern extraction method, exclusion criteria for the 106 patterns, evaluation size, controls, or statistical validation; this leaves the soundness of the coverage result difficult to assess.
minor comments (1)
  1. [Abstract] The abstract contains a grammatical error: 'The few question templates that exists' should read 'exist'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for greater transparency on dataset representativeness and evaluation details. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation.

Point-by-point responses
  1. Referee: [Abstract and evaluation description] The 90% coverage claim for unseen questions (Abstract) depends on the 234 CQs being representative of CQs across ontology projects, yet no evidence is supplied that the collection spans multiple domains, project types, or question complexities; the two newly sourced test sets are described only as 'new' without sourcing details, independence checks, or domain coverage.

    Authors: We agree that explicit justification of representativeness would strengthen the generalizability of the 90% coverage result. The 234 CQs originate from a prior collection in the ontology engineering literature, and the two test sets were independently sourced from additional ontology projects. To address the concern, we will revise the abstract and insert a dedicated paragraph in the evaluation section describing the domains, project types, and sourcing process for both the original dataset and test sets, along with confirmation of independence. This revision will be made. revision: yes

  2. Referee: [Abstract] The abstract reports 90% coverage on test sets and notes invalid questions but supplies no details on the pattern extraction method, exclusion criteria for the 106 patterns, evaluation size, controls, or statistical validation; this leaves the soundness of the coverage result difficult to assess.

    Authors: The full manuscript details the automatic processing of the 234 CQs into 106 patterns (Section 3), the subsequent analysis yielding 93 templates plus 41 variants, and the evaluation set sizes and direct-matching procedure (Section 4). Neither statistical validation beyond percentage coverage nor formal controls were applied. We will revise the abstract to concisely note the automatic pattern extraction, the evaluation set sizes, and the matching-based coverage computation, while staying within the word limit. This revision will be made. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical derivation from external dataset

Full rationale

The paper's core derivation collects an external dataset of 234 CQs, auto-processes them into 106 patterns, manually analyzes the patterns to produce 93 templates plus variants, and then reports empirical coverage on a subset of the original dataset and, for unseen questions (~90%), on two newly sourced test sets. This chain contains no self-definitional equations, no parameters fitted to one subset and then relabeled as predictions on a related quantity, and no load-bearing self-citations that reduce the coverage claim to prior author work by construction. The result is therefore an independent measurement against external data rather than a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the representativeness of the 234 CQ dataset and the assumption that automatically extracted patterns generalize to new questions; no free parameters or invented entities are evident from the abstract.

axioms (1)
  • domain assumption The collected 234 competency questions and the 106 patterns derived from them are representative of competency questions used in ontology development.
    The 90% coverage claim and the design of the 93 templates depend directly on this dataset being sufficient to capture the space of valid CQs.

pith-pipeline@v0.9.0 · 5763 in / 1280 out tokens · 22625 ms · 2026-05-24T20:37:18.884000+00:00 · methodology


Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 1 internal anchor

  1. [1]

    SPARQL Query Language, W3C Working Draft 26 March 2007. (2007), http://www.w3.org/TR/rdf-sparql-query/

  2. [2]

    Azzaoui, K., Jacoby, E., Senger, S., Rodríguez, E.C., Loza, M., Zdrazil, B., Pinto, M., Williams, A.J., de la Torre, V., Mestres, J., Pastor, M., Taboureau, O., Rarey, M., Chichester, C., Pettifer, S., Blomberg, N., Harland, L., Williams-Jones, B., Ecker, G.F.: Scientific competency questions as the basis for semantically enriched open pharmacological space... Drug Discovery Today 18(17), 843–852 (2013)

  3. [3]

    Bezerra, C., Freitas, F.: Verifying description logic ontologies based on competency questions and unit testing. In: ONTOBRAS. pp. 159–164 (2017)

  4. [4]

    Bezerra, C., Freitas, F., Santana, F.: Evaluating ontologies with competency questions. In: Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) - Volume 03. pp. 284–285. WI-IAT ’13, IEEE Computer Society, Washington, DC, USA (2013)

  5. [5]

    Bezerra, C., Santana, F., Freitas, F.: CQChecker: A tool to check ontologies in owl-dl using competency questions written in controlled natural language. Learning & Nonlinear Models 12(2), 4 (2014)

  6. [6]

    Bouayad-Agha, N., Casamayor, G., Wanner, L.: Natural language generation in the context of the semantic web. Semantic Web Journal 5(6), 493–513 (2014)

  7. [7]

    Dasiopoulou, S., Meditskos, G., Efstathiou, V.: Semantic knowledge structures and representation. Tech. Rep. D5.1, FP7-288199 Dem@Care: Dementia Ambient Care: Multi-Sensing Monitoring for Intelligence Remote Management and Decision Support, http://www.demcare.eu/downloads/D5.1SemanticKnowledgeStructures_andRepresentation.pdf

  8. [8]

    Dennis, M., van Deemter, K., Dell’Aglio, D., Pan, J.Z.: Computing authoring tests from competency questions: Experimental validation. In: d’Amato, C., et al. (eds.) The Semantic Web – ISWC 2017. LNCS, vol. 10587, pp. 243–259. Springer (2017)

  9. [9]

    Fernández-Izquierdo, A., García-Castro, R.: Requirements behaviour analysis for ontology testing. In: Faron Zucker, C., Ghidini, C., Napoli, A., Toussaint, Y. (eds.) Proc. of EKAW’18. pp. 114–130. LNAI, Springer (2018)

  10. [10]

    Ferré, S.: Sparklis: An expressive query builder for sparql endpoints with guidance in natural language. Semantic Web Journal 8(3), 405–418 (2017)

  11. [11]

    Franconi, E., Guagliardo, P., Trevisan, M.: An intelligent query interface based on ontology navigation. In: Workshop on Visual Interfaces to the Social and Semantic Web (VISSW’10) (2010), Hong Kong, February 2010

  12. [12]

    Fuchs, N.E., Kaljurand, K., Kuhn, T.: Discourse Representation Structures for ACE 6.6. Tech. Rep. ifi-2010.0010, Department of Informatics, University of Zurich, Zurich, Switzerland (2010)

  13. [13]

    Hellmann, S., Lehmann, J., Auer, S., Brümmer, M.: Integrating nlp using linked data. In: Alani, H., Kagal, L., Fokoue, A., Groth, P., Biemann, C., Parreira, J.X., Aroyo, L., Noy, N., Welty, C., Janowicz, K. (eds.) The Semantic Web – ISWC 2013. pp. 98–113. Springer (2013)

  14. [14]

    Keet, C.M.: Natural language template selection for temporal constraints. In: CREOL: Contextual Representation of Events and Objects in Language, Joint Ontology Workshops 2017. CEUR-WS, vol. 2050, p. 12 (2017), 21-23 September 2017, Bolzano, Italy

  15. [15]

    Keet, C.M., Lawrynowicz, A.: Test-driven development of ontologies. In: Sack, H., et al. (eds.) Proc. of ESWC’16. LNCS, vol. 9678, pp. 642–657. Springer, Berlin (2016), 29 May - 2 June, 2016, Crete, Greece

  16. [16]

    Keet, C.M.: A core ontology of macroscopic stuff. In: Janowicz, K., Schlobach, S. (eds.) 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW’14). LNAI, vol. 8876, pp. 209–224. Springer (2014), 24-28 Nov, 2014, Linkoping, Sweden

  17. [17]

    Keet, C.M.: An introduction to ontology engineering, Computing, vol. 20. College Publications (2018)

  18. [18]

    Kollia, I., Glimm, B., Horrocks, I.: SPARQL Query Answering over OWL Ontologies. In: Antoniou, G., et al. (eds.) Proc. of ESWC’11. LNCS, vol. 6643, pp. 382–396. Springer (2011), 29 May-2 June, 2011, Heraklion, Crete, Greece

  19. [19]

    Kuhn, T.: A survey and classification of controlled natural languages. Computational Linguistics 40(1), 121–170 (March 2014)

  20. [20]

    Lyon, T.D., Saywitz, K.J., Kaplan, D.L., Dorado, J.S.: Reducing maltreated children’s reluctance to answer hypothetical oath-taking competency questions. Law and Human Behavior 25(1), 81–92 (Feb 2001)

  21. [21]

    Malheiros, Y., Freitas, F.: A method to develop description logic ontologies iteratively based on competency questions: an implementation. In: ONTOBRAS. pp. 142–153 (2013)

  22. [22]

    Malone, J., Brown, A., Lister, A.L., Ison, J., Hull, D., Parkinson, H., Stevens, R.: The software ontology (swo): a resource for reproducibility in biomedical data analysis, curation and digital preservation. Journal of Biomedical Semantics 5(1), 25 (Jun 2014)

  23. [23]

    Moreira, J., Pires, L.F., van Sinderen, M., Daniele, L.: Saref4health: IoT standard-based ontology-driven healthcare systems. In: Proc. of FOIS’18. FAIA, vol. 306, pp. 239–252. IOS Press (2018)

  24. [24]

    Mossakowski, T., Codescu, M., Neuhaus, F., Kutz, O.: The Road to Universal Logic–Festschrift for 50th birthday of Jean-Yves Beziau, Volume II, chap. The distributed ontology, modelling and specification language - DOL. Studies in Universal Logic, Birkhäuser (2015)

  25. [25]

    Object Management Group: Semantics of Business Vocabulary and Rules (SBVR) – OMG released versions of SBVR, formal/2008-01-02 (January 2008), http://www.omg.org/spec/SBVR/1.0

  26. [26]

    Panov, P., Soldatova, L.N., Džeroski, S.: Generic ontology of datatypes. Inf. Sci. 329(C), 900–920 (Feb 2016)

  27. [27]

    Perlin, M.L.: Are courts competent to decide competency questions: Stripping the facade from united states v. charters. University of Kansas Law Review 38, 957 (1988-1990)

  28. [28]

    Ren, Y., Parvizi, A., Mellish, C., Pan, J.Z., van Deemter, K., Stevens, R.: Towards competency question-driven ontology authoring. In: Extended Semantic Web Conference (ESWC’14). LNCS, Springer (2014)

  29. [29]

    Safwat, H., Davis, B.: CNLs for the semantic web: a state of the art. Language Resources & Evaluation 51(1), 191–220 (2017)

  30. [30]

    Salgueiro, A.M., Alves, C.B., Balsa, J.: Querying an ontology using natural language. In: Villavicencio, A., et al. (eds.) PROPOR 2018. LNAI, vol. 11122, pp. 164–169 (2018)

  31. [31]

    Suarez-Figueroa, M.C., de Cea, G.A., Buil, C., Dellschaft, K., Fernandez-Lopez, M., Garcia, A., Gómez-Pérez, A., Herrero, G., Montiel-Ponsoda, E., Sabou, M., Villazon-Terrazas, B., Yufei, Z.: NeOn methodology for building contextualized ontology networks. NeOn Deliverable D5.4.1, NeOn Project (2008)

  32. [32]

    Thiéblin, E., Haemmerlé, O., Trojahn, C.: Complex matching based on competency questions for alignment: a first sketch. In: 13th International Workshop on Ontology Matching (OM@ISWC 2018). CEUR-WS, vol. 2288, pp. 66–70. CEUR-WS (2018)

  33. [33]

    Uschold, M., Gruninger, M.: Ontologies: principles, methods and applications. Knowledge Engineering Review 11(2), 93–136 (1996)

  34. [34]

    Williams, P.: Resourcing for the future? information technology provision and competency questions for schoolbased initial teacher education. Journal of Information Technology for Teacher Education 5(3), 271–282 (1996)

  35. [35]

    Wisniewski, D., Potoniec, J., Lawrynowicz, A., Keet, C.M.: Competency questions and SPARQL-OWL queries dataset and analysis. Technical Report 1811.09529 (November 2018), https://arxiv.org/abs/1811.09529

  36. [36]

    Zemmouchi-Ghomari, L., Ghomari, A.R.: Translating natural language competency questions into sparql queries: a case study. In: First International Conference on Building and Exploring Web Based Environments. pp. 81–86. IARIA (2013)