Concordance Comparison as a Means of Assembling Local Grammars
Pith reviewed 2026-05-13 05:41 UTC · model grok-4.3
The pith
A tool comparing concordances from local grammars helps assemble them into an improved extractor for person names.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Analyzing pairs of local grammars with a concordance comparison tool shows relationships of inclusion, intersection, and disjunction that aid in assembling the grammars yielding the best results for person name extraction from Portuguese texts.
What carries the argument
The concordance comparison tool, which highlights differences between the concordances of two local grammars to identify inclusion, intersection, and disjunction relationships for grammar assembly.
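As a rough illustration of the three relationships, the sketch below treats each concordance as a set of matched spans and classifies a pair of grammars by plain set comparison. The function name `classify_relation`, the `(start, end)` span representation, and the sample data are assumptions made for illustration; they are not taken from the paper or from the comparison tool itself.

```python
from typing import Set, Tuple

Span = Tuple[int, int]  # hypothetical (start, end) character offsets of one match


def classify_relation(a: Set[Span], b: Set[Span]) -> str:
    """Classify how the matches of grammar A relate to those of grammar B."""
    if a == b:
        return "identical"
    if a <= b:
        return "inclusion (A within B)"
    if b <= a:
        return "inclusion (B within A)"
    if a & b:
        return "intersection"  # some shared matches, some unique to each side
    return "disjunction"       # no shared matches at all


# Made-up concordances for four hypothetical local grammars.
lg1 = {(0, 10), (25, 38)}
lg2 = {(0, 10), (25, 38), (50, 61)}
lg3 = {(70, 82)}
lg4 = {(0, 10), (70, 82)}

print(classify_relation(lg1, lg2))
print(classify_relation(lg1, lg3))
print(classify_relation(lg2, lg4))
```

Exact span equality is the simplest possible matching criterion; a real comparison might also need to handle overlapping but non-identical spans.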
Load-bearing premise
The highlighted differences in concordances point to all relevant improvements without overlooking cases or introducing new errors in the assembled grammar.
What would settle it
Applying the assembled grammar to a different Portuguese text collection and measuring an F-Measure below 70.86 (the prior state of the art implied by the reported 76.86 and its 6-point gain) would show the gain does not hold beyond the specific dataset.
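For reference, the F-Measure in such a check is the harmonic mean of precision and recall. The sketch below computes it from sets of extracted and gold person names under exact matching. The function name `precision_recall_f1` and the sample names are hypothetical, and the official HAREM scorer may apply different matching rules, so this is only an approximation of how the reported figure would be obtained.

```python
from typing import Set, Tuple


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for exact-match extraction."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: four extracted names, five gold names, three in common.
predicted = {"Ana Costa", "Pedro Alves", "Rui Pinto", "Lisboa"}
gold = {"Ana Costa", "Pedro Alves", "Rui Pinto", "Maria Lopes", "Carlos Dias"}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.4f}")
```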
Original abstract
Named Entity Recognition for person names is an important but non-trivial task in information extraction. This article uses a tool that compares the concordances obtained from two local grammars (LG) and highlights the differences. We used the results as an aid to select the best of a set of LGs. By analyzing the comparisons, we observed relationships of inclusion, intersection and disjunction within each pair of LGs, which helped us to assemble those that yielded the best results. This approach was used in a case study on extraction of person names from texts written in Portuguese. We applied the enhanced grammar to the Gold Collection of the Second HAREM. The F-Measure obtained was 76.86, representing a gain of 6 points in relation to the state-of-the-art for Portuguese.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that a concordance comparison tool can be used to identify inclusion, intersection, and disjunction relationships between pairs of local grammars (LGs) for named entity recognition, aiding in their assembly into an enhanced grammar. In a case study on Portuguese person name extraction, this method yields an F-measure of 76.86 on the Second HAREM Gold Collection, a 6-point gain over the state-of-the-art.
Significance. Should the assembly procedure prove reproducible and the performance gain attributable to the tool rather than manual intervention, this work could offer a useful heuristic for refining rule-based systems in information extraction. The reliance on an external public benchmark for evaluation is a methodological strength.
major comments (3)
- [Abstract and case study description] The manuscript reports that analyzing comparisons 'helped us to assemble those that yielded the best results' but provides no explicit, reproducible rules or algorithm for how the observed relationships (inclusion, intersection, disjunction) translate into grammar assembly decisions.
- [Evaluation section] There is no ablation study or comparison showing the assembled grammar's performance relative to the best individual LG or to combinations assembled without the concordance tool on the Second HAREM Gold Collection. This is necessary to substantiate that the tool enables superior assembly.
- [Results] The F-measure of 76.86 and the claimed 6-point improvement require specification of the exact baseline system, whether the evaluation set influenced grammar development, and any statistical significance testing.
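One common way to perform the significance testing requested in the [Results] comment is a paired bootstrap over documents. The sketch below is a minimal version under stated assumptions: each system is summarised by hypothetical per-document counts `(true positives, predicted, gold)`, and the names `f1_from_counts` and `paired_bootstrap` are illustrative rather than anything the paper defines.

```python
import random
from typing import List, Tuple

Counts = Tuple[int, int, int]  # (true positives, predicted, gold) for one document


def f1_from_counts(docs: List[Counts]) -> float:
    """Micro-averaged F1 over a list of per-document counts."""
    tp = sum(d[0] for d in docs)
    pred = sum(d[1] for d in docs)
    gold = sum(d[2] for d in docs)
    if tp == 0:
        return 0.0
    precision, recall = tp / pred, tp / gold
    return 2 * precision * recall / (precision + recall)


def paired_bootstrap(sys_a: List[Counts], sys_b: List[Counts],
                     n_resamples: int = 1000, seed: int = 0) -> float:
    """Fraction of resamples in which system A fails to beat system B."""
    if len(sys_a) != len(sys_b):
        raise ValueError("both systems must be scored on the same documents")
    rng = random.Random(seed)
    n = len(sys_a)
    not_better = 0
    for _ in range(n_resamples):
        # Resample document indices with replacement, same indices for both systems.
        idx = [rng.randrange(n) for _ in range(n)]
        f_a = f1_from_counts([sys_a[i] for i in idx])
        f_b = f1_from_counts([sys_b[i] for i in idx])
        if f_a <= f_b:
            not_better += 1
    return not_better / n_resamples


# Made-up counts: the assembled grammar vs. a baseline on 20 documents.
assembled = [(9, 10, 10)] * 20
baseline = [(5, 10, 10)] * 20
print(paired_bootstrap(assembled, baseline))
```

A small returned fraction suggests the difference is unlikely to be a sampling artefact of the document collection; it says nothing about whether the evaluation set influenced grammar development.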
minor comments (1)
- [Introduction] Provide more background on local grammars and the specific concordance comparison tool used, including how it highlights differences.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below and indicate the revisions planned for the next version.
- Referee: [Abstract and case study description] The manuscript reports that analyzing comparisons 'helped us to assemble those that yielded the best results' but provides no explicit, reproducible rules or algorithm for how the observed relationships (inclusion, intersection, disjunction) translate into grammar assembly decisions.
  Authors: We agree that the current description is high-level. The concordance comparison tool is intended as a heuristic aid rather than a fully automated algorithm; decisions were based on manual inspection of highlighted differences to identify complementary coverage. In the revised manuscript we will add an explicit section describing the decision heuristics: retain the more general grammar in cases of inclusion, merge rules in cases of intersection to maximize coverage without redundancy, and combine in cases of disjunction when both contribute unique matches. Concrete examples from the Portuguese person-name case study will be included to make the process reproducible.
  Revision: yes
- Referee: [Evaluation section] There is no ablation study or comparison showing the assembled grammar's performance relative to the best individual LG or to combinations assembled without the concordance tool on the Second HAREM Gold Collection. This is necessary to substantiate that the tool enables superior assembly.
  Authors: We acknowledge the value of such comparisons. The original work focused on the final assembled result obtained with the tool's assistance. In the revision we will report F-measures for each individual local grammar on the same test collection and for the best-performing single grammar, allowing readers to see the incremental gain from assembly. We will also note that systematic construction of combinations without the tool was not performed, as the tool was integral to identifying which pairs to combine; this limitation will be stated explicitly.
  Revision: partial
- Referee: [Results] The F-measure of 76.86 and the claimed 6-point improvement require specification of the exact baseline system, whether the evaluation set influenced grammar development, and any statistical significance testing.
  Authors: The baseline is the prior state-of-the-art Portuguese person-name system evaluated on the Second HAREM Gold Collection, which we will cite by reference in the revised results section. Grammar development was conducted on a separate development corpus; the evaluation gold collection was used only for final reporting and did not influence rule creation. We will add this clarification. Statistical significance testing was not performed in the original study; we will discuss this as a limitation and, if space permits, include a brief bootstrap analysis in the revision.
  Revision: partial
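The decision heuristics named in the first response (retain the more general grammar under inclusion, keep both grammars under intersection or disjunction) can be sketched as a selection pass over a set of grammars. This is only one possible reading of those heuristics: the function `select_grammars`, the grammar names, and the sample matches are hypothetical, and the sketch considers coverage alone, ignoring whether the extra matches of a more general grammar are correct.

```python
from typing import Dict, List, Set


def select_grammars(concordances: Dict[str, Set[str]]) -> List[str]:
    """Keep grammars whose matches are not fully covered by one retained grammar."""
    # Visit larger concordances first so the more general grammar is retained.
    ordered = sorted(concordances, key=lambda name: (-len(concordances[name]), name))
    kept: List[str] = []
    for name in ordered:
        matches = concordances[name]
        if any(matches <= concordances[k] for k in kept):
            continue  # inclusion: a retained grammar already covers these matches
        kept.append(name)  # intersection or disjunction: contributes unique matches
    return kept


# Made-up concordances for three hypothetical local grammars.
concordances = {
    "LG_title": {"Dr. Ana Costa", "Sr. Rui Pinto"},
    "LG_title_or_verb": {"Dr. Ana Costa", "Sr. Rui Pinto", "Pedro Alves"},
    "LG_capitalised": {"Pedro Alves", "Maria Lopes"},
}

kept = select_grammars(concordances)
coverage = set().union(*(concordances[name] for name in kept))
print(kept)
print(sorted(coverage))
```

In practice the manual inspection described by the authors would also weigh precision, which a purely set-based pass like this cannot see.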
Circularity Check
No significant circularity detected
The paper presents an empirical workflow for comparing concordances from local grammars (LGs) to observe inclusion/intersection/disjunction relations and then manually assemble improved grammars. The central performance claim (F-measure 76.86 on person-name extraction) is obtained by applying the assembled grammar to the external Second HAREM Gold Collection and comparing against a reported state-of-the-art baseline. No equations, fitted parameters, self-citations, or ansatzes are invoked in a load-bearing manner that would make the reported result equivalent to its inputs by construction. The evaluation uses an independent gold-standard corpus, satisfying the criterion for non-circular grounding.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Local grammars can effectively model patterns for named entity recognition in natural language texts.