pith. machine review for the scientific record.

arxiv: 2605.11862 · v1 · submitted 2026-05-12 · 💻 cs.CL

Recognition: no theorem link

Concordance Comparison as a Means of Assembling Local Grammars

Elias de Oliveira, Eric Laporte, Juliana Pirovani

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 05:41 UTC · model grok-4.3

classification 💻 cs.CL
keywords local grammars · concordance comparison · named entity recognition · person names · Portuguese · HAREM · grammar assembly · information extraction

The pith

A tool comparing concordances from local grammars helps assemble them into an improved extractor for person names.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors demonstrate that comparing the concordances produced by different local grammars can reveal useful relationships such as inclusion, intersection, and disjunction between their outputs. These insights guide the selection and combination of grammar elements to create a stronger overall grammar for recognizing person names in text. They tested this on Portuguese documents from the Second HAREM Gold Collection. The assembled grammar reached an F-Measure of 76.86, improving by six points over the previous best result for Portuguese. This method offers a practical way to refine local grammars using their comparative outputs rather than isolated evaluation.
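For orientation, the F-Measure quoted above is the harmonic mean of precision and recall over extracted names. A minimal sketch with invented counts (not the paper's figures):

```python
# Minimal sketch of the F-Measure used to score the extractor.
# The counts below are invented for illustration, not from the paper.

def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 as a percentage: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100 * 2 * precision * recall / (precision + recall)

# With 8 correctly extracted names, 2 spurious, and 2 missed:
print(f_measure(tp=8, fp=2, fn=2))
```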

Core claim

Analyzing pairs of local grammars with a concordance comparison tool reveals relationships of inclusion, intersection, and disjunction that guide assembling the grammars into the combination yielding the best results for person-name extraction from Portuguese texts.

What carries the argument

The concordance comparison tool, which highlights differences between the concordances of two local grammars to identify inclusion, intersection, and disjunction relationships for grammar assembly.
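A toy rendering of these three relationships, treating each concordance as a set of occurrences (our simplification, not the actual comparison tool; the occurrence strings are hypothetical):

```python
# Toy sketch (not the authors' tool): classify the relation between the
# concordances of two local grammars by comparing their occurrence sets.

def classify(c1: set, c2: set) -> str:
    if not (c1 & c2):
        return "disjunction"   # no occurrence in common
    if c1 <= c2 or c2 <= c1:
        return "inclusion"     # one concordance subsumes the other
    return "intersection"      # partial overlap, unique matches on both sides

c1 = {"Sr. João Silva", "Dra. Maria Costa"}
c2 = {"Sr. João Silva", "Prof. Ana Lima"}
print(classify(c1, c2))  # intersection
```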

Load-bearing premise

The highlighted differences in concordances point to all relevant improvements without overlooking cases or introducing new errors in the assembled grammar.
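One way to picture how the highlighted differences might translate into assembly decisions, as a hedged sketch of our own rather than the authors' procedure:

```python
# Hedged sketch, not the authors' procedure: map the observed relation
# between two grammars' concordances to an assembly decision.

def assembly_decision(c1: set, c2: set) -> str:
    if c1 <= c2:
        return "drop G1: G2 already covers its matches"
    if c2 <= c1:
        return "drop G2: G1 already covers its matches"
    if not (c1 & c2):
        return "keep both: disjoint, each adds unique matches"
    return "merge: overlapping, combine to cover both unique sets"
```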

What would settle it

Applying the assembled grammar to a different Portuguese text collection and measuring an F-Measure below 70.86 would show the gain does not hold beyond the specific dataset.

Figures

Figures reproduced from arXiv: 2605.11862 by Elias de Oliveira, Eric Laporte, Juliana Pirovani.

Figure 1. LG G1 (ReconheceFormasDeTratamento.grf). Unitex allows outputs to be attached to graph boxes; outputs are displayed in bold under the boxes.
Figure 2. Part of a concordance comparison file. Lines in blue (lines 1 and 2) are the occurrences common to the two concordances.
Figure 3. LG G2 (ReconheceNomesCompostos.grf).
Figure 4. Part of the concordance comparison C1 × C2.
Original abstract

Named Entity Recognition for person names is an important but non-trivial task in information extraction. This article uses a tool that compares the concordances obtained from two local grammars (LG) and highlights the differences. We used the results as an aid to select the best of a set of LGs. By analyzing the comparisons, we observed relationships of inclusion, intersection and disjunction within each pair of LGs, which helped us to assemble those that yielded the best results. This approach was used in a case study on extraction of person names from texts written in Portuguese. We applied the enhanced grammar to the Gold Collection of the Second HAREM. The F-Measure obtained was 76.86, representing a gain of 6 points in relation to the state-of-the-art for Portuguese.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper claims that a concordance comparison tool can be used to identify inclusion, intersection, and disjunction relationships between pairs of local grammars (LGs) for named entity recognition, aiding in their assembly into an enhanced grammar. In a case study on Portuguese person name extraction, this method yields an F-measure of 76.86 on the Second HAREM Gold Collection, a 6-point gain over the state-of-the-art.

Significance. Should the assembly procedure prove reproducible and the performance gain attributable to the tool rather than manual intervention, this work could offer a useful heuristic for refining rule-based systems in information extraction. The reliance on an external public benchmark for evaluation is a methodological strength.

major comments (3)
  1. [Abstract and case study description] The manuscript reports that analyzing comparisons 'helped us to assemble those that yielded the best results' but provides no explicit, reproducible rules or algorithm for how the observed relationships (inclusion, intersection, disjunction) translate into grammar assembly decisions.
  2. [Evaluation section] There is no ablation study or comparison showing the assembled grammar's performance relative to the best individual LG or to combinations assembled without the concordance tool on the Second HAREM Gold Collection. This is necessary to substantiate that the tool enables superior assembly.
  3. [Results] The F-measure of 76.86 and the claimed 6-point improvement require specification of the exact baseline system, whether the evaluation set influenced grammar development, and any statistical significance testing.
minor comments (1)
  1. [Introduction] Provide more background on local grammars and the specific concordance comparison tool used, including how it highlights differences.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and indicate the revisions planned for the next version.

Point-by-point responses
  1. Referee: [Abstract and case study description] The manuscript reports that analyzing comparisons 'helped us to assemble those that yielded the best results' but provides no explicit, reproducible rules or algorithm for how the observed relationships (inclusion, intersection, disjunction) translate into grammar assembly decisions.

    Authors: We agree that the current description is high-level. The concordance comparison tool is intended as a heuristic aid rather than a fully automated algorithm; decisions were based on manual inspection of highlighted differences to identify complementary coverage. In the revised manuscript we will add an explicit section describing the decision heuristics: retain the more general grammar in cases of inclusion, merge rules in cases of intersection to maximize coverage without redundancy, and combine in cases of disjunction when both contribute unique matches. Concrete examples from the Portuguese person-name case study will be included to make the process reproducible. revision: yes

  2. Referee: [Evaluation section] There is no ablation study or comparison showing the assembled grammar's performance relative to the best individual LG or to combinations assembled without the concordance tool on the Second HAREM Gold Collection. This is necessary to substantiate that the tool enables superior assembly.

    Authors: We acknowledge the value of such comparisons. The original work focused on the final assembled result obtained with the tool's assistance. In the revision we will report F-measures for each individual local grammar on the same test collection and for the best-performing single grammar, allowing readers to see the incremental gain from assembly. We will also note that systematic construction of combinations without the tool was not performed, as the tool was integral to identifying which pairs to combine; this limitation will be stated explicitly. revision: partial

  3. Referee: [Results] The F-measure of 76.86 and the claimed 6-point improvement require specification of the exact baseline system, whether the evaluation set influenced grammar development, and any statistical significance testing.

    Authors: The baseline is the prior state-of-the-art Portuguese person-name system evaluated on the Second HAREM Gold Collection, which we will cite by reference in the revised results section. Grammar development was conducted on a separate development corpus; the evaluation gold collection was used only for final reporting and did not influence rule creation. We will add this clarification. Statistical significance testing was not performed in the original study; we will discuss this as a limitation and, if space permits, include a brief bootstrap analysis in the revision. revision: partial
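A document-level bootstrap of the kind hinted at here could look like the following sketch; the resampling scheme and per-document counts are our assumptions, not the paper's:

```python
# Hypothetical document-level bootstrap: resample documents with replacement
# and count how often system A's pooled F1 exceeds system B's.
import random

def f1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def bootstrap_win_rate(counts_a, counts_b, iters=1000, seed=0):
    """counts_*: per-document (tp, fp, fn) tuples, aligned across systems.
    Returns the fraction of resamples where A's F1 beats B's."""
    rng = random.Random(seed)
    n = len(counts_a)
    wins = 0
    for _ in range(iters):
        idx = [rng.randrange(n) for _ in range(n)]
        fa = f1(*(sum(counts_a[i][j] for i in idx) for j in range(3)))
        fb = f1(*(sum(counts_b[i][j] for i in idx) for j in range(3)))
        wins += fa > fb
    return wins / iters
```

A win rate near 1.0 across resamples would support the claim that the assembled grammar's gain is not an artifact of a few documents.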

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper presents an empirical workflow for comparing concordances from local grammars (LGs) to observe inclusion/intersection/disjunction relations and then manually assemble improved grammars. The central performance claim (F-measure 76.86 on person-name extraction) is obtained by applying the assembled grammar to the external Second HAREM Gold Collection and comparing against a reported state-of-the-art baseline. No equations, fitted parameters, self-citations, or ansatzes are invoked in a load-bearing manner that would make the reported result equivalent to its inputs by construction. The evaluation uses an independent gold-standard corpus, satisfying the criterion for non-circular grounding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on the established concept of local grammars and the HAREM dataset as benchmarks, with no new free parameters or invented entities introduced.

axioms (1)
  • domain assumption Local grammars can effectively model patterns for named entity recognition in natural language texts.
    This is a standard assumption in computational linguistics for rule-based NER.

pith-pipeline@v0.9.0 · 5429 in / 1327 out tokens · 134188 ms · 2026-05-13T05:41:15.648273+00:00 · methodology

discussion (0)

