Automatic Part-of-Speech Tagging of Arabic-English Dictionary Senses through WordNet

Aly A. Fahmy; Diaa M. Fayed; Mohsen A. Rashwan; Wafaa K. Fayed

arxiv: 2606.24359 · v1 · pith:GKS2VHTBnew · submitted 2026-06-23 · 💻 cs.CL

Pith reviewed 2026-06-25 23:59 UTC · model grok-4.3

classification 💻 cs.CL

keywords: part-of-speech tagging · bilingual dictionary · WordNet · Arabic-English · sense disambiguation · resource-light NLP · WordNet-LMF

The pith

POS tags from WordNet transfer to Arabic-English dictionary senses after disambiguation, yielding high accuracy at low cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes an algorithm that assigns part-of-speech tags to senses in the Al-Mawrid Arabic-English bilingual dictionary by first disambiguating the English translation equivalences and then transferring their tags from the Princeton WordNet. This step is presented as a prerequisite for linking the dictionary to WordNet or converting it to WordNet-LMF format, where synsets rather than single words form the basic unit. The method is framed as resource-light, addressing the high cost of building large annotated corpora or expert lexicons for NLP tools in languages with limited resources. The authors report that the registered accuracy remains high despite the low cost of the process.

Core claim

The algorithm accomplishes POS tagging of bilingual dictionary senses by transferring the POS tags of the English translation equivalences to the dictionary senses after a disambiguation process. The English POS tags are acquired from the Princeton WordNet. This enables the required linking of a bilingual dictionary to WordNet and standardization into WordNet-LMF format.

What carries the argument

Disambiguation of English translation equivalences to map them to WordNet senses, followed by transfer of those senses' POS tags to the corresponding Arabic dictionary entries.
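The abstract does not say how the disambiguation is done, so the following is only a minimal sketch of one plausible heuristic, not the authors' procedure. It assumes each dictionary sense comes with a list of English translation equivalences (TEs), takes the POS tags shared by all of them, and falls back to a vote when nothing is shared. `TOY_POS` is a hand-written stand-in for a Princeton WordNet lookup, and `transfer_pos` is a hypothetical helper name.

```python
from collections import Counter

# Toy stand-in for a WordNet lookup: word -> set of POS tags it can take
# (n = noun, v = verb, a = adjective, r = adverb). Hand-written for
# illustration; a real run would query Princeton WordNet instead.
TOY_POS = {
    "book": {"n", "v"},
    "volume": {"n"},
    "write": {"v"},
    "compose": {"v"},
    "run": {"n", "v"},
}


def transfer_pos(translation_equivalents, pos_lookup=TOY_POS):
    """Return the candidate POS tag(s) for one dictionary sense.

    Heuristic (an assumption, not the paper's method):
    1. Look up the POS set of every TE that the lookup covers.
    2. If the sets share at least one tag, return the intersection.
    3. Otherwise return the tag(s) supported by the most TEs.
    An empty set means no TE was covered.
    """
    candidate_sets = [pos_lookup[te] for te in translation_equivalents
                      if te in pos_lookup]
    if not candidate_sets:
        return set()
    shared = set.intersection(*candidate_sets)
    if shared:
        return shared
    votes = Counter(tag for tags in candidate_sets for tag in tags)
    top = max(votes.values())
    return {tag for tag, count in votes.items() if count == top}


# "book" alone is ambiguous, but paired with "volume" only the noun survives.
noun_sense = transfer_pos(["book", "volume"])
verb_sense = transfer_pos(["write", "compose"])
```

A sense whose only TE is ambiguous (for example `["book"]`) still comes back with more than one tag under this heuristic, which is exactly the case where a real disambiguation step would have to do more work.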

If this is right

Bilingual dictionaries can be linked directly to WordNet for semantic applications.
Dictionaries can be standardized into WordNet-LMF format with the synset as the basic unit.
NLP and HLT tools for low-resource languages become feasible without large annotated corpora or extensive expert lexicons.
Development time and investment for linguistic resources decrease while maintaining usable accuracy.
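To make the "synset as the basic unit" point concrete, here is a rough sketch of what a POS-tagged sense might look like once written out in a WordNet-LMF-like shape. The element and attribute names (`LexicalResource`, `Lexicon`, `LexicalEntry`, `Lemma`, `Sense`, `Synset`) are modelled on WordNet-LMF as commonly described and are not validated against its DTD; the identifiers `ar_kitab` and `toy-synset-book-n` and the helper `build_entry` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_entry(lexicon, entry_id, lemma, pos, synset_id):
    """Append one LexicalEntry whose single Sense points at a synset.

    The POS tag sits on the Lemma, which is why the dictionary senses
    need a tag before they can be exported in this shape.
    """
    entry = ET.SubElement(lexicon, "LexicalEntry", id=entry_id)
    ET.SubElement(entry, "Lemma", writtenForm=lemma, partOfSpeech=pos)
    ET.SubElement(entry, "Sense", id=f"{entry_id}_1", synset=synset_id)
    return entry


root = ET.Element("LexicalResource")
lexicon = ET.SubElement(root, "Lexicon", language="ar",
                        label="toy Arabic lexicon")

# One Arabic lemma ("kitab", book) tagged as a noun and linked to a
# hypothetical synset identifier.
build_entry(lexicon, "ar_kitab", "كتاب", "n", "toy-synset-book-n")
ET.SubElement(lexicon, "Synset", id="toy-synset-book-n")

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The sense carries no POS of its own here; it inherits it through the lemma and the synset it points to, so an untagged sense cannot be placed in the structure.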

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same transfer approach could be tested on other bilingual dictionaries that supply English translations and WordNet coverage.
Once tagged, the dictionary could support downstream tasks such as Arabic text analysis or cross-lingual retrieval that rely on POS information.
Iterative refinement of the disambiguation step might further lower the already low cost without new expert annotation.

Load-bearing premise

The disambiguation process can reliably map English translation equivalences to the correct WordNet senses so that the transferred POS tags are accurate for the Arabic senses.

What would settle it

Manual verification on a sample of several hundred tagged dictionary senses showing accuracy well below the level reported in the paper, or failure to produce a valid WordNet mapping for most entries.
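As a sketch of how such a check could be scored, the snippet below compares automatically assigned tags against a manually verified sample and reports accuracy with a 95% Wilson score interval, so that a few hundred senses yield a range rather than a single number. The function name and the sample data are hypothetical; the paper's abstract reports no figures to compare against.

```python
import math


def accuracy_with_wilson_interval(predicted, gold, z=1.96):
    """Return (accuracy, lower, upper) for paired tag lists.

    Uses the Wilson score interval for a binomial proportion;
    z = 1.96 corresponds to roughly 95% confidence.
    """
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal in length")
    n = len(gold)
    correct = sum(p == g for p, g in zip(predicted, gold))
    p_hat = correct / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n
                               + z ** 2 / (4 * n ** 2)) / denom
    return p_hat, centre - half_width, centre + half_width


# Made-up sample: 100 manually checked senses, 90 tagged correctly.
gold_tags = ["n"] * 100
auto_tags = ["n"] * 90 + ["v"] * 10
acc, low, high = accuracy_with_wilson_interval(auto_tags, gold_tags)
print(f"accuracy={acc:.3f}, 95% interval=({low:.3f}, {high:.3f})")
```

The interval matters for the "well below the reported level" test: with only 100 checked senses the uncertainty spans several percentage points, so a larger sample is needed to separate, say, 90% from 95%.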

Original abstract

This paper proposed an algorithm for part-of-speech (POS) tagging senses of a bilingual dictionary. The algorithm is applied on the Al-Mawrid Arabic-English dictionary. The tagging task is accomplished by transferring the POS tags of the English translation equivalences (TEs) to the dictionary senses after dis-ambiguities process. The English POS tags of senses are acquired from the Princeton WordNet. POS tagging of bilingual dictionary senses is prerequisite to link a bilingual dictionary to WordNet and/or standardizing that dictionary into WordNet-LMF format where the synset (set of synonyms), not word, is the basic brick. The registered accuracy is high though the cost is little. Building NLP/HLT tools needs linguistic experts, large investments, and long time. For statistical approach, we need large annotated corpora and for rule-based approach, we need large lexicon that contains rich linguistic and world knowledge. That motivates the appearance of what are called resource-light approaches to develop natural language processing (NLP) tools for poor-resource languages.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Straightforward WordNet transfer applied to one dictionary, but the high-accuracy claim rests on an undescribed disambiguation step with zero evaluation details.

The core of this paper is transferring POS tags from Princeton WordNet to senses in the Al-Mawrid Arabic-English dictionary: the English translation equivalences of each sense are disambiguated, and the POS tags of the selected WordNet senses are carried over to the dictionary sense. The motivation for resource-light methods on Arabic is reasonable, and the goal of preparing a dictionary for WordNet-LMF standardization is practical. The main thing a colleague should know is that this is an application of an existing pattern rather than a new technique or framework.

The work does what it sets out to do in outline. It identifies a low-cost route for a language pair where building large annotated resources from scratch is expensive, and it focuses on a concrete dictionary that matters for Arabic NLP. The abstract correctly notes that synset-level linking requires POS information on senses.

The soft spot is the evaluation. The abstract states that accuracy after the disambiguation process is high, yet supplies no algorithm for that step, no sample size, no gold-standard comparison, and no error breakdown. If the disambiguation is simple string matching or first-sense heuristics, sense granularity differences between WordNet and the dictionary could produce systematic mismatches. Without those numbers the central claim cannot be checked. The citation pattern is thin on prior dictionary-standardization work, which makes it harder to judge how much this instance adds.

This paper is for readers already working on Arabic lexical resources or WordNet integration who need quick annotation methods. A serious referee could usefully ask for the missing method description and results, so it clears the bar for peer review even though the current version is incomplete. I would not cite it until the evaluation appears.

Referee Report

2 major / 2 minor

Summary. The paper proposes an algorithm for part-of-speech (POS) tagging of senses in the Al-Mawrid Arabic-English bilingual dictionary. The POS tags of the English translation equivalences, taken from Princeton WordNet, are transferred to the Arabic senses after a disambiguation process. The approach is framed as a resource-light method to support linking bilingual dictionaries to WordNet or converting them to WordNet-LMF format, with the claim that accuracy is high at low cost.

Significance. If the disambiguation step and transfer process can be shown to work reliably, the method would provide a low-cost route to POS-tagging dictionary senses for under-resourced languages, reducing reliance on large annotated corpora or expert-built lexicons. This could aid standardization efforts for lexical resources where synsets rather than words are the basic unit.

major comments (2)

[Abstract] The assertion that 'the registered accuracy is high though the cost is little' is unsupported by any quantitative results, sample size, gold-standard comparison, error analysis, or baseline. This is load-bearing for the central claim that the disambiguation process reliably maps TEs to WordNet senses for accurate POS transfer.
[Abstract] The algorithm is described only as transferring POS 'after dis-ambiguities process' with no specification of the disambiguation procedure, sense selection criteria from WordNet, or handling of granularity mismatches between dictionary senses and WordNet synsets.

minor comments (2)

Typo and phrasing: 'dis-ambiguities process' should read 'disambiguation process'.
Tense consistency: 'This paper proposed an algorithm' is better rendered as 'This paper proposes an algorithm' in an abstract.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and the specific comments on the abstract. We address each point below and will revise the manuscript accordingly.

Referee: [Abstract] The assertion that 'the registered accuracy is high though the cost is little' is unsupported by any quantitative results, sample size, gold-standard comparison, error analysis, or baseline. This is load-bearing for the central claim that the disambiguation process reliably maps TEs to WordNet senses for accurate POS transfer.

Authors: We agree the abstract claim is unsupported by quantitative evidence in the current manuscript. The provided text contains only the high-level description without an evaluation section, sample sizes, accuracy figures, baselines or error analysis. We will revise the abstract to remove or qualify the unsupported assertion and add a new evaluation section reporting results on a held-out sample of dictionary entries, including accuracy, comparison to a simple baseline, and error analysis.
revision: yes

Referee: [Abstract] The algorithm is described only as transferring POS 'after dis-ambiguities process' with no specification of the disambiguation procedure, sense selection criteria from WordNet, or handling of granularity mismatches between dictionary senses and WordNet synsets.

Authors: The abstract is intentionally brief. The full manuscript text does not expand on the disambiguation steps, WordNet sense selection criteria, or granularity handling. We will revise the abstract to include a concise description of these elements and expand the methods section of the paper to specify the disambiguation procedure, sense selection rules, and approach to sense granularity differences.
revision: yes

Circularity Check

0 steps flagged

No circularity; descriptive algorithm with no self-referential reductions or fitted predictions

The paper describes an algorithm that transfers POS tags from Princeton WordNet English senses to Arabic dictionary senses after an unspecified disambiguation step. No equations, parameters, or derivations appear. The accuracy claim is presented as an empirical outcome rather than a quantity defined by its own outputs or by self-citation chains. The method does not reduce to any of the enumerated circular patterns; the disambiguation step is external to the reported result and does not create self-definition or fitted-input issues.

Axiom & Free-Parameter Ledger

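0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; no free parameters or invented entities are stated in the provided text, and the single axiom listed below is a domain assumption implicit in the method rather than one the abstract states.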

axioms (1)

domain assumption: English translation equivalences have reliable POS tags in Princeton WordNet that can be transferred after disambiguation.
The transfer step rests on this premise; the abstract does not discuss coverage gaps or sense mismatches.
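One way to probe this assumption before any tagging is to measure how many translation equivalences the English lexicon covers at all. The sketch below is illustrative only: `coverage_report` is a hypothetical helper, the sense identifiers and entries are invented, and `lexicon_words` stands in for the set of lemmas available in WordNet.

```python
def coverage_report(senses, lexicon_words):
    """Measure how well a lexicon covers the TEs of dictionary senses.

    senses: mapping of sense id -> list of English translation equivalences.
    lexicon_words: set of lower-cased lemmas the lexicon contains.
    Returns (fraction of TEs covered, ids of senses with no covered TE).
    """
    total = 0
    covered = 0
    uncovered_senses = []
    for sense_id, tes in senses.items():
        hits = [te for te in tes if te.lower() in lexicon_words]
        total += len(tes)
        covered += len(hits)
        if not hits:
            # No TE of this sense can supply a POS tag at all.
            uncovered_senses.append(sense_id)
    te_coverage = covered / total if total else 0.0
    return te_coverage, uncovered_senses


toy_senses = {
    "s1": ["book", "volume"],
    "s2": ["give up the ghost"],   # multiword TE missing from the toy lexicon
    "s3": ["Write", "zzz"],
}
toy_lexicon = {"book", "volume", "write"}
te_coverage, uncovered = coverage_report(toy_senses, toy_lexicon)
print(te_coverage, uncovered)
```

Senses that land in the uncovered list cannot receive a transferred tag under any disambiguation strategy, so their share puts an upper bound on what the method can tag.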

pith-pipeline@v0.9.1-grok · 5725 in / 1174 out tokens · 19252 ms · 2026-06-25T23:59:17.658166+00:00 · methodology

Reference graph

Works this paper leans on

25 extracted references

[1] D. Yarowsky, G. Ngai, and R. Wicentowski, "Inducing Multilingual Text Analysis Tools via Robust Projection Across Aligned Corpora," in Proceedings of the first international conference on Human language technology research, 2001, pp. 1-8.
[2] B. Cavestro and N. Cancedda, "Literality Based Sample Sorting for Syntax Projection," 2005.
[3] R. Hwa, P. Resnik, A. Weinberg, C. Cabezas, and O. Kolak, "Bootstrapping Parsers via Syntactic Projection Across Parallel Texts," Natural Language Engineering, vol. 11, pp. 311-326, 2005.
[4] V. B. Mititelu and R. Ion, "Cross-Language Transfer of Syntactic Relations using Parallel Corpora," in Cross-Language Knowledge Induction Workshop, Romania, 2005.
[5] V. B. Mititelu and R. Ion, "Automatic Import of Verbal Syntactic Relations using Parallel Corpora," in Cross-Language Knowledge Induction Workshop, 2005.
[6] S. Padó and M. Lapata, "Cross-Linguistic Projection of Role-Semantic Information," in Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, 2005, pp. 859-866.
[7] A. Feldman, "Portable Language Technology: A Resource-light Approach to Morpho-syntactic Tagging," Citeseer, 2006.
[8] A. Feldman, J. Hana, and C. Brew, "Experiments in Cross-Language Morphological Annotation Transfer," in Computational Linguistics and Intelligent Text Processing, ed: Springer, 2006, pp. 41-50.
[9] A. Feldman, J. Hana, and C. Brew, "A Cross-Language Approach to Rapid Creation of New Morpho-Syntactically Annotated Resources," Gen, vol. 115, pp. 6-6, 2006.
[10] A. Feldman and J. Hana, A Resource-Light Approach to Morpho-Syntactic Tagging: Rodopi, 2010.
[11] T. D. Szymanski, "Morphological Inference from Bitext for Resource-Poor Languages," The University of Michigan, 2011.
[12] D. Yarowsky and G. Ngai, "Inducing Multilingual POS Taggers and NP Bracketers via Robust Projection Across Aligned Corpora," 2001.
[13] C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing: MIT press, 1999.
[14] R. Baalbaki, "Al-Mawrid: A Modern Arabic-English Dictionary," 18 ed. Beirut, Lebanon: Dar El-Elm Lilmalayin, 2004.
[15] D. M. Fayed, A. A. Fahmy, M. A. Rashwan, and W. K. Fayed, "Extracting Knowledge from an Arabic-English Machine-Readable Dictionary Using Information Extraction," presented at the 5th International Conference on Arabic Language Processing (CITALA 2014), Oujda, Morocco, 2014.
[16] D. M. Fayed, A. A. Fahmy, M. A. Rashwan, and W. K. Fayed, "Towards Structuring an Arabic-English Machine-Readable Dictionary Using Parsing Expression Grammars," International Journal of Computational Linguistics Research, vol. 5, pp. 1-13, 2014.
[17] G. A. Miller, "Five papers on WordNet," Technical Report CLS-Rep-43, Cognitive Science Laboratory, Princeton University, 1993.
[18] WordNet 3.0 Reference Manual: http://wordnet.princeton.edu/wordnet/documentation/ (accessed 23 October 2015).
[19] WordNet: http://wordnet.princeton.edu/ (accessed 23 October 2015).
[20] WordNet 3.0 Database: https://wordnet.princeton.edu/wordnet/download/current-version/ (accessed 23 October 2015).
[21] Natural Language Toolkit (NLTK): http://www.nltk.org/ (accessed 23 October 2015).
[22] C. Hagerman (ク. ヘガマン), "Evaluating the Performance of Automated Part-of-Speech Taggers on an L2 Corpus," 2012.
[23] S. Thouësny, "Modeling second language learners' interlanguage and its variability," Dublin City University, 2011.
[24] E. Pianta, L. Bentivogli, and C. Girardi, "MultiWordNet: Developing an Aligned Multilingual Database," in Proc. 1st Int'l Conference on Global WordNet, 2002.
[25] S. Cucerzan and D. Yarowsky, "Bootstrapping a Multilingual Part-of-speech Tagger in One Person-day," in Proceedings of the 6th conference on Natural language learning-Volume 20, 2002, pp. 1-7.
