When Similar Means Different: Evaluating LLMs on Arabic--Hebrew Cognates

Bashar Alhafni; Junhong Liang; Noor Abo Mokh

arxiv: 2606.13218 · v1 · pith:RCO2JFVQnew · submitted 2026-06-11 · 💻 cs.CL

Pith reviewed 2026-06-27 06:39 UTC · model grok-4.3

classification 💻 cs.CL

keywords Arabic · Hebrew · cognates · false friends · loanwords · LLMs · cross-lingual semantics · benchmark

The pith

Large language models rely on surface-form similarity for Arabic-Hebrew pairs and lose accuracy on false friends and loanwords.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates SemCog Bench, a collection of 1,858 Arabic-Hebrew word pairs annotated at the sentence level for whether pairs are true cognates or misleading false friends and loanwords. Evaluation of multiple LLMs across raw, diacritized, Romanized, and phonetic inputs shows strong results on genuine cognates but sharp drops elsewhere. The pattern points to models defaulting to visual or orthographic overlap instead of resolving actual meaning. Adding full-sentence context produces only small gains, leaving the core form-meaning conflict largely unaddressed.

Core claim

Arabic and Hebrew share a lexicon of true cognates, false friends, and modern loanwords that creates form-meaning conflicts for LLMs. Testing on SemCog Bench shows models reach high accuracy on true cognates yet drop sharply on false friends and loanwords, driven by surface-form similarity; sentence context yields only modest gains and does not overcome the misleading signals.

What carries the argument

SemCog Bench, the curated benchmark of 1,858 annotated Arabic-Hebrew pairs evaluated under four input representations to isolate reliance on surface similarity versus semantic disambiguation.

If this is right

Current LLMs exhibit a limitation in separating form from meaning across related languages.
Surface similarity overrides contextual cues when the two conflict.
Sentence-level information alone does not resolve form-meaning mismatches.
SemCog Bench supplies a concrete test set for measuring progress on cross-lingual semantic reasoning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same surface-form bias may appear in other language pairs that share cognates and loanwords.
Explicit training signals that penalize false-friend errors could reduce the observed gap.
Extending the benchmark to spoken or additional written varieties would check whether the pattern holds beyond the current sample.

Load-bearing premise

The manually created labels for cognate status and semantic disambiguation accurately represent the Arabic-Hebrew lexicon and the chosen input formats expose the relevant conflicts.

What would settle it

Re-running the same models on a fresh, independently verified collection of Arabic-Hebrew pairs and finding no accuracy gap between true cognates and false friends would falsify the surface-similarity reliance claim.
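The accuracy gap this test hinges on can be computed per gold label. The sketch below is a minimal illustration, assuming gold and predicted labels are available as parallel lists; the function names, label strings, and toy data are hypothetical and are not taken from the paper's released code.

```python
from collections import defaultdict


def per_class_accuracy(gold, pred):
    """Return {label: accuracy}, computed separately for each gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, p in zip(gold, pred):
        total[g] += 1
        correct[g] += int(g == p)
    return {label: correct[label] / total[label] for label in total}


def accuracy_gap(gold, pred, easy="true_cognate", hard="false_friend"):
    """Accuracy on the `easy` class minus accuracy on the `hard` class."""
    acc = per_class_accuracy(gold, pred)
    return acc[easy] - acc[hard]


# Toy labels invented for illustration; not data from the paper.
gold = ["true_cognate", "true_cognate", "false_friend", "false_friend", "loanword"]
pred = ["true_cognate", "true_cognate", "true_cognate", "false_friend", "true_cognate"]

print(per_class_accuracy(gold, pred))
print(accuracy_gap(gold, pred))
```

A gap near zero on an independently verified set of pairs would count against the surface-similarity claim; a large positive gap would be consistent with it.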

Figures

Figures reproduced from arXiv: 2606.13218 by Bashar Alhafni, Junhong Liang, Noor Abo Mokh.

**Figure 2.** Overview of the data construction pipeline and evaluation framework for SemCog Bench. (Left) …

**Figure 3.** Relative change (∆) in accuracy (Acc) and average F1 compared to the undiacritized baseline across input representations. Values are reported in percentage points.

5 Result & Analysis, RQ1: Classification and Disambiguation. Table 1 presents the results for 3-class cognate identification and semantic disambiguation using the undiacritized input representation. Although proprietary models generally outper…

**Figure 5.** Overall accuracy under different input repre…
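The ∆ values described in the Figure 3 caption can be illustrated with a short sketch. It assumes ∆ is the plain difference between a representation's score and the undiacritized baseline, expressed in percentage points; the function name and the scores are invented for illustration and are not results from the paper.

```python
def delta_vs_baseline(scores, baseline="undiacritized"):
    """Difference from the baseline score, in percentage points.

    `scores` maps an input representation to a score in [0, 1].
    """
    base = scores[baseline]
    return {
        name: round((value - base) * 100, 2)
        for name, value in scores.items()
        if name != baseline
    }


# Invented accuracies, for illustration only.
scores = {
    "undiacritized": 0.70,
    "diacritized": 0.72,
    "romanized": 0.65,
    "phonetic": 0.68,
}

print(delta_vs_baseline(scores))
```

Reporting differences in percentage points rather than as ratios keeps the values comparable across models with different baseline accuracy.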

Original abstract

Arabic and Hebrew, as closely related Semitic languages, share a substantial lexicon of true cognates, misleading false friends, and modern loanwords. This overlap poses a challenge for cross-lingual semantic understanding in large language models (LLMs). To evaluate this capability, we introduce SemCog Bench, a curated benchmark of 1,858 Arabic--Hebrew word pairs with sentence-level annotations for cognate identification and semantic disambiguation. We evaluate open-source and commercial LLMs across multiple input representations (raw, diacritized, Romanized, and phonetic) and reveal a critical gap in cross-lingual reasoning. While models achieve high accuracy on true cognates, performance drops sharply on false friends and loanwords, reflecting a strong reliance on surface-form similarity. Furthermore, sentence-level context yields only modest improvements, suggesting that contextual cues alone are insufficient to overcome misleading form-based signals. These findings reveal a fundamental limitation of current LLMs in resolving cross-lingual form--meaning conflicts and establish SemCog Bench as a rigorous benchmark for multilingual semantic reasoning. Our code and data are publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

SemCog Bench is a new resource for Arabic-Hebrew cognate evaluation, but the main claims rest on unvalidated manual labels with no agreement metrics.

The paper introduces SemCog Bench, a set of 1,858 Arabic-Hebrew word pairs labeled for true cognates, false friends, and loanwords, along with sentence annotations. It tests LLMs across raw, diacritized, Romanized, and phonetic inputs and reports high accuracy on true cognates but sharp drops on false friends and loanwords, with only modest help from sentence context.

The work does one clear thing well: it releases the data and code, and it runs a controlled comparison across model types and input formats. That setup makes the form-meaning conflict concrete and gives others a starting point for similar tests in related languages.

The soft spot is the annotation process. The labels come from manual curation, yet the paper reports no inter-annotator agreement, no adjudication steps, and no cross-check against an external lexicon. The stress-test note flags this directly. If even a modest fraction of the false-friend labels are noisy, the reported performance gap becomes partly an artifact of the data rather than evidence of model behavior. That issue sits at the center of the claim, so it is not minor.

The rest of the evaluation looks standard for this kind of benchmark paper, with no obvious circularity or invented quantities. The patterns reported in the abstract are clear, but how much weight they bear depends on label quality.

This paper is for people working on multilingual semantic reasoning or building test sets for closely related languages. The dataset itself could be useful to others even if the current conclusions need more support.

I would send it to peer review. The benchmark addresses a practical gap, and referees can push on the validation details without starting from zero.

Referee Report

2 major / 2 minor

Summary. The paper introduces SemCog Bench, a manually curated dataset of 1,858 Arabic-Hebrew word pairs with sentence-level annotations for cognate identification and semantic disambiguation. It evaluates open-source and commercial LLMs across raw, diacritized, Romanized, and phonetic input representations, claiming high accuracy on true cognates but sharp performance drops on false friends and loanwords (indicating surface-form reliance), with only modest gains from sentence context.

Significance. If the benchmark labels prove reliable, the work identifies a concrete limitation in LLMs' cross-lingual semantic reasoning for related languages and provides a public benchmark plus code for future evaluation; the public data release is a clear strength.

major comments (2)

[§3] SemCog Bench construction: the 1,858 pairs were produced by manual curation, yet no inter-annotator agreement figures, adjudication protocol, or comparison to an independent lexicon are reported; because all headline performance gaps (true cognates vs. false friends/loanwords) are measured against these labels, label noise could make the central claim of surface-form reliance partly an artifact of the data.
[§4] Experimental setup: exact model versions/checkpoints, prompt templates, and any data-leakage checks are not specified; without these, it is impossible to verify whether the reported accuracy patterns reflect genuine form-meaning conflicts or artifacts of training-data overlap.
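The agreement figure requested in the first major comment is often reported as Cohen's kappa. The sketch below is a minimal illustration, assuming two annotators labeled the same pairs; the function name and toy labels are hypothetical, and nothing here reflects the paper's actual annotation data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels invented for illustration.
annotator_1 = ["true_cognate", "true_cognate", "false_friend", "false_friend"]
annotator_2 = ["true_cognate", "true_cognate", "false_friend", "true_cognate"]

print(cohen_kappa(annotator_1, annotator_2))
```

Kappa corrects raw agreement for chance, which matters when one class (for example true cognates) dominates the label distribution.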

minor comments (2)

[§4.2] The description of how sentence-level context is concatenated with word pairs could be clarified with an explicit example prompt.
[§5] Table 2 (or equivalent results table) would benefit from explicit statistical significance markers for the reported accuracy drops.
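One way to back the significance markers requested for the accuracy drops is a two-proportion z-test between classes. The sketch below is only an illustration under stated assumptions: the two item sets are independent, the samples are large enough for the normal approximation, and the counts are invented rather than taken from the paper.

```python
import math


def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Two-sided z-test for a difference between two accuracies.

    Returns (z, p_value) using the pooled-variance normal approximation.
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: 90/100 correct on one class vs. 60/100 on another.
z, p = two_proportion_z_test(90, 100, 60, 100)
print(z, p)
```

When the same items are scored under two conditions (for example with and without sentence context), a paired test such as McNemar's would be the more appropriate choice.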

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point by point below and will revise the manuscript to improve transparency on dataset construction and experimental details.

Referee: [§3] SemCog Bench construction: the 1,858 pairs were produced by manual curation, yet no inter-annotator agreement figures, adjudication protocol, or comparison to an independent lexicon are reported; because all headline performance gaps (true cognates vs. false friends/loanwords) are measured against these labels, label noise could make the central claim of surface-form reliance partly an artifact of the data.

Authors: We acknowledge that the original manuscript does not report inter-annotator agreement figures or an explicit adjudication protocol. The 1,858 pairs were curated by a small team of linguists with expertise in both languages through iterative review and consensus. In revision we will expand §3 with a detailed description of the curation and adjudication process and will add a comparison of the labels against available bilingual lexicons where such resources exist. These additions will allow readers to evaluate label reliability directly.

Revision: yes

Referee: [§4] Experimental setup: exact model versions/checkpoints, prompt templates, and any data-leakage checks are not specified; without these, it is impossible to verify whether the reported accuracy patterns reflect genuine form-meaning conflicts or artifacts of training-data overlap.

Authors: We agree that precise experimental specifications are required for reproducibility. The revised version will list the exact model checkpoints (Hugging Face identifiers for open-source models and API versions for commercial models), include the complete prompt templates in an appendix, and report data-leakage checks such as n-gram overlap analysis between the benchmark and publicly known training corpora. These changes will confirm that the performance patterns arise from form-meaning conflicts rather than memorization.

Revision: yes
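The n-gram overlap check mentioned in the second response could look roughly like the sketch below. It assumes whitespace tokenization and word trigrams, both simplifications chosen for illustration; the function names and the tiny English corpus are hypothetical stand-ins, not the authors' procedure or data.

```python
def word_ngrams(text, n):
    """Set of word n-grams (as tuples) from whitespace-tokenized text."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(sentence, corpus_ngrams, n=3):
    """Fraction of the sentence's n-grams that also occur in the corpus."""
    grams = word_ngrams(sentence, n)
    if not grams:
        return 0.0  # sentence shorter than n tokens
    return len(grams & corpus_ngrams) / len(grams)


# Hypothetical stand-in for a training corpus.
corpus = ["the cat sat on the mat", "a dog ran in the park"]
corpus_ngrams = set().union(*(word_ngrams(doc, 3) for doc in corpus))

print(overlap_ratio("the cat sat on a chair", corpus_ngrams))
```

A benchmark sentence with a high ratio would be a candidate for closer inspection; a low ratio does not rule out leakage, since paraphrased or tokenized-differently copies would be missed.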

Circularity Check

0 steps flagged

No circularity: empirical evaluation against external annotations

The paper introduces SemCog Bench as a manually curated collection of 1,858 Arabic-Hebrew pairs and reports LLM accuracies on cognate identification and semantic disambiguation by direct comparison of model outputs to those annotations. No equations, parameter fits, or derivations appear; performance figures are not constructed from the paper's own inputs but measured against held-out labels. No self-citation chains, uniqueness theorems, or ansatzes are invoked to support the central claims. The evaluation is therefore self-contained against an external benchmark and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical benchmark paper with no mathematical model, derivations, or postulated theoretical entities; the benchmark itself is a constructed dataset rather than an invented physical or formal object.

pith-pipeline@v0.9.1-grok · 5723 in / 1040 out tokens · 20676 ms · 2026-06-27T06:39:30.762078+00:00 · methodology

Reference graph

Works this paper leans on

42 extracted references · 3 linked inside Pith

[1] Jais 2: A family of Arabic-centric open large language models. Technical report, IFM. Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Jon Ander Campos, Yi Chern Tan, Kelly Marchisio, Max Bartolo, Sebastian Ruder, Acyr Locatelli, Julia Kreutzer, Nick Frosst, Aidan Gomez, Phil Blunsom,...
[2] ChatGLM: A family of large language models from GLM-130B to GLM-4 All Tools. Preprint, arXiv:2406.12793. Juan Moreno Gonzalez, Bashar Alhafni, and Nizar Habash. 2026. A tale of two scripts: Transliteration and post-correction for Judeo-Arabic. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics ...
[3] On Arabic Transliteration. In A. van den Bosch and A. Soudi, editors, Arabic Computational Morphology: Knowledge-based and Empirical Methods, pages 15–22. Springer, Netherlands. Nizar Y Habash. 2010. Introduction to Arabic natural language processing, volume 3. Morgan & Claypool Publishers. Bradley Hauer and Grzegorz Kondrak. 2015. Automatic cognate i...
[4] Impact: Inflectional morphology probes across complex typologies. Preprint, arXiv:2506.23929. Djamé Seddah, Reut Tsarfaty, Sandra Kübler, Marie Candito, Jinho D. Choi, Richárd Farkas, Jennifer Foster, Iakes Goenaga, Koldo Gojenola Galletebeitia, Yoav Goldberg, Spence Green, Nizar Habash, Marco Kuhlmann, Wolfgang Maier, Joakim Nivre, Adam Przepiórkowski,...
[5] From SPMRL to NMRL: What did we learn (and unlearn) in a decade of parsing morphologically-rich languages (MRLs)? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7396–7408. Werner Vach and Oke Gerke. 2023. Gwet's AC1 is not a substitute for Cohen's kappa: a comparison of basic properties. MethodsX, 10:102...
[6] Task Overview. The objective of this annotation task is to classify Arabic-Hebrew word pairs as either True Cognates or False Friends based on semantic equivalence. These words were initially classified using LLMs. Your role as an annotator is to verify this classification. After that, you will be required to classify whether Arabic-Hebrew word pairs are loan...
[7] Annotation Criteria. 2.1 Stage 1: Cognate Annotation. Label and definition. True Cognate: words which share a common Semitic root AND have semantically overlapping meanings (share a meaning). At least one core meaning must be present in both languages (e.g. the words חלב and حليب, which mean milk, are true cognates). False Friend: words share similar form BUT have co...
[8] Read the two words and their meanings. Is the meaning correct?
[9] Check if any meaning overlaps between the two words from the two languages
[10] If they have one shared meaning, then the words are True Cognate
[11] If you accept the classification in column H (type), be it True Cognate or False Friend, then in annotator_type (column K) choose Accept; otherwise, choose Reject
[12] Please reject words borrowed from English and archaic words (i.e. words that are not commonly used in the two languages); only keep words that are used in MSA and Modern Hebrew. Also reject names of places/locations, animals, etc., as they are not considered cognates
[13] If you have any comments, add a note in column L
[14] You will also see a “reasoning” column; this is generated by Gemini-3.1-pro-preview. This is only for researchers’ reference; please use your own judgement when providing your annotation! Google Sheet columns to fill (Column | Required | Description): annotator_type (col. K) | Yes | Accept/Reject; note | Optional | Brief explanation or comment (if needed). 2.2 Stage 2: ...
[15] Read the Arabic sentence (and its translation if not sure about meaning)
[16] Evaluate naturalness: Does it sound "Natural" or "Unnatural" based on the definition above?
[17] Add comment if needed
[18] Read the Hebrew sentence and its translation
[19] Evaluate naturalness: "Natural" or "Unnatural" based on the definition above of naturalness
[20] Add optional note if needed
[21] You don't need to verify the correctness of English translation; it is used for easier understanding. There may be issues with translations
[22] The English meaning of Arabic and Hebrew words is attached also for reference
[23] If you marked the words as cognates and you find that the sentences are awkward (unnatural), please suggest a new sentence without changing the word form. 2.3 Stage 3: Loanword word-level annotation. Loanwords: the word that has been borrowed from one language (the donor language) and incorporated into the vocabulary of another language (the recipient lang...
[24] Verify the word is a genuine loanword. The word should be commonly used in Modern Standard Arabic (MSA) and Modern Hebrew. The word should originate from a non-Semitic source language. Both Arabic and Hebrew forms should be phonologically similar
[25] The provided meaning should be correct for BOTH languages
[26] If meanings differ significantly between Arabic and Hebrew, note this in the comments
[27] Words to REJECT: i) Words that are NOT loanwords (native Semitic words). ii) Archaic words that are rarely used in modern contexts. Proper nouns (names of people, places, brands). iii) Words that are only used in dialectal Arabic, not MSA. iv) Words that are completely different in meaning between the two languages
[28] You will also see other columns such as entry, id, arabic_ipa, hebrew_ipa, phonetic_similairty, loan_source; those columns are only used for easier understanding by researchers. Please use your own judgement when providing your annotation!
[29] If you have any comments, add a note in column M. 2.4 Stage 4: Loanword Sentence-level annotation. The requirement is the same as Stage 2
[30] Strict Guidelines & Critical Rules. Here are some important rules to follow:
[31] NO use of LLMs or AI tools (ChatGPT, Claude, Gemini, etc.)
[32] NO use of machine translation
[33] NO guessing: consult a dictionary if uncertain
[34] Verify meanings independently using trusted dictionaries
[35] Maintain consistency throughout the dataset
[36] Add notes to explain unclear or borderline cases
[37] Additional examples. Example 1: True Cognate (Stage 1). [Input Variables] arabic_undiac: عبد | hebrew_undiac: עבד | arabic_meaning: slave; servant; worship | hebrew_meaning: slave; to work. [Expected Annotation] annotator_type: True Cognate | note: Both share core meaning "slave/servant" from root ʕ-b-d. Example 2: False Friend (Stage 1). [Input Variables] arabic_undiac heb...
[38] Common Mistakes to Avoid
[39] Over-reliance on form: Similar-looking words may be False Friends if meanings differ
[40] Ignoring polysemy: One shared meaning is sufficient for True Cognate
[41] Translation bias: Verify meanings in original languages
[42] Conjugation differences are expected and should not be a factor in the annotation decision. Final Submission Checklist • All entries have annotator_type filled (Stage 1) • All entries have arabic_sentence_natural filled (Stage 2) • All entries have hebrew_sentence_natural filled (Stage 2) • Notes added for unclear or borderline cases • No entries skipped ...
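The Stage 1 rule quoted above (one shared meaning is sufficient for True Cognate) can be sketched in code. This is a deliberately crude illustration: it assumes glosses are semicolon-separated English strings, as in Example 1, and treats exact string match as a stand-in for the annotator's semantic judgement. It also assumes the pair already shares a similar form. The function names are hypothetical.

```python
def split_meanings(gloss):
    """Split a semicolon-separated gloss into a set of normalized meanings."""
    return {m.strip().lower() for m in gloss.split(";") if m.strip()}


def stage1_label(arabic_meaning, hebrew_meaning):
    """Apply the 'one shared meaning is sufficient' rule.

    Returns (label, shared_meanings). Exact string overlap stands in for
    the human judgement of semantic overlap; real annotation needs more.
    """
    shared = split_meanings(arabic_meaning) & split_meanings(hebrew_meaning)
    label = "True Cognate" if shared else "False Friend"
    return label, shared


# Glosses from Example 1 in the guidelines.
print(stage1_label("slave; servant; worship", "slave; to work"))

# Glosses invented for illustration, with no shared meaning.
print(stage1_label("meat", "bread"))
```

The polysemy point in the guidelines shows up directly here: the Arabic gloss carries a meaning ("worship") absent from the Hebrew gloss, yet a single overlap is enough for the True Cognate label.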
