LQM: Linguistically Motivated Multidimensional Quality Metrics for Machine Translation
Pith reviewed 2026-05-10 04:09 UTC · model grok-4.3
The pith
LQM introduces a six-level hierarchical taxonomy to diagnose machine translation errors through linguistic categories such as sociolinguistics and pragmatics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LQM is a hierarchical error taxonomy organized around six linguistically motivated levels—sociolinguistics, pragmatics, semantics, morphosyntax, orthography, and graphetics—that allows systematic diagnosis of MT failures arising from variety mismatches, content gaps, and appropriateness issues rather than surface form alone, as shown by expert annotation of over 6,000 error spans in a new bidirectional Arabic dialect corpus.
What carries the argument
The LQM hierarchical error taxonomy with its six linguistically grounded levels, which structures span-level error labeling and severity scoring for MT output.
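As a concrete sketch, span-level severity-weighted scoring of the kind described here might look like the following; the weights (minor=1, major=5, critical=10) follow a common MQM-style convention and are an assumption, not the paper's published formula.

```python
# Hedged sketch of severity-weighted quality scoring over labeled error spans.
# The weights are MQM-style assumptions, not the paper's published scheme.
SEVERITY_WEIGHTS = {"neutral": 0, "minor": 1, "major": 5, "critical": 10}

def quality_score(num_words: int, errors: list[tuple[str, str]]) -> float:
    """errors: (LQM level, severity) pairs for one translated segment.

    Returns a score in [0, 1]; 1.0 means no weighted error mass,
    clipped at 0 when the penalty exceeds the segment length."""
    penalty = sum(SEVERITY_WEIGHTS[severity] for _level, severity in errors)
    return max(0.0, 1.0 - penalty / num_words)

# One major semantic and one minor orthographic error in a 20-word
# segment: penalty 6/20, so the score is 0.7.
score = quality_score(20, [("semantics", "major"), ("orthography", "minor")])
```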
Load-bearing premise
That the six levels together capture the main dialect- and culture-specific translation errors, and that expert span annotation under this scheme is reliable enough to yield diagnoses that existing MQM approaches miss.
What would settle it
An independent annotation study on the same corpus in which LQM labels show no stronger correlation with overall human quality ratings than standard MQM categories or automatic metrics like spBLEU.
Original abstract
Existing MT evaluation frameworks, including automatic metrics and human evaluation schemes such as Multidimensional Quality Metrics (MQM), are largely language-agnostic. However, they often fail to capture dialect- and culture-specific errors in diglossic languages (e.g., Arabic), where translation failures stem from mismatches in language variety, content coverage, and pragmatic appropriateness rather than surface form alone. We introduce LQM: Linguistically Motivated Multidimensional Quality Metrics for MT. LQM is a hierarchical error taxonomy for diagnosing MT errors through six linguistically grounded levels: sociolinguistics, pragmatics, semantics, morphosyntax, orthography, and graphetics (Figure 1). We construct a bidirectional parallel corpus of 3,850 sentences (550 per variety) spanning seven Arabic dialects (Egyptian, Emirati, Jordanian, Mauritanian, Moroccan, Palestinian, and Yemeni), derived from conversational, culturally rich content. We evaluate six LLMs in a zero-shot setting and conduct expert span-level human annotation using LQM, producing 6,113 labeled error spans across 3,495 unique erroneous sentences, along with severity-weighted quality scores. We complement this analysis with an automatic metric (spBLEU). Though validated here on Arabic, LQM is a language-agnostic framework designed to be easily applied to or adapted for other languages. LQM annotated errors data, prompts, and annotation guidelines are publicly available at https://github.com/UBC-NLP/LQM_MT.
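The span-level annotation unit the abstract describes can be pictured as a small record; the field names below are illustrative assumptions, not the schema of the released data.

```python
from dataclasses import dataclass

# The six levels are given by the paper; everything else here is hypothetical.
LQM_LEVELS = {"sociolinguistics", "pragmatics", "semantics",
              "morphosyntax", "orthography", "graphetics"}

@dataclass(frozen=True)
class ErrorSpan:
    """One labeled error span; field names are illustrative assumptions."""
    sentence_id: str
    start: int        # character offset into the MT output
    end: int          # exclusive end offset
    level: str        # one of the six LQM levels
    severity: str     # e.g. "minor" / "major" / "critical"
    note: str = ""

    def __post_init__(self):
        if self.level not in LQM_LEVELS:
            raise ValueError(f"unknown LQM level: {self.level}")

span = ErrorSpan("egy-0415", 12, 27, "sociolinguistics", "major",
                 "MSA register where a dialectal form is expected")
```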
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces LQM, a hierarchical error taxonomy with six linguistically grounded levels (sociolinguistics, pragmatics, semantics, morphosyntax, orthography, graphetics) for diagnosing MT errors, with emphasis on dialect- and culture-specific issues in diglossic languages such as Arabic. It describes construction of a bidirectional parallel corpus of 3,850 sentences (550 per variety) across seven Arabic dialects from conversational content, zero-shot evaluation of six LLMs, expert span-level annotation producing 6,113 labeled error spans on 3,495 sentences, severity-weighted quality scores, and an automatic spBLEU baseline. The framework is positioned as language-agnostic, with data, prompts, and guidelines released publicly.
Significance. If the taxonomy is shown to be comprehensive and the annotations reliable, LQM could meaningfully advance MT evaluation by supplying finer-grained, linguistically motivated diagnostics that address gaps in language-agnostic schemes such as MQM for low-resource and dialectal settings. The public release of the annotated corpus, annotation guidelines, and prompts is a clear strength that supports reproducibility and extension to other languages.
major comments (3)
- [§3 and §4] §3 (Annotation procedure) and §4 (Results): No inter-annotator agreement statistics (Cohen's kappa, percentage overlap, or similar) are reported for the expert span-level annotations that produced the 6,113 labeled spans. This is load-bearing for the central claim that LQM enables reliable, reproducible diagnoses superior to MQM.
- [§4 and §5] §4 (Results) and §5 (Discussion): No head-to-head comparison of error distributions, severity scores, or actionable insights between LQM and MQM annotations on the same 3,495 sentences is provided, leaving the asserted practical superiority of the six-level taxonomy unsupported by direct evidence.
- [§2] §2 (LQM Taxonomy, Figure 1): The claim that the six levels comprehensively capture all relevant dialect- and culture-specific errors in diglossic languages is stated axiomatically without coverage analysis, examples of uncovered error types, or validation against an independent error inventory.
minor comments (3)
- [Abstract and §3] The abstract states a corpus of 3,850 sentences while results refer to 3,495 unique erroneous sentences; clarify the relationship and any filtering criteria in the corpus construction section.
- [§4] A table or figure presenting error counts per level and per dialect would strengthen the diagnostic claims; consider adding one if not already present.
- [§6] The language-agnostic framing would benefit from a short subsection outlining concrete adaptation steps for a non-Arabic language.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which highlights important aspects for strengthening the reliability and comparative claims of LQM. We address each major comment below with planned revisions where appropriate, while maintaining the manuscript's focus on the linguistically motivated taxonomy for dialectal MT evaluation.
Point-by-point responses
-
Referee: [§3 and §4] §3 (Annotation procedure) and §4 (Results): No inter-annotator agreement statistics (Cohen's kappa, percentage overlap, or similar) are reported for the expert span-level annotations that produced the 6,113 labeled spans. This is load-bearing for the central claim that LQM enables reliable, reproducible diagnoses superior to MQM.
Authors: We agree that inter-annotator agreement statistics are essential to substantiate the reliability of the expert annotations. The original manuscript omitted these details, as the annotations were conducted by trained linguists using standardized guidelines. In the revised version, we will add a dedicated subsection to §3 describing the annotation protocol and report agreement metrics (e.g., Cohen's kappa for category assignment and span overlap F1) computed on a double-annotated subset of sentences. This addition will directly support the reproducibility claims. revision: yes
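The agreement metrics promised here could be computed roughly as follows; this is a generic sketch (per-item Cohen's kappa for category assignment, exact-match F1 over span tuples), not the protocol the revision will necessarily use.

```python
from collections import Counter

def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators' category labels on the same items."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n                     # observed
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / n**2  # chance
    return 1.0 if pe == 1.0 else (po - pe) / (1 - pe)

def span_f1(pred: set, gold: set) -> float:
    """Exact-match F1 over (start, end, label) span tuples from two annotators."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```

In practice span agreement is often relaxed to partial-overlap matching; the exact-match version above is the strictest (and simplest) variant.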
-
Referee: [§4 and §5] §4 (Results) and §5 (Discussion): No head-to-head comparison of error distributions, severity scores, or actionable insights between LQM and MQM annotations on the same 3,495 sentences is provided, leaving the asserted practical superiority of the six-level taxonomy unsupported by direct evidence.
Authors: A full quantitative head-to-head comparison would require re-annotating the corpus with MQM, which exceeds the scope and resources of the current study. To address the concern, we will revise §5 to include a qualitative comparison using concrete examples from the Arabic data, illustrating error categories (particularly sociolinguistic and pragmatic) that LQM diagnoses more explicitly than MQM. We will also add an explicit limitations paragraph noting the absence of quantitative comparison and outlining it as future work. revision: partial
-
Referee: [§2] §2 (LQM Taxonomy, Figure 1): The claim that the six levels comprehensively capture all relevant dialect- and culture-specific errors in diglossic languages is stated axiomatically without coverage analysis, examples of uncovered error types, or validation against an independent error inventory.
Authors: The taxonomy was constructed from linguistic theory and refined iteratively against observed errors in the multi-dialect corpus. We will expand §2 in the revision to include: (i) a description of the iterative development process, (ii) concrete examples of dialect- and culture-specific errors mapped to each level, and (iii) a brief mapping to established MT error inventories (e.g., MQM and others) to demonstrate coverage. Any gaps identified during annotation will be explicitly noted as potential areas for extension. revision: yes
Circularity Check
No significant circularity; LQM is a novel taxonomy applied to independent data
Full rationale
The paper introduces LQM as a new hierarchical error taxonomy defined by six linguistically grounded levels and applies it to a newly constructed bidirectional parallel corpus of 3,850 sentences across seven Arabic dialects. It reports expert span-level annotations producing 6,113 labeled error spans and severity-weighted scores, supplemented by spBLEU. The provided text contains no equations, no fitted parameters presented as predictions, no load-bearing self-citations, and no claims that reduce to their inputs by construction. The framework is presented as independently defined and language-agnostic, then applied to external data, making the analysis self-contained.
Axiom & Free-Parameter Ledger
free parameters (2)
- Number of Arabic dialects covered
- Sentences per variety
axioms (2)
- domain assumption: Expert human annotators can consistently and accurately apply the six-level LQM taxonomy to identify MT errors.
- ad hoc to paper: The six levels (sociolinguistics, pragmatics, semantics, morphosyntax, orthography, graphetics) are sufficient to diagnose all relevant dialect- and culture-specific errors.
invented entities (1)
- LQM six-level hierarchical error taxonomy (no independent evidence)