LQM: Linguistically Motivated Multidimensional Quality Metrics for Machine Translation
Pith reviewed 2026-05-10 04:09 UTC · model grok-4.3
The pith
LQM introduces a six-level hierarchical taxonomy to diagnose machine translation errors through linguistic categories such as sociolinguistics and pragmatics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LQM is a hierarchical error taxonomy organized around six linguistically motivated levels—sociolinguistics, pragmatics, semantics, morphosyntax, orthography, and graphetics—that allows systematic diagnosis of MT failures arising from variety mismatches, content gaps, and appropriateness issues rather than surface form alone, as shown by expert annotation of over 6,000 error spans in a new bidirectional Arabic dialect corpus.
What carries the argument
The LQM hierarchical error taxonomy with its six linguistically grounded levels, which structures span-level error labeling and severity scoring for MT output.
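As a concrete sketch, span-level severity-weighted scoring of the kind described here might look like the following; the weights (minor=1, major=5, critical=10) follow a common MQM-style convention and are an assumption, not the paper's published formula.

```python
# Hedged sketch of severity-weighted quality scoring over labeled error spans.
# The weights are MQM-style assumptions, not the paper's published scheme.
SEVERITY_WEIGHTS = {"neutral": 0, "minor": 1, "major": 5, "critical": 10}

def quality_score(num_words: int, errors: list[tuple[str, str]]) -> float:
    """errors: (LQM level, severity) pairs for one translated segment.

    Returns a score in [0, 1]; 1.0 means no weighted error mass,
    clipped at 0 when the penalty exceeds the segment length."""
    penalty = sum(SEVERITY_WEIGHTS[severity] for _level, severity in errors)
    return max(0.0, 1.0 - penalty / num_words)

# One major semantic and one minor orthographic error in a 20-word
# segment: penalty 6/20, so the score is 0.7.
score = quality_score(20, [("semantics", "major"), ("orthography", "minor")])
```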
Load-bearing premise
That the six levels together capture the main dialect- and culture-specific translation errors, and that expert span annotation under this scheme is reliable enough to yield diagnoses that existing MQM approaches miss.
What would settle it
An independent annotation study on the same corpus in which LQM labels show no stronger correlation with overall human quality ratings than standard MQM categories or automatic metrics like spBLEU.
Original abstract
Existing MT evaluation frameworks, including automatic metrics and human evaluation schemes such as Multidimensional Quality Metrics (MQM), are largely language-agnostic. However, they often fail to capture dialect- and culture-specific errors in diglossic languages (e.g., Arabic), where translation failures stem from mismatches in language variety, content coverage, and pragmatic appropriateness rather than surface form alone. We introduce LQM: Linguistically Motivated Multidimensional Quality Metrics for MT. LQM is a hierarchical error taxonomy for diagnosing MT errors through six linguistically grounded levels: sociolinguistics, pragmatics, semantics, morphosyntax, orthography, and graphetics (Figure 1). We construct a bidirectional parallel corpus of 3,850 sentences (550 per variety) spanning seven Arabic dialects (Egyptian, Emirati, Jordanian, Mauritanian, Moroccan, Palestinian, and Yemeni), derived from conversational, culturally rich content. We evaluate six LLMs in a zero-shot setting and conduct expert span-level human annotation using LQM, producing 6,113 labeled error spans across 3,495 unique erroneous sentences, along with severity-weighted quality scores. We complement this analysis with an automatic metric (spBLEU). Though validated here on Arabic, LQM is a language-agnostic framework designed to be easily applied to or adapted for other languages. LQM annotated errors data, prompts, and annotation guidelines are publicly available at https://github.com/UBC-NLP/LQM_MT.
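The span-level annotation unit the abstract describes can be pictured as a small record; the field names below are illustrative assumptions, not the schema of the released data.

```python
from dataclasses import dataclass

# The six levels are given by the paper; everything else here is hypothetical.
LQM_LEVELS = {"sociolinguistics", "pragmatics", "semantics",
              "morphosyntax", "orthography", "graphetics"}

@dataclass(frozen=True)
class ErrorSpan:
    """One labeled error span; field names are illustrative assumptions."""
    sentence_id: str
    start: int        # character offset into the MT output
    end: int          # exclusive end offset
    level: str        # one of the six LQM levels
    severity: str     # e.g. "minor" / "major" / "critical"
    note: str = ""

    def __post_init__(self):
        if self.level not in LQM_LEVELS:
            raise ValueError(f"unknown LQM level: {self.level}")

span = ErrorSpan("egy-0415", 12, 27, "sociolinguistics", "major",
                 "MSA register where a dialectal form is expected")
```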
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces LQM, a hierarchical error taxonomy with six linguistically grounded levels (sociolinguistics, pragmatics, semantics, morphosyntax, orthography, graphetics) for diagnosing MT errors, with emphasis on dialect- and culture-specific issues in diglossic languages such as Arabic. It describes construction of a bidirectional parallel corpus of 3,850 sentences (550 per variety) across seven Arabic dialects from conversational content, zero-shot evaluation of six LLMs, expert span-level annotation producing 6,113 labeled error spans on 3,495 sentences, severity-weighted quality scores, and an automatic spBLEU baseline. The framework is positioned as language-agnostic, with data, prompts, and guidelines released publicly.
Significance. If the taxonomy is shown to be comprehensive and the annotations reliable, LQM could meaningfully advance MT evaluation by supplying finer-grained, linguistically motivated diagnostics that address gaps in language-agnostic schemes such as MQM for low-resource and dialectal settings. The public release of the annotated corpus, annotation guidelines, and prompts is a clear strength that supports reproducibility and extension to other languages.
major comments (3)
- [§3 and §4] §3 (Annotation procedure) and §4 (Results): No inter-annotator agreement statistics (Cohen's kappa, percentage overlap, or similar) are reported for the expert span-level annotations that produced the 6,113 labeled spans. This is load-bearing for the central claim that LQM enables reliable, reproducible diagnoses superior to MQM.
- [§4 and §5] §4 (Results) and §5 (Discussion): No head-to-head comparison of error distributions, severity scores, or actionable insights between LQM and MQM annotations on the same 3,495 sentences is provided, leaving the asserted practical superiority of the six-level taxonomy unsupported by direct evidence.
- [§2] §2 (LQM Taxonomy, Figure 1): The claim that the six levels comprehensively capture all relevant dialect- and culture-specific errors in diglossic languages is stated axiomatically without coverage analysis, examples of uncovered error types, or validation against an independent error inventory.
minor comments (3)
- [Abstract and §3] The abstract states a corpus of 3,850 sentences while results refer to 3,495 unique erroneous sentences; clarify the relationship and any filtering criteria in the corpus construction section.
- [§4] A table or figure presenting error counts per level and per dialect would strengthen the diagnostic claims; consider adding one if not already present.
- [§6] The language-agnostic framing would benefit from a short subsection outlining concrete adaptation steps for a non-Arabic language.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which highlights important aspects for strengthening the reliability and comparative claims of LQM. We address each major comment below with planned revisions where appropriate, while maintaining the manuscript's focus on the linguistically motivated taxonomy for dialectal MT evaluation.
Point-by-point responses
-
Referee: [§3 and §4] §3 (Annotation procedure) and §4 (Results): No inter-annotator agreement statistics (Cohen's kappa, percentage overlap, or similar) are reported for the expert span-level annotations that produced the 6,113 labeled spans. This is load-bearing for the central claim that LQM enables reliable, reproducible diagnoses superior to MQM.
Authors: We agree that inter-annotator agreement statistics are essential to substantiate the reliability of the expert annotations. The original manuscript omitted these details, as the annotations were conducted by trained linguists using standardized guidelines. In the revised version, we will add a dedicated subsection to §3 describing the annotation protocol and report agreement metrics (e.g., Cohen's kappa for category assignment and span overlap F1) computed on a double-annotated subset of sentences. This addition will directly support the reproducibility claims. revision: yes
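The agreement metrics promised here could be computed roughly as follows; this is a generic sketch (per-item Cohen's kappa for category assignment, exact-match F1 over span tuples), not the protocol the revision will necessarily use.

```python
from collections import Counter

def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators' category labels on the same items."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n                     # observed
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / n**2  # chance
    return 1.0 if pe == 1.0 else (po - pe) / (1 - pe)

def span_f1(pred: set, gold: set) -> float:
    """Exact-match F1 over (start, end, label) span tuples from two annotators."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```

In practice span agreement is often relaxed to partial-overlap matching; the exact-match version above is the strictest (and simplest) variant.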
-
Referee: [§4 and §5] §4 (Results) and §5 (Discussion): No head-to-head comparison of error distributions, severity scores, or actionable insights between LQM and MQM annotations on the same 3,495 sentences is provided, leaving the asserted practical superiority of the six-level taxonomy unsupported by direct evidence.
Authors: A full quantitative head-to-head comparison would require re-annotating the corpus with MQM, which exceeds the scope and resources of the current study. To address the concern, we will revise §5 to include a qualitative comparison using concrete examples from the Arabic data, illustrating error categories (particularly sociolinguistic and pragmatic) that LQM diagnoses more explicitly than MQM. We will also add an explicit limitations paragraph noting the absence of quantitative comparison and outlining it as future work. revision: partial
-
Referee: [§2] §2 (LQM Taxonomy, Figure 1): The claim that the six levels comprehensively capture all relevant dialect- and culture-specific errors in diglossic languages is stated axiomatically without coverage analysis, examples of uncovered error types, or validation against an independent error inventory.
Authors: The taxonomy was constructed from linguistic theory and refined iteratively against observed errors in the multi-dialect corpus. We will expand §2 in the revision to include: (i) a description of the iterative development process, (ii) concrete examples of dialect- and culture-specific errors mapped to each level, and (iii) a brief mapping to established MT error inventories (e.g., MQM and others) to demonstrate coverage. Any gaps identified during annotation will be explicitly noted as potential areas for extension. revision: yes
Circularity Check
No significant circularity; LQM is a novel taxonomy applied to independent data
Full rationale
The paper introduces LQM as a new hierarchical error taxonomy defined by six linguistically grounded levels and applies it to a newly constructed bidirectional parallel corpus of 3,850 sentences across seven Arabic dialects. It reports expert span-level annotations producing 6,113 labeled error spans and severity-weighted scores, supplemented by spBLEU. The provided text contains no equations, no fitted parameters presented as predictions, no load-bearing self-citations, and no claims that reduce to their inputs by construction. The framework is presented as independently defined and language-agnostic, then applied to external data, making the analysis self-contained.
Axiom & Free-Parameter Ledger
free parameters (2)
- Number of Arabic dialects covered
- Sentences per variety
axioms (2)
- domain assumption: Expert human annotators can consistently and accurately apply the six-level LQM taxonomy to identify MT errors.
- ad hoc to paper: The six levels (sociolinguistics, pragmatics, semantics, morphosyntax, orthography, graphetics) are sufficient to diagnose all relevant dialect- and culture-specific errors.
invented entities (1)
- LQM six-level hierarchical error taxonomy (no independent evidence)