Translating the Untranslatable: An Operationalizable Ontology for Untranslatability

Brihi Joshi; Hirona Arai; Jacob Bremerman; Jonathan May; Xiang Ren

arxiv: 2606.17354 · v1 · pith:FUEWJVFXnew · submitted 2026-06-15 · 💻 cs.CL · cs.AI

Jacob Bremerman, Brihi Joshi, Hirona Arai, Xiang Ren, Jonathan May

Pith reviewed 2026-06-27 02:49 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords untranslatability · machine translation · ontology · compensation strategies · NLP dataset · human evaluation · multilingual

The pith

A structured ontology of untranslatability plus a taxonomy of compensation strategies lets researchers build and test a dataset of cases where direct translation fails.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a framework that breaks down situations in which meaning cannot transfer directly between languages and pairs them with specific techniques translators use to recover that meaning. It turns the framework into a multilingual collection of example sentences and their strategy-labeled translations. Human raters then show consistent differences in perceived quality across the strategies. This setup shifts analysis of machine translation from overall accuracy scores to targeted examination of how systems handle irreducible gaps. If the categories prove workable, future models could be trained or evaluated on explicit strategy selection rather than one-to-one equivalence.

Core claim

We introduce a structured ontology of untranslatability along with a taxonomy of compensation strategies, which are specific techniques to convey meaning under these untranslatable circumstances. We operationalize this framework into a multilingual dataset of untranslatable sentences paired with strategy-based translations, enabling controlled analysis of translation behavior. Initial human preference studies suggest that translation quality depends on the strategy used, with consistent preferences for outputs that include explanatory context, known as the Annotation compensation strategy.

What carries the argument

Ontology of untranslatability paired with taxonomy of compensation strategies, turned into a dataset of paired sentences for controlled comparison.
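As a rough illustration of what "a dataset of paired sentences" could look like in code, here is a minimal sketch of one record. The class name, field names, and placeholder strings are assumptions made for this example and are not taken from the paper; only the strategy names Annotation, Adaptation, and Borrowing appear in the source.

```python
from dataclasses import dataclass, field


@dataclass
class UntranslatableExample:
    """One hypothetical dataset row: a source sentence plus one translation per strategy."""
    source_lang: str          # language code of the source sentence, e.g. "es" or "ja"
    source_sentence: str      # the untranslatable source sentence
    utype: str                # untranslatability category label (placeholder value below)
    translations: dict[str, str] = field(default_factory=dict)  # strategy name -> English output

    def strategies(self) -> list[str]:
        """Return the strategy names available for this sentence, sorted alphabetically."""
        return sorted(self.translations)


# All strings below are placeholders, not real dataset content.
example = UntranslatableExample(
    source_lang="es",
    source_sentence="<source sentence>",
    utype="<uType label>",
    translations={
        "Annotation": "<Annotation-style translation>",
        "Adaptation": "<Adaptation-style translation>",
        "Borrowing": "<Borrowing-style translation>",
    },
)
print(example.strategies())
```

Keying the translations by strategy name is one way to keep the source sentence fixed while varying only the strategy, which is the kind of controlled comparison the framework is meant to support.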

If this is right

Machine translation evaluation can move from aggregate BLEU or COMET scores to per-strategy performance on untranslatable inputs.
Systems could be fine-tuned to detect untranslatability type and then select an appropriate compensation method.
The dataset supplies training signals for models that output not only a translation but also the strategy it employed.
Cross-lingual consistency of human preferences can be measured by applying the same taxonomy to new language pairs.
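The shift from aggregate to per-strategy evaluation described in the points above can be sketched as a simple group-by. The function name and the scores are invented for illustration; the numbers are not results from the paper, and any real metric (human rating, COMET, and so on) could stand in for `score`.

```python
from collections import defaultdict
from statistics import mean


def per_strategy_means(rows):
    """Group (strategy, score) pairs by strategy and return {strategy: mean score}."""
    by_strategy = defaultdict(list)
    for strategy, score in rows:
        by_strategy[strategy].append(score)
    return {strategy: mean(scores) for strategy, scores in by_strategy.items()}


# Made-up scores on an arbitrary 1-5 scale, purely for illustration.
rows = [
    ("Annotation", 5), ("Annotation", 3),
    ("Adaptation", 4), ("Adaptation", 2),
    ("Borrowing", 2), ("Borrowing", 2),
]

aggregate = mean(score for _, score in rows)   # one number for the whole system
breakdown = per_strategy_means(rows)           # one number per compensation strategy
print(aggregate, breakdown)
```

In this toy data the aggregate sits at the midpoint of the scale while the per-strategy means differ, which is the kind of difference a single aggregate score would hide.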

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same categories could be used to audit existing MT outputs for over-reliance on literal renderings that lose key meaning.
Literary or legal translation workflows might adopt the taxonomy to decide when to add annotation versus other adjustments.
If Annotation remains preferred, MT interfaces could surface explanatory notes automatically rather than forcing a single target sentence.

Load-bearing premise

The chosen ontology and taxonomy categories are assumed to be sufficiently complete and non-overlapping to support controlled analysis and generalizable human preference findings across languages.

What would settle it

A follow-up study in which raters show no reliable preference differences across the listed strategies or in which many real sentences fit multiple taxonomy categories equally well.
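One half of this test, how often sentences fit more than one category, could be quantified with a simple overlap rate. This is a sketch under stated assumptions: the function name is hypothetical, the category labels are placeholders (the paper's actual uType names are not listed on this page), and the input is assumed to be the set of categories annotators judged applicable to each sentence.

```python
def multi_category_rate(labels_per_sentence):
    """Fraction of sentences that were judged to fit more than one category."""
    if not labels_per_sentence:
        raise ValueError("need at least one sentence")
    overlapping = sum(1 for labels in labels_per_sentence if len(set(labels)) > 1)
    return overlapping / len(labels_per_sentence)


# Placeholder labels, not the paper's taxonomy.
judgements = [
    {"type_a"},
    {"type_a", "type_b"},
    {"type_c"},
    {"type_b", "type_c"},
]
rate = multi_category_rate(judgements)
print(rate)
```

A rate near zero would be consistent with non-overlapping categories; what rate counts as "many" is a judgment call that this sketch does not settle.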

Figures

Figures reproduced from arXiv: 2606.17354 by Brihi Joshi, Hirona Arai, Jacob Bremerman, Jonathan May, Xiang Ren.

**Figure 2.** A visualization of the iterative process for generating the untranslatable sentences. Human experts produce…

**Figure 3.** Ontology of uTypes: Refer to Section 3 for more details on definitions and examples.

**Figure 5.** Change in MRR for each compensation strat…

**Figure 6.** Mean Reciprocal Rank for each compensation strategy based on source language. Adaptation is much more preferred by Spanish-speakers than Japanese-speakers.

**Figure 7.** Visualization of the breakdown of the dataset. We generated 18,200 English translations in total.

**Figure 8.** UI for Translation Preference Ranking task shown to bilingual annotators.

**Figure 9.** UI for context-based preference evaluation of translations. Users are shown a paragraph that describes the…

**Figure 10.** Results for General Context Spanish Preference Rankings. Annotation is the most preferred strategy on…

**Figure 11.** Mean Reciprocal Rank for each Compensation Strategy based on Translation Context. We note similar trends with Annotation still winning out for all contexts and Borrowing consistently rated as the worst. (Table fragment extracted alongside the caption: Model / Accuracy; Zero-shot 0.704; With examples 0.761.)
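Several of the captions above report Mean Reciprocal Rank (MRR) per compensation strategy. As a minimal sketch of how such a number can be computed, assume each annotator returns an ordered list of strategies from most to least preferred; a strategy at position k contributes 1/k, and the contributions are averaged over all rankings. The function name and the example rankings are invented here, and the paper's exact definition may differ.

```python
from collections import defaultdict


def mrr_by_strategy(rankings):
    """Mean reciprocal rank per strategy.

    rankings: list of lists, each ordered from most preferred to least preferred.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ranking in rankings:
        for position, strategy in enumerate(ranking, start=1):
            totals[strategy] += 1.0 / position
            counts[strategy] += 1
    return {strategy: totals[strategy] / counts[strategy] for strategy in totals}


# Invented rankings for illustration only; not data from the paper.
rankings = [
    ["Annotation", "Adaptation", "Borrowing"],
    ["Annotation", "Borrowing", "Adaptation"],
    ["Adaptation", "Annotation", "Borrowing"],
]
scores = mrr_by_strategy(rankings)
for strategy, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{strategy}: {value:.3f}")
```

A higher MRR means the strategy tends to be ranked nearer the top; a strategy ranked first by every annotator would score exactly 1.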

Original abstract

Untranslatability, cases where meaning cannot be directly preserved across languages, is well-studied in linguistics but underexplored in NLP. As machine translation (MT) systems improve on standard benchmarks, their limitations increasingly concentrate in such cases, where translation cannot be reduced to one-to-one equivalence. We introduce a structured ontology of untranslatability along with a taxonomy of compensation strategies, which are specific techniques to convey meaning under these untranslatable circumstances. We operationalize this framework into a multilingual dataset of untranslatable sentences paired with strategy-based translations, enabling controlled analysis of translation behavior. Initial human preference studies suggest that translation quality depends on the strategy used, with consistent preferences for outputs that include explanatory context, known as the Annotation compensation strategy. Our framework and dataset provide a foundation for studying and modeling strategy-informed machine translation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper introduces a new ontology, taxonomy, and dataset for untranslatability in machine translation, which is a useful operationalization even if the supporting study details are sparse.

The main thing to know is that the authors have created an ontology of untranslatability and a taxonomy of compensation strategies, then built a multilingual dataset pairing untranslatable sentences with translations using those strategies. This setup supports controlled analysis of how translations handle difficult cases.

It does a solid job of making linguistic ideas about untranslatability practical for NLP work. The dataset is new and the preference studies suggest that including explanatory context is favored by humans.

The soft spots are around validation. The categories are presented without checks for completeness against other linguistic resources or measures of overlap between them. The abstract gives no dataset size, agreement scores, or baseline comparisons, so the claim that quality depends on the strategy is hard to evaluate fully. The preference for annotation could be an artifact of the dataset construction.

This is for MT researchers interested in the cases that current benchmarks miss. A reader in that area would get value from the framework and data.

I would recommend sending it for peer review. The new artifacts make it worth the referees' time, provided the authors can add the missing methodological details.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce a structured ontology of untranslatability along with a taxonomy of compensation strategies. It operationalizes the framework into a multilingual dataset of untranslatable sentences paired with strategy-based translations, enabling controlled analysis of translation behavior. Initial human preference studies are reported to indicate that translation quality depends on the strategy used, with consistent preferences for the Annotation compensation strategy that includes explanatory context. The work positions the ontology, taxonomy, and dataset as a foundation for studying and modeling strategy-informed machine translation.

Significance. If the ontology proves comprehensive and the preference findings generalize, the work could meaningfully advance MT research by shifting focus from standard equivalence benchmarks to structured handling of untranslatable cases. The operationalization into a dataset is a positive step toward falsifiable, controlled experiments in the area.

major comments (2)

The abstract states that the ontology and taxonomy 'enable controlled analysis' and support 'generalizable human preference findings,' yet provides no validation of category completeness or disjointness (e.g., coverage against established linguistic inventories of untranslatability or quantitative overlap metrics). This assumption is load-bearing for the central claim that the framework supports controlled analysis rather than artifactual results from dataset partitioning.
[Abstract] The abstract reports that 'initial human preference studies suggest' a consistent preference for the Annotation strategy but supplies no dataset size, number of annotators, inter-annotator agreement, error bars, statistical tests, or baseline comparisons. These omissions prevent evaluation of whether the preference result is robust enough to support the operationalization claim.

minor comments (1)

[Abstract] The abstract would be clearer if it briefly indicated the number of languages or language families covered by the multilingual dataset.
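The error bars requested in the second major comment could be produced in several ways. One common option is a percentile bootstrap over per-annotation reciprocal ranks, sketched below. This is an assumption about how one might do it, not a description of the paper's analysis; the function name, parameters, and input values are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement, record each resample's mean, then sort the means.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Invented reciprocal ranks for one strategy (1.0 = ranked first, 0.5 = second, ...).
reciprocal_ranks = [1.0, 1.0, 0.5, 1.0, 1 / 3, 0.5, 1.0, 0.5, 1.0, 1 / 3]
lower, upper = bootstrap_ci(reciprocal_ranks)
print(f"MRR = {mean(reciprocal_ranks):.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

If the intervals for two strategies overlap heavily, a preference difference between them would be hard to call robust, which is the concern the referee raises.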

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on the abstract's claims and the need for more rigorous reporting. We address each major comment below and commit to revisions that strengthen the manuscript without overstating current results.

Referee: The abstract states that the ontology and taxonomy 'enable controlled analysis' and support 'generalizable human preference findings,' yet provides no validation of category completeness or disjointness (e.g., coverage against established linguistic inventories of untranslatability or quantitative overlap metrics). This assumption is load-bearing for the central claim that the framework supports controlled analysis rather than artifactual results from dataset partitioning.

Authors: We agree this is a substantive point. The ontology was constructed through a systematic review of linguistic literature on untranslatability phenomena, with categories intended to be mutually exclusive and exhaustive based on that synthesis. However, the manuscript does not include quantitative validation such as overlap metrics or coverage against all established inventories. We will revise the abstract to moderate the language around 'controlled analysis' and 'generalizable findings,' add an explicit limitations subsection discussing potential gaps or overlaps, and include a brief qualitative mapping to key linguistic references to better ground the framework. revision: yes
Referee: [Abstract] The abstract reports that 'initial human preference studies suggest' a consistent preference for the Annotation strategy but supplies no dataset size, number of annotators, inter-annotator agreement, error bars, statistical tests, or baseline comparisons. These omissions prevent evaluation of whether the preference result is robust enough to support the operationalization claim.

Authors: The full manuscript reports the human preference study details (including sentence counts, annotator numbers, agreement metrics, and statistical comparisons) in the experiments section. We acknowledge that the abstract is overly concise and omits these elements, which weakens the summary of the operationalization claim. We will revise the abstract to incorporate key quantitative details such as study scale and agreement levels while remaining within length constraints. revision: yes
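The agreement metrics mentioned in this response are not specified on this page. The reference list does include an entry on Kendall's coefficient of concordance, so as an inference (not a confirmed detail of the paper) here is a minimal sketch of that statistic for m raters who each rank the same n items without ties: W = 12 S / (m^2 (n^3 - n)), where S is the sum of squared deviations of the per-item rank sums from their mean.

```python
def kendalls_w(rank_matrix):
    """Kendall's coefficient of concordance W, assuming complete rankings with no ties.

    rank_matrix[r][i] is the rank (1..n) that rater r assigned to item i.
    """
    m = len(rank_matrix)               # number of raters
    n = len(rank_matrix[0])            # number of ranked items
    rank_sums = [sum(row[i] for row in rank_matrix) for i in range(n)]
    mean_sum = m * (n + 1) / 2         # expected rank sum per item
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Invented rankings of three strategies by three raters; not data from the paper.
full_agreement = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
opposite_raters = [[1, 2, 3], [3, 2, 1]]
print(kendalls_w(full_agreement), kendalls_w(opposite_raters))
```

W ranges from 0 (no agreement among raters) to 1 (identical rankings), so it gives a single number for how consistently annotators order the strategies.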

Circularity Check

0 steps flagged

New ontology and taxonomy presented as explicit construction with no derivation chain or self-citation reduction

The paper introduces its ontology of untranslatability and taxonomy of compensation strategies as a new framework ('We introduce a structured ontology... along with a taxonomy... We operationalize this framework into a multilingual dataset'). No equations, fitted parameters, or predictions are described. No load-bearing self-citations or uniqueness theorems from prior author work are invoked to justify the categories. The construction is self-contained: it does not claim to derive its categories from data fits or from prior results that would make its conclusions hold by construction. This matches the expected non-circular case for a definitional framework paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on the domain assumption that untranslatability can be taxonomized into discrete, actionable categories and that human preference for compensation strategies is a stable signal. No free parameters or invented physical entities are introduced.

axioms (1)

domain assumption: Untranslatability admits a finite, non-overlapping taxonomy of types and compensation strategies that can be operationalized into sentence-level annotations.
Invoked in the construction of the ontology and dataset; no justification or validation against alternative taxonomies is provided in the abstract.

pith-pipeline@v0.9.1-grok · 5679 in / 1253 out tokens · 34536 ms · 2026-06-27T02:49:18.553476+00:00 · methodology

Reference graph

Works this paper leans on

17 extracted references · 6 canonical work pages

[1] Bremerman, Jacob and Ren, Xiang and May, Jonathan. Machine Translation Robustness to Natural Asemantic Variation. 2022. doi:10.18653/v1/2022.emnlp-main.230
[2] Jingjing Cui. Untranslatability and the Method of Compensation. Theory and Practice in Language Studies, Vol. 2, No. 4. 2012
[3] Puchała-Ladzińska, Karolina. (Un)translatability of a text: a blessing in disguise? The case of Spanish, English and Polish. Studia Anglica Resoviensia T. 20. 2023
[4] Cultural untranslatability. Translation Journal.
[5] The problem of untranslatability: challenges and strategies for solving translation difficulties. Humanities science current issues.
[6] Cheng, Yuchang and Fuji, Masaru and Nagase, Tomoki and Uegaki, Minoru and Okada, Isaac. Detecting the Untranslatable Colloquial Expressions of Japanese Verbs in Cross-Language Instant Messaging. Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing. 2014
[7] Ghazvininejad, Marjan and Choi, Yejin and Knight, Kevin. Neural Poetry Translation. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). 2018. doi:10.18653/v1/N18-2011
[8] Fadaee, Marzieh and Bisazza, Arianna and Monz, Christof. Examining the Tip of the Iceberg: A Data Set for Idiom Translation. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). 2018
[9] Sennrich, Rico and Haddow, Barry and Birch, Alexandra. Controlling Politeness in Neural Machine Translation via Side Constraints. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016. doi:10.18653/v1/N16-1005
[10] Zhu, Wenhao and Liu, Hongyi and Dong, Qingxiu and Xu, Jingjing and Huang, Shujian and Kong, Lingpeng and Chen, Jiajun and Li, Lei. Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis. Findings of the Association for Computational Linguistics: NAACL 2024. 2024. doi:10.18653/v1/2024.findings-naacl.176
[11] Abdullin, Yelaman and Molla, Diego and Ofoghi, Bahadorreza and Yearwood, John and Li, Qingyang. Synthetic Dialogue Dataset Generation using LLM Agents. Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM). 2023
[12] GPT-4o System Card. eprint, 2024
[13] Direct Preference Optimization: Your Language Model is Secretly a Reward Model. eprint, 2024
[14] Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model. Annual Meeting of the Association for Computational Linguistics.
[15] Khanuja, Simran and Ramamoorthy, Sathyanarayanan and Song, Yueqi and Neubig, Graham. An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.573
[16] GPT-4 Technical Report. eprint, 2024
[17] Field, Andy P. Kendall's Coefficient of Concordance. doi:10.1002/0470013192.bsa327. https://onlinelibrary.wiley.com/doi/pdf/10.1002/0470013192.bsa327
