pith. machine review for the scientific record.

arxiv: 2605.12933 · v1 · submitted 2026-05-13 · 💻 cs.CL

Recognition: unknown

ATD-Trans: A Geographically Grounded Japanese-English Travelogue Translation Dataset

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 20:25 UTC · model grok-4.3

classification 💻 cs.CL
keywords: machine translation · travelogue dataset · geographic entities · Japanese-English · entity-level evaluation · domestic vs overseas · geo-text

The pith

A new travelogue dataset shows machine translation is harder for domestic Japanese locations than overseas ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ATD-Trans, a Japanese-English translation dataset built from travel blogs and annotated with geographic entity spans. It splits evaluation into overall sentence quality and specific entity accuracy while separating places inside Japan from those abroad. Experiments compare several language models and find that versions with Japanese language enhancements produce stronger results. The same tests also show domestic entities are translated less accurately than overseas ones across all models. This matters because geographic text appears in tourism, mapping, and local information systems where place-name errors reduce usefulness for speakers of other languages.
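The split evaluation described above can be sketched in a few lines. The record schema below (fields `region` and `entity_correct`, and the example entities) is invented for illustration, not the paper's actual data format:

```python
from collections import defaultdict

# Hypothetical annotated examples: each translated geo-entity carries a
# region label ("domestic" vs "overseas") and a correctness judgment.
entities = [
    {"surface": "道頓堀", "region": "domestic", "entity_correct": False},
    {"surface": "嵐山",   "region": "domestic", "entity_correct": True},
    {"surface": "Paris",  "region": "overseas", "entity_correct": True},
    {"surface": "Hanoi",  "region": "overseas", "entity_correct": True},
]

def accuracy_by_region(entities):
    """Entity-level translation accuracy, split by region."""
    hits, totals = defaultdict(int), defaultdict(int)
    for e in entities:
        totals[e["region"]] += 1
        hits[e["region"]] += e["entity_correct"]
    return {r: hits[r] / totals[r] for r in totals}

print(accuracy_by_region(entities))
# → {'domestic': 0.5, 'overseas': 1.0}
```

A domestic score below the overseas one, as in this toy run, is the shape of the paper's headline finding; the real evaluation aggregates over the full annotated corpus and pairs this with a sentence-level quality metric.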

Core claim

The ATD-Trans dataset supplies sentence-aligned Japanese-English travelogue pairs together with geographic-entity annotations that distinguish domestic from overseas locations, enabling separate measurement of translation performance on geo-rich text and demonstrating that Japanese-enhanced models hold an advantage while domestic-region entities remain harder to translate correctly.

What carries the argument

The ATD-Trans dataset of travelogue sentence pairs with geographic entity annotations split by domestic versus overseas region.

Load-bearing premise

The geographic entity annotations are accurate and the selected travelogues represent typical geo-text encountered in real applications.

What would settle it

Independent re-annotation of the same travelogues that changes a substantial fraction of entity labels, or fresh experiments on a new collection of travelogues that find no accuracy gap between domestic and overseas entities.

Figures

Figures reproduced from arXiv: 2605.12933 by Atsushi Fujita, Hiroki Ouchi, Masao Utiyama, Shohei Higashiyama.

Figure 1. A fragment of Japanese–English parallel text.
Original abstract

Geographic text, or textual data rich in geographic (geo-) information is a valuable source for various geographic applications, e.g., tourism management. Making such information accessible to speakers of other languages further enhances its utility; thus, accurate machine translation (MT) is essential for equity in multilingual geo-information access. To facilitate in-depth analysis for geographic text, we introduce ATD-Trans, a geographically grounded Japanese--English travelogue translation dataset, which enables evaluation of MT quality at both the overall and geo-entity levels across domestic (within Japan) and overseas regions. Our experiments on existing language models examine two factors: model language focus and geographic regions. The results highlight advantages of Japanese-enhanced models and greater difficulty in translating domestic-region geo-entities mentioned in travel blogs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper introduces ATD-Trans, a new Japanese-English parallel dataset of travelogues with geographic entity annotations that distinguishes domestic (Japan) and overseas locations. It reports experiments evaluating existing MT models at both sentence and geo-entity levels, claiming advantages for Japanese-enhanced models and greater translation difficulty for domestic-region entities.

Significance. If the entity annotations are shown to be reliable, the dataset supplies a much-needed resource for geo-specific MT evaluation in Japanese, supporting applications in tourism and multilingual geo-information access. The entity-level breakdown offers a finer-grained view of translation challenges than standard corpora provide, and the domestic/overseas split enables targeted analysis that could inform model development for location-rich text.

major comments (3)
  1. [§3] §3 (Dataset Construction): the geo-entity annotation process is described at a high level but reports neither inter-annotator agreement scores nor any external validation or held-out check set; this directly undermines the reliability of the per-entity difficulty comparison between domestic and overseas regions presented in §5.
  2. [§5.2] §5.2 and Table 3: the headline result that domestic geo-entities are harder to translate is computed from entity-level metrics that presuppose correct identification and classification of every geo-entity; without quantitative annotation reliability data, systematic boundary or labeling errors could inflate or deflate the reported domestic-overseas gap.
  3. [§4.1] §4.1: the selection criteria and sampling procedure for the travelogues are not detailed enough to confirm that the corpus represents typical geo-text encountered in real applications, which is required to generalize the model-comparison findings beyond the collected sample.
minor comments (3)
  1. [Abstract] The abstract and §2 should explicitly state the primary automatic metric (e.g., BLEU, COMET) used for both sentence-level and entity-level evaluation.
  2. [Figure 1] Figure 1 and the accompanying text would benefit from clearer indication of how compound place names are segmented during annotation.
  3. [§2] A few citations to prior geo-entity recognition or MT work on Japanese are missing or could be expanded for context.
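Minor comment 1 asks which automatic metric underlies the scores; BLEU is the conventional candidate the referee names. As a reference point, its core computation (clipped n-gram precisions with a brevity penalty) can be sketched minimally; this is a bare-bones illustration, not the paper's evaluation code, and it omits the smoothing that real toolkits such as sacreBLEU apply at the sentence level:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4):
    """Minimal BLEU: geometric mean of clipped n-gram precisions times a
    brevity penalty. No smoothing, so a zero precision at any order sends
    the score to 0 — one reason toolkits smooth sentence-level scores."""
    precisions = []
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((h & r).values())      # clipped n-gram matches
        precisions.append(overlap / max(sum(h.values()), 1))
    if min(precisions) == 0:
        return 0.0
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))  # penalize short output
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

hyp = "I visited Dotonbori in Osaka".split()
ref = "I visited Dotonbori in Osaka".split()
print(sentence_bleu(hyp, ref))  # → 1.0 for an exact match
```

Corpus-level BLEU aggregates the n-gram counts before taking the geometric mean; a neural metric such as COMET would instead score each hypothesis with a learned model.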

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the strengths and limitations of our ATD-Trans dataset and experiments. We address each major comment below and will incorporate revisions to improve the manuscript's transparency on annotation quality and data sampling.

read point-by-point responses
  1. Referee: [§3] §3 (Dataset Construction): the geo-entity annotation process is described at a high level but reports neither inter-annotator agreement scores nor any external validation or held-out check set; this directly undermines the reliability of the per-entity difficulty comparison between domestic and overseas regions presented in §5.

    Authors: We agree that quantitative reliability measures were omitted from the original submission. In the revised manuscript we will add inter-annotator agreement scores computed on a double-annotated subset, together with a description of the annotation guidelines, annotator expertise, and any post-annotation validation steps performed. These additions will directly support the credibility of the domestic/overseas entity comparisons in §5. revision: yes

  2. Referee: [§5.2] §5.2 and Table 3: the headline result that domestic geo-entities are harder to translate is computed from entity-level metrics that presuppose correct identification and classification of every geo-entity; without quantitative annotation reliability data, systematic boundary or labeling errors could inflate or deflate the reported domestic-overseas gap.

    Authors: This observation is valid and stems from the same gap identified in §3. By reporting inter-annotator agreement and validation details in the revised §3, we will demonstrate that annotation errors are unlikely to systematically bias the entity-level metrics or the domestic-overseas gap shown in §5.2 and Table 3. We will also add a brief limitations paragraph discussing residual annotation uncertainty. revision: yes

  3. Referee: [§4.1] §4.1: the selection criteria and sampling procedure for the travelogues are not detailed enough to confirm that the corpus represents typical geo-text encountered in real applications, which is required to generalize the model-comparison findings beyond the collected sample.

    Authors: We accept that the original description of data collection was insufficient for assessing representativeness. In the revision we will expand §4.1 to specify the travel-blog platforms used, the exact selection criteria (minimum length, presence of geo-entities, Japanese/English bilingual content), and the sampling procedure (random vs. stratified). These details will allow readers to evaluate how well the corpus reflects typical tourism-related geographic text. revision: yes
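The agreement figure the authors promise in response 1 is conventionally Cohen's kappa: observed annotator agreement corrected for the agreement expected by chance. A minimal sketch over a double-annotated subset, with invented labels, looks like this:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal label distribution.
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented double annotation of entity regions on a shared subset.
a = ["domestic", "domestic", "overseas", "domestic", "overseas"]
b = ["domestic", "overseas", "overseas", "domestic", "overseas"]
print(round(cohens_kappa(a, b), 3))  # → 0.615
```

For span annotations, agreement is often reported instead as pairwise span-level F1, since the set of marked spans differs between annotators; either statistic would address the referee's reliability concern.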

Circularity Check

0 steps flagged

No circularity: dataset creation and empirical evaluation on existing models

Full rationale

The paper introduces the ATD-Trans dataset via manual annotation of travelogues for geo-entities and evaluates pre-existing language models on translation quality at sentence and entity levels. No derivation chain, equations, fitted parameters, or first-principles predictions are claimed. All reported results are direct measurements on the newly created data and off-the-shelf models; nothing reduces to its own inputs by construction. Self-citations, if present, are not load-bearing for any central claim. This supports a circularity score of 1.0: no mathematical reduction is present.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central contribution rests on the assumption that travelogues contain clearly identifiable geographic entities suitable for consistent annotation and that the collected texts are representative of real-world geo-text usage.

axioms (1)
  • domain assumption Geographic entities in travelogues can be reliably identified and annotated across languages.
    The dataset construction and geo-entity level evaluation depend on this annotation quality.

pith-pipeline@v0.9.0 · 5437 in / 1096 out tokens · 43920 ms · 2026-05-14T20:25:37.011967+00:00 · methodology

discussion (0)

