Reference-Free Reinforcement Learning Fine-Tuning for MT: A Seq2Seq Perspective

Carlos Escolano; Ernesto Garcia-Estrada; José A. R. Fonallosa

arXiv: 2605.15976 · v1 · pith:RCQJTJEO · submitted 2026-05-15 · 💻 cs.CL · cs.AI


Pith reviewed 2026-05-20 17:53 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords: machine translation · reinforcement learning · reference-free · GRPO · NLLB-200 · Seq2Seq · low-resource MT · policy optimization

The pith

Reinforcement learning with reference-free rewards improves Seq2Seq machine translation across 13 languages without using parallel data during fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that Group Relative Policy Optimization (GRPO) can fine-tune encoder-decoder models such as NLLB-200 for machine translation. It uses a hybrid reward combining LaBSE and COMET-Kiwi scores, so no parallel data or reference translations are needed during fine-tuning. Quality improves on every one of the 13 languages examined, with the largest gain, +5.03 chrF++, on Traditional Chinese. The improvements are most notable on morphologically complex languages, where the method competes with three epochs of supervised fine-tuning despite using no target-language data. This matters because it offers a practical route to better translation systems in settings where parallel data is hard to obtain.

Core claim

We apply Group Relative Policy Optimization to NLLB-200 (600M and 1.3B) using a hybrid reference-free reward (LaBSE and COMET-Kiwi) that requires no parallel data at fine-tuning time, evaluating across 13 typologically diverse languages. GRPO yields consistent improvements on all 13 languages, up to +5.03 chrF++ for Traditional Chinese, and, without any target-language data, competes with 3-epoch supervised fine-tuning on morphologically complex languages. We identify a consistent empirical pattern in which gains are largest where baseline performance is weakest and reward discriminability is highest, making this approach most effective precisely where parallel data is scarcest, and replicate this pattern across English and Spanish source languages.

What carries the argument

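Group Relative Policy Optimization (GRPO) driven by a hybrid LaBSE and COMET-Kiwi reward signal. For each source sentence, the model samples a group of candidate translations and scores them with the reward. It then shifts probability toward candidates that score above the group average and away from those below it. This lets the encoder-decoder policy be updated without reference translations.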
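
A minimal sketch of the group-relative step follows. It assumes rewards have already been computed for a group of sampled translations of one source sentence. It follows the outcome-reward form of GRPO: the advantage is the reward minus the group mean, divided by the group standard deviation. It pairs this with a simplified sequence-level clipped surrogate. Several choices here are illustrative assumptions, not the paper's settings:
- the function names;
- the use of the population standard deviation;
- `eps` and `clip_eps=0.2`.

The sketch also omits the KL penalty and per-token averaging used in practice.

```python
import math
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of candidates for the same source.

    Candidates scoring above the group mean get positive advantages and those
    below it get negative ones; no reference translation or value model is used.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(logp_new: float, logp_old: float, advantage: float,
                      clip_eps: float = 0.2) -> float:
    """Simplified sequence-level PPO-style clipped objective (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical hybrid-reward scores for four sampled translations.
    rewards = [0.2, 0.4, 0.6, 0.8]
    print([round(a, 3) for a in group_relative_advantages(rewards)])
```

Because advantages are computed relative to the group, a uniformly good or uniformly bad group produces near-zero updates. This is one reason reward discriminability matters for this method.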

If this is right

Consistent quality improvements occur on all 13 tested languages.
Gains are largest on languages with the weakest baseline performance.
The method competes with supervised fine-tuning for morphologically complex languages without target data.
The same pattern appears when translating from English and from Spanish.
Both the 600M and 1.3B parameter NLLB models benefit from the approach.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Reference-free RL could be tried on other encoder-decoder architectures for translation or related generation tasks.
It suggests RL fine-tuning is viable for mid-sized Seq2Seq models rather than only very large decoder-only systems.
Developers working on low-resource languages might adopt this to improve systems where collecting parallel data is costly.
Testing the approach with even smaller models or additional language pairs would clarify its scalability.

Load-bearing premise

The hybrid reference-free reward from LaBSE and COMET-Kiwi provides an accurate and unbiased signal of translation quality to guide effective policy optimization across typologically diverse languages.
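
To make the premise concrete, here is a minimal sketch of one way such a hybrid reward could be formed. It assumes two inputs have already been computed: LaBSE sentence embeddings of the source and the candidate, and a COMET-Kiwi segment score. The linear mix and the `w_labse=0.5` default are hypothetical; the page does not state how the two signals are combined or normalized.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hybrid_reward(src_emb: Sequence[float], hyp_emb: Sequence[float],
                  kiwi_score: float, w_labse: float = 0.5) -> float:
    """Hypothetical reward: weighted sum of LaBSE source-candidate similarity
    (an adequacy proxy) and a COMET-Kiwi quality-estimation score.

    Neither term uses a reference translation.
    """
    if not 0.0 <= w_labse <= 1.0:
        raise ValueError("w_labse must be in [0, 1]")
    labse_term = cosine_similarity(src_emb, hyp_emb)
    return w_labse * labse_term + (1.0 - w_labse) * kiwi_score
```

Both terms are learned metrics. Any bias they share, for example toward particular scripts or morphologies, therefore passes directly into the reward that the policy optimizes.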

What would settle it

Running human evaluations or an independent quality metric on the outputs of the GRPO-tuned models to check whether the reported chrF++ gains correspond to actual improvements in translation quality.

The original abstract

Production machine translation relies overwhelmingly on encoder-decoder Seq2Seq models, yet reinforcement learning approaches to MT fine-tuning have largely targeted decoder-only LLMs at ≥7B parameters, with limited systematic study of encoder-decoder architectures. We apply Group Relative Policy Optimization to NLLB-200 (600M and 1.3B) using a hybrid reference-free reward (LaBSE and COMET-Kiwi) that requires no parallel data at fine-tuning time, evaluating across 13 typologically diverse languages. GRPO yields consistent improvements on all 13 languages, up to +5.03 chrF++ for Traditional Chinese, and, without any target-language data, competes with 3-epoch supervised fine-tuning on morphologically complex languages. We identify a consistent empirical pattern in which gains are largest where baseline performance is weakest and reward discriminability is highest, making this approach most effective precisely where parallel data is scarcest, and replicate this pattern across English and Spanish source languages.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

GRPO on NLLB with a LaBSE+COMET-Kiwi reward produces consistent chrF++ gains on 13 languages without parallel data, but the claims need checks on reward bias and run variance.

The main thing to know is that this paper applies Group Relative Policy Optimization to NLLB-200 encoder-decoder models using a hybrid reference-free reward from LaBSE and COMET-Kiwi. It reports consistent improvements across 13 typologically diverse languages, with the largest gains where the baseline was weakest. In some cases it matches three epochs of supervised fine-tuning without any target-language data at fine-tuning time. The authors test both the 600M and 1.3B versions and replicate the pattern for English and Spanish sources.

What is actually new is the systematic extension of this RL method to standard Seq2Seq MT architectures, rather than to the decoder-only LLMs that have received most of the recent attention. The work does a solid job of laying out the empirical pattern that gains track baseline weakness and reward discriminability, which matches the practical need in low-resource settings. The authors deserve credit for running the experiments on a useful range of languages and for keeping the focus on production-relevant models.

The soft spots concern the strength of the supporting evidence. The headline numbers are specific, yet the abstract-level description omits run-to-run variance, statistical tests, and exact data splits, so it is difficult to judge stability. The hybrid reward is the load-bearing piece, and known language-pair and morphological biases in LaBSE and COMET-Kiwi could be contributing to the observed pattern. Without per-language reward-human correlations or component ablations, the improvements might partly reflect reward artifacts rather than genuine quality gains.

This paper is for researchers working on low-resource machine translation and on RL fine-tuning outside the LLM setting. A reader who needs practical ways to improve performance when parallel data is scarce will find the setup and the reported pattern useful.

It has a distinct enough angle and empirical scope to deserve a serious referee who examines the methods, the reward validation, and the result details. I would recommend sending it out for peer review rather than desk-rejecting it.

Referee Report

3 major / 1 minor

Summary. The paper applies Group Relative Policy Optimization (GRPO) to fine-tune NLLB-200 encoder-decoder Seq2Seq models (600M and 1.3B) for machine translation using a hybrid reference-free reward (LaBSE + COMET-Kiwi) that requires no parallel data during fine-tuning. It evaluates across 13 typologically diverse languages from English and Spanish sources, claiming consistent chrF++ gains on all languages (up to +5.03 for Traditional Chinese) that are largest where baselines are weakest, and competitive performance with 3-epoch supervised fine-tuning on morphologically complex languages without target-language data.

Significance. If the empirical results hold under rigorous validation, the work would be significant for low-resource MT by showing that RL fine-tuning with reference-free rewards can improve production-style encoder-decoder models without target data, extending RL techniques beyond decoder-only LLMs. The reported pattern tying gains to baseline weakness and reward discriminability, if reproducible, offers a practical insight for prioritizing such methods where parallel data is scarcest.

major comments (3)

[Abstract] The headline claims of consistent improvements across all 13 languages and competition with supervised fine-tuning rest on reported chrF++ deltas (e.g., +5.03 for Traditional Chinese), yet no statistical significance tests, run-to-run variance, or exact data splits are described, leaving the robustness of these gains unassessable.
[Evaluation] Evaluation and reward design: the central assumption that the hybrid LaBSE + COMET-Kiwi reward supplies an accurate, unbiased optimization signal across typologically diverse and morphologically complex languages is load-bearing for the claim of genuine quality gains rather than reward artifacts; no per-language reward-human correlations, ablation removing one component, or bias analysis is provided despite known language-pair and morphological biases in these metrics.
[Results] The pattern that gains are largest where baseline performance is weakest is presented as an empirical finding, but without explicit tables or controls showing that this is not an artifact of the reward's discriminability correlating with the evaluation metric, the interpretation that the method is 'most effective precisely where parallel data is scarcest' remains under-supported.

minor comments (1)

[Abstract] The abstract and results would benefit from a brief statement of the exact number of languages and source-target pairs, and of whether the 13 languages include both high- and low-resource cases, to clarify the scope.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. We address each of the major comments below and outline the revisions we plan to make to strengthen the manuscript's robustness and clarity.

Referee: [Abstract] The headline claims of consistent improvements across all 13 languages and competition with supervised fine-tuning rest on reported chrF++ deltas (e.g., +5.03 for Traditional Chinese), yet no statistical significance tests, run-to-run variance, or exact data splits are described, leaving the robustness of these gains unassessable.

Authors: We agree that providing statistical significance and variance estimates would enhance the reliability of our reported improvements. In the revised manuscript, we will rerun experiments with multiple random seeds (e.g., 3-5 runs) to report average chrF++ scores with standard deviations. We will also include statistical significance tests, such as paired bootstrap resampling or Wilcoxon signed-rank tests, to assess the significance of the gains over baselines. For data splits, we utilized the publicly available FLORES-200 development and test sets for all languages, which we will explicitly document in the experimental setup section. revision: yes
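
A minimal sketch of the paired bootstrap test proposed above follows. It assumes per-sentence scores for the baseline and the GRPO system on the same test sentences, for example sentence-level chrF++. Averaging sentence scores is a simplification: corpus-level chrF++ is not the mean of sentence scores, and tools such as sacreBLEU implement paired bootstrap resampling over corpus statistics instead. The function name and defaults are illustrative.

```python
import random
import statistics
from typing import Sequence


def paired_bootstrap_pvalue(baseline: Sequence[float], system: Sequence[float],
                            n_resamples: int = 1000, seed: int = 0) -> float:
    """Fraction of paired resamples in which `system` fails to beat `baseline`.

    Both lists must score the same sentences in the same order; each resample
    draws sentence indices with replacement and keeps the pairing intact.
    """
    n = len(baseline)
    if n == 0 or len(system) != n:
        raise ValueError("need two non-empty, equal-length score lists")
    rng = random.Random(seed)
    not_better = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        delta = statistics.fmean([system[i] - baseline[i] for i in idx])
        if delta <= 0.0:
            not_better += 1
    return not_better / n_resamples
```

The promised checks could be covered by two steps:
- running this test once per language and per seed;
- reporting the mean and standard deviation of the chrF++ delta across seeds.
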
Referee: [Evaluation] Evaluation and reward design: the central assumption that the hybrid LaBSE + COMET-Kiwi reward supplies an accurate, unbiased optimization signal across typologically diverse and morphologically complex languages is load-bearing for the claim of genuine quality gains rather than reward artifacts; no per-language reward-human correlations, ablation removing one component, or bias analysis is provided despite known language-pair and morphological biases in these metrics.

Authors: This is a valid concern regarding the potential for reward hacking or metric biases. We will incorporate an ablation study in the revised paper that evaluates the contribution of each reward component (LaBSE alone, COMET-Kiwi alone, and the hybrid) across the language pairs. Additionally, we will add a discussion of known biases in LaBSE and COMET-Kiwi, particularly for morphologically complex languages, and how our hybrid approach aims to mitigate them. However, performing new per-language human correlation studies would require substantial additional resources and human annotations not available in the current experimental setup; we will acknowledge this as a limitation and suggest it for future work. revision: partial
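
The planned component ablation can be expressed as three weightings of the same two scores. The sketch below reports which sampled candidate each variant would rank highest; its names are hypothetical, and the equal weighting for the hybrid is an assumption. Groups where the variants disagree are where a bias in a single component is most likely to steer the update.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical ablation grid: (weight on LaBSE similarity, weight on COMET-Kiwi).
REWARD_VARIANTS: Dict[str, Tuple[float, float]] = {
    "labse_only": (1.0, 0.0),
    "kiwi_only": (0.0, 1.0),
    "hybrid": (0.5, 0.5),  # equal weights are an assumption, not the paper's setting
}


def variant_scores(labse: Sequence[float], kiwi: Sequence[float],
                   weights: Tuple[float, float]) -> List[float]:
    """Score every candidate in a group under one reward variant."""
    w_labse, w_kiwi = weights
    return [w_labse * s + w_kiwi * q for s, q in zip(labse, kiwi)]


def top_candidate_per_variant(labse: Sequence[float],
                              kiwi: Sequence[float]) -> Dict[str, int]:
    """Index of the highest-reward candidate under each ablation variant."""
    if len(labse) != len(kiwi) or len(labse) == 0:
        raise ValueError("need equal-length, non-empty score lists")
    best: Dict[str, int] = {}
    for name, weights in REWARD_VARIANTS.items():
        scores = variant_scores(labse, kiwi, weights)
        best[name] = max(range(len(scores)), key=scores.__getitem__)
    return best


if __name__ == "__main__":
    # Illustrative scores for three sampled candidates (not results from the paper).
    print(top_candidate_per_variant([0.95, 0.80, 0.70], [0.40, 0.75, 0.60]))
```
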
Referee: [Results] The pattern that gains are largest where baseline performance is weakest is presented as an empirical finding, but without explicit tables or controls showing that this is not an artifact of the reward's discriminability correlating with the evaluation metric, the interpretation that the method is 'most effective precisely where parallel data is scarcest' remains under-supported.

Authors: To better support this interpretation, we will add new figures and tables in the results section of the revised manuscript. Specifically, we will compute and report the discriminability of the reward (measured as the standard deviation of reward scores on sampled outputs or the margin between high and low reward translations) for each language and correlate it with both baseline performance and observed gains. We will also include a control analysis showing the correlation between the reward model and chrF++ on held-out data to demonstrate that the pattern is not merely an artifact. This will provide stronger evidence for the claim. revision: yes
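
A minimal sketch of the proposed analysis follows. It assumes two inputs are available: rewards for a group of sampled outputs per source sentence, and per-language baseline and gain figures. It implements both discriminability measures named above (mean within-group standard deviation, and mean max-minus-min margin) and a plain Pearson correlation. The language labels and numbers in the usage block are placeholders, not results from the paper. With only 13 languages such correlations are low-powered, so a rank correlation with a confidence interval would be a natural companion.

```python
import math
import statistics
from typing import Dict, List, Sequence


def discriminability_std(groups: Sequence[Sequence[float]]) -> float:
    """Mean within-group standard deviation of reward scores."""
    return statistics.fmean([statistics.pstdev(g) for g in groups])


def discriminability_margin(groups: Sequence[Sequence[float]]) -> float:
    """Mean gap between the highest- and lowest-reward candidate per group."""
    return statistics.fmean([max(g) - min(g) for g in groups])


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder per-language data (hypothetical, for illustration only).
    reward_groups: Dict[str, List[List[float]]] = {
        "lang_a": [[0.2, 0.7, 0.9], [0.3, 0.8, 0.4]],
        "lang_b": [[0.6, 0.65, 0.7], [0.55, 0.6, 0.62]],
        "lang_c": [[0.4, 0.6, 0.8], [0.5, 0.7, 0.6]],
    }
    baseline_chrf = {"lang_a": 30.0, "lang_b": 55.0, "lang_c": 42.0}
    gain_chrf = {"lang_a": 4.0, "lang_b": 0.5, "lang_c": 2.0}

    langs = sorted(reward_groups)
    disc = [discriminability_std(reward_groups[lang]) for lang in langs]
    gains = [gain_chrf[lang] for lang in langs]
    print("discriminability vs gain:", pearson(disc, gains))
    print("baseline vs gain:", pearson([baseline_chrf[lang] for lang in langs], gains))
```
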

Circularity Check

0 steps flagged

Empirical evaluation on held-out test sets with no reduction of gains to fitted parameters

full rationale

The paper reports empirical improvements from applying GRPO to NLLB-200 models using a hybrid LaBSE+COMET-Kiwi reward, measured via chrF++ on standard held-out test sets across 13 languages. No equations or derivation steps are presented that reduce the observed gains to quantities defined by parameters fitted within the paper itself. The central claims rest on independent test-set comparisons rather than any self-definitional or fitted-input construction, rendering the results self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach depends on the pre-trained quality of LaBSE and COMET-Kiwi as reward models and on the assumption that RL optimization will improve downstream chrF++ without introducing new biases; no free parameters are explicitly listed in the abstract.

axioms (1)

Domain assumption: LaBSE and COMET-Kiwi together form a reliable reference-free proxy for human-judged translation quality across typologically diverse languages.
This assumption is required for the reward signal to guide useful policy updates.

pith-pipeline@v0.9.0 · 5714 in / 1235 out tokens · 78645 ms · 2026-05-20T17:53:50.554706+00:00 · methodology
