Achieving Human Parity on Automatic Chinese to English News Translation

Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, et al · 2018 · cs.CL · arXiv 1803.05567

10 Pith papers cite this work. Polarity classification is still indexing.

abstract

Machine translation has made rapid advances in recent years. Millions of people are using it today in online translation systems and mobile applications in order to communicate across language barriers. The question naturally arises whether such systems can approach or achieve parity with human translations. In this paper, we first address the problem of how to define and accurately measure human parity in translation. We then describe Microsoft's machine translation system and measure the quality of its translations on the widely used WMT 2017 news translation task from Chinese to English. We find that our latest neural machine translation system has reached a new state-of-the-art, and that the translation quality is at human parity when compared to professional human translations. We also find that it significantly exceeds the quality of crowd-sourced non-professional translations.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

CodeBLEU: a Method for Automatic Evaluation of Code Synthesis

cs.SE · 2020-09-22 · conditional · novelty 7.0

CodeBLEU improves correlation with human programmer scores on code synthesis tasks by adding syntactic AST matching and semantic data-flow matching to the standard BLEU n-gram approach.

PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents

cs.CL · 2020-11-04 · unverdicted · novelty 6.0

PheMT is a phenomenon-wise dataset created to evaluate NMT robustness against linguistic phenomena in Japanese-English UGC translation, with experiments showing major performance drops on certain phenomena.

Forward-Backward Decoding for Regularizing End-to-End TTS

eess.AS · 2019-07-18 · unverdicted · novelty 6.0

Forward-backward decoding with divergence regularization and bidirectional decoder improves end-to-end TTS robustness and naturalness by addressing exposure bias via joint L2R/R2L training.

Findings of the First Shared Task on Machine Translation Robustness

cs.CL · 2019-06-27 · unverdicted · novelty 6.0

The first shared task on MT robustness received 23 submissions showing up to +22.33 BLEU gains on noisy Reddit data, with strong human-BLEU correlation.

Translationese in Machine Translation Evaluation

cs.CL · 2019-06-24 · unverdicted · novelty 6.0

Translationese in MT test sets biases evaluations, supporting exclusion of reverse-created data, re-evaluation of human-parity claims, and power analysis for reliable significance testing.

Metaphors in Literary Post-Editing: Opening Pandora's Box?

cs.CL · 2026-05-20 · unverdicted · novelty 5.0

Post-editors changed one in three metaphors in NMT and LLM outputs for literary texts, rated quality poor, and found post-editing more laborious than original translation.

Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges

cs.CL · 2019-07-11 · unverdicted · novelty 5.0

A single multilingual NMT model for 103 languages trained on 25B examples demonstrates transfer learning benefits for low-resource languages.

An Explainable Approach to Document-level Translation Evaluation with Topic Modeling

cs.CE · 2026-04-22 · unverdicted · novelty 5.0

A topic-modeling framework measures document-level thematic consistency in translations by aligning key tokens across languages with a bilingual dictionary and scoring via cosine similarity, providing explainable insights beyond sentence-level metrics.

The Almost Intelligent Revolution: Options for Scaling Up Deliberation and Empowering People with AI

cs.CL · 2026-06-18 · unverdicted · novelty 4.0

Explores options for using LLMs to scale deliberation and empower marginalized groups via systemic-functional linguistics concepts while cautioning against over- and under-claiming.

Survey on reinforcement learning for language processing

cs.CL · 2021-04-12 · unverdicted · novelty 2.0

This survey reviews reinforcement learning applications to natural language processing problems, especially conversational systems, including problem descriptions, suitability of RL, advantages, limitations, and promising directions.

citing papers explorer

Showing 10 of 10 citing papers.

CodeBLEU: a Method for Automatic Evaluation of Code Synthesis

cs.SE · 2020-09-22 · conditional · none · ref 2 · internal anchor

PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents

cs.CL · 2020-11-04 · unverdicted · none · ref 6 · internal anchor

Forward-Backward Decoding for Regularizing End-to-End TTS

eess.AS · 2019-07-18 · unverdicted · none · ref 17 · internal anchor

Findings of the First Shared Task on Machine Translation Robustness

cs.CL · 2019-06-27 · unverdicted · none · ref 13 · internal anchor

Translationese in Machine Translation Evaluation

cs.CL · 2019-06-24 · unverdicted · none · ref 15 · internal anchor

Metaphors in Literary Post-Editing: Opening Pandora's Box?

cs.CL · 2026-05-20 · unverdicted · none · ref 63 · internal anchor

Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges

cs.CL · 2019-07-11 · unverdicted · none · ref 8 · internal anchor

An Explainable Approach to Document-level Translation Evaluation with Topic Modeling

cs.CE · 2026-04-22 · unverdicted · none · ref 11

The Almost Intelligent Revolution: Options for Scaling Up Deliberation and Empowering People with AI

cs.CL · 2026-06-18 · unverdicted · none · ref 28 · internal anchor

Survey on reinforcement learning for language processing

cs.CL · 2021-04-12 · unverdicted · none · ref 48 · internal anchor
