On the Properties of Neural Machine Translation: Encoder-Decoder Approaches

Kyunghyun Cho , Bart van Merrienboer , Dzmitry Bahdanau , Yoshua Bengio

classification cs.CL stat.ML

keywords neural, translation, machine, sentence, convolutional, decoder, encoder, gated

Neural machine translation is a relatively new approach to statistical machine translation based purely on neural networks. Neural machine translation models often consist of an encoder and a decoder. The encoder extracts a fixed-length representation from a variable-length input sentence, and the decoder generates a correct translation from this representation. In this paper, we focus on analyzing the properties of neural machine translation using two models: the RNN Encoder–Decoder and a newly proposed gated recursive convolutional neural network. We show that neural machine translation performs relatively well on short sentences without unknown words, but its performance degrades rapidly as the length of the sentence and the number of unknown words increase. Furthermore, we find that the proposed gated recursive convolutional network learns a grammatical structure of a sentence automatically.
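
The encoder's role described in the abstract, mapping a sentence of any length to a vector of fixed size, can be illustrated with a minimal sketch. This is not the paper's model: it assumes a plain tanh recurrent network with random, untrained weights rather than the gated units the paper studies, it omits the decoder, and the sizes and names (`VOCAB`, `EMBED`, `HIDDEN`, `encode`, `make_matrix`) are hypothetical choices made for illustration only.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (untrained)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def encode(token_ids, embed, w_in, w_rec):
    """Run a plain tanh RNN over the tokens and return the final hidden state.

    Assumption: a simple recurrent update stands in for the paper's encoder.
    """
    hidden_size = len(w_rec)
    h = [0.0] * hidden_size
    for tok in token_ids:
        x = embed[tok]
        # New hidden state depends on the current token and the previous state.
        h = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][k] * h[k] for k in range(hidden_size))
            )
            for i in range(hidden_size)
        ]
    return h


rng = random.Random(0)
VOCAB, EMBED, HIDDEN = 20, 4, 6  # hypothetical sizes
embed = make_matrix(VOCAB, EMBED, rng)
w_in = make_matrix(HIDDEN, EMBED, rng)
w_rec = make_matrix(HIDDEN, HIDDEN, rng)

# Two "sentences" of different lengths, given as hypothetical token ids.
short_summary = encode([3, 7, 1], embed, w_in, w_rec)
long_summary = encode([3, 7, 1, 12, 5, 9, 0, 18, 2, 4], embed, w_in, w_rec)
print(len(short_summary), len(long_summary))  # 6 6
```

Both inputs yield a vector of length `HIDDEN`, regardless of sentence length. Since the summary size stays fixed while the input grows, this is one plausible way to picture why long sentences are harder for such a model, which is the behavior the abstract reports.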

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Session-based Recommendations with Recurrent Neural Networks
cs.LG 2015-11 conditional novelty 8.0

RNNs with ranking loss outperform item-to-item baselines for session-based recommendations on two datasets.

TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations
cs.LG 2026-05 unverdicted novelty 7.0

TCD-Arena is a new customizable testing framework that runs millions of experiments to map how 33 different assumption violations affect time series causal discovery methods and shows ensembles can boost overall robustness.

Neural architectures for resolving references in program code
cs.LG 2026-04 unverdicted novelty 6.0

New seq2seq architectures for permutation indexing outperform baselines on synthetic reference-resolution tasks and reduce real decompilation error rates by 42%.

The illusory simplicity of the feedforward pass: evidence for the dynamical nature of stimulus encoding along the primate ventral stream
q-bio.NC 2026-04 unverdicted novelty 6.0

Primate ventral stream encodes visual stimuli through evolving neural dynamics that carry category information beyond any fixed spatial pattern during the initial feedforward pass.

Leveraging Artist Catalogs for Cold-Start Music Recommendation
cs.IR 2026-04 unverdicted novelty 6.0

ACARec attends over artist catalogs to generate CF embeddings for new tracks, more than doubling recall and NDCG versus content-only baselines in music recommendation.

EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records
cs.IR 2026-05 unverdicted novelty 5.0

EHR-RAGp is a retrieval-augmented EHR foundation model that employs prototype-guided retrieval to dynamically integrate relevant historical patient context, outperforming prior models on clinical prediction tasks.

Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance
cs.LG 2026-05 unverdicted novelty 5.0

Stable-GFlowNet improves training stability and attack diversity in LLM red-teaming by eliminating Z estimation via contrastive trajectory balance while preserving GFN optimality.

Delta6: A Low-Cost, 6-DOF Force-Sensing Flexible End-Effector
cs.RO 2026-04 unverdicted novelty 5.0

Delta6 delivers a low-cost 6-DOF force-sensing end-effector with 3.8% FS accuracy using sequence models, validated on robot-arm tasks like buffing and tight assembly.

Large Language Models: A Survey
cs.CL 2024-02 accept novelty 3.0

The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.