arxiv: 1704.03471 · v3 · pith:FAGLIR5Lnew · submitted 2017-04-11 · 💻 cs.CL

What do Neural Machine Translation Models Learn about Morphology?

classification 💻 cs.CL
keywords models · neural · representations · learn · machine · morphology · target · translation

Neural machine translation (MT) models obtain state-of-the-art performance while maintaining a simple, end-to-end architecture. However, little is known about what these models learn about source and target languages during the training process. In this work, we analyze the representations learned by neural MT models at various levels of granularity and empirically evaluate the quality of the representations for learning morphology through extrinsic part-of-speech and morphological tagging tasks. We conduct a thorough investigation along several parameters: word-based vs. character-based representations, depth of the encoding layer, the identity of the target language, and encoder vs. decoder representations. Our data-driven, quantitative evaluation sheds light on important aspects in the neural MT system and its ability to capture word structure.
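The extrinsic evaluation described in the abstract can be pictured as a probing setup: representations from a trained MT model are held fixed, and a separate classifier is trained to predict a tag for each word from them. The sketch below is only an illustration under stated assumptions. The abstract does not specify the classifier, so a plain multiclass perceptron is swapped in for brevity, and the "encoder states" are synthetic vectors produced by the hypothetical helper `make_synthetic_states`, not the output of a real MT system.

```python
import random


def make_synthetic_states(seed, n, num_tags=3, dim=8):
    """Hypothetical stand-in for frozen MT encoder states.

    Each vector is Gaussian noise plus a strong signal in the dimension
    matching its tag, so the tag is linearly recoverable by construction.
    """
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        tag = rng.randrange(num_tags)
        vec = [rng.gauss(0.0, 0.3) for _ in range(dim)]
        vec[tag] += 3.0
        features.append(vec)
        labels.append(tag)
    return features, labels


def predict(weights, x):
    """Return the index of the tag whose weight vector scores highest."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def train_probe(features, labels, num_tags, epochs=20):
    """Train a multiclass perceptron on fixed feature vectors.

    Only the probe's weights are updated; the features are never changed,
    which mirrors evaluating representations that are already learned.
    """
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_tags)]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            guess = predict(weights, x)
            if guess != y:
                # Move the gold tag's weights toward x, the wrong guess away.
                for i, value in enumerate(x):
                    weights[y][i] += value
                    weights[guess][i] -= value
    return weights


def accuracy(weights, features, labels):
    """Fraction of examples whose predicted tag matches the gold tag."""
    correct = sum(predict(weights, x) == y for x, y in zip(features, labels))
    return correct / len(labels)


if __name__ == "__main__":
    train_x, train_y = make_synthetic_states(seed=0, n=300)
    test_x, test_y = make_synthetic_states(seed=1, n=100)
    probe = train_probe(train_x, train_y, num_tags=3)
    print("held-out tagging accuracy:", accuracy(probe, test_x, test_y))
```

In a setup like this, held-out tagging accuracy serves as a proxy for how much tag-relevant information the fixed representations carry. Comparing that number across different sources of features (for example word-based vs. character-based, or different encoder layers) is the kind of comparison the abstract lists.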

This paper has not been read by Pith yet.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Toward Calibrated, Fair, and accurate Deepfake Detection

    cs.LG 2026-06 unverdicted novelty 7.0

    Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.

  2. Behavioral and Representational Evidence of Binomial Ordering Preferences in Large Language Models

    cs.CL 2026-06 unverdicted novelty 6.0

    LLMs recover dominant binomial orders from corpora but align less closely with exact preference distributions, with preference strength partially encoded in middle-to-late layers and manipulable via steering.

  3. Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts

    cs.CL 2019-06 conditional novelty 6.0

    Gated lexical shortcut connections added to the transformer yield 0.9 BLEU average gains on five WMT directions while lowering the lexical content stored in hidden states.