
arxiv: 1901.01577 · v1 · pith:2VHNNPUDnew · submitted 2019-01-06 · 💻 cs.CL

Unsupervised Training for Large Vocabulary Translation Using Sparse Lexicon and Word Classes

classification 💻 cs.CL
keywords: lexicon, translation, training, unsupervised, classes, first, large, vocabulary

For the first time, we address unsupervised training for a translation task with a vocabulary of hundreds of thousands of words. We scale up the expectation-maximization (EM) algorithm to learn a large translation table without any parallel text or seed lexicon. First, we solve the memory bottleneck and enforce sparsity with a simple thresholding scheme for the lexicon. Second, we initialize the lexicon training with word classes, which efficiently boosts performance. Our methods produce promising results on two large-scale unsupervised translation tasks.
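
The thresholding idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `threshold_lexicon`, the dictionary layout, the threshold value `tau`, and the fallback of keeping the single best entry when every entry falls below the threshold are all assumptions made for illustration. The sketch assumes the lexicon is stored as one probability row per conditioning word, and that pruning is followed by renormalization so each row still sums to one.

```python
def threshold_lexicon(lexicon, tau):
    """Drop lexicon entries with probability below tau and renormalize each row.

    lexicon: dict mapping a conditioning word to a dict {word: probability}.
    tau: hypothetical probability threshold (an assumption, not from the paper).
    """
    pruned = {}
    for cond_word, row in lexicon.items():
        # Keep only entries at or above the threshold.
        kept = {w: p for w, p in row.items() if p >= tau}
        if not kept:
            # Assumption: if everything is pruned, keep the single best entry
            # so the row remains a valid distribution.
            best = max(row, key=row.get)
            kept = {best: row[best]}
        total = sum(kept.values())
        # Renormalize so the surviving probabilities sum to one.
        pruned[cond_word] = {w: p / total for w, p in kept.items()}
    return pruned


# Toy example with made-up probabilities.
toy_lexicon = {
    "haus": {"house": 0.70, "home": 0.25, "mouse": 0.05},
    "katze": {"cat": 0.04, "dog": 0.03, "car": 0.03},
}
sparse_lexicon = threshold_lexicon(toy_lexicon, tau=0.1)
```

In an EM loop, a step like this would plausibly run after each lexicon update, so that only the surviving entries need to be stored; how the paper actually schedules and parameterizes the pruning is not stated in the abstract.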

