
arxiv: 1602.01925 · v2 · submitted 2016-02-05 · 💻 cs.CL

Recognition: unknown

Massively Multilingual Word Embeddings

classification 💻 cs.CL
keywords: methods, data, embeddings, evaluation, along, area, better, categorization

We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space. Our estimation methods, multiCluster and multiCCA, use dictionaries and monolingual data; they do not require parallel data. Our new evaluation method, multiQVEC-CCA, is shown to correlate better than previous ones with two downstream tasks (text categorization and parsing). We also describe a web portal for evaluation that will facilitate further research in this area, along with open-source releases of all our methods.

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL

    cs.CL 2026-04 unverdicted novelty 7.0

    Parallel-SFT mixes parallel programs across languages during SFT to produce more transferable RL initializations, yielding better zero-shot generalization to unseen programming languages.