pith. machine review for the scientific record.

arxiv: 1205.2657 · v1 · submitted 2012-05-09 · 💻 cs.CL · cs.IR · cs.LG · stat.ML

Recognition: unknown

Multilingual Topic Models for Unaligned Text

keywords: multilingual, corpora, MuTo, topic, documents, languages, model, text
Original abstract:

We develop the multilingual topic model for unaligned text (MuTo), a probabilistic model of text designed to analyze corpora composed of documents in two languages. From these documents, MuTo uses stochastic expectation-maximization (EM) to simultaneously discover both a matching between the languages and multilingual latent topics. We demonstrate that MuTo finds shared topics on real-world multilingual corpora and successfully pairs related documents across languages. MuTo provides a new framework for building multilingual topic models without carefully curated parallel corpora, allowing applications built on the topic model formalism to be applied to a much wider class of corpora.

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LLM-XTM: Enhancing Cross-Lingual Topic Models with Large Language Models

    cs.CL · 2026-05 · unverdicted · novelty 6.0

    LLM-XTM integrates LLM-guided topic refinement with self-consistency uncertainty quantification to improve coherence and alignment in cross-lingual topic models while reducing dependence on bilingual resources and rep...