
arxiv: 1205.1053 · v1 · submitted 2012-05-04 · 💻 cs.LG · stat.ML


Variable Selection for Latent Dirichlet Allocation

keywords: topics, vsLDA, classification, latent, selection, variable, vocabulary, allocation

In latent Dirichlet allocation (LDA), topics are multinomial distributions over the entire vocabulary. However, the vocabulary usually contains many words that are not relevant to forming the topics. We adopt a variable selection method widely used in statistical modeling as a dimension reduction tool and combine it with LDA. In this variable selection model for LDA (vsLDA), topics are multinomial distributions over a subset of the vocabulary. By excluding words that are not informative for finding the latent topic structure of the corpus, vsLDA finds topics that are more robust and discriminative. We compare three models (vsLDA, LDA with symmetric priors, and LDA with asymmetric priors) on held-out likelihood, MCMC chain consistency, and document classification. vsLDA outperforms symmetric LDA on likelihood and classification, outperforms asymmetric LDA on consistency and classification, and performs about the same in the remaining comparisons.
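
The central idea of the abstract, topics defined over a subset of the vocabulary rather than all of it, can be illustrated with a small sketch. This is not the authors' inference procedure: vsLDA infers which words are informative, whereas here the subset `selected` is fixed by hand, and the names `sample_dirichlet`, `sample_subset_topics`, and the value of `beta` are assumptions made only for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_subset_topics(vocab, selected, num_topics, beta=0.5, seed=0):
    """Sample topics that put probability mass only on selected words.

    Each topic is a dict mapping every vocabulary word to a probability;
    words outside `selected` get exactly 0.0. The symmetric Dirichlet
    parameter `beta` is an illustrative assumption, not a value from the paper.
    """
    rng = random.Random(seed)
    kept = [w for w in vocab if w in selected]
    topics = []
    for _ in range(num_topics):
        probs = sample_dirichlet([beta] * len(kept), rng)
        topic = {w: 0.0 for w in vocab}  # excluded words stay at zero
        topic.update(zip(kept, probs))   # selected words share all the mass
        topics.append(topic)
    return topics


# Hypothetical toy vocabulary; "the" and "of" play the uninformative words.
vocab = ["the", "of", "gene", "dna", "ball", "team"]
selected = {"gene", "dna", "ball", "team"}
topics = sample_subset_topics(vocab, selected, num_topics=2)
```

In standard LDA, every topic would spread some probability over "the" and "of" as well. Restricting the support, as above, is what the abstract means by reducing the dimension of the topic distributions.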

