BERTopic: Neural topic modeling with a class-based TF-IDF procedure

38 Pith papers cite this work. Polarity classification is still indexing.

abstract

Topic models can be useful tools for discovering latent topics in collections of documents. Recent studies have shown the feasibility of approaching topic modeling as a clustering task. We present BERTopic, a topic model that extends this process by extracting coherent topic representations through the development of a class-based variation of TF-IDF. More specifically, BERTopic generates document embeddings with pre-trained transformer-based language models, clusters these embeddings, and finally generates topic representations with the class-based TF-IDF procedure. BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more recent clustering approach to topic modeling.
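
The last of the three stages named in the abstract (embed, cluster, weight terms per cluster) can be illustrated with a minimal Python sketch. It assumes cluster labels are already available from the embedding and clustering stages, uses whitespace tokenization as a stand-in for a real tokenizer, and applies the weighting usually given for class-based TF-IDF, W(t, c) = tf(t, c) * log(1 + A / tf(t)), where A is the average number of words per cluster. The function names `class_tf_idf` and `top_terms` and the toy documents are hypothetical; this is not the BERTopic library's API.

```python
import math
from collections import Counter, defaultdict


def class_tf_idf(docs, labels):
    """Score terms per cluster with a class-based TF-IDF weighting (sketch)."""
    # Pool term counts per cluster, i.e. treat each cluster as one long document.
    class_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        class_counts[label].update(doc.lower().split())

    # Frequency of each term across all clusters.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)

    # A: average number of words per cluster.
    avg_words = sum(total_counts.values()) / len(class_counts)

    # W(t, c) = tf(t, c) * log(1 + A / tf(t))
    return {
        label: {
            term: tf * math.log(1 + avg_words / total_counts[term])
            for term, tf in counts.items()
        }
        for label, counts in class_counts.items()
    }


def top_terms(scores, label, n=3):
    """Return the n highest-scoring terms for one cluster (ties broken alphabetically)."""
    ranked = sorted(scores[label].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


# Toy input: labels are assumed to come from an earlier clustering step.
docs = [
    "cat sat on mat",
    "cat chased mouse",
    "stock market fell",
    "market rallied on earnings",
]
labels = [0, 0, 1, 1]

scores = class_tf_idf(docs, labels)
for label in sorted(scores):
    print(label, top_terms(scores, label))
```

In this toy input the word "on" appears in both clusters, so its weight is lower than that of a word of equal frequency that is confined to one cluster, which is the behavior the class-based weighting is meant to produce.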

years: 2026 (38 citing papers)

representative citing papers

Paper Espresso: From Paper Overload to Research Insight

cs.DL · 2026-04-06 · unverdicted · novelty 6.0

Paper Espresso deploys LLMs to summarize and analyze trends across 13,300+ arXiv papers over 35 months, releasing metadata that shows non-saturating topic growth and higher engagement for novel topics.

Automatic Reflection Level Classification in Hungarian Student Essays

cs.CL · 2026-05-04 · unverdicted · novelty 5.0

Classical machine learning models slightly outperform Hungarian transformer models overall (71% vs 68% average score) at classifying reflection levels in student essays, though the transformers handle rare classes better.

A Gated Hybrid Contrastive Collaborative Filtering Recommendation

cs.IR · 2026-04-29 · unverdicted · novelty 5.0

A gated hybrid contrastive collaborative filtering framework improves hit rate@10 and NDCG@10 on movie review datasets by layer-wise adaptive fusion of semantic and collaborative signals with contrastive objectives.
