REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.
hub
BERTopic: Neural topic modeling with a class-based TF-IDF procedure
38 Pith papers cite this work. Polarity classification is still indexing.
abstract
Topic models can be useful tools to discover latent topics in collections of documents. Recent studies have shown the feasibility of approach topic modeling as a clustering task. We present BERTopic, a topic model that extends this process by extracting coherent topic representation through the development of a class-based variation of TF-IDF. More specifically, BERTopic generates document embedding with pre-trained transformer-based language models, clusters these embeddings, and finally, generates topic representations with the class-based TF-IDF procedure. BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more recent clustering approach of topic modeling.
hub tools
claims ledger
- abstract Topic models can be useful tools to discover latent topics in collections of documents. Recent studies have shown the feasibility of approach topic modeling as a clustering task. We present BERTopic, a topic model that extends this process by extracting coherent topic representation through the development of a class-based variation of TF-IDF. More specifically, BERTopic generates document embedding with pre-trained transformer-based language models, clusters these embeddings, and finally, generates topic representations with the class-based TF-IDF procedure. BERTopic generates coherent topics
co-cited works
years
2026 38representative citing papers
Adversarial smuggling attacks encode harmful content into human-readable visuals that evade MLLM detection, achieving over 90% attack success rates on models like GPT-5 and Qwen3-VL via the new SmuggleBench benchmark.
AI-only technical discourse on MoltBook is coherent and organized around 12 themes led by security and trust, but it lacks the concrete code, runtime failures, and reproduction steps common in human GitHub discussions.
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
Brazilian YouTube climate videos show a transition from traditional denial of climate science to 'new denial' that undermines solutions, with the latter attracting more engagement from diverse actors.
Moltbook operates as two largely separate layers: a dominant transactional token economy using protocols like MBC-20 and a thinner discursive conversation layer with only 3.6% agent overlap.
Participatory provenance auditing of Canada's AI strategy consultation shows official AI summaries exclude 15-17% of participants more than random baselines, with 33-88% exclusion for dissent clusters.
LLMs conditioned on actual psychometric profiles produce life stories from which independent LLMs recover personality scores at mean r=0.75, 85% of human reliability, with emotional patterns replicating in real human data.
DOF ranks document categories by distinctiveness instead of size to promote blind-spot discovery, surfacing different content than coverage-based methods across four domains.
MIRA is a new benchmark for multi-category integrated retrieval built from real queries on a social science platform, with LLM assistance for topic descriptions and relevance labeling across four item categories.
TubeCensus provides a transparent longitudinal dataset of YouTube channels and subscriber counts covering creators responsible for 30-36% of platform content, distributed via a pip package.
Realsim shows simulated users fail to reproduce communication frictions present in real multi-turn chatbot dialogues, yielding overly optimistic evaluations with domain-dependent variability.
ProEval is a proactive framework using pre-trained GPs, Bayesian quadrature, and superlevel set sampling to estimate performance and find failures in generative AI with 8-65x fewer samples than baselines.
An LLM-based topic modeling method with a custom evaluation framework improves topic interpretability, specificity, and polarity consistency over prior approaches when linking corporate review text to external outcomes such as employee morale.
Intellectual humility in Reddit political discussions can be measured at scale with a validated classifier and increased via targeted interventions without reducing participation.
Semantic and hybrid document retrieval methods provide reliable, efficient selection for query-focused text analyses like LDA and BERTopic, outperforming random or keyword-only approaches.
ADHD and autism Reddit users exhibit convergent linguistic accommodation when crossing community boundaries, with diagnosis disclosure showing small and directionally distinct effects on style.
LLM reasoning refines unsupervised text clusters via coherence checks, redundancy removal, and label grounding, yielding better coherence and human-aligned labels on social media data.
An RL-based questioner agent adaptively generates queries to discover novel failure modes in VLMs without human intervention.
Paper Espresso deploys LLMs to summarize and analyze trends across 13,300+ arXiv papers over 35 months, releasing metadata that shows non-saturating topic growth and higher engagement for novel topics.
PRISM distills sparse LLM labels into a fine-tuned embedding model for thresholded clustering that separates fine-grained topics better than prior local models or raw frontier embeddings.
Embeddings reliably capture authorial stylistic features in French literary texts, and these signals persist after LLM rewriting while showing model-specific patterns.
Classical machine learning models outperform Hungarian transformers slightly in overall performance (71% vs 68% average score) for classifying reflection levels in student essays, though transformers handle rare classes better.
A gated hybrid contrastive collaborative filtering framework improves hit rate@10 and NDCG@10 on movie review datasets by layer-wise adaptive fusion of semantic and collaborative signals with contrastive objectives.
citing papers explorer
-
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.
-
Making MLLMs Blind: Adversarial Smuggling Attacks in MLLM Content Moderation
Adversarial smuggling attacks encode harmful content into human-readable visuals that evade MLLM detection, achieving over 90% attack success rates on models like GPT-5 and Qwen3-VL via the new SmuggleBench benchmark.
-
What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook
AI-only technical discourse on MoltBook is coherent and organized around 12 themes led by security and trust, but it lacks the concrete code, runtime failures, and reproduction steps common in human GitHub discussions.
-
The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
-
Mapping Emerging Climate Misinformation Playbooks in the Global South
Brazilian YouTube climate videos show a transition from traditional denial of climate science to 'new denial' that undermines solutions, with the latter attracting more engagement from diverse actors.
-
The Platform Is Mostly Not a Platform: Token Economies and Agent Discourse on Moltbook
Moltbook operates as two largely separate layers: a dominant transactional token economy using protocols like MBC-20 and a thinner discursive conversation layer with only 3.6% agent overlap.
-
Participatory provenance as representational auditing for AI-mediated public consultation
Participatory provenance auditing of Canada's AI strategy consultation shows official AI summaries exclude 15-17% of participants more than random baselines, with 33-88% exclusion for dissent clusters.
-
Stories of Your Life as Others: A Round-Trip Evaluation of LLM-Generated Life Stories Conditioned on Rich Psychometric Profiles
LLMs conditioned on actual psychometric profiles produce life stories from which independent LLMs recover personality scores at mean r=0.75, 85% of human reliability, with emotional patterns replicating in real human data.
-
Discovery-Oriented Faceting: From Coverage to Blind-Spot Discovery
DOF ranks document categories by distinctiveness instead of size to promote blind-spot discovery, surfacing different content than coverage-based methods across four domains.
-
MIRA: An LLM-Assisted Benchmark for Multi-Category Integrated Retrieval
MIRA is a new benchmark for multi-category integrated retrieval built from real queries on a social science platform, with LLM assistance for topic descriptions and relevance labeling across four item categories.
-
TubeCensus: A Transparent, Replicable, and Large-Scale Census of YouTube Channels and their Subscriber Counts Over Time
TubeCensus provides a transparent longitudinal dataset of YouTube channels and subscriber counts covering creators responsible for 30-36% of platform content, distributed via a pip package.
-
Synthetic Users, Real Differences: an Evaluation Framework for User Simulation in Multi-Turn Conversations
Realsim shows simulated users fail to reproduce communication frictions present in real multi-turn chatbot dialogues, yielding overly optimistic evaluations with domain-dependent variability.
-
ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation
ProEval is a proactive framework using pre-trained GPs, Bayesian quadrature, and superlevel set sampling to estimate performance and find failures in generative AI with 8-65x fewer samples than baselines.
-
Proposing Topic Models and Evaluation Frameworks for Analyzing Associations with External Outcomes: An Application to Leadership Analysis Using Large-Scale Corporate Review Data
An LLM-based topic modeling method with a custom evaluation framework improves topic interpretability, specificity, and polarity consistency over prior approaches when linking corporate review text to external outcomes such as employee morale.
-
Detecting and Enhancing Intellectual Humility in Online Political Discourse
Intellectual humility in Reddit political discussions can be measured at scale with a validated classifier and increased via targeted interventions without reducing participation.
-
The Effect of Document Selection on Query-focused Text Analysis
Semantic and hybrid document retrieval methods provide reliable, efficient selection for query-focused text analyses like LDA and BERTopic, outperforming random or keyword-only approaches.
-
Mirroring Minds: Asymmetric Linguistic Accommodation and Diagnostic Identity in ADHD and Autism Reddit Communities
ADHD and autism Reddit users exhibit convergent linguistic accommodation when crossing community boundaries, with diagnosis disclosure showing small and directionally distinct effects on style.
-
Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs
LLM reasoning refines unsupervised text clusters via coherence checks, redundancy removal, and label grounding, yielding better coherence and human-aligned labels on social media data.
-
Discovering Failure Modes in Vision-Language Models using RL
An RL-based questioner agent adaptively generates queries to discover novel failure modes in VLMs without human intervention.
-
Paper Espresso: From Paper Overload to Research Insight
Paper Espresso deploys LLMs to summarize and analyze trends across 13,300+ arXiv papers over 35 months, releasing metadata that shows non-saturating topic growth and higher engagement for novel topics.
-
PRISM: LLM-Guided Semantic Clustering for High-Precision Topics
PRISM distills sparse LLM labels into a fine-tuned embedding model for thresholded clustering that separates fine-grained topics better than prior local models or raw frontier embeddings.
-
Measuring Embedding Sensitivity to Authorial Style in French: Comparing Literary Texts with Language Model Rewritings
Embeddings reliably capture authorial stylistic features in French literary texts, and these signals persist after LLM rewriting while showing model-specific patterns.
-
Automatic Reflection Level Classification in Hungarian Student Essays
Classical machine learning models outperform Hungarian transformers slightly in overall performance (71% vs 68% average score) for classifying reflection levels in student essays, though transformers handle rare classes better.
-
A Gated Hybrid Contrastive Collaborative Filtering Recommendation
A gated hybrid contrastive collaborative filtering framework improves hit rate@10 and NDCG@10 on movie review datasets by layer-wise adaptive fusion of semantic and collaborative signals with contrastive objectives.
-
From Codebooks to VLMs: Evaluating Automated Visual Discourse Analysis for Climate Change on Social Media
VLMs recover reliable population-level trends in climate change visual discourse on social media even when per-image accuracy is only moderate.
-
Can Large Language Models Assist the Comprehension of ROS2 Software Architectures?
LLMs achieve 98.22% accuracy answering factual questions about ROS2 software architectures, with top models reaching 100%.
-
An Explainable Approach to Document-level Translation Evaluation with Topic Modeling
A topic-modeling framework measures document-level thematic consistency in translations by aligning key tokens across languages with a bilingual dictionary and scoring via cosine similarity, providing explainable insights beyond sentence-level metrics.
-
Migrant Voices, Local News: Insights on Bridging Community Needs with Media Content
Focus groups reveal topic gaps and readability barriers in local news for migrants, uncovered by applying standard NLP tools to 2000+ hyper-local articles.
-
NIH-MPINet: A Large-Scale Feature-Rich Network Dataset for Mapping the Frontiers of Team Science
NIH-MPINet is a new large-scale feature-rich collaboration network dataset from NIH grants that maps multi-PI teams, communities, and topic trends in biomedicine.
-
Collaboration, Integration, and Thematic Exploration in European Framework Programmes: A Longitudinal Network Analysis
EU Framework Programmes have increased participation equity and integrated new countries through collaboration, yet research remains concentrated on established trajectories rather than broadly exploratory.
-
15 Years of Augmented Human(s) Research: Where Do We Stand?
Scientometric review of 15 years of Augmented Human conference papers shows bimodal submission peaks in 2015 and 2025, dominant topics in haptics and wearables, and an active Japanese community alongside definitional scope issues.
-
Results and Retrospective Analysis of the CODS 2025 AssetOpsBench Challenge
Retrospective of a 2025 AI agent competition finds public-private score misalignment, an inert composite component, multi-account registrations, and guardrail fixes outperforming architectural novelty.
-
Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering
Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.
-
Mapping the Political Discourse in the Brazilian Chamber of Deputies: A Multi-Faceted Computational Approach
Analysis of 450k Brazilian deputy speeches shows stylistic simplification over time, sharp agenda shifts with national crises, and discursive clusters where region and gender outweigh party affiliation.
-
A Community-Based Approach for Stance Distribution and Argument Organization
Unsupervised graph community detection organizes arguments to reveal stance distributions in debates.
-
The Day My Chatbot Changed: Characterizing the Mental Health Impacts of Social AI App Updates via Negative User Reviews
Version-linked review analysis of Character AI shows rating drops with certain updates and negative feedback dominated by technical malfunctions plus occasional psychological framing.
-
Shifting Patterns of Extremist Discourse on Facebook: Analyzing Trends and Developments During the Israel-Hamas Conflict
Extremist Facebook groups showed rising one-sided activity and negative content at the Israel-Hamas conflict onset, with topic shifts from political to religious in anti-Israel groups and religious to activism in anti-Palestine groups, reversing after specific events.
-
A Guide to Using Social Media as a Geospatial Lens for Studying Public Opinion and Behavior
Social media data functions as passive geospatial sensing for public opinion and behavior via a structured workflow and case studies on topics like COVID-19 vaccines and urban accessibility.