BERTopic: Neural topic modeling with a class-based TF-IDF procedure
Mixed citation behavior. Most common role is method (64%).
abstract
Topic models can be useful tools to discover latent topics in collections of documents. Recent studies have shown the feasibility of approaching topic modeling as a clustering task. We present BERTopic, a topic model that extends this process by extracting coherent topic representations through the development of a class-based variation of TF-IDF. More specifically, BERTopic generates document embeddings with pre-trained transformer-based language models, clusters these embeddings, and finally generates topic representations with the class-based TF-IDF procedure. BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more recent clustering approach of topic modeling.
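The last step of the pipeline described in the abstract, turning clusters into topic representations, can be illustrated with a minimal sketch. The abstract does not state the weighting formula; the version below assumes the form commonly given for class-based TF-IDF, W(t, c) = tf(t, c) * log(1 + A / tf(t)), where A is the average number of words per cluster. The embedding and clustering steps are skipped here: the cluster labels are assigned by hand, and the function names `class_tfidf` and `top_terms` and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def class_tfidf(docs, labels):
    """Score terms per cluster with an assumed class-based TF-IDF weighting."""
    # Treat all documents in a cluster as one concatenated bag of words.
    class_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        class_counts[label].update(doc.lower().split())

    # Frequency of each term across all clusters.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)

    # A: average number of words per cluster.
    avg_words = sum(total_counts.values()) / len(class_counts)

    # Assumed weighting: tf(t, c) * log(1 + A / tf(t)).
    return {
        label: {
            term: tf * math.log(1 + avg_words / total_counts[term])
            for term, tf in counts.items()
        }
        for label, counts in class_counts.items()
    }


def top_terms(scores, k=3):
    """Return the k highest-scoring terms per cluster (ties broken alphabetically)."""
    return {
        label: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for label, s in scores.items()
    }


# Toy corpus; in the real pipeline the labels would come from clustering
# transformer embeddings rather than being assigned by hand.
docs = [
    "cat chased mouse",
    "cat slept near mouse",
    "stock market fell",
    "stock market rallied",
]
labels = [0, 0, 1, 1]  # hypothetical cluster assignments

topics = top_terms(class_tfidf(docs, labels), k=2)
print(topics)
```

Terms that are frequent inside one cluster but rare across the others receive the highest scores, which is what makes them usable as a topic description for that cluster.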
citing papers explorer
-
Making MLLMs Blind: Adversarial Smuggling Attacks in MLLM Content Moderation
Adversarial smuggling attacks encode harmful content into human-readable visuals that evade MLLM detection, achieving over 90% attack success rates on models like GPT-5 and Qwen3-VL via the new SmuggleBench benchmark.
-
SemCEB: A Cardinality Estimation Benchmark for Semantic Operators
SemCEB is the first benchmark for cardinality estimation over semantic operators, evaluating sampling methods and Semantic Histograms on accuracy, cost, latency, and memory using 102 queries on a real-world dataset.
-
From Punishment to Protection: Charting Six Decades of U.S. Juvenile Justice Through Topic Modeling and LLM-Assisted Analysis
Topic modeling and LLM-assisted analysis of 60k+ juvenile justice opinions identifies 182 topics showing child welfare tripling, punitive declines, vocabulary drift, and risks for AI tools over six decades.
-
Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches
A new linked multimodal dataset of Russian domestic and foreign policy speeches with texts, images, captions, harmonized metadata, and expert-refined topic annotations is introduced to support analyses in political communication and LLM applications.
-
The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
-
Mapping Emerging Climate Misinformation Playbooks in the Global South
Brazilian YouTube climate videos show a transition from traditional denial of climate science to 'new denial' that undermines solutions, with the latter attracting more engagement from diverse actors.
-
The Platform Is Mostly Not a Platform: Token Economies and Agent Discourse on Moltbook
Moltbook operates as two largely separate layers: a dominant transactional token economy using protocols like MBC-20 and a thinner discursive conversation layer with only 3.6% agent overlap.
-
Participatory provenance as representational auditing for AI-mediated public consultation
Participatory provenance auditing of Canada's AI strategy consultation shows official AI summaries exclude 15-17% of participants more than random baselines, with 33-88% exclusion for dissent clusters.
-
Stories of Your Life as Others: A Round-Trip Evaluation of LLM-Generated Life Stories Conditioned on Rich Psychometric Profiles
LLMs conditioned on actual psychometric profiles produce life stories from which independent LLMs recover personality scores at mean r=0.75, 85% of human reliability, with emotional patterns replicating in real human data.
-
What Do AI Agents Talk About? Discourse and Architectural Constraints in the First AI-Only Social Network
Discourse among AI agents on Moltbook is largely determined by architectural constraints like context windows and identity files; it appears to be social learning but is in fact short-horizon contextual conditioning.
-
GRAB: A Risk Taxonomy--Grounded Benchmark for Unsupervised Topic Discovery in Financial Disclosures
GRAB is a benchmark dataset of 1.61M sentences from 8,247 10-K filings with taxonomy-anchored weak supervision labels for standardized evaluation of unsupervised topic models on financial risk disclosures.
-
From tools to thieves: Measuring and understanding public perceptions of AI through crowdsourced metaphors
Crowdsourced metaphors show rising anthropomorphism and warmth toward AI that predict trust and adoption, with notable demographic differences.
-
EconSimulacra: A Digital Twin Platform of Socio-Economic Systems Powered by LLM Agents
EconSimulacra is a multi-agent LLM simulator that couples economy, mobility, and social networks through shared internal states to reproduce nonlinear relationships between online attention and offline popularity.
-
Large-scale semantic mapping of learner agency and autonomy reveals what measurement and generative AI research overlook
Semantic mapping of 8,954 definitions and 2,700 scales from 14,000+ papers shows learner agency and autonomy span task regulation, personal motivation, and sociocultural dimensions, with existing scales and generative AI research underrepresenting the sociocultural dimension.
-
A Multi-Agent LLM Framework for Rating the Quality of Surgical Feedback
A multi-agent LLM system discovers criteria such as Encouraging, Urgent, and Clear for surgical feedback and uses them to score 4.2k instances, outperforming prior content-based approaches in predicting trainee behavior changes and trainer approval.
-
Synthetic Sources?: Auditing Generative Search Engine Citations for Evidence of AI-Generated Sources
Audit of ChatGPT, Copilot, Gemini and Perplexity finds ~16% of cited sources are AI-generated across 712 queries on politics, health and environment.
-
Interpretable Discriminative Text Representations via Agreement and Label Disentanglement
LFD discovers predictive text features via LLM contrastive proposals, cross-LLM Cohen's kappa screening, and residual held-out gain selection, matching baseline accuracy while achieving higher human agreement and lower label leakage on ten tasks.
-
Synthesis and Evaluation of Long-term History-aware Medical Dialogue
Creates MediLongChat synthetic longitudinal medical dialogues and benchmarks showing state-of-the-art LLMs struggle with in-dialogue, cross-dialogue, and synthesis reasoning tasks.
-
Algorithmic Cultivation: How Social Media Feeds Shape User Language
Quasi-experimental study of 235M Bluesky posts finds that exposure to algorithmic feeds produces greater stylistic accommodation, semantic alignment, and register formalization than in matched controls, with effects varying by feed and strongest for reposting.
-
Discovery-Oriented Faceting: From Coverage to Blind-Spot Discovery
DOF ranks document categories by distinctiveness instead of size to promote blind-spot discovery, surfacing different content than coverage-based methods across four domains.
-
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
REALISTA generates semantically coherent adversarial prompts via latent-space optimization over input-dependent editing directions, achieving stronger hallucination elicitation than prior realistic attacks on open-source and reasoning LLMs.
-
MIRA: An LLM-Assisted Benchmark for Multi-Category Integrated Retrieval
MIRA is a new benchmark for multi-category integrated retrieval built from real queries on a social science platform, with LLM assistance for topic descriptions and relevance labeling across four item categories.
-
What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook
Empirical analysis of 4707 MoltBook posts shows AI-only technical discourse focuses on security, trust, and abstract topics while lacking concrete runtime and project details found in human GitHub discussions.
-
TubeCensus: A Transparent, Replicable, and Large-Scale Census of YouTube Channels and their Subscriber Counts Over Time
TubeCensus provides a transparent longitudinal dataset of YouTube channels and subscriber counts covering creators responsible for 30-36% of platform content, distributed via a pip package.
-
Synthetic Users, Real Differences: an Evaluation Framework for User Simulation in Multi-Turn Conversations
Realsim shows simulated users fail to reproduce communication frictions present in real multi-turn chatbot dialogues, yielding overly optimistic evaluations with domain-dependent variability.
-
Proposing Topic Models and Evaluation Frameworks for Analyzing Associations with External Outcomes: An Application to Leadership Analysis Using Large-Scale Corporate Review Data
An LLM-based topic modeling method with a custom evaluation framework improves topic interpretability, specificity, and polarity consistency over prior approaches when linking corporate review text to external outcomes such as employee morale.
-
Detecting and Enhancing Intellectual Humility in Online Political Discourse
Intellectual humility in Reddit political discussions can be measured at scale with a validated classifier and increased via targeted interventions without reducing participation.
-
The Effect of Document Selection on Query-focused Text Analysis
Semantic and hybrid document retrieval methods provide reliable, efficient selection for query-focused text analyses like LDA and BERTopic, outperforming random or keyword-only approaches.
-
Mirroring Minds: Asymmetric Linguistic Accommodation and Diagnostic Identity in ADHD and Autism Reddit Communities
ADHD and autism Reddit users exhibit convergent linguistic accommodation when crossing community boundaries, with diagnosis disclosure showing small and directionally distinct effects on style.
-
Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs
LLM reasoning refines unsupervised text clusters via coherence checks, redundancy removal, and label grounding, yielding better coherence and human-aligned labels on social media data.
-
Discovering Failure Modes in Vision-Language Models using RL
An RL-based questioner agent adaptively generates queries to discover novel failure modes in VLMs without human intervention.
-
Paper Espresso: From Paper Overload to Research Insight
Paper Espresso deploys LLMs to summarize and analyze trends across 13,300+ arXiv papers over 35 months, releasing metadata that shows non-saturating topic growth and higher engagement for novel topics.
-
PRISM: LLM-Guided Semantic Clustering for High-Precision Topics
PRISM distills sparse LLM labels into a fine-tuned embedding model for thresholded clustering that separates fine-grained topics better than prior local models or raw frontier embeddings.
-
In your own words: computationally identifying interpretable themes in free-text survey data
A computational framework identifies more coherent themes in free-text survey data on race, gender, and sexual orientation than previous methods, with applications for survey design, explaining variation, and detecting identity discordance.
-
WebExpert: domain-aware web agents with critic-guided expert experience for high-precision search
WebExpert improves exact-match accuracy by 1.5-3.6 points on GAIA, GPQA, HLE, and WebWalkerQA benchmarks via experience retrieval, automatic facet induction, and preference-optimized planning.
-
LLM-MemCluster: Empowering Large Language Models with Dynamic Memory for Text Clustering
LLM-MemCluster gives LLMs stateful memory and prompts that let them decide cluster count and iteratively refine groupings, outperforming baselines on benchmarks in a tuning-free end-to-end setup.
-
Disentangling Interaction and Bias Effects in Opinion Dynamics of Large Language Models
A Bayesian framework disentangles topic, agreement, and anchoring biases from interaction effects in LLM multi-turn dialogues, revealing convergence to attractors that shift with fine-tuning.
-
FLAME: A New Dataset on FLemish Accounts of Momentary Experiences
Introduces a 25k-narrative Flemish corpus and finds that BERTopic yields more coherent and culturally relevant topics than LDA or K-Means according to human raters, despite LDA scoring higher on automated coherence metrics.
-
A Computational Method for Measuring "Open Codes" in Qualitative Analysis
A method merges codebooks via LLM and evaluates human and AI inductive coding with four new metrics on an online conversation dataset.
-
MMTM: Tri-Modal Topic Modeling for Long-Form Video via Similarity-Gated Fusion
MMTM improves topic coherence and temporal stability in long-form video by tri-modal similarity-gated fusion of speech, audio, and visual embeddings with BERTopic, shown on German and English news datasets with released code and corpus.
-
SmartIterator: Visual Analytics Workflows for Supervising Unsupervised Data Grouping
SmartIterator supplies method-specific workflows and coordinated visualizations to systematically supervise and interpret parameter sweeps of unsupervised data grouping techniques.
-
Eliot: Interactively Exploring Fast-Changing Scientific Literature Trends with Online Data and Learning
Eliot is a query-time clustering and temporal visualization system for arXiv literature, evaluated via offline metrics on eight domains and a user survey showing 85% meaningful cluster labels.
-
Can LLMs extract scientific consensus? A case study in high-temperature superconductivity
LLMs recover coherent, interpretable structures from HTS literature including family-dependent mechanisms and temporal belief evolution via a constructed knowledge graph.
-
The Structure and Dynamics of the Online MAHA-sphere
Reddit analysis finds MAHA users show strong cross-theme belief bundling and network coherence unlike anti-MAHA users, with pandemic-era shifts from anti-fluoride/mask to anti-vaccine to broader anti-science engagement.
-
ChatGPT vs Teachers vs Students: Large-Scale Analysis of Generative AI Discourse in Education Communities on Reddit
Large-scale topic modeling of 270k Reddit posts shows GenAI discourse in education shifting from detection-evasion to enforcement, with K-12 teachers emphasizing cognitive dependency, academics focusing on detection, students on career anxiety, and adversarial themes driving engagement and cross-sta
-
Topical Shifts in the Dark Web: A Longitudinal Analysis of Content from the Cybercrime Ecosystem
Longitudinal topic modeling on a large dark web dataset finds 75% of discussion volume in persistent core topics with a median lifespan of 75 months and only 3% in short-lived themes.
-
Analyzing Codes of Conduct for Online Safety in Video Games at Scale
Large-scale scan of Steam multiplayer games finds CoCs available for just 3.6% of titles, with better coverage of security issues than interpersonal or underage-player harms.
-
Measuring Embedding Sensitivity to Authorial Style in French: Comparing Literary Texts with Language Model Rewritings
Embeddings reliably capture authorial stylistic features in French literary texts, and these signals persist after LLM rewriting while showing model-specific patterns.
-
Automatic Reflection Level Classification in Hungarian Student Essays
Classical machine learning models slightly outperform Hungarian transformers overall (71% vs 68% average score) at classifying reflection levels in student essays, though transformers handle rare classes better.
-
A Gated Hybrid Contrastive Collaborative Filtering Recommendation
A gated hybrid contrastive collaborative filtering framework improves hit rate@10 and NDCG@10 on movie review datasets by layer-wise adaptive fusion of semantic and collaborative signals with contrastive objectives.