LLM reasoning refines unsupervised text clusters via coherence checks, redundancy removal, and label grounding, yielding better coherence and human-aligned labels on social media data.
9 Pith papers cite this work.
representative citing papers
PRISM distills sparse LLM labels into a fine-tuned embedding model for thresholded clustering that separates fine-grained topics better than prior local models or raw frontier embeddings.
Analysis of 1990-2022 LIS papers via automatic extraction of method entities identifies data resources as the central driver of methodological change exhibiting a cyclical emergence-stability pattern.
BERTopic with contextual augmentation outperforms STM on topic coherence and interpretability for short survey responses, but STM better supports inferential covariate analysis.
TF-IDF identifies labeled experts in the top 25 recommendations 79.5% of the time versus 51.5% for GPT-4o mini on an astronomy observatory dataset.
Bibliometric methods rise from 19.61% to 31.81% usage as LIS scholars age, method diversity increases then declines, and scholars increasingly combine conventional and unconventional methods.
Granite Embedding Multilingual R2 releases 311M- and 97M-parameter bi-encoder models that achieve state-of-the-art retrieval performance on multilingual text, code, long-document, and reasoning datasets.
IKMF introduces a dual-stream architecture that converts raw data into semantically rich knowledge via AI mining while maintaining integrity, provenance, and reproducibility through parallel archiving.