All-but-the-Top: Simple and Effective Postprocessing for Word Representations
6 Pith papers cite this work.
Abstract
Real-valued word representations have transformed NLP applications; popular examples are word2vec and GloVe, recognized for their ability to capture linguistic regularities. In this paper, we demonstrate a very simple, and yet counter-intuitive, postprocessing technique (eliminate the common mean vector and a few top dominating directions from the word vectors) that renders off-the-shelf representations even stronger. The postprocessing is empirically validated on a variety of lexical-level intrinsic tasks (word similarity, concept categorization, word analogy) and sentence-level tasks (semantic textual similarity and text classification) on multiple datasets, with a variety of representation methods and hyperparameter choices, in multiple languages; in each case, the processed representations are consistently better than the original ones.
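The postprocessing described in the abstract can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' reference implementation. It assumes the word vectors are stacked as the rows of a matrix, finds the dominating directions with an SVD of the mean-centered matrix (equivalent to PCA), and uses a hypothetical function name `all_but_the_top` with a hypothetical parameter `n_components` for the number of directions removed. How many directions to remove is left to the caller.

```python
import numpy as np


def all_but_the_top(vectors: np.ndarray, n_components: int) -> np.ndarray:
    """Sketch: subtract the common mean vector, then project out the top
    `n_components` principal directions of the centered vectors.

    `all_but_the_top` and `n_components` are illustrative names, not an
    official API.
    """
    # Step 1: remove the mean vector shared by all words.
    centered = vectors - vectors.mean(axis=0, keepdims=True)
    if n_components <= 0:
        return centered

    # Step 2: principal directions of the centered matrix. The rows of vt
    # are orthonormal and ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:n_components]  # shape (n_components, dim)

    # Step 3: subtract each vector's projection onto the dominating directions.
    return centered - (centered @ top.T) @ top


# Usage: synthetic "embeddings" with a large shared offset.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 50)) + 5.0
processed = all_but_the_top(emb, n_components=2)
print(processed.shape)  # (1000, 50)
```

The output keeps the input shape. Its columns have zero mean, and it has no component along the removed directions. Computing the directions with an SVD avoids forming the covariance matrix explicitly.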
Citing papers by year: 2026 (6)
Citing papers
- How Does Attention Help? Insights from Random Matrices on Signal Recovery from Sequence Models
  Attention pooling produces a free-multiplicative-convolution bulk spectrum and two phase transitions for signal recovery; optimal weights are the top eigenvector of the positional correlation matrix R.
- Retrieval-Augmented Personalization with Foundation Models for Wearable Stress Detection
  Retrieval from out-of-domain foundation models enables personalization of a lightweight transformer for stress detection, yielding +3.92% accuracy and +4.76% F1 gains on WESAD without user labels.
- Semantic Gradients Interactions in SSD: A Case Study in Racial Identity and Hate Speech
  Interaction SSD extends semantic differential modeling with main, interaction, and conditional gradients to test moderation by group identity, applied to racial differences in hate speech ratings on the UC Berkeley corpus.
- Shared Emotion Geometry Across Small Language Models: A Cross-Architecture Study of Representation, Behavior, and Methodological Confounds
  Mature small language models share nearly identical 21-emotion geometries across architectures (Spearman correlations 0.74-0.92) despite opposite behavioral profiles, while immature models restructure under RLHF and prior comprehension-generation differences decompose into four distinct layers.
- Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings
  Three Metapath2Vec variants create ingredient embeddings by walking a co-occurrence graph from recipes, a typed chemical compound graph from FlavorDB, or a controlled blend of both.
- Domain Fine-Tuning FinBERT on Finnish Histopathological Reports: Train-Time Signals and Downstream Correlations
  Fine-tuning FinBERT on Finnish medical text shifts its embedding geometry; the authors attempt to measure how these shifts correlate with downstream performance, as a potential early signal of domain-adaptation benefit.