All-but-the-top: Simple and effective postprocessing for word representations. arXiv preprint arXiv:1702.01417

Mu, J · 2018 · arXiv 1702.01417

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

How Does Attention Help? Insights from Random Matrices on Signal Recovery from Sequence Models

stat.ML · 2026-05-07 · conditional · novelty 7.0

Attention pooling produces a free-multiplicative-convolution bulk spectrum and two phase transitions for signal recovery; optimal weights are the top eigenvector of the positional correlation matrix R.

Shared Emotion Geometry Across Small Language Models: A Cross-Architecture Study of Representation, Behavior, and Methodological Confounds

cs.CL · 2026-04-13 · unverdicted · novelty 6.0

Mature small language models share nearly identical 21-emotion geometries across architectures with Spearman correlations 0.74-0.92 despite opposite behavioral profiles, while immature models restructure under RLHF and prior comprehension-generation differences decompose into four distinct layers.

Domain Fine-Tuning FinBERT on Finnish Histopathological Reports: Train-Time Signals and Downstream Correlations

cs.CL · 2026-04-16 · unverdicted · novelty 4.0

Fine-tuning FinBERT on Finnish medical text produces embedding geometry shifts whose correlation with downstream performance the authors attempt to measure as a potential early signal for domain adaptation benefit.

citing papers explorer

Showing 3 of 3 citing papers.

How Does Attention Help? Insights from Random Matrices on Signal Recovery from Sequence Models
stat.ML · 2026-05-07 · conditional · none · ref 10
Shared Emotion Geometry Across Small Language Models: A Cross-Architecture Study of Representation, Behavior, and Methodological Confounds
cs.CL · 2026-04-13 · unverdicted · none · ref 7
Domain Fine-Tuning FinBERT on Finnish Histopathological Reports: Train-Time Signals and Downstream Correlations
cs.CL · 2026-04-16 · unverdicted · none · ref 36
