
How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings

Ethayarajh, Kawin · 2019 · DOI 10.18653/v1/d19-1006

Canonical reference. 78% of citing Pith papers cite this work as background.

29 Pith papers citing it

Background 78% of classified citations


citation-role summary

background 9

citation-polarity summary

background 7 support 2

representative citing papers

SimCSE: Simple Contrastive Learning of Sentence Embeddings

cs.CL · 2021-04-18 · conditional · novelty 8.0

SimCSE achieves 76.3% unsupervised and 81.6% supervised Spearman's correlation on STS tasks with BERT-base, improving prior best results by 4.2% and 2.2% via simple contrastive learning.

Closing the Calibration Gap in Semantic Caching

cs.IR · 2026-06-18 · unverdicted · novelty 7.0

Introduces P-CHR AUC and CRR metrics to demonstrate that semantic caching model selection is limited by calibration quality rather than ranking performance.

Continuous Language Diffusion as a Decoder-Interface Problem

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

Continuous language diffusion works by entering high-margin decoder basins where frozen T5 embeddings recover 93-96% of native decisions and linear readouts reach 97.9% agreement, implying models should be evaluated as representation-decoder systems.

When Correct Decisions Hide Internal Stress: Decision-State Probing in Multimodal Language Models

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

S³E framework finds excess decision-state displacement under semantic stress in multimodal models despite consistent correct forced-choice behavior.

Accurate and Efficient Statistical Testing for Word Semantic Breadth

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

A new permutation test uses Householder reflection to align word embedding clouds before testing dispersion differences, cutting Type-I error by 32.5% and speeding up 23x on GPU.

Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL

cs.CL · 2026-04-22 · unverdicted · novelty 7.0

Parallel-SFT mixes parallel programs across languages during SFT to produce more transferable RL initializations, yielding better zero-shot generalization to unseen programming languages.

On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability

cs.IR · 2026-04-17 · unverdicted · novelty 7.0

LLM-based dense retrievers generalize better when instruction-tuned but pay a specialization tax when optimized for reasoning; they resist typos and corpus poisoning better than encoder-only baselines yet remain vulnerable to semantic perturbations, with larger models and certain embedding geometry,

VASAE: Naming SAE Dictionary Directions with Vocabulary-Aligned Anchoring

cs.CL · 2026-06-26 · unverdicted · novelty 6.0

VASAE introduces vocabulary-aligned anchoring to train SAEs that yield features with intrinsic token names, reporting high alignment rates in early layers of GPT-2 and Llama-3.1 without reconstruction loss.

LMs as Task-Specific Knowledge Bases: An Interpretability Analysis

cs.CL · 2026-06-25 · unverdicted · novelty 6.0

LMs store facts in task-specific parameter subsets, shown by inconsistent emergence across tasks during training and distinct localized parameters for the same fact.

RSRank: Learning Relevance from Representational Shifts

cs.IR · 2026-06-16 · unverdicted · novelty 6.0

RSRank learns calibrated relevance scores from alignment between representational shifts induced by candidate documents and those from oracle document sets, enabling zero-threshold filtering.

Inside the LLM Word Factory

cs.CL · 2026-06-07 · unverdicted · novelty 6.0

Activation patching localizes English detokenization in Llama2-7B to a two-stage attention-then-MLP process at layer 1 that generalizes to 12 models from 8 families, with depth varying by positional encoding, plus an early-layer probe achieving 0.94-0.97 AUROC.

Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

cs.CL · 2026-06-05 · unverdicted · novelty 6.0

EmbedFilter applies a linear filter derived from the LLM unembedding matrix to suppress high-frequency token influences in text embeddings, yielding improved zero-shot performance and inherent dimensionality reduction.

Multilingual Unlearning in LLMs: Transfer, Dynamics, and Reversibility

cs.CL · 2026-06-02 · unverdicted · novelty 6.0

Unlearning in multilingual LLMs suppresses rather than erases knowledge in later layers, with transfer varying by language similarity and reversible via inference-time steering.

How Many Different Outputs Can a Transformer Generate?

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Transformers are limited to a linearly growing number of accessible output sequences with prompt length, with exponential decay in accessible proportion beyond a critical point, even under unbounded context.

Spectral Lens: Activation and Gradient Spectra as Diagnostics of LLM Optimization

stat.ML · 2026-05-07 · unverdicted · novelty 6.0

Spectral analysis of activations and gradients provides new diagnostics that link batch size to representation geometry, early covariance tails to token efficiency, and spectral shifts to learning dynamics in decoder-only LLMs, backed by a mechanistic model.

Controlled Paraphrase Geometry in Sentence Embedding Space: Local Manifold Modeling and Latent Probing

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

Nonlinear polynomial models fit local paraphrase embedding clouds more accurately than linear ones and support geometrically consistent synthetic point generation, yet this geometric fidelity does not improve classification performance.

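Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus

cs.CL · 2026-05-01 · unverdicted · novelty 6.0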

Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.

Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs

cs.CL · 2026-04-08 · unverdicted · novelty 6.0

LLM reasoning refines unsupervised text clusters via coherence checks, redundancy removal, and label grounding, yielding better coherence and human-aligned labels on social media data.

Template-assisted Contrastive Learning of Task-oriented Dialogue Sentence Embeddings

cs.CL · 2023-05-23 · unverdicted · novelty 6.0

TaDSE learns dialogue sentence embeddings via template-guided self-supervised contrastive learning plus synthetic slot-filling augmentation and reports gains on five downstream benchmarks.

Inductive Entity Representations from Text via Link Prediction

cs.CL · 2020-10-07 · unverdicted · novelty 6.0

Entity representations learned from text via link prediction generalize to unseen entities and transfer to classification and retrieval with reported gains of 22% MRR, 16% accuracy, and 8.8% NDCG@10.

SemStruct: Contextualizing Semantic Embeddings with Structural Information for Schema Matching

cs.LG · 2026-05-29 · unverdicted · novelty 5.0

SemStruct models tables as heterogeneous graphs with GNNs on frozen PLM embeddings to incorporate row co-occurrences for schema matching and reports SOTA results on Valentine and SOTAB-SM benchmarks.

Analyzing the Effect of Noise in LLM Fine-tuning

cs.LG · 2026-04-14 · unverdicted · novelty 5.0

Label noise hurts fine-tuning performance most while grammatical and typographical noise sometimes act as mild regularizers, with changes concentrated in task-specific layers.

Do BERT Embeddings Encode Narrative Dimensions? A Token-Level Probing Analysis of Time, Space, Causality, and Character in Fiction

cs.CL · 2026-04-12 · unverdicted · novelty 5.0

BERT embeddings encode narrative dimensions of time, space, causality, and character at the token level, as a linear probe achieves 94% accuracy versus 47% on variance-matched random embeddings, though unsupervised clusters do not align with these categories.

Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models

cs.CL · 2025-06-02 · unverdicted · novelty 5.0

Inflectional features stay linearly decodable across all layers while lexical identity weakens with depth in modern transformers.

citing papers explorer


SimCSE: Simple Contrastive Learning of Sentence Embeddings cs.CL · 2021-04-18 · conditional · none · ref 92
SimCSE achieves 76.3% unsupervised and 81.6% supervised Spearman's correlation on STS tasks with BERT-base, improving prior best results by 4.2% and 2.2% via simple contrastive learning.
Accurate and Efficient Statistical Testing for Word Semantic Breadth cs.CL · 2026-05-08 · unverdicted · none · ref 35
A new permutation test uses Householder reflection to align word embedding clouds before testing dispersion differences, cutting Type-I error by 32.5% and speeding up 23x on GPU.
Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL cs.CL · 2026-04-22 · unverdicted · none · ref 29
Parallel-SFT mixes parallel programs across languages during SFT to produce more transferable RL initializations, yielding better zero-shot generalization to unseen programming languages.
On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability cs.IR · 2026-04-17 · unverdicted · none · ref 17
LLM-based dense retrievers generalize better when instruction-tuned but pay a specialization tax when optimized for reasoning; they resist typos and corpus poisoning better than encoder-only baselines yet remain vulnerable to semantic perturbations, with larger models and certain embedding geometry,
Spectral Lens: Activation and Gradient Spectra as Diagnostics of LLM Optimization stat.ML · 2026-05-07 · unverdicted · none · ref 37
Spectral analysis of activations and gradients provides new diagnostics that link batch size to representation geometry, early covariance tails to token efficiency, and spectral shifts to learning dynamics in decoder-only LLMs, backed by a mechanistic model.
Controlled Paraphrase Geometry in Sentence Embedding Space: Local Manifold Modeling and Latent Probing cs.CL · 2026-05-01 · unverdicted · none · ref 17
Nonlinear polynomial models fit local paraphrase embedding clouds more accurately than linear ones and support geometrically consistent synthetic point generation, yet this geometric fidelity does not improve classification performance.
Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus cs.CL · 2026-05-01 · unverdicted · none · ref 22
Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.
LLM4Log: A Systematic Review of Large Language Model-based Log Analysis cs.SE · 2026-03-18 · unverdicted · none · ref 32 · 2 links
Systematic review of 145 papers on LLM-based log analysis, providing a unified taxonomy, common design patterns, evaluation practices, and challenges for deployment under drift and limited labels.
