arXiv preprint arXiv:2307.15771, 2023.
8 Pith papers cite this work.
8 representative citing papers
-
How LLMs Are Persuaded: A Few Attention Heads, Rerouted
Persuasion in LLMs works by redirecting a small set of attention heads to copy the target option token instead of reasoning over evidence, via a rank-one routing feature that can be directly edited or removed.
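The rank-one edit can be pictured as projecting a single routing direction out of an attention head's output weights, after which the head can no longer write along that direction. A minimal numpy sketch, assuming a stand-in output matrix `W_O` and an already-extracted routing vector `v` (the paper's procedure for finding `v` is not reproduced here):

```python
import numpy as np

def remove_rank_one_feature(W_O: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Ablate the routing direction v from one head's output projection.

    W_O: (d_head, d_model) output weights of a single attention head (stand-in).
    v:   (d_model,) candidate rank-one routing direction (hypothetical).
    """
    v = v / np.linalg.norm(v)            # unit-normalize the direction
    P = np.eye(len(v)) - np.outer(v, v)  # projector onto v's orthogonal complement
    return W_O @ P                       # edited head writes nothing along v

rng = np.random.default_rng(0)
W_O = rng.normal(size=(64, 512))         # toy stand-ins for real weights
v = rng.normal(size=512)
W_edited = remove_rank_one_feature(W_O, v)
assert np.allclose(W_edited @ (v / np.linalg.norm(v)), 0.0)
```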
-
In-Context Fixation: When Demonstrated Labels Override Semantics in Few-Shot Classification
In-context learning binds model outputs to the demonstrated label tokens as an exhaustive vocabulary, overriding semantic plausibility and causing fixation even with homogeneous or nonsense labels.
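The "exhaustive vocabulary" effect can be mimicked by scoring only the demonstrated label tokens at the answer position. A minimal sketch with Hugging Face transformers, assuming GPT-2 and single-token nonsense labels (the model, labels, and prompt are all illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = ("review: great movie! label: foo\n"
          "review: awful plot. label: bar\n"
          "review: loved every minute. label:")

# Ids of the demonstrated (nonsense) labels, assumed single-token here.
label_ids = [tok.encode(" foo")[0], tok.encode(" bar")[0]]

with torch.no_grad():
    logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]

# Restricting the choice to the demonstrated labels mirrors the fixation
# effect: the model picks among them regardless of semantic plausibility.
best = label_ids[int(torch.argmax(logits[label_ids]))]
print(tok.decode(best))
```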
-
Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models
Tabular foundation models show substantial depthwise redundancy, so a looped single-layer version achieves comparable results with 20% of the original parameters.
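Depthwise redundancy of this kind is commonly tested by weight tying: one block applied in a loop stands in for the full stack, so the parameter count is roughly that of a single layer. A minimal PyTorch sketch, assuming a generic encoder layer rather than any specific tabular architecture:

```python
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    """A single shared transformer layer applied n_loops times."""

    def __init__(self, d_model: int = 128, n_loops: int = 12):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.n_loops = n_loops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.n_loops):  # same weights at every "depth"
            x = self.layer(x)
        return x

x = torch.randn(2, 16, 128)            # (batch, rows-as-tokens, features)
print(LoopedBlock()(x).shape)          # torch.Size([2, 16, 128])
```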
-
Cell-Based Representation of Relational Binding in Language Models
Large language models encode relational bindings via a cell-based representation: a low-dimensional linear subspace in which each cell corresponds to an entity-relation index pair and attributes are retrieved from the matching cell.
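The cell readout can be sketched as projecting a hidden state into a low-dimensional subspace and slicing out the coordinate block indexed by the (entity, relation) pair. A numpy sketch with a random stand-in basis; the actual subspace would be estimated from model activations:

```python
import numpy as np

d_model, d_cell = 512, 8
n_entities, n_relations = 3, 4

rng = np.random.default_rng(0)
# Orthonormal stand-in basis for the binding subspace: one d_cell-sized
# block of coordinates per (entity, relation) cell.
U, _ = np.linalg.qr(rng.normal(size=(d_model, n_entities * n_relations * d_cell)))

def read_cell(h: np.ndarray, entity: int, relation: int) -> np.ndarray:
    """Return the attribute block stored in the (entity, relation) cell."""
    coords = U.T @ h                        # low-dimensional coordinates
    cell = entity * n_relations + relation  # flat cell index
    return coords[cell * d_cell:(cell + 1) * d_cell]

h = rng.normal(size=d_model)                # stand-in hidden state
print(read_cell(h, entity=1, relation=2).shape)  # (8,)
```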
-
Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces
A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
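One way to realize this is a sparse autoencoder whose decoder directions are penalized toward mutual orthogonality, with token-level attributions read off as gradients of a chosen feature with respect to the input embeddings. A minimal PyTorch sketch; the dimensions, penalties, and feature choice are illustrative rather than the paper's exact objective:

```python
import torch
import torch.nn as nn

class OrthoSAE(nn.Module):
    def __init__(self, d_model: int = 64, n_feats: int = 256):
        super().__init__()
        self.enc = nn.Linear(d_model, n_feats)
        self.dec = nn.Linear(n_feats, d_model, bias=False)

    def forward(self, h):
        z = torch.relu(self.enc(h))  # sparse latent features
        return self.dec(z), z

    def losses(self, h):
        recon, z = self(h)
        D = self.dec.weight                      # (d_model, n_feats)
        gram = D.T @ D
        off_diag = gram - torch.diag(torch.diag(gram))
        return ((recon - h) ** 2).mean(), z.abs().mean(), off_diag.pow(2).mean()

sae = OrthoSAE()
emb = torch.randn(10, 64, requires_grad=True)  # 10 stand-in token embeddings
recon_l, sparse_l, ortho_l = sae.losses(emb)   # training would minimize these
_, z = sae(emb)
z[:, 5].sum().backward()                       # attribute feature 5 (arbitrary)
print(emb.grad.norm(dim=-1))                   # per-token influence scores
```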
-
Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space
LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.
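Such trajectories are usually exposed by projecting per-demonstration hidden states onto a few principal directions. A numpy sketch with drifting stand-in activations; real usage would collect residual-stream states after each in-context example:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in: hidden state after each of 20 in-context examples (d_model=512).
H = np.cumsum(rng.normal(size=(20, 512)), axis=0)

# PCA via SVD: project the centered states onto the top-2 directions.
Hc = H - H.mean(axis=0)
_, _, Vt = np.linalg.svd(Hc, full_matrices=False)
traj = Hc @ Vt[:2].T                      # (20, 2) belief-space trajectory

for step, (x, y) in enumerate(traj[:5]):  # each row is one learning step
    print(f"step {step}: ({x:+.2f}, {y:+.2f})")
```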
-
Instructions Shape Production of Language, not Processing
Instructions trigger a production-centered mechanism in language models: task-specific information stays stable across input-token representations but varies strongly across output-token representations, where it correlates with behavior.
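That contrast suggests a simple probing experiment: decode task identity separately from input-token and output-token representations and compare accuracy. A scikit-learn sketch on synthetic stand-in activations constructed to mimic the claimed asymmetry, not to measure it:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, d = 400, 64
task = rng.integers(0, 2, size=n)  # which of two instructions was given

# Stand-ins: input-token states barely separate the tasks, output-token
# states separate them strongly.
X_in = rng.normal(size=(n, d)) + 0.1 * task[:, None]
X_out = rng.normal(size=(n, d)) + 2.0 * task[:, None]

for name, X in [("input tokens", X_in), ("output tokens", X_out)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, task, cv=5).mean()
    print(f"task decodable from {name}: {acc:.2f}")
```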
-
Preliminary Insights in Chronos Frequency Data Understanding and Reconstruction
Chronos encodes frequency content in decoder representations with quality that varies across the spectrum, as revealed by minimum description length probes on sinusoid inputs.
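A minimum description length probe is typically computed prequentially: code each block of data with a probe trained on all earlier blocks and sum the code lengths, so a lower total means the information is encoded more accessibly. A scikit-learn sketch on synthetic stand-in representations; real usage would extract Chronos decoder states for sinusoid inputs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
n, d = 600, 32
freq_bin = rng.integers(0, 4, size=n)     # discretized sinusoid frequency
means = rng.normal(size=(4, d))           # one cluster mean per bin
X = rng.normal(size=(n, d)) + 1.5 * means[freq_bin]  # stand-in decoder states

# Prequential MDL: fit on a growing prefix, pay for coding the next block.
splits, bits = [50, 100, 200, 400, 600], 0.0
for lo, hi in zip(splits[:-1], splits[1:]):
    probe = LogisticRegression(max_iter=1000).fit(X[:lo], freq_bin[:lo])
    nats = log_loss(freq_bin[lo:hi], probe.predict_proba(X[lo:hi]),
                    labels=list(range(4)))
    bits += nats * (hi - lo) / np.log(2)  # mean nats per item -> total bits
print(f"prequential description length ~ {bits:.0f} bits")
```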