Transformer Feed-Forward Layers Are Key-Value Memories

Mor Geva, Roei Schuster, Jonathan Berant, Omer Levy · 2021 · cs.CL · DOI 10.18653/v1/2021.emnlp-main.446 · arXiv 2012.14913

Canonical reference. 86% of citing Pith papers cite this work as background.

65 Pith papers citing it

186 external citations · Crossref

Background 86% of classified citations

abstract

Feed-forward layers constitute two-thirds of a transformer model's parameters, yet their role in the network remains under-explored. We show that feed-forward layers in transformer-based language models operate as key-value memories, where each key correlates with textual patterns in the training examples, and each value induces a distribution over the output vocabulary. Our experiments show that the learned patterns are human-interpretable, and that lower layers tend to capture shallow patterns, while upper layers learn more semantic ones. The values complement the keys' input patterns by inducing output distributions that concentrate probability mass on tokens likely to appear immediately after each pattern, particularly in the upper layers. Finally, we demonstrate that the output of a feed-forward layer is a composition of its memories, which is subsequently refined throughout the model's layers via residual connections to produce the final output distribution.
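The key-value reading in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes the bias-free form FF(x) = f(x · Kᵀ) · V with a ReLU non-linearity, random weights, and no layer normalization. The names `K`, `V`, `E`, `feed_forward`, and all dimensions are hypothetical choices made for the example.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def feed_forward(x, K, V):
    """Bias-free feed-forward layer read as a key-value memory.

    Each row of K is a key, each row of V is the matching value.
    Returns the layer output and the per-memory coefficients.
    """
    m = relu(x @ K.T)      # how strongly each key fires on input x
    return m @ V, m        # output = coefficient-weighted sum of values

# Toy sizes and random weights (assumptions for illustration only).
rng = np.random.default_rng(0)
d, d_m, vocab = 8, 32, 50
K = rng.normal(size=(d_m, d))      # keys
V = rng.normal(size=(d_m, d))      # values
E = rng.normal(size=(d, vocab))    # stand-in output embedding matrix
x = rng.normal(size=d)             # hidden state at one position

y, m = feed_forward(x, K, V)

# The output is a composition of memories: sum_i m_i * v_i.
composed = sum(m[i] * V[i] for i in range(d_m))

# Each value induces a distribution over the vocabulary.
value_dists = softmax(V @ E)       # shape (d_m, vocab)

# A residual connection adds the layer output back to the input
# before later layers refine it; here we read it out directly.
residual = x + y
final = softmax(residual @ E)

top = int(np.argmax(m))
print("most active memory:", top)
print("its top token id:", int(np.argmax(value_dists[top])))
print("top token id after residual:", int(np.argmax(final)))
```

With trained weights, the memory with the largest coefficient would correspond to the key whose textual pattern best matches the input; with the random weights used here the indices carry no meaning.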

citation-role summary

background 7

citation-polarity summary

background 6 · support 1

representative citing papers

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

cs.LG · 2022-11-01 · conditional · novelty 8.0

GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.

Vision-Default, Prior-Override: Causal Mechanisms of Perception-Knowledge Conflict in Vision-Language Models

cs.CL · 2026-06-26 · conditional · novelty 7.0

VLMs default to visual grounding but a sparse circuit of 2.5-4.8% attention heads in later layers mediates prior-knowledge overrides, identified causally via patching and ablation across three model families.

Output Vector Editing for Memorization Mitigation in Large Language Models

cs.CL · 2026-06-17 · unverdicted · novelty 7.0

Output vector editing on MLP neurons suppresses memorization in LLMs up to 87.9% on 6831 sequences in OLMo-7B with a 2.7x gap over zero ablation, ensemble covering 96.5%.

TimeROME-DLM: Temporal Causal Tracing and Low-Rank Inference-Time Knowledge Editing for Masked Diffusion Language Models

cs.LG · 2026-06-11 · unverdicted · novelty 7.0

TimeROME-DLM enables training-free knowledge editing in masked diffusion language models via temporal causal tracing and low-rank residual edit memory applied at inference time.

Why Muon Outperforms Adam: A Curvature Perspective

cs.LG · 2026-06-03 · conditional · novelty 7.0

Muon outperforms Adam by reducing curvature penalty via lower Normalized Directional Sharpness, as shown via Taylor approximation on LLM training and proven on stylized quadratic problems with heterogeneous curvature.

EpiFormer: Learning Antigen-Antibody Interactions for Epitope Prediction via Geometric Deep Learning

q-bio.QM · 2026-06-02 · unverdicted · novelty 7.0

EpiFormer improves epitope prediction F1 score by over 40% via early-fusion cross-attention in GNN layers and sparsity-aware objectives, while recovering known biology as emergent behavior.

Query Lens: Interpreting Sparse Key-Value Features with Indirect Effects

cs.LG · 2026-05-30 · unverdicted · novelty 7.0

Query Lens extends Logit Lens to interpret sparse features via key-value analysis and indirect effects, yielding coherent token signatures where Logit Lens fails, and proposes the Subspace Channel Hypothesis.

Rethinking Visual Neglect: Steering via Context-Preference for MLLM Hallucination Mitigation

cs.CL · 2026-05-27 · unverdicted · novelty 7.0

CAS mitigates object hallucinations in MLLMs by extracting two context preference vectors from designed conflict samples and applying signed residual injection at mid-early MLP layers without retraining or added latency.

ConTact: Contact-First Antibody CDR Design via Explicit Interface Reasoning

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

ConTact introduces a contact-then-act architecture with distance-biased cross-attention and contact-weighted loss for antibody CDR design, reporting 5-6% better backbone RMSD and superior contact metrics on CHIMERA-Bench splits.

Uncovering Entity Identity Confusion in Multimodal Knowledge Editing

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

Multimodal knowledge editing causes models to confuse original and edited entity identities in text queries by failing to update image-entity bindings and instead overfitting entity-entity shortcuts.

Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval

stat.ML · 2026-05-06 · unverdicted · novelty 7.0

Winner-take-all linear memory capacity scales as d² ~ n log n due to extreme values; listwise retrieval via Tail-Average Margin yields d² ~ n with exact asymptotic theory.

Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics

cs.LG · 2026-05-06 · unverdicted · novelty 7.0 · 2 refs

Transpose-invariant spectral diagnostics on attention operators are orientation-blind, and a φ-G two-axis diagnostic distinguishes hallucination modes with 0.62-0.84 LC-AUROC and predicted polarity reversal.

How Language Models Process Negation

cs.CL · 2026-05-04 · unverdicted · novelty 7.0 · 2 refs

LLMs process negation using both attention-based suppression and constructive representation mechanisms (construction dominant), with late-layer attention shortcuts explaining poor accuracy on negation tasks.

A framework for analyzing concept representations in neural models

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

A new framework shows concept subspaces are not unique, estimator choice affects containment and disentanglement, LEACE works well but generalizes poorly, and HuBERT encodes phone info as contained and disentangled from speaker info while speaker info resists compact containment.

A Parametric Memory Head for Continual Generative Retrieval

cs.IR · 2026-04-25 · unverdicted · novelty 7.0

A product-key parametric memory head with selective sparse updates mitigates catastrophic forgetting in generative retrieval models during sequential addition of new documents.

One Model to Translate Them All? A Journey to Mount Doom for Multilingual Model Merging

cs.CL · 2026-04-03 · unverdicted · novelty 7.0

Merging fine-tuned models for multilingual translation fails because fine-tuning redistributes language-specific neurons rather than sharpening them, increasing representational divergence in output-generating layers.

Norm Anchors Make Model Edits Last

cs.LG · 2026-01-30 · conditional · novelty 7.0

Norm-Anchor Scaling breaks the norm-feedback loop in sequential LLM editing by anchoring value vectors to original norms, improving long-run performance by 72.2% and extending the editing horizon over 4x.

Improving Dictionary Learning with Gated Sparse Autoencoders

cs.LG · 2024-04-24 · unverdicted · novelty 7.0

Gated SAEs decouple which features to use from how large their activations should be, applying the L1 penalty only to selection and thereby eliminating shrinkage while halving the number of firing features needed for good fidelity.

Eliciting Latent Predictions from Transformers with the Tuned Lens

cs.LG · 2023-03-14 · accept · novelty 7.0

Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.

VASAE: Naming SAE Dictionary Directions with Vocabulary-Aligned Anchoring

cs.CL · 2026-06-26 · unverdicted · novelty 6.0

VASAE introduces vocabulary-aligned anchoring to train SAEs that yield features with intrinsic token names, reporting high alignment rates in early layers of GPT-2 and Llama-3.1 without reconstruction loss.

LMs as Task-Specific Knowledge Bases: An Interpretability Analysis

cs.CL · 2026-06-25 · unverdicted · novelty 6.0

LMs store facts in task-specific parameter subsets, shown by inconsistent emergence across tasks during training and distinct localized parameters for the same fact.

Cross-Lingual Exploration for Parametric Knowledge

cs.CL · 2026-06-23 · unverdicted · novelty 6.0

Cross-lingual prompt exploration improves factual recall and consistency in LLMs across 17 languages more efficiently than native-language scaling.

Gated MLPs as Symmetry-Broken Rank-1 Bilinear Attention

cs.LG · 2026-06-20 · unverdicted · novelty 6.0

Gated MLPs are shown to be symmetry-broken rank-1 bilinear attention mechanisms with query and key factors.

Behavioral and Representational Evidence of Binomial Ordering Preferences in Large Language Models

cs.CL · 2026-06-19 · unverdicted · novelty 6.0

LLMs recover dominant binomial orders from corpora but align less closely with exact preference distributions, with preference strength partially encoded in middle-to-late layers and manipulable via steering.

citing papers explorer

Showing 28 of 28 citing papers after filters.

Vision-Default, Prior-Override: Causal Mechanisms of Perception-Knowledge Conflict in Vision-Language Models

cs.CL · 2026-06-26 · conditional · none · ref 5 · internal anchor

VLMs default to visual grounding but a sparse circuit of 2.5-4.8% attention heads in later layers mediates prior-knowledge overrides, identified causally via patching and ablation across three model families.

Output Vector Editing for Memorization Mitigation in Large Language Models

cs.CL · 2026-06-17 · unverdicted · none · ref 11 · internal anchor

Output vector editing on MLP neurons suppresses memorization in LLMs up to 87.9% on 6831 sequences in OLMo-7B with a 2.7x gap over zero ablation, ensemble covering 96.5%.

Rethinking Visual Neglect: Steering via Context-Preference for MLLM Hallucination Mitigation

cs.CL · 2026-05-27 · unverdicted · none · ref 10 · internal anchor

CAS mitigates object hallucinations in MLLMs by extracting two context preference vectors from designed conflict samples and applying signed residual injection at mid-early MLP layers without retraining or added latency.

Uncovering Entity Identity Confusion in Multimodal Knowledge Editing

cs.CL · 2026-05-07 · unverdicted · none · ref 1 · internal anchor

Multimodal knowledge editing causes models to confuse original and edited entity identities in text queries by failing to update image-entity bindings and instead overfitting entity-entity shortcuts.

How Language Models Process Negation

cs.CL · 2026-05-04 · unverdicted · none · ref 10 · 2 links · internal anchor

LLMs process negation using both attention-based suppression and constructive representation mechanisms (construction dominant), with late-layer attention shortcuts explaining poor accuracy on negation tasks.

A framework for analyzing concept representations in neural models

cs.CL · 2026-05-02 · unverdicted · none · ref 271 · internal anchor

A new framework shows concept subspaces are not unique, estimator choice affects containment and disentanglement, LEACE works well but generalizes poorly, and HuBERT encodes phone info as contained and disentangled from speaker info while speaker info resists compact containment.

One Model to Translate Them All? A Journey to Mount Doom for Multilingual Model Merging

cs.CL · 2026-04-03 · unverdicted · none · ref 11 · internal anchor

Merging fine-tuned models for multilingual translation fails because fine-tuning redistributes language-specific neurons rather than sharpening them, increasing representational divergence in output-generating layers.

VASAE: Naming SAE Dictionary Directions with Vocabulary-Aligned Anchoring

cs.CL · 2026-06-26 · unverdicted · none · ref 7 · internal anchor

VASAE introduces vocabulary-aligned anchoring to train SAEs that yield features with intrinsic token names, reporting high alignment rates in early layers of GPT-2 and Llama-3.1 without reconstruction loss.

LMs as Task-Specific Knowledge Bases: An Interpretability Analysis

cs.CL · 2026-06-25 · unverdicted · none · ref 27 · internal anchor

LMs store facts in task-specific parameter subsets, shown by inconsistent emergence across tasks during training and distinct localized parameters for the same fact.

Cross-Lingual Exploration for Parametric Knowledge

cs.CL · 2026-06-23 · unverdicted · none · ref 43 · internal anchor

Cross-lingual prompt exploration improves factual recall and consistency in LLMs across 17 languages more efficiently than native-language scaling.

Behavioral and Representational Evidence of Binomial Ordering Preferences in Large Language Models

cs.CL · 2026-06-19 · unverdicted · none · ref 84 · internal anchor

LLMs recover dominant binomial orders from corpora but align less closely with exact preference distributions, with preference strength partially encoded in middle-to-late layers and manipulable via steering.

Want Better Synthetic Data? Steer It: Activation Steering for Low-Resource Language Generation

cs.CL · 2026-06-16 · unverdicted · none · ref 95 · internal anchor

Activation steering on early layers improves diversity of synthetic data for low-resource languages and often boosts downstream classifier performance compared to non-steered prompting.

Variable-Width Transformers

cs.CL · 2026-06-16 · conditional · none · ref 10 · internal anchor

×-shaped variable-width transformers outperform parameter-matched uniform baselines on language modeling loss with 22% fewer FLOPs and 15% smaller KV cache.

Substrate Asymmetry in User-Side Memory: A Diagnostic Framework

cs.CL · 2026-06-10 · unverdicted · none · ref 46 · internal anchor

User memory in LLMs factors into three orthogonal axes where parametric adapters and retrieval show opposite strengths, with causal evidence from attention interventions and an alignment tax on RLHF models.

Inside the LLM Word Factory

cs.CL · 2026-06-07 · unverdicted · none · ref 11 · internal anchor

Activation patching localizes English detokenization in Llama2-7B to a two-stage attention-then-MLP process at layer 1 that generalizes to 12 models from 8 families, with depth varying by positional encoding, plus an early-layer probe achieving 0.94-0.97 AUROC.

Expert-Aware Causal Tracing of Factual Recall in Sparse MoE Language Models

cs.CL · 2026-06-02 · unverdicted · none · ref 11 · internal anchor

Expert-aware causal tracing localizes factual recall to specific experts in some MoE models but requires coalitions in others, using CounterFact interventions on subject embeddings.

Resonant Context Anchoring: Decoupling Attention Routing and Signal Gain at Inference Time

cs.CL · 2026-06-01 · unverdicted · none · ref 51 · internal anchor

RCA is a training-free module that boosts input context signal strength in the residual stream of LLMs by orthogonal decoupling of attention routing from value magnitude.

Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments

cs.CL · 2026-05-05 · unverdicted · none · ref 14 · internal anchor

LaaB improves LLM hallucination detection by mapping self-judgment labels back into neural feature space and using mutual learning under logical consistency constraints between responses and meta-judgments.

From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization

cs.CL · 2026-04-21 · unverdicted · none · ref 19 · internal anchor

LLM 2-bit quantization fails via either cumulative signal degradation or early computation collapse in key components.

Representation-Guided Parameter-Efficient LLM Unlearning

cs.CL · 2026-04-19 · unverdicted · none · ref 183 · internal anchor

REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization to outperform prior methods on forget-retain trade-off in LLM benchmarks.

The Illusion of Latent Generalization: Bi-directionality and the Reversal Curse

cs.CL · 2026-03-13 · unverdicted · none · ref 3 · internal anchor

Bidirectional objectives mitigate reversal by requiring explicit source-as-target signals and storing directions as distinct representations instead of inducing latent generalization.

AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM

cs.CL · 2025-10-20 · unverdicted · none · ref 9 · internal anchor

AtlasKV integrates billion-scale KGs into LLMs parametrically with sub-linear complexity and low memory by converting triples into key-value representations handled by the model's attention.

How Training Data Shapes the Use of Parametric and In-Context Knowledge in Language Models

cs.CL · 2025-09-29 · unverdicted · none · ref 3 · internal anchor

Balanced parametric and in-context knowledge use in LLMs is an emergent property requiring intra-document repetition, moderate inconsistency, and skewed distributions in training data.

Rethinking LoRA Memory Through the Lens of KV Cache Compression

cs.CL · 2026-06-04 · unverdicted · none · ref 45 · internal anchor

Document LoRA acts as decoding-time parametric memory that recovers 13-21 ROUGE-L points under heavy KV cache compression in QA, performing best when the base model encodes the document and the adapter is used only at generation with QA supervision.

Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

cs.CL · 2026-05-12 · unverdicted · none · ref 8 · internal anchor

On-policy distillation gains efficiency from early foresight in module allocation and update directions, which the proposed EffOPD method exploits for 3x faster training with comparable performance.

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

cs.CL · 2026-01-20 · unverdicted · none · ref 83 · internal anchor

The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

KnowledgeDebugger -- an Exploration Tool for Knowledge Localization and Editing in Transformers

cs.CL · 2026-07-01 · unverdicted · none · ref 3 · internal anchor

The paper introduces KnowledgeDebugger, a GUI-based tool providing no-code access to EasyEdit methods for knowledge localization and editing in Transformers, demonstrated via case studies.

Multilingual Vision-Language Models, A Survey

cs.CL · 2025-09-26 · accept · none · ref 55 · internal anchor

The survey identifies a key tension in multilingual vision-language models between language neutrality via contrastive learning and cultural awareness via diverse data, with most benchmarks relying on translation-based evaluation.
