Interpreting key mechanisms of factual recall in transformer-based language models

Ang Lv, Yuhan Chen, Kaiyi Zhang, Yulong Wang, Lifeng Liu, Ji-Rong Wen, Jian Xie, Rui Yan · 2024 · arXiv 2403.19521

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 1 other 1

citation-polarity summary

background 1 unclear 1

representative citing papers

Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

cs.CL · 2026-06-05 · unverdicted · novelty 6.0

EmbedFilter applies a linear filter derived from the LLM unembedding matrix to suppress high-frequency token influences in text embeddings, yielding improved zero-shot performance and inherent dimensionality reduction.

Revisiting Parameter-Based Knowledge Editing in Large Language Models: Theoretical Limits and Empirical Evidence

cs.CL · 2026-05-30 · conditional · novelty 6.0

Parameter-based knowledge editing in LLMs induces reasoning collapse via dimensional collapse and is consistently outperformed by a retrieval baseline across varied edit counts, knowledge complexity, and evaluation metrics.

Context-Gated Associative Retrieval: From Theory to Transformers

cond-mat.dis-nn · 2026-05-08 · unverdicted · novelty 6.0

Context gating in associative memories boosts inter-memory separation and sparsity for exponential retrieval gains, admits a unique fixed point driven by direct bias and feedback, and matches in-context learning dynamics in transformers like Llama-3.

Do Transformers Use their Depth Adaptively? Evidence from a Relational Reasoning Task

cs.LG · 2026-04-14 · unverdicted · novelty 6.0

Transformers show limited adaptive depth use on relational reasoning, with clearer evidence after finetuning on the task.

Deep sequence models tend to memorize geometrically; it is unclear why

cs.LG · 2025-10-30 · unverdicted · novelty 6.0

Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.

Relational Linear Properties in Language Models: An Empirical Investigation

cs.LG · 2026-05-21 · unverdicted · novelty 5.0 · 2 refs

Introduces KL-divergence probing to test relational linearity and reports its variation across models, layers, and paraphrased queries on four datasets.

Tracing Relational Knowledge Recall in Large Language Models

cs.CL · 2026-04-21 · unverdicted · novelty 5.0

Per-head attention contributions to the residual stream serve as strong linear features for classifying relational knowledge in LLMs, with probe accuracy correlating to relation specificity and signal distribution.

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

cs.CL · 2026-01-20 · unverdicted · novelty 5.0

The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

citing papers explorer

Showing 8 of 8 citing papers.

Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings cs.CL · 2026-06-05 · unverdicted · none · ref 22
EmbedFilter applies a linear filter derived from the LLM unembedding matrix to suppress high-frequency token influences in text embeddings, yielding improved zero-shot performance and inherent dimensionality reduction.
Revisiting Parameter-Based Knowledge Editing in Large Language Models: Theoretical Limits and Empirical Evidence cs.CL · 2026-05-30 · conditional · none · ref 61
Parameter-based knowledge editing in LLMs induces reasoning collapse via dimensional collapse and is consistently outperformed by a retrieval baseline across varied edit counts, knowledge complexity, and evaluation metrics.
Context-Gated Associative Retrieval: From Theory to Transformers cond-mat.dis-nn · 2026-05-08 · unverdicted · none · ref 21
Context gating in associative memories boosts inter-memory separation and sparsity for exponential retrieval gains, admits a unique fixed point driven by direct bias and feedback, and matches in-context learning dynamics in transformers like Llama-3.
Do Transformers Use their Depth Adaptively? Evidence from a Relational Reasoning Task cs.LG · 2026-04-14 · unverdicted · none · ref 16
Transformers show limited adaptive depth use on relational reasoning, with clearer evidence after finetuning on the task.
Deep sequence models tend to memorize geometrically; it is unclear why cs.LG · 2025-10-30 · unverdicted · none · ref 110
Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.
Relational Linear Properties in Language Models: An Empirical Investigation cs.LG · 2026-05-21 · unverdicted · none · ref 8 · 2 links
Introduces KL-divergence probing to test relational linearity and reports its variation across models, layers, and paraphrased queries on four datasets.
Tracing Relational Knowledge Recall in Large Language Models cs.CL · 2026-04-21 · unverdicted · none · ref 15
Per-head attention contributions to the residual stream serve as strong linear features for classifying relational knowledge in LLMs, with probe accuracy correlating to relation specificity and signal distribution.
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models cs.CL · 2026-01-20 · unverdicted · none · ref 201
The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

Interpreting key mechanisms of factual recall in transformer-based language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer