In-context Learning and Induction Heads
Pith reviewed 2026-05-11 03:44 UTC · model grok-4.3
The pith
Induction heads implement the core copying algorithm behind in-context learning in transformers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Induction heads are attention heads that implement a simple algorithm to complete token sequences of the form [A][B] ... [A] -> [B]. The authors present six complementary lines of evidence that these heads may constitute the mechanism for the majority of all in-context learning in large transformer models, and that they develop at precisely the point in training where a sudden, sharp increase in in-context learning ability appears.
What carries the argument
Induction heads: attention heads that match the current token against an earlier occurrence in the context and copy the token that followed that occurrence.
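The rule itself is simple enough to state as code. A minimal sketch in Python, purely illustrative: the function name and the list-of-tokens representation are ours, not the paper's.

```python
# Minimal sketch of the induction rule [A][B] ... [A] -> [B].
# Illustrative only: real induction heads implement a soft version of
# this lookup inside attention, not an explicit scan over tokens.

def induction_prediction(tokens):
    """Predict the next token by prefix matching and copying.

    Find the most recent earlier occurrence of the current (last)
    token and return the token that followed it; None if no match.
    """
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):  # scan right to left
        if tokens[i] == current:
            return tokens[i + 1]
    return None

# "The cat sat . The cat" -> the rule predicts "sat".
print(induction_prediction(["The", "cat", "sat", ".", "The", "cat"]))
```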
If this is right
- Induction heads emerge at the same moment training loss shows a sharp improvement on later tokens.
- In small attention-only models, directly ablating induction heads reduces in-context learning performance.
- The timing correlation between head formation and performance gains holds across model sizes.
- The mechanism appears general enough to explain in-context learning in transformers of any scale.
Where Pith is reading between the lines
- If induction heads are the primary driver, then interventions that speed their formation could shorten the training needed for strong few-shot behavior.
- The copying rule might also explain why transformers handle many different in-context tasks without task-specific fine-tuning.
- Checking whether non-attention architectures develop analogous copying circuits would test how specific this mechanism is to transformers.
Load-bearing premise
The formation of induction heads directly causes the observed jump in in-context learning rather than both changes arising together from some other training dynamic.
What would settle it
Train a transformer in which induction heads never form yet a sharp increase in in-context learning still appears at the same training step.
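The causal side of that test is the kind of head ablation already reported for small attention-only models. A hedged sketch of such an intervention in PyTorch, assuming a model whose attention module returns a [batch, seq, n_heads * d_head] tensor before the output projection; the module path and layout are hypothetical and vary by implementation.

```python
# Hedged sketch: knock out one attention head with a forward hook and
# compare in-context learning scores with and without the intervention.
# The module path below is hypothetical; shapes are assumptions.
import torch

def make_head_ablation_hook(head_idx: int, n_heads: int):
    def hook(module, inputs, output):
        # Zero the pre-projection slice belonging to one head.
        out = output.clone()
        d_head = out.shape[-1] // n_heads
        out[..., head_idx * d_head:(head_idx + 1) * d_head] = 0.0
        return out  # returning a tensor replaces the module output
    return hook

# Usage (hypothetical module path):
# handle = model.layers[5].attn.register_forward_hook(
#     make_head_ablation_hook(head_idx=7, n_heads=12))
# ... evaluate the in-context learning score, then handle.remove() ...
```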
Original abstract
"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence; for larger models with MLPs, we present correlational evidence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper hypothesizes that 'induction heads' (attention heads implementing a simple [A][B]...[A] -> [B] completion algorithm) are the primary mechanistic source of in-context learning in transformers, defined as the decrease in loss at increasing token indices. It reports that these heads emerge at the same training point as a sharp loss bump signaling increased in-context ability, presenting six lines of evidence: strong causal interventions (ablations/patching) for small attention-only models and correlational/timing-based evidence for larger models containing MLPs.
Significance. If the causal link holds, the work would supply a concrete mechanistic account of in-context learning, a core capability of large language models. The strong, reproducible causal interventions in small attention-only models constitute a clear strength, as do the multiple complementary observational measures (timing correlations, head activation patterns) that could guide future targeted experiments. The paper thereby advances mechanistic interpretability by linking a specific circuit to a broad behavioral phenomenon.
major comments (2)
- Abstract: The claim that induction heads 'might constitute the mechanism for the majority of all in-context learning' in large transformer models rests on correlational evidence only; the text states that the six lines of evidence for models with MLPs are 'preliminary and indirect' and 'correlational,' with no ablation, patching, or causal intervention results reported to show that disabling induction heads specifically impairs the observed in-context loss reduction.
- Description of the six lines of evidence (larger models): These lines rely on coincidence of induction-head emergence with the training loss bump and on observational metrics such as head activation timing; they do not include controls that would distinguish whether both phenomena are parallel downstream effects of an earlier training dynamic (e.g., a phase transition in optimization or representation geometry), leaving the causal inference untested for models containing MLPs.
minor comments (1)
- Abstract: Quantitative details on the magnitude of the loss bump, the fraction of heads identified as induction heads, and any error controls or statistical tests for the six lines of evidence would improve clarity and allow readers to assess the strength of the correlational results.
Simulated Author's Rebuttal
We thank the referee for the constructive review and for recognizing the value of the causal interventions in small models as well as the potential of the observational measures to guide future work. We agree that the distinction between causal and correlational evidence must be drawn more sharply in the abstract and discussion, and we will revise the manuscript to address both major comments.
Point-by-point responses
- Referee: Abstract: The claim that induction heads 'might constitute the mechanism for the majority of all in-context learning' in large transformer models rests on correlational evidence only; the text states that the six lines of evidence for models with MLPs are 'preliminary and indirect' and 'correlational,' with no ablation, patching, or causal intervention results reported to show that disabling induction heads specifically impairs the observed in-context loss reduction.
Authors: We accept the point. While the body of the paper already describes the evidence for models with MLPs as preliminary, indirect, and correlational, the abstract phrasing risks implying stronger support than exists. We will revise the abstract to state explicitly that the hypothesis for large models rests on correlational evidence from the six lines, without causal interventions such as ablation or patching, and to moderate the language concerning induction heads as the mechanism for the majority of in-context learning. revision: yes
- Referee: Description of the six lines of evidence (larger models): These lines rely on coincidence of induction-head emergence with the training loss bump and on observational metrics such as head activation timing; they do not include controls that would distinguish whether both phenomena are parallel downstream effects of an earlier training dynamic (e.g., a phase transition in optimization or representation geometry), leaving the causal inference untested for models containing MLPs.
Authors: The referee correctly notes that the six lines are observational and lack controls that could rule out alternative accounts in which induction-head emergence and the loss bump are both downstream of an earlier training dynamic. We do not claim to have performed such controls. In revision we will add an explicit limitations paragraph in the discussion that acknowledges this gap, lists possible alternative explanations (including phase transitions in optimization or representation geometry), and clarifies that the lines of evidence are intended to be suggestive and to motivate targeted causal experiments rather than to demonstrate causality. revision: yes
Circularity Check
No significant circularity; hypothesis rests on timing correlations and interventions rather than definitional reduction
full rationale
The paper defines induction heads via their observable attention pattern on token sequences and presents empirical evidence (simultaneous emergence with loss bump, six lines of correlational evidence for large models, and causal ablations for small attention-only models) that they contribute to in-context learning. No step reduces a claimed prediction or result to a fitted parameter or self-citation by construction; the central hypothesis is explicitly labeled preliminary and indirect, with the link to decreasing loss at later token indices argued via external observations rather than tautological redefinition. The derivation chain is self-contained against the provided benchmarks.
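The "observable attention pattern" referenced above is typically measured on a block of random tokens repeated twice: an induction head attends from each token in the second repeat to the position just after that token's first occurrence. A minimal sketch of that diagnostic, with the attention layout [query, key] and the function name as our assumptions.

```python
# Sketch of a prefix-matching score for one attention head, evaluated
# on a random token block repeated twice (length 2 * block_len).
# Layout assumption: attn[q, k] is the attention weight from query
# position q to key position k for a single head.
import torch

def prefix_matching_score(attn: torch.Tensor, block_len: int) -> float:
    """Average attention mass on the induction target positions."""
    score = 0.0
    for q in range(block_len, 2 * block_len):
        # Query q repeats the token at q - block_len; the induction
        # target is the position just after that first occurrence.
        target = q - block_len + 1
        score += attn[q, target].item()
    return score / block_len
```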
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Induction heads implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]
- domain assumption A sharp increase in in-context learning ability is visible as a bump in the training loss curve
Forward citations
Cited by 60 Pith papers
- WriteSAE: Sparse Autoencoders for Recurrent State. WriteSAE is the first sparse autoencoder that factors decoder atoms into the native d_k x d_v cache write shape of recurrent models and supplies a closed-form per-token logit shift for atom substitution.
- WriteSAE: Sparse Autoencoders for Recurrent State. WriteSAE decomposes recurrent model cache writes into substitutable atoms with a closed-form logit shift, achieving high substitution success and targeted behavioral installs on models like Qwen3.5 and Mamba-2.
- Slot Machines: How LLMs Keep Track of Multiple Entities. LLM activations encode current and prior entities in orthogonal slots, but models only use the current slot for explicit factual retrieval despite prior-slot information being linearly decodable.
- Layerwise Dynamics for In-Context Classification in Transformers. Enforcing feature- and label-permutation equivariance in transformers for in-context classification yields an identifiable emergent update rule driven by mixed feature-label Gram matrices that amplifies class separation.
- The Spectral Lifecycle of Transformer Training: Transient Compression Waves, Persistent Spectral Gradients, and the Q/K--V Asymmetry. Transformer weight spectra exhibit transient compression waves that propagate layer-wise, persistent non-monotonic depth gradients in power-law exponents, and Q/K-V asymmetry, with the spectral exponent alpha predicti...
- The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. The AI Scientist framework enables LLMs to independently conduct the full scientific process from idea generation to paper writing and review, demonstrated across three ML subfields with papers costing under $15 each.
- KAN: Kolmogorov-Arnold Networks. KANs with learnable univariate spline activations on edges achieve better accuracy than MLPs with fewer parameters, faster scaling, and direct visualization for scientific discovery.
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small. GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.
- Diagnosing and Correcting Concept Omission in Multimodal Diffusion Transformers. Text embeddings in MM-DiTs contain a detectable omission signal for missing concepts, and amplifying it via OSI reduces concept omission in generated images on FLUX.1-Dev and SD3.5-Medium.
- Assessing the Creativity of Large Language Models: Testing, Limits, and New Frontiers. The Divergent Remote Association Test (DRAT) is the first creativity test that significantly predicts LLMs' scientific ideation ability, unlike prior tests such as DAT or RAT.
- Learning Less Is More: Premature Upper-Layer Attention Specialization Hurts Language Model Pretraining. Temporarily reducing the learning rate on upper-layer query and key projections during early GPT pretraining prevents premature attention specialization and improves model performance.
- Self-Attention as a Covariance Readout: A Unified View of In-Context Learning and Repetition. Self-attention acts as a covariance readout that unifies in-context learning via population gradient descent and repetitive generation via asymptotic Markov behavior.
- From Mechanistic to Compositional Interpretability. Compositional interpretability defines explanations as commuting syntactic-semantic mapping pairs grounded in compositionality and minimum description length, with compressive refinement and a parsimony theorem guaran...
- Understanding Performance Collapse in Layer-Pruned Large Language Models via Decision Representation Transitions. Performance collapse in layer-pruned LLMs stems from disrupting the Silent Phase of decision-making, which blocks the transition to correct predictions, while the later Decisive Phase is robust to pruning.
- Elicitation Matters: How Prompts and Query Protocols Shape LLM Surrogates under Sparse Observations. LLM surrogate beliefs under sparse observations depend on prompts and query protocols, with structural prompts as priors, pointwise vs joint querying producing different beliefs, and sequential evidence causing non-mo...
- Task Vector Geometry Underlies Dual Modes of Task Inference in Transformers. In a controlled synthetic setting, transformers implement in-distribution task inference via convex combinations of task vectors and out-of-distribution inference via nearly orthogonal extrapolative representations.
- How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models. A fitted iso-depth scaling law measures that one recurrence in looped transformers is worth r^0.46 unique blocks in validation loss.
- Cell-Based Representation of Relational Binding in Language Models. Large language models encode relational bindings via a cell-based representation: a low-dimensional linear subspace in which each cell corresponds to an entity-relation index pair and attributes are retrieved from the...
- Committed SAE-Feature Traces for Audited-Session Substitution Detection in Hosted LLMs. A Merkle-committed SAE feature-trace protocol detects model substitutions in hosted LLMs at a stable threshold where parallel-probe baselines fail, including against adaptive LoRA attackers.
- HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads. HeadRank improves decoding-free passage reranking by preference-aligning attention heads to increase discriminability in middle-context documents, outperforming baselines on 14 benchmarks with only 211 training queries.
- Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs. The paper delivers the first survey of abductive reasoning in LLMs, a unified two-stage taxonomy, a compact benchmark, and an analysis of gaps relative to deductive and inductive reasoning.
- Screening Is Enough. Multiscreen replaces softmax attention with screening to provide absolute query-key relevance, resulting in models with 30% fewer parameters that maintain stable performance at long contexts.
- Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory. Muon achieves higher storage capacity than SGD and matches Newton's method in one-step recovery rates for associative memory under power-law distributions, while saturating at larger critical batch sizes and showing f...
- Jamba: A Hybrid Transformer-Mamba Language Model. Jamba presents a hybrid Transformer-Mamba MoE architecture for LLMs that delivers state-of-the-art benchmark performance and strong results up to 256K token contexts while fitting in one 80GB GPU with high throughput.
- Steering Language Models With Activation Engineering. Activation Addition steers language models by adding contrastive activation vectors from prompt pairs to control high-level properties like sentiment and toxicity at inference time without training.
- Dynamics of the Transformer Residual Stream: Coupling Spectral Geometry to Network Topology. Training installs a depth-dependent spectral gradient and low-rank bottleneck in LLM residual streams whose amplification or suppression of graph communities is predicted by local operator type.
- Fusion-fission forecasts when AI will shift to undesirable behavior. A vector generalization of fusion-fission group dynamics from physics forecasts when AI behavior shifts to undesirable states, validated at 90 percent across seven models and prior to real-world data.
- When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction. Attention to goal tokens declines in multi-turn LLM interactions while residual representations often retain decodable goal information, and the gap between these predicts whether goal-conditioned behavior survives.
- Toward Stable Value Alignment: Introducing Independent Modules for Consistent Value Guidance. SVGT adds independent value modules and Bridge Tokens to LLMs to maintain consistent value guidance, cutting harmful outputs by over 70% in tests while preserving fluency.
- Instructions Shape Production of Language, not Processing. Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.
- Interpretability Can Be Actionable. Interpretability research should be judged by actionability, the degree to which its insights support concrete decisions and interventions, rather than explanatory power alone.
- Architecture, Not Scale: Circuit Localization in Large Language Models. Grouped query attention produces more concentrated and stable circuits than multi-head attention across tasks and scales in Pythia and Qwen2.5 models, with a phase transition in factual recall circuits.
- The Propagation Field: A Geometric Substrate Theory of Deep Learning. Neural networks possess a propagation field of trajectories and Jacobians whose quality can be measured and optimized independently of endpoint loss, yielding better unseen-path generalization and reduced forgetting i...
- Belief or Circuitry? Causal Evidence for In-Context Graph Learning. Causal evidence from representation analysis and interventions shows LLMs use both genuine structure inference and induction circuits in parallel for in-context graph learning.
- Priming: Hybrid State Space Models From Pre-trained Transformers. Priming transfers knowledge from pre-trained Transformers to hybrid SSM-attention models, recovering performance with minimal additional tokens and showing Gated KalmaNet outperforming Mamba-2 on long-context reasonin...
- Large Vision-Language Models Get Lost in Attention. In LVLMs, attention can be replaced by random Gaussian weights with little or no performance loss, indicating that current models get lost in attention rather than efficiently using visual context.
- Critical Windows of Complexity Control: When Transformers Decide to Reason or Memorize. Transformers show a sharp, task-specific critical window for weight decay application that determines reasoning versus memorization, with middle placement optimal and boundaries as narrow as 100 steps.
- What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis. In LLM agents, memory routing circuits emerge at 0.6B scale while content circuits appear only at 4B, and write/read operations recruit a pre-existing late-layer context hub instead of creating a new one, enabling a 7...
- What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis. Circuit analysis reveals that routing circuits for agent memory emerge at 0.6B parameters while content circuits emerge at 4B, with a shared grounding hub and an unsupervised diagnostic achieving 76.2% accuracy for lo...
- Finite-Size Gradient Transport in Large Language Model Pretraining: From Cascade Size to Intensive Transport Efficiency. A gradient-transport framework with observables D, z, β, δ, v_rel applied to Pico-LM and Pythia datasets shows distinct scaling regimes in duration and efficiency while sharing a near-unity cascade-size backbone.
- Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection. Refusal in LLMs leaves a detectable upstream trajectory that SALO exploits to raise jailbreak detection from near zero to over 90 percent even under forced-decoding attacks.
- When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models. LLM accuracy on controlled procedural arithmetic drops from 61% at 5 steps to 20% at 95 steps, with failures including skipped steps, premature answers, and hallucinated operations.
- Defusing the Trigger: Plug-and-Play Defense for Backdoored LLMs via Tail-Risk Intrinsic Geometric Smoothing. TIGS detects backdoor-induced attention collapse in LLMs and applies content-aware tail-risk screening plus intrinsic geometric smoothing to suppress attacks while preserving normal performance.
- BVI-Mamba: Video Enhancement Using a Visual State-Space Model for Low-Light and Underwater Environments. BVI-Mamba enhances low-light and underwater videos by combining feature alignment with a UNet architecture built from Visual State Space blocks, claiming better quality and efficiency than prior Transformer or convolu...
- Omission Constraints Decay While Commission Constraints Persist in Long-Context LLM Agents. Omission constraints in LLM agents decay with conversation length while commission constraints remain stable, creating an invisible security failure.
- The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference. FP16 KV caching in transformers causes deterministic token divergence versus cache-free inference due to non-associative floating-point accumulation orderings.
- Weight Patching: Toward Source-Level Mechanistic Localization in LLMs. Weight Patching localizes capabilities to specific parameter modules in LLMs by replacing weights from a behavior-specialized model into a base model and validating recovery via a vector-anchor interface, revealing a ...
- Parcae: Scaling Laws For Stable Looped Language Models. Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth...
- LoopGuard: Breaking Self-Reinforcing Attention Loops via Dynamic KV Cache Intervention. LoopGuard detects attention collapse loops during LLM decoding and prunes repetitive KV cache tail spans under fixed budget, cutting loop incidence by over 90 percentage points on the new LoopBench benchmark.
- Transformer See, Transformer Do: Copying as an Intermediate Step in Learning Analogical Reasoning. Including copying tasks in training enables transformers to learn letter-string analogies, improving generalization to new alphabets with a 3-layer model outperforming some frontier models.
- The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning. LLMs discover latent planning strategies up to five steps during training and execute them up to eight steps at test time, with larger models reaching seven under few-shot prompting, revealing a dissociation between d...
- In-Place Test-Time Training. In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.
- Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement. MTP induces representational contractivity for coherent world models in LLMs but causes illegal latent shortcuts; LSE-MTP anchors to true trajectories to reduce hallucinations and improve consistency.
- Readable Minds: Emergent Theory-of-Mind-Like Behavior in LLM Poker Agents. Persistent memory is necessary and sufficient for LLM poker agents to reach ToM levels 3-5 and use strategic deception, while agents without memory stay at level 0.
- Automated Attention Pattern Discovery at Scale in Large Language Models. AP-MAE reconstructs masked attention patterns in LLMs with high accuracy, generalizes across models, predicts generation correctness at 55-70%, and enables 13.6% accuracy gains via targeted interventions.
- SnapKV: LLM Knows What You are Looking for Before Generation. SnapKV selects clustered important KV positions per attention head from an observation window at the prompt end, yielding 3.6x faster generation and 8.2x better memory efficiency on 16K-token inputs with comparable pe...
- Instructions Shape Production of Language, not Processing. Instructions primarily shape the production stage of language models rather than the processing stage, with task-specific information and causal effects stronger in output tokens than input tokens.
- HyperLens: Quantifying Cognitive Effort in LLMs with Fine-grained Confidence Trajectory. HyperLens reveals that deeper transformer layers magnify small confidence changes into fine-grained trajectories, allowing quantification of cognitive effort where complex tasks demand more and standard SFT can reduce it.
- Negative Before Positive: Asymmetric Valence Processing in Large Language Models. Negative valence localizes to early layers and positive valence to mid-to-late layers in LLMs, with the directions being causally steerable.
- When Context Sticks: Studying Interference in In-Context Learning. In-context learning shows persistent interference from prior examples, with more misleading linear examples degrading quadratic predictions and training curricula modulating recovery speed.
Reference graph
Works this paper leans on
- [1] Language Models are Few-Shot Learners. arXiv:2005.14165.
- [2] Evaluating Large Language Models Trained on Code. arXiv:2107.03374.
- [3] Towards a Human-like Open-Domain Chatbot. arXiv:2001.09977.
- [4] Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets. arXiv:2201.02177.
- [5] Scaling Laws for Neural Language Models. arXiv:2001.08361.
- [6] arXiv:2012.15832.
- [7] A General Language Assistant as a Laboratory for Alignment. arXiv:2112.00861.
- [8] A Multiscale Visualization of Attention in the Transformer Model. arXiv:1906.05714.
- [9] Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned. arXiv:1905.09418.
- [10] What Does BERT Look At? An Analysis of BERT's Attention. arXiv:1906.04341.
- [11] Do Attention Heads in BERT Track Syntactic Dependencies? arXiv:1911.12246.
- [12] Attention Is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth. arXiv:2103.03404.
- [13] What Context Features Can Transformer Language Models Use? arXiv:2106.08367.
- [14] An Explanation of In-context Learning as Implicit Bayesian Inference. arXiv:2111.02080.
- [15] Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? arXiv:2202.12837.
- [16] Journal of Statistical Mechanics: Theory and Experiment, 2021(12), 124003.
- [17] Qualitatively Characterizing Neural Network Optimization Problems. arXiv:1412.6544.
- [18] Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes. arXiv:2104.11044.
- [19] Zoom In: An Introduction to Circuits. Distill. DOI: 10.23915/distill.00024.001.
- [20] Similarity of Neural Network Representations Revisited. arXiv:1905.00414.
- [21] High-Low Frequency Detectors. Distill. DOI: 10.23915/distill.00024.005.
- [22] Multimodal Neurons in Artificial Neural Networks. Distill. DOI: 10.23915/distill.00030.
- [23] Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473.
- [24] Listen, Attend and Spell. arXiv:1508.01211.
- [25] Olsson, C., Elhage, N., Nanda, N., Joseph, N., DasSarma, N., Henighan, T., Mann, B., Askell, A., Bai, Y., Chen, A., Conerly, T., Drain, D., Ganguli, D., Hatfield-Dodds, Z., et al. (2022). In-context Learning and Induction Heads.