hub Canonical reference

What learning algorithm is in-context learning? Investigations with linear models

· 2022 · cs.LG · arXiv 2211.15661

Canonical reference. 88% of citing Pith papers cite this work as background.

26 Pith papers citing it

Background 88% of classified citations

open full Pith review browse 26 citing papers arXiv PDF

abstract

Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples $(x, f(x))$ presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context. Using linear regression as a prototypical problem, we offer three sources of evidence for this hypothesis. First, we prove by construction that transformers can implement learning algorithms for linear models based on gradient descent and closed-form ridge regression. Second, we show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary, and converging to Bayesian estimators for large widths and depths. Third, we present preliminary evidence that in-context learners share algorithmic features with these predictors: learners' late layers non-linearly encode weight vectors and moment matrices. These results suggest that in-context learning is understandable in algorithmic terms, and that (at least in the linear case) learners may rediscover standard estimation algorithms. Code and reference implementations are released at https://github.com/ekinakyurek/google-research/blob/master/incontext.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 7 method 1

citation-polarity summary

background 7 use method 1

representative citing papers

The Statistical Cost of Adaptation in Multi-Source Transfer Learning

math.ST · 2026-05-10 · unverdicted · novelty 8.0

Multi-source transfer learning incurs an intrinsic adaptation cost that can exceed one, with phase transitions separating regimes where bias-agnostic estimators match oracle performance from those where they cannot.

Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods

cs.LG · 2025-06-12 · unverdicted · novelty 8.0

Transformers perform kernel-based prediction for Hölder regression on manifolds and achieve intrinsic-dimension-dependent minimax rates with sufficient training tasks.

Self-Attention as a Covariance Readout: A Unified View of In-Context Learning and Repetition

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Self-attention acts as a covariance readout that unifies in-context learning via population gradient descent and repetitive generation via asymptotic Markov behavior.

Mitigating Many-shot Jailbreak Attacks with One Single Demonstration

cs.CR · 2026-05-08 · conditional · novelty 7.0

A single safety demonstration appended at inference time mitigates many-shot jailbreak attacks by counteracting implicit malicious fine-tuning on harmful examples.

Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent

cs.LG · 2026-05-07 · conditional · novelty 7.0

Multi-layer transformers can implement in-context logistic regression by performing normalized gradient descent steps layer by layer, obtained via supervised training of a single attention layer followed by recurrent application with convergence and OOD guarantees.

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.

Elicitation Matters: How Prompts and Query Protocols Shape LLM Surrogates under Sparse Observations

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

LLM surrogate beliefs under sparse observations depend on prompts and query protocols, with structural prompts as priors, pointwise vs joint querying producing different beliefs, and sequential evidence causing non-monotonic updates that affect acquisition and regret.

Meta-Harness: End-to-End Optimization of Model Harnesses

cs.AI · 2026-03-30 · unverdicted · novelty 7.0

Meta-Harness discovers improved harness code for LLMs via agentic search over prior execution traces, yielding 7.7-point gains on text classification with 4x fewer tokens and 4.7-point gains on math reasoning across held-out models.

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

cs.LG · 2024-01-19 · conditional · novelty 7.0

Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.

Consistency Training while Mitigating Obfuscation via Rate Matching

cs.CL · 2026-06-01 · unverdicted · novelty 6.0

RMCT matches the rate of target behaviors like bias-following across input perturbations to reduce sycophancy in LLMs while preserving verbalization of bias cues.

CodeCytos: AI-assisted spatial molecular imaging analysis via code-augmented agent action space

cs.CV · 2026-05-30 · unverdicted · novelty 6.0

CodeCytos is a code-augmented reasoning agent framework for dynamic, programmable exploration of custom spatial cellular features in molecular imaging data across four tissue types.

Distributional Alignment as a Criterion for Designing Task Vectors in In-Context Learning

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

A distributional alignment metric d_NTP and a linear regression method LTV for task vectors that improves accuracy by 9.2% over baselines on classification and regression tasks across multiple LLMs.

When Tabular Foundation Models Meet Strategic Tabular Data: A Prior Alignment Approach

cs.AI · 2026-05-19 · unverdicted · novelty 6.0 · 2 refs

The paper proposes Strategic Prior-data Fitted Network (SPN), an inference-time framework that adapts pretrained tabular foundation models (PFNs) to strategic manipulation by aligning predictions with approximated post-manipulation distributions via strategic in-context examples.

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.

Spectral Transformer Neural Processes

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

STNPs extend TNPs with a spectral aggregator that estimates context spectra, forms spectral mixtures, and injects task-adaptive frequency features to better handle periodicity.

Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

Transformers can be built to act as nonlinear featurizers via attention, supporting in-context regression with proven generalization bounds on synthetic tasks.

Learning to Adapt: In-Context Learning Beyond Stationarity

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

Gated linear attention enables lower training and test errors in non-stationary in-context learning by adaptively modulating past inputs through a learnable recency bias under an autoregressive model of task evolution.

Amortized Inference of Causal Models via Conditional Fixed-Point Iterations

cs.LG · 2024-10-08 · unverdicted · novelty 6.0

Amortized transformer model with conditional fixed-point iterations learns SCM causal mechanisms from data and graphs, matching per-dataset baselines and outperforming in low-data regimes.

Otter: A Multi-Modal Model with In-Context Instruction Tuning

cs.CV · 2023-05-05 · unverdicted · novelty 6.0

Otter is a multi-modal model instruction-tuned on the MIMIC-IT dataset of over 3 million in-context instruction-response pairs to improve convergence and generalization on tasks with multiple images and videos.

Med-HEAL: Analyzing and Mitigating Hallucinations in Medical LLMs with Hallucination-Aware In-Context Learning

cs.CL · 2026-05-31 · unverdicted · novelty 5.0

Med-HEAL builds a hallucination dataset from BioMistral answers on EHRNoteQA via GPT-4o and human review, then shows self-critique improves accuracy in three of five tested LLMs without retraining.

TAPIOCA: Why Task- Aware Pruning Improves OOD model Capability

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

Task-aware pruning improves OOD model performance by realigning distorted OOD layerwise norm and pairwise-distance profiles with the task-adapted geometry observed on ID inputs.

One for All: A Non-Linear Transformer can Enable Cross-Domain Generalization for In-Context Reinforcement Learning

cs.LG · 2026-05-10 · unverdicted · novelty 5.0

Non-linear transformers enable cross-domain generalization in in-context RL by representing value functions from different domains with shared weights inside a shared RKHS.

When Context Sticks: Studying Interference in In-Context Learning

cs.LG · 2026-04-25 · unverdicted · novelty 5.0

In-context learning shows persistent interference from prior examples, with more misleading linear examples degrading quadratic predictions and training curricula modulating recovery speed.

Online In-Context Distillation for Low-Resource Vision Language Models

cs.CV · 2025-10-20 · unverdicted · novelty 5.0

Online In-Context Distillation lets small VLMs gain up to 33% performance with as little as 4% teacher annotations by distilling knowledge through dynamic in-context demonstrations at inference.

citing papers explorer

Showing 1 of 1 citing paper after filters.

One for All: A Non-Linear Transformer can Enable Cross-Domain Generalization for In-Context Reinforcement Learning cs.LG · 2026-05-10 · unverdicted · none · ref 2 · internal anchor
Non-linear transformers enable cross-domain generalization in in-context RL by representing value functions from different domains with shared weights inside a shared RKHS.

What learning algorithm is in-context learning? Investigations with linear models

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer