What learning algorithm is in-context learning? Investigations with linear models
citing papers
-
The Statistical Cost of Adaptation in Multi-Source Transfer Learning
Multi-source transfer learning incurs an intrinsic adaptation cost that can exceed one, with phase transitions separating regimes where bias-agnostic estimators match oracle performance from those where they cannot.
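To fix ideas, a hedged formalization (an assumption for intuition, not necessarily the paper's exact definition): take the adaptation cost to be the ratio of the best bias-agnostic worst-case risk to the oracle's, so it is at least one by construction and the phase transitions mark where it strictly exceeds one.

```latex
% Assumed formalization of the adaptation cost: the ratio of the best
% achievable worst-case risk without knowing the source biases to the
% oracle's worst-case risk R*(P). The quantity is >= 1 by construction,
% and "cost > 1" marks the regime where no bias-agnostic estimator
% matches the oracle.
\[
  \mathrm{cost}(\mathcal{P})
  \;=\;
  \frac{\inf_{\hat{\theta}\ \text{bias-agnostic}}\ \sup_{P \in \mathcal{P}} R(\hat{\theta}, P)}
       {\sup_{P \in \mathcal{P}} R^{*}(P)}
  \;\ge\; 1 .
\]
```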
-
Self-Attention as a Covariance Readout: A Unified View of In-Context Learning and Repetition
Self-attention acts as a covariance readout that unifies in-context learning via population gradient descent and repetitive generation via asymptotic Markov behavior.
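A minimal numerical sketch of the covariance-readout claim, under the standard assumed setup (linear attention on in-context linear regression; this mirrors the known one-GD-step construction rather than the paper's exact model):

```python
import numpy as np

# The linear-attention prediction q^T (X^T y / n) -- a covariance readout --
# equals one gradient-descent step on in-context least squares from w = 0.
def linear_attention_readout(X, y, q, lr=1.0):
    """X: (n, d) context inputs, y: (n,) targets, q: (d,) query."""
    cov = X.T @ y / len(y)        # empirical input-target covariance
    return lr * q @ cov

def one_gd_step_prediction(X, y, q, lr=1.0):
    """One GD step from w = 0 on (1/2n) sum_i (w^T x_i - y_i)^2, then predict."""
    grad = -X.T @ y / len(y)      # gradient of the loss at w = 0
    return q @ (-lr * grad)

rng = np.random.default_rng(0)
X, q = rng.normal(size=(8, 4)), rng.normal(size=4)
y = X @ rng.normal(size=4)
assert np.isclose(linear_attention_readout(X, y, q), one_gd_step_prediction(X, y, q))
```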
-
Mitigating Many-shot Jailbreak Attacks with One Single Demonstration
A single safety demonstration appended at inference time mitigates many-shot jailbreak attacks by counteracting the attack's implicit fine-tuning of the model on harmful examples.
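A sketch of the mitigation as a prompt-construction step (the template and placement are assumptions, not the paper's exact prompt):

```python
# One refusal demonstration placed after the attacker-controlled shots,
# immediately before the real query; the demo text is a hypothetical template.
SAFETY_DEMO = (
    "User: <harmful request>\n"
    "Assistant: I can't help with that request."
)

def build_prompt(many_shot_context: str, user_query: str) -> str:
    return f"{many_shot_context}\n{SAFETY_DEMO}\nUser: {user_query}\nAssistant:"
```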
-
Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent
Multi-layer transformers can implement in-context logistic regression by performing normalized gradient descent steps layer by layer; the construction comes from supervised training of a single attention layer that is then applied recurrently, with convergence and out-of-distribution guarantees.
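For reference, a plain implementation of the update rule the summary describes, assuming the normalized-GD form (each recurrent layer application corresponds to one step):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def normalized_gd_logistic(X, y, steps=50, lr=1.0):
    """X: (n, d) context inputs, y in {0, 1}^n. Steps move along -grad/||grad||."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        norm = np.linalg.norm(grad)
        if norm < 1e-12:           # gradient vanished: converged
            break
        w -= lr * grad / norm      # normalization is the distinctive step
    return w
```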
-
Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.
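A simplified contrast of the two interventions (illustrative only; the tangent-space projection below is a crude stand-in for the paper's geometry estimate):

```python
import numpy as np

# Linear steering adds a fixed direction wherever the state sits, which can
# leave the activation manifold; a manifold-aware step first projects the
# direction onto a local tangent basis estimated from sampled activations.
def linear_steer(h, direction, alpha=1.0):
    return h + alpha * direction                     # may go off-manifold

def tangent_steer(h, direction, activation_sample, k=10, alpha=1.0):
    """activation_sample: (m, d) activations near h; k: tangent dimension."""
    A = activation_sample - activation_sample.mean(axis=0)
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    T = Vt[:k]                                       # (k, d) local basis
    return h + alpha * (T.T @ (T @ direction))       # projected step
```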
-
Elicitation Matters: How Prompts and Query Protocols Shape LLM Surrogates under Sparse Observations
LLM surrogate beliefs under sparse observations depend on prompts and query protocols: structural prompts act as priors, pointwise versus joint querying produces different beliefs, and sequential evidence causes non-monotonic updates that affect acquisition and regret.
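A schematic of the two query protocols (the prompt wording and the `llm` callable are hypothetical):

```python
# Pointwise querying elicits each unknown in a separate call; joint querying
# elicits all unknowns at once, letting the model express dependence between them.
def pointwise_queries(llm, context, xs):
    return [llm(f"{context}\nPredict y at x = {x}.") for x in xs]

def joint_query(llm, context, xs):
    return llm(f"{context}\nJointly predict y at each x in {list(xs)}.")
```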
-
Meta-Harness: End-to-End Optimization of Model Harnesses
Meta-Harness discovers improved harness code for LLMs via agentic search over prior execution traces, yielding 7.7-point gains on text classification with 4x fewer tokens and 4.7-point gains on math reasoning across held-out models.
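The search loop, schematically (all callables are hypothetical; this is just the generic propose-evaluate-keep-best pattern the summary suggests, not the paper's exact algorithm):

```python
# An LLM agent proposes new harness code conditioned on prior execution
# traces; candidates are scored on a dev set and the best one is kept.
def meta_harness_search(propose, evaluate, seed_harness, n_iters=10):
    best_score, trace = evaluate(seed_harness)
    best, traces = seed_harness, [trace]
    for _ in range(n_iters):
        candidate = propose(best, traces)     # agent writes new harness code
        score, trace = evaluate(candidate)    # run it and keep the trace
        traces.append(trace)
        if score > best_score:
            best, best_score = candidate, score
    return best
```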
-
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.
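A schematic of the head design (simplified from the published description, which uses a SiLU residual block; toy dimensions used here):

```python
import torch
import torch.nn as nn

# Head k is a small residual MLP on the base model's final hidden state,
# predicting the token k positions ahead. Candidates from all heads are
# verified together in one forward pass via tree attention (omitted).
class MedusaHead(nn.Module):
    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, h):                  # h: (batch, hidden_size)
        h = h + torch.relu(self.proj(h))   # residual block
        return self.lm_head(h)             # logits for the k-th next token

# Toy sizes for illustration; real models use e.g. 4096 x 32000.
heads = nn.ModuleList(MedusaHead(64, 100) for _ in range(4))
```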
-
Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space
LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.
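One assumed way to surface such a trajectory (an illustrative analysis, not necessarily the paper's method): project per-example hidden states onto their top principal components.

```python
import numpy as np

def belief_trajectory(hidden_states, k=3):
    """hidden_states: (num_examples, d) states captured after each shot.
    Returns (num_examples, k) coordinates of the in-context trajectory."""
    H = hidden_states - hidden_states.mean(axis=0)
    _, _, Vt = np.linalg.svd(H, full_matrices=False)
    return H @ Vt[:k].T
```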
-
Spectral Transformer Neural Processes
Spectral Transformer Neural Processes (STNPs) extend Transformer Neural Processes with a spectral aggregator that estimates context spectra, forms spectral mixtures, and injects task-adaptive frequency features to better handle periodicity.
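A rough sketch of what a spectral aggregator could compute (an assumed mechanism; this simple version requires uniformly spaced context points):

```python
import numpy as np

# Estimate dominant frequencies from the context's periodogram, then expose
# sin/cos features at those frequencies as task-adaptive inputs for queries.
def spectral_features(t, y, t_query, n_freqs=3):
    """t, y: (n,) uniformly spaced context times/targets; t_query: (m,)."""
    power = np.abs(np.fft.rfft(y - y.mean())) ** 2
    freqs = np.fft.rfftfreq(len(y), d=t[1] - t[0])
    top = freqs[np.argsort(power)[-n_freqs:]]          # dominant frequencies
    feats = [f(2 * np.pi * f0 * t_query)
             for f0 in top for f in (np.sin, np.cos)]
    return np.stack(feats, axis=-1)                    # (m, 2 * n_freqs)
```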
-
Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer
Transformers can be built to act as nonlinear featurizers via attention, supporting in-context regression with proven generalization bounds on synthetic tasks.
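A toy end-to-end version of this reading (assumptions: one random softmax-attention layer as a fixed nonlinear featurizer, ridge regression as the linear readout):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_features(X, Wq, Wk, Wv):
    """X: (n, d) -> nonlinear features (n, d) from one attention layer."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)
    return A @ V

rng = np.random.default_rng(0)
n, d = 64, 8
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)                  # nonlinear target
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
Phi = attention_features(X, Wq, Wk, Wv)
w = np.linalg.solve(Phi.T @ Phi + 1e-2 * np.eye(d), Phi.T @ y)  # ridge readout
y_hat = Phi @ w
```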
-
Learning to Adapt: In-Context Learning Beyond Stationarity
Gated linear attention enables lower training and test errors in non-stationary in-context learning by adaptively modulating past inputs through a learnable recency bias under an autoregressive model of task evolution.
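The recurrence in question, in its simplest scalar-gate form (assumed; learnable, input-dependent gates are the general case):

```python
import numpy as np

# A gate g in (0, 1) decays the running key-value state, implementing the
# recency bias that down-weights stale tasks in non-stationary ICL.
def gated_linear_attention(K, V, Q, gate=0.9):
    """K, Q: (T, d_k); V: (T, d_v). Returns outputs (T, d_v)."""
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))                   # decayed sum of k v^T pairs
    out = np.empty((len(Q), d_v))
    for t in range(len(Q)):
        S = gate * S + np.outer(K[t], V[t])    # recency-biased state update
        out[t] = Q[t] @ S                      # readout for the current query
    return out
```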
-
One for All: A Non-Linear Transformer can Enable Cross-Domain Generalization for In-Context Reinforcement Learning
Non-linear transformers enable cross-domain generalization in in-context RL by representing value functions from different domains with shared weights inside a shared RKHS.
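Schematically, the shared-RKHS claim can be written as follows (an assumed formalization, for intuition only):

```latex
% One feature map phi, realized by the shared non-linear transformer weights,
% represents every domain d's value function linearly in a common RKHS H.
\[
  Q_d(s, a) \;=\; \big\langle \theta_d,\, \phi(s, a) \big\rangle_{\mathcal{H}},
  \qquad \phi \ \text{shared across domains } d, \quad \theta_d \in \mathcal{H}.
\]
```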
-
When Context Sticks: Studying Interference in In-Context Learning
In-context learning shows persistent interference from prior examples: more misleading linear examples degrade predictions on a quadratic task, and the training curriculum modulates how quickly performance recovers.
-
High-Dimensional Statistics: Reflections on Progress and Open Problems
A survey synthesizing representative advances, common themes, and open problems in high-dimensional statistics while pointing to key entry-point works.