Language Models are Few-Shot Learners
4 Pith papers cite this work.
citing papers explorer
- Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
  Randomly replacing labels in in-context demonstrations barely hurts performance, showing that label space, input distribution, and sequence format drive in-context learning more than ground-truth labels.
- Transformers Provably Implement In-Context Reinforcement Learning with Policy Improvement
  Linear self-attention transformers provably implement in-context SARSA and actor-critic via explicit constructions, with gradient flow converging exponentially to the target parameter manifold under rich training MDPs.
- One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models
  Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.
- Is Conditional Generative Modeling all you need for Decision-Making?
  Return-conditional diffusion models for policies outperform offline RL on benchmarks by circumventing dynamic programming and enable constraint or skill composition.