A Survey on In-context Learning
Pith reviewed 2026-05-12 12:53 UTC · model grok-4.3
The pith
In-context learning allows large language models to make predictions from prompts that include a few task examples without parameter updates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In-context learning has emerged as a new paradigm for natural language processing in which large language models make predictions based on contexts augmented with a few examples, and this survey organizes the techniques, applications, and open problems to facilitate further research into how such models extrapolate abilities from limited context.
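To make "contexts augmented with a few examples" concrete, here is a minimal sketch of how such a prompt might be assembled. The template ("Review:" / "Sentiment:"), the function name `build_icl_prompt`, and the toy demonstrations are all illustrative assumptions, not taken from the paper; no model is called and no parameters are updated.

```python
from typing import List, Tuple


def build_icl_prompt(
    demos: List[Tuple[str, str]], query: str, instruction: str = ""
) -> str:
    """Concatenate an optional instruction, k demonstrations, and the query.

    The "Review:/Sentiment:" template is a hypothetical choice for illustration.
    """
    parts = []
    if instruction:
        parts.append(instruction)
    # Each demonstration is rendered as an (input, label) pair.
    for text, label in demos:
        parts.append(f"Review: {text}\nSentiment: {label}")
    # The query reuses the same template but leaves the answer slot empty,
    # so a language model would be asked to continue from "Sentiment:".
    parts.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(parts)


# Toy demonstrations (made up for this sketch).
demos = [
    ("Delicious food and friendly staff.", "positive"),
    ("The service was painfully slow.", "negative"),
]
prompt = build_icl_prompt(demos, "Great value for the price.")
print(prompt)
```

The resulting string would be sent to a frozen language model as-is; everything the model "learns" about the task comes from the demonstrations inside the prompt.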
What carries the argument
The formal definition of in-context learning as the process by which large language models generate outputs conditioned on a prompt that contains a small number of demonstration examples.
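One common way to write this definition down is sketched below. The notation is an assumption for illustration and may differ from the paper's own: $x$ is the query, $Y = \{y_1, \dots, y_m\}$ the candidate answers, $C$ the demonstration set with an optional instruction $I$ and a template $s(\cdot,\cdot)$ that renders each example as text, and $f_{\mathcal{M}}$ the score assigned by a frozen model $\mathcal{M}$.

```latex
C = \{\, I,\ s(x_1, y_1),\ \dots,\ s(x_k, y_k) \,\}
\qquad
P(y_j \mid x) \triangleq f_{\mathcal{M}}(y_j,\, C,\, x)
\qquad
\hat{y} = \arg\max_{y_j \in Y} P(y_j \mid x)
```

The point of the formalization is that the prediction $\hat{y}$ depends on the demonstrations only through the conditioning context $C$; the parameters of $\mathcal{M}$ stay fixed.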
Load-bearing premise
The rapidly expanding body of ICL literature can be usefully organized and summarized within the scope and selection criteria of a single survey paper.
What would settle it
Publication of a substantial set of subsequent papers that introduce major ICL techniques or challenges absent from the survey's categories would show the organization does not capture the field's current state.
Original abstract
With the increasing capabilities of large language models (LLMs), in-context learning (ICL) has emerged as a new paradigm for natural language processing (NLP), where LLMs make predictions based on contexts augmented with a few examples. It has been a significant trend to explore ICL to evaluate and extrapolate the ability of LLMs. In this paper, we aim to survey and summarize the progress and challenges of ICL. We first present a formal definition of ICL and clarify its correlation to related studies. Then, we organize and discuss advanced techniques, including training strategies, prompt designing strategies, and related analysis. Additionally, we explore various ICL application scenarios, such as data engineering and knowledge updating. Finally, we address the challenges of ICL and suggest potential directions for further research. We hope that our work can encourage more research on uncovering how ICL works and improving ICL.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper surveys in-context learning (ICL) as an emerging paradigm for large language models (LLMs) in NLP. It begins with a formal definition of ICL and its relation to related concepts, then organizes advanced techniques (training strategies, prompt design, and analyses), reviews applications (e.g., data engineering and knowledge updating), and concludes by addressing challenges and suggesting future research directions.
Significance. A structured survey on ICL would be moderately significant given the field's rapid expansion, as it could consolidate literature on techniques, applications, and open problems to guide researchers. The logical organization from definition through techniques and applications to challenges supports its utility as a reference, though its impact depends on the depth and balance of coverage across the cited works.
minor comments (3)
- [Introduction/Abstract] The abstract states that the paper clarifies ICL's correlation to related studies, but the provided outline does not specify how overlaps with few-shot prompting or meta-learning are delineated; adding a dedicated subsection or table comparing these would improve clarity.
- The discussion of prompt designing strategies is listed as a core technique area, yet without explicit criteria for paper selection or inclusion of quantitative benchmarks across methods, the summary risks appearing selective; a methods section detailing search terms and coverage would strengthen the survey.
- [Challenges and Future Directions] In the challenges section, the suggestions for future directions are high-level; grounding each with references to specific recent papers or open benchmarks would make the recommendations more actionable.
Simulated Author's Rebuttal
We thank the referee for the review and the recommendation of minor revision. We appreciate the positive assessment of the survey's logical organization and its coverage of definitions, techniques, applications, and challenges. As no specific major comments were provided, we have no point-by-point revisions to address.
Circularity Check
No significant circularity: literature survey without derivations or predictions
Full rationale
This is a survey paper whose sole claim is to organize and summarize the existing ICL literature. It presents a formal definition of ICL and discusses techniques, applications, and challenges drawn from cited works, but advances no original equations, fitted parameters, predictions, or derivations. No load-bearing step reduces to self-definition, self-citation chains, or renaming of results. Self-citations (if any) support only descriptive coverage and are not invoked to justify uniqueness theorems or force technical conclusions. The paper is self-contained as an editorial synthesis against external benchmarks.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/LawOfExistence.lean · law_of_existence · unclear
The key idea of in-context learning is to learn from analogy. Figure 1 gives an example that describes how language models make decisions via ICL.
Forward citations
Cited by 33 Pith papers
-
From Context to Skills: Can Language Models Learn from Context Skillfully?
Ctx2Skill lets language models autonomously evolve context-specific skills via multi-agent self-play, improving performance on context learning tasks without human supervision.
-
Gradient-Based Program Synthesis with Neurally Interpreted Languages
NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prio...
-
Pro²Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks
Pro²Assist uses multimodal egocentric perception from AR glasses to track fine-grained progress in long-horizon procedural tasks and deliver timely proactive assistance, outperforming baselines by over 21% in action u...
-
AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation
AnchorSeg uses ordered query banks of latent reasoning tokens plus a spatial anchor token and a Token-Mask Cycle Consistency loss to achieve 67.7% gIoU and 68.1% cIoU on the ReasonSeg benchmark.
-
Deformation-based In-Context Learning for Point Cloud Understanding
DeformPIC deforms query point clouds under prompt guidance for in-context learning, outperforming prior methods with lower Chamfer Distance on reconstruction, denoising, and registration tasks.
-
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Set-of-Mark prompting marks segmented image regions with alphanumerics and masks to let GPT-4V achieve state-of-the-art zero-shot results on referring expression comprehension and segmentation benchmarks like RefCOCOg.
-
Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning
METIS internalizes curriculum judgment in LLM reinforcement fine-tuning by predicting within-prompt reward variance via in-context learning and jointly optimizing with a self-judgment reward, yielding superior perform...
-
Personal Visual Context Learning in Large Multimodal Models
Introduces Personal VCL formalization and benchmark revealing LMM context gaps, plus an Agentic Context Bank baseline that boosts personalized visual reasoning.
-
Wasserstein-Aligned Localisation for VLM-Based Distributional OOD Detection in Medical Imaging
WALDO improves zero-shot anomaly localization in medical imaging by selecting reference distributions via entropy-weighted Sliced Wasserstein distances and Goldilocks zone sampling, yielding a 19% relative gain on bra...
-
Decompose and Recompose: Reasoning New Skills from Existing Abilities for Cross-Task Robotic Manipulation
Decompose and Recompose decomposes seen robotic demonstrations into skill-action alignments and recomposes them via visual-semantic retrieval and planning to enable zero-shot cross-task generalization.
-
RAQG-QPP: Query Performance Prediction with Retrieved Query Variants and Retrieval Augmented Query Generation
Retrieved query variants from logs combined with LLM-augmented generation improve unsupervised QPP accuracy by up to 30% for neural rankers on TREC DL'19 and DL'20.
-
Dual-Enhancement Product Bundling: Bridging Interactive Graph and Large Language Model
A graph-to-text paradigm with Dynamic Concept Binding Mechanism integrates interactive graphs and LLMs to recommend product bundles, yielding 6.3%-26.5% gains over baselines on POG, POG_dense, and Steam datasets.
-
Learning to Adapt: In-Context Learning Beyond Stationarity
Gated linear attention enables lower training and test errors in non-stationary in-context learning by adaptively modulating past inputs through a learnable recency bias under an autoregressive model of task evolution.
-
Bridging Natural Language and Microgrid Dynamics: A Context-Aware Simulator and Dataset
OpenCEM is the first open-source digital twin that integrates unstructured contextual information with quantitative microgrid dynamics to enable context-aware energy management.
-
Measuring Representation Robustness in Large Language Models for Geometry
LLMs display accuracy gaps of up to 14 percentage points on the same geometry problems solely due to representation choice, with vector forms consistently weakest and a convert-then-solve prompt helping only high-capa...
-
Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework
A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.
-
Video models are zero-shot learners and reasoners
Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.
-
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.
-
LLMs with in-context learning for Algorithmic Theoretical Physics
Frontier LLMs with in-context learning and CAS integration solve most algorithmic tasks in theoretical physics when supplied with worked examples.
-
When Context Sticks: Studying Interference in In-Context Learning
In-context learning shows persistent interference from prior examples, with more misleading linear examples degrading quadratic predictions and training curricula modulating recovery speed.
-
Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS)
SSAS improves LLM sentiment prediction consistency and data quality by up to 30% on three review datasets via syntactic and semantic context assessment summarization.
-
The Pedagogy of AI Mistakes: Fostering Higher-Order Thinking
AI mistakes can be structured into course activities to foster higher-order thinking, metacognition, and AI literacy in higher education.
-
UnAC: Adaptive Visual Prompting with Abstraction and Stepwise Checking for Complex Multimodal Reasoning
UnAC improves LMM performance on visual reasoning benchmarks by combining adaptive visual prompting, image abstraction, and gradual self-checking.
-
Beyond the Basics: Leveraging Large Language Model for Fine-Grained Medical Entity Recognition
Fine-tuned LLaMA3 with LoRA reaches 81.24% F1 on 18-category fine-grained medical entity recognition, beating zero-shot by 63.11% and few-shot by 35.63%.
-
Leveraging Weighted Syntactic and Semantic Context Assessment Summary (wSSAS) Towards Text Categorization Using LLMs
wSSAS is a two-phase deterministic framework that uses hierarchical text organization and SNR-based feature prioritization to improve clustering integrity, categorization accuracy, and reproducibility when applying LL...
-
LLMs Underperform Graph-Based Parsers on Supervised Relation Extraction for Complex Graphs
Graph-based parsers outperform LLMs on supervised relation extraction as linguistic graph complexity grows with more relations per document.
-
Combining Static Code Analysis and Large Language Models Improves Correctness and Performance of Algorithm Recognition
Hybrid LLM plus static analysis for algorithm recognition in code cuts required model calls by 72-97% and lifts F1-scores by as much as 12 points.
-
The Rise and Potential of Large Language Model Based Agents: A Survey
The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
-
Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities
A survey synthesizing challenges, system architectures, model optimizations, deployment methods, and resource management techniques for large language model inference at the network edge.
-
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.
-
A Survey on Large Language Models for Code Generation
A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark...
-
Large Language Models: A Survey
The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.
-
A Survey of Large Language Models
This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.