arxiv: 2301.00234 · v6 · submitted 2022-12-31 · 💻 cs.CL · cs.AI

Recognition: 2 theorem links

A Survey on In-context Learning

Baobao Chang, Ce Zheng, Damai Dai, Heming Xia, Jingjing Xu, Jingyuan Ma, Lei Li, Qingxiu Dong, Rui Li, Tianyu Liu, Xu Sun, Zhifang Sui, Zhiyong Wu

Pith reviewed 2026-05-12 12:53 UTC · model grok-4.3

keywords in-context learning · large language models · prompt design · few-shot learning · natural language processing · survey

The pith

In-context learning allows large language models to make predictions from prompts that include a few task examples without parameter updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey defines in-context learning formally as a paradigm where large language models condition predictions on a small set of demonstration examples placed in the input. It organizes existing work into training strategies that prepare models for this behavior, prompt design methods that select and format examples, and analyses that probe why the approach succeeds. The review also examines concrete uses such as selecting high-quality data and injecting fresh knowledge into a fixed model. By collecting these threads the authors aim to clarify the current state of the field and point out obstacles that block wider adoption. The result is a map that lets researchers see how different pieces of ICL research fit together.
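The two prompt design steps named above, selecting demonstrations and formatting them into the input, can be sketched in a few lines. This is a minimal illustration, not a method from the survey: the token-overlap score, the `Review:`/`Sentiment:` template, the function names, and the toy data are all assumptions made for the example. No model is called; the sketch only builds the string that a language model would be conditioned on.

```python
from collections import Counter


def token_overlap(a: str, b: str) -> int:
    """Count lowercase tokens shared by two strings (multiset intersection)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    return sum((ca & cb).values())


def select_demonstrations(pool, query, k=2):
    """Pick the k pool examples whose input text overlaps most with the query.

    Token overlap is a stand-in similarity score chosen for illustration.
    """
    ranked = sorted(pool, key=lambda ex: token_overlap(ex[0], query), reverse=True)
    return ranked[:k]


def build_prompt(demos, query, instruction=""):
    """Format the demonstrations and the unanswered query into one prompt string."""
    blocks = [instruction] if instruction else []
    for text, label in demos:
        blocks.append(f"Review: {text}\nSentiment: {label}")
    # The query uses the same template with the answer slot left open.
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


# Hypothetical labelled pool and query.
pool = [
    ("the food was delicious", "positive"),
    ("the service was slow and rude", "negative"),
    ("a delightful film", "positive"),
]
query = "the food was cold and the service was slow"

demos = select_demonstrations(pool, query, k=2)
prompt = build_prompt(demos, query, instruction="Classify the sentiment of each review.")
print(prompt)
```

The model's parameters never change in this setup; only the prompt does, so swapping the selection rule or the template is the whole design space being explored.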

Core claim

In-context learning has emerged as a new paradigm for natural language processing in which large language models make predictions based on contexts augmented with a few examples, and this survey organizes the techniques, applications, and open problems to facilitate further research into how such models extrapolate abilities from limited context.

What carries the argument

The formal definition of in-context learning as the process by which large language models generate outputs conditioned on a prompt that contains a small number of demonstration examples.
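One way to write such a definition down is sketched below. The notation is an assumption introduced for illustration and should be checked against the paper itself: $x$ is the query input, $Y = \{y_1, \ldots, y_m\}$ a set of candidate answers, $\mathcal{M}$ the pretrained language model, $I$ an optional task instruction, $s(x_i, y_i)$ a demonstration example written out in natural language, and $f_{\mathcal{M}}$ a scoring function computed by the model.

```latex
% Demonstration set: an optional instruction plus k formatted examples
C = \{\, I,\; s(x_1, y_1),\; \ldots,\; s(x_k, y_k) \,\}

% Score of each candidate answer, conditioned on the demonstrations and the query
P(y_j \mid x) \triangleq f_{\mathcal{M}}(y_j,\, C,\, x)

% Prediction: the highest-scoring candidate; the parameters of M are never updated
\hat{y} = \arg\max_{y_j \in Y} P(y_j \mid x)
```

Under this reading, everything task-specific enters through $C$, which is why the choice and formatting of demonstrations matter so much.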

Load-bearing premise

The rapidly expanding body of ICL literature can be usefully organized and summarized within the scope and selection criteria of a single survey paper.

What would settle it

Publication of a substantial set of subsequent papers that introduce major ICL techniques or challenges absent from the survey's categories would show the organization does not capture the field's current state.

Original abstract

With the increasing capabilities of large language models (LLMs), in-context learning (ICL) has emerged as a new paradigm for natural language processing (NLP), where LLMs make predictions based on contexts augmented with a few examples. It has been a significant trend to explore ICL to evaluate and extrapolate the ability of LLMs. In this paper, we aim to survey and summarize the progress and challenges of ICL. We first present a formal definition of ICL and clarify its correlation to related studies. Then, we organize and discuss advanced techniques, including training strategies, prompt designing strategies, and related analysis. Additionally, we explore various ICL application scenarios, such as data engineering and knowledge updating. Finally, we address the challenges of ICL and suggest potential directions for further research. We hope that our work can encourage more research on uncovering how ICL works and improving ICL.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a standard, well-organized survey on in-context learning that maps the literature up to early 2023 but introduces no new results or mechanisms.

This survey on in-context learning gives a clear map of the area as it stood in late 2022. It defines ICL, breaks down techniques for training and prompting, lists applications, and flags open challenges. The structure follows the abstract closely: a formal definition first, then correlations to related work, followed by sections on training strategies, prompt design, analysis, applications such as data engineering and knowledge updating, and finally challenges with suggested directions.

The paper does a good job pulling together the main threads without overclaiming. The sections on prompt designing strategies and analysis feel particularly organized, and the discussion of applications is straightforward. Citations draw from major NLP venues and cover a reasonable range of the early ICL papers.

The main limitation is that any survey of a fast-moving topic like this will have gaps in coverage and will age quickly. The authors' choices about what to include and how to categorize are reasonable but ultimately editorial, and some later developments in scaling laws or more advanced prompting will already sit outside the scope. The analysis sections stay descriptive rather than resolving open questions, which is expected for this format. There are no experiments, code, or new derivations here.

This paper is useful for someone entering the ICL literature or needing a single reference to point to recent work. Experienced researchers already deep in the area might skim it for organization ideas but will not find new technical substance. I would send it to peer review. A solid survey helps the community even if it is not groundbreaking.

Referee Report

0 major / 3 minor

Summary. The paper surveys in-context learning (ICL) as an emerging paradigm for large language models (LLMs) in NLP. It begins with a formal definition of ICL and its relation to related concepts, then organizes advanced techniques (training strategies, prompt design, and analyses), reviews applications (e.g., data engineering and knowledge updating), and concludes by addressing challenges and suggesting future research directions.

Significance. A structured survey on ICL would be moderately significant given the field's rapid expansion, as it could consolidate literature on techniques, applications, and open problems to guide researchers. The logical organization from definition through techniques and applications to challenges supports its utility as a reference, though its impact depends on the depth and balance of coverage across the cited works.

minor comments (3)

[Introduction/Abstract] The abstract states that the paper clarifies ICL's correlation to related studies, but the provided outline does not specify how overlaps with few-shot prompting or meta-learning are delineated; adding a dedicated subsection or table comparing these would improve clarity.
The discussion of prompt designing strategies is listed as a core technique area, yet without explicit criteria for paper selection or inclusion of quantitative benchmarks across methods, the summary risks appearing selective; a methods section detailing search terms and coverage would strengthen the survey.
[Challenges and Future Directions] In the challenges section, the suggestions for future directions are high-level; grounding each with references to specific recent papers or open benchmarks would make the recommendations more actionable.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their review and recommendation of minor revision. The positive assessment of the survey's logical organization, coverage of definitions, techniques, applications, and challenges is appreciated. As no specific major comments were provided, we have no point-by-point revisions to address.

Circularity Check

0 steps flagged

No significant circularity: literature survey without derivations or predictions

This is a survey paper whose sole claim is to organize and summarize the existing ICL literature. It presents a formal definition of ICL and discusses techniques, applications, and challenges drawn from cited works, but advances no original equations, fitted parameters, predictions, or derivations. No load-bearing step reduces to self-definition, self-citation chains, or renaming of results. Self-citations (if any) support only descriptive coverage and are not invoked to justify uniqueness theorems or force technical conclusions. The paper is self-contained as an editorial synthesis against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper the work aggregates and organizes prior publications; it introduces no free parameters, no new axioms, and no invented entities.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/LawOfExistence.lean · law_of_existence · unclear
The key idea of in-context learning is to learn from analogy. Figure 1 gives an example that describes how language models make decisions via ICL.

Forward citations

Cited by 33 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Context to Skills: Can Language Models Learn from Context Skillfully?
cs.AI 2026-04 unverdicted novelty 8.0

Ctx2Skill lets language models autonomously evolve context-specific skills via multi-agent self-play, improving performance on context learning tasks without human supervision.
Gradient-Based Program Synthesis with Neurally Interpreted Languages
cs.LG 2026-04 unverdicted novelty 8.0

NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prio...
Pro$^2$Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks
cs.AI 2026-05 unverdicted novelty 7.0

Pro²Assist uses multimodal egocentric perception from AR glasses to track fine-grained progress in long-horizon procedural tasks and deliver timely proactive assistance, outperforming baselines by over 21% in action u...
AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation
cs.CV 2026-04 unverdicted novelty 7.0

AnchorSeg uses ordered query banks of latent reasoning tokens plus a spatial anchor token and a Token-Mask Cycle Consistency loss to achieve 67.7% gIoU and 68.1% cIoU on the ReasonSeg benchmark.
Deformation-based In-Context Learning for Point Cloud Understanding
cs.CV 2026-04 unverdicted novelty 7.0

DeformPIC deforms query point clouds under prompt guidance for in-context learning, outperforming prior methods with lower Chamfer Distance on reconstruction, denoising, and registration tasks.
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
cs.CV 2023-10 accept novelty 7.0

Set-of-Mark prompting marks segmented image regions with alphanumerics and masks to let GPT-4V achieve state-of-the-art zero-shot results on referring expression comprehension and segmentation benchmarks like RefCOCOg.
Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning
cs.LG 2026-05 unverdicted novelty 6.0

METIS internalizes curriculum judgment in LLM reinforcement fine-tuning by predicting within-prompt reward variance via in-context learning and jointly optimizing with a self-judgment reward, yielding superior perform...
Personal Visual Context Learning in Large Multimodal Models
cs.CV 2026-05 unverdicted novelty 6.0

Introduces Personal VCL formalization and benchmark revealing LMM context gaps, plus an Agentic Context Bank baseline that boosts personalized visual reasoning.
Wasserstein-Aligned Localisation for VLM-Based Distributional OOD Detection in Medical Imaging
cs.CV 2026-05 unverdicted novelty 6.0

WALDO improves zero-shot anomaly localization in medical imaging by selecting reference distributions via entropy-weighted Sliced Wasserstein distances and Goldilocks zone sampling, yielding a 19% relative gain on bra...
Decompose and Recompose: Reasoning New Skills from Existing Abilities for Cross-Task Robotic Manipulation
cs.RO 2026-05 unverdicted novelty 6.0

Decompose and Recompose decomposes seen robotic demonstrations into skill-action alignments and recomposes them via visual-semantic retrieval and planning to enable zero-shot cross-task generalization.
RAQG-QPP: Query Performance Prediction with Retrieved Query Variants and Retrieval Augmented Query Generation
cs.IR 2026-04 unverdicted novelty 6.0

Retrieved query variants from logs combined with LLM-augmented generation improve unsupervised QPP accuracy by up to 30% for neural rankers on TREC DL'19 and DL'20.
Dual-Enhancement Product Bundling: Bridging Interactive Graph and Large Language Model
cs.CL 2026-04 unverdicted novelty 6.0

A graph-to-text paradigm with Dynamic Concept Binding Mechanism integrates interactive graphs and LLMs to recommend product bundles, yielding 6.3%-26.5% gains over baselines on POG, POG_dense, and Steam datasets.
Learning to Adapt: In-Context Learning Beyond Stationarity
cs.LG 2026-04 unverdicted novelty 6.0

Gated linear attention enables lower training and test errors in non-stationary in-context learning by adaptively modulating past inputs through a learnable recency bias under an autoregressive model of task evolution.
Bridging Natural Language and Microgrid Dynamics: A Context-Aware Simulator and Dataset
eess.SY 2026-04 unverdicted novelty 6.0

OpenCEM is the first open-source digital twin that integrates unstructured contextual information with quantitative microgrid dynamics to enable context-aware energy management.
Measuring Representation Robustness in Large Language Models for Geometry
cs.CL 2026-04 unverdicted novelty 6.0

LLMs display accuracy gaps of up to 14 percentage points on the same geometry problems solely due to representation choice, with vector forms consistently weakest and a convert-then-solve prompt helping only high-capa...
Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework
cs.CL 2026-04 unverdicted novelty 6.0

A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.
Video models are zero-shot learners and reasoners
cs.LG 2025-09 unverdicted novelty 6.0

Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
cs.SE 2024-03 unverdicted novelty 6.0

LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.
LLMs with in-context learning for Algorithmic Theoretical Physics
cs.LG 2026-05 unverdicted novelty 5.0

Frontier LLMs with in-context learning and CAS integration solve most algorithmic tasks in theoretical physics when supplied with worked examples.
When Context Sticks: Studying Interference in In-Context Learning
cs.LG 2026-04 unverdicted novelty 5.0

In-context learning shows persistent interference from prior examples, with more misleading linear examples degrading quadratic predictions and training curricula modulating recovery speed.
Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS)
cs.CL 2026-04 unverdicted novelty 5.0

SSAS improves LLM sentiment prediction consistency and data quality by up to 30% on three review datasets via syntactic and semantic context assessment summarization.
The Pedagogy of AI Mistakes: Fostering Higher-Order Thinking
cs.CY 2026-05 unverdicted novelty 4.0

AI mistakes can be structured into course activities to foster higher-order thinking, metacognition, and AI literacy in higher education.
UnAC: Adaptive Visual Prompting with Abstraction and Stepwise Checking for Complex Multimodal Reasoning
cs.CV 2026-05 unverdicted novelty 4.0

UnAC improves LMM performance on visual reasoning benchmarks by combining adaptive visual prompting, image abstraction, and gradual self-checking.
Beyond the Basics: Leveraging Large Language Model for Fine-Grained Medical Entity Recognition
cs.AI 2026-04 conditional novelty 4.0

Fine-tuned LLaMA3 with LoRA reaches 81.24% F1 on 18-category fine-grained medical entity recognition, beating zero-shot by 63.11% and few-shot by 35.63%.
Leveraging Weighted Syntactic and Semantic Context Assessment Summary (wSSAS) Towards Text Categorization Using LLMs
cs.CL 2026-04 unverdicted novelty 4.0

wSSAS is a two-phase deterministic framework that uses hierarchical text organization and SNR-based feature prioritization to improve clustering integrity, categorization accuracy, and reproducibility when applying LL...
LLMs Underperform Graph-Based Parsers on Supervised Relation Extraction for Complex Graphs
cs.CL 2026-04 unverdicted novelty 4.0

Graph-based parsers outperform LLMs on supervised relation extraction as linguistic graph complexity grows with more relations per document.
Combining Static Code Analysis and Large Language Models Improves Correctness and Performance of Algorithm Recognition
cs.SE 2026-04 conditional novelty 4.0

Hybrid LLM plus static analysis for algorithm recognition in code cuts required model calls by 72-97% and lifts F1-scores by as much as 12 points.
The Rise and Potential of Large Language Model Based Agents: A Survey
cs.AI 2023-09 accept novelty 4.0

The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities
cs.DC 2026-04 unverdicted novelty 3.0

A survey synthesizing challenges, system architectures, model optimizations, deployment methods, and resource management techniques for large language model inference at the network edge.
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
cs.CL 2024-12 accept novelty 3.0

A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.
A Survey on Large Language Models for Code Generation
cs.CL 2024-06 unverdicted novelty 3.0

A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark...
Large Language Models: A Survey
cs.CL 2024-02 accept novelty 3.0

The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.
A Survey of Large Language Models
cs.CL 2023-03 accept novelty 3.0

This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.
