pith. machine review for the scientific record.

arxiv: 2301.00234 · v6 · submitted 2022-12-31 · 💻 cs.CL · cs.AI

Recognition: 2 theorem links


A Survey on In-context Learning

Baobao Chang, Ce Zheng, Damai Dai, Heming Xia, Jingjing Xu, Jingyuan Ma, Lei Li, Qingxiu Dong, Rui Li, Tianyu Liu, Xu Sun, Zhifang Sui, Zhiyong Wu

Pith reviewed 2026-05-12 12:53 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords: in-context learning, large language models, prompt design, few-shot learning, natural language processing, survey

The pith

In-context learning allows large language models to make predictions from prompts that include a few task examples without parameter updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey defines in-context learning formally as a paradigm where large language models condition predictions on a small set of demonstration examples placed in the input. It organizes existing work into training strategies that prepare models for this behavior, prompt design methods that select and format examples, and analyses that probe why the approach succeeds. The review also examines concrete uses such as selecting high-quality data and injecting fresh knowledge into a fixed model. By collecting these threads the authors aim to clarify the current state of the field and point out obstacles that block wider adoption. The result is a map that lets researchers see how different pieces of ICL research fit together.

Core claim

In-context learning has emerged as a new paradigm for natural language processing in which large language models make predictions based on contexts augmented with a few examples, and this survey organizes the techniques, applications, and open problems to facilitate further research into how such models extrapolate abilities from limited context.

What carries the argument

The formal definition of in-context learning as the process by which large language models generate outputs conditioned on a prompt that contains a small number of demonstration examples.
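The conditioning step described above can be sketched in a few lines. This is a minimal illustration, not the survey's own implementation: the function name `build_icl_prompt`, the `Input:`/`Output:` template, and the sentiment examples are all hypothetical choices made here for demonstration.

```python
from typing import List, Tuple


def build_icl_prompt(
    demos: List[Tuple[str, str]], query: str, instruction: str = ""
) -> str:
    """Concatenate an optional instruction, k demonstrations, and a query.

    The template ("Input:" / "Output:") is an assumption for illustration;
    real systems use many different formats.
    """
    parts = []
    if instruction:
        parts.append(instruction)
    # Each demonstration is an (input, label) pair rendered with the same template.
    for x, y in demos:
        parts.append(f"Input: {x}\nOutput: {y}")
    # The query reuses the template with the output left blank for the model to fill in.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Hypothetical demonstrations for a sentiment task.
demos = [("great movie", "positive"), ("dull plot", "negative")]
prompt = build_icl_prompt(
    demos, "loved the soundtrack", instruction="Classify the sentiment."
)
print(prompt)
```

The resulting string would be passed to a frozen language model as-is; no parameters are updated, which is the property that distinguishes this setup from fine-tuning.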

Load-bearing premise

The rapidly expanding body of ICL literature can be usefully organized and summarized within the scope and selection criteria of a single survey paper.

What would settle it

Publication of a substantial set of subsequent papers that introduce major ICL techniques or challenges absent from the survey's categories would show the organization does not capture the field's current state.

Original abstract

With the increasing capabilities of large language models (LLMs), in-context learning (ICL) has emerged as a new paradigm for natural language processing (NLP), where LLMs make predictions based on contexts augmented with a few examples. It has been a significant trend to explore ICL to evaluate and extrapolate the ability of LLMs. In this paper, we aim to survey and summarize the progress and challenges of ICL. We first present a formal definition of ICL and clarify its correlation to related studies. Then, we organize and discuss advanced techniques, including training strategies, prompt designing strategies, and related analysis. Additionally, we explore various ICL application scenarios, such as data engineering and knowledge updating. Finally, we address the challenges of ICL and suggest potential directions for further research. We hope that our work can encourage more research on uncovering how ICL works and improving ICL.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper surveys in-context learning (ICL) as an emerging paradigm for large language models (LLMs) in NLP. It begins with a formal definition of ICL and its relation to related concepts, then organizes advanced techniques (training strategies, prompt design, and analyses), reviews applications (e.g., data engineering and knowledge updating), and concludes by addressing challenges and suggesting future research directions.

Significance. A structured survey on ICL would be moderately significant given the field's rapid expansion, as it could consolidate literature on techniques, applications, and open problems to guide researchers. The logical organization from definition through techniques and applications to challenges supports its utility as a reference, though its impact depends on the depth and balance of coverage across the cited works.

minor comments (3)
  1. [Introduction/Abstract] The abstract states that the paper clarifies ICL's correlation to related studies, but the provided outline does not specify how overlaps with few-shot prompting or meta-learning are delineated; adding a dedicated subsection or table comparing these would improve clarity.
  2. The discussion of prompt designing strategies is listed as a core technique area, yet without explicit criteria for paper selection or inclusion of quantitative benchmarks across methods, the summary risks appearing selective; a methods section detailing search terms and coverage would strengthen the survey.
  3. [Challenges and Future Directions] In the challenges section, the suggestions for future directions are high-level; grounding each with references to specific recent papers or open benchmarks would make the recommendations more actionable.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their review and recommendation of minor revision. The positive assessment of the survey's logical organization, coverage of definitions, techniques, applications, and challenges is appreciated. As no specific major comments were provided, we have no point-by-point revisions to address.

Circularity Check

0 steps flagged

No significant circularity: literature survey without derivations or predictions

Full rationale

This is a survey paper whose sole claim is to organize and summarize the existing ICL literature. It presents a formal definition of ICL and discusses techniques, applications, and challenges drawn from cited works, but advances no original equations, fitted parameters, predictions, or derivations. No load-bearing step reduces to self-definition, self-citation chains, or renaming of results. Self-citations (if any) support only descriptive coverage and are not invoked to justify uniqueness theorems or force technical conclusions. The paper is self-contained as an editorial synthesis against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper the work aggregates and organizes prior publications; it introduces no free parameters, no new axioms, and no invented entities.

pith-pipeline@v0.9.0 · 5484 in / 1030 out tokens · 48351 ms · 2026-05-12T12:53:53.077073+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Forward citations

Cited by 33 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. From Context to Skills: Can Language Models Learn from Context Skillfully?

    cs.AI 2026-04 unverdicted novelty 8.0

    Ctx2Skill lets language models autonomously evolve context-specific skills via multi-agent self-play, improving performance on context learning tasks without human supervision.

  2. Gradient-Based Program Synthesis with Neurally Interpreted Languages

    cs.LG 2026-04 unverdicted novelty 8.0

    NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prio...

  3. Pro²Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks

    cs.AI 2026-05 unverdicted novelty 7.0

    Pro²Assist uses multimodal egocentric perception from AR glasses to track fine-grained progress in long-horizon procedural tasks and deliver timely proactive assistance, outperforming baselines by over 21% in action u...

  4. AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation

    cs.CV 2026-04 unverdicted novelty 7.0

    AnchorSeg uses ordered query banks of latent reasoning tokens plus a spatial anchor token and a Token-Mask Cycle Consistency loss to achieve 67.7% gIoU and 68.1% cIoU on the ReasonSeg benchmark.

  5. Deformation-based In-Context Learning for Point Cloud Understanding

    cs.CV 2026-04 unverdicted novelty 7.0

    DeformPIC deforms query point clouds under prompt guidance for in-context learning, outperforming prior methods with lower Chamfer Distance on reconstruction, denoising, and registration tasks.

  6. Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

    cs.CV 2023-10 accept novelty 7.0

    Set-of-Mark prompting marks segmented image regions with alphanumerics and masks to let GPT-4V achieve state-of-the-art zero-shot results on referring expression comprehension and segmentation benchmarks like RefCOCOg.

  7. Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning

    cs.LG 2026-05 unverdicted novelty 6.0

    METIS internalizes curriculum judgment in LLM reinforcement fine-tuning by predicting within-prompt reward variance via in-context learning and jointly optimizing with a self-judgment reward, yielding superior perform...

  8. Personal Visual Context Learning in Large Multimodal Models

    cs.CV 2026-05 unverdicted novelty 6.0

    Introduces Personal VCL formalization and benchmark revealing LMM context gaps, plus an Agentic Context Bank baseline that boosts personalized visual reasoning.

  9. Wasserstein-Aligned Localisation for VLM-Based Distributional OOD Detection in Medical Imaging

    cs.CV 2026-05 unverdicted novelty 6.0

    WALDO improves zero-shot anomaly localization in medical imaging by selecting reference distributions via entropy-weighted Sliced Wasserstein distances and Goldilocks zone sampling, yielding a 19% relative gain on bra...

  10. Decompose and Recompose: Reasoning New Skills from Existing Abilities for Cross-Task Robotic Manipulation

    cs.RO 2026-05 unverdicted novelty 6.0

    Decompose and Recompose decomposes seen robotic demonstrations into skill-action alignments and recomposes them via visual-semantic retrieval and planning to enable zero-shot cross-task generalization.

  11. RAQG-QPP: Query Performance Prediction with Retrieved Query Variants and Retrieval Augmented Query Generation

    cs.IR 2026-04 unverdicted novelty 6.0

    Retrieved query variants from logs combined with LLM-augmented generation improve unsupervised QPP accuracy by up to 30% for neural rankers on TREC DL'19 and DL'20.

  12. Dual-Enhancement Product Bundling: Bridging Interactive Graph and Large Language Model

    cs.CL 2026-04 unverdicted novelty 6.0

    A graph-to-text paradigm with Dynamic Concept Binding Mechanism integrates interactive graphs and LLMs to recommend product bundles, yielding 6.3%-26.5% gains over baselines on POG, POG_dense, and Steam datasets.

  13. Learning to Adapt: In-Context Learning Beyond Stationarity

    cs.LG 2026-04 unverdicted novelty 6.0

    Gated linear attention enables lower training and test errors in non-stationary in-context learning by adaptively modulating past inputs through a learnable recency bias under an autoregressive model of task evolution.

  14. Bridging Natural Language and Microgrid Dynamics: A Context-Aware Simulator and Dataset

    eess.SY 2026-04 unverdicted novelty 6.0

    OpenCEM is the first open-source digital twin that integrates unstructured contextual information with quantitative microgrid dynamics to enable context-aware energy management.

  15. Measuring Representation Robustness in Large Language Models for Geometry

    cs.CL 2026-04 unverdicted novelty 6.0

    LLMs display accuracy gaps of up to 14 percentage points on the same geometry problems solely due to representation choice, with vector forms consistently weakest and a convert-then-solve prompt helping only high-capa...

  16. Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework

    cs.CL 2026-04 unverdicted novelty 6.0

    A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.

  17. Video models are zero-shot learners and reasoners

    cs.LG 2025-09 unverdicted novelty 6.0

    Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.

  18. LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

    cs.SE 2024-03 unverdicted novelty 6.0

    LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.

  19. LLMs with in-context learning for Algorithmic Theoretical Physics

    cs.LG 2026-05 unverdicted novelty 5.0

    Frontier LLMs with in-context learning and CAS integration solve most algorithmic tasks in theoretical physics when supplied with worked examples.

  20. When Context Sticks: Studying Interference in In-Context Learning

    cs.LG 2026-04 unverdicted novelty 5.0

    In-context learning shows persistent interference from prior examples, with more misleading linear examples degrading quadratic predictions and training curricula modulating recovery speed.

  21. Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS)

    cs.CL 2026-04 unverdicted novelty 5.0

    SSAS improves LLM sentiment prediction consistency and data quality by up to 30% on three review datasets via syntactic and semantic context assessment summarization.

  22. The Pedagogy of AI Mistakes: Fostering Higher-Order Thinking

    cs.CY 2026-05 unverdicted novelty 4.0

    AI mistakes can be structured into course activities to foster higher-order thinking, metacognition, and AI literacy in higher education.

  23. UnAC: Adaptive Visual Prompting with Abstraction and Stepwise Checking for Complex Multimodal Reasoning

    cs.CV 2026-05 unverdicted novelty 4.0

    UnAC improves LMM performance on visual reasoning benchmarks by combining adaptive visual prompting, image abstraction, and gradual self-checking.

  24. Beyond the Basics: Leveraging Large Language Model for Fine-Grained Medical Entity Recognition

    cs.AI 2026-04 conditional novelty 4.0

    Fine-tuned LLaMA3 with LoRA reaches 81.24% F1 on 18-category fine-grained medical entity recognition, beating zero-shot by 63.11% and few-shot by 35.63%.

  25. Leveraging Weighted Syntactic and Semantic Context Assessment Summary (wSSAS) Towards Text Categorization Using LLMs

    cs.CL 2026-04 unverdicted novelty 4.0

    wSSAS is a two-phase deterministic framework that uses hierarchical text organization and SNR-based feature prioritization to improve clustering integrity, categorization accuracy, and reproducibility when applying LL...

  26. LLMs Underperform Graph-Based Parsers on Supervised Relation Extraction for Complex Graphs

    cs.CL 2026-04 unverdicted novelty 4.0

    Graph-based parsers outperform LLMs on supervised relation extraction as linguistic graph complexity grows with more relations per document.

  27. Combining Static Code Analysis and Large Language Models Improves Correctness and Performance of Algorithm Recognition

    cs.SE 2026-04 conditional novelty 4.0

    Hybrid LLM plus static analysis for algorithm recognition in code cuts required model calls by 72-97% and lifts F1-scores by as much as 12 points.

  28. The Rise and Potential of Large Language Model Based Agents: A Survey

    cs.AI 2023-09 accept novelty 4.0

    The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.

  29. Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities

    cs.DC 2026-04 unverdicted novelty 3.0

    A survey synthesizing challenges, system architectures, model optimizations, deployment methods, and resource management techniques for large language model inference at the network edge.

  30. LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

    cs.CL 2024-12 accept novelty 3.0

    A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.

  31. A Survey on Large Language Models for Code Generation

    cs.CL 2024-06 unverdicted novelty 3.0

    A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark...

  32. Large Language Models: A Survey

    cs.CL 2024-02 accept novelty 3.0

    The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.

  33. A Survey of Large Language Models

    cs.CL 2023-03 accept novelty 3.0

    This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.
