In-context learning binds model outputs to the demonstrated label tokens, treating them as an exhaustive output vocabulary; this overrides semantic plausibility and causes fixation even on homogeneous or nonsense labels (sketched below).
arXiv preprint arXiv:2303.03846, 2023.
7 Pith papers cite this work.
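The fixation claim is easy to probe directly. Below is a minimal sketch, assuming a Hugging Face causal LM (gpt2 is used purely for illustration, and a model this small may show the effect only weakly): the demonstrations use nonsense labels, and we compare the probability the model assigns to the demonstrated label set against its unrestricted next-token choice.

```python
# Sketch: do few-shot demonstrations with nonsense labels pull the prediction
# into the demonstrated label set ("fixation") rather than a semantically
# plausible word? Model choice (gpt2) is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Demonstrations use "blicket"/"dax" instead of "positive"/"negative"; the
# claim is that the model still answers from within this demonstrated set.
prompt = (
    "Review: the film was wonderful. Label: blicket\n"
    "Review: the plot made no sense. Label: dax\n"
    "Review: an absolute delight. Label:"
)
with torch.no_grad():
    logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]

# Score only the first BPE piece of each demonstrated label (the leading
# space matters for GPT-2's tokenizer).
label_ids = [tok.encode(" " + w)[0] for w in ("blicket", "dax")]
probs = torch.softmax(logits, dim=-1)
print("P(blicket), P(dax):", [round(probs[i].item(), 4) for i in label_ids])
print("unrestricted argmax:", tok.decode(logits.argmax().item()))
```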
citing papers explorer
- In-Context Fixation: When Demonstrated Labels Override Semantics in Few-Shot Classification
  In-context learning binds model outputs to the demonstrated label tokens, treating them as an exhaustive output vocabulary; this overrides semantic plausibility and causes fixation even on homogeneous or nonsense labels.
- Task Vector Geometry Underlies Dual Modes of Task Inference in Transformers
  In a controlled synthetic setting, transformers implement in-distribution task inference via convex combinations of task vectors and out-of-distribution inference via nearly orthogonal extrapolative representations (see the geometry sketch after this list).
- Fine-tuning vs. In-context Learning in Large Language Models: A Formal Language Learning Perspective
  Fine-tuning outperforms in-context learning on in-distribution generalization in formal languages, while the two match out of distribution and diverge in inductive biases at high proficiency (a minimal formal-language setup is sketched after this list).
- Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size
  Contextual entrainment decreases for semantic contexts but increases for non-semantic ones as LLMs scale, following power-law trends: roughly 4x better resistance to misinformation alongside 2x more copying of arbitrary tokens (the power-law form is worked through after this list).
- OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces
  OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing that stronger models improve with feedback but remain constrained by base-model capacity and below human performance (the feedback loop is sketched after this list).
- LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
  LiveCodeBench collects 400 recent contest problems to build a contamination-free benchmark that evaluates LLMs on code generation and related capabilities such as self-repair and execution (the date-filtering idea is sketched after this list).
- A Systematic Study of Training-Free Methods for Trustworthy Large Language Models
  Training-free methods for LLM trustworthiness show inconsistent results across dimensions, with clear trade-offs in utility, robustness, and overhead depending on where they intervene during inference (two intervention points are contrasted after this list).
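For the task-vector entry, the two regimes are easy to visualize numerically. A toy numpy sketch with random vectors standing in for learned task vectors (all quantities are synthetic stand-ins): an in-distribution task lies inside the convex hull of known task vectors, while an out-of-distribution direction is nearly orthogonal to their span.

```python
# Toy illustration of the two geometries: an in-distribution task vector as a
# convex combination of base task vectors vs. an out-of-distribution direction
# nearly orthogonal to their span. All vectors here are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(0)
d, k = 256, 4
base = rng.standard_normal((k, d))   # task vectors "seen in training"

w = rng.random(k)
w /= w.sum()                          # convex weights: w_i >= 0, sum to 1
in_dist = w @ base                    # in-distribution: inside the hull

# Out-of-distribution direction: strip away the component in span(base).
q, _ = np.linalg.qr(base.T)           # orthonormal basis of the span
v = rng.standard_normal(d)
ood = v - q @ (q.T @ v)               # residual is orthogonal to the span

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print("max |cos(in_dist, base_i)|:", max(abs(cos(in_dist, b)) for b in base))
print("max |cos(ood, base_i)|    :", max(abs(cos(ood, b)) for b in base))
```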
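For the fine-tuning vs. in-context comparison, formal-language studies typically split in- and out-of-distribution by string length. A minimal sketch of that protocol using a^n b^n membership; the choice of language and the length cutoffs are illustrative assumptions, not the cited paper's exact setup.

```python
# Length-based ID/OOD split of the kind formal-language generalization
# studies use: train on short a^n b^n strings, test on longer ones.
import random

def is_anbn(s: str) -> bool:
    n = len(s) // 2
    return len(s) % 2 == 0 and n > 0 and s == "a" * n + "b" * n

def sample(n_lo: int, n_hi: int, rng: random.Random) -> tuple[str, bool]:
    n = rng.randint(n_lo, n_hi)
    if rng.random() < 0.5:                          # positive example
        return "a" * n + "b" * n, True
    s = "".join(rng.choice("ab") for _ in range(2 * n))
    return s, is_anbn(s)                            # random, usually negative

rng = random.Random(0)
train = [sample(1, 8, rng) for _ in range(1000)]    # in-distribution: n <= 8
ood = [sample(9, 32, rng) for _ in range(200)]      # OOD: strictly longer
print(train[:2], ood[:1])
```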
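The power-law claim for contextual entrainment amounts to entrainment ~ a * N^b, a straight line in log-log space, with b < 0 for semantic contexts and b > 0 for non-semantic ones. The sketch below fits the exponent with numpy; the data points are made-up placeholders, not the paper's measurements.

```python
# Fit entrainment ~ a * N^b by linear regression in log-log space.
# The measurements below are synthetic placeholders for illustration.
import numpy as np

params = np.array([1e8, 1e9, 1e10, 1e11])          # model sizes (illustrative)
semantic = np.array([0.40, 0.22, 0.12, 0.065])     # entrainment: decreasing
nonsemantic = np.array([0.05, 0.08, 0.13, 0.21])   # entrainment: increasing

for name, y in [("semantic", semantic), ("non-semantic", nonsemantic)]:
    b, log_a = np.polyfit(np.log(params), np.log(y), 1)   # slope = exponent
    print(f"{name:>13}: entrainment ~ {np.exp(log_a):.3g} * N^{b:+.2f}")
```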
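The OPT-BENCH protocol is, at its core, a propose-score-feedback loop. A schematic version follows; `llm_propose` is a hypothetical stand-in for a model call and the toy objective is invented, so this is not OPT-Agent's actual interface.

```python
# Schematic of the iterate-with-feedback protocol the benchmark measures:
# the agent proposes a candidate, a ground-truth objective scores it, and
# the scored history is fed back for the next proposal.
import random
from typing import Callable

def optimize(llm_propose: Callable[[list], str],
             score: Callable[[str], float],
             steps: int = 50) -> tuple[str, float]:
    history: list[tuple[str, float]] = []
    best, best_score = None, float("-inf")
    for _ in range(steps):
        candidate = llm_propose(history)   # conditioned on past feedback
        s = score(candidate)               # external, ground-truth signal
        history.append((candidate, s))
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

# Toy stand-in: perturb the best candidate so far; the objective peaks at 7.
rng = random.Random(0)
def toy_propose(history):
    base = float(max(history, key=lambda h: h[1])[0]) if history else 0.0
    return str(base + rng.uniform(-1.0, 1.0))

print(optimize(toy_propose, lambda c: -(float(c) - 7.0) ** 2))
```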
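LiveCodeBench's contamination control reduces to a date comparison: a model is scored only on problems released after its training cutoff. A sketch of that filter; the field names are illustrative, not LiveCodeBench's actual schema.

```python
# Core of the contamination control: keep only problems released after a
# model's training cutoff, so they cannot appear in its training data.
from dataclasses import dataclass
from datetime import date

@dataclass
class Problem:
    title: str
    released: date

def contamination_free(problems: list[Problem], cutoff: date) -> list[Problem]:
    """Keep only problems a model trained before `cutoff` cannot have seen."""
    return [p for p in problems if p.released > cutoff]

pool = [Problem("two-pointer warmup", date(2023, 5, 1)),
        Problem("interval dp", date(2024, 2, 10))]
print([p.title for p in contamination_free(pool, cutoff=date(2023, 9, 1))])
```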
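Finally, the "where they intervene" axis in the trustworthiness study can be made concrete by contrasting two common training-free intervention points. This is a generic illustration, not any specific method from the paper: a prompt-level edit pays a one-time cost, while a logit-level edit runs at every decoding step, which is one source of the overhead trade-off.

```python
# Two training-free intervention points with different cost profiles.
# `decoding_level` mimics a per-step logit hook; it is a generic sketch,
# not any particular library's API.
import numpy as np

def prompt_level(question: str) -> str:
    # One-time cost: prepend a honesty instruction to the input.
    return "Answer truthfully; say 'I don't know' if unsure.\n" + question

def decoding_level(logits: np.ndarray, banned_ids: list[int]) -> np.ndarray:
    # Per-token cost: suppress disallowed continuations at every decode step.
    out = logits.copy()
    out[banned_ids] = -np.inf
    return out

print(prompt_level("Who won the 1962 World Cup?"))
print(decoding_level(np.array([1.0, 2.0, 3.0]), banned_ids=[2]))
```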