arXiv preprint arXiv:2303.03846, 2023

Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, et al. · 2023 · arXiv 2303.03846

23 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

DICE: Entropy-Regularized Equilibrium Selection for Stable Multi-Agent LLM Coordination

cs.LG · 2026-06-06 · unverdicted · novelty 7.0

DICE formalizes multi-agent LLM coordination as discounted incomplete-information Markov games and introduces Heterogeneous Quantal Response Equilibrium (HQRE) to achieve unique stable equilibria with bounded regret, demonstrated via prompt-control and fine-tuning algorithms on eleven benchmarks.

In-Context Fixation: When Demonstrated Labels Override Semantics in Few-Shot Classification

cs.LG · 2026-05-08 · conditional · novelty 7.0

In-context learning binds model outputs to the demonstrated label tokens as an exhaustive vocabulary, overriding semantic plausibility and causing fixation even with homogeneous or nonsense labels.

Task Vector Geometry Underlies Dual Modes of Task Inference in Transformers

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

In a controlled synthetic setting, transformers implement in-distribution task inference via convex combinations of task vectors and out-of-distribution inference via nearly orthogonal extrapolative representations.

Fine-tuning vs. In-context Learning in Large Language Models: A Formal Language Learning Perspective

cs.CL · 2026-04-25 · conditional · novelty 7.0 · 2 refs

A controlled formal language task reveals that fine-tuning outperforms in-context learning on in-distribution generalization but only matches it on out-of-distribution generalization, with ICL showing greater sensitivity to model size and tokenization.

Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size

cs.CL · 2026-04-14 · unverdicted · novelty 7.0

Contextual entrainment decreases for semantic contexts but increases for non-semantic ones as LLMs scale, following power-law trends with 4x better resistance to misinformation but 2x more copying of arbitrary tokens.

Pre-trained Large Language Models Learn Hidden Markov Models In-context

cs.LG · 2025-06-08 · unverdicted · novelty 7.0

Pre-trained LLMs learn to predict HMM-generated sequences via in-context learning, approaching theoretical optimum on synthetic HMMs and matching expert models on real animal decision data.

HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

cs.CV · 2023-10-23 · unverdicted · novelty 7.0

HallusionBench shows GPT-4V reaches only 31.42% accuracy on paired questions testing language hallucination and visual illusion in LVLMs, with other models below 16%.

Large Language Models as Optimizers

cs.LG · 2023-09-07 · unverdicted · novelty 7.0

Large language models can act as optimizers when prompted with histories of past solutions and their scores and asked to propose better ones, producing prompts that raise accuracy by up to 8% on GSM8K and by up to 50% on Big-Bench Hard over human-designed baselines.

OpenRFM: Dissecting Relational In-Context Learning

cs.LG · 2026-06-03 · unverdicted · novelty 6.0

OpenRFM combines a relational transformer backbone with a batch-level ICL layer and homophily-aware synthetic-plus-real pre-training to improve relational in-context learning by ~30% over prior open models and surpass KumoRFMv1.

Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

Larger models succeed on rare and complex tasks by reducing gradient interference from common tasks, allowing rare-task features to accumulate, as shown via synthetic task mixtures and OLMo pretraining from 4M to 4B parameters.

When Correct Demonstrations Hurt: Rethinking the Role of Exemplars in In-Context Learning

cs.LG · 2026-05-25 · unverdicted · novelty 6.0

Task-preserving perturbations of correct exemplars can degrade ICL performance by changing the effective evidence mixture used for inference.

Format-Constraint Coupling in Knowledge Graph Construction from Statistical Tables

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

Empirical 2x2 factorial study on 6 statistical datasets shows format and schema constraints in LLM-based KG construction from CSV tables produce super-additive fidelity loss up to +1.180, with mismatched pairs falling below baseline, plus release of CSVFidelity-Bench.

In-Context Learning Operates as Concept Subspace Learning

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

In-context learning decomposes into concept-coordinate regression plus off-subspace leakage, with recoverable task information concentrating in a 68-73 dimensional task-aligned subspace of the residual stream that restores 78.8% of the accuracy gap in Llama-3-8B experiments.

OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces

cs.AI · 2026-05-09 · unverdicted · novelty 6.0

OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.

Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis

cs.CL · 2025-09-29 · unverdicted · novelty 6.0

A new framework using Task Subspace Logit Attribution localizes attention heads specialized for task recognition and task learning in in-context learning, showing they align and rotate hidden states within a task subspace.

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

cs.SE · 2024-03-12 · unverdicted · novelty 6.0

LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.

CFALR: Collaborative Filtering-Augmented Large Language Model for Personalized Fashion Outfit Recommendation

cs.IR · 2026-06-11 · unverdicted · novelty 5.0

CFALR augments LLMs with collaborative filtering embeddings via trainable projection layers to outperform prior CF and LLM methods on Polyvore and IQON for personalized outfit tasks.

Constitutional On-Policy Safe Distillation

cs.LG · 2026-06-02 · unverdicted · novelty 5.0

COPSD uses a Cross-SFT cold-start followed by constitution-conditioned distillation to achieve stronger safety-helpfulness balance and lower safety tax on reasoning than prior on-policy self-distillation methods.

Linguistic Productivity in Large Language Models: Models Coerce, but do not Preempt

cs.CL · 2026-06-01 · unverdicted · novelty 5.0

Larger LLMs reproduce constructional productivity via entrenchment in coercion cases with nonce words but fail to use statistical preemption to avoid overgeneralizing semantically plausible but unobserved patterns.

Priors Persist Through Suppression: A Stroop Paradigm for Lexical Override

cs.CL · 2026-05-25 · unverdicted · novelty 5.0

Pretrained lexical priors in language models persist despite explicit remapping rules, as shown by a Stroop paradigm where prior strength predicts interference and activation patching localizes the repair mechanism.

A Systematic Study of Training-Free Methods for Trustworthy Large Language Models

cs.CL · 2026-04-17 · unverdicted · novelty 5.0

Training-free methods for LLM trustworthiness show inconsistent results across dimensions, with clear trade-offs in utility, robustness, and overhead depending on where they intervene during inference.

PaLI-X: On Scaling up a Multilingual Vision and Language Model

cs.CV · 2023-05-29 · unverdicted · novelty 4.0

Scaling a multilingual vision-language model in size and training breadth yields new state-of-the-art results on over 25 benchmarks plus emerging abilities in counting and multilingual detection.

The Prompt Engineering Report Distilled: Quick Start Guide for Life Sciences

cs.CL · 2025-09-14 · unverdicted · novelty 3.0 · 2 refs

The paper reduces a broad set of prompt engineering techniques to six core approaches and applies them to life sciences use cases while addressing common LLM pitfalls.
