Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. · 2020

Canonical reference. 82% of citing Pith papers cite this work as background.

35 Pith papers citing it

Background: 82% of classified citations

citation-role summary

background: 9 · method: 1 · other: 1

citation-polarity summary

background: 9 · unclear: 1 · use method: 1

representative citing papers

TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization

cs.CL · 2026-05-20 · unverdicted · novelty 7.0

TextReg mitigates prompt distributional overfitting via regularized text-space optimization, reporting up to +16.5% OOD accuracy gains over prior methods on reasoning benchmarks.

TabQL: In-Context Q-Learning with Tabular Foundation Models

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

TabQL is a reinforcement learning framework that substitutes a tabular foundation model with in-context capabilities for the parametric Q-network in DQN, with a warm-up phase and theoretical analysis claiming improved sample efficiency.

Neural-Schwarz Tiling for Geometry-Universal PDE Solving at Scale

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Local neural operators on 3x3x3 patches, composed via Schwarz iteration, solve large-scale nonlinear elasticity on arbitrary geometries without domain-specific retraining.

DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies

cs.RO · 2026-05-12 · unverdicted · novelty 7.0

DreamAvoid uses a Dream Trigger, Action Proposer, and Dream Evaluator trained on success/failure/boundary data to let VLA policies avoid critical-phase failures via test-time future dreaming.

Back to the Beginning of Heuristic Design: Bridging Code and Knowledge with LLMs

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

A knowledge-first approach to LLM-driven automatic heuristic design in combinatorial optimization yields better discovery efficiency, transfer, and generalization than code-centric baselines by formalizing a distortion-compression trade-off.

Task Vector Geometry Underlies Dual Modes of Task Inference in Transformers

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

In a controlled synthetic setting, transformers implement in-distribution task inference via convex combinations of task vectors and out-of-distribution inference via nearly orthogonal extrapolative representations.

TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale

cs.AI · 2026-04-11 · conditional · novelty 7.0

TimeSeriesExamAgent combines templates and LLM agents to generate scalable time series reasoning benchmarks, demonstrating that current LLMs have limited performance on both abstract and domain-specific tasks.

Characterizing Performance-Energy Trade-offs of Large Language Models in Multi-Request Workflows

cs.DC · 2026-03-12 · unverdicted · novelty 7.0

This work delivers the first measurements of performance-energy trade-offs across four multi-request LLM workflow patterns on A100 GPUs using vLLM and Parrot.

Distributional Alignment as a Criterion for Designing Task Vectors in In-Context Learning

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

Proposes a distributional alignment metric, d_NTP, and a linear regression method for task vectors, LTV, that improves accuracy by 9.2% over baselines on classification and regression tasks across multiple LLMs.

Training Language Agents to Learn from Experience

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

Introduces the ICT framework and an RL pipeline to train language agent reflectors that distill experience into reusable prompts, outperforming baselines on held-out tasks in ALFWorld and MiniHack.

Context Memorization for Efficient Long Context Generation

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

Attention-state memory externalizes long prefixes into a lightweight lookup table of precomputed attention states, yielding higher accuracy than standard in-context learning at fixed memory budgets and lower latency than full attention.

Nonlinear Bipolar Compensation: Handling Outliers in Post-Training Quantization

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

Nonlinear Bipolar Compensation with Bipolar Logarithmic Transformation reduces outlier effects in post-training quantization by performing compensation in a compressed transformed space.

Conditional Attribute Estimation with Autoregressive Sequence Models

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

Conditional Attribute Transformers jointly estimate next-token probabilities and conditional attribute values for autoregressive sequence models, enabling credit assignment, counterfactuals, and steerable generation in one pass.

No One Knows the State of the Art in Geospatial Foundation Models

cs.CV · 2026-05-12 · accept · novelty 6.0

An audit of 152 papers reveals that geospatial foundation models lack standardized evaluations, training controls, and weight releases, so no one knows the state of the art.

AB-Sparse: Sparse Attention with Adaptive Block Size for Accurate and Efficient Long-Context Inference

cs.DC · 2026-05-12 · unverdicted · novelty 6.0

AB-Sparse adaptively allocates per-head block sizes for sparse attention, adds lossless centroid quantization and custom variable-block GPU kernels, and reports up to 5.43% accuracy gain over fixed-block baselines with no throughput loss.

SOMA: Efficient Multi-turn LLM Serving via Small Language Model

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

SOMA estimates a local response manifold from early turns and adapts a small surrogate model via divergence-maximizing prompts and localized LoRA fine-tuning for efficient multi-turn serving.

Beyond LoRA vs. Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptation

cs.CL · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

MoLF routes updates between full fine-tuning and LoRA at the optimizer level to match or exceed the better of the two static methods on SQL, medical QA, and counterfactual tasks, while an efficient variant outperforms prior adaptive LoRA by up to 20%.

GazeMind: A Gaze-Guided LLM Agent for Personalized Cognitive Load Assessment

cs.HC · 2026-05-07 · unverdicted · novelty 6.0

GazeMind encodes gaze data for LLM reasoning to deliver interpretable, personalized cognitive load predictions that generalize across tasks without fine-tuning and outperform baselines by over 20% on a new 152-person dataset.

ViewSAM: Learning View-aware Cross-modal Semantics for Weakly Supervised Cross-view Referring Multi-Object Tracking

cs.CV · 2026-05-04 · unverdicted · novelty 6.0

ViewSAM achieves state-of-the-art weakly supervised performance on cross-view referring multi-object tracking by refining SAM tracklets via affinity-guided re-prompting and modeling view-induced variations as learnable conditions on SAM2.

Linear-Time Global Visual Modeling without Explicit Attention

cs.CV · 2026-05-03 · unverdicted · novelty 6.0

Dynamic parameterization of standard layers can replace explicit attention for linear-time global visual modeling.

Towards Long-horizon Agentic Multimodal Search

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

LMM-Searcher uses file-based visual UIDs and a fetch tool plus 12K synthesized trajectories to fine-tune a multimodal agent that scales to 100-turn horizons and reaches SOTA among open-source models on MM-BrowseComp and MMSearch-Plus.

Transformers for dynamical systems learn transfer operators in-context

cs.LG · 2026-02-21 · unverdicted · novelty 6.0

Small transformers learn to forecast unseen dynamical systems in-context by using delay embeddings to recover the manifold and forecasting its invariant sets via a transfer-operator strategy.

Video models are zero-shot learners and reasoners

cs.LG · 2025-09-24 · unverdicted · novelty 6.0

Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.

LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing

cs.LG · 2025-06-17 · unverdicted · novelty 6.0

LoRA-Mixer routes modular LoRA experts into attention projection matrices with an adaptive Routing Specialization Loss to improve multi-task performance while using fewer trainable parameters than prior LoRA-MoE methods.

