TextReg mitigates prompt distributional overfitting via regularized text-space optimization, reporting up to +16.5% OOD accuracy gains over prior methods on reasoning benchmarks.
hub Canonical reference
Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901
Canonical reference. 82% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
TabQL is a reinforcement learning framework that substitutes a tabular foundation model with in-context capabilities for the parametric Q-network in DQN, with a warm-up phase and theoretical analysis claiming improved sample efficiency.
Local neural operators on 3x3x3 patches, composed via Schwarz iteration, solve large-scale nonlinear elasticity on arbitrary geometries without domain-specific retraining.
DreamAvoid uses a Dream Trigger, Action Proposer, and Dream Evaluator trained on success/failure/boundary data to let VLA policies avoid critical-phase failures via test-time future dreaming.
A knowledge-first approach to LLM-driven automatic heuristic design in combinatorial optimization yields better discovery efficiency, transfer, and generalization than code-centric baselines by formalizing a distortion-compression trade-off.
In a controlled synthetic setting, transformers implement in-distribution task inference via convex combinations of task vectors and out-of-distribution inference via nearly orthogonal extrapolative representations.
TimeSeriesExamAgent combines templates and LLM agents to generate scalable time series reasoning benchmarks, demonstrating that current LLMs have limited performance on both abstract and domain-specific tasks.
This work delivers the first measurements of performance-energy trade-offs across four multi-request LLM workflow patterns on A100 GPUs using vLLM and Parrot.
A distributional alignment metric d_NTP and a linear regression method LTV for task vectors that improves accuracy by 9.2% over baselines on classification and regression tasks across multiple LLMs.
Introduces the ICT framework and an RL pipeline to train language agent reflectors that distill experience into reusable prompts, outperforming baselines on held-out tasks in ALFWorld and MiniHack.
Attention-state memory externalizes long prefixes into a lightweight lookup table of precomputed attention states, yielding higher accuracy than standard in-context learning at fixed memory budgets and lower latency than full attention.
Nonlinear Bipolar Compensation with Bipolar Logarithmic Transformation reduces outlier effects in post-training quantization by performing compensation in a compressed transformed space.
Conditional Attribute Transformers jointly estimate next-token probabilities and conditional attribute values for autoregressive sequence models, enabling credit assignment, counterfactuals, and steerable generation in one pass.
An audit of 152 papers reveals that geospatial foundation models lack standardized evaluations, training controls, and weight releases, so no one knows the state of the art.
AB-Sparse adaptively allocates per-head block sizes for sparse attention, adds lossless centroid quantization and custom variable-block GPU kernels, and reports up to 5.43% accuracy gain over fixed-block baselines with no throughput loss.
SOMA estimates a local response manifold from early turns and adapts a small surrogate model via divergence-maximizing prompts and localized LoRA fine-tuning for efficient multi-turn serving.
MoLF routes updates between full fine-tuning and LoRA at the optimizer level to match or exceed the better of the two static methods on SQL, medical QA, and counterfactual tasks while an efficient variant outperforms prior adaptive LoRA by up to 20%.
GazeMind encodes gaze data for LLM reasoning to deliver interpretable, personalized cognitive load predictions that generalize across tasks without fine-tuning and outperform baselines by over 20% on a new 152-person dataset.
ViewSAM achieves state-of-the-art weakly supervised performance on cross-view referring multi-object tracking by refining SAM tracklets via affinity-guided re-prompting and modeling view-induced variations as learnable conditions on SAM2.
Dynamic parameterization of standard layers can replace explicit attention for linear-time global visual modeling.
LMM-Searcher uses file-based visual UIDs and a fetch tool plus 12K synthesized trajectories to fine-tune a multimodal agent that scales to 100-turn horizons and reaches SOTA among open-source models on MM-BrowseComp and MMSearch-Plus.
Small transformers learn to forecast unseen dynamical systems in-context by using delay embeddings to recover the manifold and forecasting its invariant sets via a transfer-operator strategy.
Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.
LoRA-Mixer routes modular LoRA experts into attention projection matrices with an adaptive Routing Specialization Loss to improve multi-task performance while using fewer trainable parameters than prior LoRA-MoE methods.
citing papers explorer
-
DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies
DreamAvoid uses a Dream Trigger, Action Proposer, and Dream Evaluator trained on success/failure/boundary data to let VLA policies avoid critical-phase failures via test-time future dreaming.