A stop-gradient consistency regularizer mitigates context-induced degradation in on-policy distillation, improving robustness across 12 configurations.
hub Canonical reference
A Survey on In-context Learning
Canonical reference. 100% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
roles
background 6polarities
background 6representative citing papers
HDSL is a tree-structured DSL for 3D indoor scenes that lets LLM agents generate subtrees recursively and perform localized edits via hierarchical retrieval and deterministic merge.
Long input windows are required to identify the generative process in time series forecasting even for short-memory processes, and decoupling identification from forecasting improves scalability.
NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.
LLM-based security code review is vulnerable to framing bias, with a novel iterative refinement attack achieving 100% success in reintroducing vulnerabilities across real projects.
LM agents' changeable modules prevent persistent identity and sanction sensitivity, making reputation mechanisms structurally inapplicable and requiring protocol-based behavioral harnesses instead.
Transformers are limited to a linearly growing number of accessible output sequences with prompt length, with exponential decay in accessible proportion beyond a critical point, even under unbounded context.
OP-Mix is an on-policy data mixing method that uses low-rank adapter interpolation to find near-optimal data mixtures throughout language model training with reduced compute.
A tabular foundation model with LLM-as-Observer features predicts AI agent decisions in controlled games, outperforming baselines by 4 AUC points and 14% lower error at K=16 interactions.
RADAGAS-GPT4o achieves a 22.73% bypass rate against 10 WAFs, succeeding more against AI/ML-based firewalls than rule-based ones.
Causal evidence from representation analysis and interventions shows LLMs use both genuine structure inference and induction circuits in parallel for in-context graph learning.
SPARK improves LLM-based test code fault localization by retrieving similar past faults and selectively annotating suspicious lines in new failing tests.
GRaSp optimizes in-context examples for LLMs via synthetic generation, clustering, dimensionality reduction, and genetic algorithms with diversity-adaptive mutation, reaching 45.84% micro-F1 on financial NER with real data and outperforming zero-shot and random few-shot baselines.
Fine-tuning 7B code LLMs on a custom multi-file DSL dataset achieves structural fidelity of 1.00, high exact-match accuracy, and practical utility validated by expert survey and execution checks.
A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games, with causal patching and steering confirming their role and generalization to other social games.
Cross-domain memory transfer in coding agents boosts average performance by 3.7% via meta-knowledge like validation routines, with high-level abstractions transferring better than low-level traces.
Progress Ratio Embeddings use a trigonometric progress-ratio signal to deliver stable length control in transformers that generalizes to unseen target lengths.
Multimodal LLMs achieve moderate correlations with human visual creativity ratings in zero-shot evaluation across two datasets, with reasoning outputs providing interpretability but no accuracy gain.
GRAB improves multi-table QA performance by encoding relational data as graphs and bridging structural signals to frozen LLMs through latent tokens.
Constrained candidate selection from retrieved chunks raises Recall@5 from 31.9% to 50.0% and parseable outputs on 420 queries from 200 municipal meeting transcripts.
Introduces Parametric Memory Law as power law for LoRA memory capacity and MemFT threshold-guided optimization for better memory fidelity.
Introduces PAS and FAS task abstractions plus the LLM-S^3 benchmark to evaluate LLMs on generating sociodemographic survey responses across 11 real datasets and multiple models.
Mujica-MyGo decomposes multi-turn RAG interactions via multi-agent workflows and applies minimalist policy gradient optimization to improve performance on QA benchmarks while avoiding long-context problems.
Zero-shot VLMs reach at most 62% accuracy on agricultural classification tasks while supervised models like YOLO11 perform markedly higher, indicating they are not ready to replace task-specific systems.
citing papers explorer
-
Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time
OP-Mix is an on-policy data mixing method that uses low-rank adapter interpolation to find near-optimal data mixtures throughout language model training with reduced compute.
-
GRaSp: Automatic Example Optimization for In-Context Learning in Low-Data Tasks
GRaSp optimizes in-context examples for LLMs via synthetic generation, clustering, dimensionality reduction, and genetic algorithms with diversity-adaptive mutation, reaching 45.84% micro-F1 on financial NER with real data and outperforming zero-shot and random few-shot baselines.
-
Progress Ratio Embeddings: An Impatience Signal for Robust Length Control in Neural Text Generation
Progress Ratio Embeddings use a trigonometric progress-ratio signal to deliver stable length control in transformers that generalizes to unseen target lengths.
-
How LLMs See Creativity: Zero-Shot Scoring of Visual Creativity with Interpretable Reasoning
Multimodal LLMs achieve moderate correlations with human visual creativity ratings in zero-shot evaluation across two datasets, with reasoning outputs providing interpretability but no accuracy gain.
-
Latent Bridges for Multi-Table Question Answering
GRAB improves multi-table QA performance by encoding relational data as graphs and bridging structural signals to frozen LLMs through latent tokens.
-
Topic-to-Timestamp Alignment by Constrained Evidence Selection
Constrained candidate selection from retrieved chunks raises Recall@5 from 31.9% to 50.0% and parseable outputs on 420 queries from 200 municipal meeting transcripts.
-
How LoRA Remembers? A Parametric Memory Law for LLM Finetuning
Introduces Parametric Memory Law as power law for LoRA memory capacity and MemFT threshold-guided optimization for better memory fidelity.
-
Advancing Multi-Agent RAG Systems with Minimalist Reinforcement Learning
Mujica-MyGo decomposes multi-turn RAG interactions via multi-agent workflows and applies minimalist policy gradient optimization to improve performance on QA benchmarks while avoiding long-context problems.
- An Empirical Study of Many-Shot In-Context Learning for Machine Translation of Low-Resource Languages