Schick et al. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 36:68539–68551, 2023.
7 Pith papers cite this work.
7 representative citing papers (2026)
-
GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation
GEAR reshapes GRPO trajectory advantages using divergence signals from a ground-truth-conditioned teacher to create adaptive token- and segment-level credit regions.
-
LEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Models
LEAD uses online adaptive mechanisms including Potential-Scaled Instability and symmetric efficiency rewards based on correct rollouts to achieve higher accuracy-efficiency scores with substantially shorter reasoning outputs than base models on math benchmarks.
-
AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems
AgentForesight trains a 7B model to perform online auditing of multi-agent LLM trajectories, detecting early decisive errors and outperforming larger models on custom and external benchmarks.
-
SkVM: Revisiting Language VM for Skills across Heterogenous LLMs and Harnesses
SkVM uses capability profiling and compiler-style techniques to make skills portable across LLMs and harnesses, raising task completion rates while cutting token use by up to 40% and delivering up to 3.2x speedup.
-
The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory
Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.
-
Eligibility-Aware Evidence Synthesis: An Agentic Framework for Clinical Trial Meta-Analysis
EligMeta automates trial discovery from registries and incorporates eligibility similarity into meta-analysis weighting to yield population-aligned pooled estimates, as shown by recovering all guideline trials in one case and shifting a risk ratio in another.
-
STAR: Failure-Aware Markovian Routing for Multi-Agent Spatiotemporal Reasoning
STAR is a failure-aware Markovian router that learns recovery transitions from both successful and unsuccessful execution traces to improve multi-agent performance on spatiotemporal benchmarks.
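Several of these papers (GEAR, LEAD) build on GRPO-style training, where each rollout's advantage is its reward normalized against the group of rollouts for the same prompt. The sketch below illustrates that group-relative normalization, plus a hypothetical per-token reweighting in the spirit of GEAR's divergence-based credit reshaping; the weighting scheme and function names are illustrative assumptions, not GEAR's actual method.

```python
import numpy as np

def group_relative_advantages(rewards):
    # GRPO-style advantage: normalize each rollout's scalar reward
    # against the mean/std of its sampling group.
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def reweight_advantages(advantage, divergence, alpha=1.0):
    # Hypothetical GEAR-style reshaping (illustrative only): scale a
    # trajectory's scalar advantage per token by a weight derived from
    # a teacher-student divergence signal, so tokens where the
    # ground-truth-conditioned teacher diverges most get more credit.
    d = np.asarray(divergence, dtype=float)
    w = 1.0 + alpha * (d - d.mean()) / (d.std() + 1e-8)
    return advantage * w
```

With uniform divergence the weights collapse to 1 and the trajectory's scalar advantage is applied to every token unchanged, recovering plain GRPO credit assignment.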