Agent-r: Train- ing language model agents to reflect via iterative self-training

Siyu Yuan, Zehui Chen, Zhiheng Xi, Junjie Ye, Zhengyin Du, Jiecao Chen · 2025 · arXiv 2501.11425

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents

cs.LG · 2026-04-12 · unverdicted · novelty 6.0

Skill-SD turns an agent's completed trajectories into dynamic natural-language skills that condition only the teacher in self-distillation, yielding 14-42% gains over RL and OPSD baselines on multi-turn agent benchmarks.

TEC: A Collection of Human Trial-and-error Trajectories for Problem Solving

cs.CL · 2026-04-08 · unverdicted · novelty 6.0

TEC is a new public dataset of detailed human trial-and-error trajectories and reflections on web tasks, with humans showing substantially higher accuracy than LLMs.

MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents

cs.CL · 2025-06-18 · unverdicted · novelty 6.0

MEM1 uses end-to-end RL to learn constant-memory agents that update a shared state for memory and reasoning, delivering 3.5x better performance and 3.7x lower memory use than larger baselines on long-horizon QA and shopping tasks.

RoboAgent: Chaining Basic Capabilities for Embodied Task Planning

cs.RO · 2026-04-09 · unverdicted · novelty 5.0

RoboAgent chains basic vision-language capabilities inside a single VLM via a scheduler and trains it in three stages (behavior cloning, DAgger, RL) to improve embodied task planning.

Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments

cs.CL · 2025-08-12 · unverdicted · novelty 5.0

An automated environment construction pipeline plus verifiable rewards enables RL training that improves LLM tool-use performance across scales without harming general capabilities.

Position: Agent Should Invoke External Tools ONLY When Epistemically Necessary

cs.AI · 2025-06-01 · unverdicted · novelty 4.0

Agents should invoke external tools only when epistemically necessary, per the introduced Theory of Agent framework that frames tool use as a decision under uncertainty.

citing papers explorer

Showing 6 of 6 citing papers.

Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents cs.LG · 2026-04-12 · unverdicted · none · ref 44
Skill-SD turns an agent's completed trajectories into dynamic natural-language skills that condition only the teacher in self-distillation, yielding 14-42% gains over RL and OPSD baselines on multi-turn agent benchmarks.
TEC: A Collection of Human Trial-and-error Trajectories for Problem Solving cs.CL · 2026-04-08 · unverdicted · none · ref 40
TEC is a new public dataset of detailed human trial-and-error trajectories and reflections on web tasks, with humans showing substantially higher accuracy than LLMs.
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents cs.CL · 2025-06-18 · unverdicted · none · ref 64
MEM1 uses end-to-end RL to learn constant-memory agents that update a shared state for memory and reasoning, delivering 3.5x better performance and 3.7x lower memory use than larger baselines on long-horizon QA and shopping tasks.
RoboAgent: Chaining Basic Capabilities for Embodied Task Planning cs.RO · 2026-04-09 · unverdicted · none · ref 132
RoboAgent chains basic vision-language capabilities inside a single VLM via a scheduler and trains it in three stages (behavior cloning, DAgger, RL) to improve embodied task planning.
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments cs.CL · 2025-08-12 · unverdicted · none · ref 58
An automated environment construction pipeline plus verifiable rewards enables RL training that improves LLM tool-use performance across scales without harming general capabilities.
Position: Agent Should Invoke External Tools ONLY When Epistemically Necessary cs.AI · 2025-06-01 · unverdicted · none · ref 63
Agents should invoke external tools only when epistemically necessary, per the introduced Theory of Agent framework that frames tool use as a decision under uncertainty.

Agent-r: Train- ing language model agents to reflect via iterative self-training

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer