LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: unverdicted (3)
citing papers explorer
- When Simulation Lies: A Sim-to-Real Benchmark and Domain-Randomized RL Recipe for Tool-Use Agents
  Tool-use agents suffer large accuracy drops from reward and transition perturbations but domain-randomized RL on static perturbations closes about 27% of the unseen transition gap while retaining most clean performance.
- Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents
  A new image-bank harness and closed-loop on-policy data evolution method raises multimodal agent performance on visual search benchmarks from 24.9% to 39.0% for an 8B model and from 30.6% to 41.5% for a 30B model.
- MMSkills: Towards Multimodal Skills for General Visual Agents
  MMSkills turns public interaction trajectories into compact multimodal skill packages that visual agents can consult at runtime to improve decision-making on benchmarks.