Title resolution pending

Sweet-rl: Training multi-turn llm agents on collaborative reasoning tasks · 2025 · arXiv 2503.15478

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

SalesSim: Benchmarking and Aligning Multimodal Language Models as Retail User Simulators

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

SalesSim benchmarks MLLMs as retail user simulators, finds gaps in persona adherence and over-persuasion, and introduces UserGRPO RL to raise decision alignment by 13.8%.

PRIME: Training Free Proactive Reasoning via Iterative Memory Evolution for User-Centric Agent

cs.AI · 2026-04-08 · unverdicted · novelty 7.0

PRIME enables agents to proactively reason in user-centric tasks by iteratively evolving structured memories from interaction trajectories without gradient-based training.

ActivityEditor: Learning to Synthesize Physically Valid Human Mobility

cs.AI · 2026-04-07 · unverdicted · novelty 7.0

ActivityEditor introduces a dual-LLM-agent system with reinforcement learning that produces statistically faithful and physically valid human mobility trajectories in zero-shot cross-regional settings.

Multi-Turn Reinforcement Learning for Tool-Calling Agents with Iterative Reward Calibration

cs.AI · 2026-04-03 · conditional · novelty 7.0

Iterative Reward Calibration with MT-GRPO and GTPO enables effective multi-turn RL for tool-calling agents, raising Tau-Bench success from 63.8% to 66.7% for a 4B model and from 58.0% to 69.5% for a 30B model.

Step Rejection Fine-Tuning: A Practical Distillation Recipe

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Step Rejection Fine-Tuning masks loss on erroneous steps identified by a critic LLM in unresolved trajectories, raising SWE-bench Verified resolution rate by 3.7% to 32.2% versus 2.4% for trajectory-level rejection.

GUI Agents with Reinforcement Learning: Toward Digital Inhabitants

cs.AI · 2026-04-30 · unverdicted · novelty 5.0

The paper delivers the first comprehensive overview of RL for GUI agents, organizing methods into offline, online, and hybrid strategies while analyzing trends in rewards, efficiency, and deliberation to outline a future roadmap.

citing papers explorer

Showing 6 of 6 citing papers.

SalesSim: Benchmarking and Aligning Multimodal Language Models as Retail User Simulators cs.CL · 2026-05-08 · unverdicted · none · ref 47
SalesSim benchmarks MLLMs as retail user simulators, finds gaps in persona adherence and over-persuasion, and introduces UserGRPO RL to raise decision alignment by 13.8%.
PRIME: Training Free Proactive Reasoning via Iterative Memory Evolution for User-Centric Agent cs.AI · 2026-04-08 · unverdicted · none · ref 24
PRIME enables agents to proactively reason in user-centric tasks by iteratively evolving structured memories from interaction trajectories without gradient-based training.
ActivityEditor: Learning to Synthesize Physically Valid Human Mobility cs.AI · 2026-04-07 · unverdicted · none · ref 5
ActivityEditor introduces a dual-LLM-agent system with reinforcement learning that produces statistically faithful and physically valid human mobility trajectories in zero-shot cross-regional settings.
Multi-Turn Reinforcement Learning for Tool-Calling Agents with Iterative Reward Calibration cs.AI · 2026-04-03 · conditional · none · ref 9
Iterative Reward Calibration with MT-GRPO and GTPO enables effective multi-turn RL for tool-calling agents, raising Tau-Bench success from 63.8% to 66.7% for a 4B model and from 58.0% to 69.5% for a 30B model.
Step Rejection Fine-Tuning: A Practical Distillation Recipe cs.LG · 2026-05-11 · unverdicted · none · ref 3
Step Rejection Fine-Tuning masks loss on erroneous steps identified by a critic LLM in unresolved trajectories, raising SWE-bench Verified resolution rate by 3.7% to 32.2% versus 2.4% for trajectory-level rejection.
GUI Agents with Reinforcement Learning: Toward Digital Inhabitants cs.AI · 2026-04-30 · unverdicted · none · ref 88
The paper delivers the first comprehensive overview of RL for GUI agents, organizing methods into offline, online, and hybrid strategies while analyzing trends in rewards, efficiency, and deliberation to outline a future roadmap.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer