Hisr: Hindsight information modulated segmental process rewards for multi-turn agentic reinforcement learning. arXiv preprint arXiv:2603.18683, 2026

Zhicong Lu, Zichuan Lin, Wei Jia, Changyuan Tian, Deheng Ye, Peiguang Li, Li Jin, Nayu Liu, Guangluan Xu, Wei Feng · 2026 · arXiv 2603.18683

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

TACO: Tool-Augmented Credit Optimization for Agentic Tool Use

cs.MA · 2026-06-29 · unverdicted · novelty 6.0

TACO combines Differential Answer-Probe Reward (DAPR) and Outcome-Gated Advantage Routing (OGAR) to assign credit to tool calls in agentic visual reasoning, producing accuracy gains on multimodal benchmarks.

HIPIF: Hierarchical Planning and Information Folding for Long-Horizon LLM Agent Learning

cs.AI · 2026-06-09 · unverdicted · novelty 4.0

HIPIF trains LLM agents end-to-end using subgoal-based hierarchical planning and information folding of completed histories, plus hierarchical reflection and process rewards, to handle long-horizon tasks without auxiliary models or expert trajectories.

citing papers explorer

TACO: Tool-Augmented Credit Optimization for Agentic Tool Use · cs.MA · 2026-06-29 · unverdicted · none · ref 77
