
Hisr: Hindsight information modulated segmental process rewards for multi-turn agentic reinforcement learning. arXiv preprint arXiv:2603.18683, 2026

2 Pith papers cite this work. Polarity classification is still indexing.


fields: cs.AI (1), cs.MA (1)

years: 2026 (2)

verdicts: unverdicted (2)


representative citing papers

TACO: Tool-Augmented Credit Optimization for Agentic Tool Use

cs.MA · 2026-06-29 · unverdicted · novelty 6.0

TACO combines Differential Answer-Probe Reward (DAPR) and Outcome-Gated Advantage Routing (OGAR) to assign credit to tool calls in agentic visual reasoning, producing accuracy gains on multimodal benchmarks.

citing papers explorer

Showing 1 of 1 citing paper after filters.

  • TACO: Tool-Augmented Credit Optimization for Agentic Tool Use · cs.MA · 2026-06-29 · unverdicted · none · ref 77
