Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents

2026 · cs.AI · arXiv 2604.05808

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Large language model (LLM) agents have demonstrated strong capabilities in complex interactive decision-making tasks. However, existing LLM agents typically rely on increasingly long interaction histories, resulting in high computational cost and limited scalability. In this paper, we propose STEP-HRL, a hierarchical reinforcement learning (HRL) framework that enables step-level learning by conditioning only on single-step transitions rather than full interaction histories. STEP-HRL structures tasks hierarchically, using completed subtasks to represent the global progress of the overall task. A local progress module also iteratively and selectively summarizes the interaction history within each subtask to produce a compact summary of local progress. Together, these components yield augmented step-level transitions for both high-level and low-level policies. Experimental results on the ScienceWorld and ALFWorld benchmarks consistently demonstrate that STEP-HRL substantially outperforms baselines in terms of performance and generalization while reducing token usage. Our code is available at https://github.com/TonyStark042/STEP-HRL.
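
The idea of an augmented step-level transition can be illustrated with a minimal Python sketch. It is not taken from the STEP-HRL repository: the names `AugmentedTransition`, `update_local_progress`, and `low_level_input` are hypothetical, and the append-and-truncate summarizer is only a stand-in for the local progress module, whose actual design the abstract does not specify. The sketch shows that the low-level policy input stays bounded in size because it carries completed subtasks and a compact summary instead of the full history.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AugmentedTransition:
    """Hypothetical layout of one step-level input for the low-level policy."""
    task: str                            # overall task description
    completed_subtasks: Tuple[str, ...]  # global progress
    current_subtask: str                 # subtask chosen by the high-level policy
    local_progress: str                  # compact summary within the current subtask
    observation: str                     # latest observation only, not the full history


def update_local_progress(summary: str, action: str, observation: str,
                          max_chars: int = 200) -> str:
    """Stand-in summarizer (an assumption): append the newest step, keep the tail."""
    entry = f"{action} -> {observation}"
    merged = f"{summary}; {entry}" if summary else entry
    return merged[-max_chars:]


def low_level_input(t: AugmentedTransition) -> str:
    """Render the transition as a prompt whose size does not grow with step count."""
    done = ", ".join(t.completed_subtasks) or "none"
    return (
        f"Task: {t.task}\n"
        f"Completed subtasks: {done}\n"
        f"Current subtask: {t.current_subtask}\n"
        f"Local progress: {t.local_progress or 'none'}\n"
        f"Observation: {t.observation}"
    )


if __name__ == "__main__":
    summary = ""
    for i in range(50):
        summary = update_local_progress(summary, f"act{i}", f"obs{i}")
    step = AugmentedTransition(
        task="boil water",
        completed_subtasks=("find beaker",),
        current_subtask="fill beaker",
        local_progress=summary,
        observation="You are at the sink.",
    )
    print(low_level_input(step))
```

In this sketch the prompt length depends on `max_chars` and the number of completed subtasks, not on how many steps have been taken inside the current subtask.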

representative citing papers

HIPIF: Hierarchical Planning and Information Folding for Long-Horizon LLM Agent Learning

cs.AI · 2026-06-09 · unverdicted · novelty 4.0

HIPIF trains LLM agents end-to-end using subgoal-based hierarchical planning and information folding of completed histories, plus hierarchical reflection and process rewards, to handle long-horizon tasks without auxiliary models or expert trajectories.

citing papers explorer

Showing 1 of 1 citing paper.

HIPIF: Hierarchical Planning and Information Folding for Long-Horizon LLM Agent Learning

cs.AI · 2026-06-09 · unverdicted · none · ref 44 · internal anchor
