CARL: Criticality-Aware Agentic Reinforcement Learning

Leyang Shen , Yang Zhang , Chun Kai Ling , Xiaoyan Zhao , Tat-Seng Chua

Authors on Pith no claims yet

classification 💻 cs.LG cs.AIcs.CL

keywords carlstatesachievesactionsagenticalgorithmcriticality-awarelearning

read the original abstract

Agents capable of accomplishing complex tasks through multiple interactions with the environment have emerged as a popular research direction. However, in such multi-step settings, the conventional group-level policy optimization algorithm becomes suboptimal because of its underlying assumption that each step holds equal contribution, which deviates significantly from reality. Our analysis reveals that only the action choices on a small fraction of states are critical in determining the final outcome. Building on this insight, we propose CARL, a criticality-aware reinforcement learning algorithm tailored for long-horizon agentic reasoning. CARL leverages entropy as a heuristic proxy for state criticality and achieves focused training by assigning rewards to actions taken from high-criticality states while excluding actions taken from low-criticality states from model updates, avoiding noisy credit assignment and redundant computation. Extensive experiments demonstrate that CARL achieves both stronger performance and higher efficiency across diverse evaluation settings. The source code will be publicly available.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy
cs.LG 2026-05 conditional novelty 6.0

ActFocus resolves the action bottleneck in agentic RL by reweighting token gradients toward action tokens using observed reward variance and an energy-based uncertainty term, outperforming PPO and GRPO by up to 65 per...
A$^2$TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping
cs.CL 2026-05 unverdicted novelty 6.0

A²TGPO improves RL policy optimization for multi-turn agentic LLMs by normalizing information gain within same-depth turn groups, rescaling cumulative advantages by sqrt of term count, and modulating clipping ranges p...
From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models
cs.CL 2026-04 unverdicted novelty 5.0

A survey of credit assignment techniques in LLM reinforcement learning that distinguishes maturing methods for reasoning from new approaches needed for agentic settings and provides supporting resources.