ALAR trains LLM agents to perform most reasoning in a latent space supervised by actions and escalates to explicit CoT only when needed, cutting tokens by up to 84.6% while preserving accuracy on search and tool-use benchmarks.
arXiv preprint arXiv:2602.11683
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
TARPO is a pure RL framework that uses a token-wise action router to switch between discrete token generation and latent reasoning in LLMs, with joint optimization reported to improve benchmark results.
citing papers explorer
- Adaptive Latent Agentic Reasoning
  ALAR trains LLM agents to perform most reasoning in a latent space supervised by actions and escalates to explicit CoT only when needed, cutting tokens by up to 84.6% while preserving accuracy on search and tool-use benchmarks.
- TARPO: Token-Wise Latent-Explicit Reasoning via Action-Routing Policy Optimization
  TARPO is a pure RL framework that uses a token-wise action router to switch between discrete token generation and latent reasoning in LLMs, with joint optimization reported to improve benchmark results.