Recursive Language Models

Alex L. Zhang, Tim Kraska, Omar Khattab

classification 💻 cs.AI cs.CL

keywords model language long-context models prompts across code even

We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inference paradigm that treats long prompts as part of an external environment and allows the LLM to programmatically examine, decompose, and recursively call itself over snippets of the prompt. We find that RLMs can successfully process inputs up to two orders of magnitude beyond model context windows and, even for shorter prompts, dramatically outperform the quality of vanilla frontier LLMs and common long-context and coding scaffolds (e.g., on GPT-5 by a median across the evaluated benchmarks of $26\%$ against compaction, $130\%$ against CodeAct with sub-calls, and $13\%$ against Claude Code) across four diverse long-context tasks while having comparable cost. At a small scale, we post-train the first model around the RLM. Our model, RLM-Qwen3-8B, outperforms the underlying Qwen3-8B model by $28.3\%$ on average and even approaches the quality of vanilla GPT-5 on three long-context tasks. Code is available at https://github.com/alexzhang13/rlm.
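The recursive idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). It assumes a fixed split-in-half strategy, whereas the abstract describes the model itself deciding programmatically how to examine and decompose the prompt. The names `rlm_answer`, `toy_llm`, and `max_chars` are hypothetical, and `toy_llm` is a keyword filter standing in for a real model call.

```python
from typing import Callable

# Hypothetical type for a model call: (query, context) -> answer text.
LLM = Callable[[str, str], str]


def toy_llm(query: str, context: str) -> str:
    """Stand-in for a real model: return the context lines that mention the query."""
    hits = [line for line in context.splitlines() if query in line]
    return "\n".join(hits)


def rlm_answer(query: str, prompt: str, llm: LLM, max_chars: int = 200) -> str:
    """Answer `query` over `prompt`, recursing over snippets when the prompt is long.

    Assumption: `max_chars` plays the role of the context window, and the
    prompt is split on line boundaries into two halves. The merged partial
    answers are assumed to be short enough for one final call.
    """
    lines = prompt.splitlines()
    # Base case: the snippet fits, or it cannot be split any further.
    if len(prompt) <= max_chars or len(lines) <= 1:
        return llm(query, prompt)

    mid = len(lines) // 2
    halves = (lines[:mid], lines[mid:])
    # Recursive sub-calls over each snippet of the prompt.
    partials = [rlm_answer(query, "\n".join(h), llm, max_chars) for h in halves]
    # Combine the non-empty partial answers with one more call.
    merged = "\n".join(p for p in partials if p)
    return llm(query, merged)


# A long prompt with one relevant line buried in filler text.
demo_prompt = "\n".join(
    [f"filler line {i}" for i in range(500)]
    + ["the secret code is 42"]
    + [f"filler line {i}" for i in range(500, 1000)]
)
answer = rlm_answer("secret", demo_prompt, toy_llm)
print(answer)
```

In this sketch no single call sees more than `max_chars` characters of the original prompt, which is the property that lets the overall input exceed the per-call window.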

Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Continual Harness: Online Adaptation for Self-Improving Foundation Agents
cs.LG 2026-05 conditional novelty 8.0

Continual Harness automates online self-improvement for foundation-model embodied agents by refining prompts, sub-agents, skills, and memory within one run, cutting button-press costs on Pokemon Red and Emerald and cl...
Deep Reasoning in General Purpose Agents via Structured Meta-Cognition
cs.CL 2026-05 unverdicted novelty 7.0

DOLORES, an agent using a formal language for meta-reasoning to construct adaptive scaffolds on the fly, outperforms prior scaffolding methods by 24.8% on average across four hard benchmarks and multiple model sizes.
Meta-Harness: End-to-End Optimization of Model Harnesses
cs.AI 2026-03 unverdicted novelty 7.0

Meta-Harness discovers improved harness code for LLMs via agentic search over prior execution traces, yielding 7.7-point gains on text classification with 4x fewer tokens and 4.7-point gains on math reasoning across h...
Robust Reasoning Benchmark
cs.LG 2026-03 unverdicted novelty 7.0

Perturbations to math problem text cause up to 55% average accuracy drops in open-weight LLMs and sequential solving reveals context pollution in attention mechanisms.
Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regulation Agentic AI Loop for Engineering Design
cs.AI 2026-03 conditional novelty 7.0

Metacognitive self- and co-regulation loops improve LLM agent performance in engineering design by mitigating fixation and enabling better exploration of design options.
Attention Once Is All You Need: Efficient Streaming Inference with Stateful Transformers
cs.LG 2026-05 unverdicted novelty 6.0

Stateful sessions with incremental KV cache and flash queries allow O(|q|) latency in streaming transformer inference, delivering up to 5.9x speedup over conventional engines while preserving full attention.
Workspace Optimization: How to Train Your Agent
cs.AI 2026-05 unverdicted novelty 6.0

Workspace optimization evolves an agent's external workspace using multi-agent systems, with DreamTeam raising ARC-AGI-3 scores from 36% to 38.4% while using 31% fewer actions.
Hierarchical Visual Agent: Managing Contexts in Joint Image-Text Space for Advanced Chart Reasoning
cs.CV 2026-05 unverdicted novelty 6.0

HierVA improves multi-step chart question answering by having a high-level manager maintain key joint contexts while specialized workers perform targeted reasoning with visual zoom-in.
Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace
cs.AI 2026-05 unverdicted novelty 5.0 partial

Shepherd is a runtime system that formalizes meta-agent operations via typed execution traces, enabling fast forking and demonstrated improvements in agent intervention, optimization, and training on benchmarks.
On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length
cs.AI 2026-05 unverdicted novelty 5.0

Longer action horizons bottleneck LLM agent training through instability, but training with reduced horizons stabilizes learning and enables better generalization to longer horizons.
State Representation and Termination for Recursive Reasoning Systems
cs.AI 2026-05 unverdicted novelty 5.0

Recursive reasoning systems can represent their state via an epistemic state graph and terminate when the linearized order-gap is non-degenerate near the fixed point, providing a local condition for when the stopping ...
RLM-on-KG: Heuristics First, LLMs When Needed: Adaptive Retrieval Control over Mention Graphs for Scattered Evidence
cs.IR 2026-04 unverdicted novelty 5.0

LLM navigation on mention graphs yields a conditional F1 gain of 2.47-4.37 points over heuristics when evidence is scattered across 6-10 chunks, with smaller gains for concentrated evidence.
Consolidation-Expansion Operator Mechanics: A Unified Framework for Adaptive Learning
cs.LG 2026-05 unverdicted novelty 4.0

OpMech defines the order-gap as a computable non-commutativity measure between consolidation and expansion operators to provide real-time convergence signals and stopping rules in adaptive learning.
Consolidation-Expansion Operator Mechanics: A Unified Framework for Adaptive Learning
cs.LG 2026-05 unverdicted novelty 4.0

OpMech defines the order-gap between consolidation and expansion operators as a real-time, trajectory-based signal for convergence and principled stopping in adaptive learning.