Learning to combat compounding-error in model-based reinforcement learning
4 Pith papers cite this work.
[Citation timeline omitted: 4 representative citing papers, all from 2026.]
Citing papers
- When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning
  State-conditioned commitment depth in a vision-language policy Pareto-dominates fixed-depth baselines on Sliding Puzzle and Sokoban, raising solve rates by up to 12.5 points while using 25% fewer actions and outperforming larger models.
- Extracting Search Trees from LLM Reasoning Traces Reveals Myopic Planning
  LLMs plan myopically in games: their move choices are driven by shallow nodes of the extracted search tree despite long reasoning traces, unlike humans, who rely on deep search.
- Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination
  Dream-MPC boosts underlying policies on 24 continuous control tasks by optimizing policy-generated trajectories with gradient ascent, uncertainty regularization, and temporal amortization inside a latent world model.
- Advantage-Guided Diffusion for Model-Based Reinforcement Learning
  Advantage-guided diffusion (SAG and EAG) steers sampling in diffusion world models toward higher-advantage trajectories, enabling policy improvement and better sample efficiency on MuJoCo tasks.
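The "state-conditioned commitment depth" idea in the first citing paper can be illustrated with a toy sketch: a planner proposes an action sequence, and a commitment rule decides how many of those actions to execute before replanning. Everything below (the 1-D world, `plan`, `commit`, the depth thresholds) is a hypothetical illustration of the general mechanism, not the paper's vision-language method.

```python
# Toy 1-D navigation: reach `goal` from `pos` by unit steps.
# A state-conditioned commit() rule executes more of the plan when far
# from the goal and replans more often when close, versus a fixed-depth
# baseline that replans after every single action.

def plan(pos, goal, horizon=8):
    """Greedy plan: step toward the goal for `horizon` steps."""
    steps, p = [], pos
    for _ in range(horizon):
        if p < goal:
            steps.append(+1); p += 1
        elif p > goal:
            steps.append(-1); p -= 1
        else:
            steps.append(0)
    return steps

def commit(pos, goal):
    """Hypothetical state-conditioned depth: commit 1..4 actions,
    scaling with distance to the goal."""
    return max(1, min(4, abs(goal - pos) // 2))

def run(pos, goal, depth_fn, max_actions=50):
    """Replanning loop; returns (actions executed, replans performed)."""
    actions = replans = 0
    while pos != goal and actions < max_actions:
        seq = plan(pos, goal)
        replans += 1
        for a in seq[: depth_fn(pos, goal)]:
            pos += a
            actions += 1
            if pos == goal:
                break
    return actions, replans

a_act, a_rep = run(0, 9, commit)            # adaptive commitment
f_act, f_rep = run(0, 9, lambda p, g: 1)    # fixed depth-1 baseline
assert a_act == f_act == 9 and a_rep < f_rep
```

In this toy setting both policies use the same number of actions, but the adaptive rule replans far less often; the paper's claim is that in harder domains the learned rule also wins on solve rate and action count.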
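The Dream-MPC summary describes optimizing a policy-proposed action sequence by gradient ascent on imagined return inside a world model. A minimal sketch of that core loop, using a hand-coded 1-D linear "latent model" and numerical gradients in place of a learned model and backprop (the real method also adds uncertainty regularization and temporal amortization, omitted here; all names are illustrative):

```python
import numpy as np

def rollout_return(z0, actions, A=0.9, B=0.5, target=1.0):
    """Roll a 1-D linear latent model z' = A*z + B*a and score each
    state by negative squared distance to a target (imagined return)."""
    z, ret = z0, 0.0
    for a in actions:
        z = A * z + B * a
        ret -= (z - target) ** 2
    return ret

def grad_ascent_plan(z0, init_actions, lr=0.1, steps=200, eps=1e-4):
    """Refine the policy-proposed action sequence by numerical
    gradient ascent on the imagined return."""
    actions = np.array(init_actions, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(actions)
        for i in range(len(actions)):
            bump = np.zeros_like(actions)
            bump[i] = eps
            grad[i] = (rollout_return(z0, actions + bump)
                       - rollout_return(z0, actions - bump)) / (2 * eps)
        actions += lr * grad
    return actions

init = [0.0, 0.0, 0.0]              # weak "policy proposal"
refined = grad_ascent_plan(0.0, init)
assert rollout_return(0.0, refined) > rollout_return(0.0, init)
```

The point of the sketch is structural: planning here does not replace the policy, it polishes the policy's own trajectory, which is why the paper reports improvements over the underlying policies rather than standalone planner scores.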
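The advantage-guided diffusion entry describes tilting the sampling process of a diffusion world model toward trajectories a critic scores highly. A classifier-guidance-style sketch of that idea on a 1-D stand-in "trajectory" variable, with a hand-coded advantage function in place of a learned critic (this illustrates the general guidance mechanism, not the SAG or EAG algorithms specifically):

```python
import numpy as np

rng = np.random.default_rng(0)

def advantage(x):
    """Hypothetical critic: advantage peaks at x = 2."""
    return -(x - 2.0) ** 2

def adv_grad(x):
    return -2.0 * (x - 2.0)

def sample(guidance=0.0, n_steps=50, n=256):
    """Langevin-style denoising toward a standard-normal prior.
    With guidance > 0, each update is tilted by the advantage
    gradient, steering samples toward high-advantage regions."""
    x = rng.normal(0.0, 3.0, size=n)   # noisy initial "trajectories"
    step = 0.05
    for _ in range(n_steps):
        prior_score = -x               # score of the N(0, 1) prior
        x += step * (prior_score + guidance * adv_grad(x))
        x += np.sqrt(2 * step) * 0.1 * rng.normal(size=x.shape)
    return x

plain = sample(guidance=0.0)
guided = sample(guidance=1.0)
assert advantage(guided).mean() > advantage(plain).mean()
```

Unguided samples settle near the prior mean; guided samples shift toward the advantage peak, which is the mechanism by which such trajectories can then be used for policy improvement.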