arxiv: 1905.13320 · v1 · pith:TQGTHGPEnew · submitted 2019-05-30 · cs.LG · cs.AI · stat.ML

Combating the Compounding-Error Problem with a Multi-step Model

classification: cs.LG, cs.AI, stat.ML
keywords: model, learning, model-based, multi-step, compounding-error, one-step, problem, reinforcement

Model-based reinforcement learning is an appealing framework for creating agents that learn, plan, and act in sequential environments. Model-based algorithms typically involve learning a transition model that takes a state and an action and outputs the next state---a one-step model. This model can be composed with itself to enable predicting multiple steps into the future, but one-step prediction errors can get magnified, leading to unacceptable inaccuracy. This compounding-error problem plagues planning and undermines model-based reinforcement learning. In this paper, we address the compounding-error problem by introducing a multi-step model that directly outputs the outcome of executing a sequence of actions. Novel theoretical and empirical results indicate that the multi-step model is more conducive to efficient value-function estimation, and it yields better action selection compared to the one-step model. These results make a strong case for using multi-step models in the context of model-based reinforcement learning.
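
The compounding-error problem described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the scalar dynamics `true_step`, the hypothetical learned model `one_step_model` with a 5% coefficient error, and the hypothetical `multi_step_model`, which is simply assumed to carry a 5% relative error on its single direct prediction. The sketch only shows how an error applied at every step of a composed rollout grows with the horizon, whereas a direct multi-step prediction incurs its error once.

```python
from typing import Callable, List, Sequence


def true_step(s: float, a: float) -> float:
    """Toy ground-truth dynamics (assumed): s' = s + a."""
    return s + a


def one_step_model(s: float, a: float) -> float:
    """Hypothetical learned one-step model with a 5% coefficient error."""
    return 1.05 * s + a


def rollout(step: Callable[[float, float], float],
            s0: float, actions: Sequence[float]) -> float:
    """Compose a one-step function with itself along an action sequence."""
    s = s0
    for a in actions:
        s = step(s, a)
    return s


def multi_step_model(s0: float, actions: Sequence[float]) -> float:
    """Hypothetical multi-step model: maps (s0, action sequence) directly to
    the final state. Assumed to have a 5% relative error on that one output."""
    return 1.05 * (s0 + sum(actions))


def one_step_errors(s0: float, actions: Sequence[float]) -> List[float]:
    """Absolute error of the composed one-step model at horizons 1..H."""
    errors = []
    for h in range(1, len(actions) + 1):
        prefix = actions[:h]
        truth = rollout(true_step, s0, prefix)
        pred = rollout(one_step_model, s0, prefix)
        errors.append(abs(pred - truth))
    return errors


if __name__ == "__main__":
    s0, actions = 1.0, [0.1] * 5
    truth = rollout(true_step, s0, actions)
    print("composed one-step errors by horizon:", one_step_errors(s0, actions))
    print("multi-step error at full horizon:",
          abs(multi_step_model(s0, actions) - truth))
```

In this toy setting the composed one-step error increases with every additional step, because each step re-applies the model's error to an already inaccurate state. Whether a learned multi-step model achieves a small direct error in practice depends on the learning problem; the sketch assumes it rather than demonstrates it.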

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination

    cs.LG 2026-05 unverdicted novelty 7.0

    Dream-MPC boosts underlying policies on 24 continuous control tasks by optimizing policy-generated trajectories with gradient ascent, uncertainty regularization, and temporal amortization inside a latent world model.

  2. Advantage-Guided Diffusion for Model-Based Reinforcement Learning

    cs.AI 2026-04 unverdicted novelty 7.0

    Advantage-guided diffusion (SAG and EAG) steers sampling in diffusion world models to higher-advantage trajectories, enabling policy improvement and better sample efficiency on MuJoCo tasks.

  3. Long-Horizon Model-Based Offline Reinforcement Learning Without Explicit Conservatism

    cs.LG 2025-12 conditional novelty 7.0

    NEUBAY uses Bayesian posteriors over world models with long-horizon planning to match or exceed conservative offline RL methods without explicit conservatism.

  4. Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination

    cs.LG 2026-05 unverdicted novelty 6.0

    Dream-MPC refines policy-generated trajectories by gradient ascent in a latent world model with uncertainty regularization and temporal amortization, improving base policy performance and beating gradient-free MPC on ...

  5. Is Conditional Generative Modeling all you need for Decision-Making?

    cs.LG 2022-11 unverdicted novelty 6.0

    Return-conditional diffusion models for policies outperform offline RL on benchmarks by circumventing dynamic programming and enable constraint or skill composition.

  6. Rethinking Code Review in the Age of AI: A Vision for Agentic Code Review

    cs.SE 2026-05 unverdicted novelty 5.0

    The paper presents a vision for an agentic code review framework spanning PR Creation, Augmentation, Reviewer Selection, AI-Assisted Review, and Retrospective, with humans retained at quality gates.