Deep Variational Reinforcement Learning for POMDPs

Frank Wood; Luisa Zintgraf; Maximilian Igl; Shimon Whiteson; Tuan Anh Le

arXiv: 1806.02426 · v1 · pith:VEUYHKKFnew · submitted 2018-06-06 · cs.LG · stat.ML

Maximilian Igl, Luisa Zintgraf, Tuan Anh Le, Frank Wood, Shimon Whiteson

keywords: model, learning, reinforcement, deep, environment, problems, variational, agent

Abstract

Many real-world sequential decision-making problems are partially observable by nature, and the environment model is typically unknown. Consequently, there is a great need for reinforcement learning methods that can tackle such problems given only a stream of incomplete and noisy observations. In this paper, we propose deep variational reinforcement learning (DVRL), which introduces an inductive bias that allows an agent to learn a generative model of the environment and perform inference in that model to effectively aggregate the available information. We develop an n-step approximation to the evidence lower bound (ELBO), allowing the model to be trained jointly with the policy. This ensures that the latent state representation is suitable for the control task. In experiments on Mountain Hike and flickering Atari, we show that our method outperforms previous approaches relying on recurrent neural networks to encode the past.
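
To make "perform inference in that model to aggregate the available information" concrete, the following is a minimal sketch of an exact Bayes-filter belief update in a tiny, fully known two-state POMDP. It is not the DVRL algorithm: DVRL learns the model and approximates inference, whereas here the transition and observation tables are given and inference is exact. The state names, the probabilities, and the function name `belief_update` are illustrative assumptions, and actions are omitted for brevity.

```python
from typing import Dict

Belief = Dict[str, float]

# Hypothetical toy model (assumed for illustration, not from the paper).
# transition[s][s2] = P(next state = s2 | current state = s)
transition = {
    "left": {"left": 0.9, "right": 0.1},
    "right": {"left": 0.1, "right": 0.9},
}
# obs_model[s][o] = P(observation = o | state = s); observations are noisy.
obs_model = {
    "left": {"left": 0.8, "right": 0.2},
    "right": {"left": 0.2, "right": 0.8},
}


def belief_update(belief: Belief, obs: str) -> Belief:
    """One Bayes-filter step: predict with the transition model, then
    reweight by the observation likelihood and renormalize."""
    # Prediction: push the current belief through the transition model.
    predicted = {
        s2: sum(belief[s1] * transition[s1][s2] for s1 in belief)
        for s2 in belief
    }
    # Correction: weight each state by how well it explains the observation.
    unnormalized = {s: predicted[s] * obs_model[s][obs] for s in predicted}
    evidence = sum(unnormalized.values())
    if evidence == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / evidence for s, p in unnormalized.items()}


# Start from a uniform belief and fold in two noisy "left" observations.
b0 = {"left": 0.5, "right": 0.5}
b1 = belief_update(b0, "left")
b2 = belief_update(b1, "left")
```

Each observation on its own is unreliable, but the belief sharpens as observations accumulate. That accumulated belief, rather than the latest observation alone, is what a policy under partial observability would condition on.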

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Mastering Atari with Discrete World Models
cs.LG 2020-10 accept novelty 7.0

DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.

Learning to Theorize the World from Observation
cs.LG 2026-05 unverdicted novelty 6.0

NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.

Learning Belief Representations for Imitation Learning in POMDPs
cs.LG 2019-06 unverdicted novelty 6.0

BMIL learns belief modules jointly with policies for GAIL-style imitation learning in POMDPs, outperforming separate training and standard GAIL on continuous control tasks.

Belief-State RWKV for Reinforcement Learning under Partial Observability
cs.LG 2026-04 unverdicted novelty 5.0

Belief-state RWKV maintains an uncertainty-aware recurrent state for RL policies in partial observability and shows modest gains over standard recurrent baselines in a pilot with observation noise.

Shaping Belief States with Generative Environment Models for RL
cs.LG 2019-06 unverdicted novelty 5.0

Multi-step predictive generative models form stable belief states capturing environment layout and agent pose, yielding higher data efficiency on RL tasks than model-free agents.

Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey
cs.LG 2019-07 unverdicted novelty 2.0

This survey compiles deep reinforcement learning algorithms for clinical decision support, reviews case studies, and offers guidance on algorithm selection for medical applications.