pith. machine review for the scientific record.

arxiv: 1907.03876 · v1 · submitted 2019-07-08 · 💻 cs.LG · cs.NE

Recognition: unknown

Deep Active Inference as Variational Policy Gradients

Authors on Pith: no claims yet
classification 💻 cs.LG cs.NE
keywords: inference, active, action, algorithm, deep, learning, policy, reinforcement
original abstract

Active Inference is a theory of action arising from neuroscience which casts action and planning as a Bayesian inference problem to be solved by minimizing a single quantity: the variational free energy. Active Inference promises a unifying account of action and perception coupled with a biologically plausible process theory. Despite these potential advantages, current implementations of Active Inference can only handle small, discrete policy and state spaces and typically require the environmental dynamics to be known. In this paper we propose a novel deep Active Inference algorithm which approximates key densities using deep neural networks as flexible function approximators, enabling Active Inference to scale to significantly larger and more complex tasks. We demonstrate our approach on a suite of OpenAI Gym benchmark tasks and obtain performance comparable with common reinforcement learning baselines. Moreover, our algorithm shows similarities with maximum entropy reinforcement learning and the policy gradients algorithm, which reveals interesting connections between the Active Inference framework and reinforcement learning.
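The single quantity the abstract refers to, the variational free energy, can be written for a discrete state space as F = E_q[log q(s) - log p(o, s)], which equals KL(q(s) || p(s|o)) - log p(o), so minimizing F drives the approximate posterior q toward the true posterior. A minimal numerical sketch of this identity (variable names and the toy distributions are illustrative, not from the paper):

```python
import numpy as np

def variational_free_energy(q, p_joint):
    """F = E_q[log q(s) - log p(o, s)] for a fixed observation o.

    q       : approximate posterior over hidden states, sums to 1.
    p_joint : p(o, s) evaluated at the observed o; sums to p(o).
    Minimizing F is equivalent to maximizing the ELBO.
    """
    q = np.asarray(q, dtype=float)
    p_joint = np.asarray(p_joint, dtype=float)
    return float(np.sum(q * (np.log(q) - np.log(p_joint))))

# Toy example: 2 hidden states, one fixed observation with p(o) = 0.5,
# so the true posterior is p(s|o) = [0.7, 0.3].
q = np.array([0.6, 0.4])
p_joint = np.array([0.35, 0.15])

F = variational_free_energy(q, p_joint)
print(round(F, 4))  # → 0.7157

# F is bounded below by the surprise -log p(o), attained when q = p(s|o):
F_opt = variational_free_energy([0.7, 0.3], p_joint)
print(round(F_opt, 4), round(-np.log(0.5), 4))  # both ≈ 0.6931
```

In the paper's setting the densities are parameterized by deep networks rather than tabulated, and the gradient of an expected-free-energy objective recovers a policy-gradient-like update, but the decomposition F = KL + surprise is the same.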

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Information as Maximum-Caliber Deviation: A bridge between Integrated Information Theory and the Free Energy Principle

    q-bio.NC · 2026-05 · unverdicted · novelty 6.0

    Information defined as maximum-caliber deviation derives IIT 3.0 cause-effect repertoires from constrained entropy maximization and equates to prediction error under CLT and LDT.