Sutton , title =
10 Pith papers cite this work.
citing papers explorer
- Expected Free Energy-based Planning as Variational Inference
  EFE-based planning is formulated as variational free energy minimization with epistemic priors, decomposing into expected plan costs plus a complexity term.
- Offline Policy Evaluation for Manipulation Policies via Discounted Liveness Formulation
  A liveness-based Bellman operator enables conservative offline policy evaluation for manipulation tasks by encoding task progression and reducing truncation bias from finite horizons.
- Geometrically Averaged Hard Target Updates for Linear Q-Learning
  Introduces and analyzes the λ-target update for linear Q-learning via geometric averaging of periodic target maps, studied with a switching-system model in the deterministic case.
- Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents
  Agentic Monte Carlo enables RL-style optimization of black-box LLM agents by sampling from the optimal policy posterior using Sequential Monte Carlo.
- A Meta Reinforcement Learning Approach to Goals-Based Wealth Management
  MetaRL pre-trained on GBWM problems delivers near-optimal dynamic strategies in 0.01s achieving 97.8% of DP optimal utility and handles larger problems where DP fails.
- DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions
  DAWM introduces a modular diffusion world model with an inverse dynamics model to produce complete synthetic transitions that improve conservative offline RL algorithms like TD3BC and IQL on D4RL tasks.
- Trace-Mediated Peak Bias: Bridging Temporal Credit Assignment and Cognitive Heuristics in Deep Reinforcement Learning
  Eligibility traces in deep RL create a peak bias by amplifying distal TD errors into gradient shocks that fixed-step SGD cannot normalize, leading to overestimation of peak-reward trajectories and a mechanistic account of the peak-end rule.
- PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents
  An external controller for frozen LLMs raises strict validation success on three RL coding tasks from 0/9 to 8/9 by selecting memory records and skills, running fail-fast checks, and propagating credit via eligibility traces.
- Revisiting Adam for Streaming Reinforcement Learning
  C51 matches StreamQ in streaming RL on 55 Atari games while a new Adaptive Q(λ) algorithm based on bounded derivatives and variance-adjusted updates reaches nearly double the human baseline.
- Twice Sequential Monte Carlo for Tree Search
  TSMCTS applies Sequential Monte Carlo in two stages for tree search, claiming better performance, favorable scaling with depth, lower variance, and reduced path degeneracy than SMC and modern MCTS baselines across discrete and continuous environments.