Acting in delayed environments with non-stationary Markov policies
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Model-Based Reinforcement Learning under Random Observation Delays
  A delay-aware model-based RL framework with sequential belief filtering handles random out-of-sequence observations in POMDPs and outperforms MDP baselines while showing robustness to delay shifts.
- Delayed homomorphic reinforcement learning for environments with delayed feedback
  DHRL defines belief-equivalence over augmented states to abstract away control-redundant states, preserving optimality in finite domains and yielding a deep actor-critic method that outperforms baselines on MuJoCo tasks.