Shows entropy coupling limits DSAC on discrete tasks and introduces a generalized actor-critic framework with m-step critics and novel entropy-regularized objectives that perform robustly on Atari.
Transductive off-policy proximal policy optimization
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2representative citing papers
GTR introduces a bounded non-monotonic Gaussian trust region and Mixture Gaussian Anchor to enable effective behavior transitions in non-stationary RL where standard PPO fails.
citing papers explorer
-
Dissecting Discrete Soft Actor-Critic: Limitations and Principled Alternatives
Shows entropy coupling limits DSAC on discrete tasks and introduces a generalized actor-critic framework with m-step critics and novel entropy-regularized objectives that perform robustly on Atari.
-
Local Guidance, Global Impact: Gaussian-Reshaped Trust Region Unlocks Behavior Transitions
GTR introduces a bounded non-monotonic Gaussian trust region and Mixture Gaussian Anchor to enable effective behavior transitions in non-stationary RL where standard PPO fails.