Transductive off-policy proximal policy optimization

Transductive off-policy proximal policy optimization · 2024 · arXiv 2406.03894

2 Pith papers cite this work.

representative citing papers

Dissecting Discrete Soft Actor-Critic: Limitations and Principled Alternatives

cs.LG · 2025-09-11 · conditional · novelty 6.0 · ref 10

Shows that entropy coupling limits discrete Soft Actor-Critic (DSAC) on discrete tasks, and introduces a generalized actor-critic framework with m-step critics and new entropy-regularized objectives that perform robustly on Atari.

Local Guidance, Global Impact: Gaussian-Reshaped Trust Region Unlocks Behavior Transitions

cs.LG · 2026-06-02 · unverdicted · novelty 5.0 · ref 28

GTR introduces a bounded, non-monotonic Gaussian trust region and a Mixture Gaussian Anchor to enable effective behavior transitions in non-stationary RL, where standard PPO fails.
