Title resolution pending

We estimate the empirical δπ for the refined policy relative to the frozen actor across our four environments, three goals, evaluate the right-hand side of Eq

2019 · arXiv 3309.93666

1 Pith paper cites this work. Polarity classification is still indexing.



Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

When Policies Cannot Be Retrained: A Unified Closed-Form View of Post-Training Steering in Offline Reinforcement Learning

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

For diagonal-Gaussian frozen actors, PoE with alpha equals KL adaptation with beta = alpha/(1-alpha); empirically, composition shows an actor-competence ceiling with 4/5/3 HELP/FROZEN/HURT split on D4RL and zero success on AntMaze.

citing papers explorer

Showing 1 of 1 citing paper.

When Policies Cannot Be Retrained: A Unified Closed-Form View of Post-Training Steering in Offline Reinforcement Learning

cs.LG · 2026-04-23 · unverdicted · none · ref 11
