Maximum entropy reinforcement learning via energy-based normalizing flow
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Representative citing papers
GenPO++: Generative Policy Optimization with Jacobian-free Likelihood Ratios
GenPO++ achieves exact Jacobian-free likelihood ratio computation for generative flow policies by embedding history states as auxiliary memory in a high-order reversible ODE solver.