GenPO++ achieves exact Jacobian-free likelihood ratio computation for generative flow policies by embedding history states as auxiliary memory in a high-order reversible ODE solver.
Policyflow: Policy optimization with continuous normalizing flow in reinforcement learning. arXiv preprint arXiv:2602.01156, 2026
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- GenPO++: Generative Policy Optimization with Jacobian-free Likelihood Ratios