VP2O maps PPO to SVGD in a MoE architecture using functional kernels and expert orthogonalization, claiming +179 ELO on Codeforces and 32% token reduction on AIME for a 33B/4B model.
Haeffele and Rene Vidal and Donald Geman , title =
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
stat.ML 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Variational Proximal Policy Optimization
VP2O maps PPO to SVGD in a MoE architecture using functional kernels and expert orthogonalization, claiming +179 ELO on Codeforces and 32% token reduction on AIME for a 33B/4B model.