arxiv: 2604.17892 · 2 revisions
LEPO: Latent Reasoning Policy Optimization for Large Language Models