arXiv: 2604.04894 · 2 revisions
Asymmetric Advantage Modulation Calibrates Entropy Dynamics in RLVR