GRPO, Dr. GRPO, and DAPO are three settings of one dial on the group standard deviation of binary rewards, unified by the group-standard-deviation identity where disagreement equals update magnitude.
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
Test-time sampling improves coverage but stalls at modal and correlation ceilings for answer selection, with the effective number of samples as the practical limit.
Kolmogorov n-width theory plus PRESS statistics yield closed-form optimal spline resolution; KORE estimates bias/noise scales from two pilots and matches CV performance with far fewer fits.
citing papers explorer
- GRPO, Dr. GRPO, and DAPO Are Three Operations on One Number: The Group-Standard-Deviation Identity
  GRPO, Dr. GRPO, and DAPO are three settings of one dial on the group standard deviation of binary rewards, unified by the group-standard-deviation identity where disagreement equals update magnitude.
- When More Sampling Hurts: The Modal Ceiling and Correlation Ceiling of Test-Time Scaling
  Test-time sampling improves coverage but stalls at modal and correlation ceilings for answer selection, with the effective number of samples as the practical limit.
- Solve for the Hyperparameter, Skip the Search: Kolmogorov-Optimal Scaling Laws for Spline Regression
  Kolmogorov n-width theory plus PRESS statistics yield closed-form optimal spline resolution; KORE estimates bias/noise scales from two pilots and matches CV performance with far fewer fits.