pith:BTZPDJO6
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Normalizing each reward separately in multi-reward RL keeps distinct reward combinations from collapsing into identical advantage values.
arxiv:2601.05242 v1 · 2026-01-08 · cs.CL · cs.AI · cs.LG
Add to your LaTeX paper:
```latex
\usepackage{pith}
\pithnumber{BTZPDJO66QJ4W3LPEDJYN5XP7X}
```
This prints a linked badge after your title and injects PDF metadata. It compiles on arXiv.
Claims
Directly applying GRPO to normalize distinct rollout reward combinations causes them to collapse into identical advantage values, which reduces the resolution of the training signal and leads to suboptimal convergence and, in some cases, early training failure.
The method assumes that separately normalizing each reward before aggregation faithfully preserves relative differences across reward combinations without introducing new scaling artifacts or training instabilities.
GDPO decouples per-reward normalization in multi-reward RL to avoid advantage collapse and improve convergence over GRPO on tool-calling, math, and coding tasks.
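The collapse described in the first claim can be reproduced on a toy group. This is a minimal sketch, not the paper's implementation. It assumes two binary rewards, two rollouts per group, population standard deviation, and a small epsilon. The names `grpo_advantages` and `gdpo_advantages` are hypothetical, and summing the per-reward normalized values is an assumed aggregation rule that may differ from the paper's exact formulation.

```python
from statistics import fmean, pstdev

EPS = 1e-6  # assumed small constant that guards against a zero standard deviation


def grpo_advantages(group):
    """GRPO-style: sum each rollout's rewards, then normalize the sums within the group."""
    totals = [sum(rewards) for rewards in group]
    mu, sigma = fmean(totals), pstdev(totals)
    return [(t - mu) / (sigma + EPS) for t in totals]


def gdpo_advantages(group):
    """Decoupled sketch (assumed rule): normalize each reward within the group, then sum."""
    advantages = [0.0] * len(group)
    for k in range(len(group[0])):
        column = [rewards[k] for rewards in group]
        mu, sigma = fmean(column), pstdev(column)
        for i, value in enumerate(column):
            advantages[i] += (value - mu) / (sigma + EPS)
    return advantages


# Two rollouts per group, two binary rewards per rollout.
group_a = [(1, 0), (0, 0)]  # better rollout satisfies one reward
group_b = [(1, 1), (0, 0)]  # better rollout satisfies both rewards

for name, group in [("a", group_a), ("b", group_b)]:
    grpo = [round(x, 3) for x in grpo_advantages(group)]
    gdpo = [round(x, 3) for x in gdpo_advantages(group)]
    print(name, grpo, gdpo)
# a [1.0, -1.0] [1.0, -1.0]
# b [1.0, -1.0] [2.0, -2.0]
```

Both groups receive identical GRPO advantages even though the better rollout in group b satisfies strictly more rewards; the decoupled version keeps them apart. The summed values are no longer unit-scaled, so a full method may rescale them afterwards; this sketch omits that step.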
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:53.386098Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
0cf2f1a5def413cb6d6f20d386f6effdedcc3264e3e7081e8404e0fc3fbf4847
Verify this Pith Number yourself
```shell
curl -sH 'Accept: application/ld+json' https://pith.science/pith/BTZPDJO66QJ4W3LPEDJYN5XP7X \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0cf2f1a5def413cb6d6f20d386f6effdedcc3264e3e7081e8404e0fc3fbf4847
```
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "d646f2c556ec36788b65f958f77a2db7de5f740eceeddfc230153cfd0e2107c8",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-01-08T18:59:24Z",
    "title_canon_sha256": "b6bf6385df9e528004dff7db06dadda8378dad300ba2b59eae30c311d9848d4d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2601.05242",
    "kind": "arxiv",
    "version": 1
  }
}
```
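The canonicalization performed by the verification one-liner can be wrapped in a small reusable function. This is a minimal sketch that assumes the record has already been parsed into Python objects. `canonical_sha256` is a hypothetical helper name, and its rules (sorted keys, `,` and `:` separators, `ensure_ascii=False`, UTF-8 encoding) are copied from the verification command rather than taken from a published Pith specification. The example shows that key order and whitespace do not change the digest.

```python
import hashlib
import json


def canonical_sha256(record):
    """Hash a JSON-compatible object the way the verification one-liner does:
    sorted keys, compact separators, non-ASCII characters kept, UTF-8 bytes."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content with different key order and whitespace: the digests match.
a = json.loads('{"schema_version": "1.0", "source": {"id": "2601.05242", "kind": "arxiv", "version": 1}}')
b = {"source": {"version": 1, "kind": "arxiv", "id": "2601.05242"}, "schema_version": "1.0"}
print(canonical_sha256(a) == canonical_sha256(b))  # True
```

To check the record above, pass its parsed JSON to `canonical_sha256` and compare the result with the canonical hash. A match is expected only if the displayed record has exactly the same content as the one the API returns.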