Pith Number

pith:BTZPDJO6

pith:2026:BTZPDJO66QJ4W3LPEDJYN5XP7X

not attested · not anchored · not stored · refs resolved

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Hongxu Yin, Jan Kautz, Kwang-Ting Cheng, Mingjie Liu, Min-Hung Chen, Pavlo Molchanov, Peter Belcak, Shih-Yang Liu, Shizhe Diao, Ximing Lu, Xin Dong, Yejin Choi, Yu-Chiang Frank Wang

Normalizing each reward separately in multi-reward RL keeps distinct reward combinations from collapsing into identical advantage values.

arxiv:2601.05242 v1 · 2026-01-08 · cs.CL · cs.AI · cs.LG


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{BTZPDJO66QJ4W3LPEDJYN5XP7X}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Directly applying GRPO to normalize distinct rollout reward combinations causes them to collapse into identical advantage values, reducing the resolution of the training signal and resulting in suboptimal convergence and, in some cases, early training failure.

C2 · weakest assumption

That separately normalizing each reward before aggregation will faithfully preserve relative differences across reward combinations without introducing new scaling artifacts or training instabilities.

C3 · one-line summary

GDPO decouples per-reward normalization in multi-reward RL to avoid advantage collapse and improve convergence over GRPO on tool-calling, math, and coding tasks.
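A small worked example can make the collapse in C1 concrete. The sketch below is an illustration under stated assumptions, not the paper's implementation: it uses the usual GRPO-style group normalization A_i = (r_i − mean(r)) / (std(r) + eps) with a population standard deviation and an assumed eps of 1e-6, two hypothetical groups of two rollouts, and two binary rewards. The "decoupled" variant normalizes each reward within the group and then sums, which is one plausible reading of the C3 summary; any further normalization the paper applies after summation is not reproduced here. All function names are hypothetical.

```python
from statistics import fmean, pstdev

EPS = 1e-6  # assumed stabilizer so a zero-variance group does not divide by zero


def group_normalize(values):
    """GRPO-style normalization within one group: (r - mean) / (std + eps)."""
    mu = fmean(values)
    sigma = pstdev(values)  # population std over the group's rollouts
    return [(v - mu) / (sigma + EPS) for v in values]


def summed_then_normalized(reward_vectors):
    """Baseline: add up each rollout's rewards, then normalize the totals."""
    return group_normalize([sum(rv) for rv in reward_vectors])


def decoupled_then_summed(reward_vectors):
    """Decoupled: normalize each reward across the group, then add per rollout."""
    num_rewards = len(reward_vectors[0])
    per_reward = [
        group_normalize([rv[k] for rv in reward_vectors]) for k in range(num_rewards)
    ]
    return [sum(col[i] for col in per_reward) for i in range(len(reward_vectors))]


def rounded(xs):
    return [round(x, 3) for x in xs]


# Hypothetical groups: two rollouts each, scored by two binary rewards.
group_a = [(0, 1), (0, 0)]  # the first rollout satisfies one reward
group_b = [(1, 1), (0, 0)]  # the first rollout satisfies both rewards

print(rounded(summed_then_normalized(group_a)))  # [1.0, -1.0]
print(rounded(summed_then_normalized(group_b)))  # [1.0, -1.0]  (collapsed)
print(rounded(decoupled_then_summed(group_a)))   # [1.0, -1.0]
print(rounded(decoupled_then_summed(group_b)))   # [2.0, -2.0]  (distinguished)
```

With summation first, satisfying one reward and satisfying both produce the same advantages, which is the collapse C1 describes. The decoupled advantages keep the two cases apart, but their magnitude grows with the number of active rewards (±2 versus ±1 here); that kind of scale shift is what the C2 assumption is concerned with.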

References

46 extracted · 46 resolved · 17 Pith anchors

[1] Learn to reason efficiently with adaptive length-based reward shaping. 2025.

[2] Kimi k1.5: Scaling Reinforcement Learning with LLMs. 2025. arXiv:2501.12599.

[4] Rule based rewards for language model safety. Advances in Neural Information Processing Systems, 37:108877–108901, 2024.

[5] GRPO-CARE: Consistency-aware reinforcement learning for multimodal reasoning. 2025.

[6] DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models. 2025. arXiv:2512.02556.

Cited by

26 papers in Pith

DepthAgent: Towards Better Universal Depth Estimation via Sample-wise Expert Selection

JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak

Don't Let Bandit Feedback Pull Continual LLM-Recommender Updates Off Target

Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR

Receipt and verification

First computed	2026-05-17T23:38:53.386098Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

0cf2f1a5def413cb6d6f20d386f6effdedcc3264e3e7081e8404e0fc3fbf4847

Aliases

arxiv: 2601.05242 · arxiv_version: 2601.05242v1 · doi: 10.48550/arxiv.2601.05242 · pith_short_12: BTZPDJO66QJ4 · pith_short_16: BTZPDJO66QJ4W3LP · pith_short_8: BTZPDJO6
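Judging only from the values shown on this page (an observation, not a documented Pith specification), the 26-character Pith number matches the first 26 characters of the RFC 4648 base32 encoding of the 32-byte canonical hash, and each pith_short_N alias is the first N characters of the Pith number. The sketch below checks that observation with the Python standard library; the function names are hypothetical.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "0cf2f1a5def413cb6d6f20d386f6effdedcc3264e3e7081e8404e0fc3fbf4847"
PITH_NUMBER = "BTZPDJO66QJ4W3LPEDJYN5XP7X"


def pith_from_hash(hex_digest, length=26):
    """Assumed derivation: base32 (A-Z, 2-7) of the raw digest bytes, truncated."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


def short_aliases(pith):
    """pith_short_N aliases appear to be plain N-character prefixes."""
    return {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}


derived = pith_from_hash(CANONICAL_HASH)
print(derived == PITH_NUMBER)  # True
print(short_aliases(derived))
# {'pith_short_8': 'BTZPDJO6', 'pith_short_12': 'BTZPDJO66QJ4', 'pith_short_16': 'BTZPDJO66QJ4W3LP'}
```

If this relationship holds in general, a mirror could check a Pith number against a recomputed canonical hash without contacting the resolver; that generalization is an assumption drawn from a single record.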

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/BTZPDJO66QJ4W3LPEDJYN5XP7X \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0cf2f1a5def413cb6d6f20d386f6effdedcc3264e3e7081e8404e0fc3fbf4847

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "d646f2c556ec36788b65f958f77a2db7de5f740eceeddfc230153cfd0e2107c8",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-01-08T18:59:24Z",
    "title_canon_sha256": "b6bf6385df9e528004dff7db06dadda8378dad300ba2b59eae30c311d9848d4d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2601.05242",
    "kind": "arxiv",
    "version": 1
  }
}
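For an offline check, the same canonicalization used by the curl one-liner (sorted keys, compact `,`/`:` separators, `ensure_ascii=False`, UTF-8, SHA-256) can be applied to the record above directly. This is a minimal sketch that mirrors that one-liner; it assumes the record as displayed here is exactly the `canonical_record` the resolver serves, in which case the printed digest would be expected to equal the canonical hash listed earlier. The sketch itself does not guarantee that match.

```python
import hashlib
import json

# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "d646f2c556ec36788b65f958f77a2db7de5f740eceeddfc230153cfd0e2107c8",
        "cross_cats_sorted": ["cs.AI", "cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-01-08T18:59:24Z",
        "title_canon_sha256": "b6bf6385df9e528004dff7db06dadda8378dad300ba2b59eae30c311d9848d4d",
    },
    "schema_version": "1.0",
    "source": {"id": "2601.05242", "kind": "arxiv", "version": 1},
}


def canonical_sha256(obj):
    """Mirror the one-liner: sorted keys, compact separators, UTF-8, SHA-256."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


digest = canonical_sha256(record)
print(digest)

# Key order in the input does not change the digest once keys are sorted.
reordered = {"source": record["source"], "schema_version": "1.0", "metadata": record["metadata"]}
print(canonical_sha256(reordered) == digest)  # True
```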