pith:R2MVQSCX
Reinforcement-aware Knowledge Distillation for LLM Reasoning
RLAD enables better distillation of reasoning LLMs by imitating the teacher selectively during policy updates.
arxiv:2602.22495 v3 · 2026-02-26 · cs.LG · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{R2MVQSCXWXV7LP42AYNCDGFLR6}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Across diverse logic reasoning and math benchmarks, RLAD consistently outperforms offline distillation, standard GRPO, and KL-based on-policy teacher-student knowledge distillation.
The method rests on the assumption that guiding the student toward the teacher only when doing so improves the current policy update reliably avoids distribution mismatch and objective interference, without introducing new instabilities or requiring additional hyperparameter tuning.
RLAD replaces standard KL-based distillation with Trust Region Ratio Distillation, a PPO-style likelihood ratio objective that performs advantage-aware imitation on student rollouts and outperforms offline KD, GRPO, and KL on-policy KD on logic and math benchmarks.
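To make the phrase "PPO-style likelihood ratio objective" concrete, here is a minimal sketch of the standard PPO clipped surrogate for a single sample. It is the generic PPO form, not the paper's Trust Region Ratio Distillation objective, whose exact definition this record does not state. The function name `clipped_surrogate`, the clip range `eps`, and the sample numbers are illustrative assumptions.

```python
import math


def clipped_surrogate(logp_new: float, logp_old: float,
                      advantage: float, eps: float = 0.2) -> float:
    """Generic PPO clipped surrogate for one sample (a quantity to maximize).

    Assumption: this is textbook PPO, shown only for illustration; it is
    not taken from the RLAD paper.
    """
    # Likelihood ratio between the updated policy and the behavior policy.
    ratio = math.exp(logp_new - logp_old)
    # Keep the ratio inside the trust region [1 - eps, 1 + eps].
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    # Take the pessimistic (smaller) of the two candidate objectives.
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Hypothetical numbers: the new policy is 1.5x more likely to emit the token.
    print(clipped_surrogate(math.log(1.5), 0.0, advantage=1.0))
    print(clipped_surrogate(math.log(1.5), 0.0, advantage=-1.0))
```

With a positive advantage the objective stops growing once the ratio exceeds 1 + eps, so the update gains nothing from moving further; with a negative advantage the unclipped, more pessimistic term is kept.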
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-06-19T16:12:18.946766Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
8e99584857b5ebf5bf9a061a2198ab8f92e0fefc8f5199cd514f398f85b9462c
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/R2MVQSCXWXV7LP42AYNCDGFLR6 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 8e99584857b5ebf5bf9a061a2198ab8f92e0fefc8f5199cd514f398f85b9462c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "9fc3211235a91ef9f4e24e5340f6e9b9d74918b26850234a2a52ae235f7949f6",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-02-26T00:20:39Z",
    "title_canon_sha256": "5c443d568afefd33c13d07cff1913eafe48bc17f95dc977340c2973acbd21553"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.22495",
    "kind": "arxiv",
    "version": 3
  }
}
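The verification command above can also be run offline. The following is a minimal Python sketch that applies the same canonicalization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) to the record as displayed on this page and compares the digest with the published hash. The names `canonical_sha256`, `RECORD`, and `EXPECTED` are illustrative, and the sketch assumes the record shown here is identical to the `canonical_record` field returned by the API.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same canonical JSON form as the shell one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode()                   # UTF-8 bytes
    return hashlib.sha256(canonical).hexdigest()


# Assumption: copied from the canonical record shown on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "9fc3211235a91ef9f4e24e5340f6e9b9d74918b26850234a2a52ae235f7949f6",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-02-26T00:20:39Z",
        "title_canon_sha256": "5c443d568afefd33c13d07cff1913eafe48bc17f95dc977340c2973acbd21553",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.22495", "kind": "arxiv", "version": 3},
}

# Published canonical hash from this page.
EXPECTED = "8e99584857b5ebf5bf9a061a2198ab8f92e0fefc8f5199cd514f398f85b9462c"

if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("matches published hash:", digest == EXPECTED)
```

Because the keys are sorted and whitespace is stripped before hashing, the indentation and key order of the JSON as displayed do not affect the result.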