pith:YMJNFF3G
Process Reward Agents for Steering Knowledge-Intensive Reasoning
Process Reward Agents supply online step-wise rewards from external knowledge to steer reasoning in frozen language models.
arxiv:2604.09482 v2 · 2026-04-10 · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{YMJNFF3GKDF5ITPVHOUQKO6D7J}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
PRA consistently outperforms strong baselines, achieving 80.8% accuracy on MedQA with Qwen3-4B, a new state of the art at the 4B scale. Importantly, PRA generalizes to unseen frozen policy models ranging from 0.5B to 8B parameters, improving their accuracy by up to 25.7% without any policy model updates.
Key assumption: retrieval-augmented process rewards can be computed reliably and cheaply at every generation step from external knowledge sources, without introducing undetected errors or prohibitive latency that would negate the search benefits.
Process Reward Agents provide online step-wise guidance for frozen language models in medical reasoning, reaching 80.8% accuracy on MedQA with Qwen3-4B and improving 0.5B-8B models by up to 25.7% without policy updates.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-06-02T02:04:52.913127Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
c312d2976650cbd44df53ba9053bc3fa6004c1ba2fd631103acb5c432caaf683
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/YMJNFF3GKDF5ITPVHOUQKO6D7J \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c312d2976650cbd44df53ba9053bc3fa6004c1ba2fd631103acb5c432caaf683
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "3d68182b83fcba80a276f656f3b39509f5e035d32b9d9de46c8f8c9f974d5d13",
"cross_cats_sorted": [],
"license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
"primary_cat": "cs.AI",
"submitted_at": "2026-04-10T16:45:44Z",
"title_canon_sha256": "92b491970cb747dc463d80ee2cb8307cf3a866ffd7de9570096323fae0af1e83"
},
"schema_version": "1.0",
"source": {
"id": "2604.09482",
"kind": "arxiv",
"version": 2
}
}
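The shell pipeline under "Verify this Pith Number yourself" can also be reproduced offline. The following is a minimal Python sketch, assuming the canonical hash is the SHA-256 of the record serialized with sorted keys, compact separators, and UTF-8 encoding, exactly as the one-liner does. The names `RECORD`, `canonical_bytes`, and `canonical_hash` are illustrative and not part of any Pith tooling.

```python
import hashlib
import json

# Canonical record copied from the "Canonical record JSON" section of this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "3d68182b83fcba80a276f656f3b39509f5e035d32b9d9de46c8f8c9f974d5d13",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2026-04-10T16:45:44Z",
        "title_canon_sha256": "92b491970cb747dc463d80ee2cb8307cf3a866ffd7de9570096323fae0af1e83",
    },
    "schema_version": "1.0",
    "source": {"id": "2604.09482", "kind": "arxiv", "version": 2},
}


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no whitespace, mirroring the page's one-liner."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode()  # UTF-8 by default


def canonical_hash(record: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    # Compare the printed digest against the "Canonical hash" value on this page.
    print(canonical_hash(RECORD))
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the JSON fields were written, so a record fetched from the API and one pasted by hand should hash identically if their contents match.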