Pith Number

pith:I44TW35N

pith:2025:I44TW35NPR5PJAAZBXKBHS6KEN
not attested · not anchored · not stored · refs pending

Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios

Binhong Li, Hui Liu, Jianxiang Zang, Nijia Mo, Qiang Sun, Ruxue Bai, Shiyu Jiang, Yongda Wei

Reward Auditor uses hypothesis testing to detect whether reward models have systematic vulnerabilities under real-world perturbations.

arxiv:2512.00920 v5 · 2025-11-30 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{I44TW35NPR5PJAAZBXKBHS6KEN}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
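The merge algorithm itself is not specified on this page. As a rough illustration of what "deterministic" buys a mirror, the sketch below assumes a hypothetical event shape (`Event` with `timestamp`, `kind`, `payload`) and a hypothetical rule: deduplicate events by content hash, order them by timestamp with the hash as tie-breaker, and fold them into a copy of the canonical record. Under those assumptions, any two mirrors holding the same set of events compute the same state regardless of the order in which the events arrived. This is not Pith's actual algorithm.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical shape; the real bundle schema is not documented here.
    timestamp: str  # ISO-8601 UTC in one fixed format, so string order == time order
    kind: str       # hypothetical event types, e.g. "citation", "author_claim"
    payload: dict


def event_id(e: Event) -> str:
    """Content hash of an event, used for deduplication and tie-breaking."""
    blob = json.dumps([e.timestamp, e.kind, e.payload], sort_keys=True,
                      separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def merge(record: dict, events) -> dict:
    """Fold events into a copy of the record in a total order every mirror agrees on."""
    state = json.loads(json.dumps(record))  # deep copy; the input record is untouched
    unique = {event_id(e): e for e in events}  # identical events collapse to one
    for eid in sorted(unique, key=lambda k: (unique[k].timestamp, k)):
        e = unique[eid]
        state.setdefault("events", {}).setdefault(e.kind, []).append(e.payload)
    return state


# Two mirrors receiving the same events in different orders agree on the result.
events = [
    Event("2026-05-20T00:00:00Z", "citation", {"by": "paper-a"}),
    Event("2026-05-19T00:00:00Z", "citation", {"by": "paper-b"}),
]
print(merge({"schema_version": "1.0"}, events) == merge({"schema_version": "1.0"}, events[::-1]))
```

The content-hash tie-breaker matters: two events with the same timestamp would otherwise be ordered by arrival, which differs between mirrors.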

Claims

C1 strongest claim

Under real-world perturbed scenarios, Reward Auditor quantifies statistical significance and effect size by auditing the distributional degradation of a reward model's (RM's) preference perception confidence. This enables inference of both the certainty and the severity of RM vulnerabilities across diverse real-world scenarios.

C2 weakest assumption

The chosen real-world perturbations and the definition of suitability as conditional reliability under those perturbations accurately capture the vulnerabilities that matter for safe LLM alignment in deployment.

C3 one line summary

Reward Auditor is a statistical auditing framework that infers systematic vulnerabilities in reward models by quantifying distribution degradation of preference perception confidence under real-world perturbations.
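This page does not spell out Reward Auditor's exact test statistic or effect-size measure. The sketch below illustrates the general idea under stated assumptions: preference perception confidence is taken to be the Bradley-Terry probability sigmoid(r_chosen - r_rejected), a generic one-sided paired sign-flip permutation test stands in for the significance test, and Cohen's d_z stands in for the effect size. The function names and input format are hypothetical; this is not the authors' implementation.

```python
import math
import random
import statistics


def preference_confidence(r_chosen: float, r_rejected: float) -> float:
    """Assumed Bradley-Terry probability that the RM prefers the chosen response."""
    m = r_chosen - r_rejected
    if m >= 0:
        return 1.0 / (1.0 + math.exp(-m))
    z = math.exp(m)  # numerically stable branch for negative margins
    return z / (1.0 + z)


def audit_degradation(clean, perturbed, n_perm=10_000, seed=0):
    """One-sided paired sign-flip permutation test plus Cohen's d_z.

    clean, perturbed: hypothetical lists of (r_chosen, r_rejected) scores for
    the same preference items before and after a perturbation.
    """
    if len(clean) != len(perturbed) or len(clean) < 2:
        raise ValueError("need at least 2 paired items")
    # Per-item drop in confidence; positive values mean degradation.
    diffs = [preference_confidence(*c) - preference_confidence(*p)
             for c, p in zip(clean, perturbed)]
    observed = statistics.fmean(diffs)
    rng = random.Random(seed)
    # Under H0 each paired difference is symmetric around 0, so random sign
    # flips sample the null distribution of the mean difference.
    hits = 0
    for _ in range(n_perm):
        flipped = statistics.fmean(d if rng.random() < 0.5 else -d for d in diffs)
        if flipped >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    sd = statistics.stdev(diffs)
    if sd > 0:
        d_z = observed / sd
    else:
        d_z = 0.0 if observed == 0 else math.copysign(math.inf, observed)
    return {"mean_drop": observed, "p_value": p_value, "cohens_dz": d_z}


# Toy usage: the RM's reward margin shrinks on perturbed inputs.
clean = [(1.0 + 0.05 * i, -1.0) for i in range(30)]
perturbed = [(0.2 + 0.01 * i, 0.0) for i in range(30)]
print(audit_degradation(clean, perturbed))
```

The p-value answers the "certainty" question in C1 (is the degradation systematic?), while d_z answers the "severity" question (how large is it relative to item-to-item noise?).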

Cited by

3 papers in Pith

Receipt and verification
First computed 2026-05-20T00:00:29.144435Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

47393b6fad7c7af480190dd413cbca234016c43855b6f69389c34c9d00954930

Aliases

arxiv: 2512.00920 · arxiv_version: 2512.00920v5 · doi: 10.48550/arxiv.2512.00920 · pith_short_12: I44TW35NPR5P · pith_short_16: I44TW35NPR5PJAAZ · pith_short_8: I44TW35N
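The aliases hint at how the identifier is built. For this record, the Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash, and the `pith_short_*` aliases are prefixes of it. That relationship is inferred from this one record rather than from a published specification, so treat the sketch below (with its hypothetical `pith_number` helper) as a consistency check, not a definition.

```python
import base64

CANONICAL_HASH = "47393b6fad7c7af480190dd413cbca234016c43855b6f69389c34c9d00954930"


def pith_number(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: base32 of the raw digest bytes, truncated (no padding)."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


full = pith_number(CANONICAL_HASH)
print(full)  # I44TW35NPR5PJAAZBXKBHS6KEN
for n in (8, 12, 16):
    print(f"pith_short_{n}: {full[:n]}")
```

Note that 26 base32 characters carry 130 bits, so the last character spills two bits past the first 16 bytes of the digest; the truncation is on the encoded string, not on the bytes.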
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/I44TW35NPR5PJAAZBXKBHS6KEN \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 47393b6fad7c7af480190dd413cbca234016c43855b6f69389c34c9d00954930
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "34c28eddc3a84ec52591111031a61ace34d1728a5b1e5a47f6dd1f16ec7690d0",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-11-30T14:54:12Z",
    "title_canon_sha256": "58a3a4d390f8261f81cba59cb1c194c6484e3151b0a82d6103dee92fb921eba3"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2512.00920",
    "kind": "arxiv",
    "version": 5
  }
}
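The `python3 -c` step of the verification pipeline can also be run offline against the record as displayed. This sketch reproduces the same canonical form (sorted keys, compact separators, raw UTF-8 via `ensure_ascii=False`) with the standard library only. If the JSON shown here is the complete canonical record, the final comparison should print `True`; that assumes nothing was omitted from the page, so a `False` would point to a display difference rather than a broken record.

```python
import hashlib
import json

# Canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "34c28eddc3a84ec52591111031a61ace34d1728a5b1e5a47f6dd1f16ec7690d0",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-11-30T14:54:12Z",
        "title_canon_sha256": "58a3a4d390f8261f81cba59cb1c194c6484e3151b0a82d6103dee92fb921eba3",
    },
    "schema_version": "1.0",
    "source": {"id": "2512.00920", "kind": "arxiv", "version": 5},
}


def canonical_sha256(obj) -> str:
    """SHA-256 of the same canonical JSON form the curl pipeline produces."""
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


digest = canonical_sha256(record)
print(digest)
print(digest == "47393b6fad7c7af480190dd413cbca234016c43855b6f69389c34c9d00954930")
```

Because keys are sorted before hashing, any producer that emits the same fields and values, in any key order, yields the same digest.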