Integrity report for Preference Instability in Reward Models: Detection and Mitigation via Sparse Autoencoders

A machine-verified record of the checks Pith has run against this paper: detector runs, findings, signed bundle events, and canonical identifiers.

arXiv:2605.16339 · pith:2026:7XXD53ZVILIQSOYY4DUGYIUGFJ

Critical: 0
Advisory: 0
Detectors run: 2
Last checked: 2026-05-20

Detector runs

claim_evidence completed v1.0.0 · findings 0 · 2026-05-20 12:22:03.838470+00:00
ai_meta_artifact skipped v1.0.0 · findings 0 · 2026-05-20 07:39:01.792305+00:00
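
For readers who want to work with these entries programmatically, the following is a minimal sketch that parses one detector-run line into structured fields. It assumes every run is rendered as `name status version · findings N · timestamp`, as in the two lines above. The names `DetectorRun` and `parse_detector_run` are hypothetical and are not part of Pith.

```python
from datetime import datetime
from typing import NamedTuple


class DetectorRun(NamedTuple):
    """One detector-run entry (hypothetical structure, for illustration only)."""
    name: str
    status: str
    version: str
    findings: int
    checked_at: datetime


def parse_detector_run(line: str) -> DetectorRun:
    """Parse a line shaped like 'name status version · findings N · timestamp'."""
    # The three fields are separated by a middle dot (U+00B7).
    head, findings_part, timestamp = (part.strip() for part in line.split("·"))
    # The first field holds detector name, run status, and detector version.
    name, status, version = head.split()
    # The second field reads 'findings N'; keep only the integer.
    findings = int(findings_part.split()[1])
    # The timestamp is ISO 8601 with a space separator and a UTC offset.
    return DetectorRun(name, status, version, findings,
                       datetime.fromisoformat(timestamp))


run = parse_detector_run(
    "claim_evidence completed v1.0.0 · findings 0 · 2026-05-20 12:22:03.838470+00:00"
)
print(run.name, run.status, run.findings)
```

The parser raises `ValueError` if a line does not have exactly three dot-separated fields, which seems preferable to silently accepting a layout this sketch does not anticipate.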

Findings

No public integrity findings for this paper.

Signed record

The machine-readable record for this paper lives at /pith/7XXD53ZV/integrity.json. Where a paper has a Pith Number, its bundle also includes signed pith.integrity.v1 events.