pith:PTMIVYF5
Margin-Adaptive Confidence Ranking for Reliable LLM Judgement
A learned margin-adaptive confidence estimator improves LLM-human agreement by strengthening the link between confidence scores and disagreement risk.
arxiv:2605.15416 v1 · 2026-05-14 · cs.LG · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{PTMIVYF57TOIQIPUDUGX2MICTO}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
When integrated into fixed-sequence testing, the learned confidence estimator yields improved ranking accuracy and empirically strengthens the monotonic relationship between confidence and disagreement risk, leading to higher success rates in satisfying target agreement levels across multiple datasets and judge models.
The method assumes that training on simulated annotator diversity produces a confidence estimator whose ranking behavior transfers to real human disagreement distributions. The abstract notes that the original monotonicity assumption is often violated, but it does not quantify how well the simulation matches actual human variance.
The paper introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.
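The claims above refer to fixed-sequence testing over confidence thresholds. The following is a minimal sketch of that generic procedure, not the paper's implementation: it assumes a calibration set of judge confidence scores with binary human-agreement labels, and it uses an exact binomial tail as the p-value. The names `fixed_sequence_threshold`, `alpha` (tolerated disagreement rate) and `delta` (error level of the test) are hypothetical, and the learned margin-adaptive estimator itself is not modelled here; `conf` simply stands in for its output.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def fixed_sequence_threshold(conf, agree, thresholds, alpha, delta):
    """Return the lowest threshold certified by fixed-sequence testing, or None.

    Thresholds are tested from most to least conservative. For each one, the
    null hypothesis is "disagreement rate among kept examples exceeds alpha".
    Testing stops at the first threshold whose null cannot be rejected.
    """
    selected = None
    for lam in sorted(thresholds, reverse=True):
        kept = [a for c, a in zip(conf, agree) if c >= lam]
        n = len(kept)
        if n == 0:
            break  # no evidence at this threshold, so the null is not rejected
        errors = n - sum(kept)
        p_value = binom_cdf(errors, n, alpha)
        if p_value > delta:
            break  # first failure ends the sequence
        selected = lam
    return selected


# Toy calibration data (made up for illustration): high-confidence judgements
# always agree with humans, low-confidence ones agree half the time.
conf = [0.9] * 200 + [0.5] * 200
agree = [1] * 200 + [1, 0] * 100

selected = fixed_sequence_threshold(conf, agree, [0.9, 0.5], alpha=0.1, delta=0.05)
print(selected)  # prints 0.9
```

The procedure relies on confidence being a good ranking of disagreement risk: if risk is not monotone in the threshold, the sequence can stop early and certify less than it could, which is the failure mode the claims above say the learned estimator reduces.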
Receipt and verification
| First computed | 2026-05-20T00:00:57.468346Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
7cd88ae0bdfcdc8821f41d0d7d31029b9ae32276c77e3cce68d407239f3108b1
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/PTMIVYF57TOIQIPUDUGX2MICTO \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7cd88ae0bdfcdc8821f41d0d7d31029b9ae32276c77e3cce68d407239f3108b1
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c15f1938ef9dde2ace8989fb774fd69b3b0fbe6d4a64d90a55ef4877104997bd",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T21:01:05Z",
    "title_canon_sha256": "0a97817509d89b0754952f0409660f732f9fb2d7b2b5893010e11dfa0c0ee9db"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15416",
    "kind": "arxiv",
    "version": 1
  }
}
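The hashing step of the verification command can also be run offline against the record shown above. This is a sketch that mirrors the `json.dumps` arguments used in that command; the function name `canonical_sha256` is hypothetical, and it assumes the JSON printed on this page is the complete canonical record. Whether the resulting digest equals the canonical hash listed above has not been checked here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "c15f1938ef9dde2ace8989fb774fd69b3b0fbe6d4a64d90a55ef4877104997bd",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-14T21:01:05Z",
        "title_canon_sha256": "0a97817509d89b0754952f0409660f732f9fb2d7b2b5893010e11dfa0c0ee9db",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15416", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare by eye with the canonical hash listed on this page
```

Because keys are sorted and whitespace is removed before hashing, the digest does not depend on how the JSON is indented or in what order the keys were written.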