pith:MDF3373R
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
A panel of smaller diverse LLMs judges model outputs better than one large model while costing far less.
arxiv:2404.18796 v2 · 2024-04-29 · cs.CL · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{MDF3373RIIMDNQJZQ5BQYYKIRB}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Using a PoLL composed of a larger number of smaller models outperforms a single large judge, exhibits less intra-model bias due to its composition of disjoint model families, and does so while being over seven times less expensive.
Assumption: the collective judgments of smaller models from disjoint families can capture nuanced quality signals at least as well as a single frontier model, without systematic blind spots on the evaluated tasks.
A panel of smaller diverse LLMs outperforms a single large model as an evaluator of generations, showing less intra-model bias and over 7x lower cost.
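The panel idea in the claims above can be illustrated with a minimal sketch. This record does not state how the panel's individual judgments are combined, so the two aggregation rules here (majority voting over binary verdicts and mean pooling over numeric scores) are assumptions for illustration, and the judge names are hypothetical placeholders rather than the models used in the paper.

```python
from collections import Counter
from statistics import mean


def max_vote(verdicts):
    """Return the most common verdict among the panel's judges.

    On a tie, Counter.most_common keeps first-seen order, so the verdict
    that appeared first in the input wins.
    """
    if not verdicts:
        raise ValueError("panel produced no verdicts")
    return Counter(verdicts).most_common(1)[0][0]


def average_pool(scores):
    """Return the mean of the panel's numeric scores."""
    if not scores:
        raise ValueError("panel produced no scores")
    return mean(scores)


# Hypothetical verdicts from three judges drawn from different model families.
binary_panel = {
    "judge_family_a": True,
    "judge_family_b": False,
    "judge_family_c": True,
}
# Hypothetical 1-5 quality scores from the same three judges.
score_panel = {
    "judge_family_a": 4,
    "judge_family_b": 5,
    "judge_family_c": 3,
}

panel_verdict = max_vote(list(binary_panel.values()))
panel_score = average_pool(list(score_panel.values()))
print(panel_verdict, panel_score)
```

Drawing each judge from a different model family is what the claims credit for the reduced intra-model bias; the aggregation step itself is cheap compared with the judge calls.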
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:49.775178Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
60cbbdff71421836c13987430c61488868ff86841ddc1fc1b48c7811f418ffec
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MDF3373RIIMDNQJZQ5BQYYKIRB \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 60cbbdff71421836c13987430c61488868ff86841ddc1fc1b48c7811f418ffec
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "4a9f15ac01f3cf9e3f8f70be156ffd8d06fd4f9e398b6820200a3162380b3d3d",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-04-29T15:33:23Z",
    "title_canon_sha256": "9d60ce3a7ac97b31664a1f2f06e1792a1a9bce153ac2c1053b4ae652505ac363"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2404.18796",
    "kind": "arxiv",
    "version": 2
  }
}
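The verification one-liner can also be run offline against the record shown above. The following is a minimal sketch that mirrors the same serialization (sorted keys, compact separators, non-ASCII preserved, UTF-8 bytes) and hashes it with SHA-256. The function name `canonical_hash` is a hypothetical helper, not part of any Pith tooling, and the sketch assumes the JSON printed on this page is the complete canonical record.

```python
import hashlib
import json

# Canonical record copied from this page.
CANONICAL_RECORD = """
{
  "metadata": {
    "abstract_canon_sha256": "4a9f15ac01f3cf9e3f8f70be156ffd8d06fd4f9e398b6820200a3162380b3d3d",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-04-29T15:33:23Z",
    "title_canon_sha256": "9d60ce3a7ac97b31664a1f2f06e1792a1a9bce153ac2c1053b4ae652505ac363"
  },
  "schema_version": "1.0",
  "source": {"id": "2404.18796", "kind": "arxiv", "version": 2}
}
"""


def canonical_hash(record):
    """SHA-256 hex digest of the record serialized as compact, key-sorted JSON."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(CANONICAL_RECORD)
digest = canonical_hash(record)
print(digest)
```

Compare the printed digest with the value under "Canonical hash". A mismatch would suggest that the record on this page differs from the served `canonical_record` or that the serialization assumptions above do not hold.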