A judge-aware ranking framework for evaluating large language models without ground truth. arXiv preprint arXiv:2601.21817

2 Pith papers cite this work. Polarity classification is still indexing.

fields: stat.ME (2)

years: 2026 (2)

verdicts: unverdicted (2)

representative citing papers

Heterogeneous Judge-Aware Ranking with Sensitivity, Disagreement, and Confidence

stat.ME · 2026-05-06 · unverdicted · novelty 6.0

HJA ranking separates consensus ranking, judge sensitivity, and residual disagreement as distinct inferential targets with identifiability conditions and an anchored alternating algorithm, yielding better recovery and uncertainty calibration than pooled baselines on synthetic and real data.
