LLM rerankers can internally predict ranking quality via self-consistency of sampled outputs, matching SOTA external QPP while direct confidence is overconfident; supervised token-efficient methods improve calibration.
Too consistent to detect: A study of self-consistent errors in llms
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
GrACE is a fine-tuned generative method that uses similarity to a special token embedding for real-time calibrated confidence in LLMs and enables efficient confidence-based test-time scaling.
citing papers explorer
-
Can LLM Rerankers Predict Their Own Ranking Performance?
LLM rerankers can internally predict ranking quality via self-consistency of sampled outputs, matching SOTA external QPP while direct confidence is overconfident; supervised token-efficient methods improve calibration.
-
GrACE: A Generative Approach to Better Confidence Elicitation and Efficient Test-Time Scaling in Large Language Models
GrACE is a fine-tuned generative method that uses similarity to a special token embedding for real-time calibrated confidence in LLMs and enables efficient confidence-based test-time scaling.