Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs

Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs · 2025 · arXiv 2505.17656

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Can LLM Rerankers Predict Their Own Ranking Performance?

cs.IR · 2026-06-02 · unverdicted · novelty 7.0

LLM rerankers can internally predict ranking quality via self-consistency of sampled outputs, matching SOTA external QPP while direct confidence is overconfident; supervised token-efficient methods improve calibration.

GrACE: A Generative Approach to Better Confidence Elicitation and Efficient Test-Time Scaling in Large Language Models

cs.CL · 2025-09-11 · unverdicted · novelty 6.0

GrACE is a fine-tuned generative method that uses similarity to a special token embedding for real-time calibrated confidence in LLMs and enables efficient confidence-based test-time scaling.

citing papers explorer


Can LLM Rerankers Predict Their Own Ranking Performance? · cs.IR · 2026-06-02 · unverdicted · none · ref 49
GrACE: A Generative Approach to Better Confidence Elicitation and Efficient Test-Time Scaling in Large Language Models · cs.CL · 2025-09-11 · unverdicted · none · ref 59
