Q-DAPS estimates question difficulty for LLMs by computing entropy over answer plausibility scores and outperforms baselines on TriviaQA, NQ, MuSiQue, and QASC while aligning with human judgments.
Vatsal Raina and Mark Gales
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Question Difficulty Estimation for Large Language Models via Answer Plausibility Scoring
Q-DAPS estimates question difficulty for LLMs by computing entropy over answer plausibility scores and outperforms baselines on TriviaQA, NQ, MuSiQue, and QASC while aligning with human judgments.
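The summary above says only that difficulty is derived from "entropy over answer plausibility scores". As a rough illustration of that idea, and not the authors' implementation, the sketch below normalizes a list of non-negative plausibility scores into a distribution and computes its Shannon entropy. The function name `plausibility_entropy`, the sum-to-one normalization, and the reading that higher entropy means a harder question are all assumptions made for this example.

```python
import math
from typing import Sequence


def plausibility_entropy(scores: Sequence[float]) -> float:
    """Shannon entropy (in nats) of plausibility scores.

    Hypothetical helper: the scores are assumed to be non-negative
    numbers, one per candidate answer, and are normalized to sum to 1.
    The source does not specify how Q-DAPS normalizes its scores.
    """
    if not scores:
        raise ValueError("need at least one plausibility score")
    if any(s < 0 for s in scores):
        raise ValueError("plausibility scores must be non-negative")
    total = sum(scores)
    if total == 0:
        raise ValueError("at least one score must be positive")
    probs = [s / total for s in scores]
    # Terms with p == 0 contribute nothing to the entropy, so skip them.
    return -sum(p * math.log(p) for p in probs if p > 0)


# One dominant candidate: low entropy.
peaked = plausibility_entropy([0.9, 0.05, 0.05])
# Four equally plausible candidates: maximal entropy, log(4) nats.
uniform = plausibility_entropy([1.0, 1.0, 1.0, 1.0])
```

Under these assumptions, a question whose candidate answers are all similarly plausible scores higher than one with a single clearly plausible answer, which is one way an entropy signal could track difficulty.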