Q-DAPS estimates question difficulty for LLMs by computing entropy over answer plausibility scores and outperforms baselines on TriviaQA, NQ, MuSiQue, and QASC while aligning with human judgments.
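The entropy step in the summary above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `plausibility_entropy`, the sum-to-one normalization, and the reading that a flatter score distribution indicates a harder question are all assumptions made for illustration. The summary states only that entropy is computed over answer plausibility scores.

```python
import math


def plausibility_entropy(scores):
    """Shannon entropy (in nats) of non-negative plausibility scores.

    Hypothetical helper: scores are normalized to sum to 1 before the
    entropy is taken. The source does not specify the normalization.
    """
    if not scores or any(s < 0 for s in scores):
        raise ValueError("scores must be a non-empty list of non-negative numbers")
    total = sum(scores)
    if total == 0:
        raise ValueError("at least one score must be positive")
    probs = [s / total for s in scores]
    # Terms with p == 0 contribute nothing to the entropy and are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0)


# Assumed interpretation: several equally plausible candidate answers
# (high entropy) versus one dominant candidate (low entropy).
ambiguous = plausibility_entropy([0.25, 0.25, 0.25, 0.25])
clear_cut = plausibility_entropy([0.97, 0.01, 0.01, 0.01])
print(ambiguous > clear_cut)
```

Under these assumptions, four equally plausible candidates give the maximum entropy for four options, `math.log(4)`, while a single dominant candidate gives a value near zero.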
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Question Difficulty Estimation for Large Language Models via Answer Plausibility Scoring