Calibrating LLM confidence by probing perturbed representation stability
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
Evaluating Reliability Gaps in Large Language Model Safety via Repeated Prompt Sampling
Repeated sampling of the same safety prompts reveals substantial differences in LLM failure probabilities across temperatures that conventional single-evaluation benchmarks miss.