Calibrating LLM confidence by probing perturbed representation stability

2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Evaluating Reliability Gaps in Large Language Model Safety via Repeated Prompt Sampling

cs.AI · 2026-03-10 · conditional · novelty 6.0

Repeated sampling of the same safety prompts reveals substantial differences in LLM failure probabilities across temperatures that conventional single-evaluation benchmarks miss.

citing papers explorer

Evaluating Reliability Gaps in Large Language Model Safety via Repeated Prompt Sampling

cs.AI · 2026-03-10 · conditional · none · ref 8
