LLM decoders in speech recognition show no racial bias amplification and fewer repetition hallucinations under degradation than Whisper, with audio encoder design mattering more than model scale for fairness and robustness.
SHALLOW: A hallucination benchmark for speech foundation models
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
Internal decoder probing of Whisper yields strongest hallucination detection without references, with late fusion of text and internal features performing best overall.
Four attention metrics enable logistic regression classifiers that detect hallucinations in SpeechLLMs with up to +0.23 PR-AUC gains over baselines on ASR and translation tasks.
citing papers explorer
- From Text Metrics to Model Internals: A Study of Whisper ASR Hallucination Detection
Internal decoder probing of Whisper yields strongest hallucination detection without references, with late fusion of text and internal features performing best overall.