SHALLOW: A hallucination benchmark for speech foundation models
2 Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.CL (2) · Years: 2026 (2) · Verdicts: UNVERDICTED (2) · Representative citing papers: 2
Citing papers:
- Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition
  LLM decoders in speech recognition show no racial bias amplification and fewer repetition hallucinations under degradation than Whisper, with audio encoder design mattering more than model scale for fairness and robustness.
- Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps
  Four attention metrics enable logistic regression classifiers that detect hallucinations in SpeechLLMs with up to +0.23 PR-AUC gains over baselines on ASR and translation tasks.
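
The second citing paper's pipeline — scalar statistics of decoder attention maps fed to a logistic regression classifier — can be sketched as below. The four metrics here (row entropy, diagonality, peak mass, coverage deficit) are illustrative stand-ins, not the paper's actual features, and the synthetic "faithful vs. hallucinated" attention maps are purely for demonstration.

```python
import numpy as np

def attention_metrics(attn):
    """Four scalar statistics of a (T_out, T_in) row-stochastic attention map.
    Illustrative stand-ins for the paper's metrics, not its actual features."""
    t_out, t_in = attn.shape
    eps = 1e-12
    entropy = -(attn * np.log(attn + eps)).sum(axis=1).mean()            # how diffuse each row is
    peaks = attn.argmax(axis=1) / max(t_in - 1, 1)
    diagonality = 1.0 - np.abs(peaks - np.linspace(0, 1, t_out)).mean()  # monotonic-alignment proxy
    peak_mass = attn.max(axis=1).mean()                                  # how peaked rows are
    coverage_deficit = (attn.sum(axis=0) < 0.5 * t_out / t_in).mean()    # under-attended input frames
    return np.array([entropy, diagonality, peak_mass, coverage_deficit])

def fit_logreg(X, y, lr=0.5, steps=3000):
    """Plain gradient-descent logistic regression; bias folded into the weights."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))

# Synthetic demo: "faithful" maps peak near the diagonal; "hallucinated" maps are diffuse noise.
rng = np.random.default_rng(0)

def make_map(diagonal, t_out=24, t_in=30):
    logits = rng.normal(0.0, 0.5, (t_out, t_in))
    if diagonal:
        idx = np.linspace(0, t_in - 1, t_out)
        logits += -0.5 * (np.arange(t_in) - idx[:, None]) ** 2  # Gaussian bump along the diagonal
    a = np.exp(logits)
    return a / a.sum(axis=1, keepdims=True)

X = np.array([attention_metrics(make_map(d)) for d in [True] * 40 + [False] * 40])
y = np.array([0] * 40 + [1] * 40)  # 1 = hallucinated
w = fit_logreg(X, y)
acc = float(((predict(w, X) > 0.5) == y).mean())
```

Diffuse maps score high entropy and low peak mass, so the classes separate almost linearly on these features; the attraction of the approach is that it needs only attention maps already produced at inference time, with no extra forward passes.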