SHALLOW: A hallucination benchmark for speech foundation models

Alkis Koudounas, Moreno La Quatra, Manuel Giollo, Sabato Marco Siniscalchi, Elena Baralis · 2025 · arXiv 2510.16567

2 Pith papers cite this work.

representative citing papers

Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition

cs.CL · 2026-04-23 · unverdicted · novelty 7.0

LLM decoders in speech recognition show no racial bias amplification and fewer repetition hallucinations under degradation than Whisper, with audio encoder design mattering more than model scale for fairness and robustness.

Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps

cs.CL · 2026-04-21 · unverdicted · novelty 5.0

Four attention metrics enable logistic regression classifiers that detect hallucinations in SpeechLLMs with up to +0.23 PR-AUC gains over baselines on ASR and translation tasks.

citing papers explorer

Showing 2 of 2 citing papers.

Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition · cs.CL · 2026-04-23 · unverdicted · none · ref 35
Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps · cs.CL · 2026-04-21 · unverdicted · none · ref 14
