Beyond Transcription: Mechanistic Interpretability in ASR
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3). Verdicts: unverdicted (3).
citing papers explorer
- HALAS: A Human-Annotated Dataset of Hallucinations of Modern ASR Systems
  HALAS is a human-annotated dataset of ASR hallucinations on unprocessed real audio that shows simple metrics outperform current detection methods at 81% ROC-AUC versus 53.1% F1.
- Interleaved Speech Language Models Latently Work In Text
  Interleaved SLMs implicitly transcribe spoken words to text tokens in middle layers (top candidate for 77% of data) before predicting in text space and returning to speech.
- From Text Metrics to Model Internals: A Study of Whisper ASR Hallucination Detection
  Internal decoder probing of Whisper yields strongest hallucination detection without references, with late fusion of text and internal features performing best overall.