Hallucinations in neural automatic speech recognition: Identifying errors and hallucinatory models

2024 · arXiv 2401.01572

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models

cs.SD · 2026-04-21 · unverdicted · novelty 8.0

HalluAudio is the first large-scale benchmark spanning speech, environmental sound, and music that uses human-verified QA pairs, adversarial prompts, and mixed-audio tests to measure hallucinations in large audio-language models.

HALAS: A Human-Annotated Dataset of Hallucinations of Modern ASR Systems

cs.SD · 2026-06-22 · unverdicted · novelty 7.0

HALAS is a human-annotated dataset of ASR hallucinations on unprocessed real audio that shows simple metrics outperform current detection methods at 81% ROC-AUC versus 53.1% F1.

Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition

cs.CL · 2026-04-23 · unverdicted · novelty 7.0

LLM decoders in speech recognition show no racial bias amplification and fewer repetition hallucinations under degradation than Whisper, with audio encoder design mattering more than model scale for fairness and robustness.

TW-Sound580K: A Regional Audio-Text Dataset with Verification-Guided Curation for Localized Audio-Language Modeling

cs.SD · 2026-03-05 · unverdicted · novelty 6.0

TW-Sound580K dataset plus Tai-LALM model with dynamic Dual-ASR arbitration lifts localized Taiwanese audio-language accuracy to 49.1% on the TAU benchmark.

From Text Metrics to Model Internals: A Study of Whisper ASR Hallucination Detection

cs.SD · 2026-06-22 · unverdicted · novelty 5.0

Internal decoder probing of Whisper yields strongest hallucination detection without references, with late fusion of text and internal features performing best overall.

Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps

cs.CL · 2026-04-21 · unverdicted · novelty 5.0

Four attention metrics enable logistic regression classifiers that detect hallucinations in SpeechLLMs with up to +0.23 PR-AUC gains over baselines on ASR and translation tasks.

LLMs and Speech: Integration vs. Combination

eess.AS · 2026-03-16 · unverdicted · novelty 4.0

Tight integration of acoustic models with LLMs for ASR is ablated against shallow fusion across label units, fine-tuning strategies, LLM sizes, and joint CTC decoding to mitigate hallucinations.

Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement

eess.AS · 2026-05-12 · unverdicted · novelty 3.0

Modern ASR models with noisy training and language models correlate better with human WER for speech enhancement evaluation than simpler models, yet their robustness makes them less suitable for purely acoustic assessments.

citing papers explorer

Showing 8 of 8 citing papers.

HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models
cs.SD · 2026-04-21 · unverdicted · none · ref 9

HALAS: A Human-Annotated Dataset of Hallucinations of Modern ASR Systems
cs.SD · 2026-06-22 · unverdicted · none · ref 8

Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition
cs.CL · 2026-04-23 · unverdicted · none · ref 30

TW-Sound580K: A Regional Audio-Text Dataset with Verification-Guided Curation for Localized Audio-Language Modeling
cs.SD · 2026-03-05 · unverdicted · none · ref 39

From Text Metrics to Model Internals: A Study of Whisper ASR Hallucination Detection
cs.SD · 2026-06-22 · unverdicted · none · ref 12

Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps
cs.CL · 2026-04-21 · unverdicted · none · ref 8

LLMs and Speech: Integration vs. Combination
eess.AS · 2026-03-16 · unverdicted · none · ref 53

Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement
eess.AS · 2026-05-12 · unverdicted · none · ref 40
