Utterance-Level Methods for Identifying Reliable ASR-Output for Child Speech

2026 · eess.AS · arXiv 2604.19801

abstract

Automatic Speech Recognition (ASR) is increasingly used in applications involving child speech, such as language learning and literacy acquisition. However, the effectiveness of such applications is limited by high ASR error rates. These negative effects can be mitigated by identifying in advance which ASR outputs are reliable. This work develops two novel approaches for selecting reliable ASR output at the utterance level: one for read speech and one for dialogue speech. Evaluations were carried out on an English and a Dutch dataset, each with a baseline and a finetuned model. The results show that the best utterance-level selection strategy identifies reliably transcribed recordings with high precision (P > 97.4%) for both read speech and dialogue material in both languages. With this strategy, 21.0% to 55.9% of the dialogue and read speech datasets can be selected automatically with a low utterance error rate (UER < 2.6%).
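
The three figures reported in the abstract (share of data selected, precision, and UER) can be related with a small sketch. It is illustrative only and is not the paper's selection method: the `Utterance` class, the `selection_metrics` function, and the exact-match definition of a "correct" utterance are all assumptions made here. Under that assumption, the UER of the selected subset is simply 100 minus the precision, which matches the reported pairing of P > 97.4 with UER < 2.6.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utterance:
    """Hypothetical record: one utterance with its reference and ASR output."""
    reference: str   # human transcript
    hypothesis: str  # ASR output
    selected: bool   # True if some selection strategy marked it as reliable


def selection_metrics(utterances: list[Utterance]) -> dict[str, Optional[float]]:
    """Return coverage, precision and UER (all in percent) for the selected subset.

    Assumption: an utterance counts as correct only when the ASR output
    matches the reference exactly; the paper may define this differently.
    """
    selected = [u for u in utterances if u.selected]
    if not selected:
        # Nothing selected: precision and UER are undefined.
        return {"coverage": 0.0, "precision": None, "uer": None}

    correct = sum(u.reference == u.hypothesis for u in selected)
    precision = 100.0 * correct / len(selected)
    return {
        "coverage": 100.0 * len(selected) / len(utterances),
        "precision": precision,
        "uer": 100.0 - precision,  # share of selected utterances with any error
    }


# Toy data, invented for illustration only.
toy = [
    Utterance("the cat sat", "the cat sat", True),
    Utterance("a big dog", "a pig dog", True),
    Utterance("hello there", "hello there", False),
    Utterance("good morning", "good mourning", False),
]
metrics = selection_metrics(toy)
```

In this toy case half of the utterances are selected and one of the two selected outputs contains an error, so coverage, precision, and UER all come out at 50 percent.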

representative citing papers

Utterance-Level Methods for Identifying Reliable ASR-Output for Child Speech

eess.AS · 2026-04-10 · unverdicted · novelty 6.0

Utterance-level selection methods identify reliable ASR outputs for child speech with precision above 97.4 percent, enabling 21 to 55.9 percent of read and dialogue datasets to be retained with utterance error rates below 2.6 percent across English and Dutch.
