An Analysis of the Effectiveness of Synthetic Speech Data for ASR Fine-tuning in Selected Indic Languages

Agneedh Basu; Nihar Desai; Pavan Kumar; Pranav Bhat; Prasanta Kumar Ghosh; Sujith Pulikodan; Visruth Sanka

arxiv: 2606.17662 · v1 · pith:RZ6QQ35Gnew · submitted 2026-06-16 · 📡 eess.AS

Sujith Pulikodan, Agneedh Basu, Pavan Kumar, Pranav Bhat, Visruth Sanka, Nihar Desai, Prasanta Kumar Ghosh

classification 📡 eess.AS

keywords speech, synthetic, data, performance, effectiveness, generation, impact, indic

Synthetic data has the potential to be a valuable resource for training machine learning models, particularly Automatic Speech Recognition (ASR) systems; however, its effectiveness requires systematic evaluation. In this study, we investigate the impact of incorporating synthetic speech data alongside real-world recordings for three Indic languages: Hindi, Kannada, and Telugu. We analyze the performance gains achieved by augmenting real data with synthetic data and independently examine how ASR performance varies with the sources of the scripts used to generate synthetic speech. In addition, we evaluate the effect of synthetic speech generated using different speech synthesis models. Finally, we study the impact of voice cloning in synthetic speech generation on ASR performance, including how performance varies with the number of distinct cloned voices used during data generation.
