Analyzing Language and Geographical Variation in Speech Representations Across 60 Indic Languages

Agneedh Basu; Nihar Desai; Pavan Kumar J; Pranav Bhat; Prasanta Kumar Ghosh; Sujith Pulikodan; Visruth Sanka

arxiv: 2606.19940 · v1 · pith:KIIFJAEZnew · submitted 2026-06-18 · 📡 eess.AS


Pavan Kumar J, Agneedh Basu, Pranav Bhat, Sujith Pulikodan, Visruth Sanka, Nihar Desai, Prasanta Kumar Ghosh

classification 📡 eess.AS

keywords: language, supervision, classification, district, geographical, language-district, variation, joint


Self-supervised speech encoders are often fine-tuned with language supervision, which can overlook geographical variation. To understand how representations learned under joint language and district supervision differ from those learned under language-only supervision, we fine-tune Whisper-base and Wav2Vec2.0-base on two classification tasks: joint language-district classification (386 classes) and language-only classification (60 languages). Language-district supervision improves district discrimination conditioned on language in the embedding space while maintaining strong marginal language classification. We analyze the structure of the learned embeddings using Normalized Conditional Mutual Information (NCMI), showing that language-district supervision produces global language clusters with structured within-language subclusters aligned to district variation, enhancing geographical separability without degrading language-level organization.
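The abstract does not spell out how NCMI is computed, so the following is only a minimal sketch under stated assumptions. It assumes the embeddings have already been discretized into cluster assignments C (for example by k-means), and it uses one plausible normalization, I(C; D | L) / H(D | L), where D is the district label and L the language label. The function names `entropy`, `conditional_entropy`, and `ncmi` are illustrative and are not taken from the paper.

```python
from collections import Counter

import numpy as np


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of hashable labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


def conditional_entropy(x, given):
    """H(X | G) computed as H(X, G) - H(G)."""
    x, given = list(x), list(given)
    return entropy(list(zip(x, given))) - entropy(given)


def ncmi(clusters, districts, languages):
    """Assumed normalization: I(C; D | L) / H(D | L).

    I(C; D | L) = H(D | L) - H(D | C, L). The ratio is 1 when the clusters
    fully determine the district within each language, and 0 when they
    carry no district information beyond the language label.
    """
    clusters, districts, languages = list(clusters), list(districts), list(languages)
    h_d_given_l = conditional_entropy(districts, languages)
    if h_d_given_l == 0.0:
        # Every language has a single district: nothing left to explain.
        return 0.0
    h_d_given_cl = conditional_entropy(districts, list(zip(clusters, languages)))
    return (h_d_given_l - h_d_given_cl) / h_d_given_l


# Toy data (hypothetical): two languages, two districts each.
languages = ["hi"] * 4 + ["ta"] * 4
districts = ["d1", "d1", "d2", "d2", "d3", "d3", "d4", "d4"]
aligned_clusters = [0, 0, 1, 1, 0, 0, 1, 1]  # subclusters follow districts
flat_clusters = [0] * 8                      # no within-language structure

aligned_score = ncmi(aligned_clusters, districts, languages)
flat_score = ncmi(flat_clusters, districts, languages)
```

In this toy setup, cluster assignments that split each language along its districts score at the top of the range, while a single undifferentiated cluster scores at the bottom. This mirrors the "within-language subclusters aligned to district variation" described above, although the paper's actual estimator may differ.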
