pith. machine review for the scientific record.

arXiv: 2509.18060 · v2 · submitted 2025-09-22 · cs.CL · cs.AI

TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Framework for Ü-Tsang, Amdo and Kham Speech Dataset Generation

Ban Ma-bao, Cheng Huang, Fan Gao, Nyima Tashi, Renzeng Duojie, Xiangxiang Wang, Yongbin Yu, Yuqing Cai, Yutong Liu, Ziyue Zhang

keywords: speech, dialect, tibetan, tmd-tts, amdo, dialectal, dialects, framework

Tibetan is a low-resource language with few parallel speech corpora spanning its three major dialects (Ü-Tsang, Amdo, and Kham), which constrains progress in speech modeling. To address this issue, we propose TMD-TTS, a unified Tibetan multi-dialect text-to-speech (TTS) framework that synthesizes parallel dialectal speech from explicit dialect labels. Our method features a dialect fusion module and a Dialect-Specialized Dynamic Routing Network (DSDR-Net) to capture fine-grained acoustic and linguistic variations across dialects. Extensive objective and subjective evaluations demonstrate that TMD-TTS significantly outperforms baselines in dialectal expressiveness. We further validate the quality and utility of the synthesized speech through a challenging Speech-to-Speech Dialect Conversion (S2SDC) task.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Tibetan-TTS: Low-Resource Tibetan Speech Synthesis with Large Model Adaptation

    cs.SD · 2026-05 · unverdicted · novelty 7.0

    Large-model adaptation with Tibetan text handling produces natural speech from limited data, outperforming commercial systems.