PASQA: Pitch-Accent-Focused Speech Quality Assessment Model Trained on Synthetic Speech with Accent Errors

Kentaro Mitsui; Masaya Kawamura; Reo Shimizu; Yuma Shirahata

arxiv: 2606.20137 · v1 · pith:QLJH7LQ7new · submitted 2026-06-18 · eess.AS · cs.CL · cs.LG · cs.SD

keywords: pasqa, accent-error, speech, accent, assessment, errors, model, models

Existing mean opinion score (MOS) prediction models typically predict utterance-level naturalness MOS and can be insensitive to localized pitch-accent errors. We propose Pitch-Accent-focused Speech Quality Assessment (PASQA), which explicitly targets pitch-accent correctness. To train our model, we construct a controlled Japanese accent-error dataset by changing accent patterns using an accent-controllable text-to-speech system, and compute a pseudo accent-quality score from the accent-error rate. PASQA builds on self-supervised representations and employs mora-conditioned fusion, ranking loss, an auxiliary accent-error localization task, and speaker-invariant training. Experiments show that conventional models fail to preserve the ordering by accent-error severity, whereas PASQA achieves high ordering accuracy on both seen and unseen speakers. Further, PASQA shows stronger agreement with human accent-correctness judgments. The code is available at https://github.com/lycorp-jp/PASQA.
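
The abstract says that a pseudo accent-quality score is computed from the accent-error rate and that models are compared by whether they preserve the ordering by accent-error severity. It does not give the formulas. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: accent patterns are represented as per-mora high/low labels, the error rate is the fraction of moras whose label differs from the reference, and the score is a linear map onto a 1 to 5 scale. The function names, the label encoding, the linear mapping, and the pairwise definition of ordering accuracy are all hypothetical choices made for illustration.

```python
from itertools import combinations


def accent_error_rate(reference, perturbed):
    """Fraction of moras whose accent label differs from the reference.

    Assumption: both inputs are equal-length sequences of per-mora labels
    (for example "H" for high pitch and "L" for low pitch).
    """
    if len(reference) != len(perturbed):
        raise ValueError("label sequences must have the same length")
    if len(reference) == 0:
        raise ValueError("label sequences must not be empty")
    errors = sum(r != p for r, p in zip(reference, perturbed))
    return errors / len(reference)


def pseudo_accent_score(error_rate, low=1.0, high=5.0):
    """Map an error rate in [0, 1] to a score in [low, high].

    Assumption: a linear mapping where 0 errors gives `high`
    and all moras wrong gives `low`.
    """
    return high - (high - low) * error_rate


def ordering_accuracy(error_rates, predicted_scores):
    """Fraction of sample pairs ranked consistently with error severity.

    A pair counts as correct when the sample with the lower error rate
    receives the strictly higher predicted score. Pairs with equal
    error rates are skipped.
    """
    correct = 0
    total = 0
    for i, j in combinations(range(len(error_rates)), 2):
        if error_rates[i] == error_rates[j]:
            continue
        total += 1
        better, worse = (i, j) if error_rates[i] < error_rates[j] else (j, i)
        if predicted_scores[better] > predicted_scores[worse]:
            correct += 1
    return correct / total if total else float("nan")


if __name__ == "__main__":
    rate = accent_error_rate("LHHHL", "LHLHL")  # one of five moras differs
    print(rate, pseudo_accent_score(rate))
    print(ordering_accuracy([0.0, 0.2, 0.6], [4.5, 4.0, 4.2]))
```

Under these assumptions, a model that assigns a higher score to a more heavily perturbed utterance than to a lightly perturbed one loses credit on that pair, which is the kind of failure the abstract attributes to conventional MOS predictors.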
