Multilingual Word-Level Forced Alignment with Self-Supervised Representations and Learned Dynamic Programming

2026 · cs.CL · arXiv 2606.10675

1 Pith paper cites this work. Polarity classification is still being indexed.

abstract

We present a method for accurate multilingual word-level forced alignment, consisting of an alignment encoder and a learned alignment decoder. The encoder integrates two representations: one from the Massively Multilingual Speech (MMS) model and another from a self-supervised phoneme boundary detector (UnSupSeg). It learns to fuse them and to estimate word-boundary probabilities over long temporal contexts. The alignment decoder is a learned dynamic programming procedure that combines the encoder outputs with segmental features computed over the MMS and UnSupSeg representations to infer the final word boundaries. Trained iteratively on TIMIT and Buckeye, the proposed approach outperforms the Montreal Forced Aligner (MFA) and MMS-based alignment on both datasets. On unseen languages (Dutch, German, and Hebrew), the proposed model performs consistently better than or on par with existing alignment approaches, indicating its potential to scale to the 1100+ languages supported by MMS without further training.
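
To make the decoder idea in the abstract more concrete, here is a minimal sketch of generic segmental dynamic programming over word boundaries. It is not the paper's decoder: the function name `segmental_dp`, the `segment_score` callback, and the `max_len` cap are hypothetical, the scores are fixed rather than learned, and it assumes words tile the utterance with no silence between them.

```python
import math
from typing import Callable, List, Sequence, Tuple

NEG_INF = float("-inf")


def segmental_dp(
    boundary_logp: Sequence[float],
    num_words: int,
    segment_score: Callable[[int, int, int], float],
    max_len: int,
) -> List[Tuple[int, int]]:
    """Split frames [0, T) into num_words contiguous (start, end) segments.

    boundary_logp has length T + 1; entry t is the log-probability of a
    word boundary at frame position t (hypothetical encoder output).
    segment_score(k, s, t) scores word k spanning frames [s, t)
    (hypothetical stand-in for segmental features).
    max_len caps the length of a single word in frames.
    """
    num_pos = len(boundary_logp)
    last = num_pos - 1
    # best[k][t]: best score for the first k words ending at position t.
    best = [[NEG_INF] * num_pos for _ in range(num_words + 1)]
    back = [[-1] * num_pos for _ in range(num_words + 1)]
    best[0][0] = 0.0

    for k in range(1, num_words + 1):
        for t in range(k, num_pos):
            # Candidate start positions for word k - 1 ending at t.
            for s in range(max(k - 1, t - max_len), t):
                prev = best[k - 1][s]
                if prev == NEG_INF:
                    continue
                score = prev + segment_score(k - 1, s, t) + boundary_logp[t]
                if score > best[k][t]:
                    best[k][t] = score
                    back[k][t] = s

    if best[num_words][last] == NEG_INF:
        raise ValueError("no valid segmentation for these inputs")

    # Follow back-pointers from the final position to recover segments.
    segments: List[Tuple[int, int]] = []
    t = last
    for k in range(num_words, 0, -1):
        s = back[k][t]
        segments.append((s, t))
        t = s
    segments.reverse()
    return segments


# Toy input: 6 frames, boundaries are likely at positions 2 and 6.
toy_logp = [math.log(0.01)] * 7
toy_logp[2] = math.log(0.9)
toy_logp[6] = math.log(0.9)
segments = segmental_dp(toy_logp, 2, lambda k, s, t: 0.0, max_len=6)
print(segments)  # [(0, 2), (2, 6)]
```

In this sketch the total score is a plain sum of a boundary term and a segment term. A learned variant would instead weight and combine such terms with trained parameters, which is the part the abstract attributes to the alignment decoder.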

representative citing papers

Multilingual Word-Level Forced Alignment with Self-Supervised Representations and Learned Dynamic Programming

cs.CL · 2026-06-09 · unverdicted · novelty 7.0

A fused self-supervised encoder and learned DP decoder for word alignment outperforms MFA on English datasets and generalizes to unseen languages.
