Phonikud: Overcoming Phonetic Underspecification for Hebrew Text-To-Speech
Text-to-speech (TTS) for Modern Hebrew is challenged by the language's orthographic complexity, with existing solutions ignoring underspecified phonetic features such as stress. We present a framework for more phonetically accurate Hebrew TTS with four contributions: (1) Phonikud, an open-source Hebrew grapheme-to-phoneme (G2P) system that outputs fully-specified International Phonetic Alphabet (IPA) transcriptions, designed by augmenting a base diacritizer. (2) The ILSpeech corpus of paired Hebrew audio, text, and expert IPA annotations. (3) A benchmark for the previously unmeasured task of Hebrew G2P conversion. (4) Hebrew audio-to-IPA models capturing previously disregarded phonetic details for automatic TTS evaluation. Our results show that Phonikud more accurately predicts Hebrew phonemes than prior methods, and that small, local TTS models with phonetic input from Phonikud approach large proprietary systems. We release our code, data, and models at https://phonikud.github.io.
Forward citations
Cited by 2 Pith papers
- When Similar Means Different: Evaluating LLMs on Arabic-Hebrew Cognates
  LLMs achieve high accuracy on true Arabic-Hebrew cognates but drop sharply on false friends and loanwords due to surface-form reliance, with only modest gains from sentence context.
- ReNikud: Audio-Supervised Hebrew Grapheme-to-Phoneme Conversion
  ReNikud improves Hebrew G2P by combining ASR pseudo-labeling from unlabeled audio with character-level IPA prediction, outperforming prior methods on benchmarks including a new spoken Hebrew test set.