Phonikud: Overcoming Phonetic Underspecification for Hebrew Text-To-Speech

Cobi Calev; Maxim Melichov; Morris Alper; Yakov Kolani

arxiv: 2506.12311 · v3 · pith:O4KLLV7Ynew · submitted 2025-06-14 · cs.CL · cs.SD · eess.AS

classification: cs.CL · cs.SD · eess.AS

keywords: hebrew, phonetic, phonikud, models, previously, text-to-speech, accurate, accurately

Text-to-speech (TTS) for Modern Hebrew is challenged by the language's orthographic complexity, with existing solutions ignoring underspecified phonetic features such as stress. We present a framework for more phonetically accurate Hebrew TTS with four contributions: (1) Phonikud, an open-source Hebrew grapheme-to-phoneme (G2P) system that outputs fully-specified International Phonetic Alphabet (IPA) transcriptions, designed by augmenting a base diacritizer. (2) The ILSpeech corpus of paired Hebrew audio, text, and expert IPA annotations. (3) A benchmark for the previously unmeasured task of Hebrew G2P conversion. (4) Hebrew audio-to-IPA models capturing previously disregarded phonetic details for automatic TTS evaluation. Our results show that Phonikud more accurately predicts Hebrew phonemes than prior methods, and that small, local TTS models with phonetic input from Phonikud approach large proprietary systems. We release our code, data, and models at https://phonikud.github.io.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

When Similar Means Different: Evaluating LLMs on Arabic-Hebrew Cognates
cs.CL 2026-06 unverdicted novelty 7.0

LLMs achieve high accuracy on true Arabic-Hebrew cognates but drop sharply on false friends and loanwords due to surface-form reliance, with only modest gains from sentence context.

ReNikud: Audio-Supervised Hebrew Grapheme-to-Phoneme Conversion
cs.CL 2026-06 unverdicted novelty 6.0

ReNikud improves Hebrew G2P by combining ASR pseudo-labeling from unlabeled audio with character-level IPA prediction, outperforming prior methods on benchmarks including a new spoken Hebrew test set.