citation dossier
Classifier-free diffusion guidance
1 Pith paper citing it
1 reference link
eess.AS top field · 1 paper
UNVERDICTED top verdict bucket · 1 paper
why this work matters in Pith
Pith has found this work in 1 reviewed paper. Its strongest current cluster is eess.AS (1 paper). The largest review-status bucket among citing papers is UNVERDICTED (1 paper). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
fields
eess.AS: 1
years
2026: 1
verdicts
UNVERDICTED: 1
representative citing papers
citing papers explorer
- PoDAR: Power-Disentangled Audio Representation for Generative Modeling
PoDAR disentangles audio signal power from semantic content in latents using power augmentation and consistency objectives, yielding 2x faster convergence and gains of 0.055 speaker similarity and 0.22 UTMOS when applied to Stable Audio VAE with F5-TTS.